Merge branch 'sfx-xdp-fallback-tx-queues'

Íñigo Huguet says:

====================
sfc: fallback for lack of xdp tx queues

If there are not enough hardware resources to allocate one tx queue per
CPU for XDP, XDP_TX and XDP_REDIRECT actions were unavailable, and using
them resulted each time with the packet being drop and this message in
the logs: XDP TX failed (-22)

These patches implement 2 fallback solutions for 2 different situations
that might happen:
1. There are not enough free resources for all the tx queues, but there
   are some free resources available
2. There are not enough free resources at all for tx queues.

Both solutions are based in sharing tx queues, using __netif_tx_lock for
synchronization. In the second case, as there are not XDP TX queues to
share, network stack queues are used instead, but since we're taking
__netif_tx_lock, concurrent access to the queues is correctly protected.

The solution for this second case might affect performance both of XDP
traffic and normal traffice due to lock contention if both are used
intensively. That's why I call it a "last resort" fallback: it's not a
desirable situation, but at least we have XDP TX working.

Some tests has shown good results and indicate that the non-fallback
case is not being damaged by this changes. They are also promising for
the fallback cases. This is the test:
1. From another machine, send high amount of packets with pktgen, script
   samples/pktgen/pktgen_sample04_many_flows.sh
2. In the tested machine, run samples/bpf/xdp_rxq_info with arguments
   "-a XDP_TX --swapmac" and see the results
3. In the tested machine, run also pktgen_sample04 to create high TX
   normal traffic, and see how xdp_rxq_info results vary

Note that this test doesn't check the worst situations for the fallback
solutions because XDP_TX will only be executed from the same CPUs that
are processed by sfc, and not from every CPU in the system, so the
performance drop due to the highest locking contention doesn't happen.
I'd like to test that, as well, but I don't have access right now to a
proper environment.

Test results:

Without doing TX:
Before changes: ~2,900,000 pps
After changes, 1 queues/core: ~2,900,000 pps
After changes, 2 queues/core: ~2,900,000 pps
After changes, 8 queues/core: ~2,900,000 pps
After changes, borrowing from network stack: ~2,900,000 pps

With multiflow TX at the same time:
Before changes: ~1,700,000 - 2,900,000 pps
After changes, 1 queues/core: ~1,700,000 - 2,900,000 pps
After changes, 2 queues/core: ~1,700,000 pps
After changes, 8 queues/core: ~1,700,000 pps
After changes, borrowing from network stack: 1,150,000 pps

Sporadic "XDP TX failed (-5)" warnings are shown when running xdp program
and pktgen simultaneously. This was expected because XDP doesn't have any
buffering system if the NIC is under very high pressure. Thousands of
these warnings are shown in the case of borrowing net stack queues. As I
said before, this was also expected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>