Message ID | 20201215012907.3062-1-ivan@cloudflare.com |
---|---|
State | Superseded |
Series | [net-next] sfc: reduce the number of requested xdp ev queues |
On Mon, Dec 14, 2020 at 05:29:06PM -0800, Ivan Babrou wrote:
> Without this change the driver tries to allocate too many queues,
> breaching the number of available msi-x interrupts on machines
> with many logical cpus and default adapter settings:
>
> Insufficient resources for 12 XDP event queues (24 other channels, max 32)
>
> Which in turn triggers EINVAL on XDP processing:
>
> sfc 0000:86:00.0 ext0: XDP TX failed (-22)
>
> Signed-off-by: Ivan Babrou <ivan@cloudflare.com>

Acked-by: Martin Habets <habetsm.xilinx@gmail.com>

> ---
>  drivers/net/ethernet/sfc/efx_channels.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
> index a4a626e9cd9a..1bfeee283ea9 100644
> --- a/drivers/net/ethernet/sfc/efx_channels.c
> +++ b/drivers/net/ethernet/sfc/efx_channels.c
> @@ -17,6 +17,7 @@
>  #include "rx_common.h"
>  #include "nic.h"
>  #include "sriov.h"
> +#include "workarounds.h"
>
>  /* This is the first interrupt mode to try out of:
>   * 0 => MSI-X
> @@ -137,6 +138,7 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
>  {
>  	unsigned int n_channels = parallelism;
>  	int vec_count;
> +	int tx_per_ev;
>  	int n_xdp_tx;
>  	int n_xdp_ev;
>
> @@ -149,9 +151,9 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
>  	 * multiple tx queues, assuming tx and ev queues are both
>  	 * maximum size.
>  	 */
> -
> +	tx_per_ev = EFX_MAX_EVQ_SIZE / EFX_TXQ_MAX_ENT(efx);
>  	n_xdp_tx = num_possible_cpus();
> -	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, EFX_MAX_TXQ_PER_CHANNEL);
> +	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, tx_per_ev);
>
>  	vec_count = pci_msix_vec_count(efx->pci_dev);
>  	if (vec_count < 0)
> --
> 2.29.2

--
Martin Habets <habetsm.xilinx@gmail.com>
On Tue, 15 Dec 2020 18:49:55 +0000
Edward Cree <ecree.xilinx@gmail.com> wrote:

> On 15/12/2020 09:43, Jesper Dangaard Brouer wrote:
> > On Mon, 14 Dec 2020 17:29:06 -0800
> > Ivan Babrou <ivan@cloudflare.com> wrote:
> >
> >> Without this change the driver tries to allocate too many queues,
> >> breaching the number of available msi-x interrupts on machines
> >> with many logical cpus and default adapter settings:
> >>
> >> Insufficient resources for 12 XDP event queues (24 other channels, max 32)
> >>
> >> Which in turn triggers EINVAL on XDP processing:
> >>
> >> sfc 0000:86:00.0 ext0: XDP TX failed (-22)
> >
> > I have a similar QA report with XDP_REDIRECT:
> >   sfc 0000:05:00.0 ens1f0np0: XDP redirect failed (-22)
> >
> > Here we are back to the issue we discussed with ixgbe, that NIC / msi-x
> > interrupts hardware resources are not enough on machines with many
> > logical cpus.
> >
> > After this fix, what will happen if (cpu >= efx->xdp_tx_queue_count) ?
>
> Same as happened before: the "failed -22". But this fix will make that
> less likely to happen, because it ties more TXQs to each EVQ, and it's
> the EVQs that are in short supply.

So, what I hear is that this fix is just papering over the real issue.

I suggest that you/we detect the situation, and have a code path that
will take a lock (per 16 packets bulk) and solve the issue. If you care
about maximum performance you can implement this via changing the
ndo_xdp_xmit pointer to the fallback function when needed, to avoid
having to check for the fallback mode in the fast-path.

> (Strictly speaking, I believe the limitation is a software one, that
> comes from the driver's channel structures having been designed a
> decade ago when 32 cpus ought to be enough for anybody... AFAIR the
> hardware is capable of giving us something like 1024 evqs if we ask
> for them, it just might not have that many msi-x vectors for us.)
> Anyway, the patch looks correct, so
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
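A minimal userspace sketch of the locked-fallback idea Jesper describes above. It only shows the CPU-to-queue mapping and which queues would need a lock when there are fewer XDP TX queues than CPUs; every name and number in it is made up for the example, and nothing here is sfc driver or kernel API.

```c
/*
 * Sketch only: with fewer XDP TX queues than CPUs, the extra CPUs map
 * onto existing queues (modulo) and those shared queues need a lock,
 * while CPUs that own a queue outright stay lockless. Illustrative
 * names and values, not sfc code.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_QUEUES 64

int main(void)
{
	int n_cpus = 10;	/* e.g. num_possible_cpus() */
	int n_xdp_txq = 6;	/* XDP TX queues actually allocated */
	int users[MAX_QUEUES] = { 0 };

	/* First pass: count how many CPUs land on each queue. */
	for (int cpu = 0; cpu < n_cpus; cpu++)
		users[cpu % n_xdp_txq]++;

	/* Second pass: a queue with more than one user must be locked. */
	for (int cpu = 0; cpu < n_cpus; cpu++) {
		int txq = cpu % n_xdp_txq;
		bool locked = users[txq] > 1;

		printf("cpu %2d -> xdp txq %d (%s)\n", cpu, txq,
		       locked ? "locked" : "lockless");
	}
	return 0;
}
```

In a real driver the locked/lockless decision could be made once at queue-setup time rather than per packet, which is what Jesper's suggestion to point ndo_xdp_xmit at a fallback implementation only when needed would achieve.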
On 16/12/2020 08:45, Jesper Dangaard Brouer wrote:
> So, what I hear is that this fix is just papering over the real issue.

Yes, it is, but it's better than nothing in the meantime while we work
out the complete fix.

> I suggest that you/we detect the situation, and have a code path that
> will take a lock (per 16 packets bulk) and solve the issue.

Imho that would _also_ paper over the issue, because it would mean the
system degraded to a lower performance mode of operation, while still
appearing to support XDP_TX. I think that that in general should not
happen unless there is a way for the user to determine at runtime
whether it has/should happen.
Perhaps Marek's new XDP feature flags could include a "tx-lockless"
flag to indicate this?

-ed
On Mon, 14 Dec 2020 17:29:06 -0800 Ivan Babrou wrote:
> Without this change the driver tries to allocate too many queues,
> breaching the number of available msi-x interrupts on machines
> with many logical cpus and default adapter settings:
>
> Insufficient resources for 12 XDP event queues (24 other channels, max 32)
>
> Which in turn triggers EINVAL on XDP processing:
>
> sfc 0000:86:00.0 ext0: XDP TX failed (-22)
>
> Signed-off-by: Ivan Babrou <ivan@cloudflare.com>

Looks like the discussion may have concluded, but we don't take -next
patches during the merge window, so please repost when net-next reopens.

Thanks!

--
# Form letter - net-next is closed

We have already sent the networking pull request for 5.11 and therefore
net-next is closed for new drivers, features, code refactoring and
optimizations. We are currently accepting bug fixes only.

Please repost when net-next reopens after 5.11-rc1 is cut. Look out for
the announcement on the mailing list or check:
http://vger.kernel.org/~davem/net-next.html

RFC patches sent for review only are obviously welcome at any time.
On Thu, Dec 17, 2020 at 10:14 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 14 Dec 2020 17:29:06 -0800 Ivan Babrou wrote:
> > Without this change the driver tries to allocate too many queues,
> > breaching the number of available msi-x interrupts on machines
> > with many logical cpus and default adapter settings:
> >
> > Insufficient resources for 12 XDP event queues (24 other channels, max 32)
> >
> > Which in turn triggers EINVAL on XDP processing:
> >
> > sfc 0000:86:00.0 ext0: XDP TX failed (-22)
> >
> > Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
>
> Looks like the discussion may have concluded, but we don't take -next
> patches during the merge window, so please repost when net-next reopens.
>
> Thanks!
> --
> # Form letter - net-next is closed
>
> We have already sent the networking pull request for 5.11 and therefore
> net-next is closed for new drivers, features, code refactoring and
> optimizations. We are currently accepting bug fixes only.
>
> Please repost when net-next reopens after 5.11-rc1 is cut.

Should I resend my patch now that the window is open or is bumping
this thread enough?

> Look out for the announcement on the mailing list or check:
> http://vger.kernel.org/~davem/net-next.html
>
> RFC patches sent for review only are obviously welcome at any time.
On Tue, 19 Jan 2021 15:43:43 -0800 Ivan Babrou wrote:
> On Thu, Dec 17, 2020 at 10:14 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Mon, 14 Dec 2020 17:29:06 -0800 Ivan Babrou wrote:
> > > Without this change the driver tries to allocate too many queues,
> > > breaching the number of available msi-x interrupts on machines
> > > with many logical cpus and default adapter settings:
> > >
> > > Insufficient resources for 12 XDP event queues (24 other channels, max 32)
> > >
> > > Which in turn triggers EINVAL on XDP processing:
> > >
> > > sfc 0000:86:00.0 ext0: XDP TX failed (-22)
> > >
> > > Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
> >
> > Looks like the discussion may have concluded, but we don't take -next
> > patches during the merge window, so please repost when net-next reopens.
> >
> > Thanks!
> > --
> > # Form letter - net-next is closed
> >
> > We have already sent the networking pull request for 5.11 and therefore
> > net-next is closed for new drivers, features, code refactoring and
> > optimizations. We are currently accepting bug fixes only.
> >
> > Please repost when net-next reopens after 5.11-rc1 is cut.
>
> Should I resend my patch now that the window is open or is bumping
> this thread enough?

You need to resend, thanks!
diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index a4a626e9cd9a..1bfeee283ea9 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -17,6 +17,7 @@
 #include "rx_common.h"
 #include "nic.h"
 #include "sriov.h"
+#include "workarounds.h"
 
 /* This is the first interrupt mode to try out of:
  * 0 => MSI-X
@@ -137,6 +138,7 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 {
 	unsigned int n_channels = parallelism;
 	int vec_count;
+	int tx_per_ev;
 	int n_xdp_tx;
 	int n_xdp_ev;
 
@@ -149,9 +151,9 @@ static int efx_allocate_msix_channels(struct efx_nic *efx,
 	 * multiple tx queues, assuming tx and ev queues are both
 	 * maximum size.
 	 */
-
+	tx_per_ev = EFX_MAX_EVQ_SIZE / EFX_TXQ_MAX_ENT(efx);
 	n_xdp_tx = num_possible_cpus();
-	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, EFX_MAX_TXQ_PER_CHANNEL);
+	n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, tx_per_ev);
 
 	vec_count = pci_msix_vec_count(efx->pci_dev);
 	if (vec_count < 0)
Without this change the driver tries to allocate too many queues,
breaching the number of available msi-x interrupts on machines
with many logical cpus and default adapter settings:

Insufficient resources for 12 XDP event queues (24 other channels, max 32)

Which in turn triggers EINVAL on XDP processing:

sfc 0000:86:00.0 ext0: XDP TX failed (-22)

Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
---
 drivers/net/ethernet/sfc/efx_channels.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
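For reference, a standalone sketch of the before/after event-queue arithmetic in efx_allocate_msix_channels(). The constants below (EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE, EFX_TXQ_MAX_ENT and the CPU count) are assumed, illustrative values chosen to reproduce the "12 XDP event queues" figure from the report, not values read from the driver headers.

```c
/*
 * Sketch of the EVQ math before and after the patch.
 * The sizes below are assumptions for illustration only.
 */
#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	int n_xdp_tx = 48;		/* num_possible_cpus() on the affected host (assumed) */
	int max_txq_per_channel = 4;	/* old divisor, EFX_MAX_TXQ_PER_CHANNEL (assumed) */
	int max_evq_size = 16384;	/* EFX_MAX_EVQ_SIZE (assumed) */
	int txq_max_ent = 2048;		/* EFX_TXQ_MAX_ENT(efx) (assumed) */

	int old_n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, max_txq_per_channel);
	int tx_per_ev = max_evq_size / txq_max_ent;
	int new_n_xdp_ev = DIV_ROUND_UP(n_xdp_tx, tx_per_ev);

	/* Before: 12 XDP EVQs on top of 24 other channels exceeds the
	 * 32 MSI-X vectors from the error message. After: 6, which fits. */
	printf("old n_xdp_ev = %d\n", old_n_xdp_ev);
	printf("tx_per_ev    = %d\n", tx_per_ev);
	printf("new n_xdp_ev = %d\n", new_n_xdp_ev);
	return 0;
}
```

With these assumptions each event queue can serve eight XDP TX queues, so the same CPU count now needs 6 event queues instead of 12 and stays within the available MSI-X vectors.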