Message ID | 20201001195247.66636-1-saeed@kernel.org |
---|---|
Series | mlx5 fixes 2020-09-30 |
On Thu, 1 Oct 2020 12:52:33 -0700 saeed@kernel.org wrote:
> From: Shay Drory <shayd@mellanox.com>
>
> On error flow due to failure on driver load, driver can be
> un-initializing while a health work is running in the background,
> health work shouldn't be allowed at this point, as it needs resources to
> be initialized and there is no point to recover on driver load failures.
>
> Therefore, introducing a new state bit to indicated if device is
> initialized, for health work to check before trying to recover the driver.

Can't you cancel this work? Or make sure it's not scheduled?
IMHO those "INITILIZED" bits are an anti-pattern.

> Fixes: b6e0b6bebe07 ("net/mlx5: Fix fatal error handling during device load")
> Signed-off-by: Shay Drory <shayd@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

You signed off twice :)

We should teach verify_signoff to catch that..
On Thu, 2020-10-01 at 16:15 -0700, Jakub Kicinski wrote:
> On Thu, 1 Oct 2020 12:52:33 -0700 saeed@kernel.org wrote:
> > From: Shay Drory <shayd@mellanox.com>
> >
> > On error flow due to failure on driver load, driver can be
> > un-initializing while a health work is running in the background,
> > health work shouldn't be allowed at this point, as it needs
> > resources to
> > be initialized and there is no point to recover on driver load
> > failures.
> >
> > Therefore, introducing a new state bit to indicated if device is
> > initialized, for health work to check before trying to recover the
> > driver.
>
> Can't you cancel this work? Or make sure it's not scheduled?
> IMHO those "INITILIZED" bits are an anti-pattern.
>

Shay didn't want to complicate this patch for net, since the health
work should start as early as possible and keep running after the
driver is initialized, even if the driver instance reloads afterwards.

The main issue with the design is that we initialize and allocate the
driver structures only once, on first boot; all later reloads reuse the
same structures, so there is some asymmetry we need to deal with. None
of this is impossible: the solution will be more complicated, but
(I hope) not too big to make it into net. I will drop this patch for now.

> > Fixes: b6e0b6bebe07 ("net/mlx5: Fix fatal error handling during
> > device load")
> > Signed-off-by: Shay Drory <shayd@mellanox.com>
> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
>
> You signed off twice :)
>

Will fix this; it's my old Mellanox email :/

> We should teach verify_signoff to catch that..

It is not exactly twice, though: the emails are different..
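Saeed's reply ("not exactly twice") points at what a duplicate sign-off check would have to compare: the signer's name, not the whole trailer, since the two Saeed Mahameed lines differ only in the email address. A minimal sketch in C, assuming the trailers are available one per line; `sob_name()` and `has_duplicate_signer()` are hypothetical helpers for illustration and do not reflect how verify_signoff is actually implemented.

```c
#include <string.h>

#define SOB "Signed-off-by: "

/* Hypothetical helper: copy the name part of "Signed-off-by: Name <email>"
 * into out. Returns 0 on success, -1 if the line is not such a trailer. */
static int sob_name(const char *line, char *out, size_t outlen)
{
	size_t prefix = strlen(SOB);
	const char *start, *lt;
	size_t len;

	if (strncmp(line, SOB, prefix) != 0)
		return -1;
	start = line + prefix;
	lt = strchr(start, '<');
	if (!lt)
		return -1;
	len = (size_t)(lt - start);
	while (len > 0 && start[len - 1] == ' ')
		len--; /* drop the space(s) before '<' */
	if (len == 0 || len >= outlen)
		return -1;
	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}

/* Hypothetical check: return 1 if any two sign-off trailers in
 * lines[0..n-1] carry the same name, even with different emails. */
static int has_duplicate_signer(const char *const *lines, int n)
{
	char a[128], b[128];

	for (int i = 0; i < n; i++) {
		if (sob_name(lines[i], a, sizeof(a)))
			continue; /* not a sign-off line, e.g. "Fixes:" */
		for (int j = i + 1; j < n; j++) {
			if (sob_name(lines[j], b, sizeof(b)))
				continue;
			if (strcmp(a, b) == 0)
				return 1;
		}
	}
	return 0;
}
```

Fed the four trailers from the patch above, this flags the two Saeed Mahameed lines as duplicates; whether such a case should be an error or just a warning is a policy choice the sketch does not make.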
On Thu, 2020-10-01 at 16:24 -0700, Jakub Kicinski wrote:
> On Thu, 1 Oct 2020 12:52:39 -0700 saeed@kernel.org wrote:
> > - for (; i >= 0; i--) {
> > + for (--i; i >= 0; i--) {
>
> while (i--)

while(--i)

I like this, less characters to maintain :)
On 10/2/2020 10:05, Saeed Mahameed wrote:
> On Thu, 2020-10-01 at 16:24 -0700, Jakub Kicinski wrote:
>> On Thu, 1 Oct 2020 12:52:39 -0700 saeed@kernel.org wrote:
>>> - for (; i >= 0; i--) {
>>> + for (--i; i >= 0; i--) {
>>
>> while (i--)
>
> while(--i)

It has to be: while (i--)
Case of i=0,

>
> I like this, less characters to maintain :)
>

Mark
On Fri, 2020-10-02 at 10:19 -0700, Mark Bloch wrote:
>
> On 10/2/2020 10:05, Saeed Mahameed wrote:
> > On Thu, 2020-10-01 at 16:24 -0700, Jakub Kicinski wrote:
> > > On Thu, 1 Oct 2020 12:52:39 -0700 saeed@kernel.org wrote:
> > > > - for (; i >= 0; i--) {
> > > > + for (--i; i >= 0; i--) {
> > >
> > > while (i--)
> >
> > while(--i)
>
> It has to be: while (i--)
> Case of i=0,
>

Whoops! while (i--) it is. Thanks, Mark.
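Mark's objection can be checked with a small userspace model of the error-unwinding loop. This is a sketch only: `setup_one()`, `release_one()`, `setup_all()` and the `NENT`-entry table are hypothetical stand-ins, not mlx5 code. If setup fails at index i, only entries 0..i-1 exist, so the unwind must visit exactly i-1 down to 0. `while (i--)` does that, and does nothing when i == 0. `while (--i)` would visit only i-1 down to 1, never releasing entry 0, and for i == 0 the pre-decrement yields -1, which is nonzero, so the loop would run on into negative indices. The original `for (; i >= 0; i--)` additionally releases entry i, whose setup just failed.

```c
#define NENT 4

/* Hypothetical resource table: 1 = set up, 0 = not set up. */
static int setup[NENT];
static int bad_frees; /* releases of an entry that was never set up */

/* Hypothetical setup step that fails at index fail_at (-1 = never). */
static int setup_one(int idx, int fail_at)
{
	if (idx == fail_at)
		return -1;
	setup[idx] = 1;
	return 0;
}

/* Release one entry, counting any attempt to release a bad index. */
static void release_one(int idx)
{
	if (idx < 0 || idx >= NENT || !setup[idx]) {
		bad_frees++;
		return;
	}
	setup[idx] = 0;
}

/* Set up all entries; on failure at index i, undo entries i-1 .. 0. */
static int setup_all(int fail_at)
{
	int i;

	for (i = 0; i < NENT; i++) {
		if (setup_one(i, fail_at))
			goto err;
	}
	return 0;

err:
	/* Same iterations as: for (--i; i >= 0; i--) release_one(i); */
	while (i--)
		release_one(i);
	return -1;
}

/* Number of entries currently set up. */
static int count_setup(void)
{
	int n = 0;

	for (int k = 0; k < NENT; k++)
		n += setup[k];
	return n;
}
```

For example, `setup_all(2)` returns -1, leaves no entries set up and never touches index 2; swapping the loop for `while (--i)` would leave entry 0 set up.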
From: Saeed Mahameed <saeedm@nvidia.com>

Hi Dave,

This series introduces some fixes to the mlx5 driver.

v1->v2:
 - Patch #1: Don't return while mutex is held. (Dave)

Please pull and let me know if there is any problem.

For -stable v4.15
 ('net/mlx5e: Fix VLAN cleanup flow')
 ('net/mlx5e: Fix VLAN create flow')

For -stable v4.16
 ('net/mlx5: Fix request_irqs error flow')

For -stable v5.4
 ('net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU')
 ('net/mlx5: Avoid possible free of command entry while timeout comp handler')

For -stable v5.7
 ('net/mlx5e: Fix return status when setting unsupported FEC mode')

For -stable v5.8
 ('net/mlx5e: Fix race condition on nhe->n pointer in neigh update')

Thanks,
Saeed.

---
The following changes since commit a59cf619787e628b31c310367f869fde26c8ede1:

  Merge branch 'Fix-bugs-in-Octeontx2-netdev-driver' (2020-09-30 15:07:19 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2020-09-30

for you to fetch changes up to ae2cc06daf21c2a38c6caca2c19599d61a5b3890:

  net/mlx5e: Fix race condition on nhe->n pointer in neigh update (2020-10-01 12:46:37 -0700)

----------------------------------------------------------------
mlx5-fixes-2020-09-30

----------------------------------------------------------------
Aya Levin (6):
      net/mlx5e: Fix error path for RQ alloc
      net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU
      net/mlx5e: Fix driver's declaration to support GRE offload
      net/mlx5e: Fix return status when setting unsupported FEC mode
      net/mlx5e: Fix VLAN cleanup flow
      net/mlx5e: Fix VLAN create flow

Eran Ben Elisha (4):
      net/mlx5: Fix a race when moving command interface to polling mode
      net/mlx5: Avoid possible free of command entry while timeout comp handler
      net/mlx5: poll cmd EQ in case of command timeout
      net/mlx5: Add retry mechanism to the command entry index allocation

Maor Dickman (1):
      net/mlx5e: CT, Fix coverity issue

Maor Gottlieb (1):
      net/mlx5: Fix request_irqs error flow

Saeed Mahameed (1):
      net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible

Shay Drory (1):
      net/mlx5: Don't allow health work when device is uninitialized

Vlad Buslov (1):
      net/mlx5e: Fix race condition on nhe->n pointer in neigh update

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      | 198 +++++++++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/port.c  |   3 +
 .../net/ethernet/mellanox/mlx5/core/en/rep/neigh.c |  81 +++++----
 drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 104 +++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |   6 -
 drivers/net/ethernet/mellanox/mlx5/core/eq.c       |  42 ++++-
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |  11 ++
 drivers/net/ethernet/mellanox/mlx5/core/lib/eq.h   |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |   2 +
 .../net/ethernet/mellanox/mlx5/core/pagealloc.c    |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c  |   2 +-
 include/linux/mlx5/driver.h                        |   4 +
 15 files changed, 364 insertions(+), 119 deletions(-)
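The v1->v2 note in the cover letter ("Don't return while mutex is held") refers to a common error-path bug: returning early after taking a lock skips the unlock. The patch's own code is not shown on this page, so the following is only a generic userspace sketch, using POSIX `pthread_mutex_lock()`/`pthread_mutex_unlock()` in place of the kernel's `mutex_lock()`/`mutex_unlock()`, with a hypothetical `update_state()` function. Every exit path leaves through a single unlock label, the idiom kernel code commonly uses for this.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical shared state guarded by a mutex; not mlx5 code. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int state_value;

/* Update state_value under the lock. The error path jumps to the
 * unlock label instead of returning with the mutex still held. */
static int update_state(int new_value)
{
	int err = 0;

	pthread_mutex_lock(&state_lock);
	if (new_value < 0) {
		err = -EINVAL;
		goto unlock; /* a bare "return -EINVAL;" here would leak the lock */
	}
	state_value = new_value;
unlock:
	pthread_mutex_unlock(&state_lock);
	return err;
}
```

After a failed call such as `update_state(-1)`, the mutex is free again, so the next caller (or a `pthread_mutex_trylock()`) can take it; with an early `return` instead of the `goto`, the next `update_state()` call from the same thread would block forever on a default mutex.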