
[RESEND,net] ice: Correctly deal with PFs that do not support RDMA

Message ID 20210909151223.572918-1-david.m.ertman@intel.com
State Superseded

Commit Message

Ertman, David M Sept. 9, 2021, 3:12 p.m. UTC
There are two cases where the current PF does not support RDMA
functionality.  The first is if the NVM loaded on the device is set
to not support RDMA (common_caps.rdma is false).  The second is if
the kernel bonding driver has included the current PF in an active
link aggregate.

When the driver has determined that this PF does not support RDMA, then
auxiliary devices should not be created on the auxiliary bus.  Without
a device on the auxiliary bus, even if the irdma driver is present, there
will be no RDMA activity attempted on this PF.

Currently, in the reset flow, an attempt to create auxiliary devices is
performed without regard to the ability of the PF.  There needs to be a
check in ice_aux_plug_dev (as the central point that creates auxiliary
devices) to see if the PF is in a state to support the functionality.

When disabling and re-enabling RDMA due to the inclusion/removal of the PF
in a link aggregate, we also need to set/clear the bit which controls
auxiliary device creation so that a reset recovery in a link aggregate
situation doesn't try to create auxiliary devices when it shouldn't.

Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Reported-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h     | 2 ++
 drivers/net/ethernet/intel/ice/ice_idc.c | 6 ++++++
 2 files changed, 8 insertions(+)

Comments

Leon Romanovsky Sept. 10, 2021, 9:19 a.m. UTC | #1
On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> There are two cases where the current PF does not support RDMA
> functionality.  The first is if the NVM loaded on the device is set
> to not support RDMA (common_caps.rdma is false).  The second is if
> the kernel bonding driver has included the current PF in an active
> link aggregate.
> 
> When the driver has determined that this PF does not support RDMA, then
> auxiliary devices should not be created on the auxiliary bus. 

This part is wrong: auxiliary devices should always be created. In your case it
will be one eth device only, without an extra irdma device.

Your "bug" is that you mixed auxiliary bus devices with "regular" ones
and created the eth device not as an auxiliary one. This is why you are
calling auxiliary_device_init() for RDMA only and falling back to
non-auxiliary mode.

I hope that this is a simple mistake made while Intel folks rushed to merge
irdma and not a deliberate decision to find a way to support out-of-tree drivers.

As a reminder, the whole idea of auxiliary bus is to have small,
independent vendor driver core logic that manages capabilities and
based on that creates/removes sub-devices (eth, rdma, vdpa ...), so
driver core can properly load/unload their respective drivers.

Thanks
Saleem, Shiraz Sept. 13, 2021, 3:49 p.m. UTC | #2
> Subject: Re: [PATCH RESEND net] ice: Correctly deal with PFs that do not
> support RDMA
>
> On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> > There are two cases where the current PF does not support RDMA
> > functionality.  The first is if the NVM loaded on the device is set to
> > not support RDMA (common_caps.rdma is false).  The second is if the
> > kernel bonding driver has included the current PF in an active link
> > aggregate.
> >
> > When the driver has determined that this PF does not support RDMA,
> > then auxiliary devices should not be created on the auxiliary bus.
>
> This part is wrong, auxiliary devices should always be created, in your case it will
> be one eth device only without extra irdma device.

It is worth considering having an eth aux device/driver but is it a hard-and-fast rule?
In this case, the RDMA-capable PCI network device spawns an auxiliary device for RDMA
and the core driver is a network driver.

>
> Your "bug" is that you mixed auxiliary bus devices with "regular" ones and created
> eth device not as auxiliary one. This is why you are calling to auxiliary_device_init()
> for RDMA only and fallback to non-auxiliary mode.

It's a design choice on how you carve out function(s) off your PCI core device to be
managed by auxiliary driver(s) and not a bug.

Shiraz
Ertman, David M Sept. 13, 2021, 4:07 p.m. UTC | #3
> -----Original Message-----
> From: Saleem, Shiraz <shiraz.saleem@intel.com>
> Sent: Monday, September 13, 2021 8:50 AM
> To: Leon Romanovsky <leon@kernel.org>; Ertman, David M
> <david.m.ertman@intel.com>
> Cc: davem@davemloft.net; kuba@kernel.org; yongxin.liu@windriver.com;
> Nguyen, Anthony L <anthony.l.nguyen@intel.com>;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Brandeburg, Jesse
> <jesse.brandeburg@intel.com>; intel-wired-lan@lists.osuosl.org;
> linux-rdma@vger.kernel.org; jgg@ziepe.ca; Williams, Dan J
> <dan.j.williams@intel.com>; Singhai, Anjali <anjali.singhai@intel.com>;
> Parikh, Neerav <neerav.parikh@intel.com>; Samudrala, Sridhar
> <sridhar.samudrala@intel.com>
> Subject: RE: [PATCH RESEND net] ice: Correctly deal with PFs that do not
> support RDMA
>
> > Subject: Re: [PATCH RESEND net] ice: Correctly deal with PFs that do not
> > support RDMA
> >
> > On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> > > There are two cases where the current PF does not support RDMA
> > > functionality.  The first is if the NVM loaded on the device is set to
> > > not support RDMA (common_caps.rdma is false).  The second is if the
> > > kernel bonding driver has included the current PF in an active link
> > > aggregate.
> > >
> > > When the driver has determined that this PF does not support RDMA,
> > > then auxiliary devices should not be created on the auxiliary bus.
> >
> > This part is wrong, auxiliary devices should always be created, in your case it will
> > be one eth device only without extra irdma device.
>
> It is worth considering having an eth aux device/driver but is it a hard-and-fast rule?
> In this case, the RDMA-capable PCI network device spawns an auxiliary device for RDMA
> and the core driver is a network driver.
>
> >
> > Your "bug" is that you mixed auxiliary bus devices with "regular" ones and created
> > eth device not as auxiliary one. This is why you are calling to auxiliary_device_init()
> > for RDMA only and fallback to non-auxiliary mode.
>
> It's a design choice on how you carve out function(s) off your PCI core device to be
> managed by auxiliary driver(s) and not a bug.
>
> Shiraz

Also, regardless of whether netdev functionality is carved out into an auxiliary device or not, this code would still be necessary.

We don't want to carve out an auxiliary device to support a functionality that the base PCI device does not support.  Not having
the RDMA auxiliary device for an auxiliary driver to bind to is how we differentiate between devices that support RDMA and those
that don't.

Thanks,
DaveE
Leon Romanovsky Sept. 14, 2021, 3:10 a.m. UTC | #4
On Mon, Sep 13, 2021 at 03:49:43PM +0000, Saleem, Shiraz wrote:
> > Subject: Re: [PATCH RESEND net] ice: Correctly deal with PFs that do not
> > support RDMA
> >
> > On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> > > There are two cases where the current PF does not support RDMA
> > > functionality.  The first is if the NVM loaded on the device is set to
> > > not support RDMA (common_caps.rdma is false).  The second is if the
> > > kernel bonding driver has included the current PF in an active link
> > > aggregate.
> > >
> > > When the driver has determined that this PF does not support RDMA,
> > > then auxiliary devices should not be created on the auxiliary bus.
> >
> > This part is wrong, auxiliary devices should always be created, in your case it will
> > be one eth device only without extra irdma device.
>
> It is worth considering having an eth aux device/driver but is it a hard-and-fast rule?
> In this case, the RDMA-capable PCI network device spawns an auxiliary device for RDMA
> and the core driver is a network driver.
>
> >
> > Your "bug" is that you mixed auxiliary bus devices with "regular" ones and created
> > eth device not as auxiliary one. This is why you are calling to auxiliary_device_init()
> > for RDMA only and fallback to non-auxiliary mode.
>
> It's a design choice on how you carve out function(s) off your PCI core device to be
> managed by auxiliary driver(s) and not a bug.

I'm not the one who is setting rules, just explaining what is wrong with
the current design and proposed solution.

The driver/core design expects three building blocks: logic that
enumerates (creates) devices, bus that connects those devices
(load/unload drivers) and specific drivers for every such device.

Such separation allows a clean view from the locking perspective (separated
devices), a proper sysfs layout and the same logic for the user space tools.

In your case, you made the ethernet driver the "enumerator" and
replaced (duplicated) the general driver/core logic that decides whether to
load the auxiliary device driver with your custom code.

Thanks

>
> Shiraz
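
As a rough illustration of the three building blocks described in the reply above (an enumerator that creates devices, a bus that matches devices to drivers, and a separate driver per device), here is a minimal userspace sketch. It is not kernel code and does not use the auxiliary bus API; every name in it (`model_bus`, `model_enumerate`, and so on) is hypothetical and exists only for this example. The point it tries to show is that capability checks live in the enumerator alone, and that the eth device is created regardless of whether RDMA is available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX 4

/* Hypothetical model types; none of these exist in the kernel. */
struct model_dev {
	const char *name;      /* e.g. "eth" or "rdma" */
	const char *bound_to;  /* matching driver name, NULL while unbound */
};

struct model_bus {
	struct model_dev devs[MODEL_MAX];
	size_t ndevs;
	const char *drvs[MODEL_MAX];
	size_t ndrvs;
};

/* Bus: bind every unbound device to a registered driver of the same name. */
static void model_bus_match(struct model_bus *bus)
{
	for (size_t i = 0; i < bus->ndevs; i++) {
		if (bus->devs[i].bound_to)
			continue;
		for (size_t j = 0; j < bus->ndrvs; j++) {
			if (strcmp(bus->devs[i].name, bus->drvs[j]) == 0) {
				bus->devs[i].bound_to = bus->drvs[j];
				break;
			}
		}
	}
}

static void model_bus_add_dev(struct model_bus *bus, const char *name)
{
	assert(bus->ndevs < MODEL_MAX);
	bus->devs[bus->ndevs++] = (struct model_dev){ .name = name };
	model_bus_match(bus);
}

static void model_bus_add_drv(struct model_bus *bus, const char *name)
{
	assert(bus->ndrvs < MODEL_MAX);
	bus->drvs[bus->ndrvs++] = name;
	model_bus_match(bus);
}

/* Enumerator: the only place that looks at capabilities. */
static void model_enumerate(struct model_bus *bus, bool has_rdma)
{
	model_bus_add_dev(bus, "eth");
	if (has_rdma)
		model_bus_add_dev(bus, "rdma");
}

/* Inspection helper: driver bound to the named device, or NULL. */
static const char *model_bound_driver(const struct model_bus *bus,
				      const char *name)
{
	for (size_t i = 0; i < bus->ndevs; i++)
		if (strcmp(bus->devs[i].name, name) == 0)
			return bus->devs[i].bound_to;
	return NULL;
}
```

In this model, calling `model_enumerate(&bus, false)` with both drivers registered leaves only an "eth" device on the bus, so the "rdma" driver simply has nothing to bind to; no driver needs its own check for RDMA support.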
Leon Romanovsky Sept. 14, 2021, 3:16 a.m. UTC | #5
On Mon, Sep 13, 2021 at 04:07:28PM +0000, Ertman, David M wrote:
> > -----Original Message-----
> > From: Saleem, Shiraz <shiraz.saleem@intel.com>
> > Sent: Monday, September 13, 2021 8:50 AM
> > To: Leon Romanovsky <leon@kernel.org>; Ertman, David M
> > <david.m.ertman@intel.com>
> > Cc: davem@davemloft.net; kuba@kernel.org; yongxin.liu@windriver.com;
> > Nguyen, Anthony L <anthony.l.nguyen@intel.com>;
> > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Brandeburg, Jesse
> > <jesse.brandeburg@intel.com>; intel-wired-lan@lists.osuosl.org;
> > linux-rdma@vger.kernel.org; jgg@ziepe.ca; Williams, Dan J
> > <dan.j.williams@intel.com>; Singhai, Anjali <anjali.singhai@intel.com>;
> > Parikh, Neerav <neerav.parikh@intel.com>; Samudrala, Sridhar
> > <sridhar.samudrala@intel.com>
> > Subject: RE: [PATCH RESEND net] ice: Correctly deal with PFs that do not
> > support RDMA
> >
> > > Subject: Re: [PATCH RESEND net] ice: Correctly deal with PFs that do not
> > > support RDMA
> > >
> > > On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> > > > There are two cases where the current PF does not support RDMA
> > > > functionality.  The first is if the NVM loaded on the device is set to
> > > > not support RDMA (common_caps.rdma is false).  The second is if the
> > > > kernel bonding driver has included the current PF in an active link
> > > > aggregate.
> > > >
> > > > When the driver has determined that this PF does not support RDMA,
> > > > then auxiliary devices should not be created on the auxiliary bus.
> > >
> > > This part is wrong, auxiliary devices should always be created, in your case it will
> > > be one eth device only without extra irdma device.
> >
> > It is worth considering having an eth aux device/driver but is it a hard-and-fast rule?
> > In this case, the RDMA-capable PCI network device spawns an auxiliary device for RDMA
> > and the core driver is a network driver.
> >
> > >
> > > Your "bug" is that you mixed auxiliary bus devices with "regular" ones and created
> > > eth device not as auxiliary one. This is why you are calling to auxiliary_device_init()
> > > for RDMA only and fallback to non-auxiliary mode.
> >
> > It's a design choice on how you carve out function(s) off your PCI core device to be
> > managed by auxiliary driver(s) and not a bug.
> >
> > Shiraz
>
> Also, regardless of whether netdev functionality is carved out into an auxiliary device or not, this code would still be necessary.

Right

>
> We don't want to carve out an auxiliary device to support a functionality that the base PCI device does not support.  Not having
> the RDMA auxiliary device for an auxiliary driver to bind to is how we differentiate between devices that support RDMA and those
> that don't.

This is right too.

My complaint is that you mixed the enumerator logic with the eth driver and
create the auxiliary bus only if your RDMA device exists. That is wrong.

Thanks

>
> Thanks,
> DaveE
>
Jason Gunthorpe Sept. 24, 2021, 2:10 p.m. UTC | #6
On Thu, Sep 09, 2021 at 08:12:23AM -0700, Dave Ertman wrote:
> There are two cases where the current PF does not support RDMA
> functionality.  The first is if the NVM loaded on the device is set
> to not support RDMA (common_caps.rdma is false).  The second is if
> the kernel bonding driver has included the current PF in an active
> link aggregate.
>
> When the driver has determined that this PF does not support RDMA, then
> auxiliary devices should not be created on the auxiliary bus.  Without
> a device on the auxiliary bus, even if the irdma driver is present, there
> will be no RDMA activity attempted on this PF.
>
> Currently, in the reset flow, an attempt to create auxiliary devices is
> performed without regard to the ability of the PF.  There needs to be a
> check in ice_aux_plug_dev (as the central point that creates auxiliary
> devices) to see if the PF is in a state to support the functionality.
>
> When disabling and re-enabling RDMA due to the inclusion/removal of the PF
> in a link aggregate, we also need to set/clear the bit which controls
> auxiliary device creation so that a reset recovery in a link aggregate
> situation doesn't try to create auxiliary devices when it shouldn't.
>
> Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
> Reported-by: Yongxin Liu <yongxin.liu@windriver.com>
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
>  drivers/net/ethernet/intel/ice/ice.h     | 2 ++
>  drivers/net/ethernet/intel/ice/ice_idc.c | 6 ++++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index eadcb9958346..3c4f08d20414 100644
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -695,6 +695,7 @@ static inline void ice_set_rdma_cap(struct ice_pf *pf)
>  {
>  	if (pf->hw.func_caps.common_cap.rdma && pf->num_rdma_msix) {
>  		set_bit(ICE_FLAG_RDMA_ENA, pf->flags);
> +		set_bit(ICE_FLAG_AUX_ENA, pf->flags);
>  		ice_plug_aux_dev(pf);

I agree with Leon, there shouldn't be a flag for "aux en". aux is
enabled when a device on the aux bus is required. It should all be
rdma en, which already seems to have a bit.

The only existing place that uses aux_ena immediately calls

		err = ice_init_rdma(pf);

So I'd just delete the whole thing and use rdma_ena. Frankly it looks
structured confusingly, the mlx implementation is better where this is
one function that synchronizes the aux bus with the current state of
the driver - adding/removing as required

Jason
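
To make the last suggestion concrete, the following is a minimal userspace sketch of "one function that synchronizes the aux bus with the current state of the driver". It is an assumption-laden model, not the ice or mlx5 code: `struct model_pf`, `model_rdma_wanted()` and `model_sync_aux()` are hypothetical names, and plugging or unplugging a device is reduced to flipping a boolean. The idea it illustrates is that every flow (probe, reset recovery, bonding changes) calls the same function, which derives the desired state from the RDMA conditions and adds or removes the device as needed, so no separate "aux enabled" flag is required.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are the ice driver's. */
struct model_pf {
	bool rdma_capable;   /* stands in for func_caps.common_cap.rdma */
	int  num_rdma_msix;  /* interrupt vectors reserved for RDMA */
	bool in_bond;        /* PF is part of an active link aggregate */
	bool aux_plugged;    /* an RDMA device currently sits on the "bus" */
	int  plug_count;     /* how many times a device was created */
};

/* RDMA is wanted only when hardware, vectors and bonding state all allow it. */
static bool model_rdma_wanted(const struct model_pf *pf)
{
	return pf->rdma_capable && pf->num_rdma_msix > 0 && !pf->in_bond;
}

/*
 * Bring the "bus" in line with the current driver state.
 * Calling it repeatedly is harmless: it only acts on a mismatch.
 */
static void model_sync_aux(struct model_pf *pf)
{
	bool wanted = model_rdma_wanted(pf);

	if (wanted && !pf->aux_plugged) {
		pf->aux_plugged = true;   /* stands in for creating the device */
		pf->plug_count++;
	} else if (!wanted && pf->aux_plugged) {
		pf->aux_plugged = false;  /* stands in for removing the device */
	}

	assert(pf->aux_plugged == wanted);
}
```

In this model a reset recovery path would just call `model_sync_aux()` again: if the PF is in a bond, nothing is created, and if the device is already present, it is not created twice.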

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index eadcb9958346..3c4f08d20414 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -695,6 +695,7 @@  static inline void ice_set_rdma_cap(struct ice_pf *pf)
 {
 	if (pf->hw.func_caps.common_cap.rdma && pf->num_rdma_msix) {
 		set_bit(ICE_FLAG_RDMA_ENA, pf->flags);
+		set_bit(ICE_FLAG_AUX_ENA, pf->flags);
 		ice_plug_aux_dev(pf);
 	}
 }
@@ -707,5 +708,6 @@  static inline void ice_clear_rdma_cap(struct ice_pf *pf)
 {
 	ice_unplug_aux_dev(pf);
 	clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
+	clear_bit(ICE_FLAG_AUX_ENA, pf->flags);
 }
 #endif /* _ICE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index 1f2afdf6cd48..adcc9a251595 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -271,6 +271,12 @@  int ice_plug_aux_dev(struct ice_pf *pf)
 	struct auxiliary_device *adev;
 	int ret;
 
+	/* if this PF doesn't support a technology that requires auxiliary
+	 * devices, then gracefully exit
+	 */
+	if (!ice_is_aux_ena(pf))
+		return 0;
+
 	iadev = kzalloc(sizeof(*iadev), GFP_KERNEL);
 	if (!iadev)
 		return -ENOMEM;