diff mbox series

[RFC,net] net: dsa: tear down devlink port regions when tearing down the devlink port on error

Message ID 20210902231738.1903476-1-vladimir.oltean@nxp.com
State New
Headers show
Series [RFC,net] net: dsa: tear down devlink port regions when tearing down the devlink port on error | expand

Commit Message

Vladimir Oltean Sept. 2, 2021, 11:17 p.m. UTC
Commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal")
decided it was fine to ignore errors on certain ports that fail to
probe, and go on with the ports that do probe fine.

Commit fb6ec87f7229 ("net: dsa: Fix type was not set for devlink port")
noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get
called, and devlink notices after a timeout of 3700 seconds and prints a
WARN_ON. So it went ahead to unregister the devlink port. And because
there exists an UNUSED port flavour, we actually re-register the devlink
port as UNUSED.

Commit 08156ba430b4 ("net: dsa: Add devlink port regions support to
DSA") added devlink port regions, which are set up by the driver and not
by DSA.

When we trigger the devlink port deregistration and reregistration as
unused, devlink now prints another WARN_ON, from here:

devlink_port_unregister:
	WARN_ON(!list_empty(&devlink_port->region_list));

So the port still has regions, which makes sense, because they were set
up by the driver, and the driver doesn't know we're unregistering the
devlink port.

Somebody needs to tear them down, and optionally (actually it would be
nice, to be consistent) set them up again for the new devlink port.

But DSA's layering stays in our way quite badly here.

The options I've considered are:

1. Introduce a function in devlink to just change a port's type and
   flavour. No dice, devlink keeps a lot of state, it really wants the
   port to not be registered when you set its parameters, so changing
   anything can only be done by destroying what we currently have and
   recreating it.

2. Make DSA cache the parameters passed to dsa_devlink_port_region_create,
   and the region returned, keep those in a list, then when the devlink
   port unregister needs to take place, the existing devlink regions are
   destroyed by DSA, and we replay the creation of new regions using the
   cached parameters. Problem: mv88e6xxx keeps the region pointers in
   chip->ports[port].region, and these will remain stale after DSA frees
   them. There are many things DSA can do, but updating mv88e6xxx's
   private pointers is not one of them.

3. Just let the driver do it. It's pretty horrible, but the other
   methods just don't seem to work.

Fixes: 08156ba430b4 ("net: dsa: Add devlink port regions support to DSA")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c    |  1 +
 drivers/net/dsa/mv88e6xxx/devlink.c | 16 ++++++++++++++++
 drivers/net/dsa/mv88e6xxx/devlink.h |  1 +
 include/net/dsa.h                   |  9 +++++++++
 net/dsa/dsa.c                       |  6 ++++++
 net/dsa/dsa2.c                      | 22 +++++++++++++++++-----
 6 files changed, 50 insertions(+), 5 deletions(-)

Comments

Leon Romanovsky Sept. 5, 2021, 7:07 a.m. UTC | #1
On Fri, Sep 03, 2021 at 02:17:38AM +0300, Vladimir Oltean wrote:
> Commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal")

> decided it was fine to ignore errors on certain ports that fail to

> probe, and go on with the ports that do probe fine.

> 

> Commit fb6ec87f7229 ("net: dsa: Fix type was not set for devlink port")

> noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get

> called, and devlink notices after a timeout of 3700 seconds and prints a

> WARN_ON. So it went ahead to unregister the devlink port. And because

> there exists an UNUSED port flavour, we actually re-register the devlink

> port as UNUSED.

> 

> Commit 08156ba430b4 ("net: dsa: Add devlink port regions support to

> DSA") added devlink port regions, which are set up by the driver and not

> by DSA.

> 

> When we trigger the devlink port deregistration and reregistration as

> unused, devlink now prints another WARN_ON, from here:

> 

> devlink_port_unregister:

> 	WARN_ON(!list_empty(&devlink_port->region_list));

> 

> So the port still has regions, which makes sense, because they were set

> up by the driver, and the driver doesn't know we're unregistering the

> devlink port.

> 

> Somebody needs to tear them down, and optionally (actually it would be

> nice, to be consistent) set them up again for the new devlink port.

> 

> But DSA's layering stays in our way quite badly here.


I don't know anything about DSA and what led to the decision to ignore
devlink registration errors, but devlink core is relying on the simple
assumption that everything is initialized correctly.

So if DSA needs to have not-initialized port, it should do all the needed
hacks internally.

Thanks
Vladimir Oltean Sept. 5, 2021, 8:45 a.m. UTC | #2
On Sun, Sep 05, 2021 at 10:07:45AM +0300, Leon Romanovsky wrote:
> On Fri, Sep 03, 2021 at 02:17:38AM +0300, Vladimir Oltean wrote:

> > Commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal")

> > decided it was fine to ignore errors on certain ports that fail to

> > probe, and go on with the ports that do probe fine.

> > 

> > Commit fb6ec87f7229 ("net: dsa: Fix type was not set for devlink port")

> > noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get

> > called, and devlink notices after a timeout of 3700 seconds and prints a

> > WARN_ON. So it went ahead to unregister the devlink port. And because

> > there exists an UNUSED port flavour, we actually re-register the devlink

> > port as UNUSED.

> > 

> > Commit 08156ba430b4 ("net: dsa: Add devlink port regions support to

> > DSA") added devlink port regions, which are set up by the driver and not

> > by DSA.

> > 

> > When we trigger the devlink port deregistration and reregistration as

> > unused, devlink now prints another WARN_ON, from here:

> > 

> > devlink_port_unregister:

> > 	WARN_ON(!list_empty(&devlink_port->region_list));

> > 

> > So the port still has regions, which makes sense, because they were set

> > up by the driver, and the driver doesn't know we're unregistering the

> > devlink port.

> > 

> > Somebody needs to tear them down, and optionally (actually it would be

> > nice, to be consistent) set them up again for the new devlink port.

> > 

> > But DSA's layering stays in our way quite badly here.

> 

> I don't know anything about DSA


It is sufficient to know in this case that it is a multi-port networking
driver.

> and what led to the decision to ignore devlink registration errors,


But we are not ignoring devlink registration errors...

The devlink_port must be initialized prior to initializing the net_device.

Initializing a certain net_device may fail due to reasons such as "PHY
not found". It is desirable in certain cases for a net_device
initialization failure to not fail the entire switch probe.

So at the very least, rollback of the registration of that port must be
performed before continuing => the devlink_port needs to be unregistered
when the net_device initialization has failed.

> but devlink core is relying on the simple assumption that everything

> is initialized correctly.

> 

> So if DSA needs to have not-initialized port, it should do all the needed

> hacks internally.


So the current problem is that the DSA framework does not ask the hardware
driver whether it has devlink port regions which need to be torn down
before unregistering the devlink port.

I was expecting the feedback to be "we need to introduce new methods in
struct dsa_switch_ops which do .port_setup and .port_teardown, similar
to the already existing per-switch .setup and .teardown, and drivers
which set up devlink port regions should set these up from the port
methods, so that DSA can simply call those when it needs to tear down a
devlink port without tearing down the entire switch and devlink instance".
The proposed patch is horrible and I agree, but not for the reasons you
might think it is.

Either way, "all the needed hacks" are already done internally, and from
devlink's perspective everything is initialized correctly, not sure what
this comment is about. I am really not changing anything in DSA's
interaction with the devlink core, other than ensuring we do not
unregister a devlink port with regions on it.
Leon Romanovsky Sept. 5, 2021, 10:25 a.m. UTC | #3
On Sun, Sep 05, 2021 at 11:45:18AM +0300, Vladimir Oltean wrote:
> On Sun, Sep 05, 2021 at 10:07:45AM +0300, Leon Romanovsky wrote:

> > On Fri, Sep 03, 2021 at 02:17:38AM +0300, Vladimir Oltean wrote:

> > > Commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal")

> > > decided it was fine to ignore errors on certain ports that fail to

> > > probe, and go on with the ports that do probe fine.

> > > 

> > > Commit fb6ec87f7229 ("net: dsa: Fix type was not set for devlink port")

> > > noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get

> > > called, and devlink notices after a timeout of 3700 seconds and prints a

> > > WARN_ON. So it went ahead to unregister the devlink port. And because

> > > there exists an UNUSED port flavour, we actually re-register the devlink

> > > port as UNUSED.

> > > 

> > > Commit 08156ba430b4 ("net: dsa: Add devlink port regions support to

> > > DSA") added devlink port regions, which are set up by the driver and not

> > > by DSA.

> > > 

> > > When we trigger the devlink port deregistration and reregistration as

> > > unused, devlink now prints another WARN_ON, from here:

> > > 

> > > devlink_port_unregister:

> > > 	WARN_ON(!list_empty(&devlink_port->region_list));

> > > 

> > > So the port still has regions, which makes sense, because they were set

> > > up by the driver, and the driver doesn't know we're unregistering the

> > > devlink port.

> > > 

> > > Somebody needs to tear them down, and optionally (actually it would be

> > > nice, to be consistent) set them up again for the new devlink port.

> > > 

> > > But DSA's layering stays in our way quite badly here.

> > 

> > I don't know anything about DSA

> 

> It is sufficient to know in this case that it is a multi-port networking

> driver.

> 

> > and what led to the decision to ignore devlink registration errors,

> 

> But we are not ignoring devlink registration errors...

> 

> The devlink_port must be initialized prior to initializing the net_device.

> 

> Initializing a certain net_device may fail due to reasons such as "PHY

> not found". It is desirable in certain cases for a net_device

> initialization failure to not fail the entire switch probe.

> 

> So at the very least, rollback of the registration of that port must be

> performed before continuing => the devlink_port needs to be unregistered

> when the net_device initialization has failed.

> 

> > but devlink core is relying on the simple assumption that everything

> > is initialized correctly.

> > 

> > So if DSA needs to have not-initialized port, it should do all the needed

> > hacks internally.

> 

> So the current problem is that the DSA framework does not ask the hardware

> driver whether it has devlink port regions which need to be torn down

> before unregistering the devlink port.

> 

> I was expecting the feedback to be "we need to introduce new methods in

> struct dsa_switch_ops which do .port_setup and .port_teardown, similar

> to the already existing per-switch .setup and .teardown, and drivers

> which set up devlink port regions should set these up from the port

> methods, so that DSA can simply call those when it needs to tear down a

> devlink port without tearing down the entire switch and devlink instance".

> The proposed patch is horrible and I agree, but not for the reasons you

> might think it is.

> 

> Either way, "all the needed hacks" are already done internally, and from

> devlink's perspective everything is initialized correctly, not sure what

> this comment is about. I am really not changing anything in DSA's

> interaction with the devlink core, other than ensuring we do not

> unregister a devlink port with regions on it.


That sentence means that your change is OK and you did it right by not
changing devlink port to hold not-working ports.

Thanks
Vladimir Oltean Sept. 5, 2021, 10:31 a.m. UTC | #4
On Sun, Sep 05, 2021 at 01:25:03PM +0300, Leon Romanovsky wrote:
> On Sun, Sep 05, 2021 at 11:45:18AM +0300, Vladimir Oltean wrote:

> > On Sun, Sep 05, 2021 at 10:07:45AM +0300, Leon Romanovsky wrote:

> > > On Fri, Sep 03, 2021 at 02:17:38AM +0300, Vladimir Oltean wrote:

> > > > Commit 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal")

> > > > decided it was fine to ignore errors on certain ports that fail to

> > > > probe, and go on with the ports that do probe fine.

> > > > 

> > > > Commit fb6ec87f7229 ("net: dsa: Fix type was not set for devlink port")

> > > > noticed that devlink_port_type_eth_set(dlp, dp->slave); does not get

> > > > called, and devlink notices after a timeout of 3700 seconds and prints a

> > > > WARN_ON. So it went ahead to unregister the devlink port. And because

> > > > there exists an UNUSED port flavour, we actually re-register the devlink

> > > > port as UNUSED.

> > > > 

> > > > Commit 08156ba430b4 ("net: dsa: Add devlink port regions support to

> > > > DSA") added devlink port regions, which are set up by the driver and not

> > > > by DSA.

> > > > 

> > > > When we trigger the devlink port deregistration and reregistration as

> > > > unused, devlink now prints another WARN_ON, from here:

> > > > 

> > > > devlink_port_unregister:

> > > > 	WARN_ON(!list_empty(&devlink_port->region_list));

> > > > 

> > > > So the port still has regions, which makes sense, because they were set

> > > > up by the driver, and the driver doesn't know we're unregistering the

> > > > devlink port.

> > > > 

> > > > Somebody needs to tear them down, and optionally (actually it would be

> > > > nice, to be consistent) set them up again for the new devlink port.

> > > > 

> > > > But DSA's layering stays in our way quite badly here.

> > > 

> > > I don't know anything about DSA

> > 

> > It is sufficient to know in this case that it is a multi-port networking

> > driver.

> > 

> > > and what led to the decision to ignore devlink registration errors,

> > 

> > But we are not ignoring devlink registration errors...

> > 

> > The devlink_port must be initialized prior to initializing the net_device.

> > 

> > Initializing a certain net_device may fail due to reasons such as "PHY

> > not found". It is desirable in certain cases for a net_device

> > initialization failure to not fail the entire switch probe.

> > 

> > So at the very least, rollback of the registration of that port must be

> > performed before continuing => the devlink_port needs to be unregistered

> > when the net_device initialization has failed.

> > 

> > > but devlink core is relying on the simple assumption that everything

> > > is initialized correctly.

> > > 

> > > So if DSA needs to have not-initialized port, it should do all the needed

> > > hacks internally.

> > 

> > So the current problem is that the DSA framework does not ask the hardware

> > driver whether it has devlink port regions which need to be torn down

> > before unregistering the devlink port.

> > 

> > I was expecting the feedback to be "we need to introduce new methods in

> > struct dsa_switch_ops which do .port_setup and .port_teardown, similar

> > to the already existing per-switch .setup and .teardown, and drivers

> > which set up devlink port regions should set these up from the port

> > methods, so that DSA can simply call those when it needs to tear down a

> > devlink port without tearing down the entire switch and devlink instance".

> > The proposed patch is horrible and I agree, but not for the reasons you

> > might think it is.

> > 

> > Either way, "all the needed hacks" are already done internally, and from

> > devlink's perspective everything is initialized correctly, not sure what

> > this comment is about. I am really not changing anything in DSA's

> > interaction with the devlink core, other than ensuring we do not

> > unregister a devlink port with regions on it.

> 

> That sentence means that your change is OK and you did it right by not

> changing devlink port to hold not-working ports.


You're with me so far.

There is a second part. The ports with 'status = "disabled"' in the
device tree still get devlink ports registered, but with the
DEVLINK_PORT_FLAVOUR_UNUSED flavour and no netdev. These devlink ports
still have things like port regions exported.

What we do for ports that have failed to probe is to reinit their
devlink ports as DEVLINK_PORT_FLAVOUR_UNUSED, and their port regions, so
they effectively behave as though they were disabled in the device tree.
Leon Romanovsky Sept. 5, 2021, 10:47 a.m. UTC | #5
On Sun, Sep 05, 2021 at 01:31:25PM +0300, Vladimir Oltean wrote:
> On Sun, Sep 05, 2021 at 01:25:03PM +0300, Leon Romanovsky wrote:

> > On Sun, Sep 05, 2021 at 11:45:18AM +0300, Vladimir Oltean wrote:

> > > On Sun, Sep 05, 2021 at 10:07:45AM +0300, Leon Romanovsky wrote:

> > > > On Fri, Sep 03, 2021 at 02:17:38AM +0300, Vladimir Oltean wrote:


<...>

> > That sentence means that your change is OK and you did it right by not

> > changing devlink port to hold not-working ports.

> 

> You're with me so far.

> 

> There is a second part. The ports with 'status = "disabled"' in the

> device tree still get devlink ports registered, but with the

> DEVLINK_PORT_FLAVOUR_UNUSED flavour and no netdev. These devlink ports

> still have things like port regions exported.

> 

> What we do for ports that have failed to probe is to reinit their

> devlink ports as DEVLINK_PORT_FLAVOUR_UNUSED, and their port regions, so

> they effectively behave as though they were disabled in the device tree.


Yes, and this part require DSA knowledge that I don't have, because you
suggest fallback for any error during devlink port register, which can
fail for reasons that require proper unwind instead of reinit.

Thanks
Vladimir Oltean Sept. 5, 2021, 11:07 a.m. UTC | #6
On Sun, Sep 05, 2021 at 01:47:51PM +0300, Leon Romanovsky wrote:
> On Sun, Sep 05, 2021 at 01:31:25PM +0300, Vladimir Oltean wrote:

> > On Sun, Sep 05, 2021 at 01:25:03PM +0300, Leon Romanovsky wrote:

> > > On Sun, Sep 05, 2021 at 11:45:18AM +0300, Vladimir Oltean wrote:

> > > > On Sun, Sep 05, 2021 at 10:07:45AM +0300, Leon Romanovsky wrote:

> > > > > On Fri, Sep 03, 2021 at 02:17:38AM +0300, Vladimir Oltean wrote:

> 

> <...>

> 

> > > That sentence means that your change is OK and you did it right by not

> > > changing devlink port to hold not-working ports.

> > 

> > You're with me so far.

> > 

> > There is a second part. The ports with 'status = "disabled"' in the

> > device tree still get devlink ports registered, but with the

> > DEVLINK_PORT_FLAVOUR_UNUSED flavour and no netdev. These devlink ports

> > still have things like port regions exported.

> > 

> > What we do for ports that have failed to probe is to reinit their

> > devlink ports as DEVLINK_PORT_FLAVOUR_UNUSED, and their port regions, so

> > they effectively behave as though they were disabled in the device tree.

> 

> Yes, and this part require DSA knowledge that I don't have, because you

> suggest fallback for any error during devlink port register,


Again, fallback but not during devlink port register. The devlink port
was registered just fine, but our plans changed midway. If you want to
create a net device with an associated devlink port, first you need to
create the devlink port and then the net device, then you need to link
the two using devlink_port_type_eth_set, at least according to my
understanding.

So the failure is during the creation of the **net device**, we now have a
devlink port which was originally intended to be of the Ethernet type
and have a physical flavour, but it will not be backed by any net device,
because the creation of that just failed. So the question is simply what
to do with that devlink port.

The only thing I said about the devlink API in the commit description is
that it would have been nice to just flip the type and flavour of a
devlink port, post registration. That would avoid a lot of complications
in DSA. But that is obviously not possible, and my patch does not even
attempt to do it. What DSA does today, and will still do after the patch
we are discussing on, is to unregister that initial devlink port, and
create another one with the unused flavour, and register that one.

The reason why we even bother to re-register a devlink port a second
time for a port that failed to create and initialize its net_device is
basically for consistency with the ports that are statically disabled in
the device tree. Since devlink is a mechanism through which we gain
insight into the hardware, and disabled ports are still physically
present, just, you know, disabled and not used, their devlink ports
still exist and can be used for things like dumping port regions.
We treat ports that fail to find their PHY during probing as
'dynamically disabled', and the expectation is for them to behave just
the same as ports that were statically disabled through the device tree.

My change is entirely about how to properly structure the code such that
we unregister the port regions that a devlink port might have, before
unregistering the devlink port itself, and how to re-register those port
regions then, after the new devlink port was registered.

> which can fail for reasons that require proper unwind instead of

> reinit.

> 

> Thanks
Leon Romanovsky Sept. 5, 2021, 1:01 p.m. UTC | #7
On Sun, Sep 05, 2021 at 02:07:35PM +0300, Vladimir Oltean wrote:
> On Sun, Sep 05, 2021 at 01:47:51PM +0300, Leon Romanovsky wrote:

> > On Sun, Sep 05, 2021 at 01:31:25PM +0300, Vladimir Oltean wrote:

> > > On Sun, Sep 05, 2021 at 01:25:03PM +0300, Leon Romanovsky wrote:

> > > > On Sun, Sep 05, 2021 at 11:45:18AM +0300, Vladimir Oltean wrote:

> > > > > On Sun, Sep 05, 2021 at 10:07:45AM +0300, Leon Romanovsky wrote:

> > > > > > On Fri, Sep 03, 2021 at 02:17:38AM +0300, Vladimir Oltean wrote:

> > 

> > <...>

> > 

> > > > That sentence means that your change is OK and you did it right by not

> > > > changing devlink port to hold not-working ports.

> > > 

> > > You're with me so far.

> > > 

> > > There is a second part. The ports with 'status = "disabled"' in the

> > > device tree still get devlink ports registered, but with the

> > > DEVLINK_PORT_FLAVOUR_UNUSED flavour and no netdev. These devlink ports

> > > still have things like port regions exported.

> > > 

> > > What we do for ports that have failed to probe is to reinit their

> > > devlink ports as DEVLINK_PORT_FLAVOUR_UNUSED, and their port regions, so

> > > they effectively behave as though they were disabled in the device tree.

> > 

> > Yes, and this part require DSA knowledge that I don't have, because you

> > suggest fallback for any error during devlink port register,

> 

> Again, fallback but not during devlink port register. The devlink port

> was registered just fine, but our plans changed midway. If you want to

> create a net device with an associated devlink port, first you need to

> create the devlink port and then the net device, then you need to link

> the two using devlink_port_type_eth_set, at least according to my

> understanding.

> 

> So the failure is during the creation of the **net device**, we now have a

> devlink port which was originally intended to be of the Ethernet type

> and have a physical flavour, but it will not be backed by any net device,

> because the creation of that just failed. So the question is simply what

> to do with that devlink port.


I lost you here, from known to me from the NIC, the **net devices** are
created with devlink_alloc() API call and devlink_port_register comes
later. It means that net device is created (or not) before devlink port
code.

Anyway, it is really not important to me as long as changes won't touch
net/core/devlink.c.

Thanks
Vladimir Oltean Sept. 5, 2021, 3:01 p.m. UTC | #8
On Sun, Sep 05, 2021 at 04:01:32PM +0300, Leon Romanovsky wrote:
> On Sun, Sep 05, 2021 at 02:07:35PM +0300, Vladimir Oltean wrote:

> > On Sun, Sep 05, 2021 at 01:47:51PM +0300, Leon Romanovsky wrote:

> > > On Sun, Sep 05, 2021 at 01:31:25PM +0300, Vladimir Oltean wrote:

> > > > On Sun, Sep 05, 2021 at 01:25:03PM +0300, Leon Romanovsky wrote:

> > > > > On Sun, Sep 05, 2021 at 11:45:18AM +0300, Vladimir Oltean wrote:

> > > > > > On Sun, Sep 05, 2021 at 10:07:45AM +0300, Leon Romanovsky wrote:

> > > > > > > On Fri, Sep 03, 2021 at 02:17:38AM +0300, Vladimir Oltean wrote:

> > > 

> > > <...>

> > > 

> > > > > That sentence means that your change is OK and you did it right by not

> > > > > changing devlink port to hold not-working ports.

> > > > 

> > > > You're with me so far.

> > > > 

> > > > There is a second part. The ports with 'status = "disabled"' in the

> > > > device tree still get devlink ports registered, but with the

> > > > DEVLINK_PORT_FLAVOUR_UNUSED flavour and no netdev. These devlink ports

> > > > still have things like port regions exported.

> > > > 

> > > > What we do for ports that have failed to probe is to reinit their

> > > > devlink ports as DEVLINK_PORT_FLAVOUR_UNUSED, and their port regions, so

> > > > they effectively behave as though they were disabled in the device tree.

> > > 

> > > Yes, and this part require DSA knowledge that I don't have, because you

> > > suggest fallback for any error during devlink port register,

> > 

> > Again, fallback but not during devlink port register. The devlink port

> > was registered just fine, but our plans changed midway. If you want to

> > create a net device with an associated devlink port, first you need to

> > create the devlink port and then the net device, then you need to link

> > the two using devlink_port_type_eth_set, at least according to my

> > understanding.

> > 

> > So the failure is during the creation of the **net device**, we now have a

> > devlink port which was originally intended to be of the Ethernet type

> > and have a physical flavour, but it will not be backed by any net device,

> > because the creation of that just failed. So the question is simply what

> > to do with that devlink port.

> 

> I lost you here, from known to me from the NIC, the **net devices** are

> created with devlink_alloc() API call and devlink_port_register comes

> later. It means that net device is created (or not) before devlink port

> code.

> 

> Anyway, it is really not important to me as long as changes won't touch

> net/core/devlink.c.


Unless I am mistaken, there is a DEVLINK_PORT_TYPE_WARN_TIMEOUT of 3600
seconds until you can associate the devlink_port with the net_device via
devlink_port_type_eth_set, _after_ you've registered the net_device.
Is that incorrect, or is that not what the timeout is for?
Jakub Kicinski Sept. 7, 2021, 3:44 p.m. UTC | #9
On Sun, 5 Sep 2021 14:07:35 +0300 Vladimir Oltean wrote:
> Again, fallback but not during devlink port register. The devlink port

> was registered just fine, but our plans changed midway. If you want to

> create a net device with an associated devlink port, first you need to

> create the devlink port and then the net device, then you need to link

> the two using devlink_port_type_eth_set, at least according to my

> understanding.

> 

> So the failure is during the creation of the **net device**, we now have a

> devlink port which was originally intended to be of the Ethernet type

> and have a physical flavour, but it will not be backed by any net device,

> because the creation of that just failed. So the question is simply what

> to do with that devlink port.


Is the failure you're referring to discovered inside the
register_netdevice() call?

> The only thing I said about the devlink API in the commit description is

> that it would have been nice to just flip the type and flavour of a

> devlink port, post registration. That would avoid a lot of complications

> in DSA. But that is obviously not possible, and my patch does not even

> attempt to do it. What DSA does today, and will still do after the patch

> we are discussing on, is to unregister that initial devlink port, and

> create another one with the unused flavour, and register that one.

> 

> The reason why we even bother to re-register a devlink port a second

> time for a port that failed to create and initialize its net_device is

> basically for consistency with the ports that are statically disabled in

> the device tree. Since devlink is a mechanism through which we gain

> insight into the hardware, and disabled ports are still physically

> present, just, you know, disabled and not used, their devlink ports

> still exist and can be used for things like dumping port regions.

> We treat ports that fail to find their PHY during probing as

> 'dynamically disabled', and the expectation is for them to behave just

> the same as ports that were statically disabled through the device tree.

> 

> My change is entirely about how to properly structure the code such that

> we unregister the port regions that a devlink port might have, before

> unregistering the devlink port itself, and how to re-register those port

> regions then, after the new devlink port was registered.
Florian Fainelli Sept. 7, 2021, 3:47 p.m. UTC | #10
On 9/7/2021 8:44 AM, Jakub Kicinski wrote:
> On Sun, 5 Sep 2021 14:07:35 +0300 Vladimir Oltean wrote:

>> Again, fallback but not during devlink port register. The devlink port

>> was registered just fine, but our plans changed midway. If you want to

>> create a net device with an associated devlink port, first you need to

>> create the devlink port and then the net device, then you need to link

>> the two using devlink_port_type_eth_set, at least according to my

>> understanding.

>>

>> So the failure is during the creation of the **net device**, we now have a

>> devlink port which was originally intended to be of the Ethernet type

>> and have a physical flavour, but it will not be backed by any net device,

>> because the creation of that just failed. So the question is simply what

>> to do with that devlink port.

> 

> Is the failure you're referring to discovered inside the

> register_netdevice() call?


It is before, at the time we attempt to connect to the PHY device, prior 
to registering the netdev, we may fail that PHY connection, tearing down 
the entire switch because of that is highly undesirable.

Maybe we should re-order things a little bit and try to register devlink 
ports only after we successfully registered with the PHY/SFP and prior 
to registering the netdev?
-- 
Florian
Andrew Lunn Sept. 7, 2021, 4:43 p.m. UTC | #11
On Tue, Sep 07, 2021 at 08:47:35AM -0700, Florian Fainelli wrote:
> 

> 

> On 9/7/2021 8:44 AM, Jakub Kicinski wrote:

> > On Sun, 5 Sep 2021 14:07:35 +0300 Vladimir Oltean wrote:

> > > Again, fallback but not during devlink port register. The devlink port

> > > was registered just fine, but our plans changed midway. If you want to

> > > create a net device with an associated devlink port, first you need to

> > > create the devlink port and then the net device, then you need to link

> > > the two using devlink_port_type_eth_set, at least according to my

> > > understanding.

> > > 

> > > So the failure is during the creation of the **net device**, we now have a

> > > devlink port which was originally intended to be of the Ethernet type

> > > and have a physical flavour, but it will not be backed by any net device,

> > > because the creation of that just failed. So the question is simply what

> > > to do with that devlink port.

> > 

> > Is the failure you're referring to discovered inside the

> > register_netdevice() call?

> 

> It is before, at the time we attempt to connect to the PHY device, prior to

> registering the netdev, we may fail that PHY connection, tearing down the

> entire switch because of that is highly undesirable.

> 

> Maybe we should re-order things a little bit and try to register devlink

> ports only after we successfully registered with the PHY/SFP and prior to

> registering the netdev?


Maybe, but it should not really matter. EPROBE_DEFER exists, and can
happen. The probe can fail for other reasons. All core code should be
cleanly undoable. Maybe we are pushing it a little by only wanting to
undo a single port, rather than the whole switch, but still, i would
make the core handle this, not rearrange the driver. It is not robust
otherwise.

     Andrew
Florian Fainelli Sept. 7, 2021, 4:49 p.m. UTC | #12
On 9/7/2021 9:43 AM, Andrew Lunn wrote:
> On Tue, Sep 07, 2021 at 08:47:35AM -0700, Florian Fainelli wrote:

>>

>>

>> On 9/7/2021 8:44 AM, Jakub Kicinski wrote:

>>> On Sun, 5 Sep 2021 14:07:35 +0300 Vladimir Oltean wrote:

>>>> Again, fallback but not during devlink port register. The devlink port

>>>> was registered just fine, but our plans changed midway. If you want to

>>>> create a net device with an associated devlink port, first you need to

>>>> create the devlink port and then the net device, then you need to link

>>>> the two using devlink_port_type_eth_set, at least according to my

>>>> understanding.

>>>>

>>>> So the failure is during the creation of the **net device**, we now have a

>>>> devlink port which was originally intended to be of the Ethernet type

>>>> and have a physical flavour, but it will not be backed by any net device,

>>>> because the creation of that just failed. So the question is simply what

>>>> to do with that devlink port.

>>>

>>> Is the failure you're referring to discovered inside the

>>> register_netdevice() call?

>>

>> It is before, at the time we attempt to connect to the PHY device, prior to

>> registering the netdev, we may fail that PHY connection, tearing down the

>> entire switch because of that is highly undesirable.

>>

>> Maybe we should re-order things a little bit and try to register devlink

>> ports only after we successfully registered with the PHY/SFP and prior to

>> registering the netdev?

> 

> Maybe, but it should not really matter. EPROBE_DEFER exists, and can

> happen. The probe can fail for other reasons. All core code should be

> cleanly undoable. Maybe we are pushing it a little by only wanting to

> undo a single port, rather than the whole switch, but still, i would

> make the core handle this, not rearrange the driver. It is not robust

> otherwise.


Well yes, in case my comment was not clear, I was referring to the way 
that DSA register devlink ports, not how the mv88e6xxx driver does it. 
That is assuming that it is possible and there was not a reason for 
configuring the devlink ports ahead of the switch driver coming up.
-- 
Florian
Leon Romanovsky Sept. 7, 2021, 10:59 p.m. UTC | #13
On Tue, Sep 07, 2021 at 09:49:48AM -0700, Florian Fainelli wrote:
> 

> 

> On 9/7/2021 9:43 AM, Andrew Lunn wrote:

> > On Tue, Sep 07, 2021 at 08:47:35AM -0700, Florian Fainelli wrote:

> > > 

> > > 

> > > On 9/7/2021 8:44 AM, Jakub Kicinski wrote:

> > > > On Sun, 5 Sep 2021 14:07:35 +0300 Vladimir Oltean wrote:

> > > > > Again, fallback but not during devlink port register. The devlink port

> > > > > was registered just fine, but our plans changed midway. If you want to

> > > > > create a net device with an associated devlink port, first you need to

> > > > > create the devlink port and then the net device, then you need to link

> > > > > the two using devlink_port_type_eth_set, at least according to my

> > > > > understanding.

> > > > > 

> > > > > So the failure is during the creation of the **net device**, we now have a

> > > > > devlink port which was originally intended to be of the Ethernet type

> > > > > and have a physical flavour, but it will not be backed by any net device,

> > > > > because the creation of that just failed. So the question is simply what

> > > > > to do with that devlink port.

> > > > 

> > > > Is the failure you're referring to discovered inside the

> > > > register_netdevice() call?

> > > 

> > > It is before, at the time we attempt to connect to the PHY device, prior to

> > > registering the netdev, we may fail that PHY connection, tearing down the

> > > entire switch because of that is highly undesirable.

> > > 

> > > Maybe we should re-order things a little bit and try to register devlink

> > > ports only after we successfully registered with the PHY/SFP and prior to

> > > registering the netdev?

> > 

> > Maybe, but it should not really matter. EPROBE_DEFER exists, and can

> > happen. The probe can fail for other reasons. All core code should be

> > cleanly undoable. Maybe we are pushing it a little by only wanting to

> > undo a single port, rather than the whole switch, but still, i would

> > make the core handle this, not rearrange the driver. It is not robust

> > otherwise.

> 

> Well yes, in case my comment was not clear, I was referring to the way that

> DSA register devlink ports, not how the mv88e6xxx driver does it. That is

> assuming that it is possible and there was not a reason for configuring the

> devlink ports ahead of the switch driver coming up.


This is the best arrangement. It is responsibility of the caller (DSA
core) to ensure that calls are valid prior to the devlink core.

Thanks

> -- 

> Florian
Vladimir Oltean Sept. 8, 2021, 8:21 p.m. UTC | #14
On Tue, Sep 07, 2021 at 09:49:48AM -0700, Florian Fainelli wrote:
> On 9/7/2021 9:43 AM, Andrew Lunn wrote:

> > On Tue, Sep 07, 2021 at 08:47:35AM -0700, Florian Fainelli wrote:

> > > 

> > > 

> > > On 9/7/2021 8:44 AM, Jakub Kicinski wrote:

> > > > On Sun, 5 Sep 2021 14:07:35 +0300 Vladimir Oltean wrote:

> > > > > Again, fallback but not during devlink port register. The devlink port

> > > > > was registered just fine, but our plans changed midway. If you want to

> > > > > create a net device with an associated devlink port, first you need to

> > > > > create the devlink port and then the net device, then you need to link

> > > > > the two using devlink_port_type_eth_set, at least according to my

> > > > > understanding.

> > > > > 

> > > > > So the failure is during the creation of the **net device**, we now have a

> > > > > devlink port which was originally intended to be of the Ethernet type

> > > > > and have a physical flavour, but it will not be backed by any net device,

> > > > > because the creation of that just failed. So the question is simply what

> > > > > to do with that devlink port.

> > > > 

> > > > Is the failure you're referring to discovered inside the

> > > > register_netdevice() call?

> > > 

> > > It is before, at the time we attempt to connect to the PHY device, prior to

> > > registering the netdev, we may fail that PHY connection, tearing down the

> > > entire switch because of that is highly undesirable.

> > > 

> > > Maybe we should re-order things a little bit and try to register devlink

> > > ports only after we successfully registered with the PHY/SFP and prior to

> > > registering the netdev?

> > 

> > Maybe, but it should not really matter. EPROBE_DEFER exists, and can

> > happen. The probe can fail for other reasons. All core code should be

> > cleanly undoable. Maybe we are pushing it a little by only wanting to

> > undo a single port, rather than the whole switch, but still, i would

> > make the core handle this, not rearrange the driver. It is not robust

> > otherwise.

> 

> Well yes, in case my comment was not clear, I was referring to the way that

> DSA register devlink ports, not how the mv88e6xxx driver does it. That is

> assuming that it is possible and there was not a reason for configuring the

> devlink ports ahead of the switch driver coming up.


There is a comment in dsa_switch_setup:

	/* Setup devlink port instances now, so that the switch
	 * setup() can register regions etc, against the ports
	 */

The fact of the matter is that in the current driver-facing API, there
is no better place to register devlink port regions than .setup: we have
no .port_setup and .port_teardown. This also forces us to register the
devlink ports earlier than we register the net devices.

In one of my previous replies to Leon I did indicate the introduction of
these two methods as a possibly less horrible way of packaging what we
have now in this patch as .port_reinit_as_unused, which is basically a
.port_teardown followed immediately by a .port_setup, as it would be
implemented by mv88e6xxx.

With the introduction of .port_setup and .port_teardown as an immediate
bug fix, reordering the devlink port registration vs the netdev connection
to the PHY could be done as further work, but it would sort of be a moot
point, since it would not solve any problem anymore.
diff mbox series

Patch

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c45ca2473743..76f580a12bac 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -6173,6 +6173,7 @@  static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.crosschip_lag_leave	= mv88e6xxx_crosschip_lag_leave,
 	.port_bridge_tx_fwd_offload = mv88e6xxx_bridge_tx_fwd_offload,
 	.port_bridge_tx_fwd_unoffload = mv88e6xxx_bridge_tx_fwd_unoffload,
+	.port_reinit_as_unused	= mv88e6xxx_port_reinit_as_unused,
 };
 
 static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip)
diff --git a/drivers/net/dsa/mv88e6xxx/devlink.c b/drivers/net/dsa/mv88e6xxx/devlink.c
index 0c0f5ea6680c..6c928b6af6d0 100644
--- a/drivers/net/dsa/mv88e6xxx/devlink.c
+++ b/drivers/net/dsa/mv88e6xxx/devlink.c
@@ -792,3 +792,19 @@  int mv88e6xxx_devlink_info_get(struct dsa_switch *ds,
 					      DEVLINK_INFO_VERSION_GENERIC_ASIC_ID,
 					      chip->info->name);
 }
+
+int mv88e6xxx_port_reinit_as_unused(struct dsa_switch *ds, int port)
+{
+	struct dsa_port *dp = dsa_to_port(ds, port);
+	struct mv88e6xxx_chip *chip = ds->priv;
+	int err;
+
+	mv88e6xxx_teardown_devlink_regions_port(chip, port);
+	dsa_port_devlink_teardown(dp);
+	dp->type = DSA_PORT_TYPE_UNUSED;
+	err = dsa_port_devlink_setup(dp);
+	if (err)
+		return err;
+
+	return mv88e6xxx_setup_devlink_regions_port(ds, chip, port);
+}
diff --git a/drivers/net/dsa/mv88e6xxx/devlink.h b/drivers/net/dsa/mv88e6xxx/devlink.h
index 3d72db3dcf95..5a23e115f23f 100644
--- a/drivers/net/dsa/mv88e6xxx/devlink.h
+++ b/drivers/net/dsa/mv88e6xxx/devlink.h
@@ -14,6 +14,7 @@  int mv88e6xxx_devlink_param_set(struct dsa_switch *ds, u32 id,
 				struct devlink_param_gset_ctx *ctx);
 int mv88e6xxx_setup_devlink_regions(struct dsa_switch *ds);
 void mv88e6xxx_teardown_devlink_regions(struct dsa_switch *ds);
+int mv88e6xxx_port_reinit_as_unused(struct dsa_switch *ds, int port);
 
 int mv88e6xxx_devlink_info_get(struct dsa_switch *ds,
 			       struct devlink_info_req *req,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index f9a17145255a..046dbebbf647 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -846,6 +846,12 @@  struct dsa_switch_ops {
 						   enum devlink_sb_pool_type pool_type,
 						   u32 *p_cur, u32 *p_max);
 
+	/* Hook for drivers to tear down their port devlink regions when a
+	 * port failed to register and its devlink port must be torn down and
+	 * reinitialized by DSA as unused.
+	 */
+	int	(*port_reinit_as_unused)(struct dsa_switch *ds, int port);
+
 	/*
 	 * MTU change functionality. Switches can also adjust their MRU through
 	 * this method. By MTU, one understands the SDU (L2 payload) length.
@@ -961,6 +967,9 @@  static inline int dsa_devlink_port_to_port(struct devlink_port *port)
 	return port->index;
 }
 
+int dsa_port_devlink_setup(struct dsa_port *dp);
+void dsa_port_devlink_teardown(struct dsa_port *dp);
+
 struct dsa_switch_driver {
 	struct list_head	list;
 	const struct dsa_switch_ops *ops;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 1dc45e40f961..4d9e5fe5bbb7 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -440,6 +440,12 @@  dsa_devlink_port_region_create(struct dsa_switch *ds,
 {
 	struct dsa_port *dp = dsa_to_port(ds, port);
 
+	/* Make sure drivers provide the method for cleaning this up when the
+	 * port might need to be torn down at runtime.
+	 */
+	if (WARN_ON(!ds->ops->port_reinit_as_unused))
+		return NULL;
+
 	return devlink_port_region_create(&dp->devlink_port, ops,
 					  region_max_snapshots,
 					  region_size);
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 1b2b25d7bd02..bc1da54fcf4c 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -488,7 +488,7 @@  static int dsa_port_setup(struct dsa_port *dp)
 	return 0;
 }
 
-static int dsa_port_devlink_setup(struct dsa_port *dp)
+int dsa_port_devlink_setup(struct dsa_port *dp)
 {
 	struct devlink_port *dlp = &dp->devlink_port;
 	struct dsa_switch_tree *dst = dp->ds->dst;
@@ -529,6 +529,7 @@  static int dsa_port_devlink_setup(struct dsa_port *dp)
 
 	return err;
 }
+EXPORT_SYMBOL_GPL(dsa_port_devlink_setup);
 
 static void dsa_port_teardown(struct dsa_port *dp)
 {
@@ -572,7 +573,7 @@  static void dsa_port_teardown(struct dsa_port *dp)
 	dp->setup = false;
 }
 
-static void dsa_port_devlink_teardown(struct dsa_port *dp)
+void dsa_port_devlink_teardown(struct dsa_port *dp)
 {
 	struct devlink_port *dlp = &dp->devlink_port;
 
@@ -580,6 +581,19 @@  static void dsa_port_devlink_teardown(struct dsa_port *dp)
 		devlink_port_unregister(dlp);
 	dp->devlink_port_setup = false;
 }
+EXPORT_SYMBOL_GPL(dsa_port_devlink_teardown);
+
+static int dsa_port_reinit_as_unused(struct dsa_port *dp)
+{
+	struct dsa_switch *ds = dp->ds;
+
+	if (ds->ops->port_reinit_as_unused)
+		return ds->ops->port_reinit_as_unused(ds, dp->index);
+
+	dsa_port_devlink_teardown(dp);
+	dp->type = DSA_PORT_TYPE_UNUSED;
+	return dsa_port_devlink_setup(dp);
+}
 
 static int dsa_devlink_info_get(struct devlink *dl,
 				struct devlink_info_req *req,
@@ -911,9 +925,7 @@  static int dsa_tree_setup_switches(struct dsa_switch_tree *dst)
 	list_for_each_entry(dp, &dst->ports, list) {
 		err = dsa_port_setup(dp);
 		if (err) {
-			dsa_port_devlink_teardown(dp);
-			dp->type = DSA_PORT_TYPE_UNUSED;
-			err = dsa_port_devlink_setup(dp);
+			err = dsa_port_reinit_as_unused(dp);
 			if (err)
 				goto teardown;
 			continue;