Message ID | 20220513233640.2518337-1-vladimir.oltean@nxp.com |
---|---|
Headers | show |
Series | Make phylink and DSA wait for PHY driver that defers probe | expand |
On Sat, May 14, 2022 at 02:36:38AM +0300, Vladimir Oltean wrote: > This patch set completes the picture described by > '[RFC,devicetree] of: property: mark "interrupts" as optional for fw_devlink' > https://patchwork.kernel.org/project/netdevbpf/patch/20220513201243.2381133-1-vladimir.oltean@nxp.com/ > > I've CCed non-networking maintainers just in case they want to gain a > better understanding. If not, apologies and please ignore the rest. > > > > My use case is to migrate a PHY driver from poll mode to interrupt mode > without breaking compatibility between new device trees and old kernels > which did not have a driver for that IRQ parent, and therefore (for > things to work) did not even have that interrupt listed in the "vintage > correct" DT blobs. Note that current kernels as of today are also > "old kernels" in this description. > > Creating some degree of compatibility has multiple components. > > 1. A PHY driver must eventually give up waiting for an IRQ provider, > since the dependency is optional and it can fall back to poll mode. > This is currently supported thanks to commit 74befa447e68 ("net: > mdio: don't defer probe forever if PHY IRQ provider is missing"). > > 2. Before it finally gives up, the PHY driver has a transient phase of > returning -EPROBE_DEFER. That transient phase causes some breakage > which is handled by this patch set, details below. > > 3. PHY device probing and Ethernet controller finding it and connecting > to it are async events. When both happen during probing, the problem > is that finding the PHY fails if the PHY defers probe, which results > in a missing PHY rather than waiting for it. Unfortunately there is > no universal way to address this problem, because the majority of > Ethernet drivers do not connect to the PHY during probe. So the > problem is fixed only for the driver that is of interest to me in > this context, DSA, and with special API exported by phylink > specifically for this purpose, to limit the impact on other drivers. There is a very different approach, which might be simpler. We know polling will always work. And it should be possible to transition between polling and interrupt at any point, so long as the phylock is held. So if you get -EPROBE_DEFFER during probe, mark some state in phydev that there should be an irq, but it is not around yet. When the phy is started, and phylib starts polling, look for the state and try getting the IRQ again. If successful, swap to interrupts, if not, keep polling. Maybe after 60 seconds of polling and trying, give up trying to find the irq and stick with polling. Andrew
On Fri, May 13, 2022 at 4:37 PM Vladimir Oltean <vladimir.oltean@nxp.com> wrote: > > This patch set completes the picture described by > '[RFC,devicetree] of: property: mark "interrupts" as optional for fw_devlink' > https://patchwork.kernel.org/project/netdevbpf/patch/20220513201243.2381133-1-vladimir.oltean@nxp.com/ I replied to that patch. I don't think we can pull that in. > I've CCed non-networking maintainers just in case they want to gain a > better understanding. If not, apologies and please ignore the rest. > > > > My use case is to migrate a PHY driver from poll mode to interrupt mode > without breaking compatibility between new device trees and old kernels > which did not have a driver for that IRQ parent, and therefore (for > things to work) did not even have that interrupt listed in the "vintage > correct" DT blobs. Note that current kernels as of today are also > "old kernels" in this description. > > Creating some degree of compatibility has multiple components. > > 1. A PHY driver must eventually give up waiting for an IRQ provider, > since the dependency is optional and it can fall back to poll mode. > This is currently supported thanks to commit 74befa447e68 ("net: > mdio: don't defer probe forever if PHY IRQ provider is missing"). > > 2. Before it finally gives up, the PHY driver has a transient phase of > returning -EPROBE_DEFER. That transient phase causes some breakage > which is handled by this patch set, details below. > > 3. PHY device probing and Ethernet controller finding it and connecting > to it are async events. When both happen during probing, the problem > is that finding the PHY fails if the PHY defers probe, which results > in a missing PHY rather than waiting for it. Unfortunately there is > no universal way to address this problem, because the majority of > Ethernet drivers do not connect to the PHY during probe. So the > problem is fixed only for the driver that is of interest to me in > this context, DSA, and with special API exported by phylink > specifically for this purpose, to limit the impact on other drivers. I'll take a closer look at this later this week, but once we add phy-handle support to fw_devlink (the device_bind_driver() is making it hard to add support), I think we can address most/all of these problems automatically. So hopefully we can work towards that? Actually this patch might already fix this for you: https://lore.kernel.org/lkml/20220429220933.1350374-1-saravanak@google.com/ Before fw_devlink, we'd give up on waiting on all suppliers, whether they had a driver (but hadn't yet probed for a multitude of reasons) or not. fw_devlink is smart about allowing consumers to probe without their suppliers only if the supplier has no driver or the driver fails (I'll send a patch for this). The deferred_probe_timeout is what's used to decide when to give up waiting for drivers. -Saravana > > Note that drivers that connect to the PHY at ndo_open are superficially > "fixed" by the patch at step 1 alone, and therefore don't need the > mechanism introduced in phylink here. This is because of the larger span > of time between PHY probe and opening the network interface (typically > initiated by user space). But this is the catch, nfsroot and other > in-kernel networking users can also open the net device, and this will > still expose the EPROBE_DEFER as a hard error for this second kind of > drivers. I don't know how to fix that. From this POV, it's better to do > what DSA does (connect to the PHY on probe). > > Vladimir Oltean (2): > net: phylink: allow PHY driver to defer probe when connecting via OF > node > net: dsa: wait for PHY to defer probe > > drivers/net/phy/phylink.c | 73 ++++++++++++++++++++++++++++++--------- > include/linux/phylink.h | 2 ++ > net/dsa/dsa2.c | 2 ++ > net/dsa/port.c | 6 ++-- > net/dsa/slave.c | 10 +++--- > 5 files changed, 70 insertions(+), 23 deletions(-) > > -- > 2.25.1 >