[v2,00/11] fw_devlink improvements

Message ID 20230127001141.407071-1-saravanak@google.com

Message

Saravana Kannan Jan. 27, 2023, 12:11 a.m. UTC
This patch series improves fw_devlink in the following ways:

1. It no longer cares about a fwnode having a "compatible" property. It
   figures this out more dynamically. The only expectation is that
   fwnodes that are converted to devices actually get probed by a driver
   for the dependencies to be enforced correctly.

2. Finer grained dependency tracking. fw_devlink will now create device
   links from the consumer to the actual resource's device (if it has one,
   e.g. gpio_device) instead of the parent supplier device. This improves
   things like async suspend/resume ordering, potentially removes the need
   for frameworks to create device links, allows more parallelized async
   probing, and gives better sync_state() tracking.

3. Handle hardware/software quirks where a child firmware node gets
   populated as a device before its parent firmware node AND actually
   supplies a non-optional resource to the parent firmware node's
   device.

4. Way more robust at cycle handling (see patch for the insane cases).

5. Stops depending on OF_POPULATED to figure out some corner cases.

6. Simplifies the work that needs to be done by the firmware specific
   code.

Sorry it took a while to roll the fixes I gave in the v1 series
thread[1] into a v2 series.

Since I didn't make any additional changes on top of what I already gave
in the v1 thread and Dmitry is very eager to get this series going, I'm
sending it out without testing locally. I already tested these patches a
few months ago as part of the v1 series. So I don't expect any major
issues. I'll test them again on my end in the next few days and will
report here if I actually find anything wrong.

Tony, Naresh, Abel, Sudeep, Geert,

I got the following Reviewed-bys and Tested-bys a few months back, but
it's been 5 months since I sent out v1. So I wasn't sure if it was okay
to include them in the v2 commits. Let me know if you are okay with this
being included in the commits and/or if you want to test this series
again.

Reviewed-by: Tony Lindgren <tony@atomide.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Abel Vesa <abel.vesa@linaro.org>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Dmitry, Maxim(s), Miquel, Luca, Doug, Colin, Martin, Jean-Philippe,

I've Cc-ed you because I had pointed you to v1 of this series + the
patches in that thread at one point or another as a fix to some issue
you were facing. I'd appreciate it if you can test this series and
report any issues, or things it fixed and give Tested-bys.

In addition, if you can also apply a revert of this series[2] and delete
driver_deferred_probe_check_state() from your tree and see if you hit
any issues and report them, that'd be great too! I'm pretty sure some of
you will hit issues with that. I want to fix those next and then
revert[2].

Thanks,
Saravana

[1] - https://lore.kernel.org/lkml/20220810060040.321697-1-saravanak@google.com/
[2] - https://lore.kernel.org/lkml/20220819221616.2107893-1-saravanak@google.com/
[3] - https://lore.kernel.org/lkml/CAGETcx-JUV1nj8wBJrTPfyvM7=Mre5j_vkVmZojeiumUGG6QZQ@mail.gmail.com/

v1 -> v2:
- Fixed Patch 1 to handle a corner case discussed in [3].
- New patch 10 to handle "fsl,imx8mq-gpc" being initialized by 2 drivers.
- New patch 11 to add fw_devlink support for SCMI devices.

Cc: Abel Vesa <abel.vesa@linaro.org>
Cc: Alexander Stein <alexander.stein@ew.tq-group.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: John Stultz <jstultz@google.com>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Maxim Kiselev <bigunclemax@gmail.com>
Cc: Maxim Kochetkov <fido_max@inbox.ru>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Luca Weiss <luca.weiss@fairphone.com>
Cc: Colin Foster <colin.foster@in-advantage.com>
Cc: Martin Kepplinger <martin.kepplinger@puri.sm>
Cc: Jean-Philippe Brucker <jpb@kernel.org>

Saravana Kannan (11):
  driver core: fw_devlink: Don't purge child fwnode's consumer links
  driver core: fw_devlink: Improve check for fwnode with no
    device/driver
  soc: renesas: Move away from using OF_POPULATED for fw_devlink
  gpiolib: Clear the gpio_device's fwnode initialized flag before adding
  driver core: fw_devlink: Add DL_FLAG_CYCLE support to device links
  driver core: fw_devlink: Allow marking a fwnode link as being part of
    a cycle
  driver core: fw_devlink: Consolidate device link flag computation
  driver core: fw_devlink: Make cycle detection more robust
  of: property: Simplify of_link_to_phandle()
  irqchip/irq-imx-gpcv2: Mark fwnode device as not initialized
  firmware: arm_scmi: Set fwnode for the scmi_device

 drivers/base/core.c             | 443 +++++++++++++++++++++-----------
 drivers/firmware/arm_scmi/bus.c |   2 +
 drivers/gpio/gpiolib.c          |   6 +
 drivers/irqchip/irq-imx-gpcv2.c |   1 +
 drivers/of/property.c           |  84 +-----
 drivers/soc/imx/gpcv2.c         |   1 +
 drivers/soc/renesas/rcar-sysc.c |   2 +-
 include/linux/device.h          |   1 +
 include/linux/fwnode.h          |  12 +-
 9 files changed, 332 insertions(+), 220 deletions(-)

Comments

Andy Shevchenko Jan. 27, 2023, 9:21 a.m. UTC | #1
On Thu, Jan 26, 2023 at 04:11:28PM -0800, Saravana Kannan wrote:
> When a device X is bound successfully to a driver, if it has a child
> firmware node Y that doesn't have a struct device created by then, we
> delete fwnode links where the child firmware node Y is the supplier. We
> did this to avoid blocking the consumers of the child firmware node Y
> from deferring probe indefinitely.
> 
> While that is a step in the right direction, it's better to make the
> consumers of the child firmware node Y to be consumers of the device X
> because device X is probably implementing whatever functionality is
> represented by child firmware node Y. By doing this, we capture the
> device dependencies more accurately and ensure better
> probe/suspend/resume ordering.
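
The re-parenting described in the quoted commit message can be modelled in user space as follows. This is only a sketch under stated assumptions: `struct node`, `struct link`, and `pickup_dangling_consumers()` are hypothetical stand-ins invented for illustration, not the kernel's `fwnode_handle`, `fwnode_link`, or the patch's `__fw_devlink_pickup_dangling_consumers()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a firmware node. */
struct node {
	const char *name;
	int has_device;		/* non-zero once a struct device exists for it */
};

/* Hypothetical stand-in for a fwnode link: consumer depends on supplier. */
struct link {
	struct node *consumer;
	struct node *supplier;
};

/*
 * Re-point every link whose supplier is `child` at `parent`, unless the
 * child already has a device of its own. Returns the number of links moved.
 */
static int pickup_dangling_consumers(struct link *links, size_t n,
				     struct node *child, struct node *parent)
{
	int moved = 0;

	assert(child != NULL && parent != NULL);

	if (child->has_device)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (links[i].supplier == child) {
			links[i].supplier = parent;
			moved++;
		}
	}
	return moved;
}
```

With a device-less child Y under a bound device X, a consumer that pointed at Y ends up pointing at X, so the dependency is kept rather than deleted.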

...

>  static unsigned int defer_sync_state_count = 1;
>  static DEFINE_MUTEX(fwnode_link_lock);
>  static bool fw_devlink_is_permissive(void);
> +static void __fw_devlink_link_to_consumers(struct device *dev);
>  static bool fw_devlink_drv_reg_done;
>  static bool fw_devlink_best_effort;

I'm wondering if we may avoid adding more forward declarations...

Perhaps it's a sign that devlink code should be split to its own
module?

...

> -int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup)
> +static int __fwnode_link_add(struct fwnode_handle *con,
> +			     struct fwnode_handle *sup)

I believe we tolerate a bit longer lines, so you may still have it on a single
line.

...

> +int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup)
> +{

> +	int ret = 0;

Redundant assignment.

> +	mutex_lock(&fwnode_link_lock);
> +	ret = __fwnode_link_add(con, sup);
> +	mutex_unlock(&fwnode_link_lock);
>  	return ret;
>  }

...

>  	if (dev->fwnode && dev->fwnode->dev == dev) {

You may have something like this above:


	fwnode = dev_fwnode(dev);
	if (fwnode && fwnode->dev == dev) {

>  		struct fwnode_handle *child;
>  		fwnode_links_purge_suppliers(dev->fwnode);
> +		mutex_lock(&fwnode_link_lock);
>  		fwnode_for_each_available_child_node(dev->fwnode, child)
> -			fw_devlink_purge_absent_suppliers(child);
> +			__fw_devlink_pickup_dangling_consumers(child,
> +							       dev->fwnode);

			__fw_devlink_pickup_dangling_consumers(child, fwnode);

> +		__fw_devlink_link_to_consumers(dev);
> +		mutex_unlock(&fwnode_link_lock);
>  	}
Andy Shevchenko Jan. 27, 2023, 9:29 a.m. UTC | #2
On Thu, Jan 26, 2023 at 04:11:32PM -0800, Saravana Kannan wrote:
> fw_devlink uses DL_FLAG_SYNC_STATE_ONLY device link flag for two
> purposes:
> 
> 1. To allow a parent device to proxy its child device's dependency on a
>    supplier so that the supplier doesn't get its sync_state() callback
>    before the child device/consumer can be added and probed. In this
>    usage scenario, we need to ignore cycles to ensure correctness of
>    sync_state() callbacks.
> 
> 2. When there are dependency cycles in firmware, we don't know which of
>    those dependencies are valid. So, we have to ignore them all wrt
>    probe ordering while still making sure the sync_state() callbacks
>    come correctly.
> 
> However, when detecting dependency cycles, there can be multiple
> dependency cycles between two devices that we need to detect. For
> example:
> 
> A -> B -> A and A -> C -> B -> A.
> 
> To detect multiple cycles correctly, we need to be able to differentiate
> DL_FLAG_SYNC_STATE_ONLY device links used for (1) vs (2) above.
> 
> To allow this differentiation, add a DL_FLAG_CYCLE that can be used to
> mark use case (2). We can then use the DL_FLAG_CYCLE to decide which
> DL_FLAG_SYNC_STATE_ONLY device links to follow when looking for
> dependency cycles.

...

> +static inline bool device_link_flag_is_sync_state_only(u32 flags)
> +{
> +	return (flags & ~(DL_FLAG_INFERRED | DL_FLAG_CYCLE))
> +		== (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED);

Weird indentation, why not

	return (flags & ~(DL_FLAG_INFERRED | DL_FLAG_CYCLE)) ==
	       (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED);

?

> +}
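
For readers who want to try the mask logic outside the kernel, here is a stand-alone sketch of the helper using the suggested indentation. The bit positions below are illustrative assumptions chosen for this example; the authoritative values are in include/linux/device.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only (assumption); see include/linux/device.h. */
#define DL_FLAG_MANAGED		(1u << 6)
#define DL_FLAG_SYNC_STATE_ONLY	(1u << 7)
#define DL_FLAG_INFERRED	(1u << 8)
#define DL_FLAG_CYCLE		(1u << 9)

/* The two "don't care" flags must be distinct bits for the mask to work. */
static_assert((DL_FLAG_INFERRED & DL_FLAG_CYCLE) == 0,
	      "don't-care flags must not overlap");

/*
 * A link counts as sync-state-only when, after ignoring INFERRED and
 * CYCLE, exactly SYNC_STATE_ONLY | MANAGED remains set.
 */
static inline bool device_link_flag_is_sync_state_only(uint32_t flags)
{
	return (flags & ~(DL_FLAG_INFERRED | DL_FLAG_CYCLE)) ==
	       (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED);
}
```

The point of the mask is that adding DL_FLAG_CYCLE to an existing sync-state-only link does not change how the link is classified.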

...

>  			       DL_FLAG_AUTOREMOVE_SUPPLIER | \
>  			       DL_FLAG_AUTOPROBE_CONSUMER  | \
>  			       DL_FLAG_SYNC_STATE_ONLY | \
> -			       DL_FLAG_INFERRED)
> +			       DL_FLAG_INFERRED | \
> +			       DL_FLAG_CYCLE)

You can make less churn by squeezing the new one above the last one.
Geert Uytterhoeven Jan. 27, 2023, 9:30 a.m. UTC | #3
Hi Andy,

On Fri, Jan 27, 2023 at 10:25 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
> On Thu, Jan 26, 2023 at 04:11:30PM -0800, Saravana Kannan wrote:
> > The OF_POPULATED flag was set to let fw_devlink know that the device
> > tree node will not have a struct device created for it. This information
> > is used by fw_devlink to avoid deferring the probe of consumers of this
> > device tree node.
> >
> > Let's use fwnode_dev_initialized() instead because it achieves the same
> > effect without using OF specific flags. This allows more generic code to
> > be written in driver core.
>
> ...
>
> > -             of_node_set_flag(np, OF_POPULATED);
> > +             fwnode_dev_initialized(&np->fwnode, true);
>
> of_fwnode_handle(np) ?

Or of_node_to_fwnode(). Looks like we have (at least) two of them...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Andy Shevchenko Jan. 27, 2023, 9:44 a.m. UTC | #4
On Fri, Jan 27, 2023 at 10:30:35AM +0100, Geert Uytterhoeven wrote:
> On Fri, Jan 27, 2023 at 10:25 AM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> > On Thu, Jan 26, 2023 at 04:11:30PM -0800, Saravana Kannan wrote:

...

> > > -             of_node_set_flag(np, OF_POPULATED);
> > > +             fwnode_dev_initialized(&np->fwnode, true);
> >
> > of_fwnode_handle(np) ?
> 
> Or of_node_to_fwnode().

Not really.

> Looks like we have (at least) two of them...

Yes, and the latter one is an IRQ subsystem invention. It should be gone in
favour of the generic helper.
Colin Foster Jan. 27, 2023, 8:30 p.m. UTC | #5
On Thu, Jan 26, 2023 at 04:11:27PM -0800, Saravana Kannan wrote:
> Dmitry, Maxim(s), Miquel, Luca, Doug, Colin, Martin, Jean-Philippe,
> 
> I've Cc-ed you because I had pointed you to v1 of this series + the
> patches in that thread at one point or another as a fix to some issue
> you were facing. I'd appreciate it if you can test this series and
> report any issues, or things it fixed and give Tested-bys.

I applied this on my working net-next/main development branch and can
confirm I am able to successfully boot the Beaglebone Black.

Tested-by: Colin Foster <colin.foster@in-advantage.com>
Saravana Kannan Jan. 28, 2023, 7:18 a.m. UTC | #6
On Fri, Jan 27, 2023 at 12:11 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Fri, Jan 27, 2023 at 1:11 AM Saravana Kannan <saravanak@google.com> wrote:
> > The OF_POPULATED flag was set to let fw_devlink know that the device
> > tree node will not have a struct device created for it. This information
> > is used by fw_devlink to avoid deferring the probe of consumers of this
> > device tree node.
> >
> > Let's use fwnode_dev_initialized() instead because it achieves the same
> > effect without using OF specific flags. This allows more generic code to
> > be written in driver core.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> Thanks for your patch!
>
> > --- a/drivers/soc/renesas/rcar-sysc.c
> > +++ b/drivers/soc/renesas/rcar-sysc.c
> > @@ -437,7 +437,7 @@ static int __init rcar_sysc_pd_init(void)
> >
> >         error = of_genpd_add_provider_onecell(np, &domains->onecell_data);
> >         if (!error)
> > -               of_node_set_flag(np, OF_POPULATED);
> > +               fwnode_dev_initialized(&np->fwnode, true);
>
> As drivers/soc/renesas/rmobile-sysc.c is already using this method,
> it should work fine.
>
> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
> i.e. will queue in renesas-devel for v6.4.

Thanks! Does that mean I should drop this from this series? If two
maintainers pick the same patch up, will it cause problems? I'm
eventually expecting this series to be picked up by Greg into
driver-core-next.

-Saravana

>
> >
> >  out_put:
> >         of_node_put(np);
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
Saravana Kannan Jan. 28, 2023, 7:34 a.m. UTC | #7
On Fri, Jan 27, 2023 at 1:33 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Thu, Jan 26, 2023 at 04:11:33PM -0800, Saravana Kannan wrote:
> > To improve detection and handling of dependency cycles, we need to be
> > able to mark fwnode links as being part of cycles. fwnode links marked
> > as being part of a cycle should not block their consumers from probing.
>
> ...
>
> > +     list_for_each_entry(link, &fwnode->suppliers, c_hook) {
> > +             if (link->flags & FWLINK_FLAG_CYCLE)
> > +                     continue;
> > +             return link->supplier;
>
> Hmm...

Thanks!

>
>                 if (!(link->flags & FWLINK_FLAG_CYCLE))
>                         return link->supplier;
>
> ?
>
> > +     }
> > +
> > +     return NULL;
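
The behaviour of the quoted loop (return the first supplier whose link is not marked as part of a cycle, otherwise NULL) can be exercised with a small user-space model. Everything here is an assumption made for illustration: `struct fw_link`, the flag value, and `first_blocking_supplier()` are hypothetical and use an array instead of the kernel's list iteration.

```c
#include <assert.h>
#include <stddef.h>

#define FWLINK_FLAG_CYCLE (1u << 0)	/* illustrative value (assumption) */

/* Hypothetical stand-in for a fwnode link as seen from the consumer. */
struct fw_link {
	const char *supplier;
	unsigned int flags;
};

/*
 * Return the first supplier that should still block probing, i.e. the
 * first link not flagged as part of a dependency cycle, or NULL if
 * every remaining link belongs to a cycle.
 */
static const char *first_blocking_supplier(const struct fw_link *links,
					   size_t n)
{
	assert(links != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!(links[i].flags & FWLINK_FLAG_CYCLE))
			return links[i].supplier;
	}
	return NULL;
}
```

Both spellings of the loop body discussed above (`continue` on cycle links, or return on non-cycle links) give the same result; the model uses the second form.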
>
> ...
>
> > -     if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
> > -         !fw_devlink_is_permissive()) {
> > -             sup_fw = list_first_entry(&dev->fwnode->suppliers,
> > -                                       struct fwnode_link,
> > -                                       c_hook)->supplier;
> > +     sup_fw = fwnode_links_check_suppliers(dev->fwnode);
>
> dev_fwnode() ?
>
> ...
>
> > -     val = !list_empty(&dev->fwnode->suppliers);
> > +     mutex_lock(&fwnode_link_lock);
> > +     val = !!fwnode_links_check_suppliers(dev->fwnode);
>
> Ditto?

Similar response as Patch 1 and Patch 4.


-Saravana
Maksim Kiselev Feb. 2, 2023, 5:36 p.m. UTC | #8
Hi Saravana,

> Can you try the patch at the end of this email under these
> configurations and tell me which ones fail vs pass? I don't need logs

I did these tests and here are the results:

1. On top of this series - Does not work
2. Without this series - Works
3. On top of the series with the fwnode_dev_initialized() deleted - Does not work
4. Without this series, with the fwnode_dev_initialized() deleted - Works

So your nvmem/core.c patch helps only when it is applied without the series.
But despite the fact that this helps to avoid getting stuck at probing
my ethernet device, there is still a regression.

When the ethernet module is loaded, it takes a lot of time to drop the
dependency on the nvmem-cell with the MAC address.

Please look at the kernel logs below.

The first log corresponds to the kernel with your nvmem/core.c patch:

    [    0.036462] ethernet@70000 Linked as a fwnode consumer to clock-gating-control@1821c
    [    0.036572] ethernet@70000 Linked as a fwnode consumer to partition@1
    [    0.045596] device: 'f1070000.ethernet': device_add
    [    0.045854] ethernet@70000 Dropping the fwnode link to clock-gating-control@1821c
    [    0.114990] device: 'platform:f1010600.spi:m25p80@0:partitions:partition@1--platform:f1070000.ethernet': device_add
    [    0.115266] devices_kset: Moving f1070000.ethernet to end of list
    [    0.115308] platform f1070000.ethernet: Linked as a consumer to f1010600.spi:m25p80@0:partitions:partition@1
    [    0.115345] ethernet@70000 Dropping the fwnode link to partition@1
    [    1.968232] platform f1070000.ethernet: error -EPROBE_DEFER: supplier f1010600.spi:m25p80@0:partitions:partition@1 not ready
    [    2.088696] devices_kset: Moving f1070000.ethernet to end of list
    [    2.088988] platform f1070000.ethernet: error -EPROBE_DEFER: supplier f1010600.spi:m25p80@0:partitions:partition@1 not ready
    [    2.152411] devices_kset: Moving f1070000.ethernet to end of list
    [    2.152735] platform f1070000.ethernet: error -EPROBE_DEFER: supplier f1010600.spi:m25p80@0:partitions:partition@1 not ready
    [    2.153870] devices_kset: Moving f1070000.ethernet to end of list
    [    2.154152] platform f1070000.ethernet: error -EPROBE_DEFER: supplier f1010600.spi:m25p80@0:partitions:partition@1 not ready
    [    2.644950] devices_kset: Moving f1070000.ethernet to end of list
    [    2.645282] platform f1070000.ethernet: error -EPROBE_DEFER: supplier f1010600.spi:m25p80@0:partitions:partition@1 not ready
    [    3.169218] devices_kset: Moving f1070000.ethernet to end of list
    [    3.169506] platform f1070000.ethernet: error -EPROBE_DEFER: supplier f1010600.spi:m25p80@0:partitions:partition@1 not ready
    [    3.170444] devices_kset: Moving f1070000.ethernet to end of list
    [    3.170721] platform f1070000.ethernet: error -EPROBE_DEFER: supplier f1010600.spi:m25p80@0:partitions:partition@1 not ready
    [    3.419068] devices_kset: Moving f1070000.ethernet to end of list
    [    3.419359] platform f1070000.ethernet: error -EPROBE_DEFER: supplier f1010600.spi:m25p80@0:partitions:partition@1 not ready
    [    3.521275] devices_kset: Moving f1070000.ethernet to end of list
    [    3.521564] platform f1070000.ethernet: error -EPROBE_DEFER: supplier f1010600.spi:m25p80@0:partitions:partition@1 not ready
    [    3.639196] devices_kset: Moving f1070000.ethernet to end of list
    [    3.639532] platform f1070000.ethernet: error -EPROBE_DEFER: supplier f1010600.spi:m25p80@0:partitions:partition@1 not ready
    [   13.960144] platform f1070000.ethernet: Relaxing link with f1010600.spi:m25p80@0:partitions:partition@1
    [   13.960260] devices_kset: Moving f1070000.ethernet to end of list
    [   13.971735] device: 'eth0': device_add
    [   13.974140] mvneta f1070000.ethernet eth0: Using device tree mac address de:fa:ce:db:ab:e1
    [   13.974275] mvneta f1070000.ethernet: Dropping the link to f1010600.spi:m25p80@0:partitions:partition@1
    [   13.974318] device: 'platform:f1010600.spi:m25p80@0:partitions:partition@1--platform:f1070000.ethernet': device_unregister

It took around 13 seconds to obtain a mac from nvmem-cell and bring up
f1070000.ethernet


And here is the second log, which corresponds to the kernel without your
nvmem/core.c patch but with change 'bcdf0315' reverted:

    [    0.036285] ethernet@70000 Linked as a fwnode consumer to clock-gating-control@1821c
    [    0.036395] ethernet@70000 Linked as a fwnode consumer to partition@1
    [    0.045416] device: 'f1070000.ethernet': device_add
    [    0.045674] ethernet@70000 Dropping the fwnode link to clock-gating-control@1821c
    [    0.116136] ethernet@70000 Dropping the fwnode link to partition@1
    [    1.977060] device: 'eth0': device_add
    [    1.979145] mvneta f1070000.ethernet eth0: Using device tree mac address de:fa:ce:db:ab:e1

It took around 1.5 seconds to obtain a mac from nvmem-cell

P.S. Your nvmem patch definitely helps to avoid a stuck device probe,
but it looks like it is not the best way to solve the problem we discussed
in the MTD thread.

P.P.S. Also, I don't know why your nvmem-cell patch doesn't help when it is
applied on top of this series. Maybe I missed something.
Miquel Raynal Feb. 24, 2023, 2:46 p.m. UTC | #9
Hi Saravana,

> > > > > So, can you please retest config 1 with all pr_debug and dev_dbg in
> > > > > drivers/base/core.c changed to the _info variants? And then share the
> > > > > kernel log from the beginning of boot? Maybe attach it to the email so
> > > > > it doesn't get word wrapped by your email client. And please point me
> > > > > to the .dts that corresponds to your board. Without that, I can't
> > > > > debug much.  
> > > >
> > > > Ok, I retested config 1 with all _debug logs changed to the _info. I
> > > > added the kernel log and the dts file to the attachment of this email.  
> > >
> > > Ah, so your device is not supported/present upstream? Even though it's
> > > not upstream, I'll help fix this because it should fix what I believe
> > > are unreported issues in upstream.
> > >
> > > Ok I know why configs 1 - 4 behaved the way they did and why my test
> > > patch didn't help.
> > >
> > > After staring at mtd/nvmem code for a few hours I think mtd/nvmem
> > > interaction is kind of a mess.  
> >
> > nvmem is a recent subsystem but mtd carries a lot of legacy stuff we
> > cannot really re-wire without breaking users, so nvmem on top of mtd
> > of course inherits the fragile designs in place.  
> 
> Thanks for the context. Yeah, I figured. That's why I explicitly
> limited my comment to "interaction". Although, I'd love to see the MTD
> parsers all be converted to proper drivers that probe. MTD is
> essentially repeating the driver matching logic. I think it can be
> cleaned up to move to proper drivers and still not break backward
> compatibility. Not saying it'll be trivial, but it should be possible.
> Ironically MTD uses mtd_class but has real drivers that work on the
> device (compared to nvmem_bus below).
> 
> > > mtd core creates "partition" platform
> > > devices (including for nvmem-cells) that are probed by drivers in
> > > drivers/nvmem. However, there's no driver for "nvmem-cells" partition
> > > platform device. However, the nvmem core creates nvmem_device when
> > > nvmem_register() is called by MTD or these partition platform devices
> > > created by MTD. But these nvmem_devices are added to a nvmem_bus but
> > > the bus has no means to even register a driver (it should really be a
> > > nvmem_class and not nvmem_bus).  
> >
> > Srinivas, do you think we could change this?  
> 
> Yeah, this part gets a bit tricky. It depends on whether the sysfs
> files for nvmem devices are considered an ABI. Changing from bus to
> class would change the sysfs path for nvmem devices from:
> /sys/bus/nvmem to /sys/class/nvmem

Ok, so this is a no :)

> > > And the nvmem_device sometimes points
> > > to the DT node of the MTD device or sometimes the partition platform
> > > devices or maybe no DT node at all.  
> >
> > I guess this comes from the fact that this is not strongly defined in
> > mtd and depends on the situation (not mentioning 20 years of history
> > there as well). "mtd" is a bit inconsistent on what it means. Older
> > designs mixed: controllers, ECC engines when relevant and memories;
> > while these three components are completely separated. Hence
> > sometimes the mtd device ends up being the top level controller,
> > sometimes it's just one partition...
> >
> > But I'm surprised not all of them point to a DT node. Could you show us
> > an example? Because that might likely be unexpected (or perhaps I am
> > missing something).  
> 
> Well, the logic that sets the DT node for nvmem_device is like so:
> 
>         if (config->of_node)
>                 nvmem->dev.of_node = config->of_node;
>         else if (!config->no_of_node)
>                 nvmem->dev.of_node = config->dev->of_node;
> 
> So there's definitely a path (where both if's could be false) where
> the DT node will not get set. I don't know if that path is possible
> with the existing users of nvmem_register(), but it's definitely
> possible.
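
The three outcomes of the quoted if/else chain can be checked with a small user-space model. This is a sketch only: the structs below are cut-down stand-ins that borrow the field names from the quoted snippet, and `pick_of_node()` is a hypothetical helper, not an nvmem core function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types (assumption, illustration only). */
struct device_node { const char *name; };
struct device { struct device_node *of_node; };

struct nvmem_config {
	struct device *dev;
	struct device_node *of_node;
	bool no_of_node;
};

/*
 * Mirror of the quoted selection logic:
 *   1. an explicit config->of_node wins;
 *   2. otherwise fall back to the parent device's node, unless the
 *      caller opted out with no_of_node;
 *   3. otherwise the nvmem device gets no DT node at all.
 */
static struct device_node *pick_of_node(const struct nvmem_config *config)
{
	assert(config != NULL && config->dev != NULL);

	if (config->of_node)
		return config->of_node;
	if (!config->no_of_node)
		return config->dev->of_node;
	return NULL;
}
```

The third branch is the case under discussion: with `of_node` unset and `no_of_node` true, the result is NULL even when the parent device has a DT node.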

It's an actual path. I just checked in more detail; this is the change
from 2018 which uses the no_of_node flag:
c4dfa25ab307 ("mtd: add support for reading MTD devices via the nvmem API")

It basically allows any mtd device to be accessible (read-only) through
nvmem. So mtd partitions or such which are not described in the DT may
just be accessed through nvmem (that is my current understanding).

There was later a patch in 2021 which prevented this flag from being
automatically set, so that if partitions (well, mtd devices in general)
were described in the DT, they would provide a valid of_node in order
to be used as cell providers (again, my understanding):
658c4448bbbf ("mtd: core: add nvmem-cells compatible to parse mtd as nvmem cells")

But I guess the major problem comes from the nvmem-cell compatible. I
am wondering if it would make sense to kind of transpose the meaning of
this compatible into a property. But, well, backward compatibility
would still be a problem I guess...

> > > So it's a mess of multiple devices pointing to the same DT node with
> > > no clear way to identify which ones will point to a DT node and which
> > > ones will probe and which ones won't. In the future, we shouldn't
> > > allow adding new compatible strings for partitions for which we don't
> > > plan on adding nvmem drivers.
> > >
> > > Can you give the patch at the end of the email a shot? It should fix
> > > the issue with this series and without this series. It just avoids
> > > this whole mess by not creating useless platform device for
> > > nvmem-cells compatible DT nodes.  
> >
> > Thanks a lot for your help.  
> 
> No problem. I want fw_devlink to work for everyone.
> 

Thanks,
Miquèl