diff mbox series

[v2,1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

Message ID 20220601070707.3946847-2-saravanak@google.com
State Accepted
Commit 5a46079a96451cfb15e4f5f01f73f7ba24ef851a
Headers show
Series deferred_probe_timeout logic clean up | expand

Commit Message

Saravana Kannan June 1, 2022, 7:06 a.m. UTC
Now that fw_devlink=on by default and fw_devlink supports
"power-domains" property, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/domain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Saravana Kannan June 9, 2022, 7:29 p.m. UTC | #1
On Thu, Jun 9, 2022 at 4:45 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Wed, 1 Jun 2022 at 09:07, Saravana Kannan <saravanak@google.com> wrote:
> >
> > Now that fw_devlink=on by default and fw_devlink supports
> > "power-domains" property, the execution will never get to the point
> > where driver_deferred_probe_check_state() is called before the supplier
> > has probed successfully or before deferred probe timeout has expired.
> >
> > So, delete the call and replace it with -ENODEV.
>
> With fw_devlink=on by default - does that mean that the parameter
> can't be changed?
>
> Or perhaps the point is that we don't want to go back, but rather drop
> the fw_devlink parameter altogether when moving forward?

Good question. For now, keeping fw_devlink=off and
fw_devlink=permissive as debugging options that I can ask people to
try if some probe is getting blocked.

Or maybe if some ultra low memory use case wants to avoid create
device links, fwnode links, etc and can build everything in and have
init/probe happen in the right order.

But in the long run, I see a strong possibility for
fw_devlink=off/permissive being removed. I'd still want to keep it for
implementing =rpm where it'd also automatically enable PM runtime
tracking, but I don't understand that well enough yet to do it by
default.

> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> Just a minor nitpick below. Nevertheless, feel free to add:
>
> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Thanks!

>
> > ---
> >  drivers/base/power/domain.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 739e52cd4aba..3e86772d5fac 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
> >                 mutex_unlock(&gpd_list_lock);
> >                 dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
> >                         __func__, PTR_ERR(pd));
> > -               return driver_deferred_probe_check_state(base_dev);
>
> Adding a brief comment about why -EPROBE_DEFER doesn't make sense
> here, would be nice.

Will do once all the reviews comeout/when I send v3.

I'm thinking something like:
/* fw_devlink will take care of retrying for missing suppliers */

-Saravana

>
> > +               return -ENODEV;
> >         }
> >
> >         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
> > --
> > 2.36.1.255.ge46751e96f-goog
> >
>
> Kind regards
> Uffe
Tony Lindgren June 21, 2022, 7:28 a.m. UTC | #2
Hi,

* Saravana Kannan <saravanak@google.com> [700101 02:00]:
> Now that fw_devlink=on by default and fw_devlink supports
> "power-domains" property, the execution will never get to the point
> where driver_deferred_probe_check_state() is called before the supplier
> has probed successfully or before deferred probe timeout has expired.
> 
> So, delete the call and replace it with -ENODEV.

Looks like this causes omaps to not boot in Linux next. With this
simple-pm-bus fails to probe initially as the power-domain is not
yet available. On platform_probe() genpd_get_from_provider() returns
-ENOENT.

Seems like other stuff is potentially broken too, any ideas on
how to fix this?

Regards,

Tony



> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/power/domain.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index 739e52cd4aba..3e86772d5fac 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
>  		mutex_unlock(&gpd_list_lock);
>  		dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
>  			__func__, PTR_ERR(pd));
> -		return driver_deferred_probe_check_state(base_dev);
> +		return -ENODEV;
>  	}
>  
>  	dev_dbg(dev, "adding to PM domain %s\n", pd->name);
> -- 
> 2.36.1.255.ge46751e96f-goog
>
Saravana Kannan June 21, 2022, 7:34 p.m. UTC | #3
On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@atomide.com> wrote:
>
> Hi,
>
> * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > Now that fw_devlink=on by default and fw_devlink supports
> > "power-domains" property, the execution will never get to the point
> > where driver_deferred_probe_check_state() is called before the supplier
> > has probed successfully or before deferred probe timeout has expired.
> >
> > So, delete the call and replace it with -ENODEV.
>
> Looks like this causes omaps to not boot in Linux next.

Can you please point me to an example DTS I could use for debugging
this? I'm assuming you are leaving fw_devlink=on and not turning it
off or putting it in permissive mode.

> With this
> simple-pm-bus fails to probe initially as the power-domain is not
> yet available.

Before we get to late_initcall(), I'd expect this series to not have
any impact because both fw_devlink and
driver_deferred_probe_check_state() should be causing the device's
probe to get deferred until the PM domain device comes up.

To double check this, without this series, can you give me the list of
"supplier:*" symlinks under a simple-pm-bus device's sysfs folder
that's having problems with this series? And for all those symlinks,
cat the "status" file under that directory?

In the system where this is failing, is the PM domain driver loaded as
a module at a later point?

Couple of other things to try (with the patches) to narrow this down:
* Can you set driver_probe_timeout=0 in the command line and see if that helps?
* Can you set it to something high like 30 or even larger and see if it helps?

> On platform_probe() genpd_get_from_provider() returns
> -ENOENT.

This error is with the series I assume?

> Seems like other stuff is potentially broken too, any ideas on
> how to fix this?

I'll want to understand the issue first. It's not yet clear to me why
fw_devlink isn't blocking the probe of the simple-pm-bus device until
the PM domain device shows up. And if it is not blocking, then why and
at what point in boot it's giving up and letting the probe get to this
point where there's an error.

-Saravana

>
>
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/power/domain.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 739e52cd4aba..3e86772d5fac 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
> >               mutex_unlock(&gpd_list_lock);
> >               dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
> >                       __func__, PTR_ERR(pd));
> > -             return driver_deferred_probe_check_state(base_dev);
> > +             return -ENODEV;
> >       }
> >
> >       dev_dbg(dev, "adding to PM domain %s\n", pd->name);
> > --
> > 2.36.1.255.ge46751e96f-goog
> >
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
Tony Lindgren June 22, 2022, 4:58 a.m. UTC | #4
Hi,

* Saravana Kannan <saravanak@google.com> [220621 19:29]:
> On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@atomide.com> wrote:
> >
> > Hi,
> >
> > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > Now that fw_devlink=on by default and fw_devlink supports
> > > "power-domains" property, the execution will never get to the point
> > > where driver_deferred_probe_check_state() is called before the supplier
> > > has probed successfully or before deferred probe timeout has expired.
> > >
> > > So, delete the call and replace it with -ENODEV.
> >
> > Looks like this causes omaps to not boot in Linux next.
> 
> Can you please point me to an example DTS I could use for debugging
> this? I'm assuming you are leaving fw_devlink=on and not turning it
> off or putting it in permissive mode.

Sure, this seems to happen at least with simple-pm-bus as the top
level interconnect with a configured power-domains property:

$ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 simple-pm-bus

This issue is no directly related fw_devlink. It is a side effect of
removing driver_deferred_probe_check_state(). We no longer return
-EPROBE_DEFER at the end of driver_deferred_probe_check_state().

> > On platform_probe() genpd_get_from_provider() returns
> > -ENOENT.
> 
> This error is with the series I assume?

On the first probe genpd_get_from_provider() will return -ENOENT in
both cases. The list is empty on the first probe and there are no
genpd providers at this point.

Earlier with driver_deferred_probe_check_state(), the initial -ENOENT
ends up getting changed to -EPROBE_DEFER at the end of
driver_deferred_probe_check_state(), we are now missing that.

Regards,

Tony
Saravana Kannan June 22, 2022, 7:09 p.m. UTC | #5
On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
>
> Hi,
>
> * Saravana Kannan <saravanak@google.com> [220621 19:29]:
> > On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > Hi,
> > >
> > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > "power-domains" property, the execution will never get to the point
> > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > has probed successfully or before deferred probe timeout has expired.
> > > >
> > > > So, delete the call and replace it with -ENODEV.
> > >
> > > Looks like this causes omaps to not boot in Linux next.
> >
> > Can you please point me to an example DTS I could use for debugging
> > this? I'm assuming you are leaving fw_devlink=on and not turning it
> > off or putting it in permissive mode.
>
> Sure, this seems to happen at least with simple-pm-bus as the top
> level interconnect with a configured power-domains property:
>
> $ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 simple-pm-bus

Thanks for the example. I generally start looking from dts (not dtsi)
files in case there are some DT property override/additions after the
dtsi files are included in the dts file. But I'll assume for now
that's not the case. If there's a specific dts file for a board I can
look from that'd be helpful to rule out those kinds of issues.

For now, I looked at arch/arm/boot/dts/omap4.dtsi.

>
> This issue is no directly related fw_devlink. It is a side effect of
> removing driver_deferred_probe_check_state(). We no longer return
> -EPROBE_DEFER at the end of driver_deferred_probe_check_state().

Yes, I understand the issue. But driver_deferred_probe_check_state()
was deleted because fw_devlink=on should have short circuited the
probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
probe function and hitting this -ENOENT failure. That's why I was
asking the other questions.

> > > On platform_probe() genpd_get_from_provider() returns
> > > -ENOENT.
> >
> > This error is with the series I assume?
>
> On the first probe genpd_get_from_provider() will return -ENOENT in
> both cases. The list is empty on the first probe and there are no
> genpd providers at this point.
>
> Earlier with driver_deferred_probe_check_state(), the initial -ENOENT
> ends up getting changed to -EPROBE_DEFER at the end of
> driver_deferred_probe_check_state(), we are now missing that.

Right, I was aware -ENOENT would be returned if we got this far. But
the point of this series is that you shouldn't have gotten that far
before your pm domain device is ready. Hence my questions from the
earlier reply.

Can I get answers to rest of my questions in the first reply please?
That should help us figure out why fw_devlink let us get this far.
Summarize them here to make it easy:
* Are you running with fw_devlink=on?
* Is the"ti,omap4-prm-inst"/"ti,omap-prm-inst" built-in in this case?
* If it's not built-in, can you please try deferred_probe_timeout=0
and deferred_probe_timeout=30 and see if either one of them help?
* Can I get the output of "ls -d supplier:*" and "cat
supplier:*/status" output from the sysfs dir for the ocp device
without this series where it boots properly.

Thanks,
Saravana
Tony Lindgren June 23, 2022, 7:01 a.m. UTC | #6
* Saravana Kannan <saravanak@google.com> [220622 19:05]:
> On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> >
> > Hi,
> >
> > * Saravana Kannan <saravanak@google.com> [220621 19:29]:
> > > On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@atomide.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > "power-domains" property, the execution will never get to the point
> > > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > > has probed successfully or before deferred probe timeout has expired.
> > > > >
> > > > > So, delete the call and replace it with -ENODEV.
> > > >
> > > > Looks like this causes omaps to not boot in Linux next.
> > >
> > > Can you please point me to an example DTS I could use for debugging
> > > this? I'm assuming you are leaving fw_devlink=on and not turning it
> > > off or putting it in permissive mode.
> >
> > Sure, this seems to happen at least with simple-pm-bus as the top
> > level interconnect with a configured power-domains property:
> >
> > $ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 simple-pm-bus
> 
> Thanks for the example. I generally start looking from dts (not dtsi)
> files in case there are some DT property override/additions after the
> dtsi files are included in the dts file. But I'll assume for now
> that's not the case. If there's a specific dts file for a board I can
> look from that'd be helpful to rule out those kinds of issues.
> 
> For now, I looked at arch/arm/boot/dts/omap4.dtsi.

OK it should be very similar for all the affected SoCs.

> > This issue is no directly related fw_devlink. It is a side effect of
> > removing driver_deferred_probe_check_state(). We no longer return
> > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> 
> Yes, I understand the issue. But driver_deferred_probe_check_state()
> was deleted because fw_devlink=on should have short circuited the
> probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> probe function and hitting this -ENOENT failure. That's why I was
> asking the other questions.

OK. So where is the -EPROBE_DEFER supposed to happen without
driver_deferred_probe_check_state() then?

> > > > On platform_probe() genpd_get_from_provider() returns
> > > > -ENOENT.
> > >
> > > This error is with the series I assume?
> >
> > On the first probe genpd_get_from_provider() will return -ENOENT in
> > both cases. The list is empty on the first probe and there are no
> > genpd providers at this point.
> >
> > Earlier with driver_deferred_probe_check_state(), the initial -ENOENT
> > ends up getting changed to -EPROBE_DEFER at the end of
> > driver_deferred_probe_check_state(), we are now missing that.
> 
> Right, I was aware -ENOENT would be returned if we got this far. But
> the point of this series is that you shouldn't have gotten that far
> before your pm domain device is ready. Hence my questions from the
> earlier reply.

OK

> Can I get answers to rest of my questions in the first reply please?
> That should help us figure out why fw_devlink let us get this far.
> Summarize them here to make it easy:
> * Are you running with fw_devlink=on?

Yes with the default with no specific kernel params so looks like
FW_DEVLINK_FLAGS_ON.

> * Is the"ti,omap4-prm-inst"/"ti,omap-prm-inst" built-in in this case?

Yes

> * If it's not built-in, can you please try deferred_probe_timeout=0
> and deferred_probe_timeout=30 and see if either one of them help?

It's built in so I did not try these.

> * Can I get the output of "ls -d supplier:*" and "cat
> supplier:*/status" output from the sysfs dir for the ocp device
> without this series where it boots properly.

Hmm so I'm not seeing any supplier for the top level ocp device in
the booting case without your patches. I see the suppliers for the
ocp child device instances only.

Without your patches I see simple-pm-bus probe initially with
EPROBE_DEFER like I described earlier, and then simple-pm-bus probes
on the second try.

Regards,

Tony
Saravana Kannan June 23, 2022, 8:21 a.m. UTC | #7
On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
>
> * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > Hi,
> > >
> > > * Saravana Kannan <saravanak@google.com> [220621 19:29]:
> > > > On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > > "power-domains" property, the execution will never get to the point
> > > > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > > > has probed successfully or before deferred probe timeout has expired.
> > > > > >
> > > > > > So, delete the call and replace it with -ENODEV.
> > > > >
> > > > > Looks like this causes omaps to not boot in Linux next.
> > > >
> > > > Can you please point me to an example DTS I could use for debugging
> > > > this? I'm assuming you are leaving fw_devlink=on and not turning it
> > > > off or putting it in permissive mode.
> > >
> > > Sure, this seems to happen at least with simple-pm-bus as the top
> > > level interconnect with a configured power-domains property:
> > >
> > > $ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 simple-pm-bus
> >
> > Thanks for the example. I generally start looking from dts (not dtsi)
> > files in case there are some DT property override/additions after the
> > dtsi files are included in the dts file. But I'll assume for now
> > that's not the case. If there's a specific dts file for a board I can
> > look from that'd be helpful to rule out those kinds of issues.
> >
> > For now, I looked at arch/arm/boot/dts/omap4.dtsi.
>
> OK it should be very similar for all the affected SoCs.
>
> > > This issue is no directly related fw_devlink. It is a side effect of
> > > removing driver_deferred_probe_check_state(). We no longer return
> > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> >
> > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > was deleted because fw_devlink=on should have short circuited the
> > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > probe function and hitting this -ENOENT failure. That's why I was
> > asking the other questions.
>
> OK. So where is the -EPROBE_DEFER supposed to happen without
> driver_deferred_probe_check_state() then?

device_links_check_suppliers() call inside really_probe() would short
circuit and return an -EPROBE_DEFER if the device links are created as
expected.

>
> > > > > On platform_probe() genpd_get_from_provider() returns
> > > > > -ENOENT.
> > > >
> > > > This error is with the series I assume?
> > >
> > > On the first probe genpd_get_from_provider() will return -ENOENT in
> > > both cases. The list is empty on the first probe and there are no
> > > genpd providers at this point.
> > >
> > > Earlier with driver_deferred_probe_check_state(), the initial -ENOENT
> > > ends up getting changed to -EPROBE_DEFER at the end of
> > > driver_deferred_probe_check_state(), we are now missing that.
> >
> > Right, I was aware -ENOENT would be returned if we got this far. But
> > the point of this series is that you shouldn't have gotten that far
> > before your pm domain device is ready. Hence my questions from the
> > earlier reply.
>
> OK
>
> > Can I get answers to rest of my questions in the first reply please?
> > That should help us figure out why fw_devlink let us get this far.
> > Summarize them here to make it easy:
> > * Are you running with fw_devlink=on?
>
> Yes with the default with no specific kernel params so looks like
> FW_DEVLINK_FLAGS_ON.
>
> > * Is the"ti,omap4-prm-inst"/"ti,omap-prm-inst" built-in in this case?
>
> Yes
>
> > * If it's not built-in, can you please try deferred_probe_timeout=0
> > and deferred_probe_timeout=30 and see if either one of them help?
>
> It's built in so I did not try these.
>
> > * Can I get the output of "ls -d supplier:*" and "cat
> > supplier:*/status" output from the sysfs dir for the ocp device
> > without this series where it boots properly.
>
> Hmm so I'm not seeing any supplier for the top level ocp device in
> the booting case without your patches. I see the suppliers for the
> ocp child device instances only.

Hmmm... this is strange (that the device link isn't there), but this
is what I suspected.

Now we need to figure out why it's missing. There are only a few
things that could cause this and I don't see any of those. I already
checked to make sure the power domain in this instance had a proper
driver with a probe() function -- if it didn't, then that's one thing
that'd could have caused the missing device link. The device does seem
to have a proper driver, so looks like I can rule that out.

Can you point me to the dts file that corresponds to the specific
board you are testing this one? I probably won't find anything, but I
want to rule out some of the possibilities.

All the device link creation logic is inside drivers/base/core.c. So
if you can look at the existing messages or add other stuff to figure
out why the device link isn't getting created, that'd be handy. In
either case, I'll continue staring at the DT and code to see what
might be happening here.

-Saravana
Alexander Stein June 23, 2022, 12:08 p.m. UTC | #8
Hi,

Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:
> Hi,
> 
> * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > Now that fw_devlink=on by default and fw_devlink supports
> > "power-domains" property, the execution will never get to the point
> > where driver_deferred_probe_check_state() is called before the supplier
> > has probed successfully or before deferred probe timeout has expired.
> > 
> > So, delete the call and replace it with -ENODEV.
> 
> Looks like this causes omaps to not boot in Linux next. With this
> simple-pm-bus fails to probe initially as the power-domain is not
> yet available. On platform_probe() genpd_get_from_provider() returns
> -ENOENT.
> 
> Seems like other stuff is potentially broken too, any ideas on
> how to fix this?

I think I'm hit by this as well, although I do not get a lockup.
In my case I'm using arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts 
and probing of 38320000.blk-ctrl fails as the power-domain is not (yet) 
registed. See the (filtered) dmesg output:

> [    0.744245] PM: Added domain provider from
> /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@0 [    0.744756] PM:
> Added domain provider from
> /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@2 [    0.745012] PM:
> Added domain provider from
> /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@3 [    0.745268] PM:
> Added domain provider from
> /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@4 [    0.746121] PM:
> Added domain provider from
> /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@7 [    0.746400] PM:
> Added domain provider from
> /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@8 [    0.746665] PM:
> Added domain provider from
> /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@9 [    0.746927] PM:
> Added domain provider from
> /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@a [    0.748870]
> imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach bus power
> domain [    1.265279] PM: Added domain provider from
> /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@5 [    1.265861] PM:
> Added domain provider from
> /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@6

blk-ctrl@38320000 requires the power-domain 'pgc_vpu', which is power-domain@6 
in pgc.

Best regards,
Alexander

> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> > 
> >  drivers/base/power/domain.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 739e52cd4aba..3e86772d5fac 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev,
> > struct device *base_dev,> 
> >  		mutex_unlock(&gpd_list_lock);
> >  		dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
> >  		
> >  			__func__, PTR_ERR(pd));
> > 
> > -		return driver_deferred_probe_check_state(base_dev);
> > +		return -ENODEV;
> > 
> >  	}
> >  	
> >  	dev_dbg(dev, "adding to PM domain %s\n", pd->name);
Saravana Kannan June 30, 2022, 11:10 p.m. UTC | #9
On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
>
> * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > >
> > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > was deleted because fw_devlink=on should have short circuited the
> > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > asking the other questions.
> > >
> > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > driver_deferred_probe_check_state() then?
> >
> > device_links_check_suppliers() call inside really_probe() would short
> > circuit and return an -EPROBE_DEFER if the device links are created as
> > expected.
>
> OK
>
> > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > the booting case without your patches. I see the suppliers for the
> > > ocp child device instances only.
> >
> > Hmmm... this is strange (that the device link isn't there), but this
> > is what I suspected.
>
> Yup, maybe it's because of the supplier being a device in the child
> interconnect for the ocp.

Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
isn't being created.

So the aggregated view is something like (I had to set tabs = 4 space
to fit it within 80 cols):

    ocp: ocp {         <========================= Consumer
        compatible = "simple-pm-bus";
        power-domains = <&prm_per>; <=========== Supplier ref

                l4_wkup: interconnect@44c00000 {
            compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";

            segment@200000 {  /* 0x44e00000 */
                compatible = "simple-pm-bus";

                target-module@0 { /* 0x44e00000, ap 8 58.0 */
                    compatible = "ti,sysc-omap4", "ti,sysc";

                    prcm: prcm@0 {
                        compatible = "ti,am3-prcm", "simple-bus";

                        prm_per: prm@c00 { <========= Actual Supplier
                            compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
                        };
                    };
                };
            };
        };
    };

The power-domain supplier is the great-great-great-grand-child of the
consumer. It's not clear to me how this is valid. What does it even
mean?

Rob, is this considered a valid DT?

Geert, thoughts on whether this is a correct use of simple-pm-bus device?

Also, how is the power domain attach/get working in this case? As far
as I can tell, at least for "simple-pm-bus" devices, the pm domain
attachment is happening under:
really_probe() -> call_driver_probe -> platform_probe() ->
dev_pm_domain_attach()

So, how is the pm domain attach succeeding in the first place without
my changes?

> > Now we need to figure out why it's missing. There are only a few
> > things that could cause this and I don't see any of those. I already
> > checked to make sure the power domain in this instance had a proper
> > driver with a probe() function -- if it didn't, then that's one thing
> > that'd could have caused the missing device link. The device does seem
> > to have a proper driver, so looks like I can rule that out.
> >
> > Can you point me to the dts file that corresponds to the specific
> > board you are testing this one? I probably won't find anything, but I
> > want to rule out some of the possibilities.
>
> You can use the beaglebone black dts for example, that's
> arch/arm/boot/dts/am335x-boneblack.dts and uses am33xx.dtsi for
> ocp interconnect with simple-pm-bus.
>
> > All the device link creation logic is inside drivers/base/core.c. So
> > if you can look at the existing messages or add other stuff to figure
> > out why the device link isn't getting created, that'd be handy. In
> > either case, I'll continue staring at the DT and code to see what
> > might be happening here.
>
> In device_links_check_suppliers() I see these ocp suppliers:
>
> platform ocp: device_links_check_suppliers: 1024: supplier 44e00d00.prm: link->status: 0 link->flags: 000001c0
> platform ocp: device_links_check_suppliers: 1024: supplier 44e01000.prm: link->status: 0 link->flags: 000001c0
> platform ocp: device_links_check_suppliers: 1024: supplier 44e00c00.prm: link->status: 0 link->flags: 000001c0
> platform ocp: device_links_check_suppliers: 1024: supplier 44e00e00.prm: link->status: 0 link->flags: 000001c0
> platform ocp: device_links_check_suppliers: 1024: supplier 44e01100.prm: link->status: 0 link->flags: 000001c0
> platform ocp: device_links_check_suppliers: 1024: supplier fixedregulator0: link->status: 1 link->flags: 000001c0
>
> No -EPROBE_DEFER is returned in device_links_check_suppliers() for
> 44e00c00.prm supplier for beaglebone black for example, 0 gets
> returned.

Yeah, the "1c0" flags are SYNC_STATE_ONLY device links and aren't
relevant to the issue we are seeing. Those links are being created as
a proxy for other descendant devices of ocp that haven't been added
yet, but are consumers of these *.prm devices. They are mainly meant
for correctness of sync_state() callbacks of the supplier and don't
affect probe order. For example: target-module@56000000 is a consumer
of prm_gfx 44e01100.prm.

-Saravana
Saravana Kannan June 30, 2022, 11:30 p.m. UTC | #10
On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
>
> On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > >
> > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > >
> > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > asking the other questions.
> > > > >
> > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > driver_deferred_probe_check_state() then?
> > > >
> > > > device_links_check_suppliers() call inside really_probe() would short
> > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > expected.
> > >
> > > OK
> > >
> > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > the booting case without your patches. I see the suppliers for the
> > > > > ocp child device instances only.
> > > >
> > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > is what I suspected.
> > >
> > > Yup, maybe it's because of the supplier being a device in the child
> > > interconnect for the ocp.
> >
> > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > isn't being created.
> >
> > So the aggregated view is something like (I had to set tabs = 4 space
> > to fit it within 80 cols):
> >
> >     ocp: ocp {         <========================= Consumer
> >         compatible = "simple-pm-bus";
> >         power-domains = <&prm_per>; <=========== Supplier ref
> >
> >                 l4_wkup: interconnect@44c00000 {
> >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> >
> >             segment@200000 {  /* 0x44e00000 */
> >                 compatible = "simple-pm-bus";
> >
> >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> >                     compatible = "ti,sysc-omap4", "ti,sysc";
> >
> >                     prcm: prcm@0 {
> >                         compatible = "ti,am3-prcm", "simple-bus";
> >
> >                         prm_per: prm@c00 { <========= Actual Supplier
> >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> >                         };
> >                     };
> >                 };
> >             };
> >         };
> >     };
> >
> > The power-domain supplier is the great-great-great-grand-child of the
> > consumer. It's not clear to me how this is valid. What does it even
> > mean?
> >
> > Rob, is this considered a valid DT?
>
> Valid DT for broken h/w.

I'm not sure even in that case it's valid. When the parent device is
in reset (when the SoC is coming out of reset), there's no way the
descendant is functional. And if the descendant is not functional, how
is the parent device powered up? This just feels like an incorrect
representation of the real h/w.

> So the domain must be default on and then simple-pm-bus is going to
> hold a reference to the domain preventing it from ever getting powered
> off and things seem to work. Except what happens during suspend?

But how can simple-pm-bus even get a reference? The PM domain can't
get added until we are well into the probe of the simple-pm-bus and
AFAICT the genpd attach is done before the driver probe is even
called.

-Saravana
Tony Lindgren July 1, 2022, 5:33 a.m. UTC | #11
* Saravana Kannan <saravanak@google.com> [220630 23:25]:
> On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
> >
> > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > > >
> > > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > >
> > > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > > >
> > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > > asking the other questions.
> > > > > >
> > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > driver_deferred_probe_check_state() then?
> > > > >
> > > > > device_links_check_suppliers() call inside really_probe() would short
> > > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > > expected.
> > > >
> > > > OK
> > > >
> > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > ocp child device instances only.
> > > > >
> > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > is what I suspected.
> > > >
> > > > Yup, maybe it's because of the supplier being a device in the child
> > > > interconnect for the ocp.
> > >
> > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > isn't being created.
> > >
> > > So the aggregated view is something like (I had to set tabs = 4 space
> > > to fit it within 80 cols):
> > >
> > >     ocp: ocp {         <========================= Consumer
> > >         compatible = "simple-pm-bus";
> > >         power-domains = <&prm_per>; <=========== Supplier ref
> > >
> > >                 l4_wkup: interconnect@44c00000 {
> > >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > >
> > >             segment@200000 {  /* 0x44e00000 */
> > >                 compatible = "simple-pm-bus";
> > >
> > >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> > >                     compatible = "ti,sysc-omap4", "ti,sysc";
> > >
> > >                     prcm: prcm@0 {
> > >                         compatible = "ti,am3-prcm", "simple-bus";
> > >
> > >                         prm_per: prm@c00 { <========= Actual Supplier
> > >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> > >                         };
> > >                     };
> > >                 };
> > >             };
> > >         };
> > >     };
> > >
> > > The power-domain supplier is the great-great-great-grand-child of the
> > > consumer. It's not clear to me how this is valid. What does it even
> > > mean?
> > >
> > > Rob, is this considered a valid DT?
> >
> > Valid DT for broken h/w.
> 
> I'm not sure even in that case it's valid. When the parent device is
> in reset (when the SoC is coming out of reset), there's no way the
> descendant is functional. And if the descendant is not functional, how
> is the parent device powered up? This just feels like an incorrect
> representation of the real h/w.

It should be correct representation based on scanning the interconnects
and looking at the documentation. Some interconnect parts are wired
always-on and some interconnect instances may be dual-mapped.

We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().
Maybe that also now fails in addition to the top level interconnect
probing no longer producing -EPROBE_DEFER.

> > So the domain must be default on and then simple-pm-bus is going to
> > hold a reference to the domain preventing it from ever getting powered
> > off and things seem to work. Except what happens during suspend?
> 
> But how can simple-pm-bus even get a reference? The PM domain can't
> get added until we are well into the probe of the simple-pm-bus and
> AFAICT the genpd attach is done before the driver probe is even
> called.

The prm/prcm gets of_platform_populate() called on it early.

Regards,

Tony
Tony Lindgren July 1, 2022, 6:12 a.m. UTC | #12
* Tony Lindgren <tony@atomide.com> [220701 08:33]:
> * Saravana Kannan <saravanak@google.com> [220630 23:25]:
> > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
> > >
> > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > >
> > > > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > >
> > > > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > > > >
> > > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > > > asking the other questions.
> > > > > > >
> > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > driver_deferred_probe_check_state() then?
> > > > > >
> > > > > > device_links_check_suppliers() call inside really_probe() would short
> > > > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > > > expected.
> > > > >
> > > > > OK
> > > > >
> > > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > > ocp child device instances only.
> > > > > >
> > > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > > is what I suspected.
> > > > >
> > > > > Yup, maybe it's because of the supplier being a device in the child
> > > > > interconnect for the ocp.
> > > >
> > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > > isn't being created.
> > > >
> > > > So the aggregated view is something like (I had to set tabs = 4 space
> > > > to fit it within 80 cols):
> > > >
> > > >     ocp: ocp {         <========================= Consumer
> > > >         compatible = "simple-pm-bus";
> > > >         power-domains = <&prm_per>; <=========== Supplier ref
> > > >
> > > >                 l4_wkup: interconnect@44c00000 {
> > > >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > >
> > > >             segment@200000 {  /* 0x44e00000 */
> > > >                 compatible = "simple-pm-bus";
> > > >
> > > >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> > > >                     compatible = "ti,sysc-omap4", "ti,sysc";
> > > >
> > > >                     prcm: prcm@0 {
> > > >                         compatible = "ti,am3-prcm", "simple-bus";
> > > >
> > > >                         prm_per: prm@c00 { <========= Actual Supplier
> > > >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> > > >                         };
> > > >                     };
> > > >                 };
> > > >             };
> > > >         };
> > > >     };
> > > >
> > > > The power-domain supplier is the great-great-great-grand-child of the
> > > > consumer. It's not clear to me how this is valid. What does it even
> > > > mean?
> > > >
> > > > Rob, is this considered a valid DT?
> > >
> > > Valid DT for broken h/w.
> > 
> > I'm not sure even in that case it's valid. When the parent device is
> > in reset (when the SoC is coming out of reset), there's no way the
> > descendant is functional. And if the descendant is not functional, how
> > is the parent device powered up? This just feels like an incorrect
> > representation of the real h/w.
> 
> It should be correct representation based on scanning the interconnects
> and looking at the documentation. Some interconnect parts are wired
> always-on and some interconnect instances may be dual-mapped.
> 
> We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().
> Maybe that also now fails in addition to the top level interconnect
> probing no longer producing -EPROBE_DEFER.
> 
> > > So the domain must be default on and then simple-pm-bus is going to
> > > hold a reference to the domain preventing it from ever getting powered
> > > off and things seem to work. Except what happens during suspend?
> > 
> > But how can simple-pm-bus even get a reference? The PM domain can't
> > get added until we are well into the probe of the simple-pm-bus and
> > AFAICT the genpd attach is done before the driver probe is even
> > called.
> 
> The prm/prcm gets of_platform_populate() called on it early.

The hackish patch below makes things boot for me, not convinced this
is the preferred fix compared to earlier deferred probe handling though.
Going back to the init level tinkering seems like a step back to me.

Regards,

Tony

8< ----------------
diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c
--- a/drivers/soc/ti/omap_prm.c
+++ b/drivers/soc/ti/omap_prm.c
@@ -991,4 +991,9 @@ static struct platform_driver omap_prm_driver = {
 		.of_match_table	= omap_prm_id_table,
 	},
 };
-builtin_platform_driver(omap_prm_driver);
+
+static int __init omap_prm_init(void)
+{
+        return platform_driver_register(&omap_prm_driver);
+}
+subsys_initcall(omap_prm_init);
Geert Uytterhoeven July 1, 2022, 7:38 a.m. UTC | #13
Hi Saravana,

On Fri, Jul 1, 2022 at 1:11 AM Saravana Kannan <saravanak@google.com> wrote:
> On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > >
> > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > asking the other questions.
> > > >
> > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > driver_deferred_probe_check_state() then?
> > >
> > > device_links_check_suppliers() call inside really_probe() would short
> > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > expected.
> >
> > OK
> >
> > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > the booting case without your patches. I see the suppliers for the
> > > > ocp child device instances only.
> > >
> > > Hmmm... this is strange (that the device link isn't there), but this
> > > is what I suspected.
> >
> > Yup, maybe it's because of the supplier being a device in the child
> > interconnect for the ocp.
>
> Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> isn't being created.
>
> So the aggregated view is something like (I had to set tabs = 4 space
> to fit it within 80 cols):
>
>     ocp: ocp {         <========================= Consumer
>         compatible = "simple-pm-bus";
>         power-domains = <&prm_per>; <=========== Supplier ref
>
>                 l4_wkup: interconnect@44c00000 {
>             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
>
>             segment@200000 {  /* 0x44e00000 */
>                 compatible = "simple-pm-bus";
>
>                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
>                     compatible = "ti,sysc-omap4", "ti,sysc";
>
>                     prcm: prcm@0 {
>                         compatible = "ti,am3-prcm", "simple-bus";
>
>                         prm_per: prm@c00 { <========= Actual Supplier
>                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
>                         };
>                     };
>                 };
>             };
>         };
>     };
>
> The power-domain supplier is the great-great-great-grand-child of the
> consumer. It's not clear to me how this is valid. What does it even
> mean?
>
> Rob, is this considered a valid DT?
>
> Geert, thoughts on whether this is a correct use of simple-pm-bus device?

Well, if the hardware is wired that way...

It's not that dissimilar from CPU cores, and interrupt and GPIO
controllers in power domains and clocked by controllable clocks:
you can cut the branch you're sitting on, and you have to be careful
when going to sleep, and make sure your wake-up sources are still
functional.

> Also, how is the power domain attach/get working in this case? As far
> as I can tell, at least for "simple-pm-bus" devices, the pm domain
> attachment is happening under:
> really_probe() -> call_driver_probe -> platform_probe() ->
> dev_pm_domain_attach()
>
> So, how is the pm domain attach succeeding in the first place without
> my changes?

That's a software thing ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Saravana Kannan July 1, 2022, 8:10 a.m. UTC | #14
On Thu, Jun 30, 2022 at 11:12 PM Tony Lindgren <tony@atomide.com> wrote:
>
> * Tony Lindgren <tony@atomide.com> [220701 08:33]:
> > * Saravana Kannan <saravanak@google.com> [220630 23:25]:
> > > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > >
> > > > > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > >
> > > > > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > > > > >
> > > > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > > > > asking the other questions.
> > > > > > > >
> > > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > > driver_deferred_probe_check_state() then?
> > > > > > >
> > > > > > > device_links_check_suppliers() call inside really_probe() would short
> > > > > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > > > > expected.
> > > > > >
> > > > > > OK
> > > > > >
> > > > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > > > ocp child device instances only.
> > > > > > >
> > > > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > > > is what I suspected.
> > > > > >
> > > > > > Yup, maybe it's because of the supplier being a device in the child
> > > > > > interconnect for the ocp.
> > > > >
> > > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > > > isn't being created.
> > > > >
> > > > > So the aggregated view is something like (I had to set tabs = 4 space
> > > > > to fit it within 80 cols):
> > > > >
> > > > >     ocp: ocp {         <========================= Consumer
> > > > >         compatible = "simple-pm-bus";
> > > > >         power-domains = <&prm_per>; <=========== Supplier ref
> > > > >
> > > > >                 l4_wkup: interconnect@44c00000 {
> > > > >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > > >
> > > > >             segment@200000 {  /* 0x44e00000 */
> > > > >                 compatible = "simple-pm-bus";
> > > > >
> > > > >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> > > > >                     compatible = "ti,sysc-omap4", "ti,sysc";
> > > > >
> > > > >                     prcm: prcm@0 {
> > > > >                         compatible = "ti,am3-prcm", "simple-bus";
> > > > >
> > > > >                         prm_per: prm@c00 { <========= Actual Supplier
> > > > >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> > > > >                         };
> > > > >                     };
> > > > >                 };
> > > > >             };
> > > > >         };
> > > > >     };
> > > > >
> > > > > The power-domain supplier is the great-great-great-grand-child of the
> > > > > consumer. It's not clear to me how this is valid. What does it even
> > > > > mean?
> > > > >
> > > > > Rob, is this considered a valid DT?
> > > >
> > > > Valid DT for broken h/w.
> > >
> > > I'm not sure even in that case it's valid. When the parent device is
> > > in reset (when the SoC is coming out of reset), there's no way the
> > > descendant is functional. And if the descendant is not functional, how
> > > is the parent device powered up? This just feels like an incorrect
> > > representation of the real h/w.
> >
> > It should be correct representation based on scanning the interconnects
> > and looking at the documentation. Some interconnect parts are wired
> > always-on and some interconnect instances may be dual-mapped.

Thanks for helping to debug this. Appreciate it.

> >
> > We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().

:'(

I checked out the code. These prm devices just get populated with NULL
as the parent. So they are effectively top level devices from the
perspective of driver core.

> > Maybe that also now fails in addition to the top level interconnect
> > probing no longer producing -EPROBE_DEFER.

As far as I can tell pdata_quirks_init_clocks() is just adding these
prm devices (amongst other drivers). So I don't expect that to fail.

> >
> > > > So the domain must be default on and then simple-pm-bus is going to
> > > > hold a reference to the domain preventing it from ever getting powered
> > > > off and things seem to work. Except what happens during suspend?
> > >
> > > But how can simple-pm-bus even get a reference? The PM domain can't
> > > get added until we are well into the probe of the simple-pm-bus and
> > > AFAICT the genpd attach is done before the driver probe is even
> > > called.
> >
> > The prm/prcm gets of_platform_populate() called on it early.

:'(

> The hackish patch below makes things boot for me, not convinced this
> is the preferred fix compared to earlier deferred probe handling though.
> Going back to the init level tinkering seems like a step back to me.

The goal of fw_devlink is to avoid init level tinkering and it does
help with that in general. But these kinds of quirks are going to need
a few exceptions -- with them being quirks and all. And this change
will avoid an unnecessary deferred probe (that used to happen even
before my change).

The other option to handle this quirk is to create the invalid
(consumer is parent of supplier) fwnode_link between the prm device
and its consumers when the prm device is populated. Then fw_devlink
will end up creating a device link when ocp gets added. But I'm not
sure if it's going to be easy to find and add all those consumers.

I'd say, for now, let's go with this patch below. I'll see if I can
get fw_devlink to handle these odd quirks without breaking the normal
cases or making them significantly slower. But that'll take some time
and I'm not sure there'll be a nice solution.

Thanks,
Saravana

> Regards,
>
> Tony
>
> 8< ----------------
> diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c
> --- a/drivers/soc/ti/omap_prm.c
> +++ b/drivers/soc/ti/omap_prm.c
> @@ -991,4 +991,9 @@ static struct platform_driver omap_prm_driver = {
>                 .of_match_table = omap_prm_id_table,
>         },
>  };
> -builtin_platform_driver(omap_prm_driver);
> +
> +static int __init omap_prm_init(void)
> +{
> +        return platform_driver_register(&omap_prm_driver);
> +}
> +subsys_initcall(omap_prm_init);
> --
> 2.36.1
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>
Saravana Kannan July 1, 2022, 8:26 a.m. UTC | #15
On Fri, Jul 1, 2022 at 1:10 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Jun 30, 2022 at 11:12 PM Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Tony Lindgren <tony@atomide.com> [220701 08:33]:
> > > * Saravana Kannan <saravanak@google.com> [220630 23:25]:
> > > > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > >
> > > > > > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > >
> > > > > > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > > > > > >
> > > > > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > > > > > asking the other questions.
> > > > > > > > >
> > > > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > > > driver_deferred_probe_check_state() then?
> > > > > > > >
> > > > > > > > device_links_check_suppliers() call inside really_probe() would short
> > > > > > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > > > > > expected.
> > > > > > >
> > > > > > > OK
> > > > > > >
> > > > > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > > > > ocp child device instances only.
> > > > > > > >
> > > > > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > > > > is what I suspected.
> > > > > > >
> > > > > > > Yup, maybe it's because of the supplier being a device in the child
> > > > > > > interconnect for the ocp.
> > > > > >
> > > > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > > > > isn't being created.
> > > > > >
> > > > > > So the aggregated view is something like (I had to set tabs = 4 space
> > > > > > to fit it within 80 cols):
> > > > > >
> > > > > >     ocp: ocp {         <========================= Consumer
> > > > > >         compatible = "simple-pm-bus";
> > > > > >         power-domains = <&prm_per>; <=========== Supplier ref
> > > > > >
> > > > > >                 l4_wkup: interconnect@44c00000 {
> > > > > >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > > > >
> > > > > >             segment@200000 {  /* 0x44e00000 */
> > > > > >                 compatible = "simple-pm-bus";
> > > > > >
> > > > > >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> > > > > >                     compatible = "ti,sysc-omap4", "ti,sysc";
> > > > > >
> > > > > >                     prcm: prcm@0 {
> > > > > >                         compatible = "ti,am3-prcm", "simple-bus";
> > > > > >
> > > > > >                         prm_per: prm@c00 { <========= Actual Supplier
> > > > > >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> > > > > >                         };
> > > > > >                     };
> > > > > >                 };
> > > > > >             };
> > > > > >         };
> > > > > >     };
> > > > > >
> > > > > > The power-domain supplier is the great-great-great-grand-child of the
> > > > > > consumer. It's not clear to me how this is valid. What does it even
> > > > > > mean?
> > > > > >
> > > > > > Rob, is this considered a valid DT?
> > > > >
> > > > > Valid DT for broken h/w.
> > > >
> > > > I'm not sure even in that case it's valid. When the parent device is
> > > > in reset (when the SoC is coming out of reset), there's no way the
> > > > descendant is functional. And if the descendant is not functional, how
> > > > is the parent device powered up? This just feels like an incorrect
> > > > representation of the real h/w.
> > >
> > > It should be correct representation based on scanning the interconnects
> > > and looking at the documentation. Some interconnect parts are wired
> > > always-on and some interconnect instances may be dual-mapped.
>
> Thanks for helping to debug this. Appreciate it.
>
> > >
> > > We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().
>
> :'(
>
> I checked out the code. These prm devices just get populated with NULL
> as the parent. So they are effectively top level devices from the
> perspective of driver core.
>
> > > Maybe that also now fails in addition to the top level interconnect
> > > probing no longer producing -EPROBE_DEFER.
>
> As far as I can tell pdata_quirks_init_clocks() is just adding these
> prm devices (amongst other drivers). So I don't expect that to fail.
>
> > >
> > > > > So the domain must be default on and then simple-pm-bus is going to
> > > > > hold a reference to the domain preventing it from ever getting powered
> > > > > off and things seem to work. Except what happens during suspend?
> > > >
> > > > But how can simple-pm-bus even get a reference? The PM domain can't
> > > > get added until we are well into the probe of the simple-pm-bus and
> > > > AFAICT the genpd attach is done before the driver probe is even
> > > > called.
> > >
> > > The prm/prcm gets of_platform_populate() called on it early.
>
> :'(
>
> > The hackish patch below makes things boot for me, not convinced this
> > is the preferred fix compared to earlier deferred probe handling though.
> > Going back to the init level tinkering seems like a step back to me.
>
> The goal of fw_devlink is to avoid init level tinkering and it does
> help with that in general. But these kinds of quirks are going to need
> a few exceptions -- with them being quirks and all. And this change
> will avoid an unnecessary deferred probe (that used to happen even
> before my change).
>
> The other option to handle this quirk is to create the invalid
> (consumer is parent of supplier) fwnode_link between the prm device
> and its consumers when the prm device is populated. Then fw_devlink
> will end up creating a device link when ocp gets added. But I'm not
> sure if it's going to be easy to find and add all those consumers.
>
> I'd say, for now, let's go with this patch below. I'll see if I can
> get fw_devlink to handle these odd quirks without breaking the normal
> cases or making them significantly slower. But that'll take some time
> and I'm not sure there'll be a nice solution.

Can you check if this hack helps? If so, then I can think about
whether we can pick it up without breaking everything else. Copy-paste
tab mess up warning.

-Saravana

8< ----------------

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 967f79b59016..f671a7528719 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1138,18 +1138,6 @@ static int of_link_to_phandle(struct device_node *con_np,
                return -ENODEV;
        }

-       /*
-        * Don't allow linking a device node as a consumer of one of its
-        * descendant nodes. By definition, a child node can't be a functional
-        * dependency for the parent node.
-        */
-       if (of_is_ancestor_of(con_np, sup_np)) {
-               pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
-                        con_np, sup_np);
-               of_node_put(sup_np);
-               return -EINVAL;
-       }
-
        /*
         * Don't create links to "early devices" that won't have struct devices
         * created for them.
@@ -1163,6 +1151,25 @@ static int of_link_to_phandle(struct device_node *con_np,
                of_node_put(sup_np);
                return -ENODEV;
        }
+
+       /*
+        * Don't allow linking a device node as a consumer of one of its
+        * descendant nodes. By definition, a child node can't be a functional
+        * dependency for the parent node.
+        *
+        * However, if the child node already has a device while the parent is
+        * in the process of being added, it's probably some weird quirk
+        * handling. So, don't both checking if the consumer is an ancestor of
+        * the supplier.
+        */
+       if (!sup_dev && of_is_ancestor_of(con_np, sup_np)) {
+               pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
+                        con_np, sup_np);
+               put_device(sup_dev);
+               of_node_put(sup_np);
+               return -EINVAL;
+       }
+
Tony Lindgren July 1, 2022, 1 p.m. UTC | #16
* Saravana Kannan <saravanak@google.com> [220701 08:21]:
> On Fri, Jul 1, 2022 at 1:10 AM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Thu, Jun 30, 2022 at 11:12 PM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > * Tony Lindgren <tony@atomide.com> [220701 08:33]:
> > > > * Saravana Kannan <saravanak@google.com> [220630 23:25]:
> > > > > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > >
> > > > > > > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > > >
> > > > > > > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > > > > > > >
> > > > > > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > > > > > > asking the other questions.
> > > > > > > > > >
> > > > > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > > > > driver_deferred_probe_check_state() then?
> > > > > > > > >
> > > > > > > > > device_links_check_suppliers() call inside really_probe() would short
> > > > > > > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > > > > > > expected.
> > > > > > > >
> > > > > > > > OK
> > > > > > > >
> > > > > > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > > > > > ocp child device instances only.
> > > > > > > > >
> > > > > > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > > > > > is what I suspected.
> > > > > > > >
> > > > > > > > Yup, maybe it's because of the supplier being a device in the child
> > > > > > > > interconnect for the ocp.
> > > > > > >
> > > > > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > > > > > isn't being created.
> > > > > > >
> > > > > > > So the aggregated view is something like (I had to set tabs = 4 space
> > > > > > > to fit it within 80 cols):
> > > > > > >
> > > > > > >     ocp: ocp {         <========================= Consumer
> > > > > > >         compatible = "simple-pm-bus";
> > > > > > >         power-domains = <&prm_per>; <=========== Supplier ref
> > > > > > >
> > > > > > >                 l4_wkup: interconnect@44c00000 {
> > > > > > >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > > > > >
> > > > > > >             segment@200000 {  /* 0x44e00000 */
> > > > > > >                 compatible = "simple-pm-bus";
> > > > > > >
> > > > > > >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> > > > > > >                     compatible = "ti,sysc-omap4", "ti,sysc";
> > > > > > >
> > > > > > >                     prcm: prcm@0 {
> > > > > > >                         compatible = "ti,am3-prcm", "simple-bus";
> > > > > > >
> > > > > > >                         prm_per: prm@c00 { <========= Actual Supplier
> > > > > > >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> > > > > > >                         };
> > > > > > >                     };
> > > > > > >                 };
> > > > > > >             };
> > > > > > >         };
> > > > > > >     };
> > > > > > >
> > > > > > > The power-domain supplier is the great-great-great-grand-child of the
> > > > > > > consumer. It's not clear to me how this is valid. What does it even
> > > > > > > mean?
> > > > > > >
> > > > > > > Rob, is this considered a valid DT?
> > > > > >
> > > > > > Valid DT for broken h/w.
> > > > >
> > > > > I'm not sure even in that case it's valid. When the parent device is
> > > > > in reset (when the SoC is coming out of reset), there's no way the
> > > > > descendant is functional. And if the descendant is not functional, how
> > > > > is the parent device powered up? This just feels like an incorrect
> > > > > representation of the real h/w.
> > > >
> > > > It should be correct representation based on scanning the interconnects
> > > > and looking at the documentation. Some interconnect parts are wired
> > > > always-on and some interconnect instances may be dual-mapped.
> >
> > Thanks for helping to debug this. Appreciate it.
> >
> > > >
> > > > We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().
> >
> > :'(
> >
> > I checked out the code. These prm devices just get populated with NULL
> > as the parent. So they are effectively top level devices from the
> > perspective of driver core.
> >
> > > > Maybe that also now fails in addition to the top level interconnect
> > > > probing no longer producing -EPROBE_DEFER.
> >
> > As far as I can tell pdata_quirks_init_clocks() is just adding these
> > prm devices (amongst other drivers). So I don't expect that to fail.
> >
> > > >
> > > > > > So the domain must be default on and then simple-pm-bus is going to
> > > > > > hold a reference to the domain preventing it from ever getting powered
> > > > > > off and things seem to work. Except what happens during suspend?
> > > > >
> > > > > But how can simple-pm-bus even get a reference? The PM domain can't
> > > > > get added until we are well into the probe of the simple-pm-bus and
> > > > > AFAICT the genpd attach is done before the driver probe is even
> > > > > called.
> > > >
> > > > The prm/prcm gets of_platform_populate() called on it early.
> >
> > :'(
> >
> > > The hackish patch below makes things boot for me, not convinced this
> > > is the preferred fix compared to earlier deferred probe handling though.
> > > Going back to the init level tinkering seems like a step back to me.
> >
> > The goal of fw_devlink is to avoid init level tinkering and it does
> > help with that in general. But these kinds of quirks are going to need
> > a few exceptions -- with them being quirks and all. And this change
> > will avoid an unnecessary deferred probe (that used to happen even
> > before my change).
> >
> > The other option to handle this quirk is to create the invalid
> > (consumer is parent of supplier) fwnode_link between the prm device
> > and its consumers when the prm device is populated. Then fw_devlink
> > will end up creating a device link when ocp gets added. But I'm not
> > sure if it's going to be easy to find and add all those consumers.
> >
> > I'd say, for now, let's go with this patch below. I'll see if I can
> > get fw_devlink to handle these odd quirks without breaking the normal
> > cases or making them significantly slower. But that'll take some time
> > and I'm not sure there'll be a nice solution.
> 
> Can you check if this hack helps? If so, then I can think about
> whether we can pick it up without breaking everything else. Copy-paste
> tab mess up warning.

Yeah so manually applying your patch while updating it against
next-20220624 kernel boots for me. I ended up with the following
changes FYI.

Also, looks like both with the initcall change for prm, and the patch
below, there seems to be also another problem where my test devices no
longer properly idle somehow compared to reverting the your two patches
in next.

Regards,

Tony

8< -------------------
diff --git a/drivers/of/property.c b/drivers/of/property.c
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1138,18 +1138,6 @@ static int of_link_to_phandle(struct device_node *con_np,
 		return -ENODEV;
 	}
 
-	/*
-	 * Don't allow linking a device node as a consumer of one of its
-	 * descendant nodes. By definition, a child node can't be a functional
-	 * dependency for the parent node.
-	 */
-	if (of_is_ancestor_of(con_np, sup_np)) {
-		pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
-			 con_np, sup_np);
-		of_node_put(sup_np);
-		return -EINVAL;
-	}
-
 	/*
 	 * Don't create links to "early devices" that won't have struct devices
 	 * created for them.
@@ -1163,9 +1151,27 @@ static int of_link_to_phandle(struct device_node *con_np,
 		of_node_put(sup_np);
 		return -ENODEV;
 	}
-	put_device(sup_dev);
+
+	/*
+	 * Don't allow linking a device node as a consumer of one of its
+	 * descendant nodes. By definition, a child node can't be a functional
+	 * dependency for the parent node.
+	 *
+	 * However, if the child node already has a device while the parent is
+	 * in the process of being added, it's probably some weird quirk
+	 * handling. So, don't both checking if the consumer is an ancestor of
+	 * the supplier.
+	 */
+	if (!sup_dev && of_is_ancestor_of(con_np, sup_np)) {
+		pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
+			 con_np, sup_np);
+		put_device(sup_dev);
+		of_node_put(sup_np);
+		return -EINVAL;
+	}
 
 	fwnode_link_add(of_fwnode_handle(con_np), of_fwnode_handle(sup_np));
+	put_device(sup_dev);
 	of_node_put(sup_np);
 
 	return 0;
Sudeep Holla July 1, 2022, 3:08 p.m. UTC | #17
Hi, Saravana,

On Fri, Jul 01, 2022 at 01:26:12AM -0700, Saravana Kannan wrote:

[...]

> Can you check if this hack helps? If so, then I can think about
> whether we can pick it up without breaking everything else. Copy-paste
> tab mess up warning.

Sorry for jumping in late and not even sure if this is right thread.
I have not bisected anything yet, but I am seeing issues on my Juno R2
with SCMI enabled power domains and Coresight AMBA devices.

OF: amba_device_add() failed (-19) for /etf@20010000
OF: amba_device_add() failed (-19) for /tpiu@20030000
OF: amba_device_add() failed (-19) for /funnel@20040000
OF: amba_device_add() failed (-19) for /etr@20070000
OF: amba_device_add() failed (-19) for /stm@20100000
OF: amba_device_add() failed (-19) for /replicator@20120000
OF: amba_device_add() failed (-19) for /cpu-debug@22010000
OF: amba_device_add() failed (-19) for /etm@22040000
OF: amba_device_add() failed (-19) for /cti@22020000
OF: amba_device_add() failed (-19) for /funnel@220c0000
OF: amba_device_add() failed (-19) for /cpu-debug@22110000
OF: amba_device_add() failed (-19) for /etm@22140000
OF: amba_device_add() failed (-19) for /cti@22120000
OF: amba_device_add() failed (-19) for /cpu-debug@23010000
OF: amba_device_add() failed (-19) for /etm@23040000
OF: amba_device_add() failed (-19) for /cti@23020000
OF: amba_device_add() failed (-19) for /funnel@230c0000
OF: amba_device_add() failed (-19) for /cpu-debug@23110000
OF: amba_device_add() failed (-19) for /etm@23140000
OF: amba_device_add() failed (-19) for /cti@23120000
OF: amba_device_add() failed (-19) for /cpu-debug@23210000
OF: amba_device_add() failed (-19) for /etm@23240000
OF: amba_device_add() failed (-19) for /cti@23220000
OF: amba_device_add() failed (-19) for /cpu-debug@23310000
OF: amba_device_add() failed (-19) for /etm@23340000
OF: amba_device_add() failed (-19) for /cti@23320000
OF: amba_device_add() failed (-19) for /cti@20020000
OF: amba_device_add() failed (-19) for /cti@20110000
OF: amba_device_add() failed (-19) for /funnel@20130000
OF: amba_device_add() failed (-19) for /etf@20140000
OF: amba_device_add() failed (-19) for /funnel@20150000
OF: amba_device_add() failed (-19) for /cti@20160000

These are working fine with deferred probe in the mainline.
I tried the hack you have suggested here(rather Tony's version), also
tried with fw_devlink=0 and fw_devlink=1 && fw_devlink.strict=0
No change in the behaviour.

The DTS are in arch/arm64/boot/dts/arm/juno-*-scmi.dts and there
coresight devices are mostly in juno-cs-r1r2.dtsi

Let me know if there is anything obvious or you want me to bisect which
means I need more time. I can do that next week.
Saravana Kannan July 1, 2022, 7:13 p.m. UTC | #18
On Fri, Jul 1, 2022 at 8:08 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> Hi, Saravana,
>
> On Fri, Jul 01, 2022 at 01:26:12AM -0700, Saravana Kannan wrote:
>
> [...]
>
> > Can you check if this hack helps? If so, then I can think about
> > whether we can pick it up without breaking everything else. Copy-paste
> > tab mess up warning.
>
> Sorry for jumping in late and not even sure if this is right thread.
> I have not bisected anything yet, but I am seeing issues on my Juno R2
> with SCMI enabled power domains and Coresight AMBA devices.
>
> OF: amba_device_add() failed (-19) for /etf@20010000
> OF: amba_device_add() failed (-19) for /tpiu@20030000
> OF: amba_device_add() failed (-19) for /funnel@20040000
> OF: amba_device_add() failed (-19) for /etr@20070000
> OF: amba_device_add() failed (-19) for /stm@20100000
> OF: amba_device_add() failed (-19) for /replicator@20120000
> OF: amba_device_add() failed (-19) for /cpu-debug@22010000
> OF: amba_device_add() failed (-19) for /etm@22040000
> OF: amba_device_add() failed (-19) for /cti@22020000
> OF: amba_device_add() failed (-19) for /funnel@220c0000
> OF: amba_device_add() failed (-19) for /cpu-debug@22110000
> OF: amba_device_add() failed (-19) for /etm@22140000
> OF: amba_device_add() failed (-19) for /cti@22120000
> OF: amba_device_add() failed (-19) for /cpu-debug@23010000
> OF: amba_device_add() failed (-19) for /etm@23040000
> OF: amba_device_add() failed (-19) for /cti@23020000
> OF: amba_device_add() failed (-19) for /funnel@230c0000
> OF: amba_device_add() failed (-19) for /cpu-debug@23110000
> OF: amba_device_add() failed (-19) for /etm@23140000
> OF: amba_device_add() failed (-19) for /cti@23120000
> OF: amba_device_add() failed (-19) for /cpu-debug@23210000
> OF: amba_device_add() failed (-19) for /etm@23240000
> OF: amba_device_add() failed (-19) for /cti@23220000
> OF: amba_device_add() failed (-19) for /cpu-debug@23310000
> OF: amba_device_add() failed (-19) for /etm@23340000
> OF: amba_device_add() failed (-19) for /cti@23320000
> OF: amba_device_add() failed (-19) for /cti@20020000
> OF: amba_device_add() failed (-19) for /cti@20110000
> OF: amba_device_add() failed (-19) for /funnel@20130000
> OF: amba_device_add() failed (-19) for /etf@20140000
> OF: amba_device_add() failed (-19) for /funnel@20150000
> OF: amba_device_add() failed (-19) for /cti@20160000
>
> These are working fine with deferred probe in the mainline.
> I tried the hack you have suggested here(rather Tony's version),

Thanks for trying that.

> also
> tried with fw_devlink=0 and fw_devlink=1

0 and 1 aren't valid input to fw_devlink. But yeah, I don't expect
disabling it to make anything better.

> && fw_devlink.strict=0
> No change in the behaviour.
>
> The DTS are in arch/arm64/boot/dts/arm/juno-*-scmi.dts and there
> coresight devices are mostly in juno-cs-r1r2.dtsi

Thanks

> Let me know if there is anything obvious or you want me to bisect which
> means I need more time. I can do that next week.

I'll let you know once I poke at the DTS. We need to figure out why
fw_devlink wasn't blocking these from getting to the error (same as in
Tony's case). But since these are amba devices, I think I have some
guesses.

This is an old series that had some issues in some cases and I haven't
gotten around to looking at it. You can give that a shot if you can
apply it to a recent tree.
https://lore.kernel.org/lkml/20210304195101.3843496-1-saravanak@google.com/

After looking at that old patch again, I think I know what's going on.
For normal devices, the pm domain attach happens AFTER the device is
added and fw_devlink has had a chance to set up device links. And if
the suppliers aren't ready, really_probe() won't get as far as
dev_pm_domain_attach(). But for amba, the clock and pm domain
suppliers are "grabbed" before adding the device.

So with that old patch + always returning -EPROBE_DEFER in
amba_device_add() if amba_read_periphid() fails should fix your issue.

-Saravana
Saravana Kannan July 5, 2022, 8:44 a.m. UTC | #19
On Fri, Jul 1, 2022 at 12:13 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Fri, Jul 1, 2022 at 8:08 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > Hi, Saravana,
> >
> > On Fri, Jul 01, 2022 at 01:26:12AM -0700, Saravana Kannan wrote:
> >
> > [...]
> >
> > > Can you check if this hack helps? If so, then I can think about
> > > whether we can pick it up without breaking everything else. Copy-paste
> > > tab mess up warning.
> >
> > Sorry for jumping in late and not even sure if this is right thread.
> > I have not bisected anything yet, but I am seeing issues on my Juno R2
> > with SCMI enabled power domains and Coresight AMBA devices.
> >
> > OF: amba_device_add() failed (-19) for /etf@20010000
> > OF: amba_device_add() failed (-19) for /tpiu@20030000
> > OF: amba_device_add() failed (-19) for /funnel@20040000
> > OF: amba_device_add() failed (-19) for /etr@20070000
> > OF: amba_device_add() failed (-19) for /stm@20100000
> > OF: amba_device_add() failed (-19) for /replicator@20120000
> > OF: amba_device_add() failed (-19) for /cpu-debug@22010000
> > OF: amba_device_add() failed (-19) for /etm@22040000
> > OF: amba_device_add() failed (-19) for /cti@22020000
> > OF: amba_device_add() failed (-19) for /funnel@220c0000
> > OF: amba_device_add() failed (-19) for /cpu-debug@22110000
> > OF: amba_device_add() failed (-19) for /etm@22140000
> > OF: amba_device_add() failed (-19) for /cti@22120000
> > OF: amba_device_add() failed (-19) for /cpu-debug@23010000
> > OF: amba_device_add() failed (-19) for /etm@23040000
> > OF: amba_device_add() failed (-19) for /cti@23020000
> > OF: amba_device_add() failed (-19) for /funnel@230c0000
> > OF: amba_device_add() failed (-19) for /cpu-debug@23110000
> > OF: amba_device_add() failed (-19) for /etm@23140000
> > OF: amba_device_add() failed (-19) for /cti@23120000
> > OF: amba_device_add() failed (-19) for /cpu-debug@23210000
> > OF: amba_device_add() failed (-19) for /etm@23240000
> > OF: amba_device_add() failed (-19) for /cti@23220000
> > OF: amba_device_add() failed (-19) for /cpu-debug@23310000
> > OF: amba_device_add() failed (-19) for /etm@23340000
> > OF: amba_device_add() failed (-19) for /cti@23320000
> > OF: amba_device_add() failed (-19) for /cti@20020000
> > OF: amba_device_add() failed (-19) for /cti@20110000
> > OF: amba_device_add() failed (-19) for /funnel@20130000
> > OF: amba_device_add() failed (-19) for /etf@20140000
> > OF: amba_device_add() failed (-19) for /funnel@20150000
> > OF: amba_device_add() failed (-19) for /cti@20160000
> >
> > These are working fine with deferred probe in the mainline.
> > I tried the hack you have suggested here(rather Tony's version),
>
> Thanks for trying that.
>
> > also
> > tried with fw_devlink=0 and fw_devlink=1
>
> 0 and 1 aren't valid input to fw_devlink. But yeah, I don't expect
> disabling it to make anything better.
>
> > && fw_devlink.strict=0
> > No change in the behaviour.
> >
> > The DTS are in arch/arm64/boot/dts/arm/juno-*-scmi.dts and there
> > coresight devices are mostly in juno-cs-r1r2.dtsi
>
> Thanks
>
> > Let me know if there is anything obvious or you want me to bisect which
> > means I need more time. I can do that next week.
>
> I'll let you know once I poke at the DTS. We need to figure out why
> fw_devlink wasn't blocking these from getting to the error (same as in
> Tony's case). But since these are amba devices, I think I have some
> guesses.
>
> This is an old series that had some issues in some cases and I haven't
> gotten around to looking at it. You can give that a shot if you can
> apply it to a recent tree.
> https://lore.kernel.org/lkml/20210304195101.3843496-1-saravanak@google.com/

I rebased it to driver-core-next and tested the patch  (for
correctness, not with your issue though). I'm fairly sure it should
help with your issue. Can you give it a shot please?

https://lore.kernel.org/lkml/20220705083934.3974140-1-saravanak@google.com/T/#u

-Saravana

>
> After looking at that old patch again, I think I know what's going on.
> For normal devices, the pm domain attach happens AFTER the device is
> added and fw_devlink has had a chance to set up device links. And if
> the suppliers aren't ready, really_probe() won't get as far as
> dev_pm_domain_attach(). But for amba, the clock and pm domain
> suppliers are "grabbed" before adding the device.
>
> So with that old patch + always returning -EPROBE_DEFER in
> amba_device_add() if amba_read_periphid() fails should fix your issue.
>
> -Saravana
Tony Lindgren July 12, 2022, 7:12 a.m. UTC | #20
* Tony Lindgren <tony@atomide.com> [220701 16:00]:
> Also, looks like both with the initcall change for prm, and the patch
> below, there seems to be also another problem where my test devices no
> longer properly idle somehow compared to reverting the your two patches
> in next.

Sorry looks like was a wrong conclusion. While trying to track down this
issue, I cannot reproduce it. So I don't see issues idling with either
the initcall change or your test patch.

Not sure what caused my earlier tests to fail though. Maybe a config
change to enable more debugging, or possibly some kind of warm reset vs
cold reset type issue.

Regards,

Tony
Saravana Kannan July 13, 2022, 12:49 a.m. UTC | #21
On Tue, Jul 12, 2022 at 12:12 AM Tony Lindgren <tony@atomide.com> wrote:
>
> * Tony Lindgren <tony@atomide.com> [220701 16:00]:
> > Also, looks like both with the initcall change for prm, and the patch
> > below, there seems to be also another problem where my test devices no
> > longer properly idle somehow compared to reverting the your two patches
> > in next.
>
> Sorry looks like was a wrong conclusion. While trying to track down this
> issue, I cannot reproduce it. So I don't see issues idling with either
> the initcall change or your test patch.
>
> Not sure what caused my earlier tests to fail though. Maybe a config
> change to enable more debugging, or possibly some kind of warm reset vs
> cold reset type issue.

Thanks for getting back to me about the false alarm.

OK, so it looks like my patch to drivers/of/property.c fixed the issue
for you. In that case, let me test that a bit more thoroughly on my
end to make sure it's not breaking any existing functionality. And if
it's not breaking, I'll land that in the kernel eventually. Might be a
bit too late for 5.19. I'm considering temporarily reverting my series
depending on how the rest of the issues from my series go.

-Saravana
Tony Lindgren July 13, 2022, 8:06 a.m. UTC | #22
* Saravana Kannan <saravanak@google.com> [220713 00:44]:
> On Tue, Jul 12, 2022 at 12:12 AM Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Tony Lindgren <tony@atomide.com> [220701 16:00]:
> > > Also, looks like both with the initcall change for prm, and the patch
> > > below, there seems to be also another problem where my test devices no
> > > longer properly idle somehow compared to reverting the your two patches
> > > in next.
> >
> > Sorry looks like was a wrong conclusion. While trying to track down this
> > issue, I cannot reproduce it. So I don't see issues idling with either
> > the initcall change or your test patch.
> >
> > Not sure what caused my earlier tests to fail though. Maybe a config
> > change to enable more debugging, or possibly some kind of warm reset vs
> > cold reset type issue.
> 
> Thanks for getting back to me about the false alarm.

FYI I'm pretty sure I had also some pending sdhci related patches applied
while testing causing extra issues.

> OK, so it looks like my patch to drivers/of/property.c fixed the issue
> for you. In that case, let me test that a bit more thoroughly on my
> end to make sure it's not breaking any existing functionality. And if
> it's not breaking, I'll land that in the kernel eventually. Might be a
> bit too late for 5.19. I'm considering temporarily reverting my series
> depending on how the rest of the issues from my series go.

OK. Seems the series is otherwise working and in case of issues, partial
revert should be enough in the worst case. But yeah, probably some more
testing is needed.

Regards,

Tony
diff mbox series

Patch

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 739e52cd4aba..3e86772d5fac 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2730,7 +2730,7 @@  static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
 		mutex_unlock(&gpd_list_lock);
 		dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
 			__func__, PTR_ERR(pd));
-		return driver_deferred_probe_check_state(base_dev);
+		return -ENODEV;
 	}
 
 	dev_dbg(dev, "adding to PM domain %s\n", pd->name);