diff mbox series

[v20,2/5] usb: dwc3: core: Host wake up support from system suspend

Message ID 1654158277-12921-3-git-send-email-quic_kriskura@quicinc.com
State Superseded
Headers show
Series USB DWC3 host wake up support from system suspend | expand

Commit Message

Krishna Kurapati June 2, 2022, 8:24 a.m. UTC
From: Sandeep Maheswaram <quic_c_sanm@quicinc.com>

Check wakeup-source property for dwc3 core node to set the
wakeup capability. Drop the device_init_wakeup call from
runtime suspend and resume.

If the dwc3 is wakeup capable, don't power down the USB PHY(s).
The glue drivers are expected to take care of configuring the
additional wakeup settings if needed based on the dwc3 wakeup
capability status. In some SOC designs, powering off the PHY is
resulting in higher leakage, so this patch save power on such boards.

Signed-off-by: Sandeep Maheswaram <quic_c_sanm@quicinc.com>
Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
Reviewed-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
---
 drivers/usb/dwc3/core.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

Comments

Matthias Kaehlcke June 6, 2022, 8:45 p.m. UTC | #1
On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote:
> Hi Krishna,
> 
> with this version I see xHCI errors on my SC7180 based system, like
> these:
> 
> [   65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit
> 
> [  101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout
> 
> After resume a downstream hub isn't enumerated again.
> 
> So far I didn't see those with v13, but I aso saw the first error with
> v16.

It also happens with v13, but only when a wakeup capable vUSB <= 2
device is plugged in. Initially I used a wakeup capable USB3 to
Ethernet adapter to trigger the wakeup case, however older versions
of this series that use usb_wakeup_enabled_descendants() to check
for wakeup capable devices didn't actually check for vUSB > 2
devices.

So the case were the controller/PHYs is powered down works, but
the controller is unhappy when the runtime PM path is used during
system suspend.

> On Thu, Jun 02, 2022 at 01:54:34PM +0530, Krishna Kurapati wrote:
> > From: Sandeep Maheswaram <quic_c_sanm@quicinc.com>
> > 
> > Check wakeup-source property for dwc3 core node to set the
> > wakeup capability. Drop the device_init_wakeup call from
> > runtime suspend and resume.
> > 
> > If the dwc3 is wakeup capable, don't power down the USB PHY(s).
> > The glue drivers are expected to take care of configuring the
> > additional wakeup settings if needed based on the dwc3 wakeup
> > capability status. In some SOC designs, powering off the PHY is
> > resulting in higher leakage, so this patch save power on such boards.
> > 
> > Signed-off-by: Sandeep Maheswaram <quic_c_sanm@quicinc.com>
> > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
> > Reviewed-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
> > ---
> >  drivers/usb/dwc3/core.c | 9 +++------
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> > index e027c04..b99d3c2 100644
> > --- a/drivers/usb/dwc3/core.c
> > +++ b/drivers/usb/dwc3/core.c
> > @@ -1787,6 +1787,7 @@ static int dwc3_probe(struct platform_device *pdev)
> >  
> >  	platform_set_drvdata(pdev, dwc);
> >  	dwc3_cache_hwparams(dwc);
> > +	device_init_wakeup(&pdev->dev, of_property_read_bool(dev->of_node, "wakeup-source"));
> >  
> >  	spin_lock_init(&dwc->lock);
> >  	mutex_init(&dwc->mutex);
> > @@ -1948,7 +1949,7 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
> >  		dwc3_core_exit(dwc);
> >  		break;
> >  	case DWC3_GCTL_PRTCAP_HOST:
> > -		if (!PMSG_IS_AUTO(msg)) {
> > +		if (!PMSG_IS_AUTO(msg) && !device_can_wakeup(dwc->dev)) {
> >  			dwc3_core_exit(dwc);
> >  			break;
> >  		}
> > @@ -2009,7 +2010,7 @@ static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg)
> >  		spin_unlock_irqrestore(&dwc->lock, flags);
> >  		break;
> >  	case DWC3_GCTL_PRTCAP_HOST:
> > -		if (!PMSG_IS_AUTO(msg)) {
> > +		if (!PMSG_IS_AUTO(msg) && !device_can_wakeup(dwc->dev)) {
> >  			ret = dwc3_core_init_for_resume(dwc);
> >  			if (ret)
> >  				return ret;
> > @@ -2086,8 +2087,6 @@ static int dwc3_runtime_suspend(struct device *dev)
> >  	if (ret)
> >  		return ret;
> >  
> > -	device_init_wakeup(dev, true);
> > -
> >  	return 0;
> >  }
> >  
> > @@ -2096,8 +2095,6 @@ static int dwc3_runtime_resume(struct device *dev)
> >  	struct dwc3     *dwc = dev_get_drvdata(dev);
> >  	int		ret;
> >  
> > -	device_init_wakeup(dev, false);
> > -
> >  	ret = dwc3_resume_common(dwc, PMSG_AUTO_RESUME);
> >  	if (ret)
> >  		return ret;
> > -- 
> > 2.7.4
> >
Matthias Kaehlcke June 13, 2022, 6:08 p.m. UTC | #2
On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote:
> On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote:
> > Hi Krishna,
> > 
> > with this version I see xHCI errors on my SC7180 based system, like
> > these:
> > 
> > [   65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit
> > 
> > [  101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout
> > 
> > After resume a downstream hub isn't enumerated again.
> > 
> > So far I didn't see those with v13, but I aso saw the first error with
> > v16.
> 
> It also happens with v13, but only when a wakeup capable vUSB <= 2
> device is plugged in. Initially I used a wakeup capable USB3 to
> Ethernet adapter to trigger the wakeup case, however older versions
> of this series that use usb_wakeup_enabled_descendants() to check
> for wakeup capable devices didn't actually check for vUSB > 2
> devices.
> 
> So the case were the controller/PHYs is powered down works, but
> the controller is unhappy when the runtime PM path is used during
> system suspend.

The issue isn't seen on all systems using dwc3-qcom and the problem starts
during probe(). The expected probe sequence is something like this:

dwc3_qcom_probe
  dwc3_qcom_of_register_core
    dwc3_probe

  if (device_can_wakeup(&qcom->dwc3->dev))
    ...

The important part is that device_can_wakeup() is called after dwc3_probe()
has completed. That's what I see on a QC SC7280 system, where wakeup is
generally working with these patches.

However on a QC SC7180 system dwc3_probe() is deferred and only executed after
dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false.
With that the controller/driver ends up in an unhappy state after system
suspend.

Probing is deferred on SC7180 because device_links_check_suppliers() finds
that '88e3000.phy' isn't ready yet.
Matthias Kaehlcke June 14, 2022, 5:53 p.m. UTC | #3
On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote:
> On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote:
> > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote:
> > > Hi Krishna,
> > > 
> > > with this version I see xHCI errors on my SC7180 based system, like
> > > these:
> > > 
> > > [   65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit
> > > 
> > > [  101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout
> > > 
> > > After resume a downstream hub isn't enumerated again.
> > > 
> > > So far I didn't see those with v13, but I aso saw the first error with
> > > v16.
> > 
> > It also happens with v13, but only when a wakeup capable vUSB <= 2
> > device is plugged in. Initially I used a wakeup capable USB3 to
> > Ethernet adapter to trigger the wakeup case, however older versions
> > of this series that use usb_wakeup_enabled_descendants() to check
> > for wakeup capable devices didn't actually check for vUSB > 2
> > devices.
> > 
> > So the case were the controller/PHYs is powered down works, but
> > the controller is unhappy when the runtime PM path is used during
> > system suspend.
> 
> The issue isn't seen on all systems using dwc3-qcom and the problem starts
> during probe(). The expected probe sequence is something like this:
> 
> dwc3_qcom_probe
>   dwc3_qcom_of_register_core
>     dwc3_probe
> 
>   if (device_can_wakeup(&qcom->dwc3->dev))
>     ...
> 
> The important part is that device_can_wakeup() is called after dwc3_probe()
> has completed. That's what I see on a QC SC7280 system, where wakeup is
> generally working with these patches.
> 
> However on a QC SC7180 system dwc3_probe() is deferred and only executed after
> dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false.
> With that the controller/driver ends up in an unhappy state after system
> suspend.
> 
> Probing is deferred on SC7180 because device_links_check_suppliers() finds
> that '88e3000.phy' isn't ready yet.

It seems device links could be used to make sure the dwc3 core is present:

  Another example for an inconsistent state would be a device link that
  represents a driver presence dependency, yet is added from the consumer’s
  ->probe callback while the supplier hasn’t probed yet: Had the driver core
  known about the device link earlier, it wouldn’t have probed the consumer
  in the first place. The onus is thus on the consumer to check presence of
  the supplier after adding the link, and defer probing on non-presence.

  https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage


You could add something like this to dwc3_qcom_of_register_core():


  device_link_add(dev, &qcom->dwc3->dev,
  		  DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER);

  if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND)
      ret = -EPROBE_DEFER;
Pavan Kondeti June 16, 2022, 9:11 a.m. UTC | #4
Hi Matthias/Krishna,

On Tue, Jun 14, 2022 at 10:53:35AM -0700, Matthias Kaehlcke wrote:
> On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote:
> > On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote:
> > > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote:
> > > > Hi Krishna,
> > > > 
> > > > with this version I see xHCI errors on my SC7180 based system, like
> > > > these:
> > > > 
> > > > [   65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit
> > > > 
> > > > [  101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout
> > > > 
> > > > After resume a downstream hub isn't enumerated again.
> > > > 
> > > > So far I didn't see those with v13, but I aso saw the first error with
> > > > v16.
> > > 
> > > It also happens with v13, but only when a wakeup capable vUSB <= 2
> > > device is plugged in. Initially I used a wakeup capable USB3 to
> > > Ethernet adapter to trigger the wakeup case, however older versions
> > > of this series that use usb_wakeup_enabled_descendants() to check
> > > for wakeup capable devices didn't actually check for vUSB > 2
> > > devices.
> > > 
> > > So the case were the controller/PHYs is powered down works, but
> > > the controller is unhappy when the runtime PM path is used during
> > > system suspend.
> > 
> > The issue isn't seen on all systems using dwc3-qcom and the problem starts
> > during probe(). The expected probe sequence is something like this:
> > 
> > dwc3_qcom_probe
> >   dwc3_qcom_of_register_core
> >     dwc3_probe
> > 
> >   if (device_can_wakeup(&qcom->dwc3->dev))
> >     ...
> > 
> > The important part is that device_can_wakeup() is called after dwc3_probe()
> > has completed. That's what I see on a QC SC7280 system, where wakeup is
> > generally working with these patches.
> > 
> > However on a QC SC7180 system dwc3_probe() is deferred and only executed after
> > dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false.
> > With that the controller/driver ends up in an unhappy state after system
> > suspend.
> > 
> > Probing is deferred on SC7180 because device_links_check_suppliers() finds
> > that '88e3000.phy' isn't ready yet.
> 
> It seems device links could be used to make sure the dwc3 core is present:
> 
>   Another example for an inconsistent state would be a device link that
>   represents a driver presence dependency, yet is added from the consumer’s
>   ->probe callback while the supplier hasn’t probed yet: Had the driver core
>   known about the device link earlier, it wouldn’t have probed the consumer
>   in the first place. The onus is thus on the consumer to check presence of
>   the supplier after adding the link, and defer probing on non-presence.
> 
>   https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage
> 
> 
> You could add something like this to dwc3_qcom_of_register_core():
> 
> 
>   device_link_add(dev, &qcom->dwc3->dev,
>   		  DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER);
> 
>   if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND)
>       ret = -EPROBE_DEFER;
> 
> 
I am not very sure how the device_link_add() API works. we are the parent and
creating a depdency on child probe. That does not sound correct to me. Any
ways, I have another question.

When dwc3_qcom_of_register_core() returns error back to dwc3_qcom_probe(), we
goto depopulate label which calls of_platform_depopulate() which destroy the
child devices that are populated. how does that ensure that child probe is
completed by the time, our probe is called again. The child device it self is
gone. Is this working because when our probe is called next time, the child
probe depenencies are resolved?

Thanks,
Pavan
Matthias Kaehlcke June 16, 2022, 5:15 p.m. UTC | #5
On Thu, Jun 16, 2022 at 02:41:10PM +0530, Pavan Kondeti wrote:
> Hi Matthias/Krishna,
> 
> On Tue, Jun 14, 2022 at 10:53:35AM -0700, Matthias Kaehlcke wrote:
> > On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote:
> > > On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote:
> > > > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote:
> > > > > Hi Krishna,
> > > > > 
> > > > > with this version I see xHCI errors on my SC7180 based system, like
> > > > > these:
> > > > > 
> > > > > [   65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit
> > > > > 
> > > > > [  101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout
> > > > > 
> > > > > After resume a downstream hub isn't enumerated again.
> > > > > 
> > > > > So far I didn't see those with v13, but I aso saw the first error with
> > > > > v16.
> > > > 
> > > > It also happens with v13, but only when a wakeup capable vUSB <= 2
> > > > device is plugged in. Initially I used a wakeup capable USB3 to
> > > > Ethernet adapter to trigger the wakeup case, however older versions
> > > > of this series that use usb_wakeup_enabled_descendants() to check
> > > > for wakeup capable devices didn't actually check for vUSB > 2
> > > > devices.
> > > > 
> > > > So the case were the controller/PHYs is powered down works, but
> > > > the controller is unhappy when the runtime PM path is used during
> > > > system suspend.
> > > 
> > > The issue isn't seen on all systems using dwc3-qcom and the problem starts
> > > during probe(). The expected probe sequence is something like this:
> > > 
> > > dwc3_qcom_probe
> > >   dwc3_qcom_of_register_core
> > >     dwc3_probe
> > > 
> > >   if (device_can_wakeup(&qcom->dwc3->dev))
> > >     ...
> > > 
> > > The important part is that device_can_wakeup() is called after dwc3_probe()
> > > has completed. That's what I see on a QC SC7280 system, where wakeup is
> > > generally working with these patches.
> > > 
> > > However on a QC SC7180 system dwc3_probe() is deferred and only executed after
> > > dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false.
> > > With that the controller/driver ends up in an unhappy state after system
> > > suspend.
> > > 
> > > Probing is deferred on SC7180 because device_links_check_suppliers() finds
> > > that '88e3000.phy' isn't ready yet.
> > 
> > It seems device links could be used to make sure the dwc3 core is present:
> > 
> >   Another example for an inconsistent state would be a device link that
> >   represents a driver presence dependency, yet is added from the consumer’s
> >   ->probe callback while the supplier hasn’t probed yet: Had the driver core
> >   known about the device link earlier, it wouldn’t have probed the consumer
> >   in the first place. The onus is thus on the consumer to check presence of
> >   the supplier after adding the link, and defer probing on non-presence.
> > 
> >   https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage
> > 
> > 
> > You could add something like this to dwc3_qcom_of_register_core():
> > 
> > 
> >   device_link_add(dev, &qcom->dwc3->dev,
> >   		  DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER);
> > 
> >   if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND)
> >       ret = -EPROBE_DEFER;
> > 
> > 
> I am not very sure how the device_link_add() API works. we are the parent and
> creating a depdency on child probe. That does not sound correct to me.

The functional dependency is effectively there, the driver already assumes that
the dwc3 core was probed when of_platform_populate() returns.

The device link itself doesn't create the dependency on the probe(), the check
of the link status below does.

Another option would be to add a link to the PHYs to the dwc3-qcom node in
the device tree, but I don't think that would be a better solution (and I
expect Rob would oppose this).

I'm open to other solutions, so far the device link is the cleanest that came
to my mind.

I think the root issue is the driver architecture, with two interdependent
drivers for the same IP block, instead of a single framework driver with a
common part (dwc3 core) and vendor specific hooks/data.

> Any ways, I have another question.
> 
> When dwc3_qcom_of_register_core() returns error back to dwc3_qcom_probe(), we
> goto depopulate label which calls of_platform_depopulate() which destroy the
> child devices that are populated. how does that ensure that child probe is
> completed by the time, our probe is called again. The child device it self is
> gone. Is this working because when our probe is called next time, the child
> probe depenencies are resolved?

Good point! It doesn't really ensure that the child is probed (actually it
won't be probed and DL_FLAG_AUTOPROBE_CONSUMER doesn't make sense here), it
could happen that dwc3_qcom_probe() is deferred multiple times, but eventually
the PHYs should be ready and dwc3_probe() be invoked through
of_platform_populate().
Pavan Kondeti June 20, 2022, 8:54 a.m. UTC | #6
+Felipe, Bjorn

On Thu, Jun 16, 2022 at 10:15:49AM -0700, Matthias Kaehlcke wrote:
> On Thu, Jun 16, 2022 at 02:41:10PM +0530, Pavan Kondeti wrote:
> > Hi Matthias/Krishna,
> > 
> > On Tue, Jun 14, 2022 at 10:53:35AM -0700, Matthias Kaehlcke wrote:
> > > On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote:
> > > > On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote:
> > > > > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote:
> > > > > > Hi Krishna,
> > > > > > 
> > > > > > with this version I see xHCI errors on my SC7180 based system, like
> > > > > > these:
> > > > > > 
> > > > > > [   65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit
> > > > > > 
> > > > > > [  101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout
> > > > > > 
> > > > > > After resume a downstream hub isn't enumerated again.
> > > > > > 
> > > > > > So far I didn't see those with v13, but I aso saw the first error with
> > > > > > v16.
> > > > > 
> > > > > It also happens with v13, but only when a wakeup capable vUSB <= 2
> > > > > device is plugged in. Initially I used a wakeup capable USB3 to
> > > > > Ethernet adapter to trigger the wakeup case, however older versions
> > > > > of this series that use usb_wakeup_enabled_descendants() to check
> > > > > for wakeup capable devices didn't actually check for vUSB > 2
> > > > > devices.
> > > > > 
> > > > > So the case were the controller/PHYs is powered down works, but
> > > > > the controller is unhappy when the runtime PM path is used during
> > > > > system suspend.
> > > > 
> > > > The issue isn't seen on all systems using dwc3-qcom and the problem starts
> > > > during probe(). The expected probe sequence is something like this:
> > > > 
> > > > dwc3_qcom_probe
> > > >   dwc3_qcom_of_register_core
> > > >     dwc3_probe
> > > > 
> > > >   if (device_can_wakeup(&qcom->dwc3->dev))
> > > >     ...
> > > > 
> > > > The important part is that device_can_wakeup() is called after dwc3_probe()
> > > > has completed. That's what I see on a QC SC7280 system, where wakeup is
> > > > generally working with these patches.
> > > > 
> > > > However on a QC SC7180 system dwc3_probe() is deferred and only executed after
> > > > dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false.
> > > > With that the controller/driver ends up in an unhappy state after system
> > > > suspend.
> > > > 
> > > > Probing is deferred on SC7180 because device_links_check_suppliers() finds
> > > > that '88e3000.phy' isn't ready yet.
> > > 
> > > It seems device links could be used to make sure the dwc3 core is present:
> > > 
> > >   Another example for an inconsistent state would be a device link that
> > >   represents a driver presence dependency, yet is added from the consumer’s
> > >   ->probe callback while the supplier hasn’t probed yet: Had the driver core
> > >   known about the device link earlier, it wouldn’t have probed the consumer
> > >   in the first place. The onus is thus on the consumer to check presence of
> > >   the supplier after adding the link, and defer probing on non-presence.
> > > 
> > >   https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage
> > > 
> > > 
> > > You could add something like this to dwc3_qcom_of_register_core():
> > > 
> > > 
> > >   device_link_add(dev, &qcom->dwc3->dev,
> > >   		  DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER);
> > > 
> > >   if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND)
> > >       ret = -EPROBE_DEFER;
> > > 
> > > 
> > I am not very sure how the device_link_add() API works. we are the parent and
> > creating a depdency on child probe. That does not sound correct to me.
> 
> The functional dependency is effectively there, the driver already assumes that
> the dwc3 core was probed when of_platform_populate() returns.
> 
> The device link itself doesn't create the dependency on the probe(), the check
> of the link status below does.
> 
> Another option would be to add a link to the PHYs to the dwc3-qcom node in
> the device tree, but I don't think that would be a better solution (and I
> expect Rob would oppose this).
> 
> I'm open to other solutions, so far the device link is the cleanest that came
> to my mind.
> 
> I think the root issue is the driver architecture, with two interdependent
> drivers for the same IP block, instead of a single framework driver with a
> common part (dwc3 core) and vendor specific hooks/data.
> 
> > Any ways, I have another question.
> > 
> > When dwc3_qcom_of_register_core() returns error back to dwc3_qcom_probe(), we
> > goto depopulate label which calls of_platform_depopulate() which destroy the
> > child devices that are populated. how does that ensure that child probe is
> > completed by the time, our probe is called again. The child device it self is
> > gone. Is this working because when our probe is called next time, the child
> > probe depenencies are resolved?
> 
> Good point! It doesn't really ensure that the child is probed (actually it
> won't be probed and DL_FLAG_AUTOPROBE_CONSUMER doesn't make sense here), it
> could happen that dwc3_qcom_probe() is deferred multiple times, but eventually
> the PHYs should be ready and dwc3_probe() be invoked through
> of_platform_populate().

This is a generic problem i.e if a parent can only proceed after the child
devices are bounded (i.e probed successfully), how to ensure this behavior
from the parent's probe? Since we can't block the parent probe (async probe is
not the default behavior), we have to identify the condition that the children
are deferring probe, so that parent also can do that.

Can we add a API in drivers core to tell if a device probe is deferred or
not? This can be done by testing list_empty(&dev->p->deferred_probe) under
deferred_probe_mutex mutex. The parent can return EPROBE_DEFER based on this
API return value.

Another alternative would be explicitly checking if the child device suppliers
are ready or not before adding child device. That would require decoupling
of_platform_populate() to creating devices and adding devices.

Note that this problem is not just limited to suppliers not ready. if the
dwc3-qcom is made asynchronous probe, then its child also probed
asynchronously and there is no guarantee that child would be probed by the
time of_platform_populate() is returned.  The bus notifier might come handy
in this case. The parent can register for this notifier and waiting for
the children device's BUS_NOTIFY_BOUND_DRIVER/BUS_NOTIFY_DRIVER_NOT_BOUND
notifications. This would also work in our case, if we move to
of_platform_populate() outside the probe().

Would like to hear other people thoughts on this.

Thanks,
Pavan
Matthias Kaehlcke June 23, 2022, 6:38 p.m. UTC | #7
On Mon, Jun 20, 2022 at 02:24:15PM +0530, Pavan Kondeti wrote:
> +Felipe, Bjorn
> 
> On Thu, Jun 16, 2022 at 10:15:49AM -0700, Matthias Kaehlcke wrote:
> > On Thu, Jun 16, 2022 at 02:41:10PM +0530, Pavan Kondeti wrote:
> > > Hi Matthias/Krishna,
> > > 
> > > On Tue, Jun 14, 2022 at 10:53:35AM -0700, Matthias Kaehlcke wrote:
> > > > On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote:
> > > > > On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote:
> > > > > > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote:
> > > > > > > Hi Krishna,
> > > > > > > 
> > > > > > > with this version I see xHCI errors on my SC7180 based system, like
> > > > > > > these:
> > > > > > > 
> > > > > > > [   65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit
> > > > > > > 
> > > > > > > [  101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout
> > > > > > > 
> > > > > > > After resume a downstream hub isn't enumerated again.
> > > > > > > 
> > > > > > > So far I didn't see those with v13, but I aso saw the first error with
> > > > > > > v16.
> > > > > > 
> > > > > > It also happens with v13, but only when a wakeup capable vUSB <= 2
> > > > > > device is plugged in. Initially I used a wakeup capable USB3 to
> > > > > > Ethernet adapter to trigger the wakeup case, however older versions
> > > > > > of this series that use usb_wakeup_enabled_descendants() to check
> > > > > > for wakeup capable devices didn't actually check for vUSB > 2
> > > > > > devices.
> > > > > > 
> > > > > > So the case were the controller/PHYs is powered down works, but
> > > > > > the controller is unhappy when the runtime PM path is used during
> > > > > > system suspend.
> > > > > 
> > > > > The issue isn't seen on all systems using dwc3-qcom and the problem starts
> > > > > during probe(). The expected probe sequence is something like this:
> > > > > 
> > > > > dwc3_qcom_probe
> > > > >   dwc3_qcom_of_register_core
> > > > >     dwc3_probe
> > > > > 
> > > > >   if (device_can_wakeup(&qcom->dwc3->dev))
> > > > >     ...
> > > > > 
> > > > > The important part is that device_can_wakeup() is called after dwc3_probe()
> > > > > has completed. That's what I see on a QC SC7280 system, where wakeup is
> > > > > generally working with these patches.
> > > > > 
> > > > > However on a QC SC7180 system dwc3_probe() is deferred and only executed after
> > > > > dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false.
> > > > > With that the controller/driver ends up in an unhappy state after system
> > > > > suspend.
> > > > > 
> > > > > Probing is deferred on SC7180 because device_links_check_suppliers() finds
> > > > > that '88e3000.phy' isn't ready yet.
> > > > 
> > > > It seems device links could be used to make sure the dwc3 core is present:
> > > > 
> > > >   Another example for an inconsistent state would be a device link that
> > > >   represents a driver presence dependency, yet is added from the consumer’s
> > > >   ->probe callback while the supplier hasn’t probed yet: Had the driver core
> > > >   known about the device link earlier, it wouldn’t have probed the consumer
> > > >   in the first place. The onus is thus on the consumer to check presence of
> > > >   the supplier after adding the link, and defer probing on non-presence.
> > > > 
> > > >   https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage
> > > > 
> > > > 
> > > > You could add something like this to dwc3_qcom_of_register_core():
> > > > 
> > > > 
> > > >   device_link_add(dev, &qcom->dwc3->dev,
> > > >   		  DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER);
> > > > 
> > > >   if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND)
> > > >       ret = -EPROBE_DEFER;
> > > > 
> > > > 
> > > I am not very sure how the device_link_add() API works. we are the parent and
> > > creating a depdency on child probe. That does not sound correct to me.
> > 
> > The functional dependency is effectively there, the driver already assumes that
> > the dwc3 core was probed when of_platform_populate() returns.
> > 
> > The device link itself doesn't create the dependency on the probe(), the check
> > of the link status below does.
> > 
> > Another option would be to add a link to the PHYs to the dwc3-qcom node in
> > the device tree, but I don't think that would be a better solution (and I
> > expect Rob would oppose this).
> > 
> > I'm open to other solutions, so far the device link is the cleanest that came
> > to my mind.
> > 
> > I think the root issue is the driver architecture, with two interdependent
> > drivers for the same IP block, instead of a single framework driver with a
> > common part (dwc3 core) and vendor specific hooks/data.
> > 
> > > Any ways, I have another question.
> > > 
> > > When dwc3_qcom_of_register_core() returns error back to dwc3_qcom_probe(), we
> > > goto depopulate label which calls of_platform_depopulate() which destroy the
> > > child devices that are populated. how does that ensure that child probe is
> > > completed by the time, our probe is called again. The child device it self is
> > > gone. Is this working because when our probe is called next time, the child
> > > probe depenencies are resolved?
> > 
> > Good point! It doesn't really ensure that the child is probed (actually it
> > won't be probed and DL_FLAG_AUTOPROBE_CONSUMER doesn't make sense here), it
> > could happen that dwc3_qcom_probe() is deferred multiple times, but eventually
> > the PHYs should be ready and dwc3_probe() be invoked through
> > of_platform_populate().
> 
> This is a generic problem i.e if a parent can only proceed after the child
> devices are bounded (i.e probed successfully), how to ensure this behavior
> from the parent's probe? Since we can't block the parent probe (async probe is
> not the default behavior), we have to identify the condition that the children
> are deferring probe, so that parent also can do that.
> 
> Can we add a API in drivers core to tell if a device probe is deferred or
> not? This can be done by testing list_empty(&dev->p->deferred_probe) under
> deferred_probe_mutex mutex. The parent can return EPROBE_DEFER based on this
> API return value.

That could be an option.

> Another alternative would be explicitly checking if the child device suppliers
> are ready or not before adding child device. That would require decoupling
> of_platform_populate() to creating devices and adding devices.

It might require a new API since there are plenty of users of
of_platform_populate() that rely on the current behavior.

> Note that this problem is not just limited to suppliers not ready. if the
> dwc3-qcom is made asynchronous probe, then its child also probed
> asynchronously and there is no guarantee that child would be probed by the
> time of_platform_populate() is returned.  The bus notifier might come handy
> in this case. The parent can register for this notifier and waiting for
> the children device's BUS_NOTIFY_BOUND_DRIVER/BUS_NOTIFY_DRIVER_NOT_BOUND
> notifications. This would also work in our case, if we move to
> of_platform_populate() outside the probe().

If I understand correctly the outcome would be a probe() in two stages. The
first does as much as it can do without the dwc3 core and leaves the device
in a state where it isn't really functional, and the second stage does the
rest when BUS_NOTIFY_BOUND_DRIVER is received for the dwc3 core device.

A concern could be the need for additional conditions in some code paths to
deal with the half-initialized device.

Why would of_platform_populate() be moved outside of probe()?

To avoid the half-initialized device probe() could block until
BUS_NOTIFY_BOUND_DRIVER is received. Probably that should be done with a
timeout to avoid blocking forever in case of a problem with probing the
dwc3 core.
Pavan Kondeti June 24, 2022, 8:58 a.m. UTC | #8
On Thu, Jun 23, 2022 at 11:38:06AM -0700, Matthias Kaehlcke wrote:
> On Mon, Jun 20, 2022 at 02:24:15PM +0530, Pavan Kondeti wrote:
> > +Felipe, Bjorn
> > 
> > On Thu, Jun 16, 2022 at 10:15:49AM -0700, Matthias Kaehlcke wrote:
> > > On Thu, Jun 16, 2022 at 02:41:10PM +0530, Pavan Kondeti wrote:
> > > > Hi Matthias/Krishna,
> > > > 
> > > > On Tue, Jun 14, 2022 at 10:53:35AM -0700, Matthias Kaehlcke wrote:
> > > > > On Mon, Jun 13, 2022 at 11:08:32AM -0700, Matthias Kaehlcke wrote:
> > > > > > On Mon, Jun 06, 2022 at 01:45:51PM -0700, Matthias Kaehlcke wrote:
> > > > > > > On Thu, Jun 02, 2022 at 12:35:42PM -0700, Matthias Kaehlcke wrote:
> > > > > > > > Hi Krishna,
> > > > > > > > 
> > > > > > > > with this version I see xHCI errors on my SC7180 based system, like
> > > > > > > > these:
> > > > > > > > 
> > > > > > > > [   65.352605] xhci-hcd xhci-hcd.13.auto: xHC error in resume, USBSTS 0x401, Reinit
> > > > > > > > 
> > > > > > > > [  101.307155] xhci-hcd xhci-hcd.13.auto: WARN: xHC CMD_RUN timeout
> > > > > > > > 
> > > > > > > > After resume a downstream hub isn't enumerated again.
> > > > > > > > 
> > > > > > > > So far I didn't see those with v13, but I aso saw the first error with
> > > > > > > > v16.
> > > > > > > 
> > > > > > > It also happens with v13, but only when a wakeup capable vUSB <= 2
> > > > > > > device is plugged in. Initially I used a wakeup capable USB3 to
> > > > > > > Ethernet adapter to trigger the wakeup case, however older versions
> > > > > > > of this series that use usb_wakeup_enabled_descendants() to check
> > > > > > > for wakeup capable devices didn't actually check for vUSB > 2
> > > > > > > devices.
> > > > > > > 
> > > > > > > So the case were the controller/PHYs is powered down works, but
> > > > > > > the controller is unhappy when the runtime PM path is used during
> > > > > > > system suspend.
> > > > > > 
> > > > > > The issue isn't seen on all systems using dwc3-qcom and the problem starts
> > > > > > during probe(). The expected probe sequence is something like this:
> > > > > > 
> > > > > > dwc3_qcom_probe
> > > > > >   dwc3_qcom_of_register_core
> > > > > >     dwc3_probe
> > > > > > 
> > > > > >   if (device_can_wakeup(&qcom->dwc3->dev))
> > > > > >     ...
> > > > > > 
> > > > > > The important part is that device_can_wakeup() is called after dwc3_probe()
> > > > > > has completed. That's what I see on a QC SC7280 system, where wakeup is
> > > > > > generally working with these patches.
> > > > > > 
> > > > > > However on a QC SC7180 system dwc3_probe() is deferred and only executed after
> > > > > > dwc3_qcom_probe(). As a result the device_can_wakeup() call returns false.
> > > > > > With that the controller/driver ends up in an unhappy state after system
> > > > > > suspend.
> > > > > > 
> > > > > > Probing is deferred on SC7180 because device_links_check_suppliers() finds
> > > > > > that '88e3000.phy' isn't ready yet.
> > > > > 
> > > > > It seems device links could be used to make sure the dwc3 core is present:
> > > > > 
> > > > >   Another example for an inconsistent state would be a device link that
> > > > >   represents a driver presence dependency, yet is added from the consumer’s
> > > > >   ->probe callback while the supplier hasn’t probed yet: Had the driver core
> > > > >   known about the device link earlier, it wouldn’t have probed the consumer
> > > > >   in the first place. The onus is thus on the consumer to check presence of
> > > > >   the supplier after adding the link, and defer probing on non-presence.
> > > > > 
> > > > >   https://www.kernel.org/doc/html/v5.18/driver-api/device_link.html#usage
> > > > > 
> > > > > 
> > > > > You could add something like this to dwc3_qcom_of_register_core():
> > > > > 
> > > > > 
> > > > >   device_link_add(dev, &qcom->dwc3->dev,
> > > > >   		  DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_AUTOPROBE_CONSUMER);
> > > > > 
> > > > >   if (qcom->dwc3->dev.links.status != DL_DEV_DRIVER_BOUND)
> > > > >       ret = -EPROBE_DEFER;
> > > > > 
> > > > > 
> > > > I am not very sure how the device_link_add() API works. we are the parent and
> > > > creating a depdency on child probe. That does not sound correct to me.
> > > 
> > > The functional dependency is effectively there, the driver already assumes that
> > > the dwc3 core was probed when of_platform_populate() returns.
> > > 
> > > The device link itself doesn't create the dependency on the probe(), the check
> > > of the link status below does.
> > > 
> > > Another option would be to add a link to the PHYs to the dwc3-qcom node in
> > > the device tree, but I don't think that would be a better solution (and I
> > > expect Rob would oppose this).
> > > 
> > > I'm open to other solutions, so far the device link is the cleanest that came
> > > to my mind.
> > > 
> > > I think the root issue is the driver architecture, with two interdependent
> > > drivers for the same IP block, instead of a single framework driver with a
> > > common part (dwc3 core) and vendor specific hooks/data.
> > > 
> > > > Any ways, I have another question.
> > > > 
> > > > When dwc3_qcom_of_register_core() returns error back to dwc3_qcom_probe(), we
> > > > goto depopulate label which calls of_platform_depopulate() which destroy the
> > > > child devices that are populated. how does that ensure that child probe is
> > > > completed by the time, our probe is called again. The child device it self is
> > > > gone. Is this working because when our probe is called next time, the child
> > > > probe depenencies are resolved?
> > > 
> > > Good point! It doesn't really ensure that the child is probed (actually it
> > > won't be probed and DL_FLAG_AUTOPROBE_CONSUMER doesn't make sense here), it
> > > could happen that dwc3_qcom_probe() is deferred multiple times, but eventually
> > > the PHYs should be ready and dwc3_probe() be invoked through
> > > of_platform_populate().
> > 
> > This is a generic problem i.e if a parent can only proceed after the child
> > devices are bounded (i.e probed successfully), how to ensure this behavior
> > from the parent's probe? Since we can't block the parent probe (async probe is
> > not the default behavior), we have to identify the condition that the children
> > are deferring probe, so that parent also can do that.
> > 
> > Can we add a API in drivers core to tell if a device probe is deferred or
> > not? This can be done by testing list_empty(&dev->p->deferred_probe) under
> > deferred_probe_mutex mutex. The parent can return EPROBE_DEFER based on this
> > API return value.
> 
> That could be an option.
> 
> > Another alternative would be explicitly checking if the child device suppliers
> > are ready or not before adding child device. That would require decoupling
> > of_platform_populate() to creating devices and adding devices.
> 
> It might require a new API since there are plenty of users of
> of_platform_populate() that rely on the current behavior.

Agree. A new API is needed. we also have to consider multiple children
scenario, where one child is ready but others are not etc.

> 
> > Note that this problem is not just limited to suppliers not ready. if the
> > dwc3-qcom is made asynchronous probe, then its child also probed
> > asynchronously and there is no guarantee that child would be probed by the
> > time of_platform_populate() is returned.  The bus notifier might come handy
> > in this case. The parent can register for this notifier and waiting for
> > the children device's BUS_NOTIFY_BOUND_DRIVER/BUS_NOTIFY_DRIVER_NOT_BOUND
> > notifications. This would also work in our case, if we move to
> > of_platform_populate() outside the probe().
> 
> If I understand correctly the outcome would be a probe() in two stages. The
> first does as much as it can do without the dwc3 core and leaves the device
> in a state where it isn't really functional, and the second stage does the
> rest when BUS_NOTIFY_BOUND_DRIVER is received for the dwc3 core device.
> 
> A concern could be the need for additional conditions in some code paths to
> deal with the half-initialized device.
> 
> Why would of_platform_populate() be moved outside of probe()?
> 
> To avoid the half-initialized device probe() could block until
> BUS_NOTIFY_BOUND_DRIVER is received. Probably that should be done with a
> timeout to avoid blocking forever in case of a problem with probing the
> dwc3 core.

Right, we have to split the probe() into two parts. The second stage which
runs asynchronously should wait for the dwc3 core probe to happen. if at all
dwc3 core probe is falied (in which case also we get a notification) for any
reason other than EPROBE_DEFER, we have to undo the first stage.

Thansk,
Pavan
Stephen Boyd June 27, 2022, 8:02 p.m. UTC | #9
Quoting Pavan Kondeti (2022-06-20 01:54:15)
> +Felipe, Bjorn
>
> On Thu, Jun 16, 2022 at 10:15:49AM -0700, Matthias Kaehlcke wrote:
> > On Thu, Jun 16, 2022 at 02:41:10PM +0530, Pavan Kondeti wrote:
> >
> > Good point! It doesn't really ensure that the child is probed (actually it
> > won't be probed and DL_FLAG_AUTOPROBE_CONSUMER doesn't make sense here), it
> > could happen that dwc3_qcom_probe() is deferred multiple times, but eventually
> > the PHYs should be ready and dwc3_probe() be invoked through
> > of_platform_populate().
>
> This is a generic problem i.e if a parent can only proceed after the child
> devices are bounded (i.e probed successfully), how to ensure this behavior
> from the parent's probe? Since we can't block the parent probe (async probe is
> not the default behavior), we have to identify the condition that the children
> are deferring probe, so that parent also can do that.
>
> Can we add a API in drivers core to tell if a device probe is deferred or
> not? This can be done by testing list_empty(&dev->p->deferred_probe) under
> deferred_probe_mutex mutex. The parent can return EPROBE_DEFER based on this
> API return value.
>
> Another alternative would be explicitly checking if the child device suppliers
> are ready or not before adding child device. That would require decoupling
> of_platform_populate() to creating devices and adding devices.
>
> Note that this problem is not just limited to suppliers not ready. if the
> dwc3-qcom is made asynchronous probe, then its child also probed
> asynchronously and there is no guarantee that child would be probed by the
> time of_platform_populate() is returned.  The bus notifier might come handy
> in this case. The parent can register for this notifier and waiting for
> the children device's BUS_NOTIFY_BOUND_DRIVER/BUS_NOTIFY_DRIVER_NOT_BOUND
> notifications. This would also work in our case, if we move to
> of_platform_populate() outside the probe().
>
> Would like to hear other people thoughts on this.
>

I'm not following very closely but it sounds like a problem that may be
solved by using the component driver code (see
include/linux/component.h). That would let you move anything that needs
to be done once the child devices probe to the aggregate driver 'bind'
function (see struct component_master_ops::bind).
Krishna Kurapati June 30, 2022, 6:13 p.m. UTC | #10
On 6/30/2022 3:45 AM, Stephen Boyd wrote:
> Quoting Pavan Kondeti (2022-06-27 22:31:48)
>> On Mon, Jun 27, 2022 at 01:02:49PM -0700, Stephen Boyd wrote:
>>> Quoting Pavan Kondeti (2022-06-20 01:54:15)
>>>> Would like to hear other people thoughts on this.
>>>>
>>> I'm not following very closely but it sounds like a problem that may be
>>> solved by using the component driver code (see
>>> include/linux/component.h). That would let you move anything that needs
>>> to be done once the child devices probe to the aggregate driver 'bind'
>>> function (see struct component_master_ops::bind).
>> Thanks Stephen for letting us know about the component device framework.
>>
>> IIUC,
>>
>> - dwc3-qcom (parent of the dwc3 core) registers as a component master by
>> calling component_master_add_with_match() before calling
>> of_platform_populate(). The match callback could be as simple as comparing
>> the device against our child device.
>>
>> - The dwc3 core (child) at the end of its probe can add as a component by calling
>> component_add().
>>
>> - The above triggers the component_master_ops::bind callback implemented in
>>    dwc3-qcom driver which signals that we are good to go.
>>
>> - The dwc-qcom can call component_bind_all() to finish the formality i.e
>>    telling the dwc3 core that we are good to go.
>>
>> Is my understanding correct? This is what we are looking for i.e a way for
>> the child device(s) to signal the parent when the former is bounded.
> Sounds about right to me.
>
>> Also what happens when the child device probe fails for any reason. i.e
>> component_add() would never be called so the master driver i.e dwc3-qcom would
>> wait indefinitely. May be it needs to implement a timeout or runtime suspend
>> etc should take care of keeping the resoures in suspend state.
> When the child fails probe, it should return -EPROBE_DEFER if probe
> needs to be deferred. Then the driver will attempt probe at a later
> time. If probe fails without defer then it will never work and dwc3-qcom
> will wait indefinitely. Not much we can do in that situation.
Hi Stephen,

   Thanks for the idea. But doesn't adding dwc3 as a component to an agg
driver meanthat this change needs to be done on all glue drivers, as
component_bind_all( ) from master componentis supposed to let the dwc3
core know that we are good to go ?
> dwc3-qcom should wait for dwc3 core to call component_add() and then do
> whatever needs to be done once the dwc3 core is registered in the
> dwc3-qcom bind callback. Honestly this may all be a little overkill if
> there's only two drivers here, dwc3-qcom and dwc3 core. It could
> probably just be some callback from dwc3 core at the end of probe that
> calls some function in dwc3-qcom.
Since the issue we are facing is that the ssphy device links are not ready
causing the dwc3 probe not being invoked, can we add an API as Pavan 
suggested
to check if deferred_probe listfor dwc3 device is empty or not andbased on
that we can choose to defer our qcomprobe ? In this case, we don't need to
touch the dwc3 core driver and would be making changesonly in qcom glue 
driver.
Matthias Kaehlcke July 1, 2022, 1:10 a.m. UTC | #11
On Thu, Jun 30, 2022 at 11:43:01PM +0530, Krishna Kurapati PSSNV wrote:
> 
> On 6/30/2022 3:45 AM, Stephen Boyd wrote:
> > Quoting Pavan Kondeti (2022-06-27 22:31:48)
> > > On Mon, Jun 27, 2022 at 01:02:49PM -0700, Stephen Boyd wrote:
> > > > Quoting Pavan Kondeti (2022-06-20 01:54:15)
> > > > > Would like to hear other people thoughts on this.
> > > > > 
> > > > I'm not following very closely but it sounds like a problem that may be
> > > > solved by using the component driver code (see
> > > > include/linux/component.h). That would let you move anything that needs
> > > > to be done once the child devices probe to the aggregate driver 'bind'
> > > > function (see struct component_master_ops::bind).
> > > Thanks Stephen for letting us know about the component device framework.
> > > 
> > > IIUC,
> > > 
> > > - dwc3-qcom (parent of the dwc3 core) registers as a component master by
> > > calling component_master_add_with_match() before calling
> > > of_platform_populate(). The match callback could be as simple as comparing
> > > the device against our child device.
> > > 
> > > - The dwc3 core (child) at the end of its probe can add as a component by calling
> > > component_add().
> > > 
> > > - The above triggers the component_master_ops::bind callback implemented in
> > >    dwc3-qcom driver which signals that we are good to go.
> > > 
> > > - The dwc-qcom can call component_bind_all() to finish the formality i.e
> > >    telling the dwc3 core that we are good to go.
> > > 
> > > Is my understanding correct? This is what we are looking for i.e a way for
> > > the child device(s) to signal the parent when the former is bounded.
> > Sounds about right to me.
> > 
> > > Also what happens when the child device probe fails for any reason. i.e
> > > component_add() would never be called so the master driver i.e dwc3-qcom would
> > > wait indefinitely. May be it needs to implement a timeout or runtime suspend
> > > etc should take care of keeping the resoures in suspend state.
> > When the child fails probe, it should return -EPROBE_DEFER if probe
> > needs to be deferred. Then the driver will attempt probe at a later
> > time. If probe fails without defer then it will never work and dwc3-qcom
> > will wait indefinitely. Not much we can do in that situation.
> Hi Stephen,
> 
>   Thanks for the idea. But doesn't adding dwc3 as a component to an agg
> driver meanthat this change needs to be done on all glue drivers, as
> component_bind_all( ) from master componentis supposed to let the dwc3
> core know that we are good to go ?

Ideally all glue drivers would add component support, however I don't think
it is strictly necessary. Currently the dwc3 core already assumes that
everything is in place when it is probed. The core could have empty bind()
and unbind() callbacks, with that things in the core would remain
essentially as they are and the core doesn't depend on the glue driver to
call component_bind_all().

> > dwc3-qcom should wait for dwc3 core to call component_add() and then do
> > whatever needs to be done once the dwc3 core is registered in the
> > dwc3-qcom bind callback. Honestly this may all be a little overkill if
> > there's only two drivers here, dwc3-qcom and dwc3 core. It could
> > probably just be some callback from dwc3 core at the end of probe that
> > calls some function in dwc3-qcom.
> Since the issue we are facing is that the ssphy device links are not ready
> causing the dwc3 probe not being invoked, can we add an API as Pavan
> suggested
> to check if deferred_probe listfor dwc3 device is empty or not andbased on
> that we can choose to defer our qcomprobe ? In this case, we don't need to
> touch the dwc3 core driver and would be making changesonly in qcom glue
> driver.

As mentioned above, it shouldn't be necessary to add component support to
all the glue drivers. An API to check for deferred probing is an option,
however there is a possible race condition: When the dwc3-qcom driver checks
for a deferred probe the core could still be probing, in that situation the
glue would proceed before the core driver is ready. That could be avoided
with the component based approach.
Matthias Kaehlcke July 1, 2022, 3:52 p.m. UTC | #12
On Fri, Jul 01, 2022 at 03:45:26PM +0530, Pavan Kondeti wrote:
> On Thu, Jun 30, 2022 at 06:10:50PM -0700, Matthias Kaehlcke wrote:
> > > > dwc3-qcom should wait for dwc3 core to call component_add() and then do
> > > > whatever needs to be done once the dwc3 core is registered in the
> > > > dwc3-qcom bind callback. Honestly this may all be a little overkill if
> > > > there's only two drivers here, dwc3-qcom and dwc3 core. It could
> > > > probably just be some callback from dwc3 core at the end of probe that
> > > > calls some function in dwc3-qcom.
> > > Since the issue we are facing is that the ssphy device links are not ready
> > > causing the dwc3 probe not being invoked, can we add an API as Pavan
> > > suggested
> > > to check if deferred_probe listfor dwc3 device is empty or not andbased on
> > > that we can choose to defer our qcomprobe ? In this case, we don't need to
> > > touch the dwc3 core driver and would be making changesonly in qcom glue
> > > driver.
> > 
> > As mentioned above, it shouldn't be necessary to add component support to
> > all the glue drivers. An API to check for deferred probing is an option,
> > however there is a possible race condition: When the dwc3-qcom driver checks
> > for a deferred probe the core could still be probing, in that situation the
> > glue would proceed before the core driver is ready. That could be avoided
> > with the component based approach.
> 
> The race can happen only if asynchronous probe is enabled, otherwise the
> child's probe happens synchronously in of_platform_populate() 

I was thinking about the case where the dwc3-qcom probe is initially deferred,
then the deferred probe starts shortly after (asynchronously) and now the
dwc3-qcom driver does its check. Probably it's not very likely to happen ...

> OTOH, would the below condition suffice for our needs here? if our device
> is not bounded to a driver, we check the state of initcalls and return
> either error or -EPROBE_DEFER
> 
> diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c
> index 7b6eff5..519a503 100644
> --- a/drivers/usb/dwc3/dwc3-qcom.c
> +++ b/drivers/usb/dwc3/dwc3-qcom.c
> @@ -722,6 +722,9 @@ static int dwc3_qcom_of_register_core(struct platform_device *pdev)
>  		dev_err(dev, "failed to get dwc3 platform device\n");
>  	}
>  
> +	if (!qcom->dwc3->dev.driver)
> +		return driver_deferred_probe_check_state(&qcom->dwc3->dev);
> +
>  node_put:
>  	of_node_put(dwc3_np);

I like the simplicity of it, no need for new APIs.

The components based approach would be slightly safer, but in practice I
think this should be good enough.
Matthias Kaehlcke July 13, 2022, 1:34 a.m. UTC | #13
On Fri, Jul 08, 2022 at 04:37:19PM +0530, Krishna Kurapati PSSNV wrote:
>    On 7/6/2022 12:28 PM, Krishna Kurapati PSSNV wrote:
> 
>    On 7/1/2022 9:22 PM, Matthias Kaehlcke wrote:
> 
> On Fri, Jul 01, 2022 at 03:45:26PM +0530, Pavan Kondeti wrote:
> 
> On Thu, Jun 30, 2022 at 06:10:50PM -0700, Matthias Kaehlcke wrote:
> 
> dwc3-qcom should wait for dwc3 core to call component_add() and then do
> whatever needs to be done once the dwc3 core is registered in the
> dwc3-qcom bind callback. Honestly this may all be a little overkill if
> there's only two drivers here, dwc3-qcom and dwc3 core. It could
> probably just be some callback from dwc3 core at the end of probe that
> calls some function in dwc3-qcom.
> 
> Since the issue we are facing is that the ssphy device links are not ready
> causing the dwc3 probe not being invoked, can we add an API as Pavan
> suggested
> to check if deferred_probe listfor dwc3 device is empty or not andbased on
> that we can choose to defer our qcomprobe ? In this case, we don't need to
> touch the dwc3 core driver and would be making changesonly in qcom glue
> driver.
> 
> As mentioned above, it shouldn't be necessary to add component support to
> all the glue drivers. An API to check for deferred probing is an option,
> however there is a possible race condition: When the dwc3-qcom driver checks
> for a deferred probe the core could still be probing, in that situation the
> glue would proceed before the core driver is ready. That could be avoided
> with the component based approach.
> 
> The race can happen only if asynchronous probe is enabled, otherwise the
> child's probe happens synchronously in of_platform_populate()
> 
> I was thinking about the case where the dwc3-qcom probe is initially deferred,
> then the deferred probe starts shortly after (asynchronously) and now the
> dwc3-qcom driver does its check. Probably it's not very likely to happen ...
> 
> 
> OTOH, would the below condition suffice for our needs here? if our device
> is not bounded to a driver, we check the state of initcalls and return
> either error or -EPROBE_DEFER
> 
> diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c
> index 7b6eff5..519a503 100644
> --- a/drivers/usb/dwc3/dwc3-qcom.c
> +++ b/drivers/usb/dwc3/dwc3-qcom.c
> @@ -722,6 +722,9 @@ static int dwc3_qcom_of_register_core(struct platform_device
>  *pdev)
>                 dev_err(dev, "failed to get dwc3 platform device\n");
>         }
> 
> +       if (!qcom->dwc3->dev.driver)
> +               return driver_deferred_probe_check_state(&qcom->dwc3->dev);
> +
>  node_put:
>         of_node_put(dwc3_np);
> 
> I like the simplicity of it, no need for new APIs.
> 
> The components based approach would be slightly safer, but in practice I
> think this should be good enough.
> 
>    Hi Pavan, Mathias,
>      I have tested the suggested code and verified that it works on
>    sc7180. I see that the API has been removed recently in the following
>    patch :\
>    commit 9cbffc7a59561be950ecc675d19a3d2b45202b2b
>    Author: Saravana Kannan [1]<saravanak@google.com>
>    Date:   Wed Jun 1 00:07:05 2022 -0700
>    driver core: Delete driver_deferred_probe_check_state()
>    Can we make a patch and add it back to the kernel for this purpose ?
>    Hi Saravana,
>      Can you help suggest if we can revert your patch or make a new one to
>    add back the function.

The cover letter [1] of the 'deferred_probe_timeout logic clean up'
series [2] has more context:


  A lot of the deferred_probe_timeout logic is redundant with
  fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
  a few cases.

  This series tries to delete the redundant logic, simplify the frameworks
  that use driver_deferred_probe_check_state(), enable
  deferred_probe_timeout=10 by default, and fixes the nfsroot failure
  case.

  The overall idea of this series is to replace the global behavior of
  driver_deferred_probe_check_state() where all devices give up waiting on
  supplier at the same time with a more granular behavior:

  1. Devices with all their suppliers successfully probed by late_initcall
     probe as usual and avoid unnecessary deferred probe attempts.

  2. At or after late_initcall, in cases where boot would break because of
     fw_devlink=on being strict about the ordering, we

     a. Temporarily relax the enforcement to probe any unprobed devices
        that can probe successfully in the current state of the system.
        For example, when we boot with a NFS rootfs and no network device
        has probed.
     b. Go back to enforcing the ordering for any devices that haven't
        probed.

  3. After deferred probe timeout expires, we permanently give up waiting
     on supplier devices without drivers. At this point, whatever devices
     can probe without some of their optional suppliers end up probing.

  In the case where module support is disabled, it's fairly
  straightforward and all device probes are completed before the initcalls
  are done.

[1] https://lore.kernel.org/all/20220601070707.3946847-1-saravanak@google.com/
[2] https://patchwork.kernel.org/project/linux-pm/list/?series=646471&archive=both&state=*


Does anything speak against returning -EPROBE_DEFER directly from dwc3-qcom's
probe()? Now with deferred_probe_timeout > 0 there should be at least no endless
probing.
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index e027c04..b99d3c2 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1787,6 +1787,7 @@  static int dwc3_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, dwc);
 	dwc3_cache_hwparams(dwc);
+	device_init_wakeup(&pdev->dev, of_property_read_bool(dev->of_node, "wakeup-source"));
 
 	spin_lock_init(&dwc->lock);
 	mutex_init(&dwc->mutex);
@@ -1948,7 +1949,7 @@  static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
 		dwc3_core_exit(dwc);
 		break;
 	case DWC3_GCTL_PRTCAP_HOST:
-		if (!PMSG_IS_AUTO(msg)) {
+		if (!PMSG_IS_AUTO(msg) && !device_can_wakeup(dwc->dev)) {
 			dwc3_core_exit(dwc);
 			break;
 		}
@@ -2009,7 +2010,7 @@  static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg)
 		spin_unlock_irqrestore(&dwc->lock, flags);
 		break;
 	case DWC3_GCTL_PRTCAP_HOST:
-		if (!PMSG_IS_AUTO(msg)) {
+		if (!PMSG_IS_AUTO(msg) && !device_can_wakeup(dwc->dev)) {
 			ret = dwc3_core_init_for_resume(dwc);
 			if (ret)
 				return ret;
@@ -2086,8 +2087,6 @@  static int dwc3_runtime_suspend(struct device *dev)
 	if (ret)
 		return ret;
 
-	device_init_wakeup(dev, true);
-
 	return 0;
 }
 
@@ -2096,8 +2095,6 @@  static int dwc3_runtime_resume(struct device *dev)
 	struct dwc3     *dwc = dev_get_drvdata(dev);
 	int		ret;
 
-	device_init_wakeup(dev, false);
-
 	ret = dwc3_resume_common(dwc, PMSG_AUTO_RESUME);
 	if (ret)
 		return ret;