
PM runtime_error handling missing in many drivers?

Message ID 20220620144231.GA23345@axis.com
State New
Series: PM runtime_error handling missing in many drivers?

Commit Message

Vincent Whitchurch June 20, 2022, 2:42 p.m. UTC
Many drivers do something like the following to resume their hardware
before performing some hardware access when user space asks for it:

	ret = pm_runtime_resume_and_get(dev);
	if (ret)
		return ret;

But if the ->runtime_resume() callback fails, then the
power.runtime_error is set and any further attempts to use
pm_runtime_resume_and_get() will fail, as documented in
Documentation/power/runtime_pm.rst.
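For reference, the core behaviour in question looks roughly like this (a
simplified paraphrase of drivers/base/power/runtime.c from memory, not
the exact code):

	/* Any error from a ->runtime_suspend()/->runtime_resume() callback
	 * is recorded and stays set until something clears it: */
	static int rpm_callback(int (*cb)(struct device *), struct device *dev)
	{
		int retval = __rpm_callback(cb, dev);

		dev->power.runtime_error = retval;
		return retval != -EAGAIN && retval != -EBUSY ? retval : -EAGAIN;
	}

	/* ...and every later rpm_resume()/rpm_suspend() then bails out
	 * before ever calling back into the driver, which is where the
	 * ret=-22 (-EINVAL) further down comes from: */
	if (dev->power.runtime_error)
		retval = -EINVAL;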

This means that if a driver sees an error even once from, say, an I2C
transaction in its ->runtime_resume() callback, then the driver will
permanently stop working.  My guess is that this is *not* the behaviour
intended by driver writers.  I would expect the driver to retry the
hardware access the next time user space tries to use the device, so
that the driver is resilient against temporary failures.

I noticed this with drivers/iio/light/vcnl4000.c.

During a read of iio:device0/in_illuminance_raw, an error is injected
into the I2C transaction (the dump_stack() indicates the location) and
the I2C transaction fails:

[110190.730000][   T27] rpm_resume: 0-0009 flags-4 cnt-1  dep-0  auto-1 p-0 irq-0 child-0
[110190.730000][   T27] i2c_write: i2c-0 #0 a=009 f=0000 l=3 [00-00-00]
[110778.040000][   T27] i2c_result: i2c-0 n=1 ret=0
[110778.040000][   T27] CPU: 0 PID: 27 Comm: python3 Not tainted 5.19.0-rc3+ #71
[110778.040000][   T27] Stack:
[110778.040000][   T27]  60a27dc6 60a27dc6 688af710 61085fc0
[110778.040000][   T27]  61085f90 60a27dc6 60069bf0 00000001
[110778.040000][   T27]  688af750 60905626 609055b3 61085fe0
[110778.040000][   T27] Call Trace:
[110778.040000][   T27]  [<608d05dc>] show_stack.cold+0x166/0x2a7
[110778.040000][   T27]  [<60905626>] dump_stack_lvl+0x73/0x92
[110778.040000][   T27]  [<6090566e>] dump_stack+0x29/0x31
[110778.040000][   T27]  [<6092463e>] __i2c_transfer.cold+0x28/0x49
[110778.040000][   T27]  [<607445dd>] i2c_smbus_xfer_emulated+0x28d/0xc60
[110778.040000][   T27]  [<607452c2>] __i2c_smbus_xfer+0x312/0x8e0
[110778.040000][   T27]  [<60745a40>] i2c_smbus_xfer+0x1b0/0x2e0
[110778.040000][   T27]  [<60745e76>] i2c_smbus_write_word_data+0x46/0x60
[110778.040000][   T27]  [<68968755>] vcnl4200_set_power_state+0x45/0x160 [vcnl4000]
[110778.040000][   T27]  [<689680ee>] vcnl4000_runtime_resume+0x2e/0x40 [vcnl4000]
[110778.040000][   T27]  [<606d262f>] __rpm_callback+0x5f/0x3e0
[110778.040000][   T27]  [<606d2b44>] rpm_callback+0x194/0x1e0
[110778.040000][   T27]  [<606d48fb>] rpm_resume+0xd5b/0x11d0
[110778.040000][   T27]  [<606d5a77>] __pm_runtime_resume+0xb7/0x120
[110778.040000][   T27]  [<68969903>] vcnl4000_set_pm_runtime_state.isra.0+0x43/0x1f0 [vcnl4000]
[110778.040000][   T27]  [<68969b36>] vcnl4000_read_raw+0x86/0x250 [vcnl4000]
[110778.040000][   T27]  [<607794fd>] iio_read_channel_info+0x10d/0x130
[110778.040000][   T27]  [<60698553>] dev_attr_show+0x23/0x80
[110778.040000][   T27]  [<60441174>] sysfs_kf_seq_show+0x144/0x2d0
[110778.040000][   T27]  [<6043ce9e>] kernfs_seq_show+0x2e/0x40
[110778.040000][   T27]  [<6033fe40>] seq_read_iter+0x310/0xb20
[110778.040000][   T27]  [<6043e0ea>] kernfs_fop_read_iter+0x2da/0x520
[110778.040000][   T27]  [<602cd46e>] new_sync_read+0x1ae/0x2f0
[110778.040000][   T27]  [<602d1e84>] vfs_read+0x344/0x4a0
[110778.040000][   T27]  [<602d2885>] ksys_read+0xb5/0x270
[110778.040000][   T27]  [<602d2a63>] sys_read+0x23/0x30
[110778.040000][   T27]  [<60051aea>] handle_syscall+0x1ba/0x250
[110778.040000][   T27]  [<6006be9b>] userspace+0x3bb/0x600
[110778.040000][   T27]  [<60047c8b>] fork_handler+0xcb/0xe0
[110778.040000][   T27] rpm_idle: i2c-0 flags-5 cnt-0  dep-0  auto-1 p-0 irq-0 child-0
[110778.040000][   T27] rpm_return_int: rpm_idle+0x250/0x970:i2c-0 ret=-11
[110778.040000][   T27] rpm_return_int: rpm_resume+0x24d/0x11d0:0-0009 ret=-5

The above is OK: the read of the file naturally fails since
pm_runtime_resume_and_get() fails.  But all further reads of the file
from user space fail even before getting to any register access, due to
the behaviour described above.

[110778.050000][   T27] rpm_resume: 0-0009 flags-4 cnt-1  dep-0  auto-1 p-0 irq-0 child-0
[110778.050000][   T27] rpm_return_int: rpm_resume+0x24d/0x11d0:0-0009 ret=-22

The following patch fixes the issue on vcnl4000, but is this the right
fix?  And, unless I'm missing something, there are dozens of drivers
with the same problem.
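For the other affected drivers the equivalent workaround would be a
small wrapper along these lines (the helper name is made up, the only
point is the pm_runtime_set_suspended() call in the error path):

	static int foo_resume_and_get(struct device *dev)
	{
		int ret;

		ret = pm_runtime_resume_and_get(dev);
		if (ret < 0)
			/* Clear power.runtime_error so that the next call
			 * retries the ->runtime_resume() callback. */
			pm_runtime_set_suspended(dev);

		return ret;
	}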

Comments

Oliver Neukum June 21, 2022, 9:38 a.m. UTC | #1
On 20.06.22 16:42, Vincent Whitchurch wrote:

Hi,

> Many drivers do something like the following to resume their hardware
> before performing some hardware access when user space ask for it:
> 
> 	ret = pm_runtime_resume_and_get(dev);
> 	if (ret)
> 		return ret;
> 
> But if the ->runtime_resume() callback fails, then the
> power.runtime_error is set and any further attempts to use
> pm_runtime_resume_and_get() will fail, as documented in
> Documentation/power/runtime_pm.rst.

Whether this is properly documented is debatable.  Chapter 4,
"Runtime PM Device Helper Functions", does _not_ document it.


> [110778.050000][   T27] rpm_resume: 0-0009 flags-4 cnt-1  dep-0  auto-1 p-0 irq-0 child-0
> [110778.050000][   T27] rpm_return_int: rpm_resume+0x24d/0x11d0:0-0009 ret=-22
> 
> The following patch fixes the issue on vcnl4000, but is this the right
> fix?  And, unless I'm missing something, there are dozens of drivers
> with the same problem.

Yes. The point of pm_runtime_resume_and_get() is to remove the need
for handling errors when the resume fails. So I fail to see why a
permanent record of a failure makes sense for this API.

> diff --git a/drivers/iio/light/vcnl4000.c b/drivers/iio/light/vcnl4000.c
> index e02e92bc2928..082b8969fe2f 100644
> --- a/drivers/iio/light/vcnl4000.c
> +++ b/drivers/iio/light/vcnl4000.c
> @@ -414,6 +414,8 @@ static int vcnl4000_set_pm_runtime_state(struct vcnl4000_data *data, bool on)
>  
>  	if (on) {
>  		ret = pm_runtime_resume_and_get(dev);
> +		if (ret)
> +			pm_runtime_set_suspended(dev);
>  	} else {
>  		pm_runtime_mark_last_busy(dev);
>  		ret = pm_runtime_put_autosuspend(dev);

If you need to add this to every driver, you can just as well add it to
pm_runtime_resume_and_get() to avoid the duplication.
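Something along these lines, i.e. the current helper from
include/linux/pm_runtime.h with the error clearing folded in (an
untested sketch, ignoring that pm_runtime_set_suspended() can itself
fail):

	static inline int pm_runtime_resume_and_get(struct device *dev)
	{
		int ret;

		ret = __pm_runtime_resume(dev, RPM_GET_PUT);
		if (ret < 0) {
			/* Sketch: clear power.runtime_error so a later call
			 * retries the ->runtime_resume() callback. */
			pm_runtime_set_suspended(dev);
			pm_runtime_put_noidle(dev);
			return ret;
		}

		return 0;
	}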

But I am afraid we need to ask a deeper question. Is there a point
in recording failures to resume? The error code is reported back.
If a driver wishes to act upon it, it can. The core really only
uses the result to block new PM operations.
But nobody requests a resume unless it is necessary. Thus I fail
to see the point of checking this flag in resume as opposed to
suspend. If we fail, we fail, why not retry? It seems to me that the
record should be used only during runtime suspend.

And as an immediate band aid, some errors like ENOMEM should
never be recorded.
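Purely as illustration, that band aid might look like this in
rpm_callback() (sketch only, not a tested change):

	retval = __rpm_callback(cb, dev);
	/* Don't let a transient allocation failure permanently block
	 * further runtime PM operations. */
	if (retval != -ENOMEM)
		dev->power.runtime_error = retval;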

	Regards
		Oliver
Vincent Whitchurch July 8, 2022, 11:03 a.m. UTC | #2
On Tue, Jun 21, 2022 at 11:38:33AM +0200, Oliver Neukum wrote:
> On 20.06.22 16:42, Vincent Whitchurch wrote:
> > [110778.050000][   T27] rpm_resume: 0-0009 flags-4 cnt-1  dep-0  auto-1 p-0 irq-0 child-0
> > [110778.050000][   T27] rpm_return_int: rpm_resume+0x24d/0x11d0:0-0009 ret=-22
> > 
> > The following patch fixes the issue on vcnl4000, but is this the right
> > fix?  And, unless I'm missing something, there are dozens of drivers
> > with the same problem.
> 
> Yes. The point of pm_runtime_resume_and_get() is to remove the need
> for handling errors when the resume fails. So I fail to see why a
> permanent record of a failure makes sense for this API.

I don't understand it either.

> > diff --git a/drivers/iio/light/vcnl4000.c b/drivers/iio/light/vcnl4000.c
> > index e02e92bc2928..082b8969fe2f 100644
> > --- a/drivers/iio/light/vcnl4000.c
> > +++ b/drivers/iio/light/vcnl4000.c
> > @@ -414,6 +414,8 @@ static int vcnl4000_set_pm_runtime_state(struct vcnl4000_data *data, bool on)
> >  
> >  	if (on) {
> >  		ret = pm_runtime_resume_and_get(dev);
> > +		if (ret)
> > +			pm_runtime_set_suspended(dev);
> >  	} else {
> >  		pm_runtime_mark_last_busy(dev);
> >  		ret = pm_runtime_put_autosuspend(dev);
> 
> If you need to add this to every driver, you can just as well add it to
> pm_runtime_resume_and_get() to avoid the duplication.

Yes, the documentation says that the error should be cleared, but it's
unclear why the driver is expected to do it.  From the documentation it
looks like the driver is supposed to choose between pm_runtime_set_active()
and pm_runtime_set_suspended() to clear the error, but how/why is this
choice supposed to be made in the driver when the driver doesn't know
more than the framework about the status of the device?
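Concretely, the choice as I understand it would be something like the
following in the driver's error path (the condition is a placeholder;
knowing which branch to take is exactly the problem):

	ret = pm_runtime_resume_and_get(dev);
	if (ret < 0) {
		/* Both helpers clear power.runtime_error, but they tell the
		 * framework opposite things about the hardware state.  The
		 * condition below is hypothetical - the driver has no better
		 * idea than the framework which one is true. */
		if (device_actually_powered_on)
			pm_runtime_set_active(dev);
		else
			pm_runtime_set_suspended(dev);
		return ret;
	}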

Perhaps Rafael can shed some light on this.

> But I am afraid we need to ask a deeper question. Is there a point
> in recording failures to resume? The error code is reported back.
> If a driver wishes to act upon it, it can. The core really only
> uses the result to block new PM operations.
> But nobody requests a resume unless it is necessary. Thus I fail
> to see the point of checking this flag in resume as opposed to
> suspend. If we fail, we fail, why not retry? It seems to me that the
> record should be used only during runtime suspend.

I guess this is also a question for Rafael.

Even if the error recording is removed from runtime_resume and only done
on suspend failures, all these drivers still have the problem of not
clearing the error, since the next resume will fail if that is not done.

> And as an immediate band aid, some errors like ENOMEM should
> never be recorded.
Wysocki, Rafael J July 8, 2022, 8:10 p.m. UTC | #3
On 7/8/2022 1:03 PM, Vincent Whitchurch wrote:
> On Tue, Jun 21, 2022 at 11:38:33AM +0200, Oliver Neukum wrote:
>> On 20.06.22 16:42, Vincent Whitchurch wrote:
>>> [110778.050000][   T27] rpm_resume: 0-0009 flags-4 cnt-1  dep-0  auto-1 p-0 irq-0 child-0
>>> [110778.050000][   T27] rpm_return_int: rpm_resume+0x24d/0x11d0:0-0009 ret=-22
>>>
>>> The following patch fixes the issue on vcnl4000, but is this the right
>>> fix?  And, unless I'm missing something, there are dozens of drivers
>>> with the same problem.
>> Yes. The point of pm_runtime_resume_and_get() is to remove the need
>> for handling errors when the resume fails. So I fail to see why a
>> permanent record of a failure makes sense for this API.
> I don't understand it either.
>
>>> diff --git a/drivers/iio/light/vcnl4000.c b/drivers/iio/light/vcnl4000.c
>>> index e02e92bc2928..082b8969fe2f 100644
>>> --- a/drivers/iio/light/vcnl4000.c
>>> +++ b/drivers/iio/light/vcnl4000.c
>>> @@ -414,6 +414,8 @@ static int vcnl4000_set_pm_runtime_state(struct vcnl4000_data *data, bool on)
>>>   
>>>   	if (on) {
>>>   		ret = pm_runtime_resume_and_get(dev);
>>> +		if (ret)
>>> +			pm_runtime_set_suspended(dev);
>>>   	} else {
>>>   		pm_runtime_mark_last_busy(dev);
>>>   		ret = pm_runtime_put_autosuspend(dev);
>> If you need to add this to every driver, you can just as well add it to
>> pm_runtime_resume_and_get() to avoid the duplication.
> Yes, the documentation says that the error should be cleared, but it's
> unclear why the driver is expected to do it.  From the documentation it
> looks the driver is supposed to choose between pm_runtime_set_active()
> and pm_runtime_set_suspended() to clear the error, but how/why is this
> choice supposed to be made in the driver when the driver doesn't know
> more than the framework about the status of the device?
>
> Perhaps Rafael can shed some light on this.

The driver always knows more than the framework about the device's 
actual state.  The framework only knows that something failed, but it 
doesn't know what it was and what way it failed.


>> But I am afraid we need to ask a deeper question. Is there a point
>> in recording failures to resume? The error code is reported back.
>> If a driver wishes to act upon it, it can. The core really only
>> uses the result to block new PM operations.
>> But nobody requests a resume unless it is necessary. Thus I fail
>> to see the point of checking this flag in resume as opposed to
>> suspend. If we fail, we fail, why not retry? It seems to me that the
>> record should be used only during runtime suspend.
> I guess this is also a question for Rafael.
>
> Even if the error recording is removed from runtime_resume and only done
> on suspend failures, all these drivers still have the problem of not
> clearing the error, since the next resume will fail if that is not done.

The idea was that drivers would clear these errors.


>> And as an immediate band aid, some errors like ENOMEM should
>> never be recorded.
Oliver Neukum July 26, 2022, 9:05 a.m. UTC | #4
On 08.07.22 22:10, Rafael J. Wysocki wrote:
> On 7/8/2022 1:03 PM, Vincent Whitchurch wrote:

>> Perhaps Rafael can shed some light on this.
> 
> The driver always knows more than the framework about the device's
> actual state.  The framework only knows that something failed, but it
> doesn't know what it was and what way it failed.

Hi,

Thinking long and deeply about this, I do not think that this seemingly
obvious assertion is actually correct.

> The idea was that drivers would clear these errors.

I am afraid that is a deeply hidden layering violation. Yes, a driver's
resume() method may have failed. In that case, if that is the same
driver, it will obviously already know about the failure.

PM operations, however, are operating on a tree. A driver requesting
a resume may get an error code. But it has no idea where this error
comes from. The generic code knows at least that.

Let's look at a USB storage device. The request to resume comes
from sd.c. sd.c is certainly not equipped to handle a PCI error
condition that has prevented a USB host controller from resuming.

I am afraid this part of the API has issues. And they keep growing
the more we divorce the device driver from the bus driver, which
actually does the PM operation.

	Regards
		Oliver
Rafael J. Wysocki July 26, 2022, 3:41 p.m. UTC | #5
On Tue, Jul 26, 2022 at 11:05 AM Oliver Neukum <oneukum@suse.com> wrote:
>
>
>
> On 08.07.22 22:10, Rafael J. Wysocki wrote:
> > On 7/8/2022 1:03 PM, Vincent Whitchurch wrote:
>
> >> Perhaps Rafael can shed some light on this.
> >
> > The driver always knows more than the framework about the device's
> > actual state.  The framework only knows that something failed, but it
> > doesn't know what it was and what way it failed.
>
> Hi,
>
> thinking long and deeply about this I do not think that this seemingly
> obvious assertion is actually correct.

I guess that depends on what is regarded as "the framework".  I mean
the PM-runtime code, excluding the bus type or equivalent.

> > The idea was that drivers would clear these errors.
>
> I am afraid that is a deeply hidden layering violation. Yes, a driver's
> resume() method may have failed. In that case, if that is the same
> driver, it will obviously already know about the failure.

So presumably it will do something to recover and avoid returning the
error in the first place.
Oliver Neukum July 27, 2022, 8:08 a.m. UTC | #6
On 26.07.22 17:41, Rafael J. Wysocki wrote:
> On Tue, Jul 26, 2022 at 11:05 AM Oliver Neukum <oneukum@suse.com> wrote:

> I guess that depends on what is regarded as "the framework".  I mean
> the PM-runtime code, excluding the bus type or equivalent.

Yes, we have multiple candidates in the generic case. Easy to overengineer.

>>> The idea was that drivers would clear these errors.
>>
>> I am afraid that is a deeply hidden layering violation. Yes, a driver's
>> resume() method may have failed. In that case, if that is the same
>> driver, it will obviously already know about the failure.
> 
> So presumably it will do something to recover and avoid returning the
> error in the first place.

Yes, but that does not help us if they do return an error.

> From the PM-runtime core code perspective, if an error is returned by
> a suspend callback and it is not -EBUSY or -EAGAIN, the subsequent
> suspend is also likely to fail.

True.

> If a resume callback returns an error, any subsequent suspend or
> resume operations are likely to fail.

Also true, but the consequences are different.

> Storing the error effectively prevents subsequent operations from
> being carried out in both cases and that's why it is done.

I am afraid seeing these two operations as equivalent for this
purpose is a problem for two reasons:

1. suspend can be initiated by the generic framework
2. a failure to suspend leads to worse power consumption,
   while a failure to resume is -EIO, at best

>> PM operations, however, are operating on a tree. A driver requesting
>> a resume may get an error code. But it has no idea where this error
>> comes from. The generic code knows at least that.
> 
> Well, what do you mean by "the generic code"?

In this case the device model, which has the tree and all dependencies.
Error handling here is potentially very complicated because

1. a driver can experience an error from a node higher in the tree
2. a driver can trigger a failure in a sibling
3. a driver for a node can be less specific than the drivers higher up

Reducing this to a single error condition is difficult.
Suppose you have a USB device with two interfaces. The driver for A
initiates a resume. Interface A is resumed; B reports an error.
Should this block further attempts to suspend the whole device?

>> Let's look at a USB storage device. The request to resume comes
>> from sd.c. sd.c is certainly not equipped to handle a PCI error
>> condition that has prevented a USB host controller from resuming.
> 
> Sure, but this doesn't mean that suspending or resuming the device is
> a good idea until the error condition gets resolved.

Suspending clearly yes. Resuming is another matter. It has to work
if you want to operate without errors.

>> I am afraid this part of the API has issues. And they keep growing
>> the more we divorce the device driver from the bus driver, which
>> actually does the PM operation.
> 
> Well, in general suspending or resuming a device is a collaborative
> effort and if one of the pieces falls over, making it work again
> involves fixing up the failing piece and notifying the others that it
> is ready again.  However, that part isn't covered and I'm not sure if
> it can be covered in a sufficiently generic way.

True. But that still cannot solve the question of what is to be done
if error handling fails. Hence my proposal:
- record all failures
- heed the record only when suspending

	Regards
		Oliver
Rafael J. Wysocki July 27, 2022, 4:31 p.m. UTC | #7
On Wed, Jul 27, 2022 at 10:08 AM Oliver Neukum <oneukum@suse.com> wrote:
>
>
>
> On 26.07.22 17:41, Rafael J. Wysocki wrote:
> > On Tue, Jul 26, 2022 at 11:05 AM Oliver Neukum <oneukum@suse.com> wrote:
>
> > I guess that depends on what is regarded as "the framework".  I mean
> > the PM-runtime code, excluding the bus type or equivalent.
>
> Yes, we have multiple candidates in the generic case. Easy to overengineer.
>
> >>> The idea was that drivers would clear these errors.
> >>
> >> I am afraid that is a deeply hidden layering violation. Yes, a driver's
> >> resume() method may have failed. In that case, if that is the same
> >> driver, it will obviously already know about the failure.
> >
> > So presumably it will do something to recover and avoid returning the
> > error in the first place.
>
> Yes, but that does not help us if they do return an error.
>
> > From the PM-runtime core code perspective, if an error is returned by
> > a suspend callback and it is not -EBUSY or -EAGAIN, the subsequent
> > suspend is also likely to fail.
>
> True.
>
> > If a resume callback returns an error, any subsequent suspend or
> > resume operations are likely to fail.
>
> Also true, but the consequences are different.
>
> > Storing the error effectively prevents subsequent operations from
> > being carried out in both cases and that's why it is done.
>
> I am afraid seeing these two operations as equivalent for this
> purpose is a problem for two reasons:
>
> 1. suspend can be initiated by the generic framework

Resume can be initiated by generic code too.

> 2. a failure to suspend leads to worse power consumption,
>    while a failure to resume is -EIO, at best

Yes, a failure to resume is a big deal.

> >> PM operations, however, are operating on a tree. A driver requesting
> >> a resume may get an error code. But it has no idea where this error
> >> comes from. The generic code knows at least that.
> >
> > Well, what do you mean by "the generic code"?
>
> In this case the device model, which has the tree and all dependencies.
> Error handling here is potentially very complicated because
>
> 1. a driver can experience an error from a node higher in the tree

Well, there can be an error coming from a parent or a supplier, but
the driver will not receive it directly.

> 2. a driver can trigger a failure in a sibling
> 3. a driver for a node can be less specific than the drivers higher up

I'm not sure I understand the above correctly.

> Reducing this to a single error condition is difficult.

Fair enough.

> Suppose you have a USB device with two interfaces. The driver for A
> initiates a resume. Interface A is resumed; B reports an error.
> Should this block further attempts to suspend the whole device?

It should IMV.

> >> Let's look at a USB storage device. The request to resume comes
> >> from sd.c. sd.c is certainly not equipped to handle a PCI error
> >> condition that has prevented a USB host controller from resuming.
> >
> > Sure, but this doesn't mean that suspending or resuming the device is
> > a good idea until the error condition gets resolved.
>
> Suspending clearly yes. Resuming is another matter. It has to work
> if you want to operate without errors.

Well, it has to physically work in the first place.  If it doesn't,
the rest is a bit moot, because you end up with a non-functional
device that appears to be permanently suspended.

> >> I am afraid this part of the API has issues. And they keep growing
> >> the more we divorce the device driver from the bus driver, which
> >> actually does the PM operation.
> >
> > Well, in general suspending or resuming a device is a collaborative
> > effort and if one of the pieces falls over, making it work again
> > involves fixing up the failing piece and notifying the others that it
> > is ready again.  However, that part isn't covered and I'm not sure if
> > it can be covered in a sufficiently generic way.
>
> True. But that still cannot solve the question what is to be done
> if error handling fails. Hence my proposal:
> - record all failures
> - heed the record only when suspending

I guess that would boil down to moving the power.runtime_error update
from rpm_callback() to rpm_suspend()?
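In code terms, such a move might look roughly like the following
(an untested sketch for illustration, not a proposal actually posted in
this thread):

	static int rpm_callback(int (*cb)(struct device *), struct device *dev)
	{
		int retval = __rpm_callback(cb, dev);

		/* No longer record the error here, so a failed resume does
		 * not block later resume attempts:
		 *	dev->power.runtime_error = retval;
		 */
		return retval != -EAGAIN && retval != -EBUSY ? retval : -EAGAIN;
	}

	/* ...while rpm_suspend() would record it after its callback fails
	 * (modulo the existing special-casing of -EAGAIN/-EBUSY): */
	retval = rpm_callback(callback, dev);
	if (retval)
		dev->power.runtime_error = retval;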

Patch

diff --git a/drivers/iio/light/vcnl4000.c b/drivers/iio/light/vcnl4000.c
index e02e92bc2928..082b8969fe2f 100644
--- a/drivers/iio/light/vcnl4000.c
+++ b/drivers/iio/light/vcnl4000.c
@@ -414,6 +414,8 @@  static int vcnl4000_set_pm_runtime_state(struct vcnl4000_data *data, bool on)
 
 	if (on) {
 		ret = pm_runtime_resume_and_get(dev);
+		if (ret)
+			pm_runtime_set_suspended(dev);
 	} else {
 		pm_runtime_mark_last_busy(dev);
 		ret = pm_runtime_put_autosuspend(dev);