[v1,0/3] introduce priority-based shutdown support

Message ID	20231124145338.3112416-1-o.rempel@pengutronix.de
Headers	show Authentication-Results: smtp.subspace.kernel.org; dkim=none From: Oleksij Rempel <o.rempel@pengutronix.de> To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, "Rafael J. Wysocki" <rafael@kernel.org>, Ulf Hansson <ulf.hansson@linaro.org>, Mark Brown <broonie@kernel.org> Cc: Oleksij Rempel <o.rempel@pengutronix.de>, kernel@pengutronix.de, linux-kernel@vger.kernel.org, linux-mmc@vger.kernel.org, linux-pm@vger.kernel.org, =?utf-8?q?S=C3=B8ren_Andersen?= <san@skov.dk> Subject: [PATCH v1 0/3] introduce priority-based shutdown support Date: Fri, 24 Nov 2023 15:53:35 +0100 Message-Id: <20231124145338.3112416-1-o.rempel@pengutronix.de> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	introduce priority-based shutdown support \| expand [v1,0/3] introduce priority-based shutdown support [v1,1/3] driver core: move core part of device_shutdown() to a separate function [v1,2/3] driver core: introduce prioritized device shutdown sequence [v1,3/3] mmc: core: increase shutdown priority for MMC devices

Oleksij Rempel Nov. 24, 2023, 2:53 p.m. UTC

Hi,

This patch series introduces support for prioritized device shutdown.
The main goal is to enable prioritization for shutting down specific
devices, particularly crucial in scenarios like power loss where
hardware damage can occur if not handled properly.

Oleksij Rempel (3):
  driver core: move core part of device_shutdown() to a separate
    function
  driver core: introduce prioritized device shutdown sequence
  mmc: core: increase shutdown priority for MMC devices

 drivers/base/core.c    | 157 +++++++++++++++++++++++++++--------------
 drivers/mmc/core/bus.c |   2 +
 include/linux/device.h |  51 ++++++++++++-
 kernel/reboot.c        |   4 +-
 4 files changed, 157 insertions(+), 57 deletions(-)

Christian Loehle Nov. 27, 2023, 10:13 a.m. UTC | #1

On 25/11/2023 08:50, Oleksij Rempel wrote:
> On Sat, Nov 25, 2023 at 06:51:55AM +0000, Greg Kroah-Hartman wrote:
>> On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote:
>>> On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote:
>>>> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
>>>>> On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
>>>>>> On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
>>>>>>> On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
>>>>>>>> On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
>>>>>>>
>>>>>>>>> This came out of some discussions about trying to handle emergency power
>>>>>>>>> failure notifications.
>>>>>>>
>>>>>>>> I'm sorry, but I don't know what that means.  Are you saying that the
>>>>>>>> kernel is now going to try to provide a hard guarantee that some devices
>>>>>>>> are going to be shut down in X number of seconds when asked?  If so, why
>>>>>>>> not do this in userspace?
>>>>>>>
>>>>>>> No, it was initially (or when I initially saw it anyway) handling of
>>>>>>> notifications from regulators that they're in trouble and we have some
>>>>>>> small amount of time to do anything we might want to do about it before
>>>>>>> we expire.
>>>>>>
>>>>>> So we are going to guarantee a "time" in which we are going to do
>>>>>> something?  Again, if that's required, why not do it in userspace using
>>>>>> a RT kernel?
>>>>>
>>>>> For the HW in question I have only 100ms time before power loss. By
>>>>> doing it over use space some we will have even less time to react.
>>>>
>>>> Why can't userspace react that fast?  Why will the kernel be somehow
>>>> faster?  Speed should be the same, just get the "power is cut" signal
>>>> and have userspace flush and unmount the disk before power is gone.  Why
>>>> can the kernel do this any differently?
>>>>
>>>>> In fact, this is not a new requirement. It exist on different flavors of
>>>>> automotive Linux for about 10 years. Linux in cars should be able to
>>>>> handle voltage drops for example on ignition and so on. The only new thing is
>>>>> the attempt to mainline it.
>>>>
>>>> But your patch is not guaranteeing anything, it's just doing a "I want
>>>> this done before the other devices are handled", that's it.  There is no
>>>> chance that 100ms is going to be a requirement, or that some other
>>>> device type is not going to come along and demand to be ahead of your
>>>> device in the list.
>>>>
>>>> So you are going to have a constant fight among device types over the
>>>> years, and people complaining that the kernel is now somehow going to
>>>> guarantee that a device is shutdown in a set amount of time, which
>>>> again, the kernel can not guarantee here.
>>>>
>>>> This might work as a one-off for a specific hardware platform, which is
>>>> odd, but not anything you really should be adding for anyone else to use
>>>> here as your reasoning for it does not reflect what the code does.
>>>
>>> I see. Good point.
>>>
>>> In my case umount is not needed, there is not enough time to write down
>>> the data. We should send a shutdown command to the eMMC ASAP.
>>
>> If you don't care about the data, why is a shutdown command to the
>> hardware needed?  What does that do that makes anything "safe" if your
>> data is lost.
> 
> It prevents HW damage. In a typical automotive under-voltage labor it is
> usually possible to reproduce X amount of bricked eMMCs or NANDs on Y
> amount of under-voltage cycles (I do not have exact numbers right now).
> Even if the numbers not so high in the labor tests (sometimes something
> like one bricked device in a month of tests), the field returns are
> significant enough to care about software solution for this problem.
> 
> Same problem was seen not only in automotive devices, but also in
> industrial or agricultural. With other words, it is important enough to bring
> some kind of solution mainline.
> 

IMO that is a serious problem with the used storage / eMMC in that case and it
is not suitable for industrial/automotive uses?
Any industrial/automotive-suitable storage device should detect under-voltage and
just treat it as a power-down/loss, and while that isn't nice for the storage device,
it really shouldn't be able to brick a device (within <1M cycles anyway).
What does the storage module vendor say about this?

BR,
Christian

Matti Vaittinen Nov. 27, 2023, 12:54 p.m. UTC | #2

pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman
(gregkh@linuxfoundation.org) kirjoitti:
>
> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
> > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> > > >
> > > > > > This came out of some discussions about trying to handle emergency power
> > > > > > failure notifications.
> > > >
> > > > > I'm sorry, but I don't know what that means.  Are you saying that the
> > > > > kernel is now going to try to provide a hard guarantee that some devices
> > > > > are going to be shut down in X number of seconds when asked?  If so, why
> > > > > not do this in userspace?
> > > >
> > > > No, it was initially (or when I initially saw it anyway) handling of
> > > > notifications from regulators that they're in trouble and we have some
> > > > small amount of time to do anything we might want to do about it before
> > > > we expire.
> > >
> > > So we are going to guarantee a "time" in which we are going to do
> > > something?  Again, if that's required, why not do it in userspace using
> > > a RT kernel?
> >
> > For the HW in question I have only 100ms time before power loss. By
> > doing it over use space some we will have even less time to react.
>
> Why can't userspace react that fast?  Why will the kernel be somehow
> faster?  Speed should be the same, just get the "power is cut" signal
> and have userspace flush and unmount the disk before power is gone.  Why
> can the kernel do this any differently?
>
> > In fact, this is not a new requirement. It exist on different flavors of
> > automotive Linux for about 10 years. Linux in cars should be able to
> > handle voltage drops for example on ignition and so on. The only new thing is
> > the attempt to mainline it.
>
> But your patch is not guaranteeing anything, it's just doing a "I want
> this done before the other devices are handled", that's it.  There is no
> chance that 100ms is going to be a requirement, or that some other
> device type is not going to come along and demand to be ahead of your
> device in the list.
>
> So you are going to have a constant fight among device types over the
> years, and people complaining that the kernel is now somehow going to
> guarantee that a device is shutdown in a set amount of time, which
> again, the kernel can not guarantee here.
>
> This might work as a one-off for a specific hardware platform, which is
> odd, but not anything you really should be adding for anyone else to use
> here as your reasoning for it does not reflect what the code does.

I was (am) interested in knowing how/where the regulator error
notifications are utilized - hence I asked this in ELCE last summer.
Replies indeed mostly pointed to automotive and handling the under
voltage events.

As to what has changed (I think this was asked in another mail on this
topic) - I understood from the discussions that the demand of running
systems with as low power as possible is even more
important/desirable. Hence, the under-voltage events are more usual
than they were when cars used to be working by burning flammable
liquids :)

Anyways, what I thought I'd comment on is that the severity of the
regulator error notifications can be given from device-tree. Rationale
behind this is that figuring out whether a certain detected problem is
fatal or not (in embedded systems) should be done by the board
designers, per board. Maybe the understanding which hardware should
react first is also a property of hardware and could come from the
device-tree? Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE"
set unconditionally for EMMC, systems could set shutdown priority per
board and per device explicitly using device-tree?

Yours,
    -- Matti

Greg Kroah-Hartman Nov. 27, 2023, 1:08 p.m. UTC | #3

On Mon, Nov 27, 2023 at 02:54:21PM +0200, Matti Vaittinen wrote:
> pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman
> (gregkh@linuxfoundation.org) kirjoitti:
> >
> > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
> > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
> > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
> > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
> > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
> > > > >
> > > > > > > This came out of some discussions about trying to handle emergency power
> > > > > > > failure notifications.
> > > > >
> > > > > > I'm sorry, but I don't know what that means.  Are you saying that the
> > > > > > kernel is now going to try to provide a hard guarantee that some devices
> > > > > > are going to be shut down in X number of seconds when asked?  If so, why
> > > > > > not do this in userspace?
> > > > >
> > > > > No, it was initially (or when I initially saw it anyway) handling of
> > > > > notifications from regulators that they're in trouble and we have some
> > > > > small amount of time to do anything we might want to do about it before
> > > > > we expire.
> > > >
> > > > So we are going to guarantee a "time" in which we are going to do
> > > > something?  Again, if that's required, why not do it in userspace using
> > > > a RT kernel?
> > >
> > > For the HW in question I have only 100ms time before power loss. By
> > > doing it over use space some we will have even less time to react.
> >
> > Why can't userspace react that fast?  Why will the kernel be somehow
> > faster?  Speed should be the same, just get the "power is cut" signal
> > and have userspace flush and unmount the disk before power is gone.  Why
> > can the kernel do this any differently?
> >
> > > In fact, this is not a new requirement. It exist on different flavors of
> > > automotive Linux for about 10 years. Linux in cars should be able to
> > > handle voltage drops for example on ignition and so on. The only new thing is
> > > the attempt to mainline it.
> >
> > But your patch is not guaranteeing anything, it's just doing a "I want
> > this done before the other devices are handled", that's it.  There is no
> > chance that 100ms is going to be a requirement, or that some other
> > device type is not going to come along and demand to be ahead of your
> > device in the list.
> >
> > So you are going to have a constant fight among device types over the
> > years, and people complaining that the kernel is now somehow going to
> > guarantee that a device is shutdown in a set amount of time, which
> > again, the kernel can not guarantee here.
> >
> > This might work as a one-off for a specific hardware platform, which is
> > odd, but not anything you really should be adding for anyone else to use
> > here as your reasoning for it does not reflect what the code does.
> 
> I was (am) interested in knowing how/where the regulator error
> notifications are utilized - hence I asked this in ELCE last summer.
> Replies indeed mostly pointed to automotive and handling the under
> voltage events.
> 
> As to what has changed (I think this was asked in another mail on this
> topic) - I understood from the discussions that the demand of running
> systems with as low power as possible is even more
> important/desirable. Hence, the under-voltage events are more usual
> than they were when cars used to be working by burning flammable
> liquids :)
> 
> Anyways, what I thought I'd comment on is that the severity of the
> regulator error notifications can be given from device-tree. Rationale
> behind this is that figuring out whether a certain detected problem is
> fatal or not (in embedded systems) should be done by the board
> designers, per board. Maybe the understanding which hardware should
> react first is also a property of hardware and could come from the
> device-tree? Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE"
> set unconditionally for EMMC, systems could set shutdown priority per
> board and per device explicitly using device-tree?

Yes, using device tree would be good, but now you have created something
that is device-tree-specific and not all the world is device tree :(

Also, many devices are finally moving out to non-device-tree busses,
like PCI and USB, so how would you handle them in this type of scheme?

thanks,

greg k-h

Mark Brown Nov. 27, 2023, 2:24 p.m. UTC | #4

On Mon, Nov 27, 2023 at 01:08:24PM +0000, Greg Kroah-Hartman wrote:

> Yes, using device tree would be good, but now you have created something
> that is device-tree-specific and not all the world is device tree :(

AFAICT the idiomatic thing for ACPI would be platform quirks based on
DMI information.  Yay ACPI.  If the system is more Linux targetted then
you can use _DSD properties to store DT properties, these can then be
parsed out in a firmware interface neutral way via the fwnode API.  I'm
not sure there's any avoiding dealing with firmware interface specifics
at some point if we need platform description.

> Also, many devices are finally moving out to non-device-tree busses,
> like PCI and USB, so how would you handle them in this type of scheme?

DT does have bindings for devices on discoverable buses like PCI - I
think the original thing was for vendors cheaping out on EEPROMs though
it's also useful when things are soldered down in embedded systems.

Matti Vaittinen Nov. 27, 2023, 2:49 p.m. UTC | #5

On 11/27/23 15:08, Greg Kroah-Hartman wrote:
> On Mon, Nov 27, 2023 at 02:54:21PM +0200, Matti Vaittinen wrote:
>> pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman
>> (gregkh@linuxfoundation.org) kirjoitti:
>>>
>>> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote:
>>>> On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote:
>>>>> On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote:
>>>>>> On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote:
>>>>>>> On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote:
>>>>>>
>>>>>>>> This came out of some discussions about trying to handle emergency power
>>>>>>>> failure notifications.
>>>>>>
>>>>>>> I'm sorry, but I don't know what that means.  Are you saying that the
>>>>>>> kernel is now going to try to provide a hard guarantee that some devices
>>>>>>> are going to be shut down in X number of seconds when asked?  If so, why
>>>>>>> not do this in userspace?
>>>>>>
>>>>>> No, it was initially (or when I initially saw it anyway) handling of
>>>>>> notifications from regulators that they're in trouble and we have some
>>>>>> small amount of time to do anything we might want to do about it before
>>>>>> we expire.
>>>>>
>>>>> So we are going to guarantee a "time" in which we are going to do
>>>>> something?  Again, if that's required, why not do it in userspace using
>>>>> a RT kernel?
>>>>
>>>> For the HW in question I have only 100ms time before power loss. By
>>>> doing it over use space some we will have even less time to react.
>>>
>>> Why can't userspace react that fast?  Why will the kernel be somehow
>>> faster?  Speed should be the same, just get the "power is cut" signal
>>> and have userspace flush and unmount the disk before power is gone.  Why
>>> can the kernel do this any differently?
>>>
>>>> In fact, this is not a new requirement. It exist on different flavors of
>>>> automotive Linux for about 10 years. Linux in cars should be able to
>>>> handle voltage drops for example on ignition and so on. The only new thing is
>>>> the attempt to mainline it.
>>>
>>> But your patch is not guaranteeing anything, it's just doing a "I want
>>> this done before the other devices are handled", that's it.  There is no
>>> chance that 100ms is going to be a requirement, or that some other
>>> device type is not going to come along and demand to be ahead of your
>>> device in the list.
>>>
>>> So you are going to have a constant fight among device types over the
>>> years, and people complaining that the kernel is now somehow going to
>>> guarantee that a device is shutdown in a set amount of time, which
>>> again, the kernel can not guarantee here.
>>>
>>> This might work as a one-off for a specific hardware platform, which is
>>> odd, but not anything you really should be adding for anyone else to use
>>> here as your reasoning for it does not reflect what the code does.
>>
>> I was (am) interested in knowing how/where the regulator error
>> notifications are utilized - hence I asked this in ELCE last summer.
>> Replies indeed mostly pointed to automotive and handling the under
>> voltage events.
>>
>> As to what has changed (I think this was asked in another mail on this
>> topic) - I understood from the discussions that the demand of running
>> systems with as low power as possible is even more
>> important/desirable. Hence, the under-voltage events are more usual
>> than they were when cars used to be working by burning flammable
>> liquids :)
>>
>> Anyways, what I thought I'd comment on is that the severity of the
>> regulator error notifications can be given from device-tree. Rationale
>> behind this is that figuring out whether a certain detected problem is
>> fatal or not (in embedded systems) should be done by the board
>> designers, per board. Maybe the understanding which hardware should
>> react first is also a property of hardware and could come from the
>> device-tree? Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE"
>> set unconditionally for EMMC, systems could set shutdown priority per
>> board and per device explicitly using device-tree?
> 
> Yes, using device tree would be good, but now you have created something
> that is device-tree-specific and not all the world is device tree :(

True. However, my understanding is that the regulator subsystem is 
largely written to work with DT-based systems. Hence supporting the 
DT-based solution would probably fit to this specific use-case as source 
of problem notifications is the regulator subsystem.

> Also, many devices are finally moving out to non-device-tree busses,
> like PCI and USB, so how would you handle them in this type of scheme?

I do readily admit I don't have [all ;) ] the answers. I also think that 
if we add support for prioritized shutdown on device-tree-based systems, 
people may eventually want to use this on non device-tree setups too. 
There may also be other use-cases for prioritized shutdown (Don't know 
what they would be though).

For now I would leave that to be the problem of the folks who need non 
device-tree systems when (if) this needs realizes. Assuming there was 
the handling of priorities in place, the missing piece would then be to 
find out the place to store this hardware specific priority information. 
If this is solved for the non DT cases, then the DT-based and non 
DT-based solutions can co-exist.

Just a suggestion though. I am not working on under-voltage "stuff" 
right now.

Yours,
	-- Matti

Ulf Hansson Nov. 30, 2023, 9:57 a.m. UTC | #6

On Fri, 24 Nov 2023 at 15:53, Oleksij Rempel <o.rempel@pengutronix.de> wrote:
>
> Hi,
>
> This patch series introduces support for prioritized device shutdown.
> The main goal is to enable prioritization for shutting down specific
> devices, particularly crucial in scenarios like power loss where
> hardware damage can occur if not handled properly.
>
> Oleksij Rempel (3):
>   driver core: move core part of device_shutdown() to a separate
>     function
>   driver core: introduce prioritized device shutdown sequence
>   mmc: core: increase shutdown priority for MMC devices
>
>  drivers/base/core.c    | 157 +++++++++++++++++++++++++++--------------
>  drivers/mmc/core/bus.c |   2 +
>  include/linux/device.h |  51 ++++++++++++-
>  kernel/reboot.c        |   4 +-
>  4 files changed, 157 insertions(+), 57 deletions(-)
>

Sorry for joining the discussions a bit late! Besides the valuable
feedback that you already received from others (which indicates that
we have quite some work to do in the commit messages to better explain
and justify these changes), I wanted to share my overall thoughts
around this.

So, I fully understand the reason behind the $subject series, as we
unfortunately can't rely on flash-based (NAND/NOR) storage devices
being 100% tolerant to sudden-power failures. Besides for the reasons
already discussed in the thread, the robustness simply depends on the
"quality" of the FTL (flash translation layer) and the NAND/NOR/etc
device it runs.

For example, back in the days when Android showed up, we were testing
YAFFS and UBIFS on rawNAND, which failed miserably after just a few
thousands of power-cycles. It was even worse with ext3/4 on the early
variants of eMMC devices, as those survived only a few hundreds of
power-cycles. Now, I assume this has improved a lot over the years,
but I haven't really verified this myself.

That said, for eMMC and other flash-based storage devices, industrial
or not, I think it would make sense to try to notify the device about
the power-failure, if possible. This would add another level of
mitigation, I think.

Francesco Dolcini Nov. 30, 2023, 9:59 p.m. UTC | #7

Hello all,

On Mon, Nov 27, 2023 at 12:36:11PM +0100, Oleksij Rempel wrote:
> On Mon, Nov 27, 2023 at 10:13:49AM +0000, Christian Loehle wrote:
> > > Same problem was seen not only in automotive devices, but also in
> > > industrial or agricultural. With other words, it is important enough to bring
> > > some kind of solution mainline.
> > > 
> > 
> > IMO that is a serious problem with the used storage / eMMC in that case and it
> > is not suitable for industrial/automotive uses?
> > Any industrial/automotive-suitable storage device should detect under-voltage and
> > just treat it as a power-down/loss, and while that isn't nice for the storage device,
> > it really shouldn't be able to brick a device (within <1M cycles anyway).
> > What does the storage module vendor say about this?
> 
> Good question. I do not have insights ATM. I'll forward it.

[v1,0/3] introduce priority-based shutdown support

Message

Comments