diff mbox series

scsi: sd: stop SSD (non-rotational) disks before reboot

Message ID 499138c8-b6d5-ef4a-2780-4f750ed337d3@0882a8b5-c6c3-11e9-b005-00805fc181fe
State New
Headers show
Series scsi: sd: stop SSD (non-rotational) disks before reboot | expand

Commit Message

Simon Arlott June 17, 2020, 6:49 p.m. UTC
I need to use "reboot=p" on my desktop because one of the PCIe devices
does not appear after a warm boot. This results in a very cold boot
because the BIOS turns the PSU off and on.

The scsi sd shutdown process does not send a stop command to disks
before the reboot happens (stop commands are only sent for a shutdown).

The result is that all of my SSDs experience a sudden power loss on
every reboot, which is undesirable behaviour. These events are recorded
in the SMART attributes.

Avoiding a stop of the disk on a reboot is appropriate for HDDs because
they're likely to continue to be powered (and should not be told to spin
down only to spin up again) but the default behaviour for SSDs should
be changed to stop them before the reboot.

Add a "stop_before_reboot" module parameter that can be used to control
the shutdown behaviour of disks before a reboot. The default will be
to stop non-rotational disks (SSDs) only, but it can be configured to
stop all disks if it is known that power will be lost completely on a
reboot.

  sd_mod.stop_before_reboot=<integer>
    0 = never
    1 = non-rotational disks only (default)
    2 = all disks

The behaviour on shutdown is unchanged: all disks are unconditionally
stopped.

The disk I/O will be mostly quiescent at reboot time (and there is a
sync first) but this should be added to stable kernels to protect all
SSDs from unexpected power loss during a reboot by default. There is
the potential for an unexpected power loss to corrupt data depending
on the SSD model/firmware.

Cc: stable@vger.kernel.org
Signed-off-by: Simon Arlott <simon@octiron.net>
---
 Documentation/scsi/scsi-parameters.rst |  7 +++++++
 drivers/scsi/sd.c                      | 22 +++++++++++++++++++---
 2 files changed, 26 insertions(+), 3 deletions(-)

Comments

David Laight July 3, 2020, 2:13 p.m. UTC | #1
From: Pavel Machek

> Sent: 02 July 2020 22:17

> > > during a FLASH write or erase can cause from weakened cells, to much

> > > larger damage.  It is possible to harden the chip or the design against

> > > this, but it is *expensive*.  And even if warded off by hardening and no

> > > FLASH damage happens, an erase/program cycle must be done on the whole

> > > erase block to clean up the incomplete program cycle.

> >

> > It should have been SSD's(including FW) responsibility to avoid data loss when

> > the SSD is doing its own BG writing, because power cut can happen any time

> > from SSD's viewpoint.

> 

> It should be their responsibility. But we know how well that works

> (not well), so we try hard (and should try hard) to power SSDs down

> cleanly.


I hope modern SSD disks are better than very old CF drives.

I had one where the entire contents got scrambled after an unexpected
power removal.
I suspect it was in the middle of a 'wear levelling' activity.
Even though it was only a FAT filesystem I was glad I didn't
actually need to recover any of the data.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Pavel Machek July 4, 2020, 11:49 a.m. UTC | #2
Hi!

> > Sent: 02 July 2020 22:17

> > > > during a FLASH write or erase can cause from weakened cells, to much

> > > > larger damage.  It is possible to harden the chip or the design against

> > > > this, but it is *expensive*.  And even if warded off by hardening and no

> > > > FLASH damage happens, an erase/program cycle must be done on the whole

> > > > erase block to clean up the incomplete program cycle.

> > >

> > > It should have been SSD's(including FW) responsibility to avoid data loss when

> > > the SSD is doing its own BG writing, because power cut can happen any time

> > > from SSD's viewpoint.

> > 

> > It should be their responsibility. But we know how well that works

> > (not well), so we try hard (and should try hard) to power SSDs down

> > cleanly.

> 

> I hope modern SSD disks are better than very old CF drives.


Testing showed there were not few yars ago.

> I had one where the entire contents got scrambled after an unexpected

> power removal.


If you have SSD you are willing to kill, I believe you can get to same
result with a bit of patience.

Best regards,
								Pavel-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Henrique de Moraes Holschuh July 5, 2020, 9:31 p.m. UTC | #3
On Thu, 18 Jun 2020, Christoph Hellwig wrote:
> > For SSDs, I don't think an extra stop should ever be an issue.

> 

> Extra shutdowns will usually cause additional P/E cycles.


I am not so sure.  We're talking about enforcing clean shutdowns here
(from the SSD PoV).

A system reboot takes enough time that the SSD is likely to do about the
same amount of P cycles commiting to FLASH any important data that it
would trigger by a shutdown sequence, simply because it should not keep
important data in RAM for too long.  By extension, it would not increase
E cycles either.

OTOH, unclean shutdowns *always* cause extra P/E, and that's if you're
lucky enough for it to not cause anything much worse.

-- 
  Henrique Holschuh
Henrique de Moraes Holschuh July 5, 2020, 10:19 p.m. UTC | #4
On Tue, 30 Jun 2020, Ming Lei wrote:
> On Wed, Jun 24, 2020 at 5:01 AM Henrique de Moraes Holschuh

> <hmh@hmh.eng.br> wrote:

> > Cache flushes do not matter that much when SSDs and sudden power cuts

> > are involved.  Power cuts at the wrong time harm the FLASH itself, it is

> > not about still-in-flight data.

> >

> > Keep in mind that SSDs do a _lot_ of background writing, and power cuts

> 

> What is the __lot__ of SSD's BG writing? GC?


GC, and scrubbing.

> > during a FLASH write or erase can cause from weakened cells, to much

> > larger damage.  It is possible to harden the chip or the design against

> > this, but it is *expensive*.  And even if warded off by hardening and no

> > FLASH damage happens, an erase/program cycle must be done on the whole

> > erase block to clean up the incomplete program cycle.

> 

> It should have been SSD's(including FW) responsibility to avoid data loss when

> the SSD is doing its own BG writing, because power cut can happen any time

> from SSD's viewpoint.


Oh, I fully agree.  And yet, we had devices from several large vendors
complaining about unclean shutdowns.  So, "it should have been", as
usual, amounts to very little in the end.

> > When you do not follow these rules, well, excellent datacenter-class

> > SSDs have super-capacitor power banks that actually work.  Most SSDs do

> > not, although they hopefully came a long way and hopefully modern SSDs

> > are not as easily to brick as they were reported to be three or four

> > years ago.

> 

> I remember that DC SSDs often don't support BG GC.


And have proper supercap local power banks, etc.  I'd say they're not
really relevant to this thread.

-- 
  Henrique Holschuh
Christoph Hellwig July 7, 2020, 10:18 a.m. UTC | #5
On Sun, Jul 05, 2020 at 06:31:25PM -0300, Henrique de Moraes Holschuh wrote:
> On Thu, 18 Jun 2020, Christoph Hellwig wrote:

> > > For SSDs, I don't think an extra stop should ever be an issue.

> > 

> > Extra shutdowns will usually cause additional P/E cycles.

> 

> I am not so sure.  We're talking about enforcing clean shutdowns here

> (from the SSD PoV).

> 

> A system reboot takes enough time that the SSD is likely to do about the

> same amount of P cycles commiting to FLASH any important data that it

> would trigger by a shutdown sequence, simply because it should not keep

> important data in RAM for too long.  By extension, it would not increase

> E cycles either.

> 

> OTOH, unclean shutdowns *always* cause extra P/E, and that's if you're

> lucky enough for it to not cause anything much worse.


The point is - with a normal system that doesn't required your odd
reboot method we'll normally not shut down the SSD at all, and that
won't require a P/E cycle.

But the whole thing is a moot point - if you quirk your system to
require a poweroff to reboot the kernel should trat it as a power off
as far as shutdown/remove callbacks are concerned and everything will
just work as intended.
diff mbox series

Patch

diff --git a/Documentation/scsi/scsi-parameters.rst b/Documentation/scsi/scsi-parameters.rst
index 9aba897c97ac..fd64d0d43861 100644
--- a/Documentation/scsi/scsi-parameters.rst
+++ b/Documentation/scsi/scsi-parameters.rst
@@ -101,6 +101,13 @@  parameters may be changed at runtime by the command
 			allowing boot to proceed.  none ignores them, expecting
 			user space to do the scan.
 
+	sd_mod.stop_before_reboot=
+			[SCSI] configure stop action for disks before a reboot
+			Format: <integer>
+			0 = never
+			1 = non-rotational disks only (default)
+			2 = all disks
+
 	sim710=		[SCSI,HW]
 			See header of drivers/scsi/sim710.c.
 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d90fefffe31b..1cd652e037ab 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -98,6 +98,12 @@  MODULE_ALIAS_SCSI_DEVICE(TYPE_MOD);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
 MODULE_ALIAS_SCSI_DEVICE(TYPE_ZBC);
 
+static unsigned int stop_before_reboot = 1;
+
+module_param(stop_before_reboot, uint, 0644);
+MODULE_PARM_DESC(stop_before_reboot,
+		 "stop disks before reboot (1=non-rotational, 2=all)");
+
 #if !defined(CONFIG_DEBUG_BLOCK_EXT_DEVT)
 #define SD_MINORS	16
 #else
@@ -3576,9 +3582,19 @@  static void sd_shutdown(struct device *dev)
 		sd_sync_cache(sdkp, NULL);
 	}
 
-	if (system_state != SYSTEM_RESTART && sdkp->device->manage_start_stop) {
-		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
-		sd_start_stop_device(sdkp, 0);
+	if (sdkp->device->manage_start_stop) {
+		bool stop_disk = (system_state != SYSTEM_RESTART);
+
+		if (stop_before_reboot > 1) { /* stop all disks */
+			stop_disk = true;
+		} else if (stop_before_reboot) { /* non-rotational only */
+			stop_disk |= blk_queue_nonrot(sdkp->disk->queue);
+		}
+
+		if (stop_disk) {
+			sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
+			sd_start_stop_device(sdkp, 0);
+		}
 	}
 }