mbox series

[0/3] SCSI core and UFS patches for kernel v6.4

Message ID 20230412204125.3222615-1-bvanassche@acm.org
Headers show
Series SCSI core and UFS patches for kernel v6.4 | expand

Message

Bart Van Assche April 12, 2023, 8:41 p.m. UTC
Hi Martin,

This patch series includes a patch for making sd_shutdown() fail future I/O
and also two UFS patches.

Patch 3/3 of this series has been posted earlier. Compared to the previous
version of that patch, a Fixes: tag has been added.

Please consider this patch series for the next merge window. 

Thanks,

Bart.

Bart Van Assche (3):
  scsi: sd: Let sd_shutdown() fail future I/O
  scsi: ufs: Simplify ufshcd_wl_shutdown()
  scsi: ufs: Increase the START STOP UNIT timeout from one to ten
    seconds

 drivers/scsi/sd.c         |  9 ++++++++-
 drivers/ufs/core/ufshcd.c | 15 +++------------
 2 files changed, 11 insertions(+), 13 deletions(-)

Comments

Ming Lei April 13, 2023, 1:09 a.m. UTC | #1
On Wed, Apr 12, 2023 at 01:41:23PM -0700, Bart Van Assche wrote:
> System shutdown happens as follows (see e.g. the systemd source file
> src/shutdown/shutdown.c):
> * sync() is called.
> * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
> * If the reboot() system call returns, log an error message.
> 
> The reboot() system call causes the kernel to call kernel_restart(),
> kernel_halt() or kernel_power_off(). Each of these functions calls
> device_shutdown(). device_shutdown() calls sd_shutdown().
> 
> After sd_shutdown() has been called the .shutdown() callback of the LLD
> will be called. This makes it unsafe to submit I/O after sd_shutdown()
> has finished.

unsafe often means a bug, can you explain it in details about the 'unsafe'?

> Let sd_shutdown() fail future I/O such that LLD .shutdown()
> callbacks can be simplified.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Mike Christie <michael.christie@oracle.com>
> Cc: Tomas Henzl <thenzl@redhat.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/sd.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 4bb87043e6db..629f5889caf2 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3699,12 +3699,13 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>  static void sd_shutdown(struct device *dev)
>  {
>  	struct scsi_disk *sdkp = dev_get_drvdata(dev);
> +	struct request_queue *q;
>  
>  	if (!sdkp)
>  		return;         /* this can happen */
>  
>  	if (pm_runtime_suspended(dev))
> -		return;
> +		goto fail_future_io;
>  
>  	if (sdkp->WCE && sdkp->media_present) {
>  		sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
> @@ -3715,6 +3716,12 @@ static void sd_shutdown(struct device *dev)
>  		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
>  		sd_start_stop_device(sdkp, 0);
>  	}
> +
> +fail_future_io:
> +	q = sdkp->disk->queue;
> +	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
> +	blk_mq_freeze_queue(q);
> +	blk_mq_unfreeze_queue(q);

freeze queue can slow down reboot a lot especially when there are lots of
LUNs/Hosts because each device ->shutdown() is serialized.

I think it can be done by changing sdev state to SDEV_OFFLINE here.


Thanks,
Ming
Bart Van Assche April 13, 2023, 5:18 p.m. UTC | #2
On 4/12/23 18:09, Ming Lei wrote:
> On Wed, Apr 12, 2023 at 01:41:23PM -0700, Bart Van Assche wrote:
>> System shutdown happens as follows (see e.g. the systemd source file
>> src/shutdown/shutdown.c):
>> * sync() is called.
>> * reboot(RB_AUTOBOOT/RB_HALT_SYSTEM/RB_POWER_OFF) is called.
>> * If the reboot() system call returns, log an error message.
>>
>> The reboot() system call causes the kernel to call kernel_restart(),
>> kernel_halt() or kernel_power_off(). Each of these functions calls
>> device_shutdown(). device_shutdown() calls sd_shutdown().
>>
>> After sd_shutdown() has been called the .shutdown() callback of the LLD
>> will be called. This makes it unsafe to submit I/O after sd_shutdown()
>> has finished.
> 
> unsafe often means a bug, can you explain it in details about the 'unsafe'?

In this case that means that a kernel crash could be triggered.

>> +fail_future_io:
>> +	q = sdkp->disk->queue;
>> +	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
>> +	blk_mq_freeze_queue(q);
>> +	blk_mq_unfreeze_queue(q);
> 
> freeze queue can slow down reboot a lot especially when there are lots of
> LUNs/Hosts because each device ->shutdown() is serialized.
> 
> I think it can be done by changing sdev state to SDEV_OFFLINE here.

Hi Ming,

Changing the SCSI device state is not sufficient to wait for pending 
commands to finish. How about replacing this patch with the patch below?


diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4bb87043e6db..4017b5412ba4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3699,12 +3699,13 @@ static int sd_start_stop_device(struct scsi_disk 
*sdkp, int start)
  static void sd_shutdown(struct device *dev)
  {
  	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+	struct request_queue *q;

  	if (!sdkp)
  		return;         /* this can happen */

  	if (pm_runtime_suspended(dev))
-		return;
+		goto fail_future_io;

  	if (sdkp->WCE && sdkp->media_present) {
  		sd_printk(KERN_NOTICE, sdkp, "Synchronizing SCSI cache\n");
@@ -3715,6 +3716,14 @@ static void sd_shutdown(struct device *dev)
  		sd_printk(KERN_NOTICE, sdkp, "Stopping disk\n");
  		sd_start_stop_device(sdkp, 0);
  	}
+
+fail_future_io:
+	q = sdkp->disk->queue;
+	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
+	if (!scsi_host_busy(sdkp->device->host))
+		return;
+	blk_mq_freeze_queue(q);
+	blk_mq_unfreeze_queue(q);
  }

  static int sd_suspend_common(struct device *dev, bool