diff mbox series

[v2,2/3] scsi: Make scsi_forget_host() wait for request queue removal

Message ID 20220630213733.17689-3-bvanassche@acm.org
State New
Headers show
Series Call blk_mq_free_tag_set() earlier | expand

Commit Message

Bart Van Assche June 30, 2022, 9:37 p.m. UTC
Prepare for freeing the host tag set earlier by making scsi_forget_host()
wait until all activity on the host tag set has stopped.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/scsi_scan.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

Comments

Mike Christie July 1, 2022, 4:25 p.m. UTC | #1
On 6/30/22 4:37 PM, Bart Van Assche wrote:
>  	struct scsi_device *sdev;
> @@ -1970,8 +1980,21 @@ void scsi_forget_host(struct Scsi_Host *shost)
>   restart:
>  	spin_lock_irq(shost->host_lock);
>  	list_for_each_entry(sdev, &shost->__devices, siblings) {
> -		if (sdev->sdev_state == SDEV_DEL)
> +		if (sdev->sdev_state == SDEV_DEL &&
> +		    blk_queue_dead(sdev->request_queue)) {
>  			continue;
> +		}
> +		if (sdev->sdev_state == SDEV_DEL) {
> +			get_device(&sdev->sdev_gendev);
> +			spin_unlock_irq(shost->host_lock);
> +
> +			while (!blk_queue_dead(sdev->request_queue))
> +				msleep(10);
> +
> +			spin_lock_irq(shost->host_lock);
> +			put_device(&sdev->sdev_gendev);
> +			goto restart;
> +		}
>  		spin_unlock_irq(shost->host_lock);
>  		__scsi_remove_device(sdev);
>  		goto restart;

Are there 2 ways to hit your issue?

1. Normal case. srp_remove_one frees srp_device. Then all refs
to host are dropped and we call srp_exit_cmd_priv which accesses
the freed srp_device?

You don't need the above patch for this case right.

2. Are you hitting a race? Something did a scsi_remove_device. It
set the device state to SDEV_DEL. It's stuck in blk_cleanup_queue
blk_freeze_queue. Now we do the above code. Without your patch
we just move on by that device. Commands could still be accessed.
With your patch we wait for for that other thread to complete the
device destruction.


Could you also solve both issues by adding a scsi_host_template
scsi_host release function that's called from scsi_host_dev_release. A
new srp_host_release would free structs like the srp device from there.
Or would we still have an issue for #2 where we just don't know how
far blk_freeze_queue has got so commands could still be hitting our
queue_rq function when we are doing scsi_host_dev_release?

I like how your patch makes it so we know after scsi_remove_host
has returned that the device is really gone. Could we have a similar
race as in #2 with someone doing a scsi_remove_device and transport
doing scsi_remove_target?

We would not hit the same use after free from the tag set exit_cmd_priv
being called. But, for example, if we wanted to free some transport level
resources that running commands reference after scsi_target_remove could
we hit a use after free? If so maybe we want to move this wait/check to
__scsi_remove_device?
Bart Van Assche July 1, 2022, 11:44 p.m. UTC | #2
On 7/1/22 09:25, Mike Christie wrote:
> On 6/30/22 4:37 PM, Bart Van Assche wrote:
>>   	struct scsi_device *sdev;
>> @@ -1970,8 +1980,21 @@ void scsi_forget_host(struct Scsi_Host *shost)
>>    restart:
>>   	spin_lock_irq(shost->host_lock);
>>   	list_for_each_entry(sdev, &shost->__devices, siblings) {
>> -		if (sdev->sdev_state == SDEV_DEL)
>> +		if (sdev->sdev_state == SDEV_DEL &&
>> +		    blk_queue_dead(sdev->request_queue)) {
>>   			continue;
>> +		}
>> +		if (sdev->sdev_state == SDEV_DEL) {
>> +			get_device(&sdev->sdev_gendev);
>> +			spin_unlock_irq(shost->host_lock);
>> +
>> +			while (!blk_queue_dead(sdev->request_queue))
>> +				msleep(10);
>> +
>> +			spin_lock_irq(shost->host_lock);
>> +			put_device(&sdev->sdev_gendev);
>> +			goto restart;
>> +		}
>>   		spin_unlock_irq(shost->host_lock);
>>   		__scsi_remove_device(sdev);
>>   		goto restart;
> 
> Are there 2 ways to hit your issue?
> 
> 1. Normal case. srp_remove_one frees srp_device. Then all refs
> to host are dropped and we call srp_exit_cmd_priv which accesses
> the freed srp_device?
> 
> You don't need the above patch for this case right.
> 
> 2. Are you hitting a race? Something did a scsi_remove_device. It
> set the device state to SDEV_DEL. It's stuck in blk_cleanup_queue
> blk_freeze_queue. Now we do the above code. Without your patch
> we just move on by that device. Commands could still be accessed.
> With your patch we wait for for that other thread to complete the
> device destruction.
> 
> 
> Could you also solve both issues by adding a scsi_host_template
> scsi_host release function that's called from scsi_host_dev_release. A
> new srp_host_release would free structs like the srp device from there.
> Or would we still have an issue for #2 where we just don't know how
> far blk_freeze_queue has got so commands could still be hitting our
> queue_rq function when we are doing scsi_host_dev_release?
> 
> I like how your patch makes it so we know after scsi_remove_host
> has returned that the device is really gone. Could we have a similar
> race as in #2 with someone doing a scsi_remove_device and transport
> doing scsi_remove_target?
> 
> We would not hit the same use after free from the tag set exit_cmd_priv
> being called. But, for example, if we wanted to free some transport level
> resources that running commands reference after scsi_target_remove could
> we hit a use after free? If so maybe we want to move this wait/check to
> __scsi_remove_device?

Hi Mike,

Although I'm not sure that I completely understood the above: my goal 
with this patch is to make sure that all activity on the host tag set 
has stopped by the time scsi_forget_host() returns. I do not only want 
to cover the case where scsi_remove_host() is called by the ib_srp 
driver but also the case where a SCSI device that was created by the 
ib_srp driver is removed before scsi_remove_host() is called. Users can 
trigger this scenario by writing into /sys/class/scsi_device/*/*/delete.

Adding a new callback into scsi_host_dev_release() would not help 
because the scsi_host_dev_release() call may happen long after 
scsi_forget_host() returned.

Does this answer your question?

Bart.
Ming Lei July 5, 2022, 3:38 a.m. UTC | #3
On Thu, Jun 30, 2022 at 02:37:32PM -0700, Bart Van Assche wrote:
> Prepare for freeing the host tag set earlier by making scsi_forget_host()
> wait until all activity on the host tag set has stopped.

I think it is correct to free the host tag during removing host.

> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: John Garry <john.garry@huawei.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/scsi_scan.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 5c3bb4ceeac3..c8331ccdde95 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1961,6 +1961,16 @@ void scsi_scan_host(struct Scsi_Host *shost)
>  }
>  EXPORT_SYMBOL(scsi_scan_host);
>  
> +/**
> + * scsi_forget_host() - Remove all SCSI devices from a host.
> + * @shost: SCSI host to remove devices from.
> + *
> + * Removes all SCSI devices that have not yet been removed. For the SCSI devices
> + * for which removal started before scsi_forget_host(), wait until the
> + * associated request queue has reached the "dead" state. In that state it is
> + * guaranteed that no new requests will be allocated and also that no requests
> + * are in progress anymore.
> + */
>  void scsi_forget_host(struct Scsi_Host *shost)
>  {
>  	struct scsi_device *sdev;
> @@ -1970,8 +1980,21 @@ void scsi_forget_host(struct Scsi_Host *shost)
>   restart:
>  	spin_lock_irq(shost->host_lock);
>  	list_for_each_entry(sdev, &shost->__devices, siblings) {
> -		if (sdev->sdev_state == SDEV_DEL)
> +		if (sdev->sdev_state == SDEV_DEL &&
> +		    blk_queue_dead(sdev->request_queue)) {
>  			continue;
> +		}
> +		if (sdev->sdev_state == SDEV_DEL) {
> +			get_device(&sdev->sdev_gendev);
> +			spin_unlock_irq(shost->host_lock);
> +
> +			while (!blk_queue_dead(sdev->request_queue))
> +				msleep(10);

Thinking of further, this report on UAF on SRP resource and Changhui's
report on UAF of host->hostt should belong to same kind of issue:

1) after scsi_remove_host() returns, either the HBA driver module can be
unloaded and hostt can't be used, or any HBA resources can be freed, both
may cause UAF in scsi_mq_exit_request, so moving freeing host tagset to
scsi_remove_host() is correct.

static void scsi_mq_exit_request()
{
	...
    if (shost->hostt->exit_cmd_priv)
		shost->hostt->exit_cmd_priv(shost, cmd);
}


2) checking request queue dead here isn't good, which not only relies
on block layer implementation detail, but also can't avoid UAF on ->hostt,
I'd suggest to fix them all. The attached patch may drain all sdevs, which
can replace the 2nd patch if you don't mind, but the 3rd patch is still needed:


diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index e1cb187402fd..6445718c2b18 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -189,6 +189,14 @@ void scsi_remove_host(struct Scsi_Host *shost)
 	transport_unregister_device(&shost->shost_gendev);
 	device_unregister(&shost->shost_dev);
 	device_del(&shost->shost_gendev);
+
+	/*
+	 * Once returning, the scsi LLD module can be unloaded, so we have
+	 * to wait until our descendants are released, otherwise our host
+	 * device reference can be grabbed by them, then use-after-free
+	 * on shost->hostt will be triggered
+	 */
+	wait_for_completion(&shost->targets_completion);
 }
 EXPORT_SYMBOL(scsi_remove_host);
 
@@ -393,6 +401,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	INIT_LIST_HEAD(&shost->starved_list);
 	init_waitqueue_head(&shost->host_wait);
 	mutex_init(&shost->scan_mutex);
+	init_completion(&shost->targets_completion);
 
 	index = ida_simple_get(&host_index_ida, 0, 0, GFP_KERNEL);
 	if (index < 0) {
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 10e5bffc34aa..b6e6354da396 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -545,10 +545,8 @@ EXPORT_SYMBOL(scsi_device_get);
  */
 void scsi_device_put(struct scsi_device *sdev)
 {
-	struct module *mod = sdev->host->hostt->module;
-
+	module_put(sdev->host->hostt->module);
 	put_device(&sdev->sdev_gendev);
-	module_put(mod);
 }
 EXPORT_SYMBOL(scsi_device_put);
 
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 2ef78083f1ef..931333a48372 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -406,9 +406,14 @@ static void scsi_target_destroy(struct scsi_target *starget)
 static void scsi_target_dev_release(struct device *dev)
 {
 	struct device *parent = dev->parent;
+	struct Scsi_Host *shost = dev_to_shost(parent);
 	struct scsi_target *starget = to_scsi_target(dev);
 
 	kfree(starget);
+
+	if (!atomic_dec_return(&shost->nr_targets))
+		complete_all(&shost->targets_completion);
+
 	put_device(parent);
 }
 
@@ -521,6 +526,7 @@ static struct scsi_target *scsi_alloc_target(struct device *parent,
 	starget->state = STARGET_CREATED;
 	starget->scsi_level = SCSI_2;
 	starget->max_target_blocked = SCSI_DEFAULT_TARGET_BLOCKED;
+	atomic_inc(&shost->nr_targets);
  retry:
 	spin_lock_irqsave(shost->host_lock, flags);
 
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 94091d5281ba..28c51ef0ea54 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -449,12 +449,9 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 	struct scsi_vpd *vpd_pg80 = NULL, *vpd_pg83 = NULL;
 	struct scsi_vpd *vpd_pg0 = NULL, *vpd_pg89 = NULL;
 	unsigned long flags;
-	struct module *mod;
 
 	sdev = container_of(work, struct scsi_device, ew.work);
 
-	mod = sdev->host->hostt->module;
-
 	scsi_dh_release_device(sdev);
 
 	parent = sdev->sdev_gendev.parent;
@@ -505,17 +502,11 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 
 	if (parent)
 		put_device(parent);
-	module_put(mod);
 }
 
 static void scsi_device_dev_release(struct device *dev)
 {
 	struct scsi_device *sdp = to_scsi_device(dev);
-
-	/* Set module pointer as NULL in case of module unloading */
-	if (!try_module_get(sdp->host->hostt->module))
-		sdp->host->hostt->module = NULL;
-
 	execute_in_process_context(scsi_device_dev_release_usercontext,
 				   &sdp->ew);
 }
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 37718373defc..d0baffd62287 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -710,6 +710,9 @@ struct Scsi_Host {
 	/* ldm bits */
 	struct device		shost_gendev, shost_dev;
 
+	atomic_t nr_targets;
+	struct completion   targets_completion;
+
 	/*
 	 * Points to the transport data (if any) which is allocated
 	 * separately


Thanks,
Ming
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 5c3bb4ceeac3..c8331ccdde95 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1961,6 +1961,16 @@  void scsi_scan_host(struct Scsi_Host *shost)
 }
 EXPORT_SYMBOL(scsi_scan_host);
 
+/**
+ * scsi_forget_host() - Remove all SCSI devices from a host.
+ * @shost: SCSI host to remove devices from.
+ *
+ * Removes all SCSI devices that have not yet been removed. For the SCSI devices
+ * for which removal started before scsi_forget_host(), wait until the
+ * associated request queue has reached the "dead" state. In that state it is
+ * guaranteed that no new requests will be allocated and also that no requests
+ * are in progress anymore.
+ */
 void scsi_forget_host(struct Scsi_Host *shost)
 {
 	struct scsi_device *sdev;
@@ -1970,8 +1980,21 @@  void scsi_forget_host(struct Scsi_Host *shost)
  restart:
 	spin_lock_irq(shost->host_lock);
 	list_for_each_entry(sdev, &shost->__devices, siblings) {
-		if (sdev->sdev_state == SDEV_DEL)
+		if (sdev->sdev_state == SDEV_DEL &&
+		    blk_queue_dead(sdev->request_queue)) {
 			continue;
+		}
+		if (sdev->sdev_state == SDEV_DEL) {
+			get_device(&sdev->sdev_gendev);
+			spin_unlock_irq(shost->host_lock);
+
+			while (!blk_queue_dead(sdev->request_queue))
+				msleep(10);
+
+			spin_lock_irq(shost->host_lock);
+			put_device(&sdev->sdev_gendev);
+			goto restart;
+		}
 		spin_unlock_irq(shost->host_lock);
 		__scsi_remove_device(sdev);
 		goto restart;