diff mbox series

[v3] scsi: add non-sleeping variant of scsi_device_put() and use it in alua

Message ID 20230124154953.17807-1-mwilck@suse.com
State Superseded
Headers show
Series [v3] scsi: add non-sleeping variant of scsi_device_put() and use it in alua | expand

Commit Message

Martin Wilck Jan. 24, 2023, 3:49 p.m. UTC
From: Martin Wilck <mwilck@suse.com>

Since the might_sleep() annotation was added in scsi_device_put() and
alua_rtpg_queue(), we have seen repeated reports of "BUG: sleeping function
called from invalid context" [1], [2]. alua_rtpg_queue() is always called
from contexts where the caller must hold at least one reference to the scsi
device in question. This means that the reference taken by
alua_rtpg_queue() itself can't be the last one, and thus can be dropped
without entering the code path in which scsi_device_put() might actually
sleep.

Add a new helper function, scsi_device_put_nosleep() for cases like this,
where a device reference is put from atomic context, and at the same time
it is certain that this reference is not the last one, and use it from
alua_rtpg_queue().

[1] https://lore.kernel.org/linux-scsi/b49e37d5-edfb-4c56-3eeb-62c7d5855c00@linux.ibm.com/
[2] https://lore.kernel.org/linux-scsi/55c35e64-a7d4-9072-46fd-e8eae6a90e96@linux.ibm.com/

Fixes: f93ed747e2c7 ("scsi: core: Release SCSI devices synchronously")
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sachin Sant <sachinp@linux.ibm.com>
Cc: Benjamin Block <bblock@linux.ibm.com>
Reported-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>

----
Changes v2 -> v3:
 - clarified the comment on scsi_device_put_nosleep() (Bart van Assche)

Changes v1 -> v2:
 - rebased onto the scsi-fixes branch.
 - fixed several mistakes pointed out by Steffen Maier.
---
 drivers/scsi/device_handler/scsi_dh_alua.c |  2 +-
 drivers/scsi/scsi.c                        | 24 ++++++++++++++++++----
 include/scsi/scsi_device.h                 |  1 +
 3 files changed, 22 insertions(+), 5 deletions(-)

Comments

Bart Van Assche Jan. 24, 2023, 6:01 p.m. UTC | #1
On 1/24/23 07:49, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Since the might_sleep() annotation was added in scsi_device_put() and
> alua_rtpg_queue(), we have seen repeated reports of "BUG: sleeping function
> called from invalid context" [1], [2]. alua_rtpg_queue() is always called
> from contexts where the caller must hold at least one reference to the scsi
> device in question. This means that the reference taken by
> alua_rtpg_queue() itself can't be the last one, and thus can be dropped
> without entering the code path in which scsi_device_put() might actually
> sleep.
> 
> Add a new helper function, scsi_device_put_nosleep() for cases like this,
> where a device reference is put from atomic context, and at the same time
> it is certain that this reference is not the last one, and use it from
> alua_rtpg_queue().

Something I should have asked earlier, has this alternative been 
considered? This alternative has the advantage that no new functions are 
introduced.

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 1426b9b03612..9feb0323bc44 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -588,8 +588,6 @@ void scsi_device_put(struct scsi_device *sdev)
  {
  	struct module *mod = sdev->host->hostt->module;

-	might_sleep();
-
  	put_device(&sdev->sdev_gendev);
  	module_put(mod);
  }
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 981d1bab2120..8ef9a5494340 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -452,5 +451,7 @@ static void scsi_device_dev_release(struct device
  	unsigned long flags;

+	might_sleep();
+
  	scsi_dh_release_device(sdev);

  	parent = sdev->sdev_gendev.parent;

Thanks,

Bart.
Martin Wilck Jan. 24, 2023, 7:48 p.m. UTC | #2
On Tue, 2023-01-24 at 10:01 -0800, Bart Van Assche wrote:
> On 1/24/23 07:49, mwilck@suse.com wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > Since the might_sleep() annotation was added in scsi_device_put()
> > and
> > alua_rtpg_queue(), we have seen repeated reports of "BUG: sleeping
> > function
> > called from invalid context" [1], [2]. alua_rtpg_queue() is always
> > called
> > from contexts where the caller must hold at least one reference to
> > the scsi
> > device in question. This means that the reference taken by
> > alua_rtpg_queue() itself can't be the last one, and thus can be
> > dropped
> > without entering the code path in which scsi_device_put() might
> > actually
> > sleep.
> > 
> > Add a new helper function, scsi_device_put_nosleep() for cases like
> > this,
> > where a device reference is put from atomic context, and at the
> > same time
> > it is certain that this reference is not the last one, and use it
> > from
> > alua_rtpg_queue().
> 
> Something I should have asked earlier, has this alternative been 
> considered? This alternative has the advantage that no new functions
> are 
> introduced.

Looks good to me, too. No, I didn't have this idea before.

Martin

> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 1426b9b03612..9feb0323bc44 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -588,8 +588,6 @@ void scsi_device_put(struct scsi_device *sdev)
>   {
>         struct module *mod = sdev->host->hostt->module;
> 
> -       might_sleep();
> -
>         put_device(&sdev->sdev_gendev);
>         module_put(mod);
>   }
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 981d1bab2120..8ef9a5494340 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -452,5 +451,7 @@ static void scsi_device_dev_release(struct device
>         unsigned long flags;
> 
> +       might_sleep();
> +
>         scsi_dh_release_device(sdev);
> 
>         parent = sdev->sdev_gendev.parent;
> 
> Thanks,
> 
> Bart.
Bart Van Assche Jan. 24, 2023, 8:33 p.m. UTC | #3
On 1/24/23 11:48, Martin Wilck wrote:
>> Something I should have asked earlier, has this alternative been
>> considered? This alternative has the advantage that no new functions
>> are
>> introduced.
> 
> Looks good to me, too. No, I didn't have this idea before.

Hi Martin,

Do you plan to post my suggestion as a patch or do you perhaps expect me 
to do that?

Thanks,

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 29a2865b8e2e..8f3aac9e6a50 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -1032,7 +1032,7 @@  static bool alua_rtpg_queue(struct alua_port_group *pg,
 			kref_put(&pg->kref, release_port_group);
 	}
 	if (sdev)
-		scsi_device_put(sdev);
+		scsi_device_put_nosleep(sdev);
 
 	return true;
 }
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 1426b9b03612..fb7befacd831 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -576,6 +576,25 @@  int scsi_device_get(struct scsi_device *sdev)
 }
 EXPORT_SYMBOL(scsi_device_get);
 
+/**
+ * scsi_device_put_nosleep  -  release a reference to a scsi_device
+ * @sdev:	device to release a reference on.
+ *
+ * Description: Release a reference to the scsi_device and decrements the use
+ * count of the underlying LLDD module. This function may only be called from
+ * a call context where it is certain that the reference dropped is not the
+ * last one. Exception: calling this from scsi_device_put() is legal, because
+ * scsi_device_put() has a might_sleep() annotation.
+ */
+void scsi_device_put_nosleep(struct scsi_device *sdev)
+{
+	struct module *mod = sdev->host->hostt->module;
+
+	put_device(&sdev->sdev_gendev);
+	module_put(mod);
+}
+EXPORT_SYMBOL(scsi_device_put_nosleep);
+
 /**
  * scsi_device_put  -  release a reference to a scsi_device
  * @sdev:	device to release a reference on.
@@ -586,12 +605,9 @@  EXPORT_SYMBOL(scsi_device_get);
  */
 void scsi_device_put(struct scsi_device *sdev)
 {
-	struct module *mod = sdev->host->hostt->module;
-
 	might_sleep();
 
-	put_device(&sdev->sdev_gendev);
-	module_put(mod);
+	scsi_device_put_nosleep(sdev);
 }
 EXPORT_SYMBOL(scsi_device_put);
 
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 3642b8e3928b..f1ad48c9c601 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -365,6 +365,7 @@  void scsi_attach_vpd(struct scsi_device *sdev);
 
 extern struct scsi_device *scsi_device_from_queue(struct request_queue *q);
 extern int __must_check scsi_device_get(struct scsi_device *);
+void scsi_device_put_nosleep(struct scsi_device *sdev);
 extern void scsi_device_put(struct scsi_device *);
 extern struct scsi_device *scsi_device_lookup(struct Scsi_Host *,
 					      uint, uint, u64);