diff mbox series

[v3,4/8] scsi: call scsi_stop_queue() without state_mutex held

Message ID 20230607182249.22623-5-mwilck@suse.com
State New
Headers show
Series scsi: fixes for targets with many LUNs, and scsi_target_block rework | expand

Commit Message

Martin Wilck June 7, 2023, 6:22 p.m. UTC
From: Martin Wilck <mwilck@suse.com>

sdev->state_mutex protects only sdev->sdev_state. There's no reason
to keep it held while calling scsi_stop_queue().

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/scsi_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bart Van Assche June 8, 2023, 2:12 p.m. UTC | #1
On 6/7/23 22:44, Christoph Hellwig wrote:
>>> Thanks. This wasn't obvious to me from the current code. I'll add a
>>> comment in the next version.
>>
>> The crucial question is now, is it sufficient to call
>> blk_mq_quiesce_queue_nowait() under the mutex, or does the call to
>> blk_mq_wait_quiesce_done() have to be under the mutex, too?
>> The latter would actually kill off our attempt to fix the delay
>> in fc_remote_port_delete() that was caused by repeated
>> synchronize_rcu() calls.
>>
>> But if I understand you correctly, moving the wait out of the mutex
>> should be ok. I'll update the series accordingly.
> 
> I can't think of a reason we'd want to lock over the wait, but Bart
> knows this code way better than I do.

Unless deep changes would be made in the block layer 
blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done() 
functions, moving blk_mq_wait_quiesce_done() outside the critical 
section seems fine to me.

Thanks,

Bart.
Martin Wilck June 12, 2023, 11:15 a.m. UTC | #2
On Thu, 2023-06-08 at 13:54 -0500, Mike Christie wrote:
> On 6/8/23 9:12 AM, Bart Van Assche wrote:
> > On 6/7/23 22:44, Christoph Hellwig wrote:
> > > > > Thanks. This wasn't obvious to me from the current code. I'll
> > > > > add a
> > > > > comment in the next version.
> > > > 
> > > > The crucial question is now, is it sufficient to call
> > > > blk_mq_quiesce_queue_nowait() under the mutex, or does the call
> > > > to
> > > > blk_mq_wait_quiesce_done() have to be under the mutex, too?
> > > > The latter would actually kill off our attempt to fix the delay
> > > > in fc_remote_port_delete() that was caused by repeated
> > > > synchronize_rcu() calls.
> > > > 
> > > > But if I understand you correctly, moving the wait out of the
> > > > mutex
> > > > should be ok. I'll update the series accordingly.
> > > 
> > > I can't think of a reason we'd want to lock over the wait, but
> > > Bart
> > > knows this code way better than I do.
> > 
> > Unless deep changes would be made in the block layer
> > blk_mq_quiesce_queue_nowait() and/or blk_mq_wait_quiesce_done()
> > functions, moving blk_mq_wait_quiesce_done() outside the critical
> > section seems fine to me.
> 
> If it helps, I tested with iscsi and the mutex changes discussed
> above
> and it worked ok for me.

I guess the race that Bart was hinting at is hard to trigger.

I would like to remark that the fact that we need to hold the SCSI
state_mutex while calling blk_mq_quiesce_queue_nowait() looks like a
layering issue to me. Not sure if, and how, this could be avoided,
though.

Regards
Martin
Bart Van Assche June 12, 2023, 1:41 p.m. UTC | #3
On 6/12/23 04:15, Martin Wilck wrote:
> I guess the race that Bart was hinting at is hard to trigger.

Are you sure about this? I think this scenario can be triggered by 
writing into the sysfs attribute that changes the SCSI device state 
while a scsi_target_block() call is in progress. See also 
store_state_field().

> I would like to remark that the fact that we need to hold the SCSI
> state_mutex while calling blk_mq_quiesce_queue_nowait() looks like a
> layering issue to me. Not sure if, and how, this could be avoided,
> though.

I do not agree that this is a layering issue. Is holding a mutex around 
a call of a function in a lower layer ever a layering issue?

What matters is to be very careful with locks while invoking callback 
functions. See also slide 7 in Ousterhout's presentation "Why Threads 
Are A Bad Idea (for most purposes)" from 1996 
(https://web.stanford.edu/~ouster/cgi-bin/papers/threads.pdf).

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ce5788643011..26e7ce25fa05 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2795,9 +2795,9 @@  static void scsi_device_block(struct scsi_device *sdev, void *data)
 
 	mutex_lock(&sdev->state_mutex);
 	err = __scsi_internal_device_block_nowait(sdev);
+	mutex_unlock(&sdev->state_mutex);
 	if (err == 0)
 		scsi_stop_queue(sdev, false);
-	mutex_unlock(&sdev->state_mutex);
 
 	WARN_ONCE(err, "__scsi_internal_device_block_nowait(%s) failed: err = %d\n",
 		  dev_name(&sdev->sdev_gendev), err);