[0/6] iscsi: Speed up failover with lots of devices.

Message ID 20220226230435.38733-1-michael.christie@oracle.com

Message

Mike Christie Feb. 26, 2022, 11:04 p.m. UTC
In:

https://lore.kernel.org/all/CAK3e-EZbJMDHkozGiz8LnMNAZ+SoCA+QeK0kpkqM4vQ4pz86SQ@mail.gmail.com/t/ 

Zhengyuan Liu found an issue where failovers take a long time with
lots of devices (/dev/sdXYZ nodes). The problem is that iscsid expects
most netlink (nl) operations to be fast (ignoring memory issues). When
the session block code was written, blocking a queue/scsi_device was
more or less just setting some flag bits and state values. Now a block
call actually handles IO that has been sent to the driver, so it can
be expensive, and a session block call takes longer and longer as you
add more devices.

This patchset moves the recovery and unbind operations to a per
session work queue instead of the mix of per session, host and module.
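
As a rough illustration of why the queue layout matters (this is not
part of the patchset and not kernel code), here is a toy timing model.
It assumes that operations placed on a shared queue run one after
another, that per session queues can run in parallel, and that the
cost of a block call grows linearly with the number of devices. The
Session class, both function names and the per-device cost constant
are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    name: str
    num_devices: int


# Hypothetical cost of blocking one scsi_device, in milliseconds.
BLOCK_COST_MS_PER_DEVICE = 2


def block_cost_ms(session: Session) -> int:
    """Assumed linear model: a session block call touches every device."""
    return session.num_devices * BLOCK_COST_MS_PER_DEVICE


def shared_queue_finish_times(sessions: list[Session]) -> dict[str, int]:
    """One serialized queue: each session waits for all earlier ones."""
    finish = {}
    clock = 0
    for s in sessions:
        clock += block_cost_ms(s)
        finish[s.name] = clock
    return finish


def per_session_finish_times(sessions: list[Session]) -> dict[str, int]:
    """One queue per session: every block call starts at time 0.

    Assumes enough CPUs are available to run the queues in parallel.
    """
    return {s.name: block_cost_ms(s) for s in sessions}


if __name__ == "__main__":
    sessions = [Session("a", 500), Session("b", 500), Session("c", 10)]
    print("shared queue:", shared_queue_finish_times(sessions))
    print("per session: ", per_session_finish_times(sessions))
```

In this model a small session queued behind two large ones on a shared
queue finishes only after both of them, while with per session queues
its finish time depends only on its own device count.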

Comments

Mike Christie Feb. 28, 2022, 3:53 p.m. UTC | #1
Hey Lee,
On 2/26/22 5:04 PM, Mike Christie wrote:
> In:
> 
> https://lore.kernel.org/all/CAK3e-EZbJMDHkozGiz8LnMNAZ+SoCA+QeK0kpkqM4vQ4pz86SQ@mail.gmail.com/t/ 
> 
> Zhengyuan Liu found an issue where failovers are taking a long time
> with lots of devices (/dev/sdXYZ nodes). The problem is that iscsid
> expects most nl operations to be fast (ignoring mem issues) and when
> the session block code was written blocking a queue/scsi_device was
> just setting some flag bits and state values more or less. Now a block
> call will actually handle IO that has been sent to the driver, so it
> can be expensive. When you add in more and more devices, then a
> session block call will take longer and longer.
> 
> This patchset moves the recovery and unbind operations to a per
> session work queue instead of the mix of per session, host and module.


If you get to the end of the patchset and wonder if there is a patch
missing, there is :)

I have one more patchset that is related to this, but not required
to handle Zhengyuan Liu's issue. I think I can also kill
iscsi_conn_cleanup_workq and use the iscsi wq instead, but I want to
think about it some more and test it out. And since it's not needed
to handle the issue in the thread below, it should be ok to do
separately.

It might be as simple as killing iscsi_conn_cleanup_workq and using
the iscsi wq, or I might be able to be more invasive and drop some
code. I'm not sure yet.

Martin K. Petersen March 2, 2022, 4:20 a.m. UTC | #2
Mike,

> Zhengyuan Liu found an issue where failovers are taking a long time
> with lots of devices (/dev/sdXYZ nodes). The problem is that iscsid
> expects most nl operations to be fast (ignoring mem issues) and when
> the session block code was written blocking a queue/scsi_device was
> just setting some flag bits and state values more or less. Now a block
> call will actually handle IO that has been sent to the driver, so it
> can be expensive. When you add in more and more devices, then a
> session block call will take longer and longer.
>
> This patchset moves the recovery and unbind operations to a per
> session work queue instead of the mix of per session, host and module.

Applied to 5.18/scsi-staging, thanks!

Martin K. Petersen March 9, 2022, 4:14 a.m. UTC | #3
On Sat, 26 Feb 2022 17:04:29 -0600, Mike Christie wrote:

> In:
> 
> https://lore.kernel.org/all/CAK3e-EZbJMDHkozGiz8LnMNAZ+SoCA+QeK0kpkqM4vQ4pz86SQ@mail.gmail.com/t/
> 
> Zhengyuan Liu found an issue where failovers are taking a long time
> with lots of devices (/dev/sdXYZ nodes). The problem is that iscsid
> expects most nl operations to be fast (ignoring mem issues) and when
> the session block code was written blocking a queue/scsi_device was
> just setting some flag bits and state values more or less. Now a block
> call will actually handle IO that has been sent to the driver, so it
> can be expensive. When you add in more and more devices, then a
> session block call will take longer and longer.
> 
> [...]

Applied to 5.18/scsi-queue, thanks!

[1/6] scsi: iscsi: Fix recovery and unblocking race.
      https://git.kernel.org/mkp/scsi/c/8dd3dff3bf3e
[2/6] scsi: iscsi: Speed up session unblocking and removal.
      https://git.kernel.org/mkp/scsi/c/b07c348f8ffb
[3/6] scsi: iscsi: Remove iscsi_scan_finished.
      https://git.kernel.org/mkp/scsi/c/d8ec5d67b8bb
[4/6] scsi: iscsi, ql4: Use per session workqueue for unbinding.
      https://git.kernel.org/mkp/scsi/c/5842ea366831
[5/6] scsi: iscsi: Use the session workqueue for recovery.
      https://git.kernel.org/mkp/scsi/c/7cb6683ce761
[6/6] scsi: iscsi: Drop temp workq_name.
      https://git.kernel.org/mkp/scsi/c/69af1c9577aa