[00/13] scsi: Support LUN/target based error handle

Message ID 20230723234422.1629194-1-haowenchao2@huawei.com

Wenchao Hao July 23, 2023, 11:44 p.m. UTC
The original error handler sets the host to the recovery state while it
performs error recovery, so no LUN that shares the host can handle I/O
until recovery finishes. This is unacceptable for systems that deploy
many LUNs behind one HBA.

This patchset introduces support for LUN/target based error handling.
Drivers can choose whether to implement it: LUN based, target based, or
both, according to their own error handling strategy. The first patch
defines the framework, which abstracts three key operations: add an
error command, wake up the error handler, and block I/O while error
commands are queued or being recovered. Drivers should implement these
three callbacks and register them with the SCSI mid-layer.
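
As a rough illustration only (this is not code from the patches, and
every name below is hypothetical), the three callbacks can be modelled
in plain userspace C. The point of the sketch is that the failed-command
list and the "busy" state live per LUN, so a failure on one LUN does not
stop I/O on another:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a failed SCSI command. */
struct eh_cmd {
	struct eh_cmd *next;
	int tag;
};

struct lun_eh;

/* The three operations named in the cover letter (names are assumptions). */
struct lun_eh_ops {
	void (*add_cmnd)(struct lun_eh *eh, struct eh_cmd *cmd); /* add error command */
	void (*wakeup)(struct lun_eh *eh);                       /* wake up the handler */
	bool (*is_busy)(const struct lun_eh *eh);                /* should new I/O be blocked? */
};

/* Per-LUN error handling state; one instance per LUN, not per host. */
struct lun_eh {
	const struct lun_eh_ops *ops;
	struct eh_cmd *failed;	/* commands waiting for recovery */
	int nr_failed;
	bool recovering;
};

static void demo_add_cmnd(struct lun_eh *eh, struct eh_cmd *cmd)
{
	cmd->next = eh->failed;
	eh->failed = cmd;
	eh->nr_failed++;
}

static void demo_wakeup(struct lun_eh *eh)
{
	/* A real handler would wake a worker; here we only flip a flag. */
	if (eh->nr_failed > 0)
		eh->recovering = true;
}

static bool demo_is_busy(const struct lun_eh *eh)
{
	return eh->nr_failed > 0 || eh->recovering;
}

const struct lun_eh_ops demo_ops = {
	.add_cmnd = demo_add_cmnd,
	.wakeup   = demo_wakeup,
	.is_busy  = demo_is_busy,
};

/* "Mid-layer" side: route a failed command to the LUN's own handler. */
void lun_eh_fail_cmd(struct lun_eh *eh, struct eh_cmd *cmd)
{
	assert(eh != NULL && eh->ops != NULL && cmd != NULL);
	eh->ops->add_cmnd(eh, cmd);
	eh->ops->wakeup(eh);
}

/* "Mid-layer" side: new I/O is allowed unless this LUN is in recovery. */
bool lun_can_queue(const struct lun_eh *eh)
{
	return eh->ops == NULL || !eh->ops->is_busy(eh);
}

/* Called once recovery of this LUN is complete. */
void lun_eh_finish(struct lun_eh *eh)
{
	eh->failed = NULL;
	eh->nr_failed = 0;
	eh->recovering = false;
}
```

With two such LUNs on one host, failing a command on the first makes
lun_can_queue() return false for that LUN only; the second keeps
accepting I/O.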

Besides the basic framework, this patchset also adds a basic LUN/target
based error handling strategy.

For LUN based EH, it tries check sense, start unit and LUN reset. If
these steps cannot recover all error commands, it falls back to further
recovery: target based (if implemented) or host based error handling.

Target based EH works the same way: it tries check sense, start unit,
LUN reset and target reset. If these steps cannot recover all error
commands, it falls back to host based error handling.
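
The escalation described in the two paragraphs above can be sketched as
follows. This is a simplified userspace model under my own assumptions,
not the code in scsi_error.c: the enum names, eh_run_scope(),
eh_recover() and the demo step callback are all hypothetical, and the
model simply reruns the earlier steps at target scope rather than
tracking which steps were already done.

```c
#include <assert.h>
#include <stddef.h>

/* Recovery steps in the order the cover letter lists them. */
enum eh_step {
	EH_CHECK_SENSE,
	EH_START_UNIT,
	EH_RESET_LUN,
	EH_RESET_TARGET,
	EH_STEP_COUNT
};

enum eh_scope { EH_SCOPE_LUN, EH_SCOPE_TARGET, EH_SCOPE_HOST };

/* Hypothetical step callback: returns how many commands are still failed. */
typedef int (*eh_step_fn)(enum eh_step step, int nr_failed, void *ctx);

/* Run the steps belonging to one scope, stopping early on full recovery. */
int eh_run_scope(enum eh_scope scope, int nr_failed,
		 eh_step_fn try_step, void *ctx)
{
	int last = (scope == EH_SCOPE_LUN) ? EH_RESET_LUN : EH_RESET_TARGET;

	assert(scope != EH_SCOPE_HOST && try_step != NULL);
	for (int step = EH_CHECK_SENSE; step <= last && nr_failed > 0; step++)
		nr_failed = try_step((enum eh_step)step, nr_failed, ctx);
	return nr_failed;
}

/*
 * Escalate LUN -> target (if the driver implements it) -> host.
 * Returns the scope at which all commands were recovered; EH_SCOPE_HOST
 * means the caller must fall back to the classic host based handler.
 */
enum eh_scope eh_recover(int nr_failed, int has_target_eh,
			 eh_step_fn try_step, void *ctx)
{
	nr_failed = eh_run_scope(EH_SCOPE_LUN, nr_failed, try_step, ctx);
	if (nr_failed == 0)
		return EH_SCOPE_LUN;

	if (has_target_eh) {
		nr_failed = eh_run_scope(EH_SCOPE_TARGET, nr_failed,
					 try_step, ctx);
		if (nr_failed == 0)
			return EH_SCOPE_TARGET;
	}
	return EH_SCOPE_HOST;
}

/* Demo step: every command recovers once the step stored in *ctx runs. */
int demo_try_step(enum eh_step step, int nr_failed, void *ctx)
{
	const enum eh_step *fixing_step = ctx;

	return (step == *fixing_step) ? 0 : nr_failed;
}
```

For example, if only a target reset clears the fault, eh_recover()
reports EH_SCOPE_TARGET when the driver has target based EH and
EH_SCOPE_HOST when it does not.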

This patchset was tested with scsi_debug, which supports single LUN
error injection. The scsi_debug patches are here:

https://lore.kernel.org/linux-scsi/20230723234105.1628982-1-haowenchao2@huawei.com/T/#t

Wenchao Hao (13):
  scsi: Define basic framework for driver LUN/target based error handle
  scsi:scsi_error: Move complete variable eh_action from shost to sdevice
  scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
  scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
  scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
  scsi:scsi_error: Add flags to mark error handle steps has done
  scsi:scsi_error: Define helper to perform LUN based error handle
  scsi:scsi_error: Add LUN based error handler based previous helper
  scsi:core: increase/decrease target_busy without check can_queue
  scsi:scsi_error: Define helper to perform target based error handle
  scsi:scsi_error: Add target based error handler based previous helper
  scsi:scsi_debug: Add param to control if setup LUN based error handle
  scsi:scsi_debug: Add param to control if setup target based error handle

 drivers/scsi/scsi_debug.c  |  19 +
 drivers/scsi/scsi_error.c  | 705 ++++++++++++++++++++++++++++++++++---
 drivers/scsi/scsi_lib.c    |  23 +-
 drivers/scsi/scsi_priv.h   |  20 ++
 include/scsi/scsi_device.h |  97 +++++
 include/scsi/scsi_eh.h     |   4 +
 include/scsi/scsi_host.h   |   2 -
 7 files changed, 813 insertions(+), 57 deletions(-)

Comments

Wenchao Hao Aug. 21, 2023, 1:31 p.m. UTC | #1
On 2023/7/24 7:44, Wenchao Hao wrote:

Ping...

Is anyone reviewing these changes?

Wenchao Hao Aug. 30, 2023, 9:45 a.m. UTC | #2
On 2023/7/24 7:44, Wenchao Hao wrote:

Ping again...
