
[0/5] scsi:scsi_debug: Add error injection for single device

Message ID 20230323115601.178494-1-haowenchao2@huawei.com
Series scsi:scsi_debug: Add error injection for single device

Message

Wenchao Hao March 23, 2023, 11:55 a.m. UTC
The original error injection mechanism is based on scsi_host, so it
cannot inject faults into a single SCSI device.

This patchset adds the ability to inject errors into a single SCSI
device. It currently supports injecting timeout errors and queuecommand
errors, as well as setting the host byte, driver byte, status byte, and
sense data for a specific SCSI command.

The first patch adds a sysfs interface to add and query a single
device's error injection info; the second patch defines how to remove
an injection that has been added. The following 3 patches use the
injection info to generate the related error types.

Wenchao Hao (5):
  scsi:scsi_debug: Add sysfs interface to manage scsi devices' error
    injection
  scsi:scsi_debug: Define grammar to remove added error injection
  scsi:scsi_debug: timeout command if the error is injected
  scsi:scsi_debug: Return failed value if the error is injected
  scsi:scsi_debug: set command's result and sense data if the error is
    injected

 drivers/scsi/scsi_debug.c | 296 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 296 insertions(+)

Comments

Wenchao Hao March 23, 2023, 1:13 p.m. UTC | #1
On 2023/3/23 20:40, John Garry wrote:
> There is already a basic mechanism to generate errors - like timeouts - on "nth" command. Can you say why you want this new interface? What special scenarios are you trying to test/validate (which could not be achieved based on the current mechanism)?
>

I am testing a new error handling policy that works on a single
scsi_device without setting the host to RECOVERY, so I need a way to
generate errors for a single SCSI device.

The current mechanism cannot generate errors for a single device
because it is designed for host-wide error generation.

> With this series we would have 2x methods to inject errors, which is less than ideal, and they seem to possibly conflict as well, e.g. I set timeout for nth command via current interface and then use the new interface to set timeout for some other cadence. What behavior to expect ...?

I did not take this issue into consideration. For now I assume users
would not use these two methods at the same time.

Also, I do not know where to document the usage of this new
interface. Maybe we can explain it in the documentation?

> 
> I'm not saying that I am a huge fan of the current inject mechanism, but at the very least you need to provide more justification for this series.
John Garry March 23, 2023, 4:25 p.m. UTC | #2
On 23/03/2023 13:13, haowenchao (C) wrote:
> Also, I do not know where to document the usage of this new
> interface. Maybe we can explain it in the documentation?

sysfs entries are described in Documentation/ABI, but please don't add
elaborate programming interfaces in sysfs files (like in these patches);
a sysfs file should be just for reading or writing a single value.
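For comparison, a minimal userspace sketch of the one-value-per-file convention: each attribute holds a single integer that is written and read back as text. The helper names are hypothetical; the existing scsi_debug driver parameters (for example every_nth, usually found under /sys/bus/pseudo/drivers/scsi_debug/) follow this pattern.

```c
#include <assert.h>
#include <stdio.h>

/* Write one newline-terminated integer to an attribute file; 0 on success. */
static int attr_write_int(const char *path, long value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%ld\n", value) > 0;
	/* stdio buffers the data, so a rejected store() shows up at close */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read back exactly one integer from an attribute file; 0 on success. */
static int attr_read_int(const char *path, long *value)
{
	FILE *f;
	int n;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fscanf(f, "%ld", value);
	fclose(f);
	return n == 1 ? 0 : -1;
}
```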

Wenchao Hao March 24, 2023, 3:42 a.m. UTC | #3
On 2023/3/24 1:24, Douglas Gilbert wrote:
> Hi,
> Maybe this link might help for scsi_debug documentation:
>      https://doug-gilbert.github.io/scsi_debug.html
> 
> And rather than sysfs for complicated, per (pseudo) device
> settings, perhaps we could think about a SCSI mechanism like
> the "Unit Attention" mode page [0x0] which is vendor specific
> and used by Seagate and WDC for this sort of thing.
> A framework is already in the scsi_debug driver to change
> some mode page settings:
> 
> # sdparm /dev/sg0
>      /dev/sg0: Linux     scsi_debug        0191
> Read write error recovery mode page:
>    AWRE          1  [cha: n, def:  1]
>    ARRE          1  [cha: n, def:  1]
>    PER           0  [cha: n, def:  0]
> Caching (SBC) mode page:
>    WCE           1  [cha: y, def:  1]
>    RCD           0  [cha: n, def:  0]
> Control mode page:
>    SWP           0  [cha: n, def:  0]
> Informational exceptions control mode page:
>    EWASC         0  [cha: n, def:  0]
>    DEXCPT        1  [cha: n, def:  1]
>    MRIE          0  [cha: y, def:  0]
> 
> As can be seen WCE and MRIE are changeable, so
> 
> # sdparm --clear=WCE /dev/sg0
> # sdparm --get=WCE /dev/sg0
>      /dev/sg0: Linux     scsi_debug        0191
> WCE           0  [cha: y, def:  1]
> 
> 
> Doug Gilbert
> 
> 

Do you mean defining scsi_debug's own format for mode page 0 (vendor
specific) that contains this error injection info, and setting/getting
these parameters via sdparm?
If so, would the sdparm code need to be modified for these changes?

I want to add more injections in scsi_debug to test the SCSI middle layer,
for example, controlling whether scsi_debug_abort() or
scsi_debug_device_reset() returns SUCCESS.

These injections are aimed at developers who want to trigger and observe
the error handler of the SCSI middle layer.

My interface makes it easy to extend with other error injections; for
example, a new error code could be added for an injection that controls
the return value of scsi_debug_abort().
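A rough sketch of how such an extensible injection list might look, modelled in userspace. The enum values, struct fields and fake_abort() are hypothetical illustrations, not the code in these patches, and EH_SUCCESS/EH_FAILED are local stand-ins for the SCSI error handler's SUCCESS/FAILED dispositions.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the EH dispositions; not the kernel's definitions. */
enum eh_result { EH_SUCCESS, EH_FAILED };

/* Hypothetical injection types; a new feature means a new code here. */
enum inject_type {
	INJECT_TIMEOUT_CMD,  /* time out a matching command */
	INJECT_ABORT_RESULT, /* override what the abort handler returns */
};

struct injection {
	enum inject_type type;
	unsigned char opcode;  /* SCSI opcode to match (command injections) */
	enum eh_result result; /* value to return (handler injections) */
};

/* Hypothetical abort handler: return an injected result if one exists. */
static enum eh_result fake_abort(const struct injection *list, size_t n)
{
	size_t i;

	assert(list != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (list[i].type == INJECT_ABORT_RESULT)
			return list[i].result;
	return EH_SUCCESS; /* default: pretend the abort worked */
}
```

Adding a new injection point then means adding one enum value and one lookup in the handler that should honour it.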

If adding this interface to sysfs is not recommended, what about proc?
Like /proc/scsi/scsi, where we can write "scsi remove-single-device h:c:t:l"
to manage a device.

Bart Van Assche March 24, 2023, 4:01 p.m. UTC | #4
On 3/23/23 20:42, haowenchao (C) wrote:
> If sysfs is not recommended, what about proc?

procfs is for process information and should not be used for any other 
purpose.

Thanks,

Bart.
Douglas Gilbert March 24, 2023, 5:31 p.m. UTC | #5
On 2023-03-23 23:42, haowenchao (C) wrote:
> Do you mean defining scsi_debug's own format for mode page 0 (vendor
> specific) that contains this error injection info, and setting/getting
> these parameters via sdparm?
> If so, would the sdparm code need to be modified for these changes?

Not a problem.

> If adding this interface to sysfs is not recommended, what about proc?
> Like /proc/scsi/scsi, where we can write "scsi remove-single-device h:c:t:l"
> to manage a device.

In the past, procfs has been used for this sort of thing
but the powers that be want to phase that usage out.

debugfs, even though it is usually mounted under sysfs at
/sys/kernel/debug, does not seem to have the "one value
per variable" restriction. So debugfs or configfs
(/sys/kernel/config) might be better candidates.
See 'df -ahT'.
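Where debugfs or configfs is mounted can also be checked programmatically rather than with df. A minimal sketch that scans text in /proc/mounts format (device, mount point, type, and so on) for a given filesystem type; find_mount_point() is a hypothetical helper, and real code could instead use getmntent(3) on /proc/self/mounts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan /proc/mounts-style text and copy the mount point of the first
 * entry whose type is @fstype into @out. Returns 0 if found, else -1.
 * Escaped characters in mount points (e.g. \040) are left as-is.
 */
static int find_mount_point(const char *mounts, const char *fstype,
			    char *out, size_t outlen)
{
	const char *line = mounts;

	assert(mounts && fstype && out && outlen > 0);
	while (line && *line) {
		char dev[256], dir[256], type[64];

		if (sscanf(line, "%255s %255s %63s", dev, dir, type) == 3 &&
		    strcmp(type, fstype) == 0) {
			snprintf(out, outlen, "%s", dir);
			return 0;
		}
		line = strchr(line, '\n'); /* advance to the next entry */
		if (line)
			line++;
	}
	return -1;
}
```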

Wenchao Hao March 25, 2023, 3:23 a.m. UTC | #6
On 2023/3/25 1:31, Douglas Gilbert wrote:
> On 2023-03-23 23:42, haowenchao (C) wrote:
>> Do you mean defining scsi_debug's own format for mode page 0 (vendor
>> specific) that contains this error injection info, and setting/getting
>> these parameters via sdparm?
>> If so, would the sdparm code need to be modified for these changes?
> 
> Not a problem.
> 
>> If adding this interface to sysfs is not recommended, what about proc?
>> Like /proc/scsi/scsi, where we can write "scsi remove-single-device h:c:t:l"
>> to manage a device.
> 
> In the past, procfs has been used for this sort of thing
> but the powers that be want to phase that usage out.
> 
> debugfs, even though it is usually mounted under sysfs at
> /sys/kernel/debug, does not seem to have the "one value
> per variable" restriction. So debugfs or configfs
> (/sys/kernel/config) might be better candidates.
> See 'df -ahT'.
> 

Defining scsi_debug's own format for mode page 0 seems too complex, and we
would have to change the sdparm code. For better extensibility, I prefer to
add these interfaces via debugfs. In fact, I designed the format of the
"error interface" described in patch 1 with reference to the ftrace interfaces.

The new interfaces based on debugfs would look like the following:
/sys/kernel/debug/scsi_debug
	|
	+-- 0:0:0:1
	|	|
	|	\-- error
	|
	\-- 0:0:0:2
		|
		\-- error
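Under this proposal each device's error file path is derived from its h:c:t:l address. A small sketch that formats that path; it assumes debugfs is mounted at /sys/kernel/debug, and sdebug_error_path() and SDEBUG_DEBUGFS_ROOT are hypothetical names, not part of scsi_debug.

```c
#include <assert.h>
#include <stdio.h>

/* Root of the proposed layout; assumes debugfs at /sys/kernel/debug. */
#define SDEBUG_DEBUGFS_ROOT "/sys/kernel/debug/scsi_debug"

/*
 * Format the path of the proposed per-device "error" file for h:c:t:l.
 * Returns the path length, or -1 if @buf is too small.
 */
static int sdebug_error_path(char *buf, size_t len, unsigned int host,
			     unsigned int channel, unsigned int id,
			     unsigned long long lun)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, SDEBUG_DEBUGFS_ROOT "/%u:%u:%u:%llu/error",
		     host, channel, id, lun);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For example, device 0:0:0:1 maps to /sys/kernel/debug/scsi_debug/0:0:0:1/error, matching the tree above.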
