mbox series

[v5,00/10] scsi:scsi_debug: Add error injection for single device

Message ID 20230922092906.2645265-1-haowenchao2@huawei.com
Headers show
Series scsi:scsi_debug: Add error injection for single device | expand

Message

Wenchao Hao Sept. 22, 2023, 9:28 a.m. UTC
The original error injection mechanism was based on scsi_host which
could not inject fault for a single SCSI device.

This patchset provides the ability to inject errors for a single
SCSI device. Now we supports inject timeout errors, queuecommand
errors, and hostbyte, driverbyte, statusbyte, and sense data for
specific SCSI Command. Two new error injection is defined to make
abort command or reset LUN failed.

Besides error injection for single device, this patchset add a
new interface to make reset target failed for each scsi_target.

The first two patch add an debugfs interface to add and inquiry single
device's error injection info; the third patch defined how to remove
an injection which has been added. The following 5 patches use the
injection info and generate the related error type. The last one just
Add a new interface to make reset target failed.

V5:
  - Using rcu list to sync between error inject add, remove and check
  - Add module parameter "allow_restart" to control scsi_device's
    allow_restart flag

V4:
  - Fix BUG_ON triggered by schedule in atomic context when rmmod scsi_debug
    Closes: https://lore.kernel.org/oe-lkp/202308031027.5941ce5f-oliver.sang@intel.com

V3:
  - Add two more error types to fail abort command and lun reset
  - Fix memleak when rmmod scsi_debug without clearing errors injected
  - Fix memkeak because did not implement release in sdebug_error_fops
  - Fix possible NULL point access in scsi_debug_slave_destroy
  - Move specific error type's description to each single patch which
    implement this error type
  - Add interface to make target reset fail

V2:
  - Using debugfs rather than sysfs attribute interface to manage error

Wenchao Hao (10):
  scsi: scsi_debug: create scsi_debug directory in the debugfs
    filesystem
  scsi: scsi_debug: Add interface to manage single device's error inject
  scsi: scsi_debug: Define grammar to remove added error injection
  scsi: scsi_debug: timeout command if the error is injected
  scsi: scsi_debug: Return failed value if the error is injected
  scsi: scsi_debug: set command's result and sense data if the error is
    injected
  scsi: scsi_debug: Add new error injection abort failed
  scsi: scsi_debug: Add new error injection reset lun failed
  scsi: scsi_debug: Add debugfs interface to fail target reset
  scsi: scsi_debug: Add param to control sdev's allow_restart

 drivers/scsi/scsi_debug.c | 557 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 552 insertions(+), 5 deletions(-)

Comments

Martin K. Petersen Sept. 27, 2023, 3:38 p.m. UTC | #1
Doug,

> The original error injection mechanism was based on scsi_host which
> could not inject fault for a single SCSI device.
>
> This patchset provides the ability to inject errors for a single SCSI
> device. Now we supports inject timeout errors, queuecommand errors,
> and hostbyte, driverbyte, statusbyte, and sense data for specific SCSI
> Command. Two new error injection is defined to make abort command or
> reset LUN failed.

Please review patches 7 through 10. Thank you!
Douglas Gilbert Sept. 28, 2023, 1:13 a.m. UTC | #2
On 2023-09-22 05:28, Wenchao Hao wrote:
> Create directory scsi_debug in the root of the debugfs filesystem.
> Prepare to add interface for manage error injection.
> 
> Acked-by: Douglas Gilbert <dgilbert@interlog.com>
> Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
> ---
>   drivers/scsi/scsi_debug.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index 9c0af50501f9..35c336271b13 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -41,6 +41,7 @@
>   #include <linux/random.h>
>   #include <linux/xarray.h>
>   #include <linux/prefetch.h>
> +#include <linux/debugfs.h>
>   
>   #include <net/checksum.h>
>   
> @@ -862,6 +863,8 @@ static const int device_qfull_result =
>   
>   static const int condition_met_result = SAM_STAT_CONDITION_MET;
>   
> +static struct dentry *sdebug_debugfs_root;
> +
>   
>   /* Only do the extra work involved in logical block provisioning if one or
>    * more of the lbpu, lbpws or lbpws10 parameters are given and we are doing
> @@ -7011,6 +7014,8 @@ static int __init scsi_debug_init(void)
>   		goto driver_unreg;
>   	}
>   
> +	sdebug_debugfs_root = debugfs_create_dir("scsi_debug", NULL);

debugfs_create_dir() can fail and return NULL. Looking at other drivers, most
seem to assume it will work. Since the scsi_debug driver is often used to test
abnormal situations, perhaps adding something like:
     if (!sdebug_debugfs_root)
         pr_info("%s: failed to create initial debugfs directory\n", __func__);

might save someone a bit of time if a NULL dereference on sdebug_debugfs_root
follows later. That is what the mpt3sas driver does.

Doug Gilbert

> +
>   	for (k = 0; k < hosts_to_add; k++) {
>   		if (want_store && k == 0) {
>   			ret = sdebug_add_host_helper(idx);
> @@ -7057,6 +7062,7 @@ static void __exit scsi_debug_exit(void)
>   
>   	sdebug_erase_all_stores(false);
>   	xa_destroy(per_store_ap);
> +	debugfs_remove(sdebug_debugfs_root);
>   }
>   
>   device_initcall(scsi_debug_init);
Wenchao Hao Sept. 28, 2023, 1:38 a.m. UTC | #3
On 2023/9/28 9:13, Douglas Gilbert wrote:
> On 2023-09-22 05:28, Wenchao Hao wrote:
>> Create directory scsi_debug in the root of the debugfs filesystem.
>> Prepare to add interface for manage error injection.
>>
>> Acked-by: Douglas Gilbert <dgilbert@interlog.com>
>> Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
>> ---
>>   drivers/scsi/scsi_debug.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
>> index 9c0af50501f9..35c336271b13 100644
>> --- a/drivers/scsi/scsi_debug.c
>> +++ b/drivers/scsi/scsi_debug.c
>> @@ -41,6 +41,7 @@
>>   #include <linux/random.h>
>>   #include <linux/xarray.h>
>>   #include <linux/prefetch.h>
>> +#include <linux/debugfs.h>
>>   #include <net/checksum.h>
>> @@ -862,6 +863,8 @@ static const int device_qfull_result =
>>   static const int condition_met_result = SAM_STAT_CONDITION_MET;
>> +static struct dentry *sdebug_debugfs_root;
>> +
>>   /* Only do the extra work involved in logical block provisioning if one or
>>    * more of the lbpu, lbpws or lbpws10 parameters are given and we are doing
>> @@ -7011,6 +7014,8 @@ static int __init scsi_debug_init(void)
>>           goto driver_unreg;
>>       }
>> +    sdebug_debugfs_root = debugfs_create_dir("scsi_debug", NULL);
> 
> debugfs_create_dir() can fail and return NULL. Looking at other drivers, most
> seem to assume it will work. Since the scsi_debug driver is often used to test
> abnormal situations, perhaps adding something like:
>      if (!sdebug_debugfs_root)
>          pr_info("%s: failed to create initial debugfs directory\n", __func__);
> 
> might save someone a bit of time if a NULL dereference on sdebug_debugfs_root
> follows later. That is what the mpt3sas driver does.
> 

Yes, I would fix it by checking return value of debugfs related call
after your review suggestions for other patches.

> Doug Gilbert
> 
>> +
>>       for (k = 0; k < hosts_to_add; k++) {
>>           if (want_store && k == 0) {
>>               ret = sdebug_add_host_helper(idx);
>> @@ -7057,6 +7062,7 @@ static void __exit scsi_debug_exit(void)
>>       sdebug_erase_all_stores(false);
>>       xa_destroy(per_store_ap);
>> +    debugfs_remove(sdebug_debugfs_root);
>>   }
>>   device_initcall(scsi_debug_init);
> 
>
Douglas Gilbert Oct. 6, 2023, 4:06 p.m. UTC | #4
On 2023-09-22 05:29, Wenchao Hao wrote:
> Add error injection type 3 to make scsi_debug_abort() return FAILED.
> Fail abort command foramt:

s/foramt/format/

> 
>    +--------+------+-------------------------------------------------------+
>    | Column | Type | Description                                           |
>    +--------+------+-------------------------------------------------------+
>    |   1    |  u8  | Error type, fixed to 0x3                              |
>    +--------+------+-------------------------------------------------------+
>    |   2    |  s32 | Error count                                           |
>    |        |      |  0: this rule will be ignored                         |
>    |        |      |  positive: the rule will always take effect           |
>    |        |      |  negative: the rule takes effect n times where -n is  |
>    |        |      |            the value given. Ignored after n times     |
>    +--------+------+-------------------------------------------------------+
>    |   3    |  x8  | SCSI command opcode, 0xff for all commands            |
>    +--------+------+-------------------------------------------------------+
> 
> Examples:
>      error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
>      echo "0 -10 0x12" > ${error}
> will make the device return FAILED when abort inquiry command 10 times.

Tested with:
    # sg_raw -t 10 -r 1k /dev/sg1 12 00 00 00 60 00

After 10 seconds (the timeout specified to sg_raw) I saw this:
    >>> transport error: Host_status=0x03 [DID_TIME_OUT]

And the
     # cat /sys/kernel/debug/scsi_debug/1\:0\:0\:0/error

Count value changed from -10 up to 0 for each invocation of the INQUIRY
command. Thereafter the INQUIRY command worked.

Looks good.

   Tested-by: Douglas Gilbert <dgilbert@interlog.com>

> 
> Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
> ---
>   drivers/scsi/scsi_debug.c | 40 +++++++++++++++++++++++++++++++++++++++
>   1 file changed, 40 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index fe1f7421f617..8a16cb9642a6 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -293,6 +293,8 @@ enum sdebug_err_type {
>   	ERR_FAIL_CMD		= 2,	/* make specific scsi command's */
>   					/* queuecmd return succeed but */
>   					/* with errors set in scsi_cmnd */
> +	ERR_ABORT_CMD_FAILED	= 3,	/* control return FAILED from */
> +					/* scsi_debug_abort() */
>   };
>   
>   struct sdebug_err_inject {
> @@ -970,6 +972,7 @@ static int sdebug_error_show(struct seq_file *m, void *p)
>   	list_for_each_entry_rcu(err, &devip->inject_err_list, list) {
>   		switch (err->type) {
>   		case ERR_TMOUT_CMD:
> +		case ERR_ABORT_CMD_FAILED:
>   			seq_printf(m, "%d\t%d\t0x%x\n", err->type, err->cnt,
>   				err->cmd);
>   		break;
> @@ -1031,6 +1034,7 @@ static ssize_t sdebug_error_write(struct file *file, const char __user *ubuf,
>   
>   	switch (inject_type) {
>   	case ERR_TMOUT_CMD:
> +	case ERR_ABORT_CMD_FAILED:
>   		if (sscanf(buf, "%d %d %hhx", &inject->type, &inject->cnt,
>   			   &inject->cmd) != 3)
>   			goto out_error;
> @@ -5504,9 +5508,39 @@ static void stop_all_queued(void)
>   	mutex_unlock(&sdebug_host_list_mutex);
>   }
>   
> +static int sdebug_fail_abort(struct scsi_cmnd *cmnd)
> +{
> +	struct scsi_device *sdp = cmnd->device;
> +	struct sdebug_dev_info *devip = (struct sdebug_dev_info *)sdp->hostdata;
> +	struct sdebug_err_inject *err;
> +	unsigned char *cmd = cmnd->cmnd;
> +	int ret = 0;
> +
> +	if (devip == NULL)
> +		return 0;
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(err, &devip->inject_err_list, list) {
> +		if (err->type == ERR_ABORT_CMD_FAILED &&
> +		    (err->cmd == cmd[0] || err->cmd == 0xff)) {
> +			ret = !!err->cnt;
> +			if (err->cnt < 0)
> +				err->cnt++;
> +
> +			rcu_read_unlock();
> +			return ret;
> +		}
> +	}
> +	rcu_read_unlock();
> +
> +	return 0;
> +}
> +
>   static int scsi_debug_abort(struct scsi_cmnd *SCpnt)
>   {
>   	bool ok = scsi_debug_abort_cmnd(SCpnt);
> +	u8 *cmd = SCpnt->cmnd;
> +	u8 opcode = cmd[0];
>   
>   	++num_aborts;
>   
> @@ -5515,6 +5549,12 @@ static int scsi_debug_abort(struct scsi_cmnd *SCpnt)
>   			    "%s: command%s found\n", __func__,
>   			    ok ? "" : " not");
>   
> +	if (sdebug_fail_abort(SCpnt)) {
> +		scmd_printk(KERN_INFO, SCpnt, "fail abort command 0x%x\n",
> +			    opcode);
> +		return FAILED;
> +	}
> +
>   	return SUCCESS;
>   }
>
Douglas Gilbert Oct. 8, 2023, 11:14 p.m. UTC | #5
On 2023-09-22 05:29, Wenchao Hao wrote:
> The interface is found at
> /sys/kernel/debug/scsi_debug/target<h:c:t>/fail_reset where <h:c:t>
> identifies the target to inject errors on. It's a simple bool type
> interface which would make this target's reset fail if set to 'Y'.
> 
> Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
> Reported-by: kernel test robot <oliver.sang@intel.com>

Tested by setting 'echo 1 > /sys/bus/pseudo/drivers/scsi_debug/opts'
and observing 'tail -f /var/log/syslog'. Looks good including that
fail_reset is readable so its current state can be checked.

Tested-by: Douglas Gilbert <dgilbert@interlog.com>

<snip>