diff mbox series

[v3,16/18] scsi: ufs: Synchronize SCSI and UFS error handling

Message ID 20210722033439.26550-17-bvanassche@acm.org
State New
Series UFS patches for kernel v5.15

Commit Message

Bart Van Assche July 22, 2021, 3:34 a.m. UTC
Use the SCSI error handler instead of a custom error handling strategy.
This change reduces the number of potential races in the UFS drivers since
the UFS error handler and the SCSI error handler no longer run concurrently.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Can Guo <cang@codeaurora.org>
Cc: Asutosh Das <asutoshd@codeaurora.org>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 102 ++++++++++++++++++++------------------
 drivers/scsi/ufs/ufshcd.h |   4 --
 2 files changed, 55 insertions(+), 51 deletions(-)

Comments

Bean Huo Aug. 2, 2021, 2:24 p.m. UTC | #1
On Wed, 2021-07-21 at 20:34 -0700, Bart Van Assche wrote:
> Use the SCSI error handler instead of a custom error handling
> strategy.
> This change reduces the number of potential races in the UFS drivers
> since
> the UFS error handler and the SCSI error handler no longer run
> concurrently.
> 
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Cc: Can Guo <cang@codeaurora.org>
> Cc: Asutosh Das <asutoshd@codeaurora.org>
> Cc: Avri Altman <avri.altman@wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> 


Reviewed-by: Bean Huo <beanhuo@micron.com>

Tested-by: Bean Huo <beanhuo@micron.com>
Adrian Hunter Aug. 28, 2021, 9:47 a.m. UTC | #2
On 22/07/21 6:34 am, Bart Van Assche wrote:
> Use the SCSI error handler instead of a custom error handling strategy.
> This change reduces the number of potential races in the UFS drivers since
> the UFS error handler and the SCSI error handler no longer run concurrently.
> 
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Stanley Chu <stanley.chu@mediatek.com>
> Cc: Can Guo <cang@codeaurora.org>
> Cc: Asutosh Das <asutoshd@codeaurora.org>
> Cc: Avri Altman <avri.altman@wdc.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---


Hi

There is a deadlock that seems to be related to this patch, because now
requests are blocked while the error handler waits on the host_sem.


Example:

ufshcd_err_handler() races with ufshcd_wl_suspend() for host_sem.
ufshcd_wl_suspend() wins the race but now PM requests deadlock:

because:
 scsi_queue_rq() -> scsi_host_queue_ready() -> scsi_host_in_recovery() is FALSE

because:
 scsi_schedule_eh() has done:
	    scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
	    scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0)

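The failure mode hinges on scsi_host_in_recovery() flipping once the error handler is scheduled. A tiny standalone model of that gate (a sketch only; the names mirror the kernel's scsi_host_in_recovery() and scsi_schedule_eh(), but this is not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of the SHOST state involved in the deadlock.  Names mirror
 * include/scsi/scsi_host.h and drivers/scsi/scsi_error.c; this is a sketch,
 * not the kernel implementation. */
enum shost_state { SHOST_RUNNING, SHOST_RECOVERY, SHOST_CANCEL_RECOVERY };

struct shost {
	enum shost_state state;
	int host_eh_scheduled;
};

/* Mirrors scsi_host_in_recovery(): once this returns true,
 * scsi_host_queue_ready() refuses new requests, including the PM
 * requests that the host_sem holder is waiting on. */
static bool host_in_recovery(const struct shost *s)
{
	return s->state == SHOST_RECOVERY ||
	       s->state == SHOST_CANCEL_RECOVERY ||
	       s->host_eh_scheduled;
}

/* Mirrors the state transition performed by scsi_schedule_eh(). */
static void schedule_eh(struct shost *s)
{
	if (s->state == SHOST_RUNNING)
		s->state = SHOST_RECOVERY;
	s->host_eh_scheduled++;
}
```

Once schedule_eh() has run, host_in_recovery() stays true until the error handler completes, which it cannot do while it waits on host_sem.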

Some questions for thought:

Won't any holder of host_sem deadlock if it tries to do SCSI requests
and the error handler is waiting on host_sem?

Won't runtime resume deadlock if it is initiated by the error handler?


Regards
Adrian
Avri Altman Aug. 29, 2021, 7:17 a.m. UTC | #3
Hi, 
> On 22/07/21 6:34 am, Bart Van Assche wrote:
> > Use the SCSI error handler instead of a custom error handling strategy.
> > This change reduces the number of potential races in the UFS drivers
> > since the UFS error handler and the SCSI error handler no longer run
> > concurrently.
> >
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Stanley Chu <stanley.chu@mediatek.com>
> > Cc: Can Guo <cang@codeaurora.org>
> > Cc: Asutosh Das <asutoshd@codeaurora.org>
> > Cc: Avri Altman <avri.altman@wdc.com>
> > Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Shouldn't transport template ops be used only for scsi_transport modules?

Thanks,
Avri
Adrian Hunter Aug. 29, 2021, 9:57 a.m. UTC | #4
On 28/08/21 12:47 pm, Adrian Hunter wrote:
> On 22/07/21 6:34 am, Bart Van Assche wrote:
>> Use the SCSI error handler instead of a custom error handling strategy.
>> This change reduces the number of potential races in the UFS drivers since
>> the UFS error handler and the SCSI error handler no longer run concurrently.
>>
>> Cc: Adrian Hunter <adrian.hunter@intel.com>
>> Cc: Stanley Chu <stanley.chu@mediatek.com>
>> Cc: Can Guo <cang@codeaurora.org>
>> Cc: Asutosh Das <asutoshd@codeaurora.org>
>> Cc: Avri Altman <avri.altman@wdc.com>
>> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
>> ---
> 
> Hi
> 
> There is a deadlock that seems to be related to this patch, because now
> requests are blocked while the error handler waits on the host_sem.
> 
> Example:
> 
> ufshcd_err_handler() races with ufshcd_wl_suspend() for host_sem.
> ufshcd_wl_suspend() wins the race but now PM requests deadlock:
> 
> because:
>  scsi_queue_rq() -> scsi_host_queue_ready() -> scsi_host_in_recovery() is FALSE


That is, scsi_host_queue_ready() is FALSE because scsi_host_in_recovery() is TRUE.

> 
> because:
>  scsi_schedule_eh() has done:
> 	    scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
> 	    scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0)
> 
> Some questions for thought:
> 
> Won't any holder of host_sem deadlock if it tries to do SCSI requests
> and the error handler is waiting on host_sem?
> 
> Won't runtime resume deadlock if it is initiated by the error handler?
> 
> Regards
> Adrian
Bart Van Assche Aug. 29, 2021, 9:33 p.m. UTC | #5
On 8/29/21 00:17, Avri Altman wrote:
> Shouldn't a transport template ops be used only for scsi_transport modules?


If a transport template is used by multiple SCSI LLDs then the
transport implementation should be a kernel module of its
own. Since the transport template introduced by this patch is not used
by any other kernel module, I think it's fine to define this transport
template in the UFS driver.

Bart.
Bart Van Assche Aug. 29, 2021, 10:18 p.m. UTC | #6
On 8/28/21 02:47, Adrian Hunter wrote:
> There is a deadlock that seems to be related to this patch, because now
> requests are blocked while the error handler waits on the host_sem.


Hi Adrian,

Some but not all of the issues mentioned below have been introduced by 
patch 16/18. Anyway, thank you for having shared your concerns.

> Example:
> 
> ufshcd_err_handler() races with ufshcd_wl_suspend() for host_sem.
> ufshcd_wl_suspend() wins the race but now PM requests deadlock:
> 
> because:
>   scsi_queue_rq() -> scsi_host_queue_ready() -> scsi_host_in_recovery() is FALSE
> 
> because:
>   scsi_schedule_eh() has done:
> 	    scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
> 	    scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0)
> 
> Some questions for thought:
> 
> Won't any holder of host_sem deadlock if it tries to do SCSI requests
> and the error handler is waiting on host_sem?
> 
> Won't runtime resume deadlock if it is initiated by the error handler?


My understanding is that host_sem is used for the following purposes:
- To prevent sysfs attributes from being read or written after shutdown
   has started (hba->shutting_down).
- To serialize sysfs attribute access, clock scaling, error handling,
   the ufshcd_probe_hba() call from ufshcd_async_scan() and hibernation.

I propose to make the following changes:
- Instead of checking the value of hba->shutting_down from inside sysfs
   attribute callbacks, remove sysfs attributes before starting shutdown.
   That will remove the need to check hba->shutting_down from inside
   sysfs attribute callbacks.
- Leave out the host_sem down() and up() calls from ufshcd_wl_suspend()
   and ufshcd_wl_resume(). Serializing hibernation against e.g. sysfs
   attribute access is not the responsibility of a SCSI LLD - this is the
   responsibility of the power management core.
- Split host_sem. I don't see how else to address the potential deadlock
   between the error handler and runtime resume.

Do you agree with the above?

Thanks,

Bart.
Adrian Hunter Aug. 31, 2021, 7:24 a.m. UTC | #7
On 30/08/21 1:18 am, Bart Van Assche wrote:
> On 8/28/21 02:47, Adrian Hunter wrote:
>> There is a deadlock that seems to be related to this patch, because now
>> requests are blocked while the error handler waits on the host_sem.
> 
> Hi Adrian,
> 
> Some but not all of the issues mentioned below have been introduced by patch 16/18. Anyway, thank you for having shared your concerns.
> 
>> Example:
>>
>> ufshcd_err_handler() races with ufshcd_wl_suspend() for host_sem.
>> ufshcd_wl_suspend() wins the race but now PM requests deadlock:
>>
>> because:
>>   scsi_queue_rq() -> scsi_host_queue_ready() -> scsi_host_in_recovery() is FALSE
>>
>> because:
>>   scsi_schedule_eh() has done:
>>         scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
>>         scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0)
>>
>> Some questions for thought:
>>
>> Won't any holder of host_sem deadlock if it tries to do SCSI requests
>> and the error handler is waiting on host_sem?
>>
>> Won't runtime resume deadlock if it is initiated by the error handler?
> 
> My understanding is that host_sem is used for the following purposes:
> - To prevent that sysfs attributes are read or written after shutdown
>   has started (hba->shutting_down).
> - To serialize sysfs attribute access, clock scaling, error handling,
>   the ufshcd_probe_hba() call from ufshcd_async_scan() and hibernation.
> 
> I propose to make the following changes:
> - Instead of checking the value of hba->shutting_down from inside sysfs
>   attribute callbacks, remove sysfs attributes before starting shutdown.
>   That will remove the need to check hba->shutting_down from inside
>   sysfs attribute callbacks.
> - Leave out the host_sem down() and up() calls from ufshcd_wl_suspend()
>   and ufshcd_wl_resume(). Serializing hibernation against e.g. sysfs
>   attribute access is not the responsibility of a SCSI LLD - this is the
>   responsibility of the power management core.
> - Split host_sem. I don't see how else to address the potential deadlock
>   between the error handler and runtime resume.
> 
> Do you agree with the above?


Looking some more:

sysfs and debugfs use direct access, so there is probably not a problem
there.

bsg also uses direct access but doesn't appear to have synchronization,
so there may be a gap there.  That is an existing problem.

As an aside, the current synchronization for direct access doesn't make
complete sense because the lock (host_sem) remains held across retries
(e.g. ufshcd_query_descriptor_retry), preventing error handling between
retries.  That is an existing problem.

ufshcd_wl_suspend() and ufshcd_wl_shutdown() could wait for error handling
and then disable it somehow. ufshcd_wl_resume() would have to enable it.

That leaves runtime PM.  Since the error handler can block runtime resume,
it cannot wait for runtime resume, it must exit.  Another complication is
that the PM workqueue (pm_wq) gets frozen early during system suspend, so
requesting an asynchronous runtime resume won't necessarily make any
progress.

How does splitting the host_sem address the potential deadlock
between the error handler and runtime resume?
Adrian Hunter Aug. 31, 2021, 10:04 a.m. UTC | #8
On 31/08/21 10:24 am, Adrian Hunter wrote:
> On 30/08/21 1:18 am, Bart Van Assche wrote:
>> On 8/28/21 02:47, Adrian Hunter wrote:
>>> There is a deadlock that seems to be related to this patch, because now
>>> requests are blocked while the error handler waits on the host_sem.
>>
>> Hi Adrian,
>>
>> Some but not all of the issues mentioned below have been introduced by patch 16/18. Anyway, thank you for having shared your concerns.
>>
>>> Example:
>>>
>>> ufshcd_err_handler() races with ufshcd_wl_suspend() for host_sem.
>>> ufshcd_wl_suspend() wins the race but now PM requests deadlock:
>>>
>>> because:
>>>   scsi_queue_rq() -> scsi_host_queue_ready() -> scsi_host_in_recovery() is FALSE
>>>
>>> because:
>>>   scsi_schedule_eh() has done:
>>>         scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
>>>         scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0)
>>>
>>> Some questions for thought:
>>>
>>> Won't any holder of host_sem deadlock if it tries to do SCSI requests
>>> and the error handler is waiting on host_sem?
>>>
>>> Won't runtime resume deadlock if it is initiated by the error handler?
>>
>> My understanding is that host_sem is used for the following purposes:
>> - To prevent that sysfs attributes are read or written after shutdown
>>   has started (hba->shutting_down).
>> - To serialize sysfs attribute access, clock scaling, error handling,
>>   the ufshcd_probe_hba() call from ufshcd_async_scan() and hibernation.
>>
>> I propose to make the following changes:
>> - Instead of checking the value of hba->shutting_down from inside sysfs
>>   attribute callbacks, remove sysfs attributes before starting shutdown.
>>   That will remove the need to check hba->shutting_down from inside
>>   sysfs attribute callbacks.
>> - Leave out the host_sem down() and up() calls from ufshcd_wl_suspend()
>>   and ufshcd_wl_resume(). Serializing hibernation against e.g. sysfs
>>   attribute access is not the responsibility of a SCSI LLD - this is the
>>   responsibility of the power management core.
>> - Split host_sem. I don't see how else to address the potential deadlock
>>   between the error handler and runtime resume.
>>
>> Do you agree with the above?
> 
> Looking some more:
> 
> sysfs and debugfs use direct access, so there is probably not a problem
> there.


Except with runtime PM, but it might be OK if ufshcd_rpm_get_sync() is moved
before down(&hba->host_sem).

> 
> bsg also uses direct access but doesn't appear to have synchronization
> so there is maybe a gap there.  That is an existing problem.
> 
> As an aside, the current synchronization for direct access doesn't make
> complete sense because the lock (host_sem) remains held across retries
> (e.g. ufshcd_query_descriptor_retry) preventing error handling between
> retries.  That is an existing problem.
> 
> ufshcd_wl_suspend() and ufshcd_wl_shutdown() could wait for error handling
> and then disable it somehow. ufshcd_wl_resume() would have to enable it.
> 
> That leaves runtime PM.  Since the error handler can block runtime resume,
> it cannot wait for runtime resume, it must exit.  Another complication is
> that the PM workqueue (pm_wq) gets frozen early during system suspend, so
> requesting an asynchronous runtime resume won't necessarily make any
> progress.
> 
> How does splitting the host_sem address the potential deadlock
> between the error handler and runtime resume?
Bart Van Assche Aug. 31, 2021, 5:18 p.m. UTC | #9
On 8/31/21 12:24 AM, Adrian Hunter wrote:
> How does splitting the host_sem address the potential deadlock
> between the error handler and runtime resume?


Hmm ... how do runtime resume and the error handler deadlock? If 
shost->eh_noresume == 0 then scsi_error_handler() will call 
scsi_autopm_get_host() before invoking the eh_strategy_handler callback. 
The definition of scsi_autopm_get_host() is as follows:

int scsi_autopm_get_host(struct Scsi_Host *shost)
{
	int	err;

	err = pm_runtime_get_sync(&shost->shost_gendev);
	if (err < 0 && err != -EACCES)
		pm_runtime_put_sync(&shost->shost_gendev);
	else
		err = 0;
	return err;
}

The power management operations used for shost_gendev instances are 
defined by scsi_bus_pm_ops (see also scsi_host_alloc()). The following 
function is the runtime resume function referenced by scsi_bus_pm_ops 
and skips shost_gendevs since these are not SCSI devices:

static int scsi_runtime_resume(struct device *dev)
{
	int err = 0;

	dev_dbg(dev, "scsi_runtime_resume\n");
	if (scsi_is_sdev_device(dev))
		err = sdev_runtime_resume(dev);

	/* Insert hooks here for targets, hosts, and transport classes */

	return err;
}

In addition to the above function the runtime resume callback of the UFS 
platform device is also invoked. I think all these functions call 
ufshcd_runtime_resume(). As far as I can see ufshcd_runtime_resume() 
does not touch host_sem?

Bart.
Adrian Hunter Sept. 1, 2021, 7:42 a.m. UTC | #10
On 31/08/21 8:18 pm, Bart Van Assche wrote:
> On 8/31/21 12:24 AM, Adrian Hunter wrote:
>> How does splitting the host_sem address the potential deadlock
>> between the error handler and runtime resume?
> 
> Hmm ... how do runtime resume and the error handler deadlock? If shost->eh_noresume == 0 then scsi_error_handler() will call scsi_autopm_get_host() before invoking the eh_strategy_handler callback. The definition of scsi_autopm_get_host() is as follows:
> 
> int scsi_autopm_get_host(struct Scsi_Host *shost)
> {
>     int    err;
> 
>     err = pm_runtime_get_sync(&shost->shost_gendev);
>     if (err < 0 && err != -EACCES)
>         pm_runtime_put_sync(&shost->shost_gendev);
>     else
>         err = 0;
>     return err;
> }


That resumes the host, but the problem is with resuming the UFS device.
The UFS device can be resumed by ufshcd_err_handling_prepare(),
notably before it calls ufshcd_scsi_block_requests().

> 
> The power management operations used for shost_gendev instances are defined by scsi_bus_pm_ops (see also scsi_host_alloc()). The following function is the runtime resume function referenced by scsi_bus_pm_ops and skips shost_gendevs since these are not SCSI devices:
> 
> static int scsi_runtime_resume(struct device *dev)
> {
>     int err = 0;
> 
>     dev_dbg(dev, "scsi_runtime_resume\n");
>     if (scsi_is_sdev_device(dev))
>         err = sdev_runtime_resume(dev);
> 
>     /* Insert hooks here for targets, hosts, and transport classes */
> 
>     return err;
> }
> 
> In addition to the above function the runtime resume callback of the UFS platform device is also invoked. I think all these functions call ufshcd_runtime_resume(). As far as I can see ufshcd_runtime_resume() does not touch host_sem?


No, it doesn't use host_sem.  The problem is with issuing requests to a blocked queue.
If the UFS device is in SLEEP state, runtime resume will try to do a
SCSI request to change to ACTIVE state.  That will block while the error
handler is running.  So if the error handler is waiting on runtime resume,
deadlock.

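The circular wait can be reduced to a toy model (the flag names are invented for illustration, they are not driver fields): the error handler blocks the request queue and then waits on runtime resume, while runtime resume needs to push a command through that same queue:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the deadlock described above.  Field names are hypothetical;
 * only the ordering matches the real driver. */
struct ufs_model {
	bool requests_blocked;   /* ufshcd_scsi_block_requests() has run */
	bool wlun_resumed;       /* SSU command brought the device to ACTIVE */
};

/* Runtime resume must send a START STOP UNIT command through the SCSI
 * queue; it stalls for as long as requests are blocked. */
static bool try_runtime_resume(struct ufs_model *m)
{
	if (m->requests_blocked)
		return false;
	m->wlun_resumed = true;
	return true;
}

/* The error handler cannot finish until the resume it waits on completes,
 * so neither side can make progress: a circular wait. */
static bool try_finish_err_handler(const struct ufs_model *m)
{
	return m->wlun_resumed;
}
```

With requests blocked, neither try_runtime_resume() nor try_finish_err_handler() can ever succeed, which is exactly the hang shown in the trace below.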
Here is an example:

First, instrument the debugfs stats file to trigger SCSI error handling:

From: Adrian Hunter <adrian.hunter@intel.com>

Date: Wed, 1 Sep 2021 09:16:34 +0300
Subject: [PATCH] HACK: scsi: ufs: Hack debugfs stats to do scsi_schedule_eh()

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

---
 drivers/scsi/ufs/ufs-debugfs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/ufs/ufs-debugfs.c b/drivers/scsi/ufs/ufs-debugfs.c
index 4e1ff209b9336..614ba42203a54 100644
--- a/drivers/scsi/ufs/ufs-debugfs.c
+++ b/drivers/scsi/ufs/ufs-debugfs.c
@@ -3,6 +3,8 @@
 
 #include <linux/debugfs.h>
 
+#include <scsi/scsi_transport.h>
+#include "../scsi_transport_api.h"
 #include "ufs-debugfs.h"
 #include "ufshcd.h"
 
@@ -40,6 +42,10 @@ static int ufs_debugfs_stats_show(struct seq_file *s, void *data)
 	PRT("Host Resets: %llu\n", HOST_RESET);
 	PRT("SCSI command aborts: %llu\n", ABORT);
 #undef PRT
+	seq_printf(s, "\n%s: Calling scsi_schedule_eh\n\n", __func__);
+	dev_info(hba->dev, "%s: Calling scsi_schedule_eh\n", __func__);
+	hba->force_reset = true;
+	scsi_schedule_eh(hba->host);
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(ufs_debugfs_stats);
-- 
2.25.1


Then set an rpm_lvl to SLEEP:

# echo 2 > /sys/bus/pci/drivers/ufshcd/0000\:00\:12.5/rpm_lvl
# grep -H . /sys/bus/pci/drivers/ufshcd/0000\:00\:12.5/*pm*
/sys/bus/pci/drivers/ufshcd/0000:00:12.5/rpm_lvl:2
/sys/bus/pci/drivers/ufshcd/0000:00:12.5/rpm_target_dev_state:SLEEP
/sys/bus/pci/drivers/ufshcd/0000:00:12.5/rpm_target_link_state:ACTIVE
/sys/bus/pci/drivers/ufshcd/0000:00:12.5/spm_lvl:5
/sys/bus/pci/drivers/ufshcd/0000:00:12.5/spm_target_dev_state:POWERDOWN
/sys/bus/pci/drivers/ufshcd/0000:00:12.5/spm_target_link_state:OFF


I have to do a little I/O to make sure the new rpm_lvl
has been used for runtime suspend:

# dd if=/dev/sdb of=/dev/null bs=4096 count=1 &
# 1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.202648 s, 20.2 kB/s


Then trigger the error handler while runtime suspended:

# cat /sys/kernel/debug/ufshcd/0000\:00\:12.5/stats
PHY Adapter Layer errors (except LINERESET): 0
Data Link Layer errors: 0
Network Layer errors: 0
Transport Layer errors: 0
Generic DME errors: 0
Auto-hibernate errors: 0
IS Fatal errors (CEFES, SBFES, HCFES, DFES): 0
DME Link Startup errors: 0
PM Resume errors: 0
PM Suspend errors : 0
Logical Unit Resets: 0
Host Resets: 1
SCSI command aborts: 0

ufs_debugfs_stats_show: Calling scsi_schedule_eh


And show blocked tasks:

# echo w > /proc/sysrq-trigger
# dmesg | tail -29
[  269.223247] sysrq: Show Blocked State
[  269.224452] task:scsi_eh_1       state:D stack:13936 pid:  258 ppid:     2 flags:0x00004000
[  269.225318] Call Trace:
[  269.226133]  __schedule+0x26c/0x6c0
[  269.227265]  schedule+0x3f/0xa0
[  269.228472]  schedule_timeout+0x209/0x290
[  269.228872]  ? blk_mq_sched_dispatch_requests+0x2b/0x50
[  269.229247]  io_schedule_timeout+0x14/0x40
[  269.229825] wait_for_completion_io+0x81/0xe0
[  269.230273]  blk_execute_rq+0x64/0xd0
[  269.230868]  __scsi_execute+0x109/0x260
[  269.231301] ufshcd_set_dev_pwr_mode+0xe2/0x1c0 [ufshcd_core]
[  269.231754]  __ufshcd_wl_resume+0x96/0x220 [ufshcd_core]
[  269.232146] ufshcd_wl_runtime_resume+0x28/0xd0 [ufshcd_core]
[  269.232756]  scsi_runtime_resume+0x76/0xb0
[  269.233499]  ? scsi_autopm_put_device+0x20/0x20
[  269.234171]  __rpm_callback+0x3b/0x110
[  269.235095]  ? scsi_autopm_put_device+0x20/0x20
[  269.235988]  rpm_callback+0x54/0x60
[  269.236607]  rpm_resume+0x503/0x700
[  269.237134]  ? __pm_runtime_idle+0x4c/0xe0
[  269.237822]  __pm_runtime_resume+0x45/0x70
[  269.238376] ufshcd_err_handler+0x112/0x9f0 [ufshcd_core]
[  269.238928]  ? __pm_runtime_resume+0x53/0x70
[  269.239392]  scsi_error_handler+0xa8/0x3a0
[  269.239832]  ? scsi_eh_get_sense+0x140/0x140
[  269.240266]  kthread+0x11f/0x140
[  269.240860]  ? set_kthread_struct+0x40/0x40
[  269.241275]  ret_from_fork+0x1f/0x30
#
Bart Van Assche Sept. 1, 2021, 8:46 p.m. UTC | #11
On 9/1/21 12:42 AM, Adrian Hunter wrote:
> No it doesn't use host_sem.  The problem is with issuing requests to a blocked queue.
> If the UFS device is in SLEEP state, runtime resume will try to do a
> SCSI request to change to ACTIVE state.  That will block while the error
> handler is running.  So if the error handler is waiting on runtime resume,
> deadlock.


Please define "UFS device". Does this refer to the physical device or to 
a LUN?

I agree that suspending or resuming a LUN involves executing a SCSI 
command. See also __ufshcd_wl_suspend() and __ufshcd_wl_resume(). These 
functions are used to suspend or resume a LUN and not to suspend or 
resume the UFS device.

However, I don't see how the above scenario would lead to a deadlock? 
The UFS error handler (ufshcd_err_handler()) works at the link level and 
may resume the SCSI host and/or UFS device (hba->host and hba->dev). The 
UFS error handler must not try to resume any of the LUNs since that 
involves executing SCSI commands.

Bart.
Adrian Hunter Sept. 2, 2021, 6:02 a.m. UTC | #12
On 1/09/21 11:46 pm, Bart Van Assche wrote:
> On 9/1/21 12:42 AM, Adrian Hunter wrote:
>> No it doesn't use host_sem.  The problem is with issuing requests to a blocked queue.
>> If the UFS device is in SLEEP state, runtime resume will try to do a
>> SCSI request to change to ACTIVE state.  That will block while the error
>> handler is running.  So if the error handler is waiting on runtime resume,
>> deadlock.
> 
> Please define "UFS device". Does this refer to the physical device or to a LUN?


UFS Device WLUN aka UFS Device Well-known LUN aka LUN 49488. It is in the spec.

> 
> I agree that suspending or resuming a LUN involves executing a SCSI command. See also __ufshcd_wl_suspend() and __ufshcd_wl_resume(). These functions are used to suspend or resume a LUN and not to suspend or resume the UFS device.


__ufshcd_wl_suspend() and __ufshcd_wl_resume() are for the UFS Device WLUN (what the wl stands for).  All other LUNs are device link consumers of it.
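That consumer/supplier ordering can be sketched as a toy runtime-PM model (an assumption-level sketch, not the driver core's device-link code; the real mechanism is device links created with DL_FLAG_PM_RUNTIME): resuming any consumer LUN first forces its supplier, the UFS Device WLUN, to resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a device-link consumer/supplier pair: the WLUN is the
 * supplier, every other LUN is a consumer.  Not the driver-core code. */
struct dev {
	bool active;           /* runtime-PM active? */
	struct dev *supplier;  /* device link supplier, or NULL */
};

/* Simplified runtime resume: a consumer cannot become active until its
 * supplier has been resumed first. */
static void rpm_resume(struct dev *d)
{
	if (d->supplier && !d->supplier->active)
		rpm_resume(d->supplier);  /* supplier resumes first */
	d->active = true;
}
```

So any I/O-driven resume of an ordinary LUN implicitly resumes the WLUN, which is why the WLUN's resume path sits on the critical path of the deadlock.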

> 
> However, I don't see how the above scenario would lead to a deadlock? The UFS error handler (ufshcd_err_handler()) works at the link level and may resume the SCSI host and/or UFS device (hba->host and hba->dev). The UFS error handler must not try to resume any of the LUNs since that involves executing SCSI commands.


A detailed example was provided.

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 75730a43fcca..8d87fb214281 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -17,6 +17,8 @@ 
 #include <linux/blk-pm.h>
 #include <linux/blkdev.h>
 #include <scsi/scsi_driver.h>
+#include <scsi/scsi_transport.h>
+#include "../scsi_transport_api.h"
 #include "ufshcd.h"
 #include "ufs_quirks.h"
 #include "unipro.h"
@@ -232,7 +234,6 @@  static int ufshcd_scale_clks(struct ufs_hba *hba, bool scale_up);
 static irqreturn_t ufshcd_intr(int irq, void *__hba);
 static int ufshcd_change_power_mode(struct ufs_hba *hba,
 			     struct ufs_pa_layer_attr *pwr_mode);
-static void ufshcd_schedule_eh_work(struct ufs_hba *hba);
 static int ufshcd_setup_hba_vreg(struct ufs_hba *hba, bool on);
 static int ufshcd_setup_vreg(struct ufs_hba *hba, bool on);
 static inline int ufshcd_config_vreg_hpm(struct ufs_hba *hba,
@@ -3906,6 +3907,35 @@  int ufshcd_dme_get_attr(struct ufs_hba *hba, u32 attr_sel,
 }
 EXPORT_SYMBOL_GPL(ufshcd_dme_get_attr);
 
+static inline bool ufshcd_is_saved_err_fatal(struct ufs_hba *hba)
+{
+	lockdep_assert_held(hba->host->host_lock);
+
+	return (hba->saved_uic_err & UFSHCD_UIC_DL_PA_INIT_ERROR) ||
+	       (hba->saved_err & (INT_FATAL_ERRORS | UFSHCD_UIC_HIBERN8_MASK));
+}
+
+static void ufshcd_schedule_eh(struct ufs_hba *hba)
+{
+	bool schedule_eh = false;
+	unsigned long flags;
+
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	/* handle fatal errors only when link is not in error state */
+	if (hba->ufshcd_state != UFSHCD_STATE_ERROR) {
+		if (hba->force_reset || ufshcd_is_link_broken(hba) ||
+		    ufshcd_is_saved_err_fatal(hba))
+			hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED_FATAL;
+		else
+			hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED_NON_FATAL;
+		schedule_eh = true;
+	}
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+	if (schedule_eh)
+		scsi_schedule_eh(hba->host);
+}
+
 /**
  * ufshcd_uic_pwr_ctrl - executes UIC commands (which affects the link power
  * state) and waits for it to take effect.
@@ -3926,6 +3956,7 @@  static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
 {
 	DECLARE_COMPLETION_ONSTACK(uic_async_done);
 	unsigned long flags;
+	bool schedule_eh = false;
 	u8 status;
 	int ret;
 	bool reenable_intr = false;
@@ -3995,10 +4026,14 @@  static int ufshcd_uic_pwr_ctrl(struct ufs_hba *hba, struct uic_command *cmd)
 		ufshcd_enable_intr(hba, UIC_COMMAND_COMPL);
 	if (ret) {
 		ufshcd_set_link_broken(hba);
-		ufshcd_schedule_eh_work(hba);
+		schedule_eh = true;
 	}
+
 out_unlock:
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
+
+	if (schedule_eh)
+		ufshcd_schedule_eh(hba);
 	mutex_unlock(&hba->uic_cmd_mutex);
 
 	return ret;
@@ -5802,27 +5837,6 @@  static bool ufshcd_quirk_dl_nac_errors(struct ufs_hba *hba)
 	return err_handling;
 }
 
-/* host lock must be held before calling this func */
-static inline bool ufshcd_is_saved_err_fatal(struct ufs_hba *hba)
-{
-	return (hba->saved_uic_err & UFSHCD_UIC_DL_PA_INIT_ERROR) ||
-	       (hba->saved_err & (INT_FATAL_ERRORS | UFSHCD_UIC_HIBERN8_MASK));
-}
-
-/* host lock must be held before calling this func */
-static inline void ufshcd_schedule_eh_work(struct ufs_hba *hba)
-{
-	/* handle fatal errors only when link is not in error state */
-	if (hba->ufshcd_state != UFSHCD_STATE_ERROR) {
-		if (hba->force_reset || ufshcd_is_link_broken(hba) ||
-		    ufshcd_is_saved_err_fatal(hba))
-			hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED_FATAL;
-		else
-			hba->ufshcd_state = UFSHCD_STATE_EH_SCHEDULED_NON_FATAL;
-		queue_work(hba->eh_wq, &hba->eh_work);
-	}
-}
-
 static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool allow)
 {
 	down_write(&hba->clk_scaling_lock);
@@ -5956,11 +5970,11 @@  static bool ufshcd_is_pwr_mode_restore_needed(struct ufs_hba *hba)
 
 /**
  * ufshcd_err_handler - handle UFS errors that require s/w attention
- * @work: pointer to work structure
+ * @host: SCSI host pointer
  */
-static void ufshcd_err_handler(struct work_struct *work)
+static void ufshcd_err_handler(struct Scsi_Host *host)
 {
-	struct ufs_hba *hba;
+	struct ufs_hba *hba = shost_priv(host);
 	unsigned long flags;
 	bool err_xfer = false;
 	bool err_tm = false;
@@ -5968,10 +5982,9 @@  static void ufshcd_err_handler(struct work_struct *work)
 	int tag;
 	bool needs_reset = false, needs_restore = false;
 
-	hba = container_of(work, struct ufs_hba, eh_work);
-
 	down(&hba->host_sem);
 	spin_lock_irqsave(hba->host->host_lock, flags);
+	hba->host->host_eh_scheduled = 0;
 	if (ufshcd_err_handling_should_stop(hba)) {
 		if (hba->ufshcd_state != UFSHCD_STATE_ERROR)
 			hba->ufshcd_state = UFSHCD_STATE_OPERATIONAL;
@@ -6285,7 +6298,6 @@  static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba, u32 intr_status)
 					 "host_regs: ");
 			ufshcd_print_pwr_info(hba);
 		}
-		ufshcd_schedule_eh_work(hba);
 		retval |= IRQ_HANDLED;
 	}
 	/*
@@ -6297,6 +6309,10 @@  static irqreturn_t ufshcd_check_errors(struct ufs_hba *hba, u32 intr_status)
 	hba->errors = 0;
 	hba->uic_error = 0;
 	spin_unlock(hba->host->host_lock);
+
+	if (queue_eh_work)
+		ufshcd_schedule_eh(hba);
+
 	return retval;
 }
 
@@ -6959,15 +6975,17 @@  static int ufshcd_abort(struct scsi_cmnd *cmd)
 	 * will be to send LU reset which, again, is a spec violation.
 	 * To avoid these unnecessary/illegal steps, first we clean up
 	 * the lrb taken by this cmd and re-set it in outstanding_reqs,
-	 * then queue the eh_work and bail.
+	 * then queue the error handler and bail.
 	 */
 	if (lrbp->lun == UFS_UPIU_UFS_DEVICE_WLUN) {
 		ufshcd_update_evt_hist(hba, UFS_EVT_ABORT, lrbp->lun);
 
 		spin_lock_irqsave(host->host_lock, flags);
 		hba->force_reset = true;
-		ufshcd_schedule_eh_work(hba);
 		spin_unlock_irqrestore(host->host_lock, flags);
+
+		ufshcd_schedule_eh(hba);
+
 		goto release;
 	}
 
@@ -7099,11 +7117,10 @@  static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	hba->force_reset = true;
-	ufshcd_schedule_eh_work(hba);
 	dev_err(hba->dev, "%s: reset in progress - 1\n", __func__);
 	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
-	flush_work(&hba->eh_work);
+	ufshcd_err_handler(hba->host);
 
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	if (hba->ufshcd_state == UFSHCD_STATE_ERROR)
@@ -8463,8 +8480,6 @@  static void ufshcd_hba_exit(struct ufs_hba *hba)
 	if (hba->is_powered) {
 		ufshcd_exit_clk_scaling(hba);
 		ufshcd_exit_clk_gating(hba);
-		if (hba->eh_wq)
-			destroy_workqueue(hba->eh_wq);
 		ufs_debugfs_hba_exit(hba);
 		ufshcd_variant_hba_exit(hba);
 		ufshcd_setup_vreg(hba, false);
@@ -9303,6 +9318,10 @@  static int ufshcd_set_dma_mask(struct ufs_hba *hba)
 	return dma_set_mask_and_coherent(hba->dev, DMA_BIT_MASK(32));
 }
 
+static struct scsi_transport_template ufshcd_transport_template = {
+	.eh_strategy_handler = ufshcd_err_handler,
+};
+
 /**
  * ufshcd_alloc_host - allocate Host Bus Adapter (HBA)
  * @dev: pointer to device handle
@@ -9329,6 +9348,7 @@  int ufshcd_alloc_host(struct device *dev, struct ufs_hba **hba_handle)
 		err = -ENOMEM;
 		goto out_error;
 	}
+	host->transportt = &ufshcd_transport_template;
 	hba = shost_priv(host);
 	hba->host = host;
 	hba->dev = dev;
@@ -9367,7 +9387,6 @@  int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	int err;
 	struct Scsi_Host *host = hba->host;
 	struct device *dev = hba->dev;
-	char eh_wq_name[sizeof("ufs_eh_wq_00")];
 
 	if (!mmio_base) {
 		dev_err(hba->dev,
@@ -9421,17 +9440,6 @@  int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 
 	hba->max_pwr_info.is_valid = false;
 
-	/* Initialize work queues */
-	snprintf(eh_wq_name, sizeof(eh_wq_name), "ufs_eh_wq_%d",
-		 hba->host->host_no);
-	hba->eh_wq = create_singlethread_workqueue(eh_wq_name);
-	if (!hba->eh_wq) {
-		dev_err(hba->dev, "%s: failed to create eh workqueue\n",
-				__func__);
-		err = -ENOMEM;
-		goto out_disable;
-	}
-	INIT_WORK(&hba->eh_work, ufshcd_err_handler);
 	INIT_WORK(&hba->eeh_work, ufshcd_exception_event_handler);
 
 	sema_init(&hba->host_sem, 1);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 91b0b278469d..d0bca2b233ef 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -716,8 +716,6 @@  struct ufs_hba_monitor {
  * @is_powered: flag to check if HBA is powered
  * @shutting_down: flag to check if shutdown has been invoked
  * @host_sem: semaphore used to serialize concurrent contexts
- * @eh_wq: Workqueue that eh_work works on
- * @eh_work: Worker to handle UFS errors that require s/w attention
  * @eeh_work: Worker to handle exception events
  * @errors: HBA errors
  * @uic_error: UFS interconnect layer error status
@@ -820,8 +818,6 @@  struct ufs_hba {
 	struct semaphore host_sem;
 
 	/* Work Queues */
-	struct workqueue_struct *eh_wq;
-	struct work_struct eh_work;
 	struct work_struct eeh_work;
 
 	/* HBA Errors */