[v6] ufs: core: wlun suspend SSU/enter hibern8 fail recovery

Message ID 20221208072520.26210-1-peter.wang@mediatek.com

Commit Message

Peter Wang (王信友) Dec. 8, 2022, 7:25 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

When SSU/enter hibern8 fails in the wlun suspend flow, trigger the
error handler and return busy to break the suspend.
Otherwise, the wlun runtime PM status becomes error and its consumer
gets stuck in the runtime suspend state.

Fixes: b294ff3e3449 ("scsi: ufs: core: Enable power management for wlun")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/ufs/core/ufshcd.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

Comments

Martin K. Petersen Dec. 14, 2022, 3:16 a.m. UTC | #1
Peter,

> When SSU/enter hibern8 fails in the wlun suspend flow, trigger the
> error handler and return busy to break the suspend.  Otherwise, the
> wlun runtime PM status becomes error and its consumer gets stuck in
> the runtime suspend state.

Applied to 6.2/scsi-staging, thanks!
Daniil Lunev Dec. 20, 2022, 9 p.m. UTC | #2
> Applied to 6.2/scsi-staging, thanks!

There is an interesting side effect of the patch in this iteration
(which I am not sure was present in the previous iteration I tried):
if the device auto-suspends while a purge is running, the controller is
seemingly reset and thus the purge is aborted (with no patch at all
it hangs).
That might be OK behaviour though - it just makes it an explicit
requirement to disable runtime suspend during the management
operation.

localhost ~ # ufs-utils fl -t 6 -e -p /dev/bsg/ufs-bsg0
localhost ~ # ufs-utils attr -a -p /dev/bsg/ufs-bsg0 | grep bPurgeStatus
bPurgeStatus               := 0x00

[   25.801980] ufs_device_wlun 0:0:0:49488: START_STOP failed for
power mode: 2, result 2
[   25.802002] ufs_device_wlun 0:0:0:49488: Sense Key : Not Ready [current]
[   25.802009] ufs_device_wlun 0:0:0:49488: Add. Sense: No additional
sense information
[   25.802020] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend
failed: -16
Peter Wang (王信友) Dec. 21, 2022, 5:59 a.m. UTC | #3
On Wed, 2022-12-21 at 08:00 +1100, Daniil Lunev wrote:
> > Applied to 6.2/scsi-staging, thanks!
> 
> There is an interesting side effect of the patch in this iteration
> (which I am not sure was present in the previous iteration I tried):
> if the device auto-suspends while a purge is running, the controller is
> seemingly reset and thus the purge is aborted (with no patch at all
> it hangs).
> That might be OK behaviour though - it just makes it an explicit
> requirement to disable runtime suspend during the management
> operation.
> 

Hi Daniil,

I am not sure if this is the same reason that we get the SSU (sleep) failure.
But without this patch, when a purge is ongoing, system IO will hang, which
is no better.
I also have another idea about rpm and purge.

To disable runtime suspend while a purge operation is ongoing:
1. Disable rpm when fPurgeEnable is set, poll until bPurgeStatus becomes 0,
   then re-enable rpm.
   But polling bPurgeStatus will keep extending the rpm timer, so we don't
   really need to disable rpm, right?
2. Check bPurgeStatus when entering runtime suspend and return EBUSY if
   bPurgeStatus is not 0 to break the suspend (see the sketch below).
   This is the correct design to tell the rpm framework that the driver is
   busy with the purge and that suspend is inappropriate.
   But it would be similar to the current flow, which returns EBUSY when the
   SSU fails?

So, with the current design, if the purge initiator does not want to see rpm
EBUSY, it should poll bPurgeStatus.
What do you think?


Thanks.
BR
Peter



> localhost ~ # ufs-utils fl -t 6 -e -p /dev/bsg/ufs-bsg0
> localhost ~ # ufs-utils attr -a -p /dev/bsg/ufs-bsg0 | grep
> bPurgeStatus
> bPurgeStatus               := 0x00
> 
> [   25.801980] ufs_device_wlun 0:0:0:49488: START_STOP failed for
> power mode: 2, result 2
> [   25.802002] ufs_device_wlun 0:0:0:49488: Sense Key : Not Ready
> [current]
> [   25.802009] ufs_device_wlun 0:0:0:49488: Add. Sense: No additional
> sense information
> [   25.802020] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend
> failed: -16
Daniil Lunev Jan. 2, 2023, 10:05 p.m. UTC | #4
On Wed, Dec 21, 2022 at 4:59 PM Peter Wang (王信友)
<peter.wang@mediatek.com> wrote:
> But without this patch, when a purge is ongoing, system IO will hang, which
> is no better.
Yes, that is why I am just pointing this out as a matter of fact, not as a bug.
It is arguable whether resetting the controller in the deadlock situation is
the proper thing to do, but it might be the next best thing, so I don't argue
that either.

> So, with the current design, if the purge initiator does not want to see rpm
> EBUSY, it should poll bPurgeStatus.
> What do you think?

I am actually not sure if management operations extend the timeout - they go
through the bsg interface, and I am not sure it properly resets the timeouts
on all possible nexus interfaces; I need to check that.
But even if it does, there are two problems:
* If you make the kernel poll that parameter, the application level will
  actually miss the completion code (since after the completion status is
  queried once, it returns Not Started afterwards).
* Application polling is race prone. We set the runtime suspend timeout to
  100ms, so depending on scheduling quirks it may miss the event.

--Daniil

Patch

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index b1f59a5fe632..31ed3fdb5266 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6070,6 +6070,14 @@  void ufshcd_schedule_eh_work(struct ufs_hba *hba)
 	}
 }
 
+static void ufshcd_force_error_recovery(struct ufs_hba *hba)
+{
+	spin_lock_irq(hba->host->host_lock);
+	hba->force_reset = true;
+	ufshcd_schedule_eh_work(hba);
+	spin_unlock_irq(hba->host->host_lock);
+}
+
 static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool allow)
 {
 	down_write(&hba->clk_scaling_lock);
@@ -9049,6 +9057,15 @@  static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 
 		if (!hba->dev_info.b_rpm_dev_flush_capable) {
 			ret = ufshcd_set_dev_pwr_mode(hba, req_dev_pwr_mode);
+			if (ret && pm_op != UFS_SHUTDOWN_PM) {
+				/*
+				 * If return err in suspend flow, IO will hang.
+				 * Trigger error handler and break suspend for
+				 * error recovery.
+				 */
+				ufshcd_force_error_recovery(hba);
+				ret = -EBUSY;
+			}
 			if (ret)
 				goto enable_scaling;
 		}
@@ -9060,6 +9077,15 @@  static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 	 */
 	check_for_bkops = !ufshcd_is_ufs_dev_deepsleep(hba);
 	ret = ufshcd_link_state_transition(hba, req_link_state, check_for_bkops);
+	if (ret && pm_op != UFS_SHUTDOWN_PM) {
+		/*
+		 * If return err in suspend flow, IO will hang.
+		 * Trigger error handler and break suspend for
+		 * error recovery.
+		 */
+		ufshcd_force_error_recovery(hba);
+		ret = -EBUSY;
+	}
 	if (ret)
 		goto set_dev_active;