diff mbox series

[v1] ufs: core: fix ufshcd_abort_all racing issue

Message ID 20240621113051.29523-1-peter.wang@mediatek.com
State New
Headers show
Series [v1] ufs: core: fix ufshcd_abort_all racing issue | expand

Commit Message

Peter Wang (王信友) June 21, 2024, 11:30 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

When ufshcd_abort_all racing with complete ISR,
the completed tag of request will be release by ISR.
And ufshca_abort_all call ufshcd_mcq_req_to_hwq will
get NULL pointer KE.
Also change the return value success when request is
completed by ISR beacuse sq dosen't need cleanup.

Below is KE back trace.
  ufshcd_try_to_abort_task: cmd at tag 41 not pending in the device.
  ufshcd_try_to_abort_task: cmd at tag=41 is cleared.
  Aborting tag 41 / CDB 0x28 succeeded
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
  pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14
  lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 [ufs_mediatek_mod_ise]
   do_mem_abort+0x58/0x118
   el1_abort+0x3c/0x5c
   el1h_64_sync_handler+0x54/0x90
   el1h_64_sync+0x68/0x6c
   blk_mq_unique_tag+0x8/0x14
   ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise]
   process_one_work+0x208/0x4fc
   worker_thread+0x228/0x438
   kthread+0x104/0x1d4
   ret_from_fork+0x10/0x20

Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufs-mcq.c | 5 +++--
 drivers/ufs/core/ufshcd.c  | 9 +++++----
 2 files changed, 8 insertions(+), 6 deletions(-)

Comments

Peter Wang (王信友) June 24, 2024, 9:13 a.m. UTC | #1
On Fri, 2024-06-21 at 10:15 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 6/21/24 4:30 AM, peter.wang@mediatek.com wrote:
>  > [ ... ]
> 
> Fixes: and Cc: stable tags are missing from this patch. Please add
> these.
> 

Hi Bart,

Okay, will add next version. Thanks.


> > diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-
> mcq.c
> > index 8944548c30fa..3b2e5bcb08a7 100644
> > --- a/drivers/ufs/core/ufs-mcq.c
> > +++ b/drivers/ufs/core/ufs-mcq.c
> > @@ -512,8 +512,9 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba,
> int task_tag)
> >   return -ETIMEDOUT;
> >   
> >   if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
> > -if (!cmd)
> > -return -EINVAL;
> > +/* Should return 0 if cmd is already complete by irq */
> > +if (!cmd || !ufshcd_cmd_inflight(cmd))
> > +return 0;
> >   hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
> >   } else {
> >   hwq = hba->dev_cmd_queue;
> 
> This code change reduces the race window but does not eliminate it
> since
> no locks are held around this code path.
> 
> Additionally, are you sure that this change is necessary? I don't
> think
> that ufshcd_err_handler() calls ufshcd_mcq_sq_cleanup().
> 
> Thanks,
> 
> Bart.

Hi Bart,

Yes, this code reduces the race window only.
Beacuse if we want get mcq lock, we need know witch hwq[i] lock 
can use.
But before we get which hwq[i] lock could use, we will get this null
pointer error.
This is quite the chicken or the eqq dilemma.
Could you have any suggestion?

Additionally, here is the backtrace of ufshcd_mcq_sq_cleanup.

  ufshcd_try_to_abort_task: cmd pending in the device. tag = 6
  Unable to handle kernel NULL pointer dereference at virtual address
0000000000000194
   pc : [0xffffffd589679bf8] blk_mq_unique_tag+0x8/0x14
   lr : [0xffffffd5862f95b4] ufshcd_mcq_sq_cleanup+0x6c/0x1cc
[ufs_mediatek_mod_ise]
   Workqueue: ufs_eh_wq_0 ufshcd_err_handler [ufs_mediatek_mod_ise]
   Call trace:
    dump_backtrace+0xf8/0x148
    show_stack+0x18/0x24
    dump_stack_lvl+0x60/0x7c
    dump_stack+0x18/0x3c
    mrdump_common_die+0x24c/0x398 [mrdump]
    ipanic_die+0x20/0x34 [mrdump]
    notify_die+0x80/0xd8
    die+0x94/0x2b8
    __do_kernel_fault+0x264/0x298
    do_page_fault+0xa4/0x4b8
    do_translation_fault+0x38/0x54
    do_mem_abort+0x58/0x118
    el1_abort+0x3c/0x5c
    el1h_64_sync_handler+0x54/0x90
    el1h_64_sync+0x68/0x6c
    blk_mq_unique_tag+0x8/0x14
    ufshcd_clear_cmd+0x34/0x118 [ufs_mediatek_mod_ise]
    ufshcd_try_to_abort_task+0x2c8/0x5b4 [ufs_mediatek_mod_ise]
    ufshcd_err_handler+0xa7c/0xfa8 [ufs_mediatek_mod_ise]
    process_one_work+0x208/0x4fc
    worker_thread+0x228/0x438
    kthread+0x104/0x1d4
    ret_from_fork+0x10/0x20

Thanks.
Peter
diff mbox series

Patch

diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
index 8944548c30fa..3b2e5bcb08a7 100644
--- a/drivers/ufs/core/ufs-mcq.c
+++ b/drivers/ufs/core/ufs-mcq.c
@@ -512,8 +512,9 @@  int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag)
 		return -ETIMEDOUT;
 
 	if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
-		if (!cmd)
-			return -EINVAL;
+		/* Should return 0 if cmd is already complete by irq */
+		if (!cmd || !ufshcd_cmd_inflight(cmd))
+			return 0;
 		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
 	} else {
 		hwq = hba->dev_cmd_queue;
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index e5e9da61f15d..e8bca62ceed8 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6455,11 +6455,12 @@  static bool ufshcd_abort_one(struct request *rq, void *priv)
 
 	/* Release cmd in MCQ mode if abort succeeds */
 	if (is_mcq_enabled(hba) && (*ret == 0)) {
-		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
-		spin_lock_irqsave(&hwq->cq_lock, flags);
-		if (ufshcd_cmd_inflight(lrbp->cmd))
+		if (ufshcd_cmd_inflight(lrbp->cmd)) {
+			hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(lrbp->cmd));
+			spin_lock_irqsave(&hwq->cq_lock, flags);
 			ufshcd_release_scsi_cmd(hba, lrbp);
-		spin_unlock_irqrestore(&hwq->cq_lock, flags);
+			spin_unlock_irqrestore(&hwq->cq_lock, flags);
+		}
 	}
 
 	return *ret == 0;