diff mbox series

[v3,13/18] scsi: ufs: Optimize SCSI command processing

Message ID 20210722033439.26550-14-bvanassche@acm.org
State New
Headers show
Series UFS patches for kernel v5.15 | expand

Commit Message

Bart Van Assche July 22, 2021, 3:34 a.m. UTC
Use a spinlock to protect hba->outstanding_reqs instead of using atomic
operations to update this member variable.

This patch is a performance improvement because it reduces the number of
atomic operations in the hot path (test_and_clear_bit()) and because it
reduces the lock contention on the SCSI host lock. On my test setup this
patch improves IOPS by about 1%.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Can Guo <cang@codeaurora.org>
Cc: Asutosh Das <asutoshd@codeaurora.org>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 29 ++++++++++++++++++-----------
 drivers/scsi/ufs/ufshcd.h |  2 ++
 2 files changed, 20 insertions(+), 11 deletions(-)

Comments

Daejun Park July 29, 2021, 1:25 a.m. UTC | #1
Hi Bart,

>Use a spinlock to protect hba->outstanding_reqs instead of using atomic

>operations to update this member variable.

> 

>This patch is a performance improvement because it reduces the number of

>atomic operations in the hot path (test_and_clear_bit()) and because it

>reduces the lock contention on the SCSI host lock. On my test setup this

>patch improves IOPS by about 1%.


Reviewed-by: Daejun Park <daejun7.park@samsung.com>


Thanks,
Daejun
Bean Huo July 29, 2021, 9:12 a.m. UTC | #2
On Wed, 2021-07-21 at 20:34 -0700, Bart Van Assche wrote:
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c

> 

> index 436d814f4c1e..a3ad83a3bae0 100644

> 

> --- a/drivers/scsi/ufs/ufshcd.c

> 

> +++ b/drivers/scsi/ufs/ufshcd.c

> 

> @@ -2095,12 +2095,14 @@ void ufshcd_send_command(struct ufs_hba *hba,

> unsigned int task_tag)

> 

>         ufshcd_clk_scaling_start_busy(hba);

> 

>         if (unlikely(ufshcd_should_inform_monitor(hba, lrbp)))

> 

>                 ufshcd_start_monitor(hba, lrbp);

> 

> -       spin_lock_irqsave(hba->host->host_lock, flags);

> 

> +

> 

> +       spin_lock_irqsave(&hba->outstanding_lock, flags);

> 

>         if (hba->vops && hba->vops->setup_xfer_req)

> 

>                 hba->vops->setup_xfer_req(hba, task_tag, !!lrbp-

> >cmd);


Bart,

Removing hba->host->host_lock, use hba->outstanding_lock, the issue
fixed by your patch [12/18] still be fixed?

Bean
Bart Van Assche July 29, 2021, 4:11 p.m. UTC | #3
On 7/29/21 2:12 AM, Bean Huo wrote:
> Removing hba->host->host_lock, use hba->outstanding_lock, the issue

> fixed by your patch [12/18] still be fixed?


Yes. My understanding is that setup_xfer_req() calls must be serialized 
against each other but not against any other code that is protected by 
the host lock.

Thanks,

Bart.
Bean Huo Aug. 2, 2021, 12:11 p.m. UTC | #4
On Wed, 2021-07-21 at 20:34 -0700, Bart Van Assche wrote:
> Use a spinlock to protect hba->outstanding_reqs instead of using

> atomic

> 

> operations to update this member variable.

> 

> 

> 

> This patch is a performance improvement because it reduces the number

> of

> 

> atomic operations in the hot path (test_and_clear_bit()) and because

> it

> 

> reduces the lock contention on the SCSI host lock. On my test setup

> this

> 

> patch improves IOPS by about 1%.

> 

> 

> 

> Cc: Adrian Hunter <adrian.hunter@intel.com>

> 

> Cc: Stanley Chu <stanley.chu@mediatek.com>

> 

> Cc: Can Guo <cang@codeaurora.org>

> 

> Cc: Asutosh Das <asutoshd@codeaurora.org>

> 

> Cc: Avri Altman <avri.altman@wdc.com>

> 

> Signed-off-by: Bart Van Assche <bvanassche@acm.org>


Reviewed-by: Bean Huo <beanhuo@micron.com>
diff mbox series

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 436d814f4c1e..a3ad83a3bae0 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2095,12 +2095,14 @@  void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
 	ufshcd_clk_scaling_start_busy(hba);
 	if (unlikely(ufshcd_should_inform_monitor(hba, lrbp)))
 		ufshcd_start_monitor(hba, lrbp);
-	spin_lock_irqsave(hba->host->host_lock, flags);
+
+	spin_lock_irqsave(&hba->outstanding_lock, flags);
 	if (hba->vops && hba->vops->setup_xfer_req)
 		hba->vops->setup_xfer_req(hba, task_tag, !!lrbp->cmd);
-	set_bit(task_tag, &hba->outstanding_reqs);
+	__set_bit(task_tag, &hba->outstanding_reqs);
 	ufshcd_writel(hba, 1 << task_tag, REG_UTP_TRANSFER_REQ_DOOR_BELL);
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
+	spin_unlock_irqrestore(&hba->outstanding_lock, flags);
+
 	/* Make sure that doorbell is committed immediately */
 	wmb();
 }
@@ -2882,7 +2884,9 @@  static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
 		 * we also need to clear the outstanding_request
 		 * field in hba
 		 */
-		clear_bit(lrbp->task_tag, &hba->outstanding_reqs);
+		spin_lock_irqsave(&hba->outstanding_lock, flags);
+		__clear_bit(lrbp->task_tag, &hba->outstanding_reqs);
+		spin_unlock_irqrestore(&hba->outstanding_lock, flags);
 	}
 
 	return err;
@@ -5194,8 +5198,6 @@  static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 	bool update_scaling = false;
 
 	for_each_set_bit(index, &completed_reqs, hba->nutrs) {
-		if (!test_and_clear_bit(index, &hba->outstanding_reqs))
-			continue;
 		lrbp = &hba->lrb[index];
 		lrbp->compl_time_stamp = ktime_get();
 		cmd = lrbp->cmd;
@@ -5250,10 +5252,14 @@  static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
 	    !(hba->quirks & UFSHCI_QUIRK_SKIP_RESET_INTR_AGGR))
 		ufshcd_reset_intr_aggr(hba);
 
-	spin_lock_irqsave(hba->host->host_lock, flags);
+	spin_lock_irqsave(&hba->outstanding_lock, flags);
 	tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
-	completed_reqs = tr_doorbell ^ hba->outstanding_reqs;
-	spin_unlock_irqrestore(hba->host->host_lock, flags);
+	completed_reqs = ~tr_doorbell & hba->outstanding_reqs;
+	WARN_ONCE(completed_reqs & ~hba->outstanding_reqs,
+		  "completed: %#lx; outstanding: %#lx\n", completed_reqs,
+		  hba->outstanding_reqs);
+	hba->outstanding_reqs &= ~completed_reqs;
+	spin_unlock_irqrestore(&hba->outstanding_lock, flags);
 
 	if (completed_reqs) {
 		__ufshcd_transfer_req_compl(hba, completed_reqs);
@@ -9340,10 +9346,11 @@  int ufshcd_alloc_host(struct device *dev, struct ufs_hba **hba_handle)
 	hba = shost_priv(host);
 	hba->host = host;
 	hba->dev = dev;
-	*hba_handle = hba;
 	hba->dev_ref_clk_freq = REF_CLK_FREQ_INVAL;
-
 	INIT_LIST_HEAD(&hba->clk_list_head);
+	spin_lock_init(&hba->outstanding_lock);
+
+	*hba_handle = hba;
 
 out_error:
 	return err;
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 6df847facd1d..91b0b278469d 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -695,6 +695,7 @@  struct ufs_hba_monitor {
  * @lrb: local reference block
  * @cmd_queue: Used to allocate command tags from hba->host->tag_set.
  * @outstanding_tasks: Bits representing outstanding task requests
+ * @outstanding_lock: Protects @outstanding_reqs.
  * @outstanding_reqs: Bits representing outstanding transfer requests
  * @capabilities: UFS Controller Capabilities
  * @nutrs: Transfer Request Queue depth supported by controller
@@ -781,6 +782,7 @@  struct ufs_hba {
 	struct ufshcd_lrb *lrb;
 
 	unsigned long outstanding_tasks;
+	spinlock_t outstanding_lock;
 	unsigned long outstanding_reqs;
 
 	u32 capabilities;