diff mbox series

[v3,11/18] scsi: ufs: Revert "Utilize Transfer Request List Completion Notification Register"

Message ID 20210722033439.26550-12-bvanassche@acm.org
State New
Headers show
Series UFS patches for kernel v5.15 | expand

Commit Message

Bart Van Assche July 22, 2021, 3:34 a.m. UTC
Using the UTRLCNR register involves two MMIO accesses in the hot path while
using the doorbell register only involves a single MMIO access. Since MMIO
accesses take time, do not use the UTRLCNR register. The spinlock contention
on the SCSI host lock that is reintroduced by this patch will be addressed
by a later patch.

This reverts commit 6f7151729647e58ac7c522081255fd0c07b38105.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Can Guo <cang@codeaurora.org>
Cc: Asutosh Das <asutoshd@codeaurora.org>
Cc: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 52 +++++++++++----------------------------
 drivers/scsi/ufs/ufshcd.h |  5 ----
 drivers/scsi/ufs/ufshci.h |  1 -
 3 files changed, 15 insertions(+), 43 deletions(-)

Comments

Bean Huo July 29, 2021, 8:03 a.m. UTC | #1
On Wed, 2021-07-21 at 20:34 -0700, Bart Van Assche wrote:
> Using the UTRLCNR register involves two MMIO accesses in the hot path

> while

> 

> using the doorbell register only involves a single MMIO access. Since

> MMIO

> 

> accesses take time, do not use the UTRLCNR register. The spinlock

> contention

> 

> on the SCSI host lock that is reintroduced by this patch will be

> addressed

> 

> by a later patch.

> 

> 

> 

> This reverts commit 6f7151729647e58ac7c522081255fd0c07b38105.


Bart, 
This commit is the key change in "Optimize host lock on TR send/compl
paths and utilize UTRLCNR"
https://patchwork.kernel.org/project/linux-scsi/cover/1621845419-14194-1-git-send-email-cang@codeaurora.org/.

How did you compare the performance gain/loss after reverting this
commit?

Kind regards,
Bean
Bart Van Assche July 29, 2021, 4:10 p.m. UTC | #2
On 7/29/21 1:03 AM, Bean Huo wrote:
> On Wed, 2021-07-21 at 20:34 -0700, Bart Van Assche wrote:

>> Using the UTRLCNR register involves two MMIO accesses in the hot path

>> while

>>

>> using the doorbell register only involves a single MMIO access. Since

>> MMIO

>>

>> accesses take time, do not use the UTRLCNR register. The spinlock

>> contention

>>

>> on the SCSI host lock that is reintroduced by this patch will be

>> addressed

>>

>> by a later patch.

>>

>>

>>

>> This reverts commit 6f7151729647e58ac7c522081255fd0c07b38105.

> 

> Bart,

> This commit is the key change in "Optimize host lock on TR send/compl

> paths and utilize UTRLCNR"

> https://patchwork.kernel.org/project/linux-scsi/cover/1621845419-14194-1-git-send-email-cang@codeaurora.org/.

> 

> How did you compare the performance gain/loss after reverting this

> commit?


Hi Bean,

I measured the performance impact of patches 11 and 12 combined. In a 4 
KB read IOPS test I see that these two patches combined improve 
performance by 1%. This illustrates that the two MMIO accesses of the 
UTRLCNR register are slower than the single MMIO access of the doorbell 
register.

Thanks,

Bart.
Bart Van Assche July 29, 2021, 4:13 p.m. UTC | #3
On 7/29/21 9:10 AM, Bart Van Assche wrote:
> On 7/29/21 1:03 AM, Bean Huo wrote:

>> How did you compare the performance gain/loss after reverting this

>> commit?

> 

> I measured the performance impact of patches 11 and 12 combined. In a 4 

> KB read IOPS test I see that these two patches combined improve 

> performance by 1%. This illustrates that the two MMIO accesses of the 

> UTRLCNR register are slower than the single MMIO access of the doorbell 

> register.


A correction: I wanted to refer to the combination of patches 11 and 13 
instead of 11 and 12.

Bart.
Bean Huo July 29, 2021, 9:14 p.m. UTC | #4
On Thu, 2021-07-29 at 09:13 -0700, Bart Van Assche wrote:
> > I measured the performance impact of patches 11 and 12 combined. In

> > a 4 

> > KB read IOPS test I see that these two patches combined improve 

> > performance by 1%. This illustrates that the two MMIO accesses of

> > the 

> > UTRLCNR register are slower than the single MMIO access of the

> > doorbell 

> > register.

> 

> 

> A correction: I wanted to refer to the combination of patches 11 and

> 13 

> 

> instead of 11 and 12.

> 

> 

> 

> Bart.


Bart,
thanks for your explanation. 

I tested this commit before, indeed there is performance improvement.
Based on your comments, I will verify your statement.

Bean
Bean Huo Aug. 2, 2021, 3:24 p.m. UTC | #5
On Wed, 2021-07-21 at 20:34 -0700, Bart Van Assche wrote:
> Using the UTRLCNR register involves two MMIO accesses in the hot path

> while

> using the doorbell register only involves a single MMIO access. Since

> MMIO

> accesses take time, do not use the UTRLCNR register. The spinlock

> contention

> on the SCSI host lock that is reintroduced by this patch will be

> addressed

> by a later patch.

> 

> This reverts commit 6f7151729647e58ac7c522081255fd0c07b38105.

> 

> Cc: Adrian Hunter <adrian.hunter@intel.com>

> Cc: Stanley Chu <stanley.chu@mediatek.com>

> Cc: Can Guo <cang@codeaurora.org>

> Cc: Asutosh Das <asutoshd@codeaurora.org>

> Cc: Avri Altman <avri.altman@wdc.com>

> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

> 


Hi Bart, 

I did the comparison test on my platform, it is very difficult to get a
very clear and fair result between two changes. but lamely speaking, on
the small chunk size read/write, your changes wins. but on the big
chunk size, It is not very clear, the gap between the two changes can
be ignored.

Tested-by: Bean Huo <beanhuo@micron.com>


Bean
Bart Van Assche Aug. 3, 2021, 6:49 p.m. UTC | #6
On 8/2/21 8:24 AM, Bean Huo wrote:
> I did the comparison test on my platform, it is very difficult to get a

> very clear and fair result between two changes. but lamely speaking, on

> the small chunk size read/write, your changes wins. but on the big

> chunk size, It is not very clear, the gap between the two changes can

> be ignored.

> 

> Tested-by: Bean Huo <beanhuo@micron.com>


Thanks for having tested this patch :-)

Bart.
diff mbox series

Patch

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 4c6d832f5e81..cb588b705fbb 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2087,6 +2087,7 @@  static inline
 void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
 {
 	struct ufshcd_lrb *lrbp = &hba->lrb[task_tag];
+	unsigned long flags;
 
 	lrbp->issue_time_stamp = ktime_get();
 	lrbp->compl_time_stamp = ktime_set(0, 0);
@@ -2095,19 +2096,10 @@  void ufshcd_send_command(struct ufs_hba *hba, unsigned int task_tag)
 	ufshcd_clk_scaling_start_busy(hba);
 	if (unlikely(ufshcd_should_inform_monitor(hba, lrbp)))
 		ufshcd_start_monitor(hba, lrbp);
-	if (ufshcd_has_utrlcnr(hba)) {
-		set_bit(task_tag, &hba->outstanding_reqs);
-		ufshcd_writel(hba, 1 << task_tag,
-			      REG_UTP_TRANSFER_REQ_DOOR_BELL);
-	} else {
-		unsigned long flags;
-
-		spin_lock_irqsave(hba->host->host_lock, flags);
-		set_bit(task_tag, &hba->outstanding_reqs);
-		ufshcd_writel(hba, 1 << task_tag,
-			      REG_UTP_TRANSFER_REQ_DOOR_BELL);
-		spin_unlock_irqrestore(hba->host->host_lock, flags);
-	}
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	set_bit(task_tag, &hba->outstanding_reqs);
+	ufshcd_writel(hba, 1 << task_tag, REG_UTP_TRANSFER_REQ_DOOR_BELL);
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
 	/* Make sure that doorbell is committed immediately */
 	wmb();
 }
@@ -5234,17 +5226,17 @@  static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
 }
 
 /**
- * ufshcd_trc_handler - handle transfer requests completion
+ * ufshcd_transfer_req_compl - handle SCSI and query command completion
  * @hba: per adapter instance
- * @use_utrlcnr: get completed requests from UTRLCNR
  *
  * Returns
  *  IRQ_HANDLED - If interrupt is valid
  *  IRQ_NONE    - If invalid interrupt
  */
-static irqreturn_t ufshcd_trc_handler(struct ufs_hba *hba, bool use_utrlcnr)
+static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
 {
-	unsigned long completed_reqs = 0;
+	unsigned long completed_reqs, flags;
+	u32 tr_doorbell;
 
 	/* Resetting interrupt aggregation counters first and reading the
 	 * DOOR_BELL afterward allows us to handle all the completed requests.
@@ -5257,24 +5249,10 @@  static irqreturn_t ufshcd_trc_handler(struct ufs_hba *hba, bool use_utrlcnr)
 	    !(hba->quirks & UFSHCI_QUIRK_SKIP_RESET_INTR_AGGR))
 		ufshcd_reset_intr_aggr(hba);
 
-	if (use_utrlcnr) {
-		u32 utrlcnr;
-
-		utrlcnr = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_LIST_COMPL);
-		if (utrlcnr) {
-			ufshcd_writel(hba, utrlcnr,
-				      REG_UTP_TRANSFER_REQ_LIST_COMPL);
-			completed_reqs = utrlcnr;
-		}
-	} else {
-		unsigned long flags;
-		u32 tr_doorbell;
-
-		spin_lock_irqsave(hba->host->host_lock, flags);
-		tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
-		completed_reqs = tr_doorbell ^ hba->outstanding_reqs;
-		spin_unlock_irqrestore(hba->host->host_lock, flags);
-	}
+	spin_lock_irqsave(hba->host->host_lock, flags);
+	tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
+	completed_reqs = tr_doorbell ^ hba->outstanding_reqs;
+	spin_unlock_irqrestore(hba->host->host_lock, flags);
 
 	if (completed_reqs) {
 		__ufshcd_transfer_req_compl(hba, completed_reqs);
@@ -5756,7 +5734,7 @@  static void ufshcd_exception_event_handler(struct work_struct *work)
 /* Complete requests that have door-bell cleared */
 static void ufshcd_complete_requests(struct ufs_hba *hba)
 {
-	ufshcd_trc_handler(hba, false);
+	ufshcd_transfer_req_compl(hba);
 	ufshcd_tmc_handler(hba);
 }
 
@@ -6397,7 +6375,7 @@  static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
 		retval |= ufshcd_tmc_handler(hba);
 
 	if (intr_status & UTP_TRANSFER_REQ_COMPL)
-		retval |= ufshcd_trc_handler(hba, ufshcd_has_utrlcnr(hba));
+		retval |= ufshcd_transfer_req_compl(hba);
 
 	return retval;
 }
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 8dcf0df770a2..a44baec43dd5 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -1176,11 +1176,6 @@  static inline u32 ufshcd_vops_get_ufs_hci_version(struct ufs_hba *hba)
 	return ufshcd_readl(hba, REG_UFS_VERSION);
 }
 
-static inline bool ufshcd_has_utrlcnr(struct ufs_hba *hba)
-{
-	return (hba->ufs_version >= ufshci_version(3, 0));
-}
-
 static inline int ufshcd_vops_clk_scale_notify(struct ufs_hba *hba,
 			bool up, enum ufs_notify_change_status status)
 {
diff --git a/drivers/scsi/ufs/ufshci.h b/drivers/scsi/ufs/ufshci.h
index 5affb1fce5ad..de95be5d11d4 100644
--- a/drivers/scsi/ufs/ufshci.h
+++ b/drivers/scsi/ufs/ufshci.h
@@ -39,7 +39,6 @@  enum {
 	REG_UTP_TRANSFER_REQ_DOOR_BELL		= 0x58,
 	REG_UTP_TRANSFER_REQ_LIST_CLEAR		= 0x5C,
 	REG_UTP_TRANSFER_REQ_LIST_RUN_STOP	= 0x60,
-	REG_UTP_TRANSFER_REQ_LIST_COMPL		= 0x64,
 	REG_UTP_TASK_REQ_LIST_BASE_L		= 0x70,
 	REG_UTP_TASK_REQ_LIST_BASE_H		= 0x74,
 	REG_UTP_TASK_REQ_DOOR_BELL		= 0x78,