diff mbox

mmc: core: complete/wait_for_completion performance

Message ID 1481706204.3994.4.camel@embedded.rocks
State New
Headers show

Commit Message

Jörg Krause Dec. 14, 2016, 9:03 a.m. UTC
Hi Stefan,

On Wed, 2016-12-07 at 20:23 +0100, Stefan Wahren wrote:
> Hi Jörg,

> 

> > Jörg Krause <joerg.krause@embedded.rocks> hat am 7. Dezember 2016

> > um 08:32

> > geschrieben:

> > 

> > 

> > Hit Stefan,

> > 

> > On Sat, 2016-11-26 at 20:10 +0100, Stefan Wahren wrote:

> > > Hi Jörg,

> > > 

> > > > Jörg Krause <joerg.krause@embedded.rocks> hat am 20. November

> > > > 2016

> > > > um 20:10

> > > > geschrieben:

> > > > 

> > > > 

> > > > On Sun, 2016-11-20 at 16:44 +0100, Stefan Wahren wrote:

> > > > > > Jörg Krause <joerg.krause@embedded.rocks> hat am 20.

> > > > > > November

> > > > > > 2016

> > > > > > um 15:42

> > > > > > geschrieben:

> > > > > > 

> > > > > > 

> > > > > > Hi Stefan,

> > > > > > 

> > > > > > On Sun, 2016-11-20 at 14:28 +0100, Stefan Wahren wrote:

> > > > > > > Hi Jörg,

> > > > > > > 

> > > > > > > > Jörg Krause <joerg.krause@embedded.rocks> hat am 20.

> > > > > > > > November

> > > > > > > > 2016

> > > > > > > > um 13:27

> > > > > > > > geschrieben:

> > > > > > > > 

> > > > > > > > 

> > > > > > > > Hi,

> > > > > > > > 

> > > > > > > > I started the discussion on this mailing list in

> > > > > > > > another

> > > > > > > > thread

> > > > > > > > [1],

> > > > > > > > but I'd like to move it to a new thread, because it

> > > > > > > > might

> > > > > > > > be

> > > > > > > > mmc

> > > > > > > > specific.

> > > > > > > > 

> > > > > > > > The issue is that I am noticed low wifi network

> > > > > > > > throughput

> > > > > > > > on

> > > > > > > > an

> > > > > > > > i.MX28

> > > > > > > > board with the mainline kernel (v4.7.10, about 6 Mbps)

> > > > > > > > compared

> > > > > > > > to

> > > > > > > > the

> > > > > > > > vendor kernel (Freescale v2.6.35.3, about 20 Mbps). The

> > > > > > > > wifi

> > > > > > > > chip

> > > > > > > > is

> > > > > > > > attached using the SDIO interface.

> > > > > > > > 

> > > > > > > > I started investigation where the bottleneck in the

> > > > > > > > mainline

> > > > > > > > kernel might come from. Therefore I checked that the

> > > > > > > > configs

> > > > > > > > and

> > > > > > > > settings for the interfaces and drivers are the same.

> > > > > > > > They

> > > > > > > > are.

> > > > > > > 

> > > > > > > so you're not using the mxs_defconfig settings anymore?

> > > > > > 

> > > > > > No, I changed the settings.

> > > > > > 

> > > > > 

> > > > > What happens to performance to if you change the following

> > > > > settings

> > > > > to the same

> > > > > like in mxs_defconfig?

> > > > > 

> > > > > CONFIG_PREEMPT_VOLUNTARY=y

> > > > > CONFIG_DEFAULT_IOSCHED="noop"

> > > > 

> > > > No much change at all. The time difference between complete()

> > > > and

> > > > wait_for_complete() decreases in best case to 110 us, but also

> > > > varies

> > > > to above 130 us.

> > > 

> > > just a weird idea. Did you tried to add MMC_CAP_CMD23 into the

> > > caps

> > > [1]?

> > > 

> > > [1] - http://lxr.free-electrons.com/source/drivers/mmc/host/mxs-m

> > > mc.c

> > > ?v=4.8#L642

> > 

> > I tried, but it did not improved the timing or throughput. However,

> > many thanks for the input.

> > 

> > Jörg

> 

> did you try cyclictest [1]?


Not yet. Not sure what to measure and which values to compare here.

> 

> Beside the time for a request the amount of requests for the complete

> iperf test

> would we interesting. Maybe there are retries.

> 

> I'm still interested in your PIO mode patches for mxs-mmc even

> without clean up.


Actually, the patch does not implement a PIO mode, but drops DMA and
uses polling instead. I've attached the patch.

> [1] - https://git.kernel.org/cgit/utils/rt-tests/rt-tests.git/


Best regards
Jörg
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Comments

Stefan Wahren Dec. 14, 2016, 6:57 p.m. UTC | #1
Hi Jörg,

> Jörg Krause <joerg.krause@embedded.rocks> hat am 14. Dezember 2016 um 10:03 geschrieben:
> 
> 
> Hi Stefan,
> 
> On Wed, 2016-12-07 at 20:23 +0100, Stefan Wahren wrote:
> > Hi Jörg,
> > 
> > > Jörg Krause <joerg.krause@embedded.rocks> hat am 7. Dezember 2016
> > > um 08:32
> > > geschrieben:
> > > 
> > > 
> > > Hit Stefan,
> > > 
> > > On Sat, 2016-11-26 at 20:10 +0100, Stefan Wahren wrote:
> > > > Hi Jörg,
> > > > 
> > > > > Jörg Krause <joerg.krause@embedded.rocks> hat am 20. November
> > > > > 2016
> > > > > um 20:10
> > > > > geschrieben:
> > > > > 
> > > > > 
> > > > > On Sun, 2016-11-20 at 16:44 +0100, Stefan Wahren wrote:
> > > > > > > Jörg Krause <joerg.krause@embedded.rocks> hat am 20.
> > > > > > > November
> > > > > > > 2016
> > > > > > > um 15:42
> > > > > > > geschrieben:
> > > > > > > 
> > > > > > > 
> > > > > > > Hi Stefan,
> > > > > > > 
> > > > > > > On Sun, 2016-11-20 at 14:28 +0100, Stefan Wahren wrote:
> > > > > > > > Hi Jörg,
> > > > > > > > 
> > > > > > > > > Jörg Krause <joerg.krause@embedded.rocks> hat am 20.
> > > > > > > > > November
> > > > > > > > > 2016
> > > > > > > > > um 13:27
> > > > > > > > > geschrieben:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > I started the discussion on this mailing list in
> > > > > > > > > another
> > > > > > > > > thread
> > > > > > > > > [1],
> > > > > > > > > but I'd like to move it to a new thread, because it
> > > > > > > > > might
> > > > > > > > > be
> > > > > > > > > mmc
> > > > > > > > > specific.
> > > > > > > > > 
> > > > > > > > > The issue is that I am noticed low wifi network
> > > > > > > > > throughput
> > > > > > > > > on
> > > > > > > > > an
> > > > > > > > > i.MX28
> > > > > > > > > board with the mainline kernel (v4.7.10, about 6 Mbps)
> > > > > > > > > compared
> > > > > > > > > to
> > > > > > > > > the
> > > > > > > > > vendor kernel (Freescale v2.6.35.3, about 20 Mbps). The
> > > > > > > > > wifi
> > > > > > > > > chip
> > > > > > > > > is
> > > > > > > > > attached using the SDIO interface.
> > > > > > > > > 
> > > > > > > > > I started investigation where the bottleneck in the
> > > > > > > > > mainline
> > > > > > > > > kernel might come from. Therefore I checked that the
> > > > > > > > > configs
> > > > > > > > > and
> > > > > > > > > settings for the interfaces and drivers are the same.
> > > > > > > > > They
> > > > > > > > > are.
> > > > > > > > 
> > > > > > > > so you're not using the mxs_defconfig settings anymore?
> > > > > > > 
> > > > > > > No, I changed the settings.
> > > > > > > 
> > > > > > 
> > > > > > What happens to performance to if you change the following
> > > > > > settings
> > > > > > to the same
> > > > > > like in mxs_defconfig?
> > > > > > 
> > > > > > CONFIG_PREEMPT_VOLUNTARY=y
> > > > > > CONFIG_DEFAULT_IOSCHED="noop"
> > > > > 
> > > > > No much change at all. The time difference between complete()
> > > > > and
> > > > > wait_for_complete() decreases in best case to 110 us, but also
> > > > > varies
> > > > > to above 130 us.
> > > > 
> > > > just a weird idea. Did you tried to add MMC_CAP_CMD23 into the
> > > > caps
> > > > [1]?
> > > > 
> > > > [1] - http://lxr.free-electrons.com/source/drivers/mmc/host/mxs-m
> > > > mc.c
> > > > ?v=4.8#L642
> > > 
> > > I tried, but it did not improved the timing or throughput. However,
> > > many thanks for the input.
> > > 
> > > Jörg
> > 
> > did you try cyclictest [1]?
> 
> Not yet. Not sure what to measure and which values to compare here.

i tought you have the vendor kernel and the mainline kernel available for your platform.

So you could compare the both kernels.

> 
> > 
> > Beside the time for a request the amount of requests for the complete
> > iperf test
> > would we interesting. Maybe there are retries.
> > 
> > I'm still interested in your PIO mode patches for mxs-mmc even
> > without clean up.
> 
> Actually, the patch does not implement a PIO mode, but drops DMA and
> uses polling instead. I've attached the patch.

Thanks. I applied it, but unfortunately this breaks SD card support for my Duckbill and the kernel isn't able to mount the rootfs:

[    2.267073] mxs-mmc 80010000.ssp: initialized
[    2.272624] mxs-mmc 80010000.ssp: AC command error 0xffffff92

> 
> > [1] - https://git.kernel.org/cgit/utils/rt-tests/rt-tests.git/
> 
> Best regards
> Jörg
diff mbox

Patch

From 816b9a4fc08280ca971fa60825e9bb0ceb42ef0b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?J=C3=B6rg=20Krause?= <joerg.krause@embedded.rocks>
Date: Mon, 31 Oct 2016 21:27:11 +0100
Subject: [PATCH] mmc: mxs-mmc: drop DMA and use polling mode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Jörg Krause <joerg.krause@embedded.rocks>
---
 drivers/mmc/host/mxs-mmc.c  | 141 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/spi/mxs-spi.h |  11 ++++
 2 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/host/mxs-mmc.c b/drivers/mmc/host/mxs-mmc.c
index 44ecebd..c4f5ff0 100644
--- a/drivers/mmc/host/mxs-mmc.c
+++ b/drivers/mmc/host/mxs-mmc.c
@@ -71,6 +71,9 @@  struct mxs_mmc_host {
 	spinlock_t			lock;
 	int				sdio_irq_en;
 	bool				broken_cd;
+
+	struct completion		dma_done;
+	u32				status;
 };
 
 static int mxs_mmc_get_cd(struct mmc_host *mmc)
@@ -256,22 +259,83 @@  static struct dma_async_tx_descriptor *mxs_mmc_prep_dma(
 	return desc;
 }
 
+/*
+ * Check for MMC command errors
+ * Returns error code or zerro if no errors
+ */
+static inline int mxs_mmc_cmd_error(u32 status)
+{
+	int err = 0;
+
+	if (status & BM_SSP_STATUS_TIMEOUT)
+		err = -ETIMEDOUT;
+	else if (status & BM_SSP_STATUS_RESP_TIMEOUT)
+		err = -ETIMEDOUT;
+	else if (status & BM_SSP_STATUS_RESP_CRC_ERR)
+		err = -EILSEQ;
+	else if (status & BM_SSP_STATUS_RESP_ERR)
+		err = -EIO;
+
+	return err;
+}
+
 static void mxs_mmc_bc(struct mxs_mmc_host *host)
 {
 	struct mxs_ssp *ssp = &host->ssp;
 	struct mmc_command *cmd = host->cmd;
 	struct dma_async_tx_descriptor *desc;
-	u32 ctrl0, cmd0, cmd1;
+	u32 ctrl0, ctrl1, cmd0, cmd1, c1;
 
 	ctrl0 = BM_SSP_CTRL0_ENABLE | BM_SSP_CTRL0_IGNORE_CRC;
+	ctrl1 =
+	    BM_SSP_CTRL1_DMA_ENABLE |
+	    BM_SSP_CTRL1_RECV_TIMEOUT_IRQ_EN |
+	    BM_SSP_CTRL1_DATA_CRC_IRQ_EN |
+	    BM_SSP_CTRL1_DATA_TIMEOUT_IRQ_EN |
+	    BM_SSP_CTRL1_RESP_TIMEOUT_IRQ_EN |
+	    BM_SSP_CTRL1_RESP_ERR_IRQ_EN;
 	cmd0 = BF_SSP(cmd->opcode, CMD0_CMD) | BM_SSP_CMD0_APPEND_8CYC;
-	cmd1 = cmd->arg;
+	cmd1 = BF_SSP(cmd->arg, CMD1_CMD);
 
 	if (host->sdio_irq_en) {
 		ctrl0 |= BM_SSP_CTRL0_SDIO_IRQ_CHECK;
 		cmd0 |= BM_SSP_CMD0_CONT_CLKING_EN | BM_SSP_CMD0_SLOW_CLKING_EN;
 	}
 
+	/* following IO operations */
+	writel(ctrl0, ssp->base + HW_SSP_CTRL0);
+	writel(cmd0, ssp->base + HW_SSP_CMD0);
+	writel(cmd1, ssp->base + HW_SSP_CMD1);
+
+	/* clear these bits */
+	writel(ctrl1, ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_CLR);
+
+	init_completion(&host->dma_done);
+
+	writel(BM_SSP_CTRL0_RUN,
+	       ssp->base + HW_SSP_CTRL0 + STMP_OFFSET_REG_SET);
+
+	while (readl(ssp->base + HW_SSP_CTRL0) & BM_SSP_CTRL0_RUN)
+		continue;
+	while (readl(ssp->base + HW_SSP_STATUS(ssp)) & BM_SSP_STATUS_BUSY)
+		continue;
+
+	host->status = readl(ssp->base + HW_SSP_STATUS(ssp));
+	c1 = readl(ssp->base + HW_SSP_CTRL1(ssp));
+	writel(c1 & MXS_MMC_IRQ_BITS,
+	       ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_CLR);
+
+	/* reenable these bits */
+	writel(ctrl1, ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_SET);
+	/* end IO operations */
+
+	cmd->error = mxs_mmc_cmd_error(host->status);
+	
+	if (cmd->error) {
+		dev_warn(mmc_dev(host->mmc), "BC command error 0x%x\n", cmd->error);
+	}
+
+#if 0
 	ssp->ssp_pio_words[0] = ctrl0;
 	ssp->ssp_pio_words[1] = cmd0;
 	ssp->ssp_pio_words[2] = cmd1;
@@ -288,6 +352,7 @@  static void mxs_mmc_bc(struct mxs_mmc_host *host)
 out:
 	dev_warn(mmc_dev(host->mmc),
 		 "%s: failed to prep dma\n", __func__);
+#endif
 }
 
 static void mxs_mmc_ac(struct mxs_mmc_host *host)
@@ -296,7 +361,7 @@  static void mxs_mmc_ac(struct mxs_mmc_host *host)
 	struct mmc_command *cmd = host->cmd;
 	struct dma_async_tx_descriptor *desc;
 	u32 ignore_crc, get_resp, long_resp;
-	u32 ctrl0, cmd0, cmd1;
+	u32 ctrl0, ctrl1, cmd0, cmd1, c1;
 
 	ignore_crc = (mmc_resp_type(cmd) & MMC_RSP_CRC) ?
 			0 : BM_SSP_CTRL0_IGNORE_CRC;
@@ -306,14 +371,81 @@  static void mxs_mmc_ac(struct mxs_mmc_host *host)
 			BM_SSP_CTRL0_LONG_RESP : 0;
 
 	ctrl0 = BM_SSP_CTRL0_ENABLE | ignore_crc | get_resp | long_resp;
+	ctrl1 = BM_SSP_CTRL1_DMA_ENABLE |
+		BM_SSP_CTRL1_RECV_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_DATA_CRC_IRQ_EN |
+		BM_SSP_CTRL1_DATA_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_RESP_TIMEOUT_IRQ_EN |
+		BM_SSP_CTRL1_RESP_ERR_IRQ_EN;
 	cmd0 = BF_SSP(cmd->opcode, CMD0_CMD);
-	cmd1 = cmd->arg;
+	cmd1 = BF_SSP(cmd->arg, CMD1_CMD);
 
 	if (host->sdio_irq_en) {
 		ctrl0 |= BM_SSP_CTRL0_SDIO_IRQ_CHECK;
 		cmd0 |= BM_SSP_CMD0_CONT_CLKING_EN | BM_SSP_CMD0_SLOW_CLKING_EN;
 	}
 
+	/* following IO operations */
+	writel(ctrl0, ssp->base + HW_SSP_CTRL0);
+	writel(cmd0, ssp->base + HW_SSP_CMD0);
+	writel(cmd1, ssp->base + HW_SSP_CMD1);
+
+	/* clear these bits */
+	writel(ctrl1, ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_CLR);
+
+	init_completion(&host->dma_done);
+
+	writel(BM_SSP_CTRL0_RUN,
+	       ssp->base + HW_SSP_CTRL0 + STMP_OFFSET_REG_SET);
+
+	while (readl(ssp->base + HW_SSP_CTRL0) & BM_SSP_CTRL0_RUN)
+		continue;
+	while (readl(ssp->base + HW_SSP_STATUS(ssp)) & BM_SSP_STATUS_BUSY)
+		continue;
+
+	host->status = readl(ssp->base + HW_SSP_STATUS(ssp));
+	c1 = readl(ssp->base + HW_SSP_CTRL1(ssp));
+	writel(c1 & MXS_MMC_IRQ_BITS,
+	       ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_CLR);
+
+	/* reenable these bits */
+	writel(ctrl1, ssp->base + HW_SSP_CTRL1(ssp) + STMP_OFFSET_REG_SET);
+	/* end IO operations */
+
+	switch (mmc_resp_type(cmd)) {
+	case MMC_RSP_NONE:
+		while (readl(ssp->base + HW_SSP_CTRL0) & BM_SSP_CTRL0_RUN)
+			continue;
+		break;
+	case MMC_RSP_R1:
+	case MMC_RSP_R1B:
+	case MMC_RSP_R3:
+		cmd->resp[0] = readl(ssp->base + HW_SSP_SDRESP0(ssp));
+		break;
+	case MMC_RSP_R2:
+		cmd->resp[3] =
+		    readl(ssp->base + HW_SSP_SDRESP0(ssp));
+		cmd->resp[2] =
+		    readl(ssp->base + HW_SSP_SDRESP1(ssp));
+		cmd->resp[1] =
+		    readl(ssp->base + HW_SSP_SDRESP2(ssp));
+		cmd->resp[0] =
+		    readl(ssp->base + HW_SSP_SDRESP3(ssp));
+		break;
+	default:
+		dev_warn(mmc_dev(host->mmc), "Unsupported response type 0x%x\n",
+			 mmc_resp_type(cmd));
+		BUG();
+		break;
+	}
+
+	cmd->error = mxs_mmc_cmd_error(host->status);
+
+	if (cmd->error) {
+		dev_warn(mmc_dev(host->mmc), "AC command error 0x%x\n", cmd->error);
+	}
+
+#if 0
 	ssp->ssp_pio_words[0] = ctrl0;
 	ssp->ssp_pio_words[1] = cmd0;
 	ssp->ssp_pio_words[2] = cmd1;
@@ -330,6 +462,7 @@  static void mxs_mmc_ac(struct mxs_mmc_host *host)
 out:
 	dev_warn(mmc_dev(host->mmc),
 		 "%s: failed to prep dma\n", __func__);
+#endif
 }
 
 static unsigned short mxs_ns_to_ssp_ticks(unsigned clock_rate, unsigned ns)
diff --git a/include/linux/spi/mxs-spi.h b/include/linux/spi/mxs-spi.h
index 381d368..83cb802 100644
--- a/include/linux/spi/mxs-spi.h
+++ b/include/linux/spi/mxs-spi.h
@@ -53,6 +53,10 @@ 
 #define  BP_SSP_CMD0_CMD			0
 #define  BM_SSP_CMD0_CMD			0xff
 #define HW_SSP_CMD1				0x020
+
+#define BP_SSP_CMD1_CMD				0
+#define BM_SSP_CMD1_CMD				0xFFFFFFFF
+
 #define HW_SSP_XFER_SIZE			0x030
 #define HW_SSP_BLOCK_SIZE			0x040
 #define  BP_SSP_BLOCK_SIZE_BLOCK_COUNT		4
@@ -117,6 +121,13 @@ 
 #define  BM_SSP_STATUS_SDIO_IRQ			(1 << 17)
 #define  BM_SSP_STATUS_FIFO_EMPTY		(1 << 5)
 
+#define  BM_SSP_STATUS_RESP_CRC_ERR		0x00010000
+#define  BM_SSP_STATUS_RESP_ERR			0x00008000
+#define  BM_SSP_STATUS_RESP_TIMEOUT		0x00004000
+#define  BM_SSP_STATUS_DATA_CRC_ERR		0x00002000
+#define  BM_SSP_STATUS_TIMEOUT			0x00001000
+#define  BM_SSP_STATUS_BUSY			0x00000001
+
 #define BF_SSP(value, field)	(((value) << BP_SSP_##field) & BM_SSP_##field)
 
 #define SSP_PIO_NUM	3
-- 
2.10.2