From patchwork Fri Jan 5 16:29:36 2018
X-Patchwork-Submitter: Linus Walleij
X-Patchwork-Id: 123548
From: Linus Walleij
To: linux-mmc@vger.kernel.org, Ulf Hansson, Benjamin Beckmeyer, Adrian Hunter
Cc: Linus Walleij, Pierre Ossman, Benoît Thébaudeau, Fabio Estevam, stable@vger.kernel.org
Subject: [PATCH v3] mmc: sdhci: Implement an SDHCI-specific bounce buffer
Date: Fri, 5 Jan 2018 17:29:36 +0100
Message-Id: <20180105162936.18612-1-linus.walleij@linaro.org>
X-Mailer: git-send-email 2.14.3
X-Mailing-List: stable@vger.kernel.org

The bounce buffer is gone from the MMC core, and now we found out
that there are some (crippled) i.MX boards out there that have broken
ADMA (cannot do scatter-gather), and broken PIO so they must use
SDMA. Closer examination shows a less significant slowdown also on
SDMA-only capable laptop hosts.

SDMA caps the number of segments to one, so each segment gets turned
into a singular request that ping-pongs to the block layer before the
next request/segment is issued. Apparently it happens a lot that the
block layer sends requests that include a lot of physically
discontiguous segments. My guess is that this phenomenon is coming
from the file system.

These devices that cannot handle scatterlists in hardware can see
major benefits from a DMA-contiguous bounce buffer.
This patch accumulates those fragmented scatterlists in a physically
contiguous bounce buffer so that we can issue bigger DMA data chunks
to/from the card.

When tested with this PCI-integrated host (1217:8221) that only
supports SDMA:

0b:00.0 SD Host controller: O2 Micro, Inc. OZ600FJ0/OZ900FJ0/OZ600FJS
SD/MMC Card Reader Controller (rev 05)

the patch gave ~1 MByte/s improved throughput on large reads and
writes when testing with iozone, compared to without the patch.

On the i.MX SDHCI controllers on the crippled i.MX 25 and i.MX 35,
the patch restores the performance to what it was before we removed
the bounce buffers, and then some: performance is better than ever
because we now allocate a bounce buffer the size of the maximum
single request the SDMA engine can handle. On the PCI laptop this is
256K, whereas with the old bounce buffer code it was 64K max.

Cc: Pierre Ossman
Cc: Benoît Thébaudeau
Cc: Fabio Estevam
Cc: stable@vger.kernel.org
Fixes: de3ee99b097d ("mmc: Delete bounce buffer handling")
Tested-by: Benjamin Beckmeyer
Signed-off-by: Linus Walleij
---
ChangeLog v2->v3:
- Rewrite the commit message a bit
- Add Benjamin's Tested-by
- Add Fixes and stable tags
ChangeLog v1->v2:
- Skip the remapping and fiddling with the buffer, instead use
  dma_alloc_coherent() and use a simple, coherent bounce buffer.
- Couple kernel messages to ->parent of the mmc_host as it relates
  to the hardware characteristics.
---
 drivers/mmc/host/sdhci.c | 94 +++++++++++++++++++++++++++++++++++++++++++-----
 drivers/mmc/host/sdhci.h |  3 ++
 2 files changed, 89 insertions(+), 8 deletions(-)

--
2.14.3

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index e9290a3439d5..97d4c6fc1159 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -502,8 +502,20 @@ static int sdhci_pre_dma_transfer(struct sdhci_host *host,
 	if (data->host_cookie == COOKIE_PRE_MAPPED)
 		return data->sg_count;
 
-	sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
-			      mmc_get_dma_dir(data));
+	/* Bounce write requests to the bounce buffer */
+	if (host->bounce_buffer) {
+		if (mmc_get_dma_dir(data) == DMA_TO_DEVICE) {
+			/* Copy the data to the bounce buffer */
+			sg_copy_to_buffer(data->sg, data->sg_len,
+					  host->bounce_buffer, host->bounce_buffer_size);
+		}
+		/* Just a dummy value */
+		sg_count = 1;
+	} else {
+		/* Just access the data directly from memory */
+		sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
+				      mmc_get_dma_dir(data));
+	}
 
 	if (sg_count == 0)
 		return -ENOSPC;
@@ -858,8 +870,13 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
 					     SDHCI_ADMA_ADDRESS_HI);
 		} else {
 			WARN_ON(sg_cnt != 1);
-			sdhci_writel(host, sg_dma_address(data->sg),
-				     SDHCI_DMA_ADDRESS);
+			/* Bounce buffer goes to work */
+			if (host->bounce_buffer)
+				sdhci_writel(host, host->bounce_addr,
+					     SDHCI_DMA_ADDRESS);
+			else
+				sdhci_writel(host, sg_dma_address(data->sg),
+					     SDHCI_DMA_ADDRESS);
 		}
 	}
 
@@ -2248,7 +2265,12 @@ static void sdhci_pre_req(struct mmc_host *mmc, struct mmc_request *mrq)
 
 	mrq->data->host_cookie = COOKIE_UNMAPPED;
 
-	if (host->flags & SDHCI_REQ_USE_DMA)
+	/*
+	 * No pre-mapping in the pre hook if we're using the bounce buffer,
+	 * for that we would need two bounce buffers since one buffer is
+	 * in flight when this is getting called.
+	 */
+	if (host->flags & SDHCI_REQ_USE_DMA && !host->bounce_buffer)
 		sdhci_pre_dma_transfer(host, mrq->data, COOKIE_PRE_MAPPED);
 }
 
@@ -2352,8 +2374,19 @@ static bool sdhci_request_done(struct sdhci_host *host)
 		struct mmc_data *data = mrq->data;
 
 		if (data && data->host_cookie == COOKIE_MAPPED) {
-			dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
-				     mmc_get_dma_dir(data));
+			if (host->bounce_buffer) {
+				/* On reads, copy the bounced data into the sglist */
+				if (mmc_get_dma_dir(data) == DMA_FROM_DEVICE) {
+					sg_copy_from_buffer(data->sg, data->sg_len,
+							    host->bounce_buffer,
+							    host->bounce_buffer_size);
+				}
+			} else {
+				/* Unmap the raw data */
+				dma_unmap_sg(mmc_dev(host->mmc), data->sg,
+					     data->sg_len,
+					     mmc_get_dma_dir(data));
+			}
 			data->host_cookie = COOKIE_UNMAPPED;
 		}
 	}
@@ -2636,7 +2669,12 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
 		 */
 		if (intmask & SDHCI_INT_DMA_END) {
 			u32 dmastart, dmanow;
-			dmastart = sg_dma_address(host->data->sg);
+
+			if (host->bounce_buffer)
+				dmastart = host->bounce_addr;
+			else
+				dmastart = sg_dma_address(host->data->sg);
+
 			dmanow = dmastart + host->data->bytes_xfered;
 			/*
 			 * Force update to the next DMA block boundary.
@@ -3713,6 +3751,43 @@ int sdhci_setup_host(struct sdhci_host *host)
 	 */
 	mmc->max_blk_count = (host->quirks & SDHCI_QUIRK_NO_MULTIBLOCK) ?
 		1 : 65535;
 
+	if (mmc->max_segs == 1) {
+		unsigned int max_blocks;
+		unsigned int max_seg_size;
+
+		max_seg_size = mmc->max_req_size;
+		max_blocks = max_seg_size / 512;
+		dev_info(mmc->parent, "host only supports SDMA, activate bounce buffer\n");
+
+		/*
+		 * When we just support one segment, we can get significant speedups
+		 * by the help of a bounce buffer to group scattered reads/writes
+		 * together.
+		 *
+		 * TODO: is this too big? Stealing too much memory? The old bounce
+		 * buffer is max 64K. This should be the 512K that SDMA can handle
+		 * if I read the code above right. Anyways let's try this.
+		 * FIXME: use devm_*
+		 */
+		host->bounce_buffer = dma_alloc_coherent(mmc->parent, max_seg_size,
+							 &host->bounce_addr, GFP_KERNEL);
+		if (!host->bounce_buffer) {
+			dev_err(mmc->parent,
+				"failed to allocate %u bytes for bounce buffer\n",
+				max_seg_size);
+			return -ENOMEM;
+		}
+		host->bounce_buffer_size = max_seg_size;
+
+		/* Lie about this since we're bouncing */
+		mmc->max_segs = max_blocks;
+		mmc->max_seg_size = max_seg_size;
+
+		dev_info(mmc->parent,
+			 "bounce buffer: bounce up to %u segments into one, max segment size %u bytes\n",
+			 max_blocks, max_seg_size);
+	}
+
 	return 0;
 
 unreg:
@@ -3743,6 +3818,9 @@ void sdhci_cleanup_host(struct sdhci_host *host)
 				  host->align_addr);
 	host->adma_table = NULL;
 	host->align_buffer = NULL;
+	if (host->bounce_buffer)
+		dma_free_coherent(mmc->parent, host->bounce_buffer_size,
+				  host->bounce_buffer, host->bounce_addr);
 }
 
 EXPORT_SYMBOL_GPL(sdhci_cleanup_host);

diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 54bc444c317f..865e09618d22 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -440,6 +440,9 @@ struct sdhci_host {
 	int irq;		/* Device IRQ */
 	void __iomem *ioaddr;	/* Mapped address */
+	char *bounce_buffer;	/* For packing SDMA reads/writes */
+	dma_addr_t bounce_addr;
+	size_t bounce_buffer_size;
 
 	const struct sdhci_ops *ops;	/* Low level hw interface */