diff mbox series

[3/3] spi: spi-geni-qcom: Slightly optimize setup of bidirectional xfters

Message ID 20200912140730.3.Ided778fb4cd078e36c6b240d1b279cd7a534a313@changeid
State New
Headers show
Series None | expand

Commit Message

Doug Anderson Sept. 12, 2020, 9:08 p.m. UTC
When setting up a bidirectional transfer we need to program both the
TX and RX lengths.  We don't need a memory barrier between those two
writes.  Factor out the __iowmb() and use writel_relaxed().  This
saves a fraction of a microsecond of setup overhead on bidirectional
transfers.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/spi/spi-geni-qcom.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Comments

Bjorn Andersson Sept. 12, 2020, 10:54 p.m. UTC | #1
On Sat 12 Sep 16:08 CDT 2020, Douglas Anderson wrote:

> When setting up a bidirectional transfer we need to program both the
> TX and RX lengths.  We don't need a memory barrier between those two
> writes.  Factor out the __iowmb() and use writel_relaxed().  This
> saves a fraction of a microsecond of setup overhead on bidirectional
> transfers.
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  drivers/spi/spi-geni-qcom.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
> index 92d88bf85a90..6c7e12b68bf0 100644
> --- a/drivers/spi/spi-geni-qcom.c
> +++ b/drivers/spi/spi-geni-qcom.c
> @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer,
>  	len &= TRANS_LEN_MSK;
>  
>  	mas->cur_xfer = xfer;
> +
> +	/*
> +	 * Factor out the __iowmb() so that we can use writel_relaxed() for
> +	 * both writes below and thus only incur the overhead once even if
> +	 * we execute both of them.
> +	 */

How many passes through this function do we have to take before saving
the amount of time it took me to read this comment?

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

> +	__iowmb();
> +
>  	if (xfer->tx_buf) {
>  		m_cmd |= SPI_TX_ONLY;
>  		mas->tx_rem_bytes = xfer->len;
> -		writel(len, se->base + SE_SPI_TX_TRANS_LEN);
> +		writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN);
>  	}
> -
>  	if (xfer->rx_buf) {
>  		m_cmd |= SPI_RX_ONLY;
> -		writel(len, se->base + SE_SPI_RX_TRANS_LEN);
> +		writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN);
>  		mas->rx_rem_bytes = xfer->len;
>  	}
>  
> -- 
> 2.28.0.618.gf4bc123cb7-goog
>
kernel test robot Sept. 13, 2020, 3:45 a.m. UTC | #2
Hi Douglas,

I love your patch! Yet something to improve:

[auto build test ERROR on spi/for-next]
[also build test ERROR on v5.9-rc4 next-20200911]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Douglas-Anderson/spi-spi-geni-qcom-Use-the-FIFO-even-more/20200913-050928
base:   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
config: x86_64-randconfig-a002-20200913 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 3170d54842655d6d936aae32b7d0bc92fce7f22e)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/spi/spi-geni-qcom.c:385:2: error: implicit declaration of function '__iowmb' [-Werror,-Wimplicit-function-declaration]
           __iowmb();
           ^
   1 error generated.

# https://github.com/0day-ci/linux/commit/4458adf4007926cfaaa505b54a83059c9ba813ad
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Douglas-Anderson/spi-spi-geni-qcom-Use-the-FIFO-even-more/20200913-050928
git checkout 4458adf4007926cfaaa505b54a83059c9ba813ad
vim +/__iowmb +385 drivers/spi/spi-geni-qcom.c

   334	
   335	static void setup_fifo_xfer(struct spi_transfer *xfer,
   336					struct spi_geni_master *mas,
   337					u16 mode, struct spi_master *spi)
   338	{
   339		u32 m_cmd = 0;
   340		u32 len;
   341		struct geni_se *se = &mas->se;
   342		int ret;
   343	
   344		/*
   345		 * Ensure that our interrupt handler isn't still running from some
   346		 * prior command before we start messing with the hardware behind
   347		 * its back.  We don't need to _keep_ the lock here since we're only
   348		 * worried about racing with out interrupt handler.  The SPI core
   349		 * already handles making sure that we're not trying to do two
   350		 * transfers at once or setting a chip select and doing a transfer
   351		 * concurrently.
   352		 *
   353		 * NOTE: we actually _can't_ hold the lock here because possibly we
   354		 * might call clk_set_rate() which needs to be able to sleep.
   355		 */
   356		spin_lock_irq(&mas->lock);
   357		spin_unlock_irq(&mas->lock);
   358	
   359		if (xfer->bits_per_word != mas->cur_bits_per_word) {
   360			spi_setup_word_len(mas, mode, xfer->bits_per_word);
   361			mas->cur_bits_per_word = xfer->bits_per_word;
   362		}
   363	
   364		/* Speed and bits per word can be overridden per transfer */
   365		ret = geni_spi_set_clock_and_bw(mas, xfer->speed_hz);
   366		if (ret)
   367			return;
   368	
   369		mas->tx_rem_bytes = 0;
   370		mas->rx_rem_bytes = 0;
   371	
   372		if (!(mas->cur_bits_per_word % MIN_WORD_LEN))
   373			len = xfer->len * BITS_PER_BYTE / mas->cur_bits_per_word;
   374		else
   375			len = xfer->len / (mas->cur_bits_per_word / BITS_PER_BYTE + 1);
   376		len &= TRANS_LEN_MSK;
   377	
   378		mas->cur_xfer = xfer;
   379	
   380		/*
   381		 * Factor out the __iowmb() so that we can use writel_relaxed() for
   382		 * both writes below and thus only incur the overhead once even if
   383		 * we execute both of them.
   384		 */
 > 385		__iowmb();
   386	
   387		if (xfer->tx_buf) {
   388			m_cmd |= SPI_TX_ONLY;
   389			mas->tx_rem_bytes = xfer->len;
   390			writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN);
   391		}
   392		if (xfer->rx_buf) {
   393			m_cmd |= SPI_RX_ONLY;
   394			writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN);
   395			mas->rx_rem_bytes = xfer->len;
   396		}
   397	
   398		/*
   399		 * Lock around right before we start the transfer since our
   400		 * interrupt could come in at any time now.
   401		 */
   402		spin_lock_irq(&mas->lock);
   403		geni_se_setup_m_cmd(se, m_cmd, FRAGMENTATION);
   404	
   405		/*
   406		 * TX_WATERMARK_REG should be set after SPI configuration and
   407		 * setting up GENI SE engine, as driver starts data transfer
   408		 * for the watermark interrupt.
   409		 */
   410		if (m_cmd & SPI_TX_ONLY)
   411			writel(mas->tx_wm, se->base + SE_GENI_TX_WATERMARK_REG);
   412		spin_unlock_irq(&mas->lock);
   413	}
   414	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Doug Anderson Sept. 13, 2020, 8:35 p.m. UTC | #3
Hi,

On Sat, Sep 12, 2020 at 6:09 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Sat, Sep 12, 2020 at 3:54 PM Bjorn Andersson
> <bjorn.andersson@linaro.org> wrote:
> >
> > On Sat 12 Sep 16:08 CDT 2020, Douglas Anderson wrote:
> >
> > > When setting up a bidirectional transfer we need to program both the
> > > TX and RX lengths.  We don't need a memory barrier between those two
> > > writes.  Factor out the __iowmb() and use writel_relaxed().  This
> > > saves a fraction of a microsecond of setup overhead on bidirectional
> > > transfers.
> > >
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > >
> > >  drivers/spi/spi-geni-qcom.c | 13 ++++++++++---
> > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
> > > index 92d88bf85a90..6c7e12b68bf0 100644
> > > --- a/drivers/spi/spi-geni-qcom.c
> > > +++ b/drivers/spi/spi-geni-qcom.c
> > > @@ -376,15 +376,22 @@ static void setup_fifo_xfer(struct spi_transfer *xfer,
> > >       len &= TRANS_LEN_MSK;
> > >
> > >       mas->cur_xfer = xfer;
> > > +
> > > +     /*
> > > +      * Factor out the __iowmb() so that we can use writel_relaxed() for
> > > +      * both writes below and thus only incur the overhead once even if
> > > +      * we execute both of them.
> > > +      */
> >
> > How many passes through this function do we have to take before saving
> > the amount of time it took me to read this comment?
> >
> > Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
>
> Thanks for the review!  Yeah, in Chrome OS we do a crazy amount of SPI
> transfers since our EC and security chip are connected over SPI and we
> seem to pile a whole lot of stuff into the EC.  This means we keep
> coming back to the SPI controller again and again when profiling
> things.  I'm hoping that we'll eventually be able to get DMA enabled
> here, but until then at least it's nice to make the FIFO transfers
> better...

Ugh.  Given the problem that the kernel test robot found, I'm gonna
say just drop this patch but keep the others I sent.  As per the CL
description, it's a pretty minor optimization and even though we do a
lot of SPI transfers it's probably more worth it to work towards DMA
mode than to try to find a cleaner solution for this one.

-Doug
diff mbox series

Patch

diff --git a/drivers/spi/spi-geni-qcom.c b/drivers/spi/spi-geni-qcom.c
index 92d88bf85a90..6c7e12b68bf0 100644
--- a/drivers/spi/spi-geni-qcom.c
+++ b/drivers/spi/spi-geni-qcom.c
@@ -376,15 +376,22 @@  static void setup_fifo_xfer(struct spi_transfer *xfer,
 	len &= TRANS_LEN_MSK;
 
 	mas->cur_xfer = xfer;
+
+	/*
+	 * Factor out the __iowmb() so that we can use writel_relaxed() for
+	 * both writes below and thus only incur the overhead once even if
+	 * we execute both of them.
+	 */
+	__iowmb();
+
 	if (xfer->tx_buf) {
 		m_cmd |= SPI_TX_ONLY;
 		mas->tx_rem_bytes = xfer->len;
-		writel(len, se->base + SE_SPI_TX_TRANS_LEN);
+		writel_relaxed(len, se->base + SE_SPI_TX_TRANS_LEN);
 	}
-
 	if (xfer->rx_buf) {
 		m_cmd |= SPI_RX_ONLY;
-		writel(len, se->base + SE_SPI_RX_TRANS_LEN);
+		writel_relaxed(len, se->base + SE_SPI_RX_TRANS_LEN);
 		mas->rx_rem_bytes = xfer->len;
 	}