Message ID | 20201008155154.1.Ifdb1b69fa3367b81118e16e9e4e63299980ca798@changeid |
---|---|
State | Superseded |
Series | i2c: i2c-qcom-geni: More properly fix the DMA race |
Quoting Douglas Anderson (2020-10-08 15:52:33)
> On geni-i2c transfers using DMA, it was seen that if you program the
> command (I2C_READ) before calling geni_se_rx_dma_prep() that it could
> cause interrupts to fire. If we get unlucky, these interrupts can
> just keep firing (and not be handled) blocking further progress and
> hanging the system.
>
> In commit 02b9aec59243 ("i2c: i2c-qcom-geni: Fix DMA transfer race")
> we avoided that by making sure we didn't program the command until
> after geni_se_rx_dma_prep() was called. While that avoided the
> problems, it also turns out to be invalid. At least in the TX case we
> started seeing sporadic corrupted transfers. This is easily seen by
> adding an msleep() between the DMA prep and the writing of the
> command, which makes the problem worse. That means we need to revert
> that commit and find another way to fix the bogus IRQs.
>
> Specifically, after reverting commit 02b9aec59243 ("i2c:
> i2c-qcom-geni: Fix DMA transfer race"), I put some traces in. I found
> that when the interrupts were firing like crazy:
> - "m_stat" had bits for M_RX_IRQ_EN, M_RX_FIFO_WATERMARK_EN set.
> - "dma" was set.
>
> Further debugging showed that I could make the problem happen more
> reliably by adding an "msleep(1)" any time after geni_se_setup_m_cmd()
> ran up until geni_se_rx_dma_prep() programmed the length.
>
> A rather simple fix is to change geni_se_select_dma_mode() so it's a
> true inverse of geni_se_select_fifo_mode() and disables all the FIFO
> related interrupts. Now the problematic interrupts can't fire and we
> can program things in the correct order without worrying.
>
> As part of this, let's also change the writel_relaxed() in the prepare
> function to a writel() so that our DMA is guaranteed to be prepared
> now that we can't rely on geni_se_setup_m_cmd()'s writel().
>
> NOTE: the only current user of GENI_SE_DMA in mainline is i2c.
>
> Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
> Fixes: 02b9aec59243 ("i2c: i2c-qcom-geni: Fix DMA transfer race")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---

Reviewed-by: Stephen Boyd <swboyd@chromium.org>

>  drivers/soc/qcom/qcom-geni-se.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
> index d0e4f520cff8..751a49f6534f 100644
> --- a/drivers/soc/qcom/qcom-geni-se.c
> +++ b/drivers/soc/qcom/qcom-geni-se.c
> @@ -289,10 +289,23 @@ static void geni_se_select_fifo_mode(struct geni_se *se)
>
>  static void geni_se_select_dma_mode(struct geni_se *se)
>  {
> +	u32 proto = geni_se_read_proto(se);
>  	u32 val;
>
>  	geni_se_irq_clear(se);
>
> +	val = readl_relaxed(se->base + SE_GENI_M_IRQ_EN);
> +	if (proto != GENI_SE_UART) {

Not a problem with this patch but it would be great if there was a
comment here (and probably in geni_se_select_fifo_mode() too) indicating
why GENI_SE_UART is special. Is it because GENI_SE_UART doesn't use the
main sequencer? I think that is the reason, but I forgot and reading
this code doesn't tell me that.

Splitting the driver in this way where the logic is in the geni wrapper
and in the engine driver leads to this confusion.

> +		val &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
> +		val &= ~(M_RX_FIFO_WATERMARK_EN | M_RX_FIFO_LAST_EN);
> +	}
> +	writel_relaxed(val, se->base + SE_GENI_M_IRQ_EN);
> +
> +	val = readl_relaxed(se->base + SE_GENI_S_IRQ_EN);
> +	if (proto != GENI_SE_UART)
> +		val &= ~S_CMD_DONE_EN;
> +	writel_relaxed(val, se->base + SE_GENI_S_IRQ_EN);
> +
>  	val = readl_relaxed(se->base + SE_GENI_DMA_MODE_EN);
>  	val |= GENI_DMA_MODE_EN;
>  	writel_relaxed(val, se->base + SE_GENI_DMA_MODE_EN);
On 10/9/2020 4:22 AM, Douglas Anderson wrote:
> On geni-i2c transfers using DMA, it was seen that if you program the
> command (I2C_READ) before calling geni_se_rx_dma_prep() that it could
> cause interrupts to fire. If we get unlucky, these interrupts can
> just keep firing (and not be handled) blocking further progress and
> hanging the system.
>
> In commit 02b9aec59243 ("i2c: i2c-qcom-geni: Fix DMA transfer race")
> we avoided that by making sure we didn't program the command until
> after geni_se_rx_dma_prep() was called. While that avoided the
> problems, it also turns out to be invalid. At least in the TX case we
> started seeing sporadic corrupted transfers. This is easily seen by
> adding an msleep() between the DMA prep and the writing of the
> command, which makes the problem worse. That means we need to revert
> that commit and find another way to fix the bogus IRQs.
>
> Specifically, after reverting commit 02b9aec59243 ("i2c:
> i2c-qcom-geni: Fix DMA transfer race"), I put some traces in. I found
> that when the interrupts were firing like crazy:
> - "m_stat" had bits for M_RX_IRQ_EN, M_RX_FIFO_WATERMARK_EN set.
> - "dma" was set.
>
> Further debugging showed that I could make the problem happen more
> reliably by adding an "msleep(1)" any time after geni_se_setup_m_cmd()
> ran up until geni_se_rx_dma_prep() programmed the length.
>
> A rather simple fix is to change geni_se_select_dma_mode() so it's a
> true inverse of geni_se_select_fifo_mode() and disables all the FIFO
> related interrupts. Now the problematic interrupts can't fire and we
> can program things in the correct order without worrying.
>
> As part of this, let's also change the writel_relaxed() in the prepare
> function to a writel() so that our DMA is guaranteed to be prepared
> now that we can't rely on geni_se_setup_m_cmd()'s writel().
>
> NOTE: the only current user of GENI_SE_DMA in mainline is i2c.
>
> Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
> Fixes: 02b9aec59243 ("i2c: i2c-qcom-geni: Fix DMA transfer race")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Akash Asthana <akashast@codeaurora.org>
Hi Stephen,

>>
>>  static void geni_se_select_dma_mode(struct geni_se *se)
>>  {
>> +	u32 proto = geni_se_read_proto(se);
>>  	u32 val;
>>
>>  	geni_se_irq_clear(se);
>>
>> +	val = readl_relaxed(se->base + SE_GENI_M_IRQ_EN);
>> +	if (proto != GENI_SE_UART) {
> Not a problem with this patch but it would be great if there was a
> comment here (and probably in geni_se_select_fifo_mode() too) indicating
> why GENI_SE_UART is special. Is it because GENI_SE_UART doesn't use the
> main sequencer? I think that is the reason, but I forgot and reading
> this code doesn't tell me that.
>
> Splitting the driver in this way where the logic is in the geni wrapper
> and in the engine driver leads to this confusion.

GENI_SE_UART uses the main sequencer for TX and the secondary sequencer
for RX transfers because it is asynchronous in nature.

That's why the RX related bits (M_RX_FIFO_WATERMARK_EN |
M_RX_FIFO_LAST_EN) are not enabled in the main sequencer for UART.

The (M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN) bits are controlled from the
UART driver; they get enabled and disabled multiple times from start_tx
and stop_tx respectively.

Regards,
Akash

>
>> +		val &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
>> +		val &= ~(M_RX_FIFO_WATERMARK_EN | M_RX_FIFO_LAST_EN);
>> +	}
>> +	writel_relaxed(val, se->base + SE_GENI_M_IRQ_EN);
>> +
>> +	val = readl_relaxed(se->base + SE_GENI_S_IRQ_EN);
>> +	if (proto != GENI_SE_UART)
>> +		val &= ~S_CMD_DONE_EN;
>> +	writel_relaxed(val, se->base + SE_GENI_S_IRQ_EN);
>> +
>>  	val = readl_relaxed(se->base + SE_GENI_DMA_MODE_EN);
>>  	val |= GENI_DMA_MODE_EN;
>>  	writel_relaxed(val, se->base + SE_GENI_DMA_MODE_EN);
Hi,

On Mon, Oct 12, 2020 at 2:05 AM Akash Asthana <akashast@codeaurora.org> wrote:
>
> Hi Stephen,
>
> >>
> >>  static void geni_se_select_dma_mode(struct geni_se *se)
> >>  {
> >> +	u32 proto = geni_se_read_proto(se);
> >>  	u32 val;
> >>
> >>  	geni_se_irq_clear(se);
> >>
> >> +	val = readl_relaxed(se->base + SE_GENI_M_IRQ_EN);
> >> +	if (proto != GENI_SE_UART) {
> > Not a problem with this patch but it would be great if there was a
> > comment here (and probably in geni_se_select_fifo_mode() too) indicating
> > why GENI_SE_UART is special. Is it because GENI_SE_UART doesn't use the
> > main sequencer? I think that is the reason, but I forgot and reading
> > this code doesn't tell me that.
> >
> > Splitting the driver in this way where the logic is in the geni wrapper
> > and in the engine driver leads to this confusion.
>
> GENI_SE_UART uses the main sequencer for TX and the secondary sequencer
> for RX transfers because it is asynchronous in nature.
>
> That's why the RX related bits (M_RX_FIFO_WATERMARK_EN |
> M_RX_FIFO_LAST_EN) are not enabled in the main sequencer for UART.
>
> The (M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN) bits are controlled from the
> UART driver; they get enabled and disabled multiple times from start_tx
> and stop_tx respectively.

For now I've "solved" this by adding some comments (in the 3rd patch)
basically summarizing what Akash said. I didn't want to go further
than that for now because it felt more important to get the i2c bug
fixed sooner rather than later and re-organizing would be a big enough
change that it'd probably need a few spins.

Our bug trackers don't make it trivially easy to file a public bug
tracking this and assign it to Qualcomm, but I've filed a bug asking
folks at Qualcomm to help with re-organizing things after my patch
series lands. This is internally tracked at Google as b:170766462
("Rejigger geni_se_select_fifo_mode() / geni_se_select_dma_mode() to
not manage interrupt enables").

-Doug
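
To make the UART special case discussed above concrete, here is a minimal userspace sketch of the rule as Akash describes it. It is not the kernel code: the bit positions, the `sketch_proto` enum and both helper names are hypothetical and exist only for illustration. The assumption is simply that the common wrapper leaves the main-sequencer enables alone for UART and clears all four FIFO related enables for every other protocol.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define M_CMD_DONE_EN           (1u << 0)
#define M_RX_FIFO_WATERMARK_EN  (1u << 1)
#define M_RX_FIFO_LAST_EN       (1u << 2)
#define M_TX_FIFO_WATERMARK_EN  (1u << 3)

/* Hypothetical stand-in for the protocol value read from the serial engine. */
enum sketch_proto { SKETCH_PROTO_I2C, SKETCH_PROTO_SPI, SKETCH_PROTO_UART };

/*
 * Main-sequencer enable bits the common wrapper clears when entering DMA mode.
 * UART is skipped: per the thread, its RX runs on the secondary sequencer and
 * its TX related enables are toggled by the UART driver itself.
 */
static uint32_t m_irq_bits_to_clear(enum sketch_proto proto)
{
	if (proto == SKETCH_PROTO_UART)
		return 0;

	return M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN |
	       M_RX_FIFO_WATERMARK_EN | M_RX_FIFO_LAST_EN;
}

/* Read-modify-write on a plain value standing in for the enable register. */
static uint32_t apply_dma_mode_mask(uint32_t irq_en, enum sketch_proto proto)
{
	return irq_en & ~m_irq_bits_to_clear(proto);
}
```

Keeping the "which bits does the wrapper own" decision in one helper is one way the comment Stephen asks for could sit next to the logic it explains; whether the real driver should be organized like this is exactly the open question in the thread.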
diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
index d0e4f520cff8..751a49f6534f 100644
--- a/drivers/soc/qcom/qcom-geni-se.c
+++ b/drivers/soc/qcom/qcom-geni-se.c
@@ -289,10 +289,23 @@ static void geni_se_select_fifo_mode(struct geni_se *se)
 
 static void geni_se_select_dma_mode(struct geni_se *se)
 {
+	u32 proto = geni_se_read_proto(se);
 	u32 val;
 
 	geni_se_irq_clear(se);
 
+	val = readl_relaxed(se->base + SE_GENI_M_IRQ_EN);
+	if (proto != GENI_SE_UART) {
+		val &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
+		val &= ~(M_RX_FIFO_WATERMARK_EN | M_RX_FIFO_LAST_EN);
+	}
+	writel_relaxed(val, se->base + SE_GENI_M_IRQ_EN);
+
+	val = readl_relaxed(se->base + SE_GENI_S_IRQ_EN);
+	if (proto != GENI_SE_UART)
+		val &= ~S_CMD_DONE_EN;
+	writel_relaxed(val, se->base + SE_GENI_S_IRQ_EN);
+
 	val = readl_relaxed(se->base + SE_GENI_DMA_MODE_EN);
 	val |= GENI_DMA_MODE_EN;
 	writel_relaxed(val, se->base + SE_GENI_DMA_MODE_EN);
@@ -651,7 +664,7 @@ int geni_se_tx_dma_prep(struct geni_se *se, void *buf, size_t len,
 	writel_relaxed(lower_32_bits(*iova), se->base + SE_DMA_TX_PTR_L);
 	writel_relaxed(upper_32_bits(*iova), se->base + SE_DMA_TX_PTR_H);
 	writel_relaxed(GENI_SE_DMA_EOT_BUF, se->base + SE_DMA_TX_ATTR);
-	writel_relaxed(len, se->base + SE_DMA_TX_LEN);
+	writel(len, se->base + SE_DMA_TX_LEN);
 	return 0;
 }
 EXPORT_SYMBOL(geni_se_tx_dma_prep);
@@ -688,7 +701,7 @@ int geni_se_rx_dma_prep(struct geni_se *se, void *buf, size_t len,
 	writel_relaxed(upper_32_bits(*iova), se->base + SE_DMA_RX_PTR_H);
 	/* RX does not have EOT buffer type bit. So just reset RX_ATTR */
 	writel_relaxed(0, se->base + SE_DMA_RX_ATTR);
-	writel_relaxed(len, se->base + SE_DMA_RX_LEN);
+	writel(len, se->base + SE_DMA_RX_LEN);
 	return 0;
 }
 EXPORT_SYMBOL(geni_se_rx_dma_prep);
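
The last two hunks swap a relaxed write for an ordered one on the final register of the DMA setup. As a rough userspace analogy only (C11 atomics do not model MMIO or device memory, and the kernel's writel() has its own architecture-defined semantics), the pattern resembles publishing a descriptor with relaxed stores followed by one release store. The struct and function names below are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the DMA pointer/attribute/length registers. */
struct sketch_dma_regs {
	_Atomic uint32_t ptr_l;
	_Atomic uint32_t ptr_h;
	_Atomic uint32_t attr;
	_Atomic uint32_t len;
};

/*
 * The setup fields are stored relaxed; the length store, which in this
 * sketch is what "kicks" the transfer, is a release store so that every
 * earlier store is ordered before it for an observer that acquires `len`.
 */
static void sketch_dma_prep(struct sketch_dma_regs *r, uint64_t iova,
			    uint32_t len)
{
	atomic_store_explicit(&r->ptr_l, (uint32_t)iova, memory_order_relaxed);
	atomic_store_explicit(&r->ptr_h, (uint32_t)(iova >> 32),
			      memory_order_relaxed);
	atomic_store_explicit(&r->attr, 0, memory_order_relaxed);
	atomic_store_explicit(&r->len, len, memory_order_release);
}
```

The point of the analogy is just that only the final store needs the stronger ordering; the earlier ones can stay relaxed, which matches the shape of the hunks above.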
On geni-i2c transfers using DMA, it was seen that if you program the
command (I2C_READ) before calling geni_se_rx_dma_prep() that it could
cause interrupts to fire. If we get unlucky, these interrupts can
just keep firing (and not be handled) blocking further progress and
hanging the system.

In commit 02b9aec59243 ("i2c: i2c-qcom-geni: Fix DMA transfer race")
we avoided that by making sure we didn't program the command until
after geni_se_rx_dma_prep() was called. While that avoided the
problems, it also turns out to be invalid. At least in the TX case we
started seeing sporadic corrupted transfers. This is easily seen by
adding an msleep() between the DMA prep and the writing of the
command, which makes the problem worse. That means we need to revert
that commit and find another way to fix the bogus IRQs.

Specifically, after reverting commit 02b9aec59243 ("i2c:
i2c-qcom-geni: Fix DMA transfer race"), I put some traces in. I found
that when the interrupts were firing like crazy:
- "m_stat" had bits for M_RX_IRQ_EN, M_RX_FIFO_WATERMARK_EN set.
- "dma" was set.

Further debugging showed that I could make the problem happen more
reliably by adding an "msleep(1)" any time after geni_se_setup_m_cmd()
ran up until geni_se_rx_dma_prep() programmed the length.

A rather simple fix is to change geni_se_select_dma_mode() so it's a
true inverse of geni_se_select_fifo_mode() and disables all the FIFO
related interrupts. Now the problematic interrupts can't fire and we
can program things in the correct order without worrying.

As part of this, let's also change the writel_relaxed() in the prepare
function to a writel() so that our DMA is guaranteed to be prepared
now that we can't rely on geni_se_setup_m_cmd()'s writel().

NOTE: the only current user of GENI_SE_DMA in mainline is i2c.

Fixes: 37692de5d523 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
Fixes: 02b9aec59243 ("i2c: i2c-qcom-geni: Fix DMA transfer race")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
 drivers/soc/qcom/qcom-geni-se.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
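
The "true inverse" idea in the description can be sketched with a small userspace model. Everything here is an assumption made for illustration: the struct, the bit values, the function names, and in particular the simplification that an interrupt line is asserted whenever a status bit and its enable bit are both set. It is not the driver code, only a way to see why leaving FIFO enables set while in DMA mode can let a FIFO watermark condition raise an interrupt that the DMA path never handles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this model only. */
#define SK_M_RX_FIFO_WATERMARK  (1u << 0)
#define SK_M_CMD_DONE           (1u << 1)
#define SK_FIFO_IRQS            (SK_M_RX_FIFO_WATERMARK | SK_M_CMD_DONE)

/* Hypothetical model of one serial engine's main-sequencer IRQ state. */
struct sketch_se {
	uint32_t m_irq_status; /* conditions the hardware has latched */
	uint32_t m_irq_en;     /* conditions allowed to raise the IRQ line */
	bool dma_mode;
};

static void sketch_select_fifo_mode(struct sketch_se *se)
{
	se->m_irq_en |= SK_FIFO_IRQS;
	se->dma_mode = false;
}

/* Behaviour before the patch in this model: enables are left as they were. */
static void sketch_select_dma_mode_old(struct sketch_se *se)
{
	se->dma_mode = true;
}

/* Behaviour after the patch in this model: undo what FIFO mode enabled. */
static void sketch_select_dma_mode(struct sketch_se *se)
{
	se->m_irq_en &= ~SK_FIFO_IRQS;
	se->dma_mode = true;
}

/* Assumed rule: the line is asserted if any enabled condition is latched. */
static bool sketch_irq_asserted(const struct sketch_se *se)
{
	return (se->m_irq_status & se->m_irq_en) != 0;
}
```

In this model, switching from FIFO mode to DMA mode with a latched RX watermark keeps the line asserted under the old behaviour and deasserts it under the new one, which is the effect the description above is after.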