Message ID | 1615376308-1941-1-git-send-email-loic.poulain@linaro.org |
---|---|
State | Superseded |
Headers | show |
Series | bus: mhi: Command completion workaround | expand |
On 3/10/2021 4:38 AM, Loic Poulain wrote: > Some buggy hardwares (e.g sdx24) may report the current command > ring wp pointer instead of the command completion pointer. It's > obviously wrong, causing completion timeout. We can however deal > with that situation by completing the cmd n-1 element, which is > what the device actually completes. > > Signed-off-by: Loic Poulain <loic.poulain@linaro.org> > --- > drivers/bus/mhi/core/main.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c > index 16b9640..3e3c520 100644 > --- a/drivers/bus/mhi/core/main.c > +++ b/drivers/bus/mhi/core/main.c > @@ -707,6 +707,7 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, > { > dma_addr_t ptr = MHI_TRE_GET_EV_PTR(tre); > struct mhi_cmd *cmd_ring = &mhi_cntrl->mhi_cmd[PRIMARY_CMD_RING]; > + struct device *dev = &mhi_cntrl->mhi_dev->dev; > struct mhi_ring *mhi_ring = &cmd_ring->ring; > struct mhi_tre *cmd_pkt; > struct mhi_chan *mhi_chan; > @@ -714,6 +715,23 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, > > cmd_pkt = mhi_to_virtual(mhi_ring, ptr); > > + if (unlikely(cmd_pkt == mhi_ring->wp)) { > + /* Some buggy hardwares (e.g sdx24) sometimes report the current > + * command ring wp pointer instead of the command completion > + * pointer. It's obviously wrong, causing completion timeout. We > + * can however deal with that situation by completing the cmd > + * n-1 element. > + */ > + void *ring_ptr = (void *)cmd_pkt - mhi_ring->el_size; > + > + if (ring_ptr < mhi_ring->base) > + ring_ptr += mhi_ring->len; > + > + cmd_pkt = ring_ptr; > + > + dev_warn(dev, "Bad completion pointer (ptr == ring_wp)\n"); Is there value in having this warning every time? I wonder if a _once version would be better to not flood the kernel log. Although this is only for commands, which shouldn't be frequent, so maybe that is the implicit rate limiter. What do you think? > + } > + > chan = MHI_TRE_GET_CMD_CHID(cmd_pkt); > mhi_chan = &mhi_cntrl->mhi_chan[chan]; > write_lock_bh(&mhi_chan->lock); > -- Jeffrey Hugo Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Hi Loic, On 3/10/21 3:38 AM, Loic Poulain wrote: > Some buggy hardwares (e.g sdx24) may report the current command > ring wp pointer instead of the command completion pointer. It's > obviously wrong, causing completion timeout. We can however deal > with that situation by completing the cmd n-1 element, which is > what the device actually completes. > > Signed-off-by: Loic Poulain <loic.poulain@linaro.org> > --- > drivers/bus/mhi/core/main.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c > index 16b9640..3e3c520 100644 > --- a/drivers/bus/mhi/core/main.c > +++ b/drivers/bus/mhi/core/main.c > @@ -707,6 +707,7 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, > { > dma_addr_t ptr = MHI_TRE_GET_EV_PTR(tre); > struct mhi_cmd *cmd_ring = &mhi_cntrl->mhi_cmd[PRIMARY_CMD_RING]; > + struct device *dev = &mhi_cntrl->mhi_dev->dev; > struct mhi_ring *mhi_ring = &cmd_ring->ring; > struct mhi_tre *cmd_pkt; > struct mhi_chan *mhi_chan; > @@ -714,6 +715,23 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, > > cmd_pkt = mhi_to_virtual(mhi_ring, ptr); > > + if (unlikely(cmd_pkt == mhi_ring->wp)) { As per spec : The location of the command ring read pointer is reported to the host on the command completion events in the primary event ring. If device is buggy and updates with WP instead of Rp, we should not workaround it by processing Wp - 1. We can print a warning if cmd_pkt != mhi_ring->rp and let the command completion timeout. This needs to be fixed by device. We can not accommodate device side bug in host side. > + /* Some buggy hardwares (e.g sdx24) sometimes report the current > + * command ring wp pointer instead of the command completion > + * pointer. It's obviously wrong, causing completion timeout. We > + * can however deal with that situation by completing the cmd > + * n-1 element. > + */ > + void *ring_ptr = (void *)cmd_pkt - mhi_ring->el_size; > + > + if (ring_ptr < mhi_ring->base) > + ring_ptr += mhi_ring->len; > + > + cmd_pkt = ring_ptr; > + > + dev_warn(dev, "Bad completion pointer (ptr == ring_wp)\n"); > + } > + > chan = MHI_TRE_GET_CMD_CHID(cmd_pkt); > mhi_chan = &mhi_cntrl->mhi_chan[chan]; > write_lock_bh(&mhi_chan->lock); > Hi Mani, What do you think about this workaround ? Thanks, Hemant -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Hi Hemant, On Wed, 10 Mar 2021 at 21:43, Hemant Kumar <hemantk@codeaurora.org> wrote: > > Hi Loic, > > On 3/10/21 3:38 AM, Loic Poulain wrote: > > Some buggy hardwares (e.g sdx24) may report the current command > > ring wp pointer instead of the command completion pointer. It's > > obviously wrong, causing completion timeout. We can however deal > > with that situation by completing the cmd n-1 element, which is > > what the device actually completes. > > > > Signed-off-by: Loic Poulain <loic.poulain@linaro.org> > > --- > > drivers/bus/mhi/core/main.c | 18 ++++++++++++++++++ > > 1 file changed, 18 insertions(+) > > > > diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c > > index 16b9640..3e3c520 100644 > > --- a/drivers/bus/mhi/core/main.c > > +++ b/drivers/bus/mhi/core/main.c > > @@ -707,6 +707,7 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, > > { > > dma_addr_t ptr = MHI_TRE_GET_EV_PTR(tre); > > struct mhi_cmd *cmd_ring = &mhi_cntrl->mhi_cmd[PRIMARY_CMD_RING]; > > + struct device *dev = &mhi_cntrl->mhi_dev->dev; > > struct mhi_ring *mhi_ring = &cmd_ring->ring; > > struct mhi_tre *cmd_pkt; > > struct mhi_chan *mhi_chan; > > @@ -714,6 +715,23 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, > > > > cmd_pkt = mhi_to_virtual(mhi_ring, ptr); > > > > + if (unlikely(cmd_pkt == mhi_ring->wp)) { > As per spec : The location of the command ring read pointer is reported > to the host on the command completion events in the primary event ring. > > If device is buggy and updates with WP instead of Rp, we should not > workaround it by processing Wp - 1. We can print a warning if cmd_pkt != > mhi_ring->rp and let the command completion timeout. This needs to be > fixed by device. We can not accommodate device side bug in host side. I see your point, but here it's not to accommodate the device but the users using such 'buggy' device. The kernel has a ton of 'quirks' in various drivers, I'm not a fan of this but my argument is that: - It captures a behavior that was not captured until now - It workarounds an issue without any impact on non 'buggy' devices - It clearly prints a warn to highlight that it's a known issue that should be fixed - Fixing devices in the wild is quite complex, and we may have to live with it. Regards, Loic
Hi Jeffrey, On Wed, 10 Mar 2021 at 17:19, Jeffrey Hugo <jhugo@codeaurora.org> wrote: > > On 3/10/2021 4:38 AM, Loic Poulain wrote: > > Some buggy hardwares (e.g sdx24) may report the current command > > ring wp pointer instead of the command completion pointer. It's > > obviously wrong, causing completion timeout. We can however deal > > with that situation by completing the cmd n-1 element, which is > > what the device actually completes. > > > > Signed-off-by: Loic Poulain <loic.poulain@linaro.org> > > --- > > drivers/bus/mhi/core/main.c | 18 ++++++++++++++++++ > > 1 file changed, 18 insertions(+) > > > > diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c > > index 16b9640..3e3c520 100644 > > --- a/drivers/bus/mhi/core/main.c > > +++ b/drivers/bus/mhi/core/main.c > > @@ -707,6 +707,7 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, > > { > > dma_addr_t ptr = MHI_TRE_GET_EV_PTR(tre); > > struct mhi_cmd *cmd_ring = &mhi_cntrl->mhi_cmd[PRIMARY_CMD_RING]; > > + struct device *dev = &mhi_cntrl->mhi_dev->dev; > > struct mhi_ring *mhi_ring = &cmd_ring->ring; > > struct mhi_tre *cmd_pkt; > > struct mhi_chan *mhi_chan; > > @@ -714,6 +715,23 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, > > > > cmd_pkt = mhi_to_virtual(mhi_ring, ptr); > > > > + if (unlikely(cmd_pkt == mhi_ring->wp)) { > > + /* Some buggy hardwares (e.g sdx24) sometimes report the current > > + * command ring wp pointer instead of the command completion > > + * pointer. It's obviously wrong, causing completion timeout. We > > + * can however deal with that situation by completing the cmd > > + * n-1 element. > > + */ > > + void *ring_ptr = (void *)cmd_pkt - mhi_ring->el_size; > > + > > + if (ring_ptr < mhi_ring->base) > > + ring_ptr += mhi_ring->len; > > + > > + cmd_pkt = ring_ptr; > > + > > + dev_warn(dev, "Bad completion pointer (ptr == ring_wp)\n"); > > Is there value in having this warning every time? I wonder if a _once > version would be better to not flood the kernel log. Although this is > only for commands, which shouldn't be frequent, so maybe that is the > implicit rate limiter. > > What do you think? As you said it's kind of self rate-limited because of the unfrequent command operations, mostly for starting and stopping channels. A _once variant would hide the issue a bit, and probably not annoying enough to raise curiosity. > > > + } > > + > > chan = MHI_TRE_GET_CMD_CHID(cmd_pkt); > > mhi_chan = &mhi_cntrl->mhi_chan[chan]; > > write_lock_bh(&mhi_chan->lock); > > > > > -- > Jeffrey Hugo > Qualcomm Technologies, Inc. is a member of the > Code Aurora Forum, a Linux Foundation Collaborative Project.
On 3/11/2021 1:05 AM, Loic Poulain wrote: > Hi Jeffrey, > > On Wed, 10 Mar 2021 at 17:19, Jeffrey Hugo <jhugo@codeaurora.org> wrote: >> >> On 3/10/2021 4:38 AM, Loic Poulain wrote: >>> Some buggy hardwares (e.g sdx24) may report the current command >>> ring wp pointer instead of the command completion pointer. It's >>> obviously wrong, causing completion timeout. We can however deal >>> with that situation by completing the cmd n-1 element, which is >>> what the device actually completes. >>> >>> Signed-off-by: Loic Poulain <loic.poulain@linaro.org> >>> --- >>> drivers/bus/mhi/core/main.c | 18 ++++++++++++++++++ >>> 1 file changed, 18 insertions(+) >>> >>> diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c >>> index 16b9640..3e3c520 100644 >>> --- a/drivers/bus/mhi/core/main.c >>> +++ b/drivers/bus/mhi/core/main.c >>> @@ -707,6 +707,7 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, >>> { >>> dma_addr_t ptr = MHI_TRE_GET_EV_PTR(tre); >>> struct mhi_cmd *cmd_ring = &mhi_cntrl->mhi_cmd[PRIMARY_CMD_RING]; >>> + struct device *dev = &mhi_cntrl->mhi_dev->dev; >>> struct mhi_ring *mhi_ring = &cmd_ring->ring; >>> struct mhi_tre *cmd_pkt; >>> struct mhi_chan *mhi_chan; >>> @@ -714,6 +715,23 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, >>> >>> cmd_pkt = mhi_to_virtual(mhi_ring, ptr); >>> >>> + if (unlikely(cmd_pkt == mhi_ring->wp)) { >>> + /* Some buggy hardwares (e.g sdx24) sometimes report the current >>> + * command ring wp pointer instead of the command completion >>> + * pointer. It's obviously wrong, causing completion timeout. We >>> + * can however deal with that situation by completing the cmd >>> + * n-1 element. >>> + */ >>> + void *ring_ptr = (void *)cmd_pkt - mhi_ring->el_size; >>> + >>> + if (ring_ptr < mhi_ring->base) >>> + ring_ptr += mhi_ring->len; >>> + >>> + cmd_pkt = ring_ptr; >>> + >>> + dev_warn(dev, "Bad completion pointer (ptr == ring_wp)\n"); >> >> Is there value in having this warning every time? I wonder if a _once >> version would be better to not flood the kernel log. Although this is >> only for commands, which shouldn't be frequent, so maybe that is the >> implicit rate limiter. >> >> What do you think? > > As you said it's kind of self rate-limited because of the unfrequent > command operations, mostly for starting and stopping channels. A _once > variant would hide the issue a bit, and probably not annoying enough > to raise curiosity. Thats fair. I happened to notice just now that the block comment you have above is not the proper style. That looks like the netdev style, but we are not in the netdev area. I'm curious to see where you and Hemant land on his comment. -- Jeffrey Hugo Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c index 16b9640..3e3c520 100644 --- a/drivers/bus/mhi/core/main.c +++ b/drivers/bus/mhi/core/main.c @@ -707,6 +707,7 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, { dma_addr_t ptr = MHI_TRE_GET_EV_PTR(tre); struct mhi_cmd *cmd_ring = &mhi_cntrl->mhi_cmd[PRIMARY_CMD_RING]; + struct device *dev = &mhi_cntrl->mhi_dev->dev; struct mhi_ring *mhi_ring = &cmd_ring->ring; struct mhi_tre *cmd_pkt; struct mhi_chan *mhi_chan; @@ -714,6 +715,23 @@ static void mhi_process_cmd_completion(struct mhi_controller *mhi_cntrl, cmd_pkt = mhi_to_virtual(mhi_ring, ptr); + if (unlikely(cmd_pkt == mhi_ring->wp)) { + /* Some buggy hardwares (e.g sdx24) sometimes report the current + * command ring wp pointer instead of the command completion + * pointer. It's obviously wrong, causing completion timeout. We + * can however deal with that situation by completing the cmd + * n-1 element. + */ + void *ring_ptr = (void *)cmd_pkt - mhi_ring->el_size; + + if (ring_ptr < mhi_ring->base) + ring_ptr += mhi_ring->len; + + cmd_pkt = ring_ptr; + + dev_warn(dev, "Bad completion pointer (ptr == ring_wp)\n"); + } + chan = MHI_TRE_GET_CMD_CHID(cmd_pkt); mhi_chan = &mhi_cntrl->mhi_chan[chan]; write_lock_bh(&mhi_chan->lock);
Some buggy hardwares (e.g sdx24) may report the current command ring wp pointer instead of the command completion pointer. It's obviously wrong, causing completion timeout. We can however deal with that situation by completing the cmd n-1 element, which is what the device actually completes. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> --- drivers/bus/mhi/core/main.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) -- 2.7.4