Message ID | 20240704101805.30612-4-johan+linaro@kernel.org |
---|---|
State | New |
Series | serial: qcom-geni: fix lockups |
Hi,

On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:
>
> The Qualcomm GENI serial driver did not handle buffer flushing and used
> to print discarded characters when the circular buffer was cleared.
> Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
> this instead resulted in a hard lockup due to
> qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
> interrupt handler.
>
> The underlying bugs have now been fixed, but make sure to output NUL
> characters instead of killing the machine if a similar driver bug is
> ever reintroduced.
>
> Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> ---
>  drivers/tty/serial/qcom_geni_serial.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> index b2bbd2d79dbb..69a632fefc41 100644
> --- a/drivers/tty/serial/qcom_geni_serial.c
> +++ b/drivers/tty/serial/qcom_geni_serial.c
> @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
>  		memset(buf, 0, sizeof(buf));
>  		tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
>
> -		tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> +		uart_fifo_out(uport, buf, tx_bytes);

FWIW I would have rather we output something much more obviously wrong
in this case instead of a NUL byte. Maybe we should fill it with "@"
characters or something? As you said: the driver shouldn't get into
this error condition so it shouldn't matter, but if we have a bug in
the future I'd rather it be an obvious bug instead of a subtle bug.

I'm happy to post a patch or provide a Reviewed-by if you want to post
a patch. Let me know.

-Doug
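
For illustration only, the idea of padding with a conspicuous character
rather than NUL could look roughly like the user-space sketch below. It is
not the patch that was later posted. build_word() and UNDERRUN_FILL are
hypothetical names, and the source bytes come from a plain array rather
than the driver's kfifo.

```c
#include <stdint.h>
#include <string.h>

#define BYTES_PER_FIFO_WORD 4U
#define UNDERRUN_FILL '@'	/* hypothetical name; the thread only suggests the character */

/*
 * Build one FIFO word from `src`. The caller asks for `want` bytes but only
 * `avail` bytes really exist. Returns the number of real bytes copied.
 */
static unsigned int build_word(uint8_t word[BYTES_PER_FIFO_WORD],
			       const uint8_t *src, unsigned int avail,
			       unsigned int want)
{
	unsigned int got;

	if (want > BYTES_PER_FIFO_WORD)
		want = BYTES_PER_FIFO_WORD;
	got = avail < want ? avail : want;

	/* Pre-fill so that any byte the source cannot supply stands out. */
	memset(word, UNDERRUN_FILL, BYTES_PER_FIFO_WORD);
	if (got)
		memcpy(word, src, got);
	return got;
}
```

With two real bytes "ab" and a request for four, the word comes out as
"ab@@", which is hard to miss in a console log.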
On Mon, Jul 08, 2024 at 04:59:59PM -0700, Doug Anderson wrote:
> On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:
> >
> > The Qualcomm GENI serial driver did not handle buffer flushing and used
> > to print discarded characters when the circular buffer was cleared.
> > Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
> > this instead resulted in a hard lockup due to
> > qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
> > interrupt handler.
> >
> > The underlying bugs have now been fixed, but make sure to output NUL
> > characters instead of killing the machine if a similar driver bug is
> > ever reintroduced.
> >
> > Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
> > ---
> >  drivers/tty/serial/qcom_geni_serial.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
> > index b2bbd2d79dbb..69a632fefc41 100644
> > --- a/drivers/tty/serial/qcom_geni_serial.c
> > +++ b/drivers/tty/serial/qcom_geni_serial.c
> > @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> >  		memset(buf, 0, sizeof(buf));
> >  		tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
> >
> > -		tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> > +		uart_fifo_out(uport, buf, tx_bytes);
>
> FWIW I would have rather we output something much more obviously wrong
> in this case instead of a NUL byte. Maybe we should fill it with "@"
> characters or something? As you said: the driver shouldn't get into
> this error condition so it shouldn't matter, but if we have a bug in
> the future I'd rather it be an obvious bug instead of a subtle bug.

Yeah, I've been running with a patch like that locally in my tests, and
went a bit back and forth whether I should post it. My reasoning for not
doing so was that the bugs have been fixed so we don't need to spend
cycles on memsetting the buffer to anything but NUL (I used 'X' in my
testing).

I guess that can be avoided by only padding the buffer if we ever hit an
underrun, but I still think it's questionable to spend the effort as
this is not something that should be needed. In any case, I didn't want
to spend time on it to fix the 6.10 regressions.

Killing the machine is perhaps an effective way to get attention to an
issue, but I'd much rather have an occasional NUL character in the log
*if* this ever becomes an issue at all again.

> I'm happy to post a patch or provide a Reviewed-by if you want to post
> a patch. Let me know.

If you feel strongly about this, I can either fill the buffer with
something else than NUL or add error handling for any such future
hypothetical bugs. What do you prefer?

Johan
On Tue, Jul 09, 2024 at 11:44:18AM +0200, Johan Hovold wrote:
> On Mon, Jul 08, 2024 at 04:59:59PM -0700, Doug Anderson wrote:
> > On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:

> > > @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> > >  		memset(buf, 0, sizeof(buf));
> > >  		tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
> > >
> > > -		tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> > > +		uart_fifo_out(uport, buf, tx_bytes);
> >
> > FWIW I would have rather we output something much more obviously wrong
> > in this case instead of a NUL byte. Maybe we should fill it with "@"
> > characters or something? As you said: the driver shouldn't get into
> > this error condition so it shouldn't matter, but if we have a bug in
> > the future I'd rather it be an obvious bug instead of a subtle bug.
>
> Yeah, I've been running with a patch like that locally in my tests, and
> went a bit back and forth whether I should post it. My reasoning for not
> doing so was that the bugs have been fixed so we don't need to spend
> cycles on memsetting the buffer to anything but NUL (I used 'X' in my
> testing).
>
> I guess that can be avoided by only padding the buffer if we ever hit an
> underrun, but I still think it's questionable to spend the effort as
> this is not something that should be needed. In any case, I didn't want
> to spend time on it to fix the 6.10 regressions.
>
> Killing the machine is perhaps an effective way to get attention to an
> issue, but I'd much rather have an occasional NUL character in the log
> *if* this ever becomes an issue at all again.
>
> > I'm happy to post a patch or provide a Reviewed-by if you want to post
> > a patch. Let me know.
>
> If you feel strongly about this, I can either fill the buffer with
> something else than NUL or add error handling for any such future
> hypothetical bugs. What do you prefer?

Actually we just need to clear the buffer on entry, which would do away
with the unnecessary memset() that is there today. This should also give
you a printable indication that something is wrong in case a similar bug
is ever reintroduced (e.g. the last four characters would be repeated
until the transfer is complete instead of a fixed char like '@').

Perhaps that's good enough as a compromise?

Johan
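
To make the "last four characters would be repeated" behaviour concrete,
here is a small user-space model of a loop that clears the buffer once on
entry instead of on every iteration. It is a sketch under assumptions, not
driver code: mock_fifo and mock_fifo_out() are hypothetical stand-ins for
the TX kfifo and uart_fifo_out(), `out` stands in for the TX FIFO register,
and the loop is assumed to end by subtracting tx_bytes from remaining.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BYTES_PER_FIFO_WORD 4U

/* Hypothetical stand-in for the TX kfifo: a byte array with a read cursor. */
struct mock_fifo {
	const uint8_t *data;
	size_t len;
	size_t pos;
};

/* Stand-in for uart_fifo_out(): copies at most n bytes, returns the count. */
static unsigned int mock_fifo_out(struct mock_fifo *f, uint8_t *buf,
				  unsigned int n)
{
	size_t avail = f->len - f->pos;
	unsigned int got = n < avail ? n : (unsigned int)avail;

	if (got)
		memcpy(buf, f->data + f->pos, got);
	f->pos += got;
	return got;
}

/*
 * Sends `chunk` bytes as 4-byte words into `out`. `out` must have room for
 * `chunk` rounded up to a whole word.
 */
static void send_chunk(struct mock_fifo *f, uint8_t *out, unsigned int chunk)
{
	uint8_t buf[BYTES_PER_FIFO_WORD] = { 0 };	/* cleared once, on entry */
	unsigned int remaining = chunk;

	while (remaining) {
		unsigned int tx_bytes = remaining < BYTES_PER_FIFO_WORD ?
					remaining : BYTES_PER_FIFO_WORD;

		/* On underrun nothing is copied and buf keeps the previous word. */
		mock_fifo_out(f, buf, tx_bytes);
		memcpy(out, buf, BYTES_PER_FIFO_WORD);
		out += BYTES_PER_FIFO_WORD;
		remaining -= tx_bytes;
	}
}
```

If the mock FIFO holds only "abcd" but 12 bytes are requested, the output
is "abcdabcdabcd": the stale word is resent until the transfer completes.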
Hi,

On Tue, Jul 9, 2024 at 5:55 AM Johan Hovold <johan@kernel.org> wrote:
>
> On Tue, Jul 09, 2024 at 11:44:18AM +0200, Johan Hovold wrote:
> > On Mon, Jul 08, 2024 at 04:59:59PM -0700, Doug Anderson wrote:
> > > On Thu, Jul 4, 2024 at 3:19 AM Johan Hovold <johan+linaro@kernel.org> wrote:
>
> > > > @@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
> > > >  		memset(buf, 0, sizeof(buf));
> > > >  		tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
> > > >
> > > > -		tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
> > > > +		uart_fifo_out(uport, buf, tx_bytes);
> > >
> > > FWIW I would have rather we output something much more obviously wrong
> > > in this case instead of a NUL byte. Maybe we should fill it with "@"
> > > characters or something? As you said: the driver shouldn't get into
> > > this error condition so it shouldn't matter, but if we have a bug in
> > > the future I'd rather it be an obvious bug instead of a subtle bug.
> >
> > Yeah, I've been running with a patch like that locally in my tests, and
> > went a bit back and forth whether I should post it. My reasoning for not
> > doing so was that the bugs have been fixed so we don't need to spend
> > cycles on memsetting the buffer to anything but NUL (I used 'X' in my
> > testing).
> >
> > I guess that can be avoided by only padding the buffer if we ever hit an
> > underrun, but I still think it's questionable to spend the effort as
> > this is not something that should be needed. In any case, I didn't want
> > to spend time on it to fix the 6.10 regressions.
> >
> > Killing the machine is perhaps an effective way to get attention to an
> > issue, but I'd much rather have an occasional NUL character in the log
> > *if* this ever becomes an issue at all again.
> >
> > > I'm happy to post a patch or provide a Reviewed-by if you want to post
> > > a patch. Let me know.
> >
> > If you feel strongly about this, I can either fill the buffer with
> > something else than NUL or add error handling for any such future
> > hypothetical bugs. What do you prefer?
>
> Actually we just need to clear the buffer on entry, which would do away
> with the unnecessary memset() that is there today. This should also give
> you a printable indication that something is wrong in case a similar bug
> is ever reintroduced (e.g. the last four characters would be repeated
> until the transfer is complete instead of a fixed char like '@').
>
> Perhaps that's good enough as a compromise?

IMO initting 32-bits of data should be fine to do each time through the loop.

I've sent a patch:

https://lore.kernel.org/r/20240709162841.1.I93bf39f29d1887c46c74fbf8d4b937f6497cdfaa@changeid

-Doug
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index b2bbd2d79dbb..69a632fefc41 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -878,7 +878,7 @@ static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
 		memset(buf, 0, sizeof(buf));
 		tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
 
-		tx_bytes = uart_fifo_out(uport, buf, tx_bytes);
+		uart_fifo_out(uport, buf, tx_bytes);
 
 		iowrite32_rep(uport->membase + SE_GENI_TX_FIFOn, buf, 1);
The Qualcomm GENI serial driver did not handle buffer flushing and used
to print discarded characters when the circular buffer was cleared.
Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
this instead resulted in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.

The underlying bugs have now been fixed, but make sure to output NUL
characters instead of killing the machine if a similar driver bug is
ever reintroduced.

Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
---
 drivers/tty/serial/qcom_geni_serial.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
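
As a rough illustration of the failure mode described in the commit
message, the sketch below models the transmit loop in user space. It
assumes (this is not visible in the hunk) that each iteration ends by
subtracting tx_bytes from remaining. mock_fifo, mock_fifo_out(), the
trust_return flag and the max_iter cap are all hypothetical; the cap exists
only so the pre-patch behaviour can be observed without actually hanging.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BYTES_PER_FIFO_WORD 4U

/* Hypothetical stand-in for the TX kfifo: a byte array with a read cursor. */
struct mock_fifo {
	const uint8_t *data;
	size_t len;
	size_t pos;
};

/* Stand-in for uart_fifo_out(): copies at most n bytes, returns the count. */
static unsigned int mock_fifo_out(struct mock_fifo *f, uint8_t *buf,
				  unsigned int n)
{
	size_t avail = f->len - f->pos;
	unsigned int got = n < avail ? n : (unsigned int)avail;

	if (got)
		memcpy(buf, f->data + f->pos, got);
	f->pos += got;
	return got;
}

/*
 * Returns the number of loop iterations performed, capped at max_iter.
 * trust_return != 0 models the pre-patch code, which overwrote tx_bytes
 * with the number of bytes actually pulled from the FIFO.
 */
static unsigned int send_chunk(struct mock_fifo *f, unsigned int chunk,
			       int trust_return, unsigned int max_iter)
{
	unsigned int remaining = chunk, iterations = 0;
	uint8_t buf[BYTES_PER_FIFO_WORD];

	while (remaining && iterations < max_iter) {
		unsigned int tx_bytes, copied;

		memset(buf, 0, sizeof(buf));
		tx_bytes = remaining < BYTES_PER_FIFO_WORD ?
			   remaining : BYTES_PER_FIFO_WORD;
		copied = mock_fifo_out(f, buf, tx_bytes);
		if (trust_return)
			tx_bytes = copied;	/* 0 on an empty FIFO: no progress */

		/* The real driver writes buf to the TX FIFO register here. */
		remaining -= tx_bytes;
		iterations++;
	}
	return iterations;
}
```

With an empty mock FIFO and an 8-byte chunk, the pre-patch variant runs
until it hits the cap because remaining never shrinks, while the patched
variant finishes in two iterations and would have sent NUL padding.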