mbox series

[RFC,0/2] serial: sci: fix OOPS because of wrongly running hrtimer

Message ID 20240416123545.7098-4-wsa+renesas@sang-engineering.com
Headers show
Series serial: sci: fix OOPS because of wrongly running hrtimer | expand

Message

Wolfram Sang April 16, 2024, 12:35 p.m. UTC
Dirk sent a very interesting bug report[1]. This series is what I found
out by reviewing the driver. It is not tested yet because I couldn't
trigger the code path yet. The console still works normally with these
patches. Still, I am already curious in hearing your opinions, so here
is what I have...

[1] https://lore.kernel.org/r/ee6c9e16-9f29-450e-81da-4a8dceaa8fc7@de.bosch.com

Wolfram Sang (2):
  serial: sh-sci: start hrtimer after setting up DMA
  serial: sh-sci: always cancel hrtimer when DMA RX is invalidated

 drivers/tty/serial/sh-sci.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

Comments

Geert Uytterhoeven April 24, 2024, 9:48 a.m. UTC | #1
Hi Wolfram,

On Tue, Apr 16, 2024 at 2:35 PM Wolfram Sang
<wsa+renesas@sang-engineering.com> wrote:
> Clear the timer whenever 'chan_rx' is cleared to avoid an OOPS.
> Currently, the driver only runs the timer when 'chan_rx' is set before.
> However, it is good defensive programming to make sure the hrtimer is
> always stopped before clearing the 'chan_rx' pointer.
>
> Reported-by: Dirk Behme <dirk.behme@de.bosch.com>
> Closes: https://lore.kernel.org/r/ee6c9e16-9f29-450e-81da-4a8dceaa8fc7@de.bosch.com
> Fixes: 9ab765566086 ("serial: sh-sci: Remove timer on shutdown of port")
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

Thanks for your patch!

> Locking needs to be double-checked here. This patch is mainly calling
> for opinions.

I do think you need to cancel the timer: even when not restarting
the timer in sci_dma_rx_complete() due to a DMA failure, the previous
timer may still be running, and will cause a NULL pointer dereference
on s->chan_rx on timer expiry.

> --- a/drivers/tty/serial/sh-sci.c
> +++ b/drivers/tty/serial/sh-sci.c
> @@ -1262,6 +1262,7 @@ static void sci_dma_rx_chan_invalidate(struct sci_port *s)
>  {
>         unsigned int i;
>
> +       hrtimer_cancel(&s->rx_timer);

Is it safe to do this unconditionally on shutdown (cfr. the old check
for s->chan_rx_saved)?

>         s->chan_rx = NULL;
>         for (i = 0; i < ARRAY_SIZE(s->cookie_rx); i++)
>                 s->cookie_rx[i] = -EINVAL;
> @@ -2242,14 +2243,6 @@ static void sci_shutdown(struct uart_port *port)
>                        scr & (SCSCR_CKE1 | SCSCR_CKE0 | s->hscif_tot));
>         uart_port_unlock_irqrestore(port, flags);
>
> -#ifdef CONFIG_SERIAL_SH_SCI_DMA
> -       if (s->chan_rx_saved) {
> -               dev_dbg(port->dev, "%s(%d) deleting rx_timer\n", __func__,
> -                       port->line);
> -               hrtimer_cancel(&s->rx_timer);
> -       }
> -#endif
> -
>         if (s->rx_trigger > 1 && s->rx_fifo_timeout > 0)
>                 del_timer_sync(&s->rx_fifo_timer);
>         sci_free_irq(s);

Gr{oetje,eeting}s,

                        Geert
Wolfram Sang May 2, 2024, 6:17 p.m. UTC | #2
Hi Geert,

good news, I was able to trigger the DMA rx code path. I dunno what I
did wrong last time. I started from scratch again and it worked easily
by dd-ing random data to the second non-console debug port.

> I do think you need to cancel the timer: even when not restarting
> the timer in sci_dma_rx_complete() due to a DMA failure, the previous
> timer may still be running, and will cause a NULL pointer dereference
> on s->chan_rx on timer expiry.

Taking locking into account, I think this patch is bogus. If we run into
this NULL-pointer, we have a locking problem and cancelling the timer in
sci_dma_rx_chan_invalidate() is not going to fix the underlying locking
problem. sci_dma_rx_chan_invalidate() does not only clear the pointer
but also the cookie_rx-array. sci_dma_rx_timer_fn() bails out via
sci_dma_rx_find_active() if that array is cleared. It does so before
accessing the chan_rx-pointer. So, it looks to me that should work once
all calls to sci_dma_rx_chan_invalidate() are protected. And there is
one path where this is not true, namely via sci_dma_rx_release() during
shutdown. This is why I asked Dirk if the system was about to shutdown.
Currently, I don't see any other problematic code path.

> > -#ifdef CONFIG_SERIAL_SH_SCI_DMA
> > -       if (s->chan_rx_saved) {
> > -               dev_dbg(port->dev, "%s(%d) deleting rx_timer\n", __func__,
> > -                       port->line);
> > -               hrtimer_cancel(&s->rx_timer);
> > -       }
> > -#endif

Also, this chunk needs to stay. I suggested in patch 1 to cancel the
timer on successful dma_rx_complete, so the timer only runs when a DMA
is in progress. Then, of course, we need to cancel it in shutdown.

I hope I am not seeing "no wood for the trees" by now. I am not
convinced that I actually found Dirk's race condition yet...

All the best,

   Wolfram