Message ID | 20240404150034.41648-1-andriy.shevchenko@linux.intel.com |
---|---|
State | New |
Headers | show |
Series | [v1,1/1] serial: core: Clearing the circular buffer before NULLifying it | expand |
On 04. 04. 24, 16:59, Andy Shevchenko wrote: > The circular buffer is NULLified in uart_tty_port_shutdown() > under the spin lock. However, the PM or other timer based callbacks > may still trigger after this event without knowning that buffer pointer > is not valid. Since the serial code is a bit inconsistent in checking > the buffer state (some rely on the head-tail positions, some on the > buffer pointer), it's better to have both aligned, i.e. buffer pointer > to be NULL and head-tail possitions to be the same, meaning it's empty. > This will prevent asynchronous calls to dereference NULL pointer as > reported recently in 8250 case: > > BUG: kernel NULL pointer dereference, address: 00000cf5 > Workqueue: pm pm_runtime_work > EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > ... > ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > __start_tx (drivers/tty/serial/8250/8250_port.c:1551) > serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654) > serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63) > __rpm_callback (drivers/base/power/runtime.c:393) > ? serial_port_remove (drivers/tty/serial/serial_port.c:50) > rpm_suspend (drivers/base/power/runtime.c:447) Yeah, I noticed start_tx() is called repeatedly after shutdown() yesterday too. So thanks for looking into this. And it's pretty weird. I think it's new with the runtime PM (sure, /me reads Fixes: now). I am not sure if it is documented, but most of the code in tty/ assumes NO ordinary ->ops (like start_tx()) are called after shutdown(). Actually, to me it occurs like serial8250_start_tx() should not be called in the first place. It makes no sense after all. BTW cannot be x_char en/queued at that time too (the other check in the if)? But again, serial8250_start_tx() should not be called after shutdown(). > The proposed change will prevent ->start_tx() to be called during > suspend on shut down port. > > Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy") > Reported-by: kernel test robot <oliver.sang@intel.com> > Closes: https://lore.kernel.org/oe-lkp/202404031607.2e92eebe-lkp@intel.com > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > --- > > I have got into the very similar issue while working on max3100 driver. > I haven't checked the 8250 case, but for mine the culprit is the same > and this patch fixes it. Hence I assume it will fix the 8250 case as > well. > > drivers/tty/serial/serial_core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c > index a005fc06a077..ba3a674a8bbf 100644 > --- a/drivers/tty/serial/serial_core.c > +++ b/drivers/tty/serial/serial_core.c > @@ -1788,6 +1788,7 @@ static void uart_tty_port_shutdown(struct tty_port *port) > * Free the transmit buffer. > */ > uart_port_lock_irq(uport); > + uart_circ_clear(&state->xmit); > buf = state->xmit.buf; > state->xmit.buf = NULL; > uart_port_unlock_irq(uport);
* Jiri Slaby <jirislaby@kernel.org> [240405 05:25]:
> But again, serial8250_start_tx() should not be called after shutdown().
Sounds like we should add some check in addition to UPF_DEAD to
serial_port_runtime_suspend() and serial_port_runtime_resume() to
bail out early on shutdown.
Regards,
Tony
On Fri, Apr 05, 2024 at 07:25:03AM +0200, Jiri Slaby wrote: > On 04. 04. 24, 16:59, Andy Shevchenko wrote: > > The circular buffer is NULLified in uart_tty_port_shutdown() > > under the spin lock. However, the PM or other timer based callbacks > > may still trigger after this event without knowning that buffer pointer > > is not valid. Since the serial code is a bit inconsistent in checking > > the buffer state (some rely on the head-tail positions, some on the > > buffer pointer), it's better to have both aligned, i.e. buffer pointer > > to be NULL and head-tail possitions to be the same, meaning it's empty. > > This will prevent asynchronous calls to dereference NULL pointer as > > reported recently in 8250 case: > > > > BUG: kernel NULL pointer dereference, address: 00000cf5 > > Workqueue: pm pm_runtime_work > > EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > > ... > > ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > > __start_tx (drivers/tty/serial/8250/8250_port.c:1551) > > serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654) > > serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63) > > __rpm_callback (drivers/base/power/runtime.c:393) > > ? serial_port_remove (drivers/tty/serial/serial_port.c:50) > > rpm_suspend (drivers/base/power/runtime.c:447) > > Yeah, I noticed start_tx() is called repeatedly after shutdown() yesterday > too. So thanks for looking into this. > And it's pretty weird. I think it's new with the runtime PM (sure, /me reads > Fixes: now). I am not sure if it is documented, but most of the code in tty/ > assumes NO ordinary ->ops (like start_tx()) are called after shutdown(). > Actually, to me it occurs like serial8250_start_tx() should not be called in > the first place. It makes no sense after all. > > BTW cannot be x_char en/queued at that time too (the other check in the if)? > But again, serial8250_start_tx() should not be called after shutdown(). Yes, and I have no clue how we can check this as startup can be called again and so on. The PM callback is timer based AFAIU, meaning it may happen at any time. But do you agree that this patch has value on its own?
On Fri, Apr 05, 2024 at 06:17:54PM +0300, Andy Shevchenko wrote: > On Fri, Apr 05, 2024 at 07:25:03AM +0200, Jiri Slaby wrote: > > On 04. 04. 24, 16:59, Andy Shevchenko wrote: > > > The circular buffer is NULLified in uart_tty_port_shutdown() > > > under the spin lock. However, the PM or other timer based callbacks > > > may still trigger after this event without knowning that buffer pointer > > > is not valid. Since the serial code is a bit inconsistent in checking > > > the buffer state (some rely on the head-tail positions, some on the > > > buffer pointer), it's better to have both aligned, i.e. buffer pointer > > > to be NULL and head-tail possitions to be the same, meaning it's empty. > > > This will prevent asynchronous calls to dereference NULL pointer as > > > reported recently in 8250 case: > > > > > > BUG: kernel NULL pointer dereference, address: 00000cf5 > > > Workqueue: pm pm_runtime_work > > > EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > > > ... > > > ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > > > __start_tx (drivers/tty/serial/8250/8250_port.c:1551) > > > serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654) > > > serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63) > > > __rpm_callback (drivers/base/power/runtime.c:393) > > > ? serial_port_remove (drivers/tty/serial/serial_port.c:50) > > > rpm_suspend (drivers/base/power/runtime.c:447) > > > > Yeah, I noticed start_tx() is called repeatedly after shutdown() yesterday > > too. So thanks for looking into this. > > > And it's pretty weird. I think it's new with the runtime PM (sure, /me reads > > Fixes: now). I am not sure if it is documented, but most of the code in tty/ > > assumes NO ordinary ->ops (like start_tx()) are called after shutdown(). > > Actually, to me it occurs like serial8250_start_tx() should not be called in > > the first place. It makes no sense after all. > > > > BTW cannot be x_char en/queued at that time too (the other check in the if)? > > But again, serial8250_start_tx() should not be called after shutdown(). > > Yes, and I have no clue how we can check this as startup can be called again > and so on. The PM callback is timer based AFAIU, meaning it may happen at any > time. > > But do you agree that this patch has value on its own? FWIW, https://lore.kernel.org/all/0000000000009e2dd805ffc595a3@google.com/T/
* Andy Shevchenko <andriy.shevchenko@linux.intel.com> [240405 22:37]: > On Fri, Apr 05, 2024 at 06:17:54PM +0300, Andy Shevchenko wrote: > > On Fri, Apr 05, 2024 at 07:25:03AM +0200, Jiri Slaby wrote: > > > BTW cannot be x_char en/queued at that time too (the other check in the if)? > > > But again, serial8250_start_tx() should not be called after shutdown(). > > > > Yes, and I have no clue how we can check this as startup can be called again > > and so on. The PM callback is timer based AFAIU, meaning it may happen at any > > time. So below is an incomplete pseudo patch just showing where we could disable tx for runtime PM. The patch won't compile, and assumes we only disable tx for runtime PM. However, if we need it elsewhere also, then we may want to set up some UPF_TX_ENABLED type flag instead of serial_base_port specific calls. My preference would be to limit it to serial_port.c if we can get away with that. Anybody have better ideas for enabling and disabling tx? > > But do you agree that this patch has value on its own? > > FWIW, https://lore.kernel.org/all/0000000000009e2dd805ffc595a3@google.com/T/ No objections from me for clearing the xmit. But should it also be done for uart_shutdown() in addition to uart_tty_port_shutdown()? Regards, Tony 8< ----------------------- diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -345,16 +345,23 @@ static int uart_startup(struct tty_struct *tty, struct uart_state *state, bool init_hw) { struct tty_port *port = &state->port; + struct uart_port *uport; int retval; if (tty_port_initialized(port)) - return 0; + goto enable_tx; retval = uart_port_startup(tty, state, init_hw); - if (retval) + if (retval) { set_bit(TTY_IO_ERROR, &tty->flags); + return retval; + } - return retval; +enable_tx: + uport = uart_port_check(state); + serial_base_port_enable_tx(uport); + + return 0; } /* @@ -377,6 +384,9 @@ static void uart_shutdown(struct tty_struct *tty, struct uart_state *state) if (tty) set_bit(TTY_IO_ERROR, &tty->flags); + if (uport) + serial_base_port_disable_tx(uport); + if (tty_port_initialized(port)) { tty_port_set_initialized(port, false); @@ -1821,6 +1831,7 @@ static void uart_tty_port_shutdown(struct tty_port *port) uport->ops->stop_rx(uport); uart_port_unlock_irq(uport); + serial_base_port_disable_tx(uport); uart_port_shutdown(port); /*
Hi Andy, On 2024/4/4 22:59, Andy Shevchenko wrote: > The circular buffer is NULLified in uart_tty_port_shutdown() > under the spin lock. However, the PM or other timer based callbacks > may still trigger after this event without knowning that buffer pointer > is not valid. Since the serial code is a bit inconsistent in checking > the buffer state (some rely on the head-tail positions, some on the > buffer pointer), it's better to have both aligned, i.e. buffer pointer > to be NULL and head-tail possitions to be the same, meaning it's empty. > This will prevent asynchronous calls to dereference NULL pointer as > reported recently in 8250 case: > > BUG: kernel NULL pointer dereference, address: 00000cf5 > Workqueue: pm pm_runtime_work > EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > ... > ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > __start_tx (drivers/tty/serial/8250/8250_port.c:1551) > serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654) > serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63) > __rpm_callback (drivers/base/power/runtime.c:393) > ? serial_port_remove (drivers/tty/serial/serial_port.c:50) > rpm_suspend (drivers/base/power/runtime.c:447) > > The proposed change will prevent ->start_tx() to be called during > suspend on shut down port. > Just saw the issue and thanks for your timely fix. I didn't got a board with 8250 and sorry for didn't found this issue. FYI, I checked device_shutdown() and seems it called pm_runtime_barrier() for waiting all the scheduled RPM callbacks finished and keep the device in resume state. So ideally there shouldn't be any pending requests later since we handled them before shutdown? There's someone encountered the same issue in shutdown() due to runtime pm and fixed it in af8db1508f2c ("PM / driver core: disable device's runtime PM during shutdown") patch above seems to still have some problem and later fixed by: fe6b91f47080 ("PM / Driver core: leave runtime PM enabled during system shutdown") But seems the handling in the driver core doesn't cover the case here.. > Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy") > Reported-by: kernel test robot <oliver.sang@intel.com> > Closes: https://lore.kernel.org/oe-lkp/202404031607.2e92eebe-lkp@intel.com > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > --- > > I have got into the very similar issue while working on max3100 driver. > I haven't checked the 8250 case, but for mine the culprit is the same > and this patch fixes it. Hence I assume it will fix the 8250 case as > well. > > drivers/tty/serial/serial_core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c > index a005fc06a077..ba3a674a8bbf 100644 > --- a/drivers/tty/serial/serial_core.c > +++ b/drivers/tty/serial/serial_core.c > @@ -1788,6 +1788,7 @@ static void uart_tty_port_shutdown(struct tty_port *port) > * Free the transmit buffer. > */ > uart_port_lock_irq(uport); > + uart_circ_clear(&state->xmit); > buf = state->xmit.buf; > state->xmit.buf = NULL; > uart_port_unlock_irq(uport); >
On Sun, Apr 07, 2024 at 05:49:19PM +0800, Yicong Yang wrote: > On 2024/4/4 22:59, Andy Shevchenko wrote: > > The circular buffer is NULLified in uart_tty_port_shutdown() > > under the spin lock. However, the PM or other timer based callbacks > > may still trigger after this event without knowning that buffer pointer > > is not valid. Since the serial code is a bit inconsistent in checking > > the buffer state (some rely on the head-tail positions, some on the > > buffer pointer), it's better to have both aligned, i.e. buffer pointer > > to be NULL and head-tail possitions to be the same, meaning it's empty. > > This will prevent asynchronous calls to dereference NULL pointer as > > reported recently in 8250 case: > > > > BUG: kernel NULL pointer dereference, address: 00000cf5 > > Workqueue: pm pm_runtime_work > > EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > > ... > > ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > > __start_tx (drivers/tty/serial/8250/8250_port.c:1551) > > serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654) > > serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63) > > __rpm_callback (drivers/base/power/runtime.c:393) > > ? serial_port_remove (drivers/tty/serial/serial_port.c:50) > > rpm_suspend (drivers/base/power/runtime.c:447) > > > > The proposed change will prevent ->start_tx() to be called during > > suspend on shut down port. > > Just saw the issue and thanks for your timely fix. I didn't got a board with > 8250 and sorry for didn't found this issue. But does this change make no regression in your case? Can you test it? > FYI, I checked device_shutdown() and seems it called pm_runtime_barrier() for waiting all > the scheduled RPM callbacks finished and keep the device in resume state. So ideally there > shouldn't be any pending requests later since we handled them before shutdown? > > There's someone encountered the same issue in shutdown() due to runtime pm and fixed it in > af8db1508f2c ("PM / driver core: disable device's runtime PM during shutdown") > patch above seems to still have some problem and later fixed by: > fe6b91f47080 ("PM / Driver core: leave runtime PM enabled during system shutdown") Ah, yes, thanks for reminding (yeah, I saw those patches, let me test it on my setup. > But seems the handling in the driver core doesn't cover the case here.. Of course, since we have our own shutdown on the upper level, we don't kill the device, but we do release _some_ resources.
On Fri, Apr 05, 2024 at 07:25:03AM +0200, Jiri Slaby wrote: > On 04. 04. 24, 16:59, Andy Shevchenko wrote: > > The circular buffer is NULLified in uart_tty_port_shutdown() > > under the spin lock. However, the PM or other timer based callbacks > > may still trigger after this event without knowning that buffer pointer > > is not valid. Since the serial code is a bit inconsistent in checking > > the buffer state (some rely on the head-tail positions, some on the > > buffer pointer), it's better to have both aligned, i.e. buffer pointer > > to be NULL and head-tail possitions to be the same, meaning it's empty. > > This will prevent asynchronous calls to dereference NULL pointer as > > reported recently in 8250 case: > > > > BUG: kernel NULL pointer dereference, address: 00000cf5 > > Workqueue: pm pm_runtime_work > > EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > > ... > > ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) > > __start_tx (drivers/tty/serial/8250/8250_port.c:1551) > > serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654) > > serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63) > > __rpm_callback (drivers/base/power/runtime.c:393) > > ? serial_port_remove (drivers/tty/serial/serial_port.c:50) > > rpm_suspend (drivers/base/power/runtime.c:447) > > Yeah, I noticed start_tx() is called repeatedly after shutdown() yesterday > too. So thanks for looking into this. > > And it's pretty weird. I think it's new with the runtime PM (sure, /me reads > Fixes: now). I am not sure if it is documented, but most of the code in tty/ > assumes NO ordinary ->ops (like start_tx()) are called after shutdown(). > Actually, to me it occurs like serial8250_start_tx() should not be called in > the first place. It makes no sense after all. So, with PM autosuspend the [PM] callback can be called in to cases: - port is open and busy, but PM is not informed (yet) of it - port is closed while PM timer is still counting The Fixes seems about the first case (so we need to call Tx there). The second one probably can be fixed properly with PM barrier. This fix is just against the oops AFAIU the bug report and my own case.
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c index a005fc06a077..ba3a674a8bbf 100644 --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -1788,6 +1788,7 @@ static void uart_tty_port_shutdown(struct tty_port *port) * Free the transmit buffer. */ uart_port_lock_irq(uport); + uart_circ_clear(&state->xmit); buf = state->xmit.buf; state->xmit.buf = NULL; uart_port_unlock_irq(uport);
The circular buffer is NULLified in uart_tty_port_shutdown() under the spin lock. However, the PM or other timer based callbacks may still trigger after this event without knowning that buffer pointer is not valid. Since the serial code is a bit inconsistent in checking the buffer state (some rely on the head-tail positions, some on the buffer pointer), it's better to have both aligned, i.e. buffer pointer to be NULL and head-tail possitions to be the same, meaning it's empty. This will prevent asynchronous calls to dereference NULL pointer as reported recently in 8250 case: BUG: kernel NULL pointer dereference, address: 00000cf5 Workqueue: pm pm_runtime_work EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) ... ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) __start_tx (drivers/tty/serial/8250/8250_port.c:1551) serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654) serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63) __rpm_callback (drivers/base/power/runtime.c:393) ? serial_port_remove (drivers/tty/serial/serial_port.c:50) rpm_suspend (drivers/base/power/runtime.c:447) The proposed change will prevent ->start_tx() to be called during suspend on shut down port. Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202404031607.2e92eebe-lkp@intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> --- I have got into the very similar issue while working on max3100 driver. I haven't checked the 8250 case, but for mine the culprit is the same and this patch fixes it. Hence I assume it will fix the 8250 case as well. drivers/tty/serial/serial_core.c | 1 + 1 file changed, 1 insertion(+)