[v2] serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler

Message ID 20221216021504.457699-1-marex@denx.de
State Superseded

Commit Message

Marek Vasut Dec. 16, 2022, 2:15 a.m. UTC
Avoid locking in hard interrupt context. Move the entirety of the hard IRQ
context code into a single IRQ handler, which preempt-rt then moves into a
thread. This fixes the following splat with preempt-rt enabled:

 BUG: scheduling while atomic: (mount)/1289/0x00010001
 Modules linked in:
 Preemption disabled at:
 [<c0119127>] irq_enter_rcu+0xb/0x42
 CPU: 0 PID: 1289 Comm: (mount) Not tainted 6.1.0-rc7-rt5-stable-standard-00006-gd70aeccb9f0f #17
 Hardware name: STM32 (Device Tree Support)
  unwind_backtrace from show_stack+0xb/0xc
  show_stack from dump_stack_lvl+0x2b/0x34
  dump_stack_lvl from __schedule_bug+0x53/0x80
  __schedule_bug from __schedule+0x47/0x404
  __schedule from schedule_rtlock+0x15/0x34
  schedule_rtlock from rtlock_slowlock_locked+0x1d7/0x57e
  rtlock_slowlock_locked from rt_spin_lock+0x29/0x3c
  rt_spin_lock from stm32_usart_interrupt+0xa9/0x110
  stm32_usart_interrupt from __handle_irq_event_percpu+0x73/0x14e
  __handle_irq_event_percpu from handle_irq_event_percpu+0x9/0x22
  handle_irq_event_percpu from handle_irq_event+0x53/0x76
  handle_irq_event from handle_fasteoi_irq+0x65/0xa8
  handle_fasteoi_irq from handle_irq_desc+0xf/0x18
  handle_irq_desc from gic_handle_irq+0x45/0x54
  gic_handle_irq from generic_handle_arch_irq+0x19/0x2c
  generic_handle_arch_irq from call_with_stack+0xd/0x10

Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Erwan Le Ray <erwan.leray@foss.st.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Caron <valentin.caron@foss.st.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
To: linux-serial@vger.kernel.org
---
V2: - Update patch subject, was:
      serial: stm32: Move hard IRQ handling to threaded interrupt context
    - Use request_irq() instead, rename the IRQ handler function
---
 drivers/tty/serial/stm32-usart.c | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

Comments

Marek Vasut Dec. 16, 2022, 11:52 a.m. UTC | #1
On 12/16/22 09:00, Sebastian Andrzej Siewior wrote:
> On 2022-12-16 03:15:04 [+0100], Marek Vasut wrote:
>> Avoid locking in hard interrupt context. Move the entirety of the hard IRQ
>> context code into a single IRQ handler, which preempt-rt then moves into a
>> thread. This fixes the following splat with preempt-rt enabled:
>>
>>   BUG: scheduling while atomic: (mount)/1289/0x00010001
>>   Modules linked in:
>>   Preemption disabled at:
>>   [<c0119127>] irq_enter_rcu+0xb/0x42
>>   CPU: 0 PID: 1289 Comm: (mount) Not tainted 6.1.0-rc7-rt5-stable-standard-00006-gd70aeccb9f0f #17
>>   Hardware name: STM32 (Device Tree Support)
>>    unwind_backtrace from show_stack+0xb/0xc
>>    show_stack from dump_stack_lvl+0x2b/0x34
>>    dump_stack_lvl from __schedule_bug+0x53/0x80
>>    __schedule_bug from __schedule+0x47/0x404
>>    __schedule from schedule_rtlock+0x15/0x34
>>    schedule_rtlock from rtlock_slowlock_locked+0x1d7/0x57e
>>    rtlock_slowlock_locked from rt_spin_lock+0x29/0x3c
>>    rt_spin_lock from stm32_usart_interrupt+0xa9/0x110
>>    stm32_usart_interrupt from __handle_irq_event_percpu+0x73/0x14e
>>    __handle_irq_event_percpu from handle_irq_event_percpu+0x9/0x22
>>    handle_irq_event_percpu from handle_irq_event+0x53/0x76
>>    handle_irq_event from handle_fasteoi_irq+0x65/0xa8
>>    handle_fasteoi_irq from handle_irq_desc+0xf/0x18
>>    handle_irq_desc from gic_handle_irq+0x45/0x54
>>    gic_handle_irq from generic_handle_arch_irq+0x19/0x2c
>>    generic_handle_arch_irq from call_with_stack+0xd/0x10
> 
> Could this be replaced maybe with a proper description instead of
> slapping the backtrace into the patch description?

Sure. I'm not confident about the preempt-rt parts, so thanks for the commit
message update; I'll include it in V3.

>    Requesting an interrupt with IRQF_ONESHOT will run the primary handler
>    in the hard-IRQ context even in the force-threaded mode. The
>    force-threaded mode is used by PREEMPT_RT in order to avoid acquiring
>    sleeping locks (spinlock_t) in hard-IRQ context. This combination
>    makes avoiding them impossible and leads to "sleeping while atomic"
>    warnings.
> 
>    Use one interrupt handler for both handlers (primary and secondary)
>    and drop the IRQF_ONESHOT flag which is not needed.
> 
>    Fixes: e359b4411c283 ("serial: stm32: fix threaded interrupt handling")
> 
> As for your change, this should work.
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Thanks for all the help with this.

> Should this DMA-mode need to be outsourced (due to $REASON) you can
> request two handlers but then you need to avoid IRQF_ONESHOT and the
> primary handler needs to disable the interrupt source in the UART
> hardware.
> 
> Also it might be worth checking if the DMA mode makes any sense if the
> FIFO is so small.

If you want to push a lot of data through the UART without refilling the 
small FIFO all the time and getting a lot of IRQs, that's where the DMA 
comes in. Maybe I misunderstood this comment ?
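
For illustration only, here is a toy userspace model (not kernel code) of the
split-handler alternative outlined in the quoted text above: the primary
handler masks the interrupt at the UART and wakes a thread, and the threaded
handler does the work and unmasks it again. All names here (model_uart,
model_primary, model_thread, model_line_asserted) are hypothetical. A real
driver would register both handlers with request_threaded_irq() and would
write the device's actual interrupt-enable register.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's irqreturn_t values. */
enum model_irqreturn {
	MODEL_IRQ_NONE,
	MODEL_IRQ_HANDLED,
	MODEL_IRQ_WAKE_THREAD,
};

/* Hypothetical device state: one enable bit, one pending flag. */
struct model_uart {
	bool irq_enabled;	/* models an interrupt-enable bit in the UART */
	bool irq_pending;	/* models a status flag such as "RX timeout" */
	unsigned int handled;	/* events serviced by the threaded handler */
};

/* The interrupt line is asserted only while enabled and pending. */
static bool model_line_asserted(const struct model_uart *u)
{
	return u->irq_enabled && u->irq_pending;
}

/*
 * Primary handler: takes no locks. It only masks the source in the
 * device, so the line drops even though the core is not asked to keep
 * it masked (no IRQF_ONESHOT), and then hands off to the thread.
 */
static enum model_irqreturn model_primary(struct model_uart *u)
{
	if (!model_line_asserted(u))
		return MODEL_IRQ_NONE;

	u->irq_enabled = false;
	return MODEL_IRQ_WAKE_THREAD;
}

/*
 * Threaded handler: free to take sleeping locks. It services the
 * event, then re-enables the source in the device.
 */
static enum model_irqreturn model_thread(struct model_uart *u)
{
	assert(!u->irq_enabled);	/* primary handler masked it */

	u->irq_pending = false;
	u->handled++;
	u->irq_enabled = true;
	return MODEL_IRQ_HANDLED;
}
```

In this model the line stays deasserted between model_primary() and
model_thread(), which is the property that makes IRQF_ONESHOT unnecessary in
the split-handler scheme.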

Sebastian Andrzej Siewior Dec. 16, 2022, 4:40 p.m. UTC | #2
On 2022-12-16 12:52:43 [+0100], Marek Vasut wrote:
> > Also it might be worth checking if the DMA mode makes any sense if the
> > FIFO is so small.
> 
> If you want to push a lot of data through the UART without refilling the
> small FIFO all the time and getting a lot of IRQs, that's where the DMA
> comes in. Maybe I misunderstood this comment ?

I have no idea how this works in detail. However: if you have a FIFO which is
16 bytes deep then filling it means 16 writes and so one interrupt every
16 bytes. If the DMA engine is the "average slave dma support" then it
gets programmed to fill 16 bytes and then issues an interrupt again. The
"lucky" case is if you can program say 512 bytes (or so) and the DMA
engine itself is able to fill the FIFO 32 times without involving the
CPU. The last case is a clear win.

If you have the 16-byte DMA case then you would have to check what is
cheaper: programming the DMA engine for 16 bytes or stuffing the data
directly into the FIFO.
If the DMA engine supports the larger case, say 512 bytes, and refills the
FIFO on its own, then using it makes sense. However, this usually makes
sense only for larger transfers. Having a console on it usually leads to more
overhead because you usually receive, say, two bytes a second (unless you
are a fast typist). Sending depends on the use case and the peak is
usually during boot. People doing BT via UART usually want to use DMA
because of the insane amount of data that is pumped.

Sebastian

Marek Vasut Dec. 17, 2022, 2:28 a.m. UTC | #3
On 12/16/22 17:40, Sebastian Andrzej Siewior wrote:
> On 2022-12-16 12:52:43 [+0100], Marek Vasut wrote:
>>> Also it might be worth checking if the DMA mode makes any sense if the
>>> FIFO is so small.
>>
>> If you want to push a lot of data through the UART without refilling the
>> small FIFO all the time and getting a lot of IRQs, that's where the DMA
>> comes in. Maybe I misunderstood this comment ?
> 
> I have no idea how this works in detail. However: if you have a FIFO which is
> 16 bytes deep then filling it means 16 writes and so one interrupt every
> 16 bytes. If the DMA engine is the "average slave dma support" then it
> gets programmed to fill 16 bytes and then issues an interrupt again. The
> "lucky" case is if you can program say 512 bytes (or so) and the DMA
> engine itself is able to fill the FIFO 32 times without involving the
> CPU. The last case is a clear win.

Ah, no, I think on the STM32 the DMA should be capable of streaming an
arbitrary amount of data from DRAM to the UART FIFO if needed.

> If you have the 16-byte DMA case then you would have to check what is
> cheaper: programming the DMA engine for 16 bytes or stuffing the data
> directly into the FIFO.
> If the DMA engine supports the larger case, say 512 bytes, and refills the
> FIFO on its own, then using it makes sense. However, this usually makes
> sense only for larger transfers. Having a console on it usually leads to more
> overhead because you usually receive, say, two bytes a second (unless you
> are a fast typist). Sending depends on the use case and the peak is
> usually during boot. People doing BT via UART usually want to use DMA
> because of the insane amount of data that is pumped.

Yes, I think we are in agreement here. The terse version was just a bit 
too terse. Thanks
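
The interrupt-count argument from the discussion above can be put into
numbers. The following is a minimal sketch, assuming that each interrupt
services exactly one "chunk" (one 16-byte FIFO fill, or one DMA transfer of
the programmed length). The helper name irqs_for_transfer() is made up for
this illustration, and the 16 and 512 byte figures are the example values used
in the thread, not measured STM32 parameters.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of interrupts needed to move `total` bytes when every
 * interrupt services at most `chunk` bytes (ceiling division).
 * Hypothetical helper for illustration only.
 */
static size_t irqs_for_transfer(size_t total, size_t chunk)
{
	assert(chunk > 0);

	return total / chunk + (total % chunk != 0);
}
```

With a 16-byte FIFO filled by the CPU, irqs_for_transfer(4096, 16) gives 256
interrupts for a 4096-byte transfer. A DMA engine that can only be programmed
16 bytes at a time yields the same count, whereas one programmed with 512
bytes needs irqs_for_transfer(4096, 512), that is 8 interrupts, and refills
the FIFO 512 / 16 = 32 times per interrupt without involving the CPU.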

Patch

diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index dfdbcf092facc..bbbab8dc2bfa9 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -752,8 +752,9 @@  static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
 	struct tty_port *tport = &port->state->port;
 	struct stm32_port *stm32_port = to_stm32_port(port);
 	const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
-	u32 sr;
+	unsigned long flags;
 	unsigned int size;
+	u32 sr;
 
 	sr = readl_relaxed(port->membase + ofs->isr);
 
@@ -793,27 +794,13 @@  static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
 	}
 
 	if ((sr & USART_SR_TXE) && !(stm32_port->tx_ch)) {
-		spin_lock(&port->lock);
+		spin_lock_irqsave(&port->lock, flags);
 		stm32_usart_transmit_chars(port);
-		spin_unlock(&port->lock);
+		spin_unlock_irqrestore(&port->lock, flags);
 	}
 
-	if (stm32_usart_rx_dma_enabled(port))
-		return IRQ_WAKE_THREAD;
-	else
-		return IRQ_HANDLED;
-}
-
-static irqreturn_t stm32_usart_threaded_interrupt(int irq, void *ptr)
-{
-	struct uart_port *port = ptr;
-	struct tty_port *tport = &port->state->port;
-	struct stm32_port *stm32_port = to_stm32_port(port);
-	unsigned int size;
-	unsigned long flags;
-
 	/* Receiver timeout irq for DMA RX */
-	if (!stm32_port->throttled) {
+	if (stm32_usart_rx_dma_enabled(port) && !stm32_port->throttled) {
 		spin_lock_irqsave(&port->lock, flags);
 		size = stm32_usart_receive_chars(port, false);
 		uart_unlock_and_check_sysrq_irqrestore(port, flags);
@@ -1016,10 +1003,8 @@  static int stm32_usart_startup(struct uart_port *port)
 	u32 val;
 	int ret;
 
-	ret = request_threaded_irq(port->irq, stm32_usart_interrupt,
-				   stm32_usart_threaded_interrupt,
-				   IRQF_ONESHOT | IRQF_NO_SUSPEND,
-				   name, port);
+	ret = request_irq(port->irq, stm32_usart_interrupt,
+			  IRQF_NO_SUSPEND, name, port);
 	if (ret)
 		return ret;