[net,v2,0/2] virtio-net: suppress bad irq warning for tx napi

Message ID 20210220014436.3556492-1-weiwan@google.com

Message

Wei Wang Feb. 20, 2021, 1:44 a.m. UTC
With the implementation of napi-tx in virtio driver, we clean tx
descriptors from rx napi handler, for the purpose of reducing tx
complete interrupts. But this could introduce a race where tx complete
interrupt has been raised, but the handler found there is no work to do
because we have done the work in the previous rx interrupt handler.
This could lead to the following warning msg:
[ 3588.010778] irq 38: nobody cared (try booting with the "irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942]  <IRQ>
[ 3588.017951]  dump_stack+0x63/0x85
[ 3588.017953]  __report_bad_irq+0x35/0xc0
[ 3588.017955]  note_interrupt+0x24b/0x2a0
[ 3588.017956]  handle_irq_event_percpu+0x54/0x80
[ 3588.017957]  handle_irq_event+0x3b/0x60
[ 3588.017958]  handle_edge_irq+0x83/0x1a0
[ 3588.017961]  handle_irq+0x20/0x30
[ 3588.017964]  do_IRQ+0x50/0xe0
[ 3588.017966]  common_interrupt+0xf/0xf
[ 3588.017966]  </IRQ>
[ 3588.017989] handlers:
[ 3588.020374] [<000000001b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38
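
To illustrate how the race described above can end in "nobody cared", here is a
minimal user-space model of the kernel's spurious-interrupt accounting. It is a
sketch under assumptions, not kernel code: the names irq_model,
model_note_interrupt and simulate_tx_irqs are made up for illustration, the two
thresholds are the ones kernel/irq/spurious.c uses as far as I know, and the
kernel's timing heuristics are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the window and limit used by note_interrupt(). */
#define IRQ_WINDOW      100000  /* interrupts per evaluation window */
#define UNHANDLED_LIMIT 99900   /* more unhandled than this disables the line */

struct irq_model {
    unsigned int count;      /* interrupts seen in the current window */
    unsigned int unhandled;  /* interrupts for which every handler said IRQ_NONE */
    bool disabled;           /* set once the line would be shut off */
};

/* Record one interrupt; 'handled' is false when no handler claimed it. */
static void model_note_interrupt(struct irq_model *m, bool handled)
{
    if (m->disabled)
        return;
    if (!handled)
        m->unhandled++;
    if (++m->count < IRQ_WINDOW)
        return;
    /* End of window: almost nothing was handled, so report and disable. */
    if (m->unhandled > UNHANDLED_LIMIT)
        m->disabled = true;
    m->count = 0;
    m->unhandled = 0;
}

/*
 * Deliver 'total' tx-completion interrupts. Only every 'work_every'-th one
 * finds work; for the rest, the rx path is assumed to have cleaned the tx
 * ring already. Returns true if the line ends up disabled.
 */
static bool simulate_tx_irqs(unsigned int total, unsigned int work_every)
{
    struct irq_model m = { 0, 0, false };

    assert(work_every > 0);
    for (unsigned int i = 1; i <= total; i++)
        model_note_interrupt(&m, i % work_every == 0);
    return m.disabled;
}
```

In this model, simulate_tx_irqs(100000, 2) leaves the line enabled, while
simulate_tx_irqs(100000, 100000) disables it, which corresponds to the
"Disabling IRQ #38" line in the log.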

This patch series contains 2 patches. The first one adds a new param to
struct vring_virtqueue to control if we want to suppress the bad irq
warning. And the second patch in virtio-net sets it for tx virtqueues if
napi-tx is enabled.
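
The idea in the paragraph above can be sketched in user space roughly as
follows. This is not the code from the patches: vq_model, its
suppress_spurious_warning field, model_vring_interrupt and the MODEL_IRQ_*
values are hypothetical stand-ins chosen for illustration, and the real
interrupt handler does more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values. */
enum irqreturn_model { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

struct vq_model {
    unsigned int used_pending;       /* completed buffers not yet reclaimed */
    bool suppress_spurious_warning;  /* hypothetical per-queue flag */
};

/* Model of the virtqueue interrupt handler with the extra flag. */
static enum irqreturn_model model_vring_interrupt(struct vq_model *vq)
{
    assert(vq);

    if (vq->used_pending == 0) {
        /*
         * No work: the rx napi handler probably cleaned the tx ring first.
         * With the flag set, claim the interrupt anyway so it is not
         * counted as unhandled.
         */
        return vq->suppress_spurious_warning ? MODEL_IRQ_HANDLED
                                             : MODEL_IRQ_NONE;
    }

    vq->used_pending = 0;  /* stand-in for running the queue callback */
    return MODEL_IRQ_HANDLED;
}
```

In this sketch a driver would set the flag only on tx queues when napi-tx is
enabled, so queues without that race would still report interrupts that find
no work.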

Wei Wang (2):
  virtio: add a new parameter in struct virtqueue
  virtio-net: suppress bad irq warning for tx napi

 drivers/net/virtio_net.c     | 19 ++++++++++++++-----
 drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
 include/linux/virtio.h       |  2 ++
 3 files changed, 32 insertions(+), 5 deletions(-)

Comments

Michael S. Tsirkin Feb. 23, 2021, 2:25 p.m. UTC | #1
On Fri, Feb 19, 2021 at 05:44:34PM -0800, Wei Wang wrote:
> With the implementation of napi-tx in virtio driver, we clean tx
> descriptors from rx napi handler, for the purpose of reducing tx
> complete interrupts. But this could introduce a race where tx complete
> interrupt has been raised, but the handler found there is no work to do
> because we have done the work in the previous rx interrupt handler.
> This could lead to the following warning msg:
> [ 3588.010778] irq 38: nobody cared (try booting with the
> "irqpoll" option)
> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> 5.3.0-19-generic #20~18.04.2-Ubuntu
> [ 3588.017940] Call Trace:
> [ 3588.017942]  <IRQ>
> [ 3588.017951]  dump_stack+0x63/0x85
> [ 3588.017953]  __report_bad_irq+0x35/0xc0
> [ 3588.017955]  note_interrupt+0x24b/0x2a0
> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> [ 3588.017957]  handle_irq_event+0x3b/0x60
> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> [ 3588.017961]  handle_irq+0x20/0x30
> [ 3588.017964]  do_IRQ+0x50/0xe0
> [ 3588.017966]  common_interrupt+0xf/0xf
> [ 3588.017966]  </IRQ>
> [ 3588.017989] handlers:
> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> [ 3588.025099] Disabling IRQ #38
>
> This patch series contains 2 patches. The first one adds a new param to
> struct vring_virtqueue to control if we want to suppress the bad irq
> warning. And the second patch in virtio-net sets it for tx virtqueues if
> napi-tx is enabled.


I'm going to be busy until March, I think there should be a better
way to fix this though. Will think about it and respond in about a week.


> Wei Wang (2):
>   virtio: add a new parameter in struct virtqueue
>   virtio-net: suppress bad irq warning for tx napi
>
>  drivers/net/virtio_net.c     | 19 ++++++++++++++-----
>  drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
>  include/linux/virtio.h       |  2 ++
>  3 files changed, 32 insertions(+), 5 deletions(-)
>
> -- 
> 2.30.0.617.g56c4b15f3c-goog

Wei Wang Feb. 23, 2021, 5:37 p.m. UTC | #2
On Tue, Feb 23, 2021 at 6:26 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Feb 19, 2021 at 05:44:34PM -0800, Wei Wang wrote:
> > With the implementation of napi-tx in virtio driver, we clean tx
> > descriptors from rx napi handler, for the purpose of reducing tx
> > complete interrupts. But this could introduce a race where tx complete
> > interrupt has been raised, but the handler found there is no work to do
> > because we have done the work in the previous rx interrupt handler.
> > This could lead to the following warning msg:
> > [ 3588.010778] irq 38: nobody cared (try booting with the
> > "irqpoll" option)
> > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > [ 3588.017940] Call Trace:
> > [ 3588.017942]  <IRQ>
> > [ 3588.017951]  dump_stack+0x63/0x85
> > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > [ 3588.017961]  handle_irq+0x20/0x30
> > [ 3588.017964]  do_IRQ+0x50/0xe0
> > [ 3588.017966]  common_interrupt+0xf/0xf
> > [ 3588.017966]  </IRQ>
> > [ 3588.017989] handlers:
> > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > [ 3588.025099] Disabling IRQ #38
> >
> > This patch series contains 2 patches. The first one adds a new param to
> > struct vring_virtqueue to control if we want to suppress the bad irq
> > warning. And the second patch in virtio-net sets it for tx virtqueues if
> > napi-tx is enabled.
>
> I'm going to be busy until March, I think there should be a better
> way to fix this though. Will think about it and respond in about a week.
>

OK. Thanks.

>
> > Wei Wang (2):
> >   virtio: add a new parameter in struct virtqueue
> >   virtio-net: suppress bad irq warning for tx napi
> >
> >  drivers/net/virtio_net.c     | 19 ++++++++++++++-----
> >  drivers/virtio/virtio_ring.c | 16 ++++++++++++++++
> >  include/linux/virtio.h       |  2 ++
> >  3 files changed, 32 insertions(+), 5 deletions(-)
> >
> > --
> > 2.30.0.617.g56c4b15f3c-goog
>

Willem de Bruijn Feb. 23, 2021, 7:13 p.m. UTC | #3
On Tue, Feb 23, 2021 at 12:37 PM Wei Wang <weiwan@google.com> wrote:
>
> On Tue, Feb 23, 2021 at 6:26 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Feb 19, 2021 at 05:44:34PM -0800, Wei Wang wrote:
> > > With the implementation of napi-tx in virtio driver, we clean tx
> > > descriptors from rx napi handler, for the purpose of reducing tx
> > > complete interrupts. But this could introduce a race where tx complete
> > > interrupt has been raised, but the handler found there is no work to do
> > > because we have done the work in the previous rx interrupt handler.
> > > This could lead to the following warning msg:
> > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > "irqpoll" option)
> > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > [ 3588.017940] Call Trace:
> > > [ 3588.017942]  <IRQ>
> > > [ 3588.017951]  dump_stack+0x63/0x85
> > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > [ 3588.017961]  handle_irq+0x20/0x30
> > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > [ 3588.017966]  </IRQ>
> > > [ 3588.017989] handlers:
> > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > [ 3588.025099] Disabling IRQ #38
> > >
> > > This patch series contains 2 patches. The first one adds a new param to
> > > struct vring_virtqueue to control if we want to suppress the bad irq
> > > warning. And the second patch in virtio-net sets it for tx virtqueues if
> > > napi-tx is enabled.
> >
> > I'm going to be busy until March, I think there should be a better
> > way to fix this though. Will think about it and respond in about a week.
> >
> OK. Thanks.


Yes, thanks for helping to think about a solution.

The warning went unreported for years. I'm a bit hesitant to make
actual datapath changes to suppress it, if those may actually have a
higher risk of regressions for some workloads.

Unless they actually show a clear improvement, which may very well be
possible given the high spurious interrupt rate.

But the odd thing is that by virtue of the interrupt getting masked
once the warning hits, it may actually be difficult to improve on the
efficiency today.

As you pointed out, just probabilistically throttling how often to
steal work from the rx interrupt handler would be another low risk
approach to reduce the incidence rate.
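
A rough user-space sketch of that throttling idea, purely for illustration:
rx_should_clean_tx, count_tx_cleans and the one_in parameter are hypothetical
names, and the xorshift32 generator is only there to keep the example
deterministic. It is not what a driver would necessarily use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift32) so the sketch is reproducible. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    assert(x != 0);  /* a zero state would stay zero forever */
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Hypothetical policy: the rx poll cleans the tx ring on roughly one call
 * in 'one_in', and leaves the rest to the tx interrupt.
 */
static bool rx_should_clean_tx(uint32_t *rng_state, uint32_t one_in)
{
    assert(one_in > 0);
    return xorshift32(rng_state) % one_in == 0;
}

/* Count how many of 'polls' rx polls would also clean the tx ring. */
static unsigned int count_tx_cleans(unsigned int polls, uint32_t one_in,
                                    uint32_t seed)
{
    uint32_t state = seed;
    unsigned int cleans = 0;

    for (unsigned int i = 0; i < polls; i++)
        if (rx_should_clean_tx(&state, one_in))
            cleans++;
    return cleans;
}
```

With one_in set to 1 every rx poll cleans the tx ring, which matches today's
behaviour as described in this thread. Larger values leave more completions
for the tx interrupt to find, so fewer tx interrupts would come back with no
work, at the cost of more tx interrupts doing real work.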

Anyway, definitely no rush. This went unreported for a long time.