[v3,0/8] xen/events: bug fixes and some diagnostic aids

Message ID 20210219154030.10892-1-jgross@suse.com

Message

Jürgen Groß Feb. 19, 2021, 3:40 p.m. UTC
The first four patches are fixes for XSA-332. They avoid WARN splats
and a performance issue with interdomain events.

Patches 5 and 6 are some additions to event handling in order to add
some per pv-device statistics to sysfs and the ability to have a per
backend device spurious event delay control.

Patches 7 and 8 are minor fixes I had lying around.

Juergen Gross (8):
  xen/events: reset affinity of 2-level event when tearing it down
  xen/events: don't unmask an event channel when an eoi is pending
  xen/events: avoid handling the same event on two cpus at the same time
  xen/netback: fix spurious event detection for common event case
  xen/events: link interdomain events to associated xenbus device
  xen/events: add per-xenbus device event statistics and settings
  xen/evtchn: use smp barriers for user event ring
  xen/evtchn: use READ/WRITE_ONCE() for accessing ring indices

 .../ABI/testing/sysfs-devices-xenbus          |  41 ++++
 drivers/block/xen-blkback/xenbus.c            |   2 +-
 drivers/net/xen-netback/interface.c           |  24 ++-
 drivers/xen/events/events_2l.c                |  22 +-
 drivers/xen/events/events_base.c              | 199 +++++++++++++-----
 drivers/xen/events/events_fifo.c              |   7 -
 drivers/xen/events/events_internal.h          |  14 +-
 drivers/xen/evtchn.c                          |  29 ++-
 drivers/xen/pvcalls-back.c                    |   4 +-
 drivers/xen/xen-pciback/xenbus.c              |   2 +-
 drivers/xen/xen-scsiback.c                    |   2 +-
 drivers/xen/xenbus/xenbus_probe.c             |  66 ++++++
 include/xen/events.h                          |   7 +-
 include/xen/xenbus.h                          |   7 +
 14 files changed, 327 insertions(+), 99 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-devices-xenbus

Comments

Julien Grall Feb. 20, 2021, 12:10 p.m. UTC | #1
Hi Juergen,

On 19/02/2021 15:40, Juergen Gross wrote:
> An event channel should be kept masked when an eoi is pending for it.
> When being migrated to another cpu it might be unmasked, though.
>
> In order to avoid this keep three different flags for each event channel
> to be able to distinguish "normal" masking/unmasking from eoi related
> masking/unmasking and temporary masking. The event channel should only
> be able to generate an interrupt if all flags are cleared.
>
> Cc: stable@vger.kernel.org
> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
> Reported-by: Julien Grall <julien@xen.org>
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Julien Grall <jgrall@amazon.com>


Cheers,

-- 
Julien Grall
Ross Lagerwall Feb. 23, 2021, 9:26 a.m. UTC | #2
On 2021-02-19 15:40, Juergen Gross wrote:
> An event channel should be kept masked when an eoi is pending for it.
> When being migrated to another cpu it might be unmasked, though.
> 
> In order to avoid this keep three different flags for each event channel
> to be able to distinguish "normal" masking/unmasking from eoi related
> masking/unmasking and temporary masking. The event channel should only
> be able to generate an interrupt if all flags are cleared.
> 
> Cc: stable@vger.kernel.org
> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
> Reported-by: Julien Grall <julien@xen.org>
> Signed-off-by: Juergen Gross <jgross@suse.com>

I tested this patch series backported to a 4.19 kernel and found that
when doing a reboot loop of Windows with PV drivers, occasionally it will
end up in a state with some event channels pending and masked in dom0
which breaks networking in the guest.

The issue seems to have been introduced with this patch, though at first
glance it appears correct. I haven't yet looked into why it is happening.
Have you seen anything like this with this patch?

Thanks,
Ross
Jürgen Groß March 5, 2021, 10:53 a.m. UTC | #3
On 23.02.21 10:26, Ross Lagerwall wrote:
> On 2021-02-19 15:40, Juergen Gross wrote:
>> An event channel should be kept masked when an eoi is pending for it.
>> When being migrated to another cpu it might be unmasked, though.
>>
>> In order to avoid this keep three different flags for each event channel
>> to be able to distinguish "normal" masking/unmasking from eoi related
>> masking/unmasking and temporary masking. The event channel should only
>> be able to generate an interrupt if all flags are cleared.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
>> Reported-by: Julien Grall <julien@xen.org>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>
> I tested this patch series backported to a 4.19 kernel and found that
> when doing a reboot loop of Windows with PV drivers, occasionally it will
> end up in a state with some event channels pending and masked in dom0
> which breaks networking in the guest.
>
> The issue seems to have been introduced with this patch, though at first
> glance it appears correct. I haven't yet looked into why it is happening.
> Have you seen anything like this with this patch?

Sorry it took so long, but now I was able to look into this issue.

I have managed to reproduce it with a pv Linux guest. I'm now adding
some debug code to understand what is happening there.


Juergen
Jürgen Groß March 6, 2021, 4:18 p.m. UTC | #4
On 23.02.21 10:26, Ross Lagerwall wrote:
> On 2021-02-19 15:40, Juergen Gross wrote:
>> An event channel should be kept masked when an eoi is pending for it.
>> When being migrated to another cpu it might be unmasked, though.
>>
>> In order to avoid this keep three different flags for each event channel
>> to be able to distinguish "normal" masking/unmasking from eoi related
>> masking/unmasking and temporary masking. The event channel should only
>> be able to generate an interrupt if all flags are cleared.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
>> Reported-by: Julien Grall <julien@xen.org>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>
> I tested this patch series backported to a 4.19 kernel and found that
> when doing a reboot loop of Windows with PV drivers, occasionally it will
> end up in a state with some event channels pending and masked in dom0
> which breaks networking in the guest.
>
> The issue seems to have been introduced with this patch, though at first
> glance it appears correct. I haven't yet looked into why it is happening.
> Have you seen anything like this with this patch?

I have found the issue. lateeoi_mask_ack_dynirq() must not set the "eoi"
mask reason flag: this callback is invoked in cases where the handler
will not run afterwards, so there will never be a call of
xen_irq_lateeoi() to unmask the event channel again.

Juergen
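
For readers following the thread, the mask reason scheme and the failure
mode described above can be modelled in a few lines of user-space C. This
is only a sketch under stated assumptions, not the kernel code: the flag
names, struct chan, chan_mask() and chan_unmask() are hypothetical and
exist only for this illustration. The one rule taken from the patch
description is that the channel may be unmasked only when all reason flags
are cleared.

```c
#include <stdbool.h>

/* Illustrative flag names (assumed); one bit per reason for masking. */
enum mask_reason {
	REASON_EXPLICIT    = 1 << 0, /* "normal" mask/unmask */
	REASON_TEMPORARY   = 1 << 1, /* short-lived mask, e.g. cpu migration */
	REASON_EOI_PENDING = 1 << 2, /* waiting for the handler's late EOI */
};

/* Hypothetical stand-in for the per-channel state. */
struct chan {
	unsigned int reasons; /* set of enum mask_reason bits */
	bool masked;          /* models the channel's mask bit */
};

static void chan_mask(struct chan *c, unsigned int reason)
{
	c->reasons |= reason;
	c->masked = true;
}

static void chan_unmask(struct chan *c, unsigned int reason)
{
	c->reasons &= ~reason;
	/* Only unmask when no reason at all is left. */
	if (c->reasons == 0)
		c->masked = false;
}

/* Migration while an EOI is pending: is the channel still masked after? */
static bool masked_after_migration_with_eoi_pending(void)
{
	struct chan c = { 0, false };

	chan_mask(&c, REASON_EOI_PENDING); /* event delivered, EOI outstanding */
	chan_mask(&c, REASON_TEMPORARY);   /* migration starts */
	chan_unmask(&c, REASON_TEMPORARY); /* migration ends */
	return c.masked;
}

/*
 * mask_ack path where the handler never runs, so no late EOI follows.
 * If the EOI reason was set here, nothing ever clears it.
 */
static bool stuck_after_mask_ack(bool set_eoi_reason)
{
	struct chan c = { 0, false };
	unsigned int reason = REASON_EXPLICIT;

	if (set_eoi_reason)
		reason |= REASON_EOI_PENDING;
	chan_mask(&c, reason);
	chan_unmask(&c, REASON_EXPLICIT); /* only this reason is cleared later */
	return c.masked;
}
```

Calling these helpers from a small main(), the first one reports the
channel as still masked after the migration, which is the behaviour the
patch aims for. stuck_after_mask_ack(true) models the state reported
above (masked with no EOI ever arriving), while
stuck_after_mask_ack(false) models the corrected callback.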