
[v4,0/3] xen/events: bug fixes and some diagnostic aids

Message ID 20210306161833.4552-1-jgross@suse.com
Series xen/events: bug fixes and some diagnostic aids

Message

Jürgen Groß March 6, 2021, 4:18 p.m. UTC
These are fixes for XSA-332.

The rest of the V3 patches have been applied already. There is one
additional fix in patch 2 which addresses network outages when a guest
is doing reboot loops.

Juergen Gross (3):
  xen/events: reset affinity of 2-level event when tearing it down
  xen/events: don't unmask an event channel when an eoi is pending
  xen/events: avoid handling the same event on two cpus at the same time

 drivers/xen/events/events_2l.c       |  22 +++--
 drivers/xen/events/events_base.c     | 130 ++++++++++++++++++++-------
 drivers/xen/events/events_fifo.c     |   7 --
 drivers/xen/events/events_internal.h |  14 +--
 4 files changed, 123 insertions(+), 50 deletions(-)

Comments

Boris Ostrovsky March 8, 2021, 8:33 p.m. UTC | #1
On 3/6/21 11:18 AM, Juergen Gross wrote:
> An event channel should be kept masked when an eoi is pending for it.
> When being migrated to another cpu it might be unmasked, though.
>
> In order to avoid this keep three different flags for each event channel
> to be able to distinguish "normal" masking/unmasking from eoi related
> masking/unmasking and temporary masking. The event channel should only
> be able to generate an interrupt if all flags are cleared.
>
> Cc: stable@vger.kernel.org
> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
> Reported-by: Julien Grall <julien@xen.org>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: Julien Grall <jgrall@amazon.com>
> ---
> V2:
> - introduce a lock around masking/unmasking
> - merge patch 3 into this one (Jan Beulich)
> V4:
> - don't set eoi masking flag in lateeoi_mask_ack_dynirq()

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Ross, are you planning to test this?

-boris
Jürgen Groß March 9, 2021, 5:14 a.m. UTC | #2
On 08.03.21 21:33, Boris Ostrovsky wrote:
>
> On 3/6/21 11:18 AM, Juergen Gross wrote:
>> An event channel should be kept masked when an eoi is pending for it.
>> When being migrated to another cpu it might be unmasked, though.
>>
>> In order to avoid this keep three different flags for each event channel
>> to be able to distinguish "normal" masking/unmasking from eoi related
>> masking/unmasking and temporary masking. The event channel should only
>> be able to generate an interrupt if all flags are cleared.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
>> Reported-by: Julien Grall <julien@xen.org>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> Reviewed-by: Julien Grall <jgrall@amazon.com>
>> ---
>> V2:
>> - introduce a lock around masking/unmasking
>> - merge patch 3 into this one (Jan Beulich)
>> V4:
>> - don't set eoi masking flag in lateeoi_mask_ack_dynirq()
>
>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
>
> Ross, are you planning to test this?

Just as another data point: with the previous version of the patches,
a guest in a reboot loop lost network after at most 33 reboots in
my tests (IIRC there were 6 test runs). With this patch version I
stopped the test after about 1300 reboots without having seen any
problems.

Juergen
Ross Lagerwall March 9, 2021, 8:57 a.m. UTC | #3
On 2021-03-09 05:14, Jürgen Groß wrote:
> On 08.03.21 21:33, Boris Ostrovsky wrote:
>>
>> On 3/6/21 11:18 AM, Juergen Gross wrote:
>>> An event channel should be kept masked when an eoi is pending for it.
>>> When being migrated to another cpu it might be unmasked, though.
>>>
>>> In order to avoid this keep three different flags for each event channel
>>> to be able to distinguish "normal" masking/unmasking from eoi related
>>> masking/unmasking and temporary masking. The event channel should only
>>> be able to generate an interrupt if all flags are cleared.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
>>> Reported-by: Julien Grall <julien@xen.org>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>> Reviewed-by: Julien Grall <jgrall@amazon.com>
>>> ---
>>> V2:
>>> - introduce a lock around masking/unmasking
>>> - merge patch 3 into this one (Jan Beulich)
>>> V4:
>>> - don't set eoi masking flag in lateeoi_mask_ack_dynirq()
>>
>>
>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>
>>
>> Ross, are you planning to test this?
> 
> Just as another data point: With the previous version of the patches
> a reboot loop of a guest needed max 33 reboots to lose network in
> my tests (those were IIRC 6 test runs). With this patch version I
> stopped the test after about 1300 reboots without having seen any
> problems.
> 

Thanks, I'll test it today and get back to you.

Ross
Ross Lagerwall March 10, 2021, 9:08 a.m. UTC | #4
On 2021-03-09 08:57, Ross Lagerwall wrote:
> On 2021-03-09 05:14, Jürgen Groß wrote:
>> On 08.03.21 21:33, Boris Ostrovsky wrote:
>>>
>>> On 3/6/21 11:18 AM, Juergen Gross wrote:
>>>> An event channel should be kept masked when an eoi is pending for it.
>>>> When being migrated to another cpu it might be unmasked, though.
>>>>
>>>> In order to avoid this keep three different flags for each event channel
>>>> to be able to distinguish "normal" masking/unmasking from eoi related
>>>> masking/unmasking and temporary masking. The event channel should only
>>>> be able to generate an interrupt if all flags are cleared.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 54c9de89895e0a36047 ("xen/events: add a new late EOI evtchn framework")
>>>> Reported-by: Julien Grall <julien@xen.org>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> Reviewed-by: Julien Grall <jgrall@amazon.com>
>>>> ---
>>>> V2:
>>>> - introduce a lock around masking/unmasking
>>>> - merge patch 3 into this one (Jan Beulich)
>>>> V4:
>>>> - don't set eoi masking flag in lateeoi_mask_ack_dynirq()
>>>
>>>
>>> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>
>>>
>>> Ross, are you planning to test this?
>>
>> Just as another data point: With the previous version of the patches
>> a reboot loop of a guest needed max 33 reboots to lose network in
>> my tests (those were IIRC 6 test runs). With this patch version I
>> stopped the test after about 1300 reboots without having seen any
>> problems.
>>
>
> Thanks, I'll test it today and get back to you.
>

Tested-by: Ross Lagerwall <ross.lagerwall@citrix.com>

The updated patch seems fine in testing.

Thanks
Ross
Boris Ostrovsky March 11, 2021, 2:36 p.m. UTC | #5
On 3/6/21 11:18 AM, Juergen Gross wrote:
> These are fixes for XSA-332.
>
> The rest of the V3 patches have been applied already. There is one
> additional fix in patch 2 which addresses network outages when a guest
> is doing reboot loops.
>
> Juergen Gross (3):
>   xen/events: reset affinity of 2-level event when tearing it down
>   xen/events: don't unmask an event channel when an eoi is pending
>   xen/events: avoid handling the same event on two cpus at the same time
>
>  drivers/xen/events/events_2l.c       |  22 +++--
>  drivers/xen/events/events_base.c     | 130 ++++++++++++++++++++-------
>  drivers/xen/events/events_fifo.c     |   7 --
>  drivers/xen/events/events_internal.h |  14 +--
>  4 files changed, 123 insertions(+), 50 deletions(-)

Applied to for-linus-5.12b