diff mbox series

[Xen-devel,1/4] xen/arm: gic: Ensure we have an ISB between ack and do_IRQ()

Message ID 20181023181709.11883-2-julien.grall@arm.com
State Accepted
Commit 177afec4556c676e5a1a958d1626226fbca2a696
Headers show
Series xen/arm: GIC fixes and improvement | expand

Commit Message

Julien Grall Oct. 23, 2018, 6:17 p.m. UTC
Devices that expose their interrupt status registers via system
registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer,
vgic (although unused by Linux), ...) rely on a context synchronising
operation on the CPU to ensure that the updated status register is
visible to the CPU when handling the interrupt. This usually happens as
a result of taking the IRQ exception in the first place, but there are
two race scenarios where this isn't the case.

For example, let's say we have two peripherals (X and Y), where Y uses a
system register for its interrupt status.

Case 1:
1. CPU takes an IRQ exception as a result of X raising an interrupt
2. Y then raises its interrupt line, but the update to its system
   register is not yet visible to the CPU
3. The GIC decides to expose Y's interrupt number first in the Ack
   register
4. The CPU runs the IRQ handler for Y, but the status register is stale

Case 2:
1. CPU takes an IRQ exception as a result of X raising an interrupt
2. CPU reads the interrupt number for X from the Ack register and runs
   its IRQ handler
3. Y raises its interrupt line and the Ack register is updated, but
   again, the update to its system register is not yet visible to the
   CPU.
4. Since the GIC drivers poll the Ack register, we read Y's interrupt
   number and run its handler without a context synchronisation
   operation, therefore seeing the stale register value.

In either case, we run the risk of missing an IRQ. This patch solves the
problem by ensuring that we execute an ISB in the GIC drivers prior
to invoking the interrupt handler.

Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206
"irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".

Signed-off-by: Julien Grall <julien.grall@arm.com>

---
    This patch is a candidate for backporting up to Xen 4.9.
---
 xen/arch/arm/gic.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Andrii Anisov Oct. 24, 2018, 9:38 a.m. UTC | #1
Hello Julien,


This patch is very interesting to me.


On 23.10.18 21:17, Julien Grall wrote:
> Devices that expose their interrupt status registers via system

> registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer,

> vgic (although unused by Linux), ...)

I guess under vgic you mean GICH registers, is it correct?
I would speculate that GIC v2 registers are memory mapped, not exposed 
via system registers.

>   rely on a context synchronising

> operation on the CPU to ensure that the updated status register is

> visible to the CPU when handling the interrupt. This usually happens as

> a result of taking the IRQ exception in the first place, but there are

> two race scenarios where this isn't the case.

>

> For example, let's say we have two peripherals (X and Y), where Y uses a

> system register for its interrupt status.

>

> Case 1:

> 1. CPU takes an IRQ exception as a result of X raising an interrupt

> 2. Y then raises its interrupt line, but the update to its system

>     register is not yet visible to the CPU

> 3. The GIC decides to expose Y's interrupt number first in the Ack

>     register

> 4. The CPU runs the IRQ handler for Y, but the status register is stale

But this scenario somehow explains a strange thing I saw during IRQ 
latency investigation (optimization attempts) during last weeks.
The strange sequence is as following:
     1. CPU takes an IRQ exception from the guest context
     2. In `enter_hypervisor_head()` function (i.e. in `gic_clear_lrs()` 
as for mainline) from some LR registers interrupt statuses are read as 
PENDING
     3. Performing further code, without returning to the guest (i.e. 
inside vgic_vcpu_inject_irq()), it happens that we read status INVALID 
from the LR we read PENDING before, in step 2.

Please note that we are tailoring xen based on RELEASE-4.10.0.

> Case 2:

> 1. CPU takes an IRQ exception as a result of X raising an interrupt

> 2. CPU reads the interrupt number for X from the Ack register and runs

>     its IRQ handler

> 3. Y raises its interrupt line and the Ack register is updated, but

>     again, the update to its system register is not yet visible to the

>     CPU.

> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt

>     number and run its handler without a context synchronisation

>     operation, therefore seeing the stale register value.

>

> In either case, we run the risk of missing an IRQ. This patch solves the

> problem by ensuring that we execute an ISB in the GIC drivers prior

> to invoking the interrupt handler.

>

> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206

> "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".

>

> Signed-off-by: Julien Grall <julien.grall@arm.com>

>

> ---

>      This patch is a candidate for backporting up to Xen 4.9.

> ---

>   xen/arch/arm/gic.c | 2 ++

>   1 file changed, 2 insertions(+)

>

> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c

> index 8d7e491060..305fbd66dd 100644

> --- a/xen/arch/arm/gic.c

> +++ b/xen/arch/arm/gic.c

> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)

>           if ( likely(irq >= 16 && irq < 1020) )

>           {

>               local_irq_enable();

> +            isb();

Taking in account that the first GICH accesses are from 
`gic_clear_lrs()`, called from `enter_hypervisor_head`, I would suggest 
moving isb() there.

>               do_IRQ(regs, irq, is_fiq);

>               local_irq_disable();

>           }

>           else if ( is_lpi(irq) )

>           {

>               local_irq_enable();

> +            isb();

>               gic_hw_ops->do_LPI(irq);

>               local_irq_disable();

>           }


-- 

*Andrii Anisov*
Julien Grall Oct. 24, 2018, 2:41 p.m. UTC | #2
On 10/24/18 10:38 AM, Andrii Anisov wrote:
> Hello Julien,

Hi Andrii,

Thank you for the review.

> On 23.10.18 21:17, Julien Grall wrote:
>> Devices that expose their interrupt status registers via system
>> registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer,
>> vgic (although unused by Linux), ...)
> I guess under vgic you mean GICH registers, is it correct?
> I would speculate that GIC v2 registers are memory mapped, not exposed
> via system registers.

vGIC is not only about GICv2. It also covers GICv3 that is accessible 
via system registers.

> 
>>    rely on a context synchronising
>> operation on the CPU to ensure that the updated status register is
>> visible to the CPU when handling the interrupt. This usually happens as
>> a result of taking the IRQ exception in the first place, but there are
>> two race scenarios where this isn't the case.
>>
>> For example, let's say we have two peripherals (X and Y), where Y uses a
>> system register for its interrupt status.
>>
>> Case 1:
>> 1. CPU takes an IRQ exception as a result of X raising an interrupt
>> 2. Y then raises its interrupt line, but the update to its system
>>      register is not yet visible to the CPU
>> 3. The GIC decides to expose Y's interrupt number first in the Ack
>>      register
>> 4. The CPU runs the IRQ handler for Y, but the status register is stale
> But this scenario somehow explains a strange thing I saw during IRQ
> latency investigation (optimization attempts) during last weeks.
> The strange sequence is as following:
>       1. CPU takes an IRQ exception from the guest context
>       2. In `enter_hypervisor_head()` function (i.e. in `gic_clear_lrs()`
> as for mainline) from some LR registers interrupt statuses are read as
> PENDING
>       3. Performing further code, without returning to the guest (i.e.
> inside vgic_vcpu_inject_irq()), it happens that we read status INVALID
> from the LR we read PENDING before, in step 2. >
> Please note that we are tailoring xen based on RELEASE-4.10.0.

As you tailor Xen, I need more details on your modification to be able 
to provide feedback on the oddness you encounter.

But you are using GICv2, right? If so, you are not using system 
registers for the vGIC. This is not covered by this patch.

> 
>> Case 2:
>> 1. CPU takes an IRQ exception as a result of X raising an interrupt
>> 2. CPU reads the interrupt number for X from the Ack register and runs
>>      its IRQ handler
>> 3. Y raises its interrupt line and the Ack register is updated, but
>>      again, the update to its system register is not yet visible to the
>>      CPU.
>> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt
>>      number and run its handler without a context synchronisation
>>      operation, therefore seeing the stale register value.
>>
>> In either case, we run the risk of missing an IRQ. This patch solves the
>> problem by ensuring that we execute an ISB in the GIC drivers prior
>> to invoking the interrupt handler.
>>
>> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206
>> "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>
>> ---
>>       This patch is a candidate for backporting up to Xen 4.9.
>> ---
>>    xen/arch/arm/gic.c | 2 ++
>>    1 file changed, 2 insertions(+)
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index 8d7e491060..305fbd66dd 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>>            if ( likely(irq >= 16 && irq < 1020) )
>>            {
>>                local_irq_enable();
>> +            isb();
> Taking in account that the first GICH accesses are from
> `gic_clear_lrs()`, called from `enter_hypervisor_head`, I would suggest
> moving isb() there.
> 
>>                do_IRQ(regs, irq, is_fiq);
>>                local_irq_disable();
>>            }
>>            else if ( is_lpi(irq) )
>>            {
>>                local_irq_enable();
>> +            isb();
>>                gic_hw_ops->do_LPI(irq);
>>                local_irq_disable();
>>            }
> 

Cheers,
Andrii Anisov Oct. 25, 2018, 2:11 p.m. UTC | #3
Hello  Julien,


On 24.10.18 17:41, Julien Grall wrote:
> vGIC is not only about GICv2. It also covers GICv3 that is accessible

> via system registers.

Yes, I know. But, as you state below, for GICv2 based systems those
barriers are not needed.

>>>    rely on a context synchronising

>>> operation on the CPU to ensure that the updated status register is

>>> visible to the CPU when handling the interrupt. This usually happens as

>>> a result of taking the IRQ exception in the first place, but there are

>>> two race scenarios where this isn't the case.

>>>

>>> For example, let's say we have two peripherals (X and Y), where Y

>>> uses a

>>> system register for its interrupt status.

>>>

>>> Case 1:

>>> 1. CPU takes an IRQ exception as a result of X raising an interrupt

>>> 2. Y then raises its interrupt line, but the update to its system

>>>      register is not yet visible to the CPU

>>> 3. The GIC decides to expose Y's interrupt number first in the Ack

>>>      register

>>> 4. The CPU runs the IRQ handler for Y, but the status register is stale

>> But this scenario somehow explains a strange thing I saw during IRQ

>> latency investigation (optimization attempts) during last weeks.

>> The strange sequence is as following:

>>       1. CPU takes an IRQ exception from the guest context

>>       2. In `enter_hypervisor_head()` function (i.e. in

>> `gic_clear_lrs()`

>> as for mainline) from some LR registers interrupt statuses are read as

>> PENDING

>>       3. Performing further code, without returning to the guest (i.e.

>> inside vgic_vcpu_inject_irq()), it happens that we read status INVALID

>> from the LR we read PENDING before, in step 2. >

>> Please note that we are tailoring xen based on RELEASE-4.10.0.

>

> As you tailor Xen, I need more details on your modification to be able

> to provide feedback on the oddness you encounter.

I guess I should make a dedicated patch applicable to mainline to reveal
the issue. I hope I'll do this nearest days.

> But you are using GICv2, right? If so, you are not using system

> registers for the vGIC. This is not covered by this patch.

Yep, our SoC has GICv2.

>>> Case 2:

>>> 1. CPU takes an IRQ exception as a result of X raising an interrupt

>>> 2. CPU reads the interrupt number for X from the Ack register and runs

>>>      its IRQ handler

>>> 3. Y raises its interrupt line and the Ack register is updated, but

>>>      again, the update to its system register is not yet visible to the

>>>      CPU.

>>> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt

>>>      number and run its handler without a context synchronisation

>>>      operation, therefore seeing the stale register value.

>>>

>>> In either case, we run the risk of missing an IRQ. This patch solves

>>> the

>>> problem by ensuring that we execute an ISB in the GIC drivers prior

>>> to invoking the interrupt handler.

>>>

>>> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206

>>> "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".

>>>

>>> Signed-off-by: Julien Grall <julien.grall@arm.com>

>>>

>>> ---

>>>       This patch is a candidate for backporting up to Xen 4.9.

>>> ---

>>>    xen/arch/arm/gic.c | 2 ++

>>>    1 file changed, 2 insertions(+)

>>>

>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c

>>> index 8d7e491060..305fbd66dd 100644

>>> --- a/xen/arch/arm/gic.c

>>> +++ b/xen/arch/arm/gic.c

>>> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs,

>>> int is_fiq)

>>>            if ( likely(irq >= 16 && irq < 1020) )

>>>            {

>>>                local_irq_enable();

>>> +            isb();

>> Taking in account that the first GICH accesses are from

>> `gic_clear_lrs()`, called from `enter_hypervisor_head`, I would suggest

>> moving isb() there.

Any comments here?

>>

>>>                do_IRQ(regs, irq, is_fiq);

>>>                local_irq_disable();

>>>            }

>>>            else if ( is_lpi(irq) )

>>>            {

>>>                local_irq_enable();

>>> +            isb();

>>>                gic_hw_ops->do_LPI(irq);

>>>                local_irq_disable();

>>>            }

>>

>

> Cheers,

>


--

*Andrii Anisov*
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<font size="2"><span style="font-size:11pt">
<div class="PlainText">Hello&nbsp; Julien,<br>
<br>
<br>
On 24.10.18 17:41, Julien Grall wrote:<br>
&gt; vGIC is not only about GICv2. It also covers GICv3 that is accessible <br>
&gt; via system registers.<br>
Yes, I know. But, as you state below, for GICv2 based systems those <br>
barriers are not needed.<br>
<br>
&gt;&gt;&gt; &nbsp;&nbsp; rely on a context synchronising<br>
&gt;&gt;&gt; operation on the CPU to ensure that the updated status register is<br>
&gt;&gt;&gt; visible to the CPU when handling the interrupt. This usually happens as<br>
&gt;&gt;&gt; a result of taking the IRQ exception in the first place, but there are<br>
&gt;&gt;&gt; two race scenarios where this isn't the case.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; For example, let's say we have two peripherals (X and Y), where Y <br>
&gt;&gt;&gt; uses a<br>
&gt;&gt;&gt; system register for its interrupt status.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Case 1:<br>
&gt;&gt;&gt; 1. CPU takes an IRQ exception as a result of X raising an interrupt<br>
&gt;&gt;&gt; 2. Y then raises its interrupt line, but the update to its system<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp; register is not yet visible to the CPU<br>
&gt;&gt;&gt; 3. The GIC decides to expose Y's interrupt number first in the Ack<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp; register<br>
&gt;&gt;&gt; 4. The CPU runs the IRQ handler for Y, but the status register is stale<br>
&gt;&gt; But this scenario somehow explains a strange thing I saw during IRQ<br>
&gt;&gt; latency investigation (optimization attempts) during last weeks.<br>
&gt;&gt; The strange sequence is as following:<br>
&gt;&gt; &nbsp; &nbsp;&nbsp;&nbsp; 1. CPU takes an IRQ exception from the guest context<br>
&gt;&gt; &nbsp; &nbsp;&nbsp;&nbsp; 2. In `enter_hypervisor_head()` function (i.e. in <br>
&gt;&gt; `gic_clear_lrs()`<br>
&gt;&gt; as for mainline) from some LR registers interrupt statuses are read as<br>
&gt;&gt; PENDING<br>
&gt;&gt; &nbsp; &nbsp;&nbsp;&nbsp; 3. Performing further code, without returning to the guest (i.e.<br>
&gt;&gt; inside vgic_vcpu_inject_irq()), it happens that we read status INVALID<br>
&gt;&gt; from the LR we read PENDING before, in step 2. &gt;<br>
&gt;&gt; Please note that we are tailoring xen based on RELEASE-4.10.0.<br>
&gt;<br>
&gt; As you tailor Xen, I need more details on your modification to be able <br>
&gt; to provide feedback on the oddness you encounter.<br>
I guess I should make a dedicated patch applicable to mainline to reveal <br>
the issue. I hope I'll do this nearest days.<br>
<br>
&gt; But you are using GICv2, right? If so, you are not using system <br>
&gt; registers for the vGIC. This is not covered by this patch.<br>
Yep, our SoC has GICv2.<br>
<br>
&gt;&gt;&gt; Case 2:<br>
&gt;&gt;&gt; 1. CPU takes an IRQ exception as a result of X raising an interrupt<br>
&gt;&gt;&gt; 2. CPU reads the interrupt number for X from the Ack register and runs<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp; its IRQ handler<br>
&gt;&gt;&gt; 3. Y raises its interrupt line and the Ack register is updated, but<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp; again, the update to its system register is not yet visible to the<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp; CPU.<br>
&gt;&gt;&gt; 4. Since the GIC drivers poll the Ack register, we read Y's interrupt<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp; number and run its handler without a context synchronisation<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp; operation, therefore seeing the stale register value.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; In either case, we run the risk of missing an IRQ. This patch solves <br>
&gt;&gt;&gt; the<br>
&gt;&gt;&gt; problem by ensuring that we execute an ISB in the GIC drivers prior<br>
&gt;&gt;&gt; to invoking the interrupt handler.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206<br>
&gt;&gt;&gt; &quot;irqchip/gic: Ensure we have an ISB between ack and -&gt;handle_irq&quot;.<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; Signed-off-by: Julien Grall &lt;julien.grall@arm.com&gt;<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; ---<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; This patch is a candidate for backporting up to Xen 4.9.<br>
&gt;&gt;&gt; ---<br>
&gt;&gt;&gt; &nbsp;&nbsp; xen/arch/arm/gic.c | 2 &#43;&#43;<br>
&gt;&gt;&gt; &nbsp;&nbsp; 1 file changed, 2 insertions(&#43;)<br>
&gt;&gt;&gt;<br>
&gt;&gt;&gt; diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c<br>
&gt;&gt;&gt; index 8d7e491060..305fbd66dd 100644<br>
&gt;&gt;&gt; --- a/xen/arch/arm/gic.c<br>
&gt;&gt;&gt; &#43;&#43;&#43; b/xen/arch/arm/gic.c<br>
&gt;&gt;&gt; @@ -388,12 &#43;388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, <br>
&gt;&gt;&gt; int is_fiq)<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if ( likely(irq &gt;= 16 &amp;&amp; irq &lt; 1020) )<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; local_irq_enable();<br>
&gt;&gt;&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; isb();<br>
&gt;&gt; Taking in account that the first GICH accesses are from<br>
&gt;&gt; `gic_clear_lrs()`, called from `enter_hypervisor_head`, I would suggest<br>
&gt;&gt; moving isb() there.<br>
Any comments here?<br>
<br>
&gt;&gt;<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; do_IRQ(regs, irq, is_fiq);<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; local_irq_disable();<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; else if ( is_lpi(irq) )<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; {<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; local_irq_enable();<br>
&gt;&gt;&gt; &#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; isb();<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gic_hw_ops-&gt;do_LPI(irq);<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; local_irq_disable();<br>
&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; }<br>
&gt;&gt;<br>
&gt;<br>
&gt; Cheers,<br>
&gt;<br>
<br>
-- <br>
<br>
*Andrii Anisov*<br>
<br>
<br>
</div>
</span></font></div>
</body>
</html>
Julien Grall Oct. 25, 2018, 2:22 p.m. UTC | #4
On 10/25/18 3:11 PM, Andrii Anisov wrote:
> Hello  Julien,

Hi,

> On 24.10.18 17:41, Julien Grall wrote:
>> vGIC is not only about GICv2. It also covers GICv3 that is accessible 
>> via system registers.
> Yes, I know. But, as you state below, for GICv2 based systems those
> barriers are not needed.

You misread what I wrote. They are not needed to cover the vGIC for 
GICv2 platform. However they are still necessary, for instance, for the 
timer as it is accessible via system registers.

> 
>>>>    rely on a context synchronising
>>>> operation on the CPU to ensure that the updated status register is
>>>> visible to the CPU when handling the interrupt. This usually happens as
>>>> a result of taking the IRQ exception in the first place, but there are
>>>> two race scenarios where this isn't the case.
>>>>
>>>> For example, let's say we have two peripherals (X and Y), where Y 
>>>> uses a
>>>> system register for its interrupt status.
>>>>
>>>> Case 1:
>>>> 1. CPU takes an IRQ exception as a result of X raising an interrupt
>>>> 2. Y then raises its interrupt line, but the update to its system
>>>>      register is not yet visible to the CPU
>>>> 3. The GIC decides to expose Y's interrupt number first in the Ack
>>>>      register
>>>> 4. The CPU runs the IRQ handler for Y, but the status register is stale
>>> But this scenario somehow explains a strange thing I saw during IRQ
>>> latency investigation (optimization attempts) during last weeks.
>>> The strange sequence is as following:
>>>       1. CPU takes an IRQ exception from the guest context
>>>       2. In `enter_hypervisor_head()` function (i.e. in 
>>> `gic_clear_lrs()`
>>> as for mainline) from some LR registers interrupt statuses are read as
>>> PENDING
>>>       3. Performing further code, without returning to the guest (i.e.
>>> inside vgic_vcpu_inject_irq()), it happens that we read status INVALID
>>> from the LR we read PENDING before, in step 2. >
>>> Please note that we are tailoring xen based on RELEASE-4.10.0.
>>
>> As you tailor Xen, I need more details on your modification to be able 
>> to provide feedback on the oddness you encounter.
> I guess I should make a dedicated patch applicable to mainline to reveal
> the issue. I hope I'll do this nearest days.
> 
>> But you are using GICv2, right? If so, you are not using system 
>> registers for the vGIC. This is not covered by this patch.
> Yep, our SoC has GICv2.
> 
>>>> Case 2:
>>>> 1. CPU takes an IRQ exception as a result of X raising an interrupt
>>>> 2. CPU reads the interrupt number for X from the Ack register and runs
>>>>      its IRQ handler
>>>> 3. Y raises its interrupt line and the Ack register is updated, but
>>>>      again, the update to its system register is not yet visible to the
>>>>      CPU.
>>>> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt
>>>>      number and run its handler without a context synchronisation
>>>>      operation, therefore seeing the stale register value.
>>>>
>>>> In either case, we run the risk of missing an IRQ. This patch solves 
>>>> the
>>>> problem by ensuring that we execute an ISB in the GIC drivers prior
>>>> to invoking the interrupt handler.
>>>>
>>>> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206
>>>> "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>>
>>>> ---
>>>>       This patch is a candidate for backporting up to Xen 4.9.
>>>> ---
>>>>    xen/arch/arm/gic.c | 2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>>> index 8d7e491060..305fbd66dd 100644
>>>> --- a/xen/arch/arm/gic.c
>>>> +++ b/xen/arch/arm/gic.c
>>>> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, 
>>>> int is_fiq)
>>>>            if ( likely(irq >= 16 && irq < 1020) )
>>>>            {
>>>>                local_irq_enable();
>>>> +            isb();
>>> Taking in account that the first GICH accesses are from
>>> `gic_clear_lrs()`, called from `enter_hypervisor_head`, I would suggest
>>> moving isb() there.
> Any comments here?

I haven't answered to this bit because I thought this was related to 
your issue.

If this is not related to your issue, then the placement would be wrong. 
Here the isb() sits between ack() and do_IRQ().

This ensure the system registers are synchronized after you acknowledge 
an interrupt.

Cheers,
Andrii Anisov Oct. 25, 2018, 5:45 p.m. UTC | #5
On 25.10.18 17:22, Julien Grall wrote:
> You misread what I wrote. They are not needed to cover the vGIC for 
> GICv2 platform. However they are still necessary, for instance, for 
> the timer as it is accessible via system registers.

Ah, OK.


> Here the isb() sits between ack() and do_IRQ().
>
> This ensure the system registers are synchronized after you 
> acknowledge an interrupt.

I see.
Andrii Anisov Oct. 25, 2018, 5:46 p.m. UTC | #6
On 23.10.18 21:17, Julien Grall wrote:
> Devices that expose their interrupt status registers via system
> registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer,
> vgic (although unused by Linux), ...) rely on a context synchronising
> operation on the CPU to ensure that the updated status register is
> visible to the CPU when handling the interrupt. This usually happens as
> a result of taking the IRQ exception in the first place, but there are
> two race scenarios where this isn't the case.
>
> For example, let's say we have two peripherals (X and Y), where Y uses a
> system register for its interrupt status.
>
> Case 1:
> 1. CPU takes an IRQ exception as a result of X raising an interrupt
> 2. Y then raises its interrupt line, but the update to its system
>     register is not yet visible to the CPU
> 3. The GIC decides to expose Y's interrupt number first in the Ack
>     register
> 4. The CPU runs the IRQ handler for Y, but the status register is stale
>
> Case 2:
> 1. CPU takes an IRQ exception as a result of X raising an interrupt
> 2. CPU reads the interrupt number for X from the Ack register and runs
>     its IRQ handler
> 3. Y raises its interrupt line and the Ack register is updated, but
>     again, the update to its system register is not yet visible to the
>     CPU.
> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt
>     number and run its handler without a context synchronisation
>     operation, therefore seeing the stale register value.
>
> In either case, we run the risk of missing an IRQ. This patch solves the
> problem by ensuring that we execute an ISB in the GIC drivers prior
> to invoking the interrupt handler.
>
> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206
> "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".
>
> Signed-off-by: Julien Grall <julien.grall@arm.com>
>
> ---


Reviewed-by: Andrii Anisov<andrii_anisov@epam.com>
Andrii Anisov Oct. 26, 2018, 12:49 p.m. UTC | #7
Hello Julien


On 25.10.18 17:11, Andrii Anisov wrote:
> I guess I should make a dedicated patch applicable to mainline to reveal
> the issue. I hope I'll do this nearest days.

Please find below the diff applicable to the current xenbits/smoke which 
exposes the issue.

With that diff I see (on my board) outputs like:


     (XEN) Stored LR is stale: stored 0xAA025C97 read 0x8A025C97
     (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
     (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
     (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
     (XEN) Stored LR is stale: stored 0x1A00001B read 0xA00001B
     (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
     (XEN) Stored LR is stale: stored 0xAA025C97 read 0x8A025C97


Those prints I treat as LR content change (state PENDING->INVALID) 
without exiting from hyp to guest. Unfortunately, I did not find a kind 
of explanation to this effect in ARM IHI 0048B.b document.
I have GICv2 on the die.

I appreciate you find some time to look at this and express your opinion.


---

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 511c8d7..0b9aa2d 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -171,6 +171,14 @@ static unsigned int gicv2_cpu_mask(const cpumask_t 
*cpumask)
      return mask;
  }

+static void gicv2_store_lrs(void)
+{
+    int i;
+
+    for ( i = 0; i < gicv2_info.nr_lrs; i++ )
+        current->arch.gic.v2.lr[i] = readl_gich(GICH_LR + i * 4);
+}
+
  static void gicv2_save_state(struct vcpu *v)
  {
      int i;
@@ -446,11 +454,13 @@ static void gicv2_update_lr(int lr, const struct 
pending_irq *p,
                                     << GICH_V2_LR_PHYSICAL_SHIFT);

      writel_gich(lr_reg, GICH_LR + lr * 4);
+    current->arch.gic.v2.lr[lr] = lr_reg;
  }

  static void gicv2_clear_lr(int lr)
  {
      writel_gich(0, GICH_LR + lr * 4);
+    current->arch.gic.v2.lr[lr] = 0;
  }

  static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
@@ -458,6 +468,9 @@ static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
      uint32_t lrv;

      lrv          = readl_gich(GICH_LR + lr * 4);
+    if ( lrv != current->arch.gic.v2.lr[lr])
+        printk("Stored LR is stale: stored 0x%"PRIX32" read 0x%"PRIX32"\n",
+               current->arch.gic.v2.lr[lr], lrv);
      lr_reg->pirq = (lrv >> GICH_V2_LR_PHYSICAL_SHIFT) & 
GICH_V2_LR_PHYSICAL_MASK;
      lr_reg->virq = (lrv >> GICH_V2_LR_VIRTUAL_SHIFT) & 
GICH_V2_LR_VIRTUAL_MASK;
      lr_reg->priority = (lrv >> GICH_V2_LR_PRIORITY_SHIFT) & 
GICH_V2_LR_PRIORITY_MASK;
@@ -481,6 +494,7 @@ static void gicv2_write_lr(int lr, const struct 
gic_lr *lr_reg)
            ((uint32_t)(lr_reg->grp & GICH_V2_LR_GRP_MASK) << 
GICH_V2_LR_GRP_SHIFT) );

      writel_gich(lrv, GICH_LR + lr * 4);
+    current->arch.gic.v2.lr[lr] = lrv;
  }

  static void gicv2_hcr_status(uint32_t flag, bool status)
@@ -1232,6 +1246,7 @@ const static struct gic_hw_operations gicv2_ops = {
      .info                = &gicv2_info,
      .init                = gicv2_init,
      .secondary_init      = gicv2_secondary_cpu_init,
+    .store_lrs           = gicv2_store_lrs,
      .save_state          = gicv2_save_state,
      .restore_state       = gicv2_restore_state,
      .dump_state          = gicv2_dump_state,
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index ed363f6..a05c518 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -594,6 +594,12 @@ static void gic_update_one_lr(struct vcpu *v, int i)
      }
  }

+void gic_store_lrs(void)
+{
+    if(gic_hw_ops->store_lrs)
+        gic_hw_ops->store_lrs();
+}
+
  void gic_clear_lrs(struct vcpu *v)
  {
      int i = 0;
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index f6f6de3..985192b 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2095,6 +2095,7 @@ static void enter_hypervisor_head(struct 
cpu_user_regs *regs)
          if ( current->arch.hcr_el2 & HCR_VA )
              current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);

+        gic_store_lrs();
          gic_clear_lrs(current);
      }
  }
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index d3d7bda..6fe3fdb 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -293,6 +293,7 @@ extern unsigned int gic_number_lines(void);
  /* IRQ translation function for the device tree */
  int gic_irq_xlate(const u32 *intspec, unsigned int intsize,
                    unsigned int *out_hwirq, unsigned int *out_type);
+void gic_store_lrs(void);
  void gic_clear_lrs(struct vcpu *v);

  struct gic_info {
@@ -314,6 +315,8 @@ struct gic_hw_operations {
      /* Initialize the GIC and the boot CPU */
      int (*init)(void);
      /* Save GIC registers */
+    void (*store_lrs)(void);
+    /* Save GIC registers */
      void (*save_state)(struct vcpu *);
      /* Restore GIC registers */
      void (*restore_state)(const struct vcpu *);
Julien Grall Oct. 26, 2018, 9:55 p.m. UTC | #8
On 10/26/18 1:49 PM, Andrii Anisov wrote:
> Hello Julien

Hi,

> On 25.10.18 17:11, Andrii Anisov wrote:
>> I guess I should make a dedicated patch applicable to mainline to reveal
>> the issue. I hope I'll do this nearest days.
> 
> Please find below the diff applicable to the current xenbits/smoke which
> exposes the issue.

The functions below does not exist in latest Xen. So are you sure this 
based on mainline?

> 
> With that diff I see (on my board) outputs like:
> 
> 
>       (XEN) Stored LR is stale: stored 0xAA025C97 read 0x8A025C97
>       (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
>       (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
>       (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
>       (XEN) Stored LR is stale: stored 0x1A00001B read 0xA00001B
>       (XEN) Stored LR is stale: stored 0x9A025C97 read 0x8A025C97
>       (XEN) Stored LR is stale: stored 0xAA025C97 read 0x8A025C97

How many domain do you have run at that time? Under which condition this 
is hapenning?

Also, what is your Xen command line? I assume you didn't disable 
serrors, right? So we should have plenty of dsb(sy) and isb() in the 
path to there.

So I would rule out a missing barrier here.

> 
> 
> Those prints I treat as LR content change (state PENDING->INVALID)
> without exiting from hyp to guest. Unfortunately, I did not find a kind
> of explanation to this effect in ARM IHI 0048B.b document.
> I have GICv2 on the die.
> 
> I appreciate you find some time to look at this and express your opinion.
> 
> 
> ---
> 
> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> index 511c8d7..0b9aa2d 100644
> --- a/xen/arch/arm/gic-v2.c
> +++ b/xen/arch/arm/gic-v2.c
> @@ -171,6 +171,14 @@ static unsigned int gicv2_cpu_mask(const cpumask_t
> *cpumask)
>        return mask;
>    }
> 
> +static void gicv2_store_lrs(void)
> +{
> +    int i;
> +
> +    for ( i = 0; i < gicv2_info.nr_lrs; i++ )
> +        current->arch.gic.v2.lr[i] = readl_gich(GICH_LR + i * 4);
> +}
> +
>    static void gicv2_save_state(struct vcpu *v)
>    {
>        int i;
> @@ -446,11 +454,13 @@ static void gicv2_update_lr(int lr, const struct
> pending_irq *p,
>                                       << GICH_V2_LR_PHYSICAL_SHIFT);
> 
>        writel_gich(lr_reg, GICH_LR + lr * 4);
> +    current->arch.gic.v2.lr[lr] = lr_reg;
>    }
> 
>    static void gicv2_clear_lr(int lr)
>    {
>        writel_gich(0, GICH_LR + lr * 4);
> +    current->arch.gic.v2.lr[lr] = 0;
>    }
> 
>    static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
> @@ -458,6 +468,9 @@ static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
>        uint32_t lrv;
> 
>        lrv          = readl_gich(GICH_LR + lr * 4);
> +    if ( lrv != current->arch.gic.v2.lr[lr])
> +        printk("Stored LR is stale: stored 0x%"PRIX32" read 0x%"PRIX32"\n",
> +               current->arch.gic.v2.lr[lr], lrv);
>        lr_reg->pirq = (lrv >> GICH_V2_LR_PHYSICAL_SHIFT) &
> GICH_V2_LR_PHYSICAL_MASK;
>        lr_reg->virq = (lrv >> GICH_V2_LR_VIRTUAL_SHIFT) &
> GICH_V2_LR_VIRTUAL_MASK;
>        lr_reg->priority = (lrv >> GICH_V2_LR_PRIORITY_SHIFT) &
> GICH_V2_LR_PRIORITY_MASK;
> @@ -481,6 +494,7 @@ static void gicv2_write_lr(int lr, const struct
> gic_lr *lr_reg)
>              ((uint32_t)(lr_reg->grp & GICH_V2_LR_GRP_MASK) <<
> GICH_V2_LR_GRP_SHIFT) );
> 
>        writel_gich(lrv, GICH_LR + lr * 4);
> +    current->arch.gic.v2.lr[lr] = lrv;
>    }
> 
>    static void gicv2_hcr_status(uint32_t flag, bool status)
> @@ -1232,6 +1246,7 @@ const static struct gic_hw_operations gicv2_ops = {
>        .info                = &gicv2_info,
>        .init                = gicv2_init,
>        .secondary_init      = gicv2_secondary_cpu_init,
> +    .store_lrs           = gicv2_store_lrs,
>        .save_state          = gicv2_save_state,
>        .restore_state       = gicv2_restore_state,
>        .dump_state          = gicv2_dump_state,
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index ed363f6..a05c518 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -594,6 +594,12 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>        }
>    }
> 
> +void gic_store_lrs(void)
> +{
> +    if(gic_hw_ops->store_lrs)
> +        gic_hw_ops->store_lrs();
> +}
> +
>    void gic_clear_lrs(struct vcpu *v)
>    {
>        int i = 0;
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index f6f6de3..985192b 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -2095,6 +2095,7 @@ static void enter_hypervisor_head(struct
> cpu_user_regs *regs)
>            if ( current->arch.hcr_el2 & HCR_VA )
>                current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
> 
> +        gic_store_lrs();

gic_clear_lrs(...) may rewrite the LRs. To confirm this stall LRs are 
not due by this function, can you move that after gic_clear_lrs()?

>            gic_clear_lrs(current);
>        }
>    }
> diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
> index d3d7bda..6fe3fdb 100644
> --- a/xen/include/asm-arm/gic.h
> +++ b/xen/include/asm-arm/gic.h
> @@ -293,6 +293,7 @@ extern unsigned int gic_number_lines(void);
>    /* IRQ translation function for the device tree */
>    int gic_irq_xlate(const u32 *intspec, unsigned int intsize,
>                      unsigned int *out_hwirq, unsigned int *out_type);
> +void gic_store_lrs(void);
>    void gic_clear_lrs(struct vcpu *v);
> 
>    struct gic_info {
> @@ -314,6 +315,8 @@ struct gic_hw_operations {
>        /* Initialize the GIC and the boot CPU */
>        int (*init)(void);
>        /* Save GIC registers */
> +    void (*store_lrs)(void);
> +    /* Save GIC registers */
>        void (*save_state)(struct vcpu *);
>        /* Restore GIC registers */
>        void (*restore_state)(const struct vcpu *);

Cheers,
Andrii Anisov Oct. 27, 2018, 12:14 p.m. UTC | #9
Hello Julien,

On 10/27/2018 12:55 AM, Julien Grall wrote:
The functions below does not exist in latest Xen. So are you sure this based on mainline?

Yep, I've put a wrong diff into the email, it is for XEN 4.10.
Please find the patch for XEN 4.12-unstable on my github [1].

How many domain do you have run at that time?
Only one domain. In my setup I'm running a BSP release under XEN as Dom0.

Under which condition this is hapenning?
Under no specific conditions. During the system boot, during console operations, during glmark2 [2] running.

Also, what is your Xen command line?
`dom0_mem=1G console=dtuart dtuart=serial0 dom0_max_vcpus=4 bootscrub=0 loglvl=all cpufreq=none`

I assume you didn't disable serrors, right? So we should have plenty of dsb(sy) and isb() in the path to there.
I didn't disable serrors.

So I would rule out a missing barrier here.





Those prints I treat as LR content change (state PENDING->INVALID)
without exiting from hyp to guest. Unfortunately, I did not find a kind
of explanation to this effect in ARM IHI 0048B.b document.
I have GICv2 on the die.

I appreciate you find some time to look at this and express your opinion.


---

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 511c8d7..0b9aa2d 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -171,6 +171,14 @@ static unsigned int gicv2_cpu_mask(const cpumask_t
*cpumask)
       return mask;
   }

+static void gicv2_store_lrs(void)
+{
+    int i;
+
+    for ( i = 0; i < gicv2_info.nr_lrs; i++ )
+        current->arch.gic.v2.lr[i] = readl_gich(GICH_LR + i * 4);
+}
+
   static void gicv2_save_state(struct vcpu *v)
   {
       int i;
@@ -446,11 +454,13 @@ static void gicv2_update_lr(int lr, const struct
pending_irq *p,
                                      << GICH_V2_LR_PHYSICAL_SHIFT);

       writel_gich(lr_reg, GICH_LR + lr * 4);
+    current->arch.gic.v2.lr[lr] = lr_reg;
   }

   static void gicv2_clear_lr(int lr)
   {
       writel_gich(0, GICH_LR + lr * 4);
+    current->arch.gic.v2.lr[lr] = 0;
   }

   static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
@@ -458,6 +468,9 @@ static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
       uint32_t lrv;

       lrv          = readl_gich(GICH_LR + lr * 4);
+    if ( lrv != current->arch.gic.v2.lr[lr])
+        printk("Stored LR is stale: stored 0x%"PRIX32" read 0x%"PRIX32"\n",
+               current->arch.gic.v2.lr[lr], lrv);
       lr_reg->pirq = (lrv >> GICH_V2_LR_PHYSICAL_SHIFT) &
GICH_V2_LR_PHYSICAL_MASK;
       lr_reg->virq = (lrv >> GICH_V2_LR_VIRTUAL_SHIFT) &
GICH_V2_LR_VIRTUAL_MASK;
       lr_reg->priority = (lrv >> GICH_V2_LR_PRIORITY_SHIFT) &
GICH_V2_LR_PRIORITY_MASK;
@@ -481,6 +494,7 @@ static void gicv2_write_lr(int lr, const struct
gic_lr *lr_reg)
             ((uint32_t)(lr_reg->grp & GICH_V2_LR_GRP_MASK) <<
GICH_V2_LR_GRP_SHIFT) );

       writel_gich(lrv, GICH_LR + lr * 4);
+    current->arch.gic.v2.lr[lr] = lrv;
   }

   static void gicv2_hcr_status(uint32_t flag, bool status)
@@ -1232,6 +1246,7 @@ const static struct gic_hw_operations gicv2_ops = {
       .info                = &gicv2_info,
       .init                = gicv2_init,
       .secondary_init      = gicv2_secondary_cpu_init,
+    .store_lrs           = gicv2_store_lrs,
       .save_state          = gicv2_save_state,
       .restore_state       = gicv2_restore_state,
       .dump_state          = gicv2_dump_state,
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index ed363f6..a05c518 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -594,6 +594,12 @@ static void gic_update_one_lr(struct vcpu *v, int i)
       }
   }

+void gic_store_lrs(void)
+{
+    if(gic_hw_ops->store_lrs)
+        gic_hw_ops->store_lrs();
+}
+
   void gic_clear_lrs(struct vcpu *v)
   {
       int i = 0;
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index f6f6de3..985192b 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2095,6 +2095,7 @@ static void enter_hypervisor_head(struct
cpu_user_regs *regs)
           if ( current->arch.hcr_el2 & HCR_VA )
               current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);

+        gic_store_lrs();

gic_clear_lrs(...) may rewrite the LRs. To confirm this stall LRs are not due by this function, can you move that after gic_clear_lrs()?
Right you are, gic_clear_lrs(), in 4.10 codebase, rewrites LRs. But it does not change PENDING to INVALID under no circumstances from one hand. From another hand, all changes to LRs are made through gic specific operations gic_hw_ops->... which are tracked by me. You can see, in the code above, that I copy all updates to the physical LR issued by hypervisor into the stored LRs. So it not the case. But I'll check on Monday.


           gic_clear_lrs(current);
       }
   }
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index d3d7bda..6fe3fdb 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -293,6 +293,7 @@ extern unsigned int gic_number_lines(void);
   /* IRQ translation function for the device tree */
   int gic_irq_xlate(const u32 *intspec, unsigned int intsize,
                     unsigned int *out_hwirq, unsigned int *out_type);
+void gic_store_lrs(void);
   void gic_clear_lrs(struct vcpu *v);

   struct gic_info {
@@ -314,6 +315,8 @@ struct gic_hw_operations {
       /* Initialize the GIC and the boot CPU */
       int (*init)(void);
       /* Save GIC registers */
+    void (*store_lrs)(void);
+    /* Save GIC registers */
       void (*save_state)(struct vcpu *);
       /* Restore GIC registers */
       void (*restore_state)(const struct vcpu *);

Cheers,


[1] https://github.com/aanisov/xen/commit/b1c28be38d30eb460ed0339526c4987b2cf1708c
[2] https://github.com/glmark2/glmark2
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hello Julien,<br>
</p>
<br>
<div class="moz-cite-prefix">On 10/27/2018 12:55 AM, Julien Grall wrote:<br>
</div>
<blockquote type="cite" cite="mid:22f7ebc8-c74b-0d8e-4847-9d3df9bcf5db@arm.com">The functions below does not exist in latest Xen. So are you sure this based on mainline?
<br>
</blockquote>
<br>
Yep, I've put a wrong diff into the email, it is for XEN 4.10.<br>
Please find the patch for XEN 4.12-unstable on my github [1].<br>
<br>
<blockquote type="cite" cite="mid:22f7ebc8-c74b-0d8e-4847-9d3df9bcf5db@arm.com">How many domain do you have run at that time?</blockquote>
Only one domain. In my setup I'm running a BSP release under XEN as Dom0.<br>
<br>
<blockquote type="cite" cite="mid:22f7ebc8-c74b-0d8e-4847-9d3df9bcf5db@arm.com">Under which condition this is hapenning?
<br>
</blockquote>
Under no specific conditions. During the system boot, during console operations, during glmark2 [2] running.<br>
<br>
<blockquote type="cite" cite="mid:22f7ebc8-c74b-0d8e-4847-9d3df9bcf5db@arm.com">Also, what is your Xen command line?</blockquote>
<span style="color: rgb(0, 0, 0); font-family: Arial; font-size: 13px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">`dom0_mem=1G
 console=dtuart dtuart=serial0 dom0_max_vcpus=4 bootscrub=0 loglvl=all cpufreq=none`</span><br>
<br>
<blockquote type="cite" cite="mid:22f7ebc8-c74b-0d8e-4847-9d3df9bcf5db@arm.com">I assume you didn't disable serrors, right? So we should have plenty of dsb(sy) and isb() in the path to there.<br>
</blockquote>
I didn't disable serrors. <br>
<br>
<blockquote type="cite" cite="mid:22f7ebc8-c74b-0d8e-4847-9d3df9bcf5db@arm.com">So I would rule out a missing barrier here.
<br>
</blockquote>
<br>
<br>
<blockquote type="cite" cite="mid:22f7ebc8-c74b-0d8e-4847-9d3df9bcf5db@arm.com"><br>
<blockquote type="cite"><br>
<br>
Those prints I treat as LR content change (state PENDING-&gt;INVALID) <br>
without exiting from hyp to guest. Unfortunately, I did not find a kind <br>
of explanation to this effect in ARM IHI 0048B.b document. <br>
I have GICv2 on the die. <br>
<br>
I appreciate you find some time to look at this and express your opinion. <br>
<br>
<br>
--- <br>
<br>
diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c <br>
index 511c8d7..0b9aa2d 100644 <br>
--- a/xen/arch/arm/gic-v2.c <br>
&#43;&#43;&#43; b/xen/arch/arm/gic-v2.c <br>
@@ -171,6 &#43;171,14 @@ static unsigned int gicv2_cpu_mask(const cpumask_t <br>
*cpumask) <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; return mask; <br>
&nbsp; &nbsp;} <br>
<br>
&#43;static void gicv2_store_lrs(void) <br>
&#43;{ <br>
&#43;&nbsp;&nbsp;&nbsp; int i; <br>
&#43; <br>
&#43;&nbsp;&nbsp;&nbsp; for ( i = 0; i &lt; gicv2_info.nr_lrs; i&#43;&#43; ) <br>
&#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; current-&gt;arch.gic.v2.lr[i] = readl_gich(GICH_LR &#43; i * 4); <br>
&#43;} <br>
&#43; <br>
&nbsp; &nbsp;static void gicv2_save_state(struct vcpu *v) <br>
&nbsp; &nbsp;{ <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; int i; <br>
@@ -446,11 &#43;454,13 @@ static void gicv2_update_lr(int lr, const struct <br>
pending_irq *p, <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;&lt; GICH_V2_LR_PHYSICAL_SHIFT); <br>
<br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; writel_gich(lr_reg, GICH_LR &#43; lr * 4); <br>
&#43;&nbsp;&nbsp;&nbsp; current-&gt;arch.gic.v2.lr[lr] = lr_reg; <br>
&nbsp; &nbsp;} <br>
<br>
&nbsp; &nbsp;static void gicv2_clear_lr(int lr) <br>
&nbsp; &nbsp;{ <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; writel_gich(0, GICH_LR &#43; lr * 4); <br>
&#43;&nbsp;&nbsp;&nbsp; current-&gt;arch.gic.v2.lr[lr] = 0; <br>
&nbsp; &nbsp;} <br>
<br>
&nbsp; &nbsp;static void gicv2_read_lr(int lr, struct gic_lr *lr_reg) <br>
@@ -458,6 &#43;468,9 @@ static void gicv2_read_lr(int lr, struct gic_lr *lr_reg) <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; uint32_t lrv; <br>
<br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; lrv&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = readl_gich(GICH_LR &#43; lr * 4); <br>
&#43;&nbsp;&nbsp;&nbsp; if ( lrv != current-&gt;arch.gic.v2.lr[lr]) <br>
&#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printk(&quot;Stored LR is stale: stored 0x%&quot;PRIX32&quot; read 0x%&quot;PRIX32&quot;\n&quot;, <br>
&#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; current-&gt;arch.gic.v2.lr[lr], lrv); <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; lr_reg-&gt;pirq = (lrv &gt;&gt; GICH_V2_LR_PHYSICAL_SHIFT) &amp; <br>
GICH_V2_LR_PHYSICAL_MASK; <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; lr_reg-&gt;virq = (lrv &gt;&gt; GICH_V2_LR_VIRTUAL_SHIFT) &amp; <br>
GICH_V2_LR_VIRTUAL_MASK; <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; lr_reg-&gt;priority = (lrv &gt;&gt; GICH_V2_LR_PRIORITY_SHIFT) &amp; <br>
GICH_V2_LR_PRIORITY_MASK; <br>
@@ -481,6 &#43;494,7 @@ static void gicv2_write_lr(int lr, const struct <br>
gic_lr *lr_reg) <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ((uint32_t)(lr_reg-&gt;grp &amp; GICH_V2_LR_GRP_MASK) &lt;&lt; <br>
GICH_V2_LR_GRP_SHIFT) ); <br>
<br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; writel_gich(lrv, GICH_LR &#43; lr * 4); <br>
&#43;&nbsp;&nbsp;&nbsp; current-&gt;arch.gic.v2.lr[lr] = lrv; <br>
&nbsp; &nbsp;} <br>
<br>
&nbsp; &nbsp;static void gicv2_hcr_status(uint32_t flag, bool status) <br>
@@ -1232,6 &#43;1246,7 @@ const static struct gic_hw_operations gicv2_ops = { <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; .info&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = &amp;gicv2_info, <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; .init&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = gicv2_init, <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; .secondary_init&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = gicv2_secondary_cpu_init, <br>
&#43;&nbsp;&nbsp;&nbsp; .store_lrs&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = gicv2_store_lrs, <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; .save_state&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = gicv2_save_state, <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; .restore_state&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = gicv2_restore_state, <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; .dump_state&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = gicv2_dump_state, <br>
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c <br>
index ed363f6..a05c518 100644 <br>
--- a/xen/arch/arm/gic.c <br>
&#43;&#43;&#43; b/xen/arch/arm/gic.c <br>
@@ -594,6 &#43;594,12 @@ static void gic_update_one_lr(struct vcpu *v, int i) <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; } <br>
&nbsp; &nbsp;} <br>
<br>
&#43;void gic_store_lrs(void) <br>
&#43;{ <br>
&#43;&nbsp;&nbsp;&nbsp; if(gic_hw_ops-&gt;store_lrs) <br>
&#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gic_hw_ops-&gt;store_lrs(); <br>
&#43;} <br>
&#43; <br>
&nbsp; &nbsp;void gic_clear_lrs(struct vcpu *v) <br>
&nbsp; &nbsp;{ <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; int i = 0; <br>
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c <br>
index f6f6de3..985192b 100644 <br>
--- a/xen/arch/arm/traps.c <br>
&#43;&#43;&#43; b/xen/arch/arm/traps.c <br>
@@ -2095,6 &#43;2095,7 @@ static void enter_hypervisor_head(struct <br>
cpu_user_regs *regs) <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if ( current-&gt;arch.hcr_el2 &amp; HCR_VA ) <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; current-&gt;arch.hcr_el2 = READ_SYSREG(HCR_EL2); <br>
<br>
&#43;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gic_store_lrs(); <br>
</blockquote>
<br>
gic_clear_lrs(...) may rewrite the LRs. To confirm this stall LRs are not due by this function, can you move that after gic_clear_lrs()?
<br>
</blockquote>
Right you are, gic_clear_lrs(), in 4.10 codebase, rewrites LRs. But it does not change PENDING to INVALID under no circumstances from one hand. From another hand, all changes to LRs are made through gic specific operations gic_hw_ops-&gt;... which are tracked
 by me. You can see, in the code above, that I copy all updates to the physical LR issued by hypervisor into the stored LRs. So it not the case. But I'll check on Monday.<br>
<br>
<blockquote type="cite" cite="mid:22f7ebc8-c74b-0d8e-4847-9d3df9bcf5db@arm.com"><br>
<blockquote type="cite">&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gic_clear_lrs(current); <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; } <br>
&nbsp; &nbsp;} <br>
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h <br>
index d3d7bda..6fe3fdb 100644 <br>
--- a/xen/include/asm-arm/gic.h <br>
&#43;&#43;&#43; b/xen/include/asm-arm/gic.h <br>
@@ -293,6 &#43;293,7 @@ extern unsigned int gic_number_lines(void); <br>
&nbsp; &nbsp;/* IRQ translation function for the device tree */ <br>
&nbsp; &nbsp;int gic_irq_xlate(const u32 *intspec, unsigned int intsize, <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; unsigned int *out_hwirq, unsigned int *out_type); <br>
&#43;void gic_store_lrs(void); <br>
&nbsp; &nbsp;void gic_clear_lrs(struct vcpu *v); <br>
<br>
&nbsp; &nbsp;struct gic_info { <br>
@@ -314,6 &#43;315,8 @@ struct gic_hw_operations { <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; /* Initialize the GIC and the boot CPU */ <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; int (*init)(void); <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; /* Save GIC registers */ <br>
&#43;&nbsp;&nbsp;&nbsp; void (*store_lrs)(void); <br>
&#43;&nbsp;&nbsp;&nbsp; /* Save GIC registers */ <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; void (*save_state)(struct vcpu *); <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; /* Restore GIC registers */ <br>
&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; void (*restore_state)(const struct vcpu *); <br>
</blockquote>
<br>
Cheers, <br>
<br>
</blockquote>
<br>
[1] <a class="moz-txt-link-freetext" href="https://github.com/aanisov/xen/commit/b1c28be38d30eb460ed0339526c4987b2cf1708c">
https://github.com/aanisov/xen/commit/b1c28be38d30eb460ed0339526c4987b2cf1708c</a><br>
[2] <a class="moz-txt-link-freetext" href="https://github.com/glmark2/glmark2">https://github.com/glmark2/glmark2</a><br>
<br>
</body>
</html>
Julien Grall Oct. 28, 2018, 11:46 a.m. UTC | #10
On 10/27/18 1:14 PM, Andrii Anisov wrote:
> Hello Julien,
> 
> 
> On 10/27/2018 12:55 AM, Julien Grall wrote:
>> The functions below does not exist in latest Xen. So are you sure this 
>> based on mainline?
> 
> Yep, I've put a wrong diff into the email, it is for XEN 4.10.
> Please find the patch for XEN 4.12-unstable on my github [1].
> 
>> How many domain do you have run at that time?
> Only one domain. In my setup I'm running a BSP release under XEN as Dom0.
> 
>> Under which condition this is hapenning?
> Under no specific conditions. During the system boot, during console 
> operations, during glmark2 [2] running.

So the glmark2 is running in Dom0, am I correct?

> 
>> Also, what is your Xen command line?
> `dom0_mem=1G console=dtuart dtuart=serial0 dom0_max_vcpus=4 bootscrub=0 
> loglvl=all cpufreq=none`
> 
>> I assume you didn't disable serrors, right? So we should have plenty 
>> of dsb(sy) and isb() in the path to there.
> I didn't disable serrors.
> 
>> So I would rule out a missing barrier here.
> 
> 
>>
>>>
>>>
>>> Those prints I treat as LR content change (state PENDING->INVALID)
>>> without exiting from hyp to guest. Unfortunately, I did not find a kind
>>> of explanation to this effect in ARM IHI 0048B.b document.
>>> I have GICv2 on the die.
>>>
>>> I appreciate you find some time to look at this and express your 
>>> opinion.
>>>
>>>
>>> ---
>>>
>>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>>> index 511c8d7..0b9aa2d 100644
>>> --- a/xen/arch/arm/gic-v2.c
>>> +++ b/xen/arch/arm/gic-v2.c
>>> @@ -171,6 +171,14 @@ static unsigned int gicv2_cpu_mask(const cpumask_t
>>> *cpumask)
>>>        return mask;
>>>    }
>>>
>>> +static void gicv2_store_lrs(void)
>>> +{
>>> +    int i;
>>> +
>>> +    for ( i = 0; i < gicv2_info.nr_lrs; i++ )
>>> +        current->arch.gic.v2.lr[i] = readl_gich(GICH_LR + i * 4);
>>> +}
>>> +
>>>    static void gicv2_save_state(struct vcpu *v)
>>>    {
>>>        int i;
>>> @@ -446,11 +454,13 @@ static void gicv2_update_lr(int lr, const struct
>>> pending_irq *p,
>>>                                       << GICH_V2_LR_PHYSICAL_SHIFT);
>>>
>>>        writel_gich(lr_reg, GICH_LR + lr * 4);
>>> +    current->arch.gic.v2.lr[lr] = lr_reg;
>>>    }
>>>
>>>    static void gicv2_clear_lr(int lr)
>>>    {
>>>        writel_gich(0, GICH_LR + lr * 4);
>>> +    current->arch.gic.v2.lr[lr] = 0;
>>>    }
>>>
>>>    static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
>>> @@ -458,6 +468,9 @@ static void gicv2_read_lr(int lr, struct gic_lr 
>>> *lr_reg)
>>>        uint32_t lrv;
>>>
>>>        lrv          = readl_gich(GICH_LR + lr * 4);
>>> +    if ( lrv != current->arch.gic.v2.lr[lr])
>>> +        printk("Stored LR is stale: stored 0x%"PRIX32" read 
>>> 0x%"PRIX32"\n",
>>> +               current->arch.gic.v2.lr[lr], lrv);
>>>        lr_reg->pirq = (lrv >> GICH_V2_LR_PHYSICAL_SHIFT) &
>>> GICH_V2_LR_PHYSICAL_MASK;
>>>        lr_reg->virq = (lrv >> GICH_V2_LR_VIRTUAL_SHIFT) &
>>> GICH_V2_LR_VIRTUAL_MASK;
>>>        lr_reg->priority = (lrv >> GICH_V2_LR_PRIORITY_SHIFT) &
>>> GICH_V2_LR_PRIORITY_MASK;
>>> @@ -481,6 +494,7 @@ static void gicv2_write_lr(int lr, const struct
>>> gic_lr *lr_reg)
>>>              ((uint32_t)(lr_reg->grp & GICH_V2_LR_GRP_MASK) <<
>>> GICH_V2_LR_GRP_SHIFT) );
>>>
>>>        writel_gich(lrv, GICH_LR + lr * 4);
>>> +    current->arch.gic.v2.lr[lr] = lrv;
>>>    }
>>>
>>>    static void gicv2_hcr_status(uint32_t flag, bool status)
>>> @@ -1232,6 +1246,7 @@ const static struct gic_hw_operations gicv2_ops 
>>> = {
>>>        .info                = &gicv2_info,
>>>        .init                = gicv2_init,
>>>        .secondary_init      = gicv2_secondary_cpu_init,
>>> +    .store_lrs           = gicv2_store_lrs,
>>>        .save_state          = gicv2_save_state,
>>>        .restore_state       = gicv2_restore_state,
>>>        .dump_state          = gicv2_dump_state,
>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> index ed363f6..a05c518 100644
>>> --- a/xen/arch/arm/gic.c
>>> +++ b/xen/arch/arm/gic.c
>>> @@ -594,6 +594,12 @@ static void gic_update_one_lr(struct vcpu *v, 
>>> int i)
>>>        }
>>>    }
>>>
>>> +void gic_store_lrs(void)
>>> +{
>>> +    if(gic_hw_ops->store_lrs)
>>> +        gic_hw_ops->store_lrs();
>>> +}
>>> +
>>>    void gic_clear_lrs(struct vcpu *v)
>>>    {
>>>        int i = 0;
>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>> index f6f6de3..985192b 100644
>>> --- a/xen/arch/arm/traps.c
>>> +++ b/xen/arch/arm/traps.c
>>> @@ -2095,6 +2095,7 @@ static void enter_hypervisor_head(struct
>>> cpu_user_regs *regs)
>>>            if ( current->arch.hcr_el2 & HCR_VA )
>>>                current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
>>>
>>> +        gic_store_lrs();
>>
>> gic_clear_lrs(...) may rewrite the LRs. To confirm this stall LRs are 
>> not due by this function, can you move that after gic_clear_lrs()?
> Right you are, gic_clear_lrs(), in 4.10 codebase, rewrites LRs. But it 
> does not change PENDING to INVALID under no circumstances from one hand. 
> From another hand, all changes to LRs are made through gic specific 
> operations gic_hw_ops->... which are tracked by me. You can see, in the 
> code above, that I copy all updates to the physical LR issued by 
> hypervisor into the stored LRs. So it not the case. But I'll check on 
> Monday.

Ah, I missed the part where you updated the LRs.

While the interrupt exception path does not re-enable interrupts early. 
Other traps may do it.

On those trap, you would have the following path:
	1) Save GP registers
	2) Enable interrupts
	3) Store lrs

It is possible to receive an interrupts right after 2) and before 3). 
Because the current vGIC is touching the LRs directly (not the case on 
the new one). You may then end up with the information of the previous 
store.

Could you investigate how the guest has exited (i.e sync vs interrupt)?

Cheers,
Andrii Anisov Oct. 29, 2018, 10:06 a.m. UTC | #11
Hello Julien,


Sorry for the previous email sent as html.


On 27.10.18 15:14, Andrii Anisov wrote:
>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>> index f6f6de3..985192b 100644
>>> --- a/xen/arch/arm/traps.c
>>> +++ b/xen/arch/arm/traps.c
>>> @@ -2095,6 +2095,7 @@ static void enter_hypervisor_head(struct
>>> cpu_user_regs *regs)
>>>            if ( current->arch.hcr_el2 & HCR_VA )
>>>                current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
>>>
>>> +        gic_store_lrs();
>>
>> gic_clear_lrs(...) may rewrite the LRs. To confirm this stall LRs are 
>> not due by this function, can you move that after gic_clear_lrs()?
> Right you are, gic_clear_lrs(), in 4.10 codebase, rewrites LRs. But it 
> does not change PENDING to INVALID under no circumstances from one 
> hand. From another hand, all changes to LRs are made through gic 
> specific operations gic_hw_ops->... which are tracked by me. You can 
> see, in the code above, that I copy all updates to the physical LR 
> issued by hypervisor into the stored LRs. So it not the case. But I'll 
> check on Monday.

In 4.12-unstable code base I moved gic_store_lrs() after vgic_sync_from_lrs()  and see significant changes.  Now stale lines are printed at very high rate, and it is the proper behavior. Because the correspondent check (performed when vgic_sync_from_lrs() reads LRs) detects that VM processes interrupts and LR values are changed comparing to those set by hypervisor lately.

So now it is the question, why could I detect spurious changes in LRs without exiting to VM?
Julien Grall Oct. 29, 2018, 1:36 p.m. UTC | #12
On 29/10/2018 10:06, Andrii Anisov wrote:
> Hello Julien,

Hi,

> 
> Sorry for the previous email sent as html.

Don't worry. I didn't notice it :).

> 
> 
> On 27.10.18 15:14, Andrii Anisov wrote:
>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index
>>>> f6f6de3..985192b 100644 --- a/xen/arch/arm/traps.c +++
>>>> b/xen/arch/arm/traps.c @@ -2095,6 +2095,7 @@ static void
>>>> enter_hypervisor_head(struct cpu_user_regs *regs) if (
>>>> current->arch.hcr_el2 & HCR_VA ) current->arch.hcr_el2 =
>>>> READ_SYSREG(HCR_EL2);
>>>> 
>>>> +        gic_store_lrs();
>>> 
>>> gic_clear_lrs(...) may rewrite the LRs. To confirm this stall LRs
>>> are not due by this function, can you move that after
>>> gic_clear_lrs()?
>> Right you are, gic_clear_lrs(), in 4.10 codebase, rewrites LRs. But
>> it does not change PENDING to INVALID under no circumstances from
>> one hand. From another hand, all changes to LRs are made through
>> gic specific operations gic_hw_ops->... which are tracked by me.
>> You can see, in the code above, that I copy all updates to the
>> physical LR issued by hypervisor into the stored LRs. So it not the
>> case. But I'll check on Monday.
> 
> In 4.12-unstable code base I moved gic_store_lrs() after
> vgic_sync_from_lrs()  and see significant changes.  Now stale lines
> are printed at very high rate, and it is the proper behavior. Because
> the correspondent check (performed when vgic_sync_from_lrs() reads
> LRs) detects that VM processes interrupts and LR values are changed
> comparing to those set by hypervisor lately.
> 
> So now it is the question, why could I detect spurious changes in LRs
> without exiting to VM?

I wrote down an answer yesterday (sent it today) to your previous
answer. You may use the LRs information from the previous guest trap as
interrupts are re-enabled before storing the LRs.

Can you try the patch below?

commit 11e360b93be81a58a41832d714f33f797ad312a9
Author: Julien Grall <julien.grall@arm.com>
Date:   Mon Oct 29 13:32:56 2018 +0000

     xen/arm: Re-enable interrupt later in the trap path

     Signed-off-by: Julien Grall <julien.grall@arm.com>

diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
index 97b05f53ea..8f287891b6 100644
--- a/xen/arch/arm/arm64/entry.S
+++ b/xen/arch/arm/arm64/entry.S
@@ -195,7 +195,6 @@ hyp_error_invalid:

  hyp_error:
          entry   hyp=1
-        msr     daifclr, #2
          mov     x0, sp
          bl      do_trap_hyp_serror
          exit    hyp=1
@@ -203,7 +202,7 @@ hyp_error:
  /* Traps taken in Current EL with SP_ELx */
  hyp_sync:
          entry   hyp=1
-        msr     daifclr, #6
+        msr     daifclr, #4
          mov     x0, sp
          bl      do_trap_hyp_sync
          exit    hyp=1
@@ -304,7 +303,7 @@ guest_sync_slowpath:
          ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
                      "nop; nop",
                      SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
-        msr     daifclr, #6
+        msr     daifclr, #4
          mov     x0, sp
          bl      do_trap_guest_sync
  1:
@@ -332,7 +331,7 @@ guest_fiq_invalid:

  guest_error:
          entry   hyp=0, compat=0
-        msr     daifclr, #6
+        msr     daifclr, #4
          mov     x0, sp
          bl      do_trap_guest_serror
          exit    hyp=0, compat=0
@@ -347,7 +346,7 @@ guest_sync_compat:
          ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
                      "nop; nop",
                      SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
-        msr     daifclr, #6
+        msr     daifclr, #4
          mov     x0, sp
          bl      do_trap_guest_sync
  1:
@@ -375,7 +374,7 @@ guest_fiq_invalid_compat:

  guest_error_compat:
          entry   hyp=0, compat=1
-        msr     daifclr, #6
+        msr     daifclr, #4
          mov     x0, sp
          bl      do_trap_guest_serror
          exit    hyp=0, compat=1
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 51d2e42c77..c18f89b41f 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2039,6 +2039,8 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
      {
          struct vcpu *v = current;

+        ASSERT(!local_irq_is_enabled());
+
          /* If the guest has disabled the workaround, bring it back on. */
          if ( needs_ssbd_flip(v) )
              arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);
@@ -2073,6 +2075,7 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)
      const union hsr hsr = { .bits = regs->hsr };

      enter_hypervisor_head(regs);
+    local_irq_enable();

      switch ( hsr.ec )
      {
@@ -2208,6 +2211,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
      const union hsr hsr = { .bits = regs->hsr };

      enter_hypervisor_head(regs);
+    local_irq_enable();

      switch ( hsr.ec )
      {
@@ -2246,6 +2250,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
  void do_trap_hyp_serror(struct cpu_user_regs *regs)
  {
      enter_hypervisor_head(regs);
+    local_irq_enable();

      __do_trap_serror(regs, VABORT_GEN_BY_GUEST(regs));
  }
@@ -2253,6 +2258,7 @@ void do_trap_hyp_serror(struct cpu_user_regs *regs)
  void do_trap_guest_serror(struct cpu_user_regs *regs)
  {
      enter_hypervisor_head(regs);
+    local_irq_enable();

      __do_trap_serror(regs, true);
  }
Andrii Anisov Oct. 29, 2018, 4:16 p.m. UTC | #13
Hello Julien

On 29.10.18 15:36, Julien Grall wrote:0
> I wrote down an answer yesterday (sent it today) to your previous
> answer. You may use the LRs information from the previous guest trap as
> interrupts are re-enabled before storing the LRs.

Yes, it is the case. I've overlooked that for some exceptions interrupts 
are enabled before enter_hypervisor_head().


> Can you try the patch below?

I tried it, and it works like a charm for the issue.

The only notice here is that, I suppose, changes are needed for 
exceptions taken from a guest mode. For the exceptions taken from hyp, 
we do not massage gic in enter_hypervisor_head.
Julien Grall Oct. 29, 2018, 4:22 p.m. UTC | #14
On 29/10/2018 16:16, Andrii Anisov wrote:
> Hello Julien

Hi,
> On 29.10.18 15:36, Julien Grall wrote:0
>> I wrote down an answer yesterday (sent it today) to your previous
>> answer. You may use the LRs information from the previous guest trap as
>> interrupts are re-enabled before storing the LRs.
> 
> Yes, it is the case. I've overlooked that for some exceptions interrupts
> are enabled before enter_hypervisor_head().
> 
> 
>> Can you try the patch below?
> 
> I tried it, and it works like a charm for the issue.

What is the issue? Is it just your print or there a latent bug in current vGIC?

> 
> The only notice here is that, I suppose, changes are needed for
> exceptions taken from a guest mode. For the exceptions taken from hyp,
> we do not massage gic in enter_hypervisor_head.

I actually prefer if we re-enable interrupts after entry_hypervisor_head(). This 
makes the code working the same way everywhere so less trouble to figure out 
problem.

Cheers,
Andrii Anisov Oct. 30, 2018, 8:07 a.m. UTC | #15
Hi,


On 29.10.18 18:22, Julien Grall wrote:
> What is the issue? Is it just your print or there a latent bug in 
> current vGIC?

Ah, OK, that was the issue for me.


> I actually prefer if we re-enable interrupts after 
> entry_hypervisor_head(). This makes the code working the same way 
> everywhere so less trouble to figure out problem.

Good point.

Actually I was confused by the reported behavior, because I overlooked 
those msr's with different arguments in entry.S and assumed 
enter_hypervisor_head() always runs with irqs disabled.
Andrii Anisov Nov. 9, 2018, 2:42 p.m. UTC | #16
Hello Julien,

I just wonder, do you plan to upstream the patch below?

Andrii Anisov


On 29/10/2018 10:06, Andrii Anisov wrote:
> Hello Julien,

Hi,

> 
> Sorry for the previous email sent as html.

Don't worry. I didn't notice it :).

> 
> 
> On 27.10.18 15:14, Andrii Anisov wrote:
>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index
>>>> f6f6de3..985192b 100644 --- a/xen/arch/arm/traps.c +++
>>>> b/xen/arch/arm/traps.c @@ -2095,6 +2095,7 @@ static void
>>>> enter_hypervisor_head(struct cpu_user_regs *regs) if (
>>>> current->arch.hcr_el2 & HCR_VA ) current->arch.hcr_el2 =
>>>> READ_SYSREG(HCR_EL2);
>>>> 
>>>> +        gic_store_lrs();
>>> 
>>> gic_clear_lrs(...) may rewrite the LRs. To confirm this stall LRs
>>> are not due by this function, can you move that after
>>> gic_clear_lrs()?
>> Right you are, gic_clear_lrs(), in 4.10 codebase, rewrites LRs. But
>> it does not change PENDING to INVALID under no circumstances from
>> one hand. From another hand, all changes to LRs are made through
>> gic specific operations gic_hw_ops->... which are tracked by me.
>> You can see, in the code above, that I copy all updates to the
>> physical LR issued by hypervisor into the stored LRs. So it not the
>> case. But I'll check on Monday.
> 
> In 4.12-unstable code base I moved gic_store_lrs() after
> vgic_sync_from_lrs()  and see significant changes.  Now stale lines
> are printed at very high rate, and it is the proper behavior. Because
> the correspondent check (performed when vgic_sync_from_lrs() reads
> LRs) detects that VM processes interrupts and LR values are changed
> comparing to those set by hypervisor lately.
> 
> So now it is the question, why could I detect spurious changes in LRs
> without exiting to VM?

I wrote down an answer yesterday (sent it today) to your previous
answer. You may use the LRs information from the previous guest trap as
interrupts are re-enabled before storing the LRs.

Can you try the patch below?

commit 11e360b93be81a58a41832d714f33f797ad312a9
Author: Julien Grall <julien.grall@arm.com>
Date:   Mon Oct 29 13:32:56 2018 +0000

     xen/arm: Re-enable interrupt later in the trap path

     Signed-off-by: Julien Grall <julien.grall@arm.com>

diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
index 97b05f53ea..8f287891b6 100644
--- a/xen/arch/arm/arm64/entry.S
+++ b/xen/arch/arm/arm64/entry.S
@@ -195,7 +195,6 @@ hyp_error_invalid:

  hyp_error:
          entry   hyp=1
-        msr     daifclr, #2
          mov     x0, sp
          bl      do_trap_hyp_serror
          exit    hyp=1
@@ -203,7 +202,7 @@ hyp_error:
  /* Traps taken in Current EL with SP_ELx */
  hyp_sync:
          entry   hyp=1
-        msr     daifclr, #6
+        msr     daifclr, #4
          mov     x0, sp
          bl      do_trap_hyp_sync
          exit    hyp=1
@@ -304,7 +303,7 @@ guest_sync_slowpath:
          ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
                      "nop; nop",
                      SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
-        msr     daifclr, #6
+        msr     daifclr, #4
          mov     x0, sp
          bl      do_trap_guest_sync
  1:
@@ -332,7 +331,7 @@ guest_fiq_invalid:

  guest_error:
          entry   hyp=0, compat=0
-        msr     daifclr, #6
+        msr     daifclr, #4
          mov     x0, sp
          bl      do_trap_guest_serror
          exit    hyp=0, compat=0
@@ -347,7 +346,7 @@ guest_sync_compat:
          ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
                      "nop; nop",
                      SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
-        msr     daifclr, #6
+        msr     daifclr, #4
          mov     x0, sp
          bl      do_trap_guest_sync
  1:
@@ -375,7 +374,7 @@ guest_fiq_invalid_compat:

  guest_error_compat:
          entry   hyp=0, compat=1
-        msr     daifclr, #6
+        msr     daifclr, #4
          mov     x0, sp
          bl      do_trap_guest_serror
          exit    hyp=0, compat=1
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 51d2e42c77..c18f89b41f 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2039,6 +2039,8 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
      {
          struct vcpu *v = current;

+        ASSERT(!local_irq_is_enabled());
+
          /* If the guest has disabled the workaround, bring it back on. */
          if ( needs_ssbd_flip(v) )
              arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);
@@ -2073,6 +2075,7 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)
      const union hsr hsr = { .bits = regs->hsr };

      enter_hypervisor_head(regs);
+    local_irq_enable();

      switch ( hsr.ec )
      {
@@ -2208,6 +2211,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
      const union hsr hsr = { .bits = regs->hsr };

      enter_hypervisor_head(regs);
+    local_irq_enable();

      switch ( hsr.ec )
      {
@@ -2246,6 +2250,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
  void do_trap_hyp_serror(struct cpu_user_regs *regs)
  {
      enter_hypervisor_head(regs);
+    local_irq_enable();

      __do_trap_serror(regs, VABORT_GEN_BY_GUEST(regs));
  }
@@ -2253,6 +2258,7 @@ void do_trap_hyp_serror(struct cpu_user_regs *regs)
  void do_trap_guest_serror(struct cpu_user_regs *regs)
  {
      enter_hypervisor_head(regs);
+    local_irq_enable();

      __do_trap_serror(regs, true);
  }
Julien Grall Nov. 9, 2018, 5:47 p.m. UTC | #17
On 09/11/2018 14:42, Andrii Anisov wrote:
> Hello Julien,

Hi Andrii,

> 
> I just wonder, do you plan to upstream the patch below?

I don't plan to upstream the patch below. Andre and I discussed about it 
extensively and haven't found potential issue with the 2 vGIC implementation we 
have in Xen.

Cheers,
Stefano Stabellini Nov. 9, 2018, 11:02 p.m. UTC | #18
On Tue, 23 Oct 2018, Julien Grall wrote:
> Devices that expose their interrupt status registers via system
> registers (e.g. Statistical profiling, CPU PMU, DynamIQ PMU, arch timer,
> vgic (although unused by Linux), ...) rely on a context synchronising
> operation on the CPU to ensure that the updated status register is
> visible to the CPU when handling the interrupt. This usually happens as
> a result of taking the IRQ exception in the first place, but there are
> two race scenarios where this isn't the case.
> 
> For example, let's say we have two peripherals (X and Y), where Y uses a
> system register for its interrupt status.
> 
> Case 1:
> 1. CPU takes an IRQ exception as a result of X raising an interrupt
> 2. Y then raises its interrupt line, but the update to its system
>    register is not yet visible to the CPU
> 3. The GIC decides to expose Y's interrupt number first in the Ack
>    register
> 4. The CPU runs the IRQ handler for Y, but the status register is stale
> 
> Case 2:
> 1. CPU takes an IRQ exception as a result of X raising an interrupt
> 2. CPU reads the interrupt number for X from the Ack register and runs
>    its IRQ handler
> 3. Y raises its interrupt line and the Ack register is updated, but
>    again, the update to its system register is not yet visible to the
>    CPU.
> 4. Since the GIC drivers poll the Ack register, we read Y's interrupt
>    number and run its handler without a context synchronisation
>    operation, therefore seeing the stale register value.
> 
> In either case, we run the risk of missing an IRQ. This patch solves the
> problem by ensuring that we execute an ISB in the GIC drivers prior
> to invoking the interrupt handler.
> 
> Based on Linux commit 39a06b67c2c1256bcf2361a1f67d2529f70ab206
> "irqchip/gic: Ensure we have an ISB between ack and ->handle_irq".
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>

Acked-by: Stefano Stabellini <sstabellini@kernel.org>

> ---
>     This patch is a candidate for backporting up to Xen 4.9.
> ---
>  xen/arch/arm/gic.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 8d7e491060..305fbd66dd 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -388,12 +388,14 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>          if ( likely(irq >= 16 && irq < 1020) )
>          {
>              local_irq_enable();
> +            isb();
>              do_IRQ(regs, irq, is_fiq);
>              local_irq_disable();
>          }
>          else if ( is_lpi(irq) )
>          {
>              local_irq_enable();
> +            isb();
>              gic_hw_ops->do_LPI(irq);
>              local_irq_disable();
>          }
> -- 
> 2.11.0
>
Andrii Anisov Nov. 19, 2018, 3:54 p.m. UTC | #19
Julien,


It's me again about your patch:)

I've found this patch useful and even can give a motivation to have it 
in the mainline. The patch ensures that vgic_sync_from_lrs is performed 
on guest to hyp switch prior to any IRQ processing.

So, do you plan to push it for review? Could I do that on behalf of you?


On 09.11.18 16:42, Andrii Anisov wrote:
> Hello Julien,
>
> I just wonder, do you plan to upstream the patch below?
>
> Andrii Anisov
>
>
>
> commit 11e360b93be81a58a41832d714f33f797ad312a9
> Author: Julien Grall <julien.grall@arm.com>
> Date:   Mon Oct 29 13:32:56 2018 +0000
>
>       xen/arm: Re-enable interrupt later in the trap path
>
>       Signed-off-by: Julien Grall <julien.grall@arm.com>
>
> diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
> index 97b05f53ea..8f287891b6 100644
> --- a/xen/arch/arm/arm64/entry.S
> +++ b/xen/arch/arm/arm64/entry.S
> @@ -195,7 +195,6 @@ hyp_error_invalid:
>
>    hyp_error:
>            entry   hyp=1
> -        msr     daifclr, #2
>            mov     x0, sp
>            bl      do_trap_hyp_serror
>            exit    hyp=1
> @@ -203,7 +202,7 @@ hyp_error:
>    /* Traps taken in Current EL with SP_ELx */
>    hyp_sync:
>            entry   hyp=1
> -        msr     daifclr, #6
> +        msr     daifclr, #4
>            mov     x0, sp
>            bl      do_trap_hyp_sync
>            exit    hyp=1
> @@ -304,7 +303,7 @@ guest_sync_slowpath:
>            ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
>                        "nop; nop",
>                        SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
> -        msr     daifclr, #6
> +        msr     daifclr, #4
>            mov     x0, sp
>            bl      do_trap_guest_sync
>    1:
> @@ -332,7 +331,7 @@ guest_fiq_invalid:
>
>    guest_error:
>            entry   hyp=0, compat=0
> -        msr     daifclr, #6
> +        msr     daifclr, #4
>            mov     x0, sp
>            bl      do_trap_guest_serror
>            exit    hyp=0, compat=0
> @@ -347,7 +346,7 @@ guest_sync_compat:
>            ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
>                        "nop; nop",
>                        SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
> -        msr     daifclr, #6
> +        msr     daifclr, #4
>            mov     x0, sp
>            bl      do_trap_guest_sync
>    1:
> @@ -375,7 +374,7 @@ guest_fiq_invalid_compat:
>
>    guest_error_compat:
>            entry   hyp=0, compat=1
> -        msr     daifclr, #6
> +        msr     daifclr, #4
>            mov     x0, sp
>            bl      do_trap_guest_serror
>            exit    hyp=0, compat=1
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 51d2e42c77..c18f89b41f 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -2039,6 +2039,8 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
>        {
>            struct vcpu *v = current;
>
> +        ASSERT(!local_irq_is_enabled());
> +
>            /* If the guest has disabled the workaround, bring it back on. */
>            if ( needs_ssbd_flip(v) )
>                arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);
> @@ -2073,6 +2075,7 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)
>        const union hsr hsr = { .bits = regs->hsr };
>
>        enter_hypervisor_head(regs);
> +    local_irq_enable();
>
>        switch ( hsr.ec )
>        {
> @@ -2208,6 +2211,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
>        const union hsr hsr = { .bits = regs->hsr };
>
>        enter_hypervisor_head(regs);
> +    local_irq_enable();
>
>        switch ( hsr.ec )
>        {
> @@ -2246,6 +2250,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
>    void do_trap_hyp_serror(struct cpu_user_regs *regs)
>    {
>        enter_hypervisor_head(regs);
> +    local_irq_enable();
>
>        __do_trap_serror(regs, VABORT_GEN_BY_GUEST(regs));
>    }
> @@ -2253,6 +2258,7 @@ void do_trap_hyp_serror(struct cpu_user_regs *regs)
>    void do_trap_guest_serror(struct cpu_user_regs *regs)
>    {
>        enter_hypervisor_head(regs);
> +    local_irq_enable();
>
>        __do_trap_serror(regs, true);
>    }
>
Julien Grall Nov. 19, 2018, 4:42 p.m. UTC | #20
On Mon, 19 Nov 2018, 15:57 Andrii Anisov, <andrii.anisov@gmail.com> wrote:

> Julien,

>

>

> It's me again about your patch:)

>

> I've found this patch useful and even can give a motivation to have it

> in the mainline. The patch ensures that vgic_sync_from_lrs is performed

> on guest to hyp switch prior to any IRQ processing.

>




There are no issue about processing IRQs before the syncs. It is the same
as if an IRQ was raised from ila different pCPUs. So why do you need that?


> So, do you plan to push it for review? Could I do that on behalf of you?

>


Unless there are any bug with current code, then I don't plan to merge it.

Cheers,


>

> On 09.11.18 16:42, Andrii Anisov wrote:

> > Hello Julien,

> >

> > I just wonder, do you plan to upstream the patch below?

> >

> > Andrii Anisov

> >

> >

> >

> > commit 11e360b93be81a58a41832d714f33f797ad312a9

> > Author: Julien Grall <julien.grall@arm.com>

> > Date:   Mon Oct 29 13:32:56 2018 +0000

> >

> >       xen/arm: Re-enable interrupt later in the trap path

> >

> >       Signed-off-by: Julien Grall <julien.grall@arm.com>

> >

> > diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S

> > index 97b05f53ea..8f287891b6 100644

> > --- a/xen/arch/arm/arm64/entry.S

> > +++ b/xen/arch/arm/arm64/entry.S

> > @@ -195,7 +195,6 @@ hyp_error_invalid:

> >

> >    hyp_error:

> >            entry   hyp=1

> > -        msr     daifclr, #2

> >            mov     x0, sp

> >            bl      do_trap_hyp_serror

> >            exit    hyp=1

> > @@ -203,7 +202,7 @@ hyp_error:

> >    /* Traps taken in Current EL with SP_ELx */

> >    hyp_sync:

> >            entry   hyp=1

> > -        msr     daifclr, #6

> > +        msr     daifclr, #4

> >            mov     x0, sp

> >            bl      do_trap_hyp_sync

> >            exit    hyp=1

> > @@ -304,7 +303,7 @@ guest_sync_slowpath:

> >            ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",

> >                        "nop; nop",

> >                        SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)

> > -        msr     daifclr, #6

> > +        msr     daifclr, #4

> >            mov     x0, sp

> >            bl      do_trap_guest_sync

> >    1:

> > @@ -332,7 +331,7 @@ guest_fiq_invalid:

> >

> >    guest_error:

> >            entry   hyp=0, compat=0

> > -        msr     daifclr, #6

> > +        msr     daifclr, #4

> >            mov     x0, sp

> >            bl      do_trap_guest_serror

> >            exit    hyp=0, compat=0

> > @@ -347,7 +346,7 @@ guest_sync_compat:

> >            ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",

> >                        "nop; nop",

> >                        SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)

> > -        msr     daifclr, #6

> > +        msr     daifclr, #4

> >            mov     x0, sp

> >            bl      do_trap_guest_sync

> >    1:

> > @@ -375,7 +374,7 @@ guest_fiq_invalid_compat:

> >

> >    guest_error_compat:

> >            entry   hyp=0, compat=1

> > -        msr     daifclr, #6

> > +        msr     daifclr, #4

> >            mov     x0, sp

> >            bl      do_trap_guest_serror

> >            exit    hyp=0, compat=1

> > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c

> > index 51d2e42c77..c18f89b41f 100644

> > --- a/xen/arch/arm/traps.c

> > +++ b/xen/arch/arm/traps.c

> > @@ -2039,6 +2039,8 @@ static void enter_hypervisor_head(struct

> cpu_user_regs *regs)

> >        {

> >            struct vcpu *v = current;

> >

> > +        ASSERT(!local_irq_is_enabled());

> > +

> >            /* If the guest has disabled the workaround, bring it back

> on. */

> >            if ( needs_ssbd_flip(v) )

> >                arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1,

> NULL);

> > @@ -2073,6 +2075,7 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)

> >        const union hsr hsr = { .bits = regs->hsr };

> >

> >        enter_hypervisor_head(regs);

> > +    local_irq_enable();

> >

> >        switch ( hsr.ec )

> >        {

> > @@ -2208,6 +2211,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)

> >        const union hsr hsr = { .bits = regs->hsr };

> >

> >        enter_hypervisor_head(regs);

> > +    local_irq_enable();

> >

> >        switch ( hsr.ec )

> >        {

> > @@ -2246,6 +2250,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)

> >    void do_trap_hyp_serror(struct cpu_user_regs *regs)

> >    {

> >        enter_hypervisor_head(regs);

> > +    local_irq_enable();

> >

> >        __do_trap_serror(regs, VABORT_GEN_BY_GUEST(regs));

> >    }

> > @@ -2253,6 +2258,7 @@ void do_trap_hyp_serror(struct cpu_user_regs *regs)

> >    void do_trap_guest_serror(struct cpu_user_regs *regs)

> >    {

> >        enter_hypervisor_head(regs);

> > +    local_irq_enable();

> >

> >        __do_trap_serror(regs, true);

> >    }

> >

> --

> Sincerely,

> Andrii Anisov.

>

>

> _______________________________________________

> Xen-devel mailing list

> Xen-devel@lists.xenproject.org

> https://lists.xenproject.org/mailman/listinfo/xen-devel
<br><br><div class="gmail_quote"><div dir="ltr">On Mon, 19 Nov 2018, 15:57 Andrii Anisov, &lt;<a href="mailto:andrii.anisov@gmail.com">andrii.anisov@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Julien,<br>
<br>
<br>
It&#39;s me again about your patch:)<br>
<br>
I&#39;ve found this patch useful and even can give a motivation to have it <br>
in the mainline. The patch ensures that vgic_sync_from_lrs is performed <br>
on guest to hyp switch prior to any IRQ processing.<br></blockquote></div><div><br></div><div><br></div><div><br>There are no issue about processing IRQs before the syncs. It is the same as if an IRQ was raised from ila different pCPUs. So why do you need that?</div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
So, do you plan to push it for review? Could I do that on behalf of you?<br></blockquote></div><div><br></div><div>Unless there are any bug with current code, then I don&#39;t plan to merge it.</div><div><br></div><div>Cheers,</div><div><br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
<br>
On 09.11.18 16:42, Andrii Anisov wrote:<br>
&gt; Hello Julien,<br>
&gt;<br>
&gt; I just wonder, do you plan to upstream the patch below?<br>
&gt;<br>
&gt; Andrii Anisov<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt; commit 11e360b93be81a58a41832d714f33f797ad312a9<br>
&gt; Author: Julien Grall &lt;<a href="mailto:julien.grall@arm.com" target="_blank">julien.grall@arm.com</a>&gt;<br>
&gt; Date:   Mon Oct 29 13:32:56 2018 +0000<br>
&gt;<br>
&gt;       xen/arm: Re-enable interrupt later in the trap path<br>
&gt;<br>
&gt;       Signed-off-by: Julien Grall &lt;<a href="mailto:julien.grall@arm.com" target="_blank">julien.grall@arm.com</a>&gt;<br>
&gt;<br>
&gt; diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S<br>
&gt; index 97b05f53ea..8f287891b6 100644<br>
&gt; --- a/xen/arch/arm/arm64/entry.S<br>
&gt; +++ b/xen/arch/arm/arm64/entry.S<br>
&gt; @@ -195,7 +195,6 @@ hyp_error_invalid:<br>
&gt;<br>
&gt;    hyp_error:<br>
&gt;            entry   hyp=1<br>
&gt; -        msr     daifclr, #2<br>
&gt;            mov     x0, sp<br>
&gt;            bl      do_trap_hyp_serror<br>
&gt;            exit    hyp=1<br>
&gt; @@ -203,7 +202,7 @@ hyp_error:<br>
&gt;    /* Traps taken in Current EL with SP_ELx */<br>
&gt;    hyp_sync:<br>
&gt;            entry   hyp=1<br>
&gt; -        msr     daifclr, #6<br>
&gt; +        msr     daifclr, #4<br>
&gt;            mov     x0, sp<br>
&gt;            bl      do_trap_hyp_sync<br>
&gt;            exit    hyp=1<br>
&gt; @@ -304,7 +303,7 @@ guest_sync_slowpath:<br>
&gt;            ALTERNATIVE(&quot;bl check_pending_vserror; cbnz x0, 1f&quot;,<br>
&gt;                        &quot;nop; nop&quot;,<br>
&gt;                        SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)<br>
&gt; -        msr     daifclr, #6<br>
&gt; +        msr     daifclr, #4<br>
&gt;            mov     x0, sp<br>
&gt;            bl      do_trap_guest_sync<br>
&gt;    1:<br>
&gt; @@ -332,7 +331,7 @@ guest_fiq_invalid:<br>
&gt;<br>
&gt;    guest_error:<br>
&gt;            entry   hyp=0, compat=0<br>
&gt; -        msr     daifclr, #6<br>
&gt; +        msr     daifclr, #4<br>
&gt;            mov     x0, sp<br>
&gt;            bl      do_trap_guest_serror<br>
&gt;            exit    hyp=0, compat=0<br>
&gt; @@ -347,7 +346,7 @@ guest_sync_compat:<br>
&gt;            ALTERNATIVE(&quot;bl check_pending_vserror; cbnz x0, 1f&quot;,<br>
&gt;                        &quot;nop; nop&quot;,<br>
&gt;                        SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)<br>
&gt; -        msr     daifclr, #6<br>
&gt; +        msr     daifclr, #4<br>
&gt;            mov     x0, sp<br>
&gt;            bl      do_trap_guest_sync<br>
&gt;    1:<br>
&gt; @@ -375,7 +374,7 @@ guest_fiq_invalid_compat:<br>
&gt;<br>
&gt;    guest_error_compat:<br>
&gt;            entry   hyp=0, compat=1<br>
&gt; -        msr     daifclr, #6<br>
&gt; +        msr     daifclr, #4<br>
&gt;            mov     x0, sp<br>
&gt;            bl      do_trap_guest_serror<br>
&gt;            exit    hyp=0, compat=1<br>
&gt; diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c<br>
&gt; index 51d2e42c77..c18f89b41f 100644<br>
&gt; --- a/xen/arch/arm/traps.c<br>
&gt; +++ b/xen/arch/arm/traps.c<br>
&gt; @@ -2039,6 +2039,8 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)<br>
&gt;        {<br>
&gt;            struct vcpu *v = current;<br>
&gt;<br>
&gt; +        ASSERT(!local_irq_is_enabled());<br>
&gt; +<br>
&gt;            /* If the guest has disabled the workaround, bring it back on. */<br>
&gt;            if ( needs_ssbd_flip(v) )<br>
&gt;                arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);<br>
&gt; @@ -2073,6 +2075,7 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)<br>
&gt;        const union hsr hsr = { .bits = regs-&gt;hsr };<br>
&gt;<br>
&gt;        enter_hypervisor_head(regs);<br>
&gt; +    local_irq_enable();<br>
&gt;<br>
&gt;        switch ( <a href="http://hsr.ec" rel="noreferrer" target="_blank">hsr.ec</a> )<br>
&gt;        {<br>
&gt; @@ -2208,6 +2211,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)<br>
&gt;        const union hsr hsr = { .bits = regs-&gt;hsr };<br>
&gt;<br>
&gt;        enter_hypervisor_head(regs);<br>
&gt; +    local_irq_enable();<br>
&gt;<br>
&gt;        switch ( <a href="http://hsr.ec" rel="noreferrer" target="_blank">hsr.ec</a> )<br>
&gt;        {<br>
&gt; @@ -2246,6 +2250,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)<br>
&gt;    void do_trap_hyp_serror(struct cpu_user_regs *regs)<br>
&gt;    {<br>
&gt;        enter_hypervisor_head(regs);<br>
&gt; +    local_irq_enable();<br>
&gt;<br>
&gt;        __do_trap_serror(regs, VABORT_GEN_BY_GUEST(regs));<br>
&gt;    }<br>
&gt; @@ -2253,6 +2258,7 @@ void do_trap_hyp_serror(struct cpu_user_regs *regs)<br>
&gt;    void do_trap_guest_serror(struct cpu_user_regs *regs)<br>
&gt;    {<br>
&gt;        enter_hypervisor_head(regs);<br>
&gt; +    local_irq_enable();<br>
&gt;<br>
&gt;        __do_trap_serror(regs, true);<br>
&gt;    }<br>
&gt;<br>
-- <br>
Sincerely,<br>
Andrii Anisov.<br>
<br>
<br>
_______________________________________________<br>
Xen-devel mailing list<br>
<a href="mailto:Xen-devel@lists.xenproject.org" target="_blank">Xen-devel@lists.xenproject.org</a><br>
<a href="https://lists.xenproject.org/mailman/listinfo/xen-devel" rel="noreferrer" target="_blank">https://lists.xenproject.org/mailman/listinfo/xen-devel</a></blockquote></div>
Andrii Anisov Nov. 20, 2018, 6:10 p.m. UTC | #21
Hello Julien,


On 19.11.18 18:42, Julien Grall wrote:
> There are no issue about processing IRQs before the syncs. It is the 
> same as if an IRQ was raised from ila different pCPUs.
> So why do you need that?

 From my understanding of gic-vgic code (old vgic), for the IRQs 
targeting the `current` vcpu, it leads to a faster processing under 
interrupts storm conditions. If it was all LRs set on previous switch to 
a guest, a the IRQ will have a chance to go directly to LR instead of 
setting on lr_pending queue. Also inflight_irqs queue have a chance to 
be shorter to insert.


Moreover, maybe you can explain me, what's the point of interrupts 
enabling before for `do_IRQ()` call? Those interrupts would be grabbed 
and processed anyway, during run through the loop in `gic_interrupt()`. 
So I see the only outcome of interrupts enabling - spending more time 
for context switches.
Julien Grall Nov. 20, 2018, 6:47 p.m. UTC | #22
On 20/11/2018 18:10, Andrii Anisov wrote:
> Hello Julien,
> 
> 
> On 19.11.18 18:42, Julien Grall wrote:
>> There are no issue about processing IRQs before the syncs. It is the same as 
>> if an IRQ was raised from ila different pCPUs.
>> So why do you need that?
> 
>  From my understanding of gic-vgic code (old vgic), for the IRQs targeting the 
> `current` vcpu, it leads to a faster processing under interrupts storm 
> conditions. If it was all LRs set on previous switch to a guest, a the IRQ will 
> have a chance to go directly to LR instead of setting on lr_pending queue. Also 
> inflight_irqs queue have a chance to be shorter to insert.

Do you have actual numbers? Also to be on the same page, what is your definition 
of interrupts storm?

Bear in mind that the old vGIC will be phased out soon. If you are worried about 
performance, then I would recommend to try the new vGIC and see whether it improves.

> Moreover, maybe you can explain me, what's the point of interrupts enabling 
> before for `do_IRQ()` call? Those interrupts would be grabbed and processed 
> anyway, during run through the loop in `gic_interrupt()`. So I see the only 
> outcome of interrupts enabling - spending more time for context switches.

Well, if you re-enable the interrupts you give a chance for higher priority 
interrupts to come up. This will not happen if you have interrupts disabled.

But you seem to base your assumption on interrupts storm (yet to be defined). If 
you have an interrupt storm, then you are already doomed as your guest/Xen will 
not have time to do any other work.

In any case, you need to provide number to support your optimization.

Cheers,
Andrii Anisov Nov. 22, 2018, 4:51 p.m. UTC | #23
Hello Julien,

On 20.11.18 20:47, Julien Grall wrote:
> 
> 
> On 20/11/2018 18:10, Andrii Anisov wrote:
>> Hello Julien,
>>
>>
>> On 19.11.18 18:42, Julien Grall wrote:
>>> There are no issue about processing IRQs before the syncs. It is the 
>>> same as if an IRQ was raised from ila different pCPUs.
>>> So why do you need that?
>>
>>  From my understanding of gic-vgic code (old vgic), for the IRQs 
>> targeting the `current` vcpu, it leads to a faster processing under 
>> interrupts storm conditions. If it was all LRs set on previous switch 
>> to a guest, a the IRQ will have a chance to go directly to LR instead 
>> of setting on lr_pending queue. Also inflight_irqs queue have a chance 
>> to be shorter to insert.
> 
> Do you have actual numbers?
Unfortunately, my numbers are pretty indirect. I'm referring glmark2 
benchmark results. With this and the rest of my changes (not yet 
published), I can cut out another percent or two of performance drop due 
to XEN existence in the system. BTW, that's why I recently asked Stefano 
about his approach of interrupt latency measurement.

On my board that benchmark processing causes at least 4 different HW 
interrupts issuing with different frequency. Adding the reschedule IRQ 
makes the system tend to not fit all IRQs into 4 LRs available in my 
GIC. Moreover, the benchmark does not emit a network traffic or disk 
usage during the run. So real life cases will add more concurrent IRQs.

> Also to be on the same page, what is your 
> definition of interrupts storm?
I mean the system takes different interrupts (more IRQ sources than LRs 
available) with a relatively high rate. Let's say more than 7000 
interrupts per second. It's not very big number, but close to what I see 
on my desk.

> Bear in mind that the old vGIC will be phased out soon.
As I remember a new vgic experimental yet. Do not support GIC-v3 yet.

> If you are 
> worried about performance, then I would recommend to try the new vGIC 
> and see whether it improves.
You know, we are based on XEN 4.10. Initially, when a customer said 
about their dissatisfaction about performance drop in benchmark due to 
XEN existence, I tried 4.12-unstable, both an old and a new VGIC. So 
performance with 4.12-unstable with the old VGIC was worse than 4.10, 
and the new VGIC made things even much worse. I can't remember the exact 
numbers or proportions, but that was the reason we do not offer 
upgrading XEN yet.

> Well, if you re-enable the interrupts you give a chance for higher 
> priority interrupts to come up. This will not happen if you have 
> interrupts disabled.
I understand the theory, but can not match it with the current XEN code.
Guest interrupts prioritization within do_IRQ is pretty meaningless. 
They will go through the same path. And an effect would not be seen 
before exiting to a guest.
The PPI interrupts are reflected into the processing of soft IRQs or 
injecting an IRQ into queues. So it does not matter much when exactly we 
do read the IRQ from IAR in a gic_interrupt loop. I suppose it should be 
faster to loop through gic_interrupt at once, collecting all interrupts, 
without going through exception path, then switch to soft IRQs 
processing in leave_hypervisor_tail.
The only thing which might get a noticeable effect here is serving 
GIC_SGI_CALL_FUNCTION, which is executed right away from `gic_interrupt`.

> But you seem to base your assumption on interrupts storm (yet to be 
> defined). If you have an interrupt storm, then you are already doomed as 
> your guest/Xen will not have time to do any other work.

> 
> In any case, you need to provide number to support your optimization.I'm moving all my patches to current staging and would send them as RFC 
with a description of why is it done and how I measured results.
Julien Grall Nov. 22, 2018, 5:22 p.m. UTC | #24
On 11/22/18 4:51 PM, Andrii Anisov wrote:
> Hello Julien,

Hi Andrii,


> On 20.11.18 20:47, Julien Grall wrote:
>>
>>
>> On 20/11/2018 18:10, Andrii Anisov wrote:
>>> Hello Julien,
>>>
>>>
>>> On 19.11.18 18:42, Julien Grall wrote:
>>>> There are no issue about processing IRQs before the syncs. It is the 
>>>> same as if an IRQ was raised from ila different pCPUs.
>>>> So why do you need that?
>>>
>>>  From my understanding of gic-vgic code (old vgic), for the IRQs 
>>> targeting the `current` vcpu, it leads to a faster processing under 
>>> interrupts storm conditions. If it was all LRs set on previous switch 
>>> to a guest, a the IRQ will have a chance to go directly to LR instead 
>>> of setting on lr_pending queue. Also inflight_irqs queue have a 
>>> chance to be shorter to insert.
>>
>> Do you have actual numbers?
> Unfortunately, my numbers are pretty indirect. I'm referring glmark2 
> benchmark results. With this and the rest of my changes (not yet 
> published), I can cut out another percent or two of performance drop due 
> to XEN existence in the system. BTW, that's why I recently asked Stefano 
> about his approach of interrupt latency measurement.

My biggest worry is you are doing optimization on a vGIC that is not 
fully compliant with how a GIC should behave (e.g edge vs level) and 
with very fragile locking. If you are interested, Andre can provides 
more details.

> 
> On my board that benchmark processing causes at least 4 different HW 
> interrupts issuing with different frequency. Adding the reschedule IRQ 
> makes the system tend to not fit all IRQs into 4 LRs available in my 
> GIC. Moreover, the benchmark does not emit a network traffic or disk 
> usage during the run. So real life cases will add more concurrent IRQs.
> 
>> Also to be on the same page, what is your definition of interrupts storm?
> I mean the system takes different interrupts (more IRQ sources than LRs 
> available) with a relatively high rate. Let's say more than 7000 
> interrupts per second. It's not very big number, but close to what I see 
> on my desk.
> 
>> Bear in mind that the old vGIC will be phased out soon.
> As I remember a new vgic experimental yet. Do not support GIC-v3 yet.
> 
>> If you are worried about performance, then I would recommend to try 
>> the new vGIC and see whether it improves.
> You know, we are based on XEN 4.10. Initially, when a customer said 
> about their dissatisfaction about performance drop in benchmark due to 
> XEN existence, I tried 4.12-unstable, both an old and a new VGIC. So 
> performance with 4.12-unstable with the old VGIC was worse than 4.10, 
> and the new VGIC made things even much worse. I can't remember the exact 
> numbers or proportions, but that was the reason we do not offer 
> upgrading XEN yet.

I can't comment without any numbers here. Bear in mind that we fixed 
bugs in Xen 4.12 (including spectre/meltdown and missing barriers) that 
wasn't backported to Xen 4.10. It is entirely possible that it 
introduced slowness but it also ensure the code is behaving correctly.

Anyway, if there are performance regression we should investigate them 
and discuss how we can address/limit them. Similarly for the new vGIC, 
if you think it is too slow, then we need to know why before we get rid 
of the old vGIC.

> 
>> Well, if you re-enable the interrupts you give a chance for higher 
>> priority interrupts to come up. This will not happen if you have 
>> interrupts disabled.
> I understand the theory, but can not match it with the current XEN code.
> Guest interrupts prioritization within do_IRQ is pretty meaningless. 

There are no guest prioritization at the moment. However, we may want to 
introduce it to give priority to one guest over.

Cheers,
Andre Przywara Nov. 22, 2018, 6:04 p.m. UTC | #25
On Thu, 22 Nov 2018 18:51:13 +0200
Andrii Anisov <andrii.anisov@gmail.com> wrote:

Hi,

> On 20.11.18 20:47, Julien Grall wrote:
> > 
> > 
> > On 20/11/2018 18:10, Andrii Anisov wrote:  
> >> Hello Julien,
> >>
> >>
> >> On 19.11.18 18:42, Julien Grall wrote:  
> >>> There are no issue about processing IRQs before the syncs. It is
> >>> the same as if an IRQ was raised from ila different pCPUs.
> >>> So why do you need that?  
> >>
> >>  From my understanding of gic-vgic code (old vgic), for the IRQs 
> >> targeting the `current` vcpu, it leads to a faster processing
> >> under interrupts storm conditions. If it was all LRs set on
> >> previous switch to a guest, a the IRQ will have a chance to go
> >> directly to LR instead of setting on lr_pending queue. Also
> >> inflight_irqs queue have a chance to be shorter to insert.  
> > 
> > Do you have actual numbers?  
> Unfortunately, my numbers are pretty indirect. I'm referring glmark2 
> benchmark results. With this and the rest of my changes (not yet 
> published), I can cut out another percent or two of performance drop
> due to XEN existence in the system. BTW, that's why I recently asked
> Stefano about his approach of interrupt latency measurement.
> 
> On my board that benchmark processing causes at least 4 different HW 
> interrupts issuing with different frequency.

Is that benchmark chosen to put some interrupt load on the system? Or
is that what the customer actually uses and she is really suffering
from Xen's interrupt handling and emulation?
If you chose this benchmark because it tends to trigger a lot of
interrupts and you hope to estimate some other interrupt property with
this (for instance interrupt latency or typical LR utilisation), then
you might be disappointed. This seems to go into the direction of an
interrupt storm, where we really don't care about performance, but just
want to  make sure we keep running and ideally don't penalise other
guests.

> Adding the reschedule
> IRQ makes the system tend to not fit all IRQs into 4 LRs available in
> my GIC. Moreover, the benchmark does not emit a network traffic or
> disk usage during the run. So real life cases will add more
> concurrent IRQs.

That's rather odd. Are you sure you actually have all LRs used? What is
your guest system? Linux? Can you check whether you use EOI mode 1 in
the guest ("GIC: Using split EOI/Deactivate mode" in dmesg, right after
"NR_IRQS: xx, nr_irqs: xx, preallocated irqs: 0")?

Typically Linux EOIs an IRQ very quickly, and with EOI mode 0 you just
have *one* active IRQ. So the other interrupts would be pending, which
means your guest is busy with handling interrupts. Which points to
other problems and might not be a valid benchmark.
Also: what is the exact problem with exceeding the number of hardware
LRs? If you have a high interrupt load (more than three or four
interrupts pending at the same time), chances are you exit anyway quite
often, in which case the LRs get updated. I don't see the huge
advantage of being able to present 8 pending IRQs to the guest.

> > Also to be on the same page, what is your 
> > definition of interrupts storm?  
> I mean the system takes different interrupts (more IRQ sources than
> LRs available) with a relatively high rate. Let's say more than 7000 
> interrupts per second. It's not very big number, but close to what I
> see on my desk.

The frequency of the interrupt (n per second) should be unrelated to
the number of IRQs being presented simultaneously to the guest.
Typically I would assume you have zero interrupts normally, because
your guest is doing actual work (running the OS or userland program).
Then you handle the occasional interrupt (1 LR in active state), and
the timer IRQ fires during this. This lets the guest exit, and the
second LR gets used with the injected pending timer IRQ. Now every now
and then an IPI might also be injected from another VCPU at the same
time, which brings the count up to 3. But all of the time the guest
still handles this first interrupt. And chances are the interrupt
handler sooner or later triggers an (MMIO) exit, at which case the
number of LR becomes irrelevant. A high number of LRs would only be
interesting if you are able to process all those interrupts without a
single exit, typically this is only true for the virtual arch timer IRQ.

Also keep in mind that some instructions here and there in the
interrupt handling path in Xen might not be relevant if you exit the
guest frequently (due to interrupts, for instance). The cost of an exit
will probably dwarf the cost of adding a struct pending_irq to a linked
list or so.

Cheers,
Andre.

> > Bear in mind that the old vGIC will be phased out soon.  
> As I remember a new vgic experimental yet. Do not support GIC-v3 yet.
> 
> > If you are 
> > worried about performance, then I would recommend to try the new
> > vGIC and see whether it improves.  
> You know, we are based on XEN 4.10. Initially, when a customer said 
> about their dissatisfaction about performance drop in benchmark due
> to XEN existence, I tried 4.12-unstable, both an old and a new VGIC.
> So performance with 4.12-unstable with the old VGIC was worse than
> 4.10, and the new VGIC made things even much worse. I can't remember
> the exact numbers or proportions, but that was the reason we do not
> offer upgrading XEN yet.
> 
> > Well, if you re-enable the interrupts you give a chance for higher 
> > priority interrupts to come up. This will not happen if you have 
> > interrupts disabled.  
> I understand the theory, but can not match it with the current XEN
> code. Guest interrupts prioritization within do_IRQ is pretty
> meaningless. They will go through the same path. And an effect would
> not be seen before exiting to a guest.
> The PPI interrupts are reflected into the processing of soft IRQs or 
> injecting an IRQ into queues. So it does not matter much when exactly
> we do read the IRQ from IAR in a gic_interrupt loop. I suppose it
> should be faster to loop through gic_interrupt at once, collecting
> all interrupts, without going through exception path, then switch to
> soft IRQs processing in leave_hypervisor_tail.
> The only thing which might get a noticeable effect here is serving 
> GIC_SGI_CALL_FUNCTION, which is executed right away from
> `gic_interrupt`.
> 
> > But you seem to base your assumption on interrupts storm (yet to be 
> > defined). If you have an interrupt storm, then you are already
> > doomed as your guest/Xen will not have time to do any other work.  
> 
> > 
> > In any case, you need to provide number to support your
> > optimization.I'm moving all my patches to current staging and would
> > send them as RFC   
> with a description of why is it done and how I measured results.
>
Andrii Anisov Nov. 23, 2018, 10:09 a.m. UTC | #26
Hello Julien,

On 22.11.18 19:22, Julien Grall wrote:
> My biggest worry is you are doing optimization on a vGIC that is not 
> fully compliant with how a GIC should behave (e.g edge vs level) and 
> with very fragile locking.
Yep, old VGIC locking looks pretty terrible.

> If you are interested, Andre can provides more details.
Being honest, I'm not fully understand edge vs level problem there. It 
would be good to get better view on it.

> I can't comment without any numbers here. Bear in mind that we fixed 
> bugs in Xen 4.12 (including spectre/meltdown and missing barriers) that 
> wasn't backported to Xen 4.10. It is entirely possible that it 
> introduced slowness but it also ensure the code is behaving correctly.
For sure, I know that. It was rather a political decision.

> Anyway, if there are performance regression we should investigate them 
> and discuss how we can address/limit them. Similarly for the new vGIC, 
> if you think it is too slow, then we need to know why before we get rid 
> of the old vGIC.
Yep.

> There are no guest prioritization at the moment. However, we may want to 
> introduce it to give priority to one guest over.
But still, processing of IRQs by a hypervisor (e.g. moving them from gic 
to vgic for guests IRQs) has a higher priority over any guest execution.
Andre Przywara Nov. 23, 2018, 12:18 p.m. UTC | #27
On Fri, 23 Nov 2018 12:09:41 +0200
Andrii Anisov <andrii.anisov@gmail.com> wrote:

Hi,

> On 22.11.18 19:22, Julien Grall wrote:
> > My biggest worry is you are doing optimization on a vGIC that is
> > not fully compliant with how a GIC should behave (e.g edge vs
> > level) and with very fragile locking.  
> Yep, old VGIC locking looks pretty terrible.
> 
> > If you are interested, Andre can provides more details.  
> Being honest, I'm not fully understand edge vs level problem there.
> It would be good to get better view on it.

Fundamentally there is a semantic difference between edge and level
triggered IRQs:
When the guest has handled an *edge* IRQ (EOIed so the LR's state goes
to 0), this is done and dusted, and Xen doesn't need to care
about this anymore until the next IRQ occurs.

For level triggered IRQs, even though the guest has handled it, we need
to resample the (potentially virtual) IRQ line, as it may come up or
down at the *device*'s discretion: the interrupt reason might have gone
away (GPIO condition no longer true), even before we were able to
inject it, or there might be another interrupt reason not yet handled
(incoming serial character while serving a transmit interrupt). Also
typically it's up to the interrupt handler to confirm handling the
interrupt, either explicitly by clearing an interrupt bit in some
status register or implicitly, for instance by draining a FIFO, say on
a serial device. So even though from the (V)GIC's point of view the
interrupt has been processed (EOIed), it might still be pending.

My intimate "old Xen VGIC" knowledge has been swapped out from my brain
meanwhile, but IIRC Xen treats every IRQ as if it would be an edge IRQ.
Which works if the guest's interrupt handler behaves correctly. Most
IRQ handlers have a loop to iterate over all possible interrupt
reasons and process them, so the line goes indeed down before they EOI
the IRQ.

> > I can't comment without any numbers here. Bear in mind that we
> > fixed bugs in Xen 4.12 (including spectre/meltdown and missing
> > barriers) that wasn't backported to Xen 4.10. It is entirely
> > possible that it introduced slowness but it also ensure the code is
> > behaving correctly.
> For sure, I know that. It was rather a political decision.

Translating "applying a fix for a serious security issue" to "political
decision" is a rather, ummh, interesting way of seeing things ;-)

Cheers,
Andre.

> > Anyway, if there are performance regression we should investigate
> > them and discuss how we can address/limit them. Similarly for the
> > new vGIC, if you think it is too slow, then we need to know why
> > before we get rid of the old vGIC.  
> Yep.
> 
> > There are no guest prioritization at the moment. However, we may
> > want to introduce it to give priority to one guest over.  
> But still, processing of IRQs by a hypervisor (e.g. moving them from
> gic to vgic for guests IRQs) has a higher priority over any guest
> execution.
>
Andrii Anisov Nov. 23, 2018, 12:58 p.m. UTC | #28
Hello Andre,

On 22.11.18 20:04, Andre Przywara wrote:
> 
> Is that benchmark chosen to put some interrupt load on the system? Or
> is that what the customer actually uses and she is really suffering
> from Xen's interrupt handling and emulation?
That it the 3D benchmark used by customer to compare virtualization 
solutions from different providers.

> If you chose this benchmark because it tends to trigger a lot of
> interrupts and you hope to estimate some other interrupt property with
> this (for instance interrupt latency or typical LR utilisation), then
> you might be disappointed. This seems to go into the direction of an
> interrupt storm, where we really don't care about performance, but just
> want to  make sure we keep running and ideally don't penalise other
> guests.
Well, that benchmark itself is rather interrupts oriented (on our HW). 
It emits GPU load, so causes very low CPU load, but lot of intrerupts 
from GPU, video subsytem, display subsystem. I know about the WFI/WFE 
problem and `vwfi=native`. But we can't use it, because our system is 
overcommitted.

> 
>> Adding the reschedule
>> IRQ makes the system tend to not fit all IRQs into 4 LRs available in
>> my GIC. Moreover, the benchmark does not emit a network traffic or
>> disk usage during the run. So real life cases will add more
>> concurrent IRQs.
> 
> That's rather odd. Are you sure you actually have all LRs used?
I have to recheck. 7

> What is your guest system? Linux?
Yep, LK 4.14.35

> Can you check whether you use EOI mode 1 in
> the guest ("GIC: Using split EOI/Deactivate mode" in dmesg, right after
> "NR_IRQS: xx, nr_irqs: xx, preallocated irqs: 0")?
I didn't find such a print in dmesg:
	[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
	[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
	[    0.000000] arch_timer: cp15 timer(s) running at 8.33MHz (virt).


> The frequency of the interrupt (n per second) should be unrelated to
> the number of IRQs being presented simultaneously to the guest.
Actually yes. What could matter here is that we have 4 concurrent HW 
interrupt sources involved in 3D processing.

> Typically I would assume you have zero interrupts normally, because
> your guest is doing actual work (running the OS or userland program).
> Then you handle the occasional interrupt (1 LR in active state), and
> the timer IRQ fires during this. This lets the guest exit, and the
> second LR gets used with the injected pending timer IRQ. Now every now
> and then an IPI might also be injected from another VCPU at the same
> time, which brings the count up to 3. But all of the time the guest
> still handles this first interrupt. And chances are the interrupt
> handler sooner or later triggers an (MMIO) exit, at which case the
> number of LR becomes irrelevant. A high number of LRs would only be
> interesting if you are able to process all those interrupts without a
> single exit, typically this is only true for the virtual arch timer IRQ.
I need some time to sort it out.

> Also keep in mind that some instructions here and there in the
> interrupt handling path in Xen might not be relevant if you exit the
> guest frequently (due to interrupts, for instance). The cost of an exit
> will probably dwarf the cost of adding a struct pending_irq to a linked
> list or so.
It is clear. As we discussed internally, even making IRQ path shorter, 
we may experience the benchmark results drop due to the fact that we are 
doing more context switches from guest instead of collecting those 
interrupts directly from hyp.
Andrii Anisov Nov. 23, 2018, 1:04 p.m. UTC | #29
On 23.11.18 14:18, Andre Przywara wrote:
> Fundamentally there is a semantic difference between edge and level
> triggered IRQs:
> When the guest has handled an *edge* IRQ (EOIed so the LR's state goes
> to 0), this is done and dusted, and Xen doesn't need to care
> about this anymore until the next IRQ occurs.
> 
> For level triggered IRQs, even though the guest has handled it, we need
> to resample the (potentially virtual) IRQ line, as it may come up or
> down at the *device*'s discretion: the interrupt reason might have gone
> away (GPIO condition no longer true), even before we were able to
> inject it, or there might be another interrupt reason not yet handled
> (incoming serial character while serving a transmit interrupt). Also
> typically it's up to the interrupt handler to confirm handling the
> interrupt, either explicitly by clearing an interrupt bit in some
> status register or implicitly, for instance by draining a FIFO, say on
> a serial device. So even though from the (V)GIC's point of view the
> interrupt has been processed (EOIed), it might still be pending.
> 
> My intimate "old Xen VGIC" knowledge has been swapped out from my brain
> meanwhile, but IIRC Xen treats every IRQ as if it would be an edge IRQ.
> Which works if the guest's interrupt handler behaves correctly. Most
> IRQ handlers have a loop to iterate over all possible interrupt
> reasons and process them, so the line goes indeed down before they EOI
> the IRQ.
Thank you for the explanation. I'll read it carefully.
Julien Grall Nov. 23, 2018, 1:27 p.m. UTC | #30
On 23/11/2018 12:58, Andrii Anisov wrote:
> Hello Andre,
> 
> On 22.11.18 20:04, Andre Przywara wrote:
>>
>> Is that benchmark chosen to put some interrupt load on the system? Or
>> is that what the customer actually uses and she is really suffering
>> from Xen's interrupt handling and emulation?
> That it the 3D benchmark used by customer to compare virtualization solutions 
> from different providers.
> 
>> If you chose this benchmark because it tends to trigger a lot of
>> interrupts and you hope to estimate some other interrupt property with
>> this (for instance interrupt latency or typical LR utilisation), then
>> you might be disappointed. This seems to go into the direction of an
>> interrupt storm, where we really don't care about performance, but just
>> want to  make sure we keep running and ideally don't penalise other
>> guests.
> Well, that benchmark itself is rather interrupts oriented (on our HW). It emits 
> GPU load, so causes very low CPU load, but lot of intrerupts from GPU, video 
> subsytem, display subsystem. I know about the WFI/WFE problem and `vwfi=native`. 

The WFI/WFE problem is mostly because of the context switch. It likely can be 
optimized to reduce the overhead.

> But we can't use it, because our system is overcommitted.

Can you describe how overcommitted your system is?

>> Also keep in mind that some instructions here and there in the
>> interrupt handling path in Xen might not be relevant if you exit the
>> guest frequently (due to interrupts, for instance). The cost of an exit
>> will probably dwarf the cost of adding a struct pending_irq to a linked
>> list or so.
> It is clear. As we discussed internally, even making IRQ path shorter, we may 
> experience the benchmark results drop due to the fact that we are doing more 
> context switches from guest instead of collecting those interrupts directly from 
> hyp.

I don't understand what you mean. Can you details it?

Cheers,
Andrii Anisov Nov. 27, 2018, 1:30 p.m. UTC | #31
Hello Julien,

On 23.11.18 15:27, Julien Grall wrote:
>> But we can't use it, because our system is overcommitted.

> 

> Can you describe how overcommitted your system is?

I did mean that we have a requirement to not limit VCPU number with PCPU 
number. Also we should not use cpu pinning. I guess I used a wrong word, 
it must be "oversubscribed".

> I don't understand what you mean. Can you details it?

I hope you do not mind I draw a picture for this. I failed to find a 
simple wording for it :(
At some rate of IRQs from different sources, slight reducing of IRQ 
processing time in the hypervisor might result in an overly slower IRQs 
processing.

So, on the picture attaches, in case2 IRQ processing path in XEN made 
shorter. But when we have IRQ1 and IRQ2 asserted at some rate, we result 
in two context switches and going through the IRQ processing path twice. 
What is longer than catching the IRQ2 right away with IRQ1 in XEN itself.
We come to this idea after observing a strange effect: dropping a bit of 
code from IRQ processing path (i.e. several if's) led to the benchmark 
shown us a bit smaller numbers.

-- 
Sincerely,
Andrii Anisov.
Julien Grall Nov. 27, 2018, 3:13 p.m. UTC | #32
On 11/27/18 1:30 PM, Andrii Anisov wrote:
> Hello Julien,

Hi Andrii,

> 
> On 23.11.18 15:27, Julien Grall wrote:
>>> But we can't use it, because our system is overcommitted.
>>
>> Can you describe how overcommitted your system is?
> I did mean that we have a requirement to not limit VCPU number with PCPU 
> number. Also we should not use cpu pinning. I guess I used a wrong word, 
> it must be "oversubscribed".

Oversubscribing is usually a pretty bad idea if you want to have good 
latency. There are no promise when the vCPU will get scheduled.

Furthermore, if you want to have more vCPU than pCPU, then you still 
need to make sure that the total amount of vCPU usage is always lower 
(or equal) than the total capacity of your pCPUs.

So can you describe how oversubscribed your platform is when doing the 
benchmark?

> 
>> I don't understand what you mean. Can you details it?
> I hope you do not mind I draw a picture for this. I failed to find a 
> simple wording for it :(
> At some rate of IRQs from different sources, slight reducing of IRQ 
> processing time in the hypervisor might result in an overly slower IRQs 
> processing.
> 
> So, on the picture attaches, in case2 IRQ processing path in XEN made 
> shorter. But when we have IRQ1 and IRQ2 asserted at some rate, we result 
> in two context switches and going through the IRQ processing path twice. 
> What is longer than catching the IRQ2 right away with IRQ1 in XEN itself.
> We come to this idea after observing a strange effect: dropping a bit of 
> code from IRQ processing path (i.e. several if's) led to the benchmark 
> shown us a bit smaller numbers.

I think I now understand your problem. The problem is not because of 
re-enabling the interrupt. Instead, it is because the GIC CPU priority 
is been dropped using EOI early (via desc->handler->end()). As soon as 
you drop the priority another interrupt with the same (or lower) 
priority can fire.

Looking at do_IRQ, we don't handle the same way guest IRQ and Xen IRQ.

The steps for Xen IRQ is roughly:
	-> read_irq
	-> local_irq_enable
	-> do_IRQ
	   -> local_irq_enable (via spin_unlock_irq)
	   -> call handlers
	   -> local_irq_disable (via spin_lock_irq)
	   -> EOI
	   -> DIR
	-> local_irq_disable

The steps of for Guest IRQ is roughly:
	-> read_irq
	-> local_irq_enable
	-> do_IRQ
		-> EOI
		-> vgic_inject_irq
			-> local_irq_disable  (via spin_lock_irqsave)
			-> local_irq_enable   (via spin_lock_irqrestore)
	-> local_irq_disable

All vgic_inject_irq is pretty much running with interrupts disabled. The 
Xen IRQ path seem to continue pointless enable/disable.

So I think the following steps should suit you.

Xen IRQ:
	-> read_irq
	-> do_IRQ
		-> local_irq_enable (via spin_unlock_irq)
		-> call handlers
		-> local_irq_disable (via spin_lock_irq)
		-> EOI
		-> DIR
Guest IRQ:
	-> read_irq
	-> local_irq_enable
	-> do_IRQ
		-> EOI
		-> vgic_inject_irq

SGIs seems to be handled with IRQ disabled, so no change there. For 
LPIs, we might want to do the same (needs some investigation).

Cheers,
Stefano Stabellini Nov. 28, 2018, 12:30 a.m. UTC | #33
On Thu, 22 Nov 2018, Julien Grall wrote:
> > > If you are worried about performance, then I would recommend to try the
> > > new vGIC and see whether it improves.
> > You know, we are based on XEN 4.10. Initially, when a customer said about
> > their dissatisfaction about performance drop in benchmark due to XEN
> > existence, I tried 4.12-unstable, both an old and a new VGIC. So performance
> > with 4.12-unstable with the old VGIC was worse than 4.10, and the new VGIC
> > made things even much worse. I can't remember the exact numbers or
> > proportions, but that was the reason we do not offer upgrading XEN yet.
> 
> I can't comment without any numbers here. Bear in mind that we fixed bugs in
> Xen 4.12 (including spectre/meltdown and missing barriers) that wasn't
> backported to Xen 4.10. It is entirely possible that it introduced slowness
> but it also ensure the code is behaving correctly.
> 
> Anyway, if there are performance regression we should investigate them and
> discuss how we can address/limit them. Similarly for the new vGIC, if you
> think it is too slow, then we need to know why before we get rid of the old
> vGIC.

Well said! We care about interrupt performance very much and we
definitely need to address any regressions with either the old or the
new driver. But to do that, we need reliable numbers and to figure out
exactly what the problem is so that we can fix it.

I sent the way I used to measure performance in a separate email.
Andrii Anisov Nov. 30, 2018, 7:52 p.m. UTC | #34
Hello Andre,

Please see my comments below:

On 23.11.18 14:18, Andre Przywara wrote:
> Fundamentally there is a semantic difference between edge and level 
> triggered IRQs: When the guest has handled an *edge* IRQ (EOIed so
> the LR's state goes to 0), this is done and dusted, and Xen doesn't
> need to care about this anymore until the next IRQ occurs.> For level
> triggered IRQs, even though the guest has handled it, we need to
> resample the (potentially virtual) IRQ line, as it may come up or 
> down at the *device*'s discretion: the interrupt reason might have
> gone away (GPIO condition no longer true), even before we were able
> to inject it, or there might be another interrupt reason not yet
> handled (incoming serial character while serving a transmit
> interrupt). Also typically it's up to the interrupt handler to
> confirm handling the interrupt, either explicitly by clearing an
> interrupt bit in some status register or implicitly, for instance by
> draining a FIFO, say on a serial device. So even though from the
> (V)GIC's point of view the interrupt has been processed (EOIed), it
> might still be pending.
So, as I understand the intended behavior of a vGIC for the level
interrupt is following cases:
1. in case the interrupt line is still active from HW side, but
    interrupt handler from VM EOIs the interrupt, it should
    be signaled to vCPU by vGIC again
2. in case a peripheral deactivated interrupt line, but VM did not
    activated it yet, this interrupt should be removed from pending for
    VM

IMO, case 1 is indirectly supported by old vgic. For HW interrupts its 
pretty natural: deactivation by VM in VGIC leads to deactivation in GIC, 
so the interrupt priority is restored and GIC will trap PCPU to reinsert 
it. This will be seen by VM as immediate IRQ trap after EOI. Also
Case 2 is not implemented in the old vgic. It is somehow supported by 
new vgic, though it also relies on the trap to the hypervisor to commit 
the update to LRs. But it's rather a problem of GIC arch/implementation, 
which does not signal CPU nor updates associated LR on level interrupt 
deassertion.

> My intimate "old Xen VGIC" knowledge has been swapped out from my
> brain meanwhile, but IIRC Xen treats every IRQ as if it would be an
> edge IRQ. Which works if the guest's interrupt handler behaves
> correctly. Most IRQ handlers have a loop to iterate over all possible
> interrupt reasons and process them, so the line goes indeed down
> before they EOI the IRQ.

I've spent some time to look through the new vgic implementation and I 
have a note about it:
It's not clear why are you probing level interrupts on guest->hyp 
transition. While it targets case 2 described above, it seems to be more 
relevant to probe the level interrupts right before hyp->guest 
transition. Because vcpu might be descheduled and while it hangs on 
scheduler queues interrupt line level has more chances to be changed by 
peripheral itself.
Also I'm pretty scared of new vgic locking scheme with per-irq locks and 
locking logic i.e. in vgic_queue_irq_unlock() function. Also sorting 
list in vgic_flush_lr_state() with vgic_irq_cmp() looks very expensive.

But, for sure all that stuff performance should be properly evaluated 
and measured.
Andrii Anisov Dec. 3, 2018, 12:08 p.m. UTC | #35
Hello Julien,

On 27.11.18 17:13, Julien Grall wrote:
> Oversubscribing is usually a pretty bad idea if you want to have good 
> latency. There are no promise when the vCPU will get scheduled.
Yes, I know and clearly understand that. But we still have requirements.

> So can you describe how oversubscribed your platform is when doing the 
> benchmark?
On the setup a customer runs, its about 8 vCPU per 4 pCPUs. Under the 
benchmark conditions - system is quite idle.

> I think I now understand your problem. The problem is not because of 
> re-enabling the interrupt. Instead, it is because the GIC CPU priority 
> is been dropped using EOI early (via desc->handler->end()). As soon as 
> you drop the priority another interrupt with the same (or lower) 
> priority can fire.
I understand that.

> Looking at do_IRQ, we don't handle the same way guest IRQ and Xen IRQ.
Yep.

> The steps for Xen IRQ is roughly:
>      -> read_irq
>      -> local_irq_enable
>      -> do_IRQ
>         -> local_irq_enable (via spin_unlock_irq)
>         -> call handlers
>         -> local_irq_disable (via spin_lock_irq)
>         -> EOI
>         -> DIR
>      -> local_irq_disable
> 
> The steps of for Guest IRQ is roughly:
>      -> read_irq
>      -> local_irq_enable
>      -> do_IRQ
>          -> EOI
So here we might have lower priority interrupt fired, getting us back to 
`hyp_irq()` to run `do_trap_irq()` again.

>          -> vgic_inject_irq
>              -> local_irq_disable  (via spin_lock_irqsave)
>              -> local_irq_enable   (via spin_lock_irqrestore)
>      -> local_irq_disable
> 
> All vgic_inject_irq is pretty much running with interrupts disabled. The 
> Xen IRQ path seem to continue pointless enable/disable.
> 
> So I think the following steps should suit you.
Well, do you state they do not suit mainline?

> Xen IRQ:
>      -> read_irq
>      -> do_IRQ
>          -> local_irq_enable (via spin_unlock_irq)
I suppose, isb() should be moved here from `do_trap_irq()` as well
>          -> call handlers
>          -> local_irq_disable (via spin_lock_irq)
>          -> EOI
>          -> DIR
> Guest IRQ:
>      -> read_irq
>      -> local_irq_enable
As I understand, the line above should not be there.
>      -> do_IRQ
>          -> EOI
>          -> vgic_inject_irq
> 
> SGIs seems to be handled with IRQ disabled, so no change there. For 
> LPIs, we might want to do the same (needs some investigation).
Julien Grall Dec. 3, 2018, 12:17 p.m. UTC | #36
Hi Andrii,

On 03/12/2018 12:08, Andrii Anisov wrote:
> On 27.11.18 17:13, Julien Grall wrote:
>> The steps for Xen IRQ is roughly:
>>      -> read_irq
>>      -> local_irq_enable
>>      -> do_IRQ
>>         -> local_irq_enable (via spin_unlock_irq)
>>         -> call handlers
>>         -> local_irq_disable (via spin_lock_irq)
>>         -> EOI
>>         -> DIR
>>      -> local_irq_disable
>>
>> The steps of for Guest IRQ is roughly:
>>      -> read_irq
>>      -> local_irq_enable
>>      -> do_IRQ
>>          -> EOI
> So here we might have lower priority interrupt fired, getting us back to 
> `hyp_irq()` to run `do_trap_irq()` again.
> 
>>          -> vgic_inject_irq
>>              -> local_irq_disable  (via spin_lock_irqsave)
>>              -> local_irq_enable   (via spin_lock_irqrestore)
>>      -> local_irq_disable
>>
>> All vgic_inject_irq is pretty much running with interrupts disabled. The Xen 
>> IRQ path seem to continue pointless enable/disable.
>>
>> So I think the following steps should suit you.
> Well, do you state they do not suit mainline?

No. I meant that I would be happy with that and I think should also suit you.

> 
>> Xen IRQ:
>>      -> read_irq
>>      -> do_IRQ
>>          -> local_irq_enable (via spin_unlock_irq)
> I suppose, isb() should be moved here from `do_trap_irq()` as well

There are no isb() in do_trap_irq(). So did you mean gic_interrupt()?

But then, I am not sure why you want to avoid the isb() in the guest path.

Cheers,
Andrii Anisov Dec. 3, 2018, 12:58 p.m. UTC | #37
On 03.12.18 14:17, Julien Grall wrote:
> No. I meant that I would be happy with that and I think should also suit 
> you.
Great!

> There are no isb() in do_trap_irq(). So did you mean gic_interrupt()?
Right you are.

> But then, I am not sure why you want to avoid the isb() in the guest path.
Well, as I remember, and the commit message says, it is needed to get 
peripheral register to be updated before interrupt handler reads them 
for interrupt handling :)
About guest irqs, we, actually, do not handle them, just rise a 
notification to guest that it needs handling. Thus that synchronization 
is not required in a guest interrupt processing path.
Julien Grall Dec. 3, 2018, 1:10 p.m. UTC | #38
Hi Andrii,

On 03/12/2018 12:58, Andrii Anisov wrote:
> 
> On 03.12.18 14:17, Julien Grall wrote:
>> No. I meant that I would be happy with that and I think should also suit you.
> Great!
> 
>> There are no isb() in do_trap_irq(). So did you mean gic_interrupt()?
> Right you are.
> 
>> But then, I am not sure why you want to avoid the isb() in the guest path.
> Well, as I remember, and the commit message says, it is needed to get peripheral 
> register to be updated before interrupt handler reads them for interrupt 
> handling :)
> About guest irqs, we, actually, do not handle them, just rise a notification to 
> guest that it needs handling. Thus that synchronization is not required in a 
> guest interrupt processing path.

Possibly, but I would prefer to keep the isb() in the current position. It 
catches all the handlers so less risk for missing the isb() in the future. Do 
you see any performance drop?

Cheers,
Andre Przywara Dec. 3, 2018, 1:46 p.m. UTC | #39
On 30/11/2018 19:52, Andrii Anisov wrote:
> Hello Andre,
> 
> Please see my comments below:
> 
> On 23.11.18 14:18, Andre Przywara wrote:
>> Fundamentally there is a semantic difference between edge and level
>> triggered IRQs: When the guest has handled an *edge* IRQ (EOIed so
>> the LR's state goes to 0), this is done and dusted, and Xen doesn't
>> need to care about this anymore until the next IRQ occurs.> For level
>> triggered IRQs, even though the guest has handled it, we need to
>> resample the (potentially virtual) IRQ line, as it may come up or
>> down at the *device*'s discretion: the interrupt reason might have
>> gone away (GPIO condition no longer true), even before we were able
>> to inject it, or there might be another interrupt reason not yet
>> handled (incoming serial character while serving a transmit
>> interrupt). Also typically it's up to the interrupt handler to
>> confirm handling the interrupt, either explicitly by clearing an
>> interrupt bit in some status register or implicitly, for instance by
>> draining a FIFO, say on a serial device. So even though from the
>> (V)GIC's point of view the interrupt has been processed (EOIed), it
>> might still be pending.
> So, as I understand the intended behavior of a vGIC for the level
> interrupt is following cases:
> 1. in case the interrupt line is still active from HW side, but
>    interrupt handler from VM EOIs the interrupt, it should
>    be signaled to vCPU by vGIC again

yes

> 2. in case a peripheral deactivated interrupt line, but VM did not
>    activated it yet, this interrupt should be removed from pending for
>    VM

yes

> IMO, case 1 is indirectly supported by old vgic. For HW interrupts its
> pretty natural: deactivation by VM in VGIC leads to deactivation in
> GIC, so the interrupt priority is restored and GIC will trap PCPU to
> reinsert it. This will be seen by VM as immediate IRQ trap after EOI.

Yes, this is true for hardware interrupts, and this lets the old VGIC
get away with it.
Virtual devices with level interrupt semantics are a different story,
the SBSA UART has some hacks in it to support it properly.

> Also Case 2 is not implemented in the old vgic. It is somehow
> supported by new vgic, though it also relies on the trap to the
> hypervisor to commit the update to LRs.

Yes, and there is not so much we can do about it. But that's not a real
problem, as you have this problem in bare metal, too.

> But it's rather a problem of
> GIC arch/implementation, which does not signal CPU nor updates
> associated LR on level interrupt deassertion.
> 
>> My intimate "old Xen VGIC" knowledge has been swapped out from my
>> brain meanwhile, but IIRC Xen treats every IRQ as if it would be an
>> edge IRQ. Which works if the guest's interrupt handler behaves
>> correctly. Most IRQ handlers have a loop to iterate over all possible
>> interrupt reasons and process them, so the line goes indeed down
>> before they EOI the IRQ.
> 
> I've spent some time to look through the new vgic implementation and I
> have a note about it:
> It's not clear why are you probing level interrupts on guest->hyp
> transition. While it targets case 2 described above, it seems to be
> more relevant to probe the level interrupts right before hyp->guest
> transition. Because vcpu might be descheduled and while it hangs on
> scheduler queues interrupt line level has more chances to be changed
> by peripheral itself.
> Also I'm pretty scared of new vgic locking scheme with per-irq locks
> and locking logic i.e. in vgic_queue_irq_unlock() function.

Well, you should be scared of the old VGIC locking scheme instead ;-)
Apart from the vgic_queue_irq_unlock() function, the rest of the new
locking scheme is much clearer. And it scales much better, as we have
per-IRQ locks, which should virtually never be contended, and per-CPU
locks, which are very rarely contended only, as it's mostly only taken
by its own VCPU during entry and exit. There is no per-domain lock for
the emulation anymore.
I see that it's tempting to have an "easy" locking scheme, but that
typically is either one-lock-to-rule-them-all, which doesn't scale, or
doesn't cover corner cases.
You should always prefer correctness over performance, otherwise you
just fail faster ;-)

> Also sorting
> list in vgic_flush_lr_state() with vgic_irq_cmp() looks very
> expensive.

Yes, but effectively this virtually never happens, as you have rarely
more than four pending IRQs at the same time.
I had patches lying around to improve this part, just never got around
to post, especially with only little rationale.
If you are interested, I can dig them out, though I am not sure how
relevant this is.

> But, for sure all that stuff performance should be properly evaluated
> and measured.

Indeed. Can you hack something into Xen to get some statistics on those
cases? I am not sure if Xen has something to trace lock contention, but
you could easily add some counters to track how many LRs we actually
use, to see if this is actually a problem in your case.
I think PERFCOUNTER is your friend.

Cheers,
Andre.
Jürgen Groß Dec. 3, 2018, 1:53 p.m. UTC | #40
On 03/12/2018 14:46, Andre Przywara wrote:
> On 30/11/2018 19:52, Andrii Anisov wrote:
>> Hello Andre,
>>
>> Please see my comments below:
>>
>> On 23.11.18 14:18, Andre Przywara wrote:
>>> Fundamentally there is a semantic difference between edge and level
>>> triggered IRQs: When the guest has handled an *edge* IRQ (EOIed so
>>> the LR's state goes to 0), this is done and dusted, and Xen doesn't
>>> need to care about this anymore until the next IRQ occurs.> For level
>>> triggered IRQs, even though the guest has handled it, we need to
>>> resample the (potentially virtual) IRQ line, as it may come up or
>>> down at the *device*'s discretion: the interrupt reason might have
>>> gone away (GPIO condition no longer true), even before we were able
>>> to inject it, or there might be another interrupt reason not yet
>>> handled (incoming serial character while serving a transmit
>>> interrupt). Also typically it's up to the interrupt handler to
>>> confirm handling the interrupt, either explicitly by clearing an
>>> interrupt bit in some status register or implicitly, for instance by
>>> draining a FIFO, say on a serial device. So even though from the
>>> (V)GIC's point of view the interrupt has been processed (EOIed), it
>>> might still be pending.
>> So, as I understand the intended behavior of a vGIC for the level
>> interrupt is following cases:
>> 1. in case the interrupt line is still active from HW side, but
>>    interrupt handler from VM EOIs the interrupt, it should
>>    be signaled to vCPU by vGIC again
> 
> yes
> 
>> 2. in case a peripheral deactivated interrupt line, but VM did not
>>    activated it yet, this interrupt should be removed from pending for
>>    VM
> 
> yes
> 
>> IMO, case 1 is indirectly supported by old vgic. For HW interrupts its
>> pretty natural: deactivation by VM in VGIC leads to deactivation in
>> GIC, so the interrupt priority is restored and GIC will trap PCPU to
>> reinsert it. This will be seen by VM as immediate IRQ trap after EOI.
> 
> Yes, this is true for hardware interrupts, and this lets the old VGIC
> get away with it.
> Virtual devices with level interrupt semantics are a different story,
> the SBSA UART has some hacks in it to support it properly.
> 
>> Also Case 2 is not implemented in the old vgic. It is somehow
>> supported by new vgic, though it also relies on the trap to the
>> hypervisor to commit the update to LRs.
> 
> Yes, and there is not so much we can do about it. But that's not a real
> problem, as you have this problem in bare metal, too.
> 
>> But it's rather a problem of
>> GIC arch/implementation, which does not signal CPU nor updates
>> associated LR on level interrupt deassertion.
>>
>>> My intimate "old Xen VGIC" knowledge has been swapped out from my
>>> brain meanwhile, but IIRC Xen treats every IRQ as if it would be an
>>> edge IRQ. Which works if the guest's interrupt handler behaves
>>> correctly. Most IRQ handlers have a loop to iterate over all possible
>>> interrupt reasons and process them, so the line goes indeed down
>>> before they EOI the IRQ.
>>
>> I've spent some time to look through the new vgic implementation and I
>> have a note about it:
>> It's not clear why are you probing level interrupts on guest->hyp
>> transition. While it targets case 2 described above, it seems to be
>> more relevant to probe the level interrupts right before hyp->guest
>> transition. Because vcpu might be descheduled and while it hangs on
>> scheduler queues interrupt line level has more chances to be changed
>> by peripheral itself.
>> Also I'm pretty scared of new vgic locking scheme with per-irq locks
>> and locking logic i.e. in vgic_queue_irq_unlock() function.
> 
> Well, you should be scared of the old VGIC locking scheme instead ;-)
> Apart from the vgic_queue_irq_unlock() function, the rest of the new
> locking scheme is much clearer. And it scales much better, as we have
> per-IRQ locks, which should virtually never be contended, and per-CPU
> locks, which are very rarely contended only, as it's mostly only taken
> by its own VCPU during entry and exit. There is no per-domain lock for
> the emulation anymore.
> I see that it's tempting to have an "easy" locking scheme, but that
> typically is either one-lock-to-rule-them-all, which doesn't scale, or
> doesn't cover corner cases.
> You should always prefer correctness over performance, otherwise you
> just fail faster ;-)
> 
>> Also sorting
>> list in vgic_flush_lr_state() with vgic_irq_cmp() looks very
>> expensive.
> 
> Yes, but effectively this virtually never happens, as you have rarely
> more than four pending IRQs at the same time.
> I had patches lying around to improve this part, just never got around
> to post, especially with only little rationale.
> If you are interested, I can dig them out, though I am not sure how
> relevant this is.
> 
>> But, for sure all that stuff performance should be properly evaluated
>> and measured.
> 
> Indeed. Can you hack something into Xen to get some statistics on those
> cases? I am not sure if Xen has something to trace lock contention, but
> you could easily add some counters to track how many LRs we actually
> use, to see if this is actually a problem in your case.
> I think PERFCOUNTER is your friend.

CONFIG_LOCK_PROFILE and xen-lockprof on tools side?

Not sure it is still working, though. Its about 9 years since I wrote
and used it.


Juergen
Andrii Anisov Dec. 3, 2018, 2:36 p.m. UTC | #41
Hello Juergen,

On 03.12.18 15:53, Juergen Gross wrote:
> On 03/12/2018 14:46, Andre Przywara wrote:
>> I think PERFCOUNTER is your friend.
> 
> CONFIG_LOCK_PROFILE and xen-lockprof on tools side?
> 
> Not sure it is still working, though. Its about 9 years since I wrote
> and used it.
It does work. I've used it recently for this theme.
But it gave me unclear results, I did not match them with what I got 
from xentrace and rough lauterbach traces.
Andrii Anisov Dec. 3, 2018, 3:28 p.m. UTC | #42
Hello Andre,

On 03.12.18 15:46, Andre Przywara wrote:
> Well, you should be scared of the old VGIC locking scheme instead ;-)
Old VGIC locking is more mazy, indeed.

> Apart from the vgic_queue_irq_unlock() function, the rest of the new
> locking scheme is much clearer.
I agree,

> Yes, but effectively this virtually never happens, as you have rarely
> more than four pending IRQs at the same time.
I've checked that. Just put a perfcounter there. Surprisingly it happens 
few times per run under my simplified conditions. Should check them with 
more complex multimedia use-cases on a different setup.

> I had patches lying around to improve this part, just never got around
> to post, especially with only little rationale.
> If you are interested, I can dig them out, though I am not sure how
> relevant this is.
I'm interested, for sure. I'm pretty sure we will need them when we have 
multimedia scenarios on the table.
Julien Grall Dec. 4, 2018, 5:16 p.m. UTC | #43
Hi Andrii,

On 12/3/18 2:36 PM, Andrii Anisov wrote:
> On 03.12.18 15:53, Juergen Gross wrote:
>> On 03/12/2018 14:46, Andre Przywara wrote:
>>> I think PERFCOUNTER is your friend.
>>
>> CONFIG_LOCK_PROFILE and xen-lockprof on tools side?
>>
>> Not sure it is still working, though. Its about 9 years since I wrote
>> and used it.
> It does work. I've used it recently for this theme.
> But it gave me unclear results, I did not match them with what I got 
> from xentrace and rough lauterbach traces.

Can you expand it? What are the differences between the 3?

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 8d7e491060..305fbd66dd 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -388,12 +388,14 @@  void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
         if ( likely(irq >= 16 && irq < 1020) )
         {
             local_irq_enable();
+            isb();
             do_IRQ(regs, irq, is_fiq);
             local_irq_disable();
         }
         else if ( is_lpi(irq) )
         {
             local_irq_enable();
+            isb();
             gic_hw_ops->do_LPI(irq);
             local_irq_disable();
         }