diff mbox

hw/arm/virt: Add always-on property to the virt board timer

Message ID 1453204158-11412-1-git-send-email-christoffer.dall@linaro.org
State Superseded
Headers show

Commit Message

Christoffer Dall Jan. 19, 2016, 11:49 a.m. UTC
The virt board has an arch timer, which is always on.  Emit the
"always-on" property to indicate to Linux that it can switch off the
periodic timer and reduces the amount of interrupts injected into a
guest.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

---
 hw/arm/virt.c | 1 +
 1 file changed, 1 insertion(+)

-- 
2.1.2.330.g565301e.dirty

Comments

Andrew Jones Jan. 19, 2016, 12:37 p.m. UTC | #1
On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote:
> The virt board has an arch timer, which is always on.  Emit the

> "always-on" property to indicate to Linux that it can switch off the

> periodic timer and reduces the amount of interrupts injected into a

> guest.

> 

> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

> ---

>  hw/arm/virt.c | 1 +

>  1 file changed, 1 insertion(+)

> 

> diff --git a/hw/arm/virt.c b/hw/arm/virt.c

> index 05f9087..265fe9a 100644

> --- a/hw/arm/virt.c

> +++ b/hw/arm/virt.c

> @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype)

>          qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible",

>                                  "arm,armv7-timer");

>      }

> +    qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0);

>      qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",

>                         GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags,

>                         GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags,

> -- 

> 2.1.2.330.g565301e.dirty

> 

>


Hi Christoffer,

We should also patch the ACPI generation at the same time. I think
something like

 - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE;
 + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON;

should do it.

Also, having the guest reduce the number of interrupts sounds good. Can
you point me to something to read about how/why a guest may choose to do
that, and what the trade-offs are?

Thanks,
drew
Christoffer Dall Jan. 19, 2016, 12:43 p.m. UTC | #2
On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote:
> On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote:

> > The virt board has an arch timer, which is always on.  Emit the

> > "always-on" property to indicate to Linux that it can switch off the

> > periodic timer and reduces the amount of interrupts injected into a

> > guest.

> > 

> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

> > ---

> >  hw/arm/virt.c | 1 +

> >  1 file changed, 1 insertion(+)

> > 

> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c

> > index 05f9087..265fe9a 100644

> > --- a/hw/arm/virt.c

> > +++ b/hw/arm/virt.c

> > @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype)

> >          qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible",

> >                                  "arm,armv7-timer");

> >      }

> > +    qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0);

> >      qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",

> >                         GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags,

> >                         GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags,

> > -- 

> > 2.1.2.330.g565301e.dirty

> > 

> >

> 

> Hi Christoffer,

> 

> We should also patch the ACPI generation at the same time. I think

> something like

> 

>  - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE;

>  + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON;


I'm really not familiar enough with ACPI to be comfortable writing code
for this or testing this.

But if someone can pick this up and add the ACPI bits or can post a
follow-up patch, then I'm all for it :)

> 

> should do it.

> 

> Also, having the guest reduce the number of interrupts sounds good. Can

> you point me to something to read about how/why a guest may choose to do

> that, and what the trade-offs are?

> 

Not really, but you can ask Marc.

-Christoffer
Andrew Jones Jan. 19, 2016, 1:32 p.m. UTC | #3
On Tue, Jan 19, 2016 at 01:43:41PM +0100, Christoffer Dall wrote:
> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote:

> > On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote:

> > > The virt board has an arch timer, which is always on.  Emit the

> > > "always-on" property to indicate to Linux that it can switch off the

> > > periodic timer and reduces the amount of interrupts injected into a

> > > guest.

> > > 

> > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

> > > ---

> > >  hw/arm/virt.c | 1 +

> > >  1 file changed, 1 insertion(+)

> > > 

> > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c

> > > index 05f9087..265fe9a 100644

> > > --- a/hw/arm/virt.c

> > > +++ b/hw/arm/virt.c

> > > @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype)

> > >          qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible",

> > >                                  "arm,armv7-timer");

> > >      }

> > > +    qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0);

> > >      qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",

> > >                         GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags,

> > >                         GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags,

> > > -- 

> > > 2.1.2.330.g565301e.dirty

> > > 

> > >

> > 

> > Hi Christoffer,

> > 

> > We should also patch the ACPI generation at the same time. I think

> > something like

> > 

> >  - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE;

> >  + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON;

> 

> I'm really not familiar enough with ACPI to be comfortable writing code

> for this or testing this.

> 

> But if someone can pick this up and add the ACPI bits or can post a

> follow-up patch, then I'm all for it :)


I can post a follow-up patch.

> 

> > 

> > should do it.

> > 

> > Also, having the guest reduce the number of interrupts sounds good. Can

> > you point me to something to read about how/why a guest may choose to do

> > that, and what the trade-offs are?

> > 

> Not really, but you can ask Marc.


OK, CCing him. One thing I see is that without this change we're
currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though
it's not true. Having that set may disable the oneshot capabilityj
necessary to switch to nohz mode? I'll just stop there with my
speculation though, so Marc won't have to correct too much...

drew
Andrew Jones Jan. 19, 2016, 2:07 p.m. UTC | #4
On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote:
> On 19/01/16 13:32, Andrew Jones wrote:

> > On Tue, Jan 19, 2016 at 01:43:41PM +0100, Christoffer Dall wrote:

> >> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote:

> >>> On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote:

> >>>> The virt board has an arch timer, which is always on.  Emit the

> >>>> "always-on" property to indicate to Linux that it can switch off the

> >>>> periodic timer and reduces the amount of interrupts injected into a

> >>>> guest.

> >>>>

> >>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

> >>>> ---

> >>>>  hw/arm/virt.c | 1 +

> >>>>  1 file changed, 1 insertion(+)

> >>>>

> >>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c

> >>>> index 05f9087..265fe9a 100644

> >>>> --- a/hw/arm/virt.c

> >>>> +++ b/hw/arm/virt.c

> >>>> @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype)

> >>>>          qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible",

> >>>>                                  "arm,armv7-timer");

> >>>>      }

> >>>> +    qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0);

> >>>>      qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",

> >>>>                         GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags,

> >>>>                         GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags,

> >>>> -- 

> >>>> 2.1.2.330.g565301e.dirty

> >>>>

> >>>>

> >>>

> >>> Hi Christoffer,

> >>>

> >>> We should also patch the ACPI generation at the same time. I think

> >>> something like

> >>>

> >>>  - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE;

> >>>  + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON;

> >>

> >> I'm really not familiar enough with ACPI to be comfortable writing code

> >> for this or testing this.

> >>

> >> But if someone can pick this up and add the ACPI bits or can post a

> >> follow-up patch, then I'm all for it :)

> > 

> > I can post a follow-up patch.

> > 

> >>

> >>>

> >>> should do it.

> >>>

> >>> Also, having the guest reduce the number of interrupts sounds good. Can

> >>> you point me to something to read about how/why a guest may choose to do

> >>> that, and what the trade-offs are?

> >>>

> >> Not really, but you can ask Marc.

> > 

> > OK, CCing him. One thing I see is that without this change we're

> > currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though

> > it's not true. Having that set may disable the oneshot capabilityj

> > necessary to switch to nohz mode? I'll just stop there with my

> > speculation though, so Marc won't have to correct too much...

> 

> You're spot on. See 82a5619 in the kernel tree. When I did a similar

> change in kvmtool, I saw a massive reduction in the number of timer

> interrupts injected (specially when the number of vcpu is relatively high).

> 

> This also have interesting benefits when running on a model, where

> you're trying to squeeze the last bits of "performance" from the monster...

> 

> Thanks,


Thanks Marc! Christoffer, I'll create the ACPI patch, and do some pre-/post-
tracing to confirm the happy reduction in interrupts :-)

Also, as for this patch,

Reviewed-by: Andrew Jones <drjones@redhat.com>


drew
Peter Maydell Jan. 19, 2016, 3:21 p.m. UTC | #5
On 19 January 2016 at 11:49, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> The virt board has an arch timer, which is always on.  Emit the

> "always-on" property to indicate to Linux that it can switch off the

> periodic timer and reduces the amount of interrupts injected into a

> guest.

>

> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>




Applied to target-arm.next, thanks.

-- PMM
Andrew Jones Jan. 19, 2016, 6:48 p.m. UTC | #6
On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote:
> On 19/01/16 13:32, Andrew Jones wrote:

> > On Tue, Jan 19, 2016 at 01:43:41PM +0100, Christoffer Dall wrote:

> >> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote:

> >>> On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote:

> >>>> The virt board has an arch timer, which is always on.  Emit the

> >>>> "always-on" property to indicate to Linux that it can switch off the

> >>>> periodic timer and reduces the amount of interrupts injected into a

> >>>> guest.

> >>>>

> >>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

> >>>> ---

> >>>>  hw/arm/virt.c | 1 +

> >>>>  1 file changed, 1 insertion(+)

> >>>>

> >>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c

> >>>> index 05f9087..265fe9a 100644

> >>>> --- a/hw/arm/virt.c

> >>>> +++ b/hw/arm/virt.c

> >>>> @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype)

> >>>>          qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible",

> >>>>                                  "arm,armv7-timer");

> >>>>      }

> >>>> +    qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0);

> >>>>      qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",

> >>>>                         GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags,

> >>>>                         GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags,

> >>>> -- 

> >>>> 2.1.2.330.g565301e.dirty

> >>>>

> >>>>

> >>>

> >>> Hi Christoffer,

> >>>

> >>> We should also patch the ACPI generation at the same time. I think

> >>> something like

> >>>

> >>>  - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE;

> >>>  + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON;

> >>

> >> I'm really not familiar enough with ACPI to be comfortable writing code

> >> for this or testing this.

> >>

> >> But if someone can pick this up and add the ACPI bits or can post a

> >> follow-up patch, then I'm all for it :)

> > 

> > I can post a follow-up patch.

> > 

> >>

> >>>

> >>> should do it.

> >>>

> >>> Also, having the guest reduce the number of interrupts sounds good. Can

> >>> you point me to something to read about how/why a guest may choose to do

> >>> that, and what the trade-offs are?

> >>>

> >> Not really, but you can ask Marc.

> > 

> > OK, CCing him. One thing I see is that without this change we're

> > currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though

> > it's not true. Having that set may disable the oneshot capabilityj

> > necessary to switch to nohz mode? I'll just stop there with my

> > speculation though, so Marc won't have to correct too much...

> 

> You're spot on. See 82a5619 in the kernel tree. When I did a similar

> change in kvmtool, I saw a massive reduction in the number of timer

> interrupts injected (specially when the number of vcpu is relatively high).

> 

> This also have interesting benefits when running on a model, where

> you're trying to squeeze the last bits of "performance" from the monster...

>


Hmm, I'm probably testing this wrong, but I don't see any difference in
the number of injected timer interrupts. My guest, which I boot with
UEFI, has 

CONFIG_ARM_ARCH_TIMER=y
CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y
CONFIG_ARM_TIMER_SP804=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000

I've boot a guest using DT with and without this patch

---WITHOUT---

# ls /proc/device-tree/timer
compatible  interrupts  name
# cat /proc/interrupts                  
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7
  3:       6958       5766       5166       5187       5576       5129 4695       4398       GIC  27 Edge      arch_timer
# sleep 120 && cat /proc/interrupts                  
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7
  3:       7557       5986       5487       5265       6232       5868 5464       4438       GIC  27 Edge      arch_timer

---WITH---

# ls /proc/device-tree/timer
always-on  compatible  interrupts  name
# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7
  3:       7005       6080       4996       5391       5165       5257 4930       4844       GIC  27 Edge      arch_timer
# sleep 120 && cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7
  3:       7523       6505       5264       6717       5273       5391 5526       4901       GIC  27 Edge      arch_timer



And kvm trace data has

---WITHOUT---
$ grep kvm_timer_update_irq trace.out | wc -l
94336
---WITH---
$ grep kvm_timer_update_irq trace.out | wc -l
95838


Any suggestions?

Thanks,
drew
Andrew Jones Jan. 20, 2016, 2:01 p.m. UTC | #7
On Tue, Jan 19, 2016 at 07:48:14PM +0100, Andrew Jones wrote:
> On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote:

> > >> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote:

> > > OK, CCing him. One thing I see is that without this change we're

> > > currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though

> > > it's not true. Having that set may disable the oneshot capabilityj

> > > necessary to switch to nohz mode? I'll just stop there with my

> > > speculation though, so Marc won't have to correct too much...

> > 

> > You're spot on. See 82a5619 in the kernel tree. When I did a similar

> > change in kvmtool, I saw a massive reduction in the number of timer

> > interrupts injected (specially when the number of vcpu is relatively high).

> > 

> > This also have interesting benefits when running on a model, where

> > you're trying to squeeze the last bits of "performance" from the monster...

> >

> 

> Hmm, I'm probably testing this wrong, but I don't see any difference in

> the number of injected timer interrupts. My guest, which I boot with

> UEFI, has 

> 

> CONFIG_ARM_ARCH_TIMER=y

> CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y

> CONFIG_ARM_TIMER_SP804=y

> CONFIG_HIGH_RES_TIMERS=y

> CONFIG_TICK_ONESHOT=y

> CONFIG_NO_HZ_COMMON=y

> # CONFIG_HZ_PERIODIC is not set

> CONFIG_NO_HZ_IDLE=y

> # CONFIG_NO_HZ_FULL is not set

> CONFIG_NO_HZ=y

> CONFIG_HZ_1000=y

> CONFIG_HZ=1000

> 

> I've boot a guest using DT with and without this patch

> 

> ---WITHOUT---

> 

> # ls /proc/device-tree/timer

> compatible  interrupts  name

> # cat /proc/interrupts                  

>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

>   3:       6958       5766       5166       5187       5576       5129 4695       4398       GIC  27 Edge      arch_timer

> # sleep 120 && cat /proc/interrupts                  

>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

>   3:       7557       5986       5487       5265       6232       5868 5464       4438       GIC  27 Edge      arch_timer

> 

> ---WITH---

> 

> # ls /proc/device-tree/timer

> always-on  compatible  interrupts  name

> # cat /proc/interrupts 

>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

>   3:       7005       6080       4996       5391       5165       5257 4930       4844       GIC  27 Edge      arch_timer

> # sleep 120 && cat /proc/interrupts 

>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

>   3:       7523       6505       5264       6717       5273       5391 5526       4901       GIC  27 Edge      arch_timer

> 

> 

> 

> And kvm trace data has

> 

> ---WITHOUT---

> $ grep kvm_timer_update_irq trace.out | wc -l

> 94336

> ---WITH---

> $ grep kvm_timer_update_irq trace.out | wc -l

> 95838

> 

>


Must be how I'm looking, because I just tried kvmtool with/without
Marc's patch that adds always-on, but don't see any reduction of
interrupts there either. I used a defconfig guest kernel. Also,
not that I think it should matter, but my host kernel is 4.4-rc4
based.

I'd like to be able to see a difference with/without this always-on
patch, not because I don't think we should take it anyway, but because
I need a test case for the ACPI counterpart.

Thanks,
drew
Andrew Jones Jan. 20, 2016, 3:06 p.m. UTC | #8
On Wed, Jan 20, 2016 at 02:28:05PM +0000, Marc Zyngier wrote:
> On 20/01/16 14:01, Andrew Jones wrote:

> > On Tue, Jan 19, 2016 at 07:48:14PM +0100, Andrew Jones wrote:

> >> On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote:

> >>>>> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote:

> >>>> OK, CCing him. One thing I see is that without this change we're

> >>>> currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though

> >>>> it's not true. Having that set may disable the oneshot capabilityj

> >>>> necessary to switch to nohz mode? I'll just stop there with my

> >>>> speculation though, so Marc won't have to correct too much...

> >>>

> >>> You're spot on. See 82a5619 in the kernel tree. When I did a similar

> >>> change in kvmtool, I saw a massive reduction in the number of timer

> >>> interrupts injected (specially when the number of vcpu is relatively high).

> >>>

> >>> This also have interesting benefits when running on a model, where

> >>> you're trying to squeeze the last bits of "performance" from the monster...

> >>>

> >>

> >> Hmm, I'm probably testing this wrong, but I don't see any difference in

> >> the number of injected timer interrupts. My guest, which I boot with

> >> UEFI, has 

> >>

> >> CONFIG_ARM_ARCH_TIMER=y

> >> CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y

> >> CONFIG_ARM_TIMER_SP804=y

> >> CONFIG_HIGH_RES_TIMERS=y

> >> CONFIG_TICK_ONESHOT=y

> >> CONFIG_NO_HZ_COMMON=y

> >> # CONFIG_HZ_PERIODIC is not set

> >> CONFIG_NO_HZ_IDLE=y

> >> # CONFIG_NO_HZ_FULL is not set

> >> CONFIG_NO_HZ=y

> >> CONFIG_HZ_1000=y

> >> CONFIG_HZ=1000

> >>

> >> I've boot a guest using DT with and without this patch

> >>

> >> ---WITHOUT---

> >>

> >> # ls /proc/device-tree/timer

> >> compatible  interrupts  name

> >> # cat /proc/interrupts                  

> >>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

> >>   3:       6958       5766       5166       5187       5576       5129 4695       4398       GIC  27 Edge      arch_timer

> >> # sleep 120 && cat /proc/interrupts                  

> >>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

> >>   3:       7557       5986       5487       5265       6232       5868 5464       4438       GIC  27 Edge      arch_timer

> >>

> >> ---WITH---

> >>

> >> # ls /proc/device-tree/timer

> >> always-on  compatible  interrupts  name

> >> # cat /proc/interrupts 

> >>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

> >>   3:       7005       6080       4996       5391       5165       5257 4930       4844       GIC  27 Edge      arch_timer

> >> # sleep 120 && cat /proc/interrupts 

> >>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

> >>   3:       7523       6505       5264       6717       5273       5391 5526       4901       GIC  27 Edge      arch_timer

> >>

> >>

> >>

> >> And kvm trace data has

> >>

> >> ---WITHOUT---

> >> $ grep kvm_timer_update_irq trace.out | wc -l

> >> 94336

> >> ---WITH---

> >> $ grep kvm_timer_update_irq trace.out | wc -l

> >> 95838

> >>

> >>

> > 

> > Must be how I'm looking, because I just tried kvmtool with/without

> > Marc's patch that adds always-on, but don't see any reduction of

> > interrupts there either. I used a defconfig guest kernel. Also,

> > not that I think it should matter, but my host kernel is 4.4-rc4

> > based.

> > 

> > I'd like to be able to see a difference with/without this always-on

> > patch, not because I don't think we should take it anyway, but because

> > I need a test case for the ACPI counterpart.

> 

> I just run a couple of quick tests, measuring interrupt rate (vmstat 1)

> on the host, with one VM (2 vcpus) idling, and I'm seeing the following

> thing:

> 

> Without "always-on": ~380 interrupts per second

> With "always-on": ~40 interrupts per second

> 

> This is with kvmtool, 32bit host (but none of that is arch specific anyway).

>


For me (64bit host, one VM (8 vcpus)) of 100 'vmstat 1' samples I have the
following.

Without "always-on": mean=56.370 sd=33.404 min=1 max=244
With "always-on":    mean=51.580 sd=33.361 min=1 max=273

I'm also using kvmtool, and my guest is idle.

So a difference between 32 and 64bit hosts? Again, my guest config is
now just a defconfig. My host config is not, but I'm not sure what
options to look for other than what I wrote above, which are the same
for my host.

Thanks,
drew
Andrew Jones Jan. 20, 2016, 4:47 p.m. UTC | #9
On Wed, Jan 20, 2016 at 04:20:03PM +0000, Marc Zyngier wrote:
> On 20/01/16 15:06, Andrew Jones wrote:

> > On Wed, Jan 20, 2016 at 02:28:05PM +0000, Marc Zyngier wrote:

> >> On 20/01/16 14:01, Andrew Jones wrote:

> >>> On Tue, Jan 19, 2016 at 07:48:14PM +0100, Andrew Jones wrote:

> >>>> On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote:

> >>>>>>> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote:

> >>>>>> OK, CCing him. One thing I see is that without this change we're

> >>>>>> currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though

> >>>>>> it's not true. Having that set may disable the oneshot capabilityj

> >>>>>> necessary to switch to nohz mode? I'll just stop there with my

> >>>>>> speculation though, so Marc won't have to correct too much...

> >>>>>

> >>>>> You're spot on. See 82a5619 in the kernel tree. When I did a similar

> >>>>> change in kvmtool, I saw a massive reduction in the number of timer

> >>>>> interrupts injected (specially when the number of vcpu is relatively high).

> >>>>>

> >>>>> This also have interesting benefits when running on a model, where

> >>>>> you're trying to squeeze the last bits of "performance" from the monster...

> >>>>>

> >>>>

> >>>> Hmm, I'm probably testing this wrong, but I don't see any difference in

> >>>> the number of injected timer interrupts. My guest, which I boot with

> >>>> UEFI, has 

> >>>>

> >>>> CONFIG_ARM_ARCH_TIMER=y

> >>>> CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y

> >>>> CONFIG_ARM_TIMER_SP804=y

> >>>> CONFIG_HIGH_RES_TIMERS=y

> >>>> CONFIG_TICK_ONESHOT=y

> >>>> CONFIG_NO_HZ_COMMON=y

> >>>> # CONFIG_HZ_PERIODIC is not set

> >>>> CONFIG_NO_HZ_IDLE=y

> >>>> # CONFIG_NO_HZ_FULL is not set

> >>>> CONFIG_NO_HZ=y

> >>>> CONFIG_HZ_1000=y

> >>>> CONFIG_HZ=1000

> >>>>

> >>>> I've boot a guest using DT with and without this patch

> >>>>

> >>>> ---WITHOUT---

> >>>>

> >>>> # ls /proc/device-tree/timer

> >>>> compatible  interrupts  name

> >>>> # cat /proc/interrupts                  

> >>>>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

> >>>>   3:       6958       5766       5166       5187       5576       5129 4695       4398       GIC  27 Edge      arch_timer

> >>>> # sleep 120 && cat /proc/interrupts                  

> >>>>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

> >>>>   3:       7557       5986       5487       5265       6232       5868 5464       4438       GIC  27 Edge      arch_timer

> >>>>

> >>>> ---WITH---

> >>>>

> >>>> # ls /proc/device-tree/timer

> >>>> always-on  compatible  interrupts  name

> >>>> # cat /proc/interrupts 

> >>>>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

> >>>>   3:       7005       6080       4996       5391       5165       5257 4930       4844       GIC  27 Edge      arch_timer

> >>>> # sleep 120 && cat /proc/interrupts 

> >>>>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5 CPU6       CPU7

> >>>>   3:       7523       6505       5264       6717       5273       5391 5526       4901       GIC  27 Edge      arch_timer

> >>>>

> >>>>

> >>>>

> >>>> And kvm trace data has

> >>>>

> >>>> ---WITHOUT---

> >>>> $ grep kvm_timer_update_irq trace.out | wc -l

> >>>> 94336

> >>>> ---WITH---

> >>>> $ grep kvm_timer_update_irq trace.out | wc -l

> >>>> 95838

> >>>>

> >>>>

> >>>

> >>> Must be how I'm looking, because I just tried kvmtool with/without

> >>> Marc's patch that adds always-on, but don't see any reduction of

> >>> interrupts there either. I used a defconfig guest kernel. Also,

> >>> not that I think it should matter, but my host kernel is 4.4-rc4

> >>> based.

> >>>

> >>> I'd like to be able to see a difference with/without this always-on

> >>> patch, not because I don't think we should take it anyway, but because

> >>> I need a test case for the ACPI counterpart.

> >>

> >> I just run a couple of quick tests, measuring interrupt rate (vmstat 1)

> >> on the host, with one VM (2 vcpus) idling, and I'm seeing the following

> >> thing:

> >>

> >> Without "always-on": ~380 interrupts per second

> >> With "always-on": ~40 interrupts per second

> >>

> >> This is with kvmtool, 32bit host (but none of that is arch specific anyway).

> >>

> > 

> > For me (64bit host, one VM (8 vcpus)) of 100 'vmstat 1' samples I have the

> > following.

> > 

> > Without "always-on": mean=56.370 sd=33.404 min=1 max=244

> > With "always-on":    mean=51.580 sd=33.361 min=1 max=273

> > 

> > I'm also using kvmtool, and my guest is idle.

> > 

> > So a difference between 32 and 64bit hosts? Again, my guest config is

> > now just a defconfig. My host config is not, but I'm not sure what

> > options to look for other than what I wrote above, which are the same

> > for my host.

> 

> Just tried on Seattle with a 64bit guest, and there is hardly any

> difference indeed. Both host and guest are "mostly" defconfig as well.

> So there is a kernel configuration difference.

> 

> Running my 32bit guest on a 64bit host definitely shows a massive

> difference (with 8 vcpus):

> 

> Without "always-on": ~1200 interrupts per second

> With "always-on": ~50 interrupts per second

> 

> [Head scratching, poking Mark]

> 

> Right, I now know what is going on: The arm64 kernel uses

> tick_setup_hrtimer_broadcast() so that it can still use the arch timer

> as a broadcast timer (forcing one CPU to remain on), while the 32bit

> kernel relies on the presence of a backup timer (sp804 anyone?) or the

> guarantee that the timer cannot go away (always-on).

> 

> This is probably why I'm seeing such a gain with a 32bit guest, and none

> with a 64bit guest (the kernel already does the right thing). As to why

> there is such a difference between the two architectures, this is a

> story for another day...

>


Thanks Marc! I just confirmed with an AArch32 guest using QEMU, and the
patch we've hijacked for this thread. Without the patch I get a megaton
of interrupts (~14000/s). With the patch, after letting the guest chill
for a while, I'm getting ~150/s (8 vcpus).

I guess I don't have a test case for the ACPI code though. afaik we only
have UEFI for AArch64 guests, and we don't have ACPI boot without UEFI.
Or maybe I can hack time_init to remove the tick_setup_hrtimer_broadcast
call?

drew
diff mbox

Patch

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 05f9087..265fe9a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -291,6 +291,7 @@  static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype)
         qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible",
                                 "arm,armv7-timer");
     }
+    qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0);
     qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",
                        GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags,
                        GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags,