Message ID | 1453204158-11412-1-git-send-email-christoffer.dall@linaro.org |
---|---|
State | Superseded |
Headers | show |
On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote: > The virt board has an arch timer, which is always on. Emit the > "always-on" property to indicate to Linux that it can switch off the > periodic timer and reduces the amount of interrupts injected into a > guest. > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> > --- > hw/arm/virt.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index 05f9087..265fe9a 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype) > qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible", > "arm,armv7-timer"); > } > + qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0); > qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts", > GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags, > GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags, > -- > 2.1.2.330.g565301e.dirty > > Hi Christoffer, We should also patch the ACPI generation at the same time. I think something like - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE; + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON; should do it. Also, having the guest reduce the number of interrupts sounds good. Can you point me to something to read about how/why a guest may choose to do that, and what the trade-offs are? Thanks, drew
On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote: > On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote: > > The virt board has an arch timer, which is always on. Emit the > > "always-on" property to indicate to Linux that it can switch off the > > periodic timer and reduces the amount of interrupts injected into a > > guest. > > > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> > > --- > > hw/arm/virt.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index 05f9087..265fe9a 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype) > > qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible", > > "arm,armv7-timer"); > > } > > + qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0); > > qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts", > > GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags, > > GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags, > > -- > > 2.1.2.330.g565301e.dirty > > > > > > Hi Christoffer, > > We should also patch the ACPI generation at the same time. I think > something like > > - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE; > + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON; I'm really not familiar enough with ACPI to be comfortable writing code for this or testing this. But if someone can pick this up and add the ACPI bits or can post a follow-up patch, then I'm all for it :) > > should do it. > > Also, having the guest reduce the number of interrupts sounds good. Can > you point me to something to read about how/why a guest may choose to do > that, and what the trade-offs are? > Not really, but you can ask Marc. -Christoffer
On Tue, Jan 19, 2016 at 01:43:41PM +0100, Christoffer Dall wrote: > On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote: > > On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote: > > > The virt board has an arch timer, which is always on. Emit the > > > "always-on" property to indicate to Linux that it can switch off the > > > periodic timer and reduces the amount of interrupts injected into a > > > guest. > > > > > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> > > > --- > > > hw/arm/virt.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > > index 05f9087..265fe9a 100644 > > > --- a/hw/arm/virt.c > > > +++ b/hw/arm/virt.c > > > @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype) > > > qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible", > > > "arm,armv7-timer"); > > > } > > > + qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0); > > > qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts", > > > GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags, > > > GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags, > > > -- > > > 2.1.2.330.g565301e.dirty > > > > > > > > > > Hi Christoffer, > > > > We should also patch the ACPI generation at the same time. I think > > something like > > > > - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE; > > + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON; > > I'm really not familiar enough with ACPI to be comfortable writing code > for this or testing this. > > But if someone can pick this up and add the ACPI bits or can post a > follow-up patch, then I'm all for it :) I can post a follow-up patch. > > > > > should do it. > > > > Also, having the guest reduce the number of interrupts sounds good. Can > > you point me to something to read about how/why a guest may choose to do > > that, and what the trade-offs are? > > > Not really, but you can ask Marc. OK, CCing him. One thing I see is that without this change we're currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though it's not true. Having that set may disable the oneshot capabilityj necessary to switch to nohz mode? I'll just stop there with my speculation though, so Marc won't have to correct too much... drew
On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote: > On 19/01/16 13:32, Andrew Jones wrote: > > On Tue, Jan 19, 2016 at 01:43:41PM +0100, Christoffer Dall wrote: > >> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote: > >>> On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote: > >>>> The virt board has an arch timer, which is always on. Emit the > >>>> "always-on" property to indicate to Linux that it can switch off the > >>>> periodic timer and reduces the amount of interrupts injected into a > >>>> guest. > >>>> > >>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> > >>>> --- > >>>> hw/arm/virt.c | 1 + > >>>> 1 file changed, 1 insertion(+) > >>>> > >>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c > >>>> index 05f9087..265fe9a 100644 > >>>> --- a/hw/arm/virt.c > >>>> +++ b/hw/arm/virt.c > >>>> @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype) > >>>> qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible", > >>>> "arm,armv7-timer"); > >>>> } > >>>> + qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0); > >>>> qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts", > >>>> GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags, > >>>> GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags, > >>>> -- > >>>> 2.1.2.330.g565301e.dirty > >>>> > >>>> > >>> > >>> Hi Christoffer, > >>> > >>> We should also patch the ACPI generation at the same time. I think > >>> something like > >>> > >>> - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE; > >>> + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON; > >> > >> I'm really not familiar enough with ACPI to be comfortable writing code > >> for this or testing this. > >> > >> But if someone can pick this up and add the ACPI bits or can post a > >> follow-up patch, then I'm all for it :) > > > > I can post a follow-up patch. > > > >> > >>> > >>> should do it. > >>> > >>> Also, having the guest reduce the number of interrupts sounds good. Can > >>> you point me to something to read about how/why a guest may choose to do > >>> that, and what the trade-offs are? > >>> > >> Not really, but you can ask Marc. > > > > OK, CCing him. One thing I see is that without this change we're > > currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though > > it's not true. Having that set may disable the oneshot capabilityj > > necessary to switch to nohz mode? I'll just stop there with my > > speculation though, so Marc won't have to correct too much... > > You're spot on. See 82a5619 in the kernel tree. When I did a similar > change in kvmtool, I saw a massive reduction in the number of timer > interrupts injected (specially when the number of vcpu is relatively high). > > This also have interesting benefits when running on a model, where > you're trying to squeeze the last bits of "performance" from the monster... > > Thanks, Thanks Marc! Christoffer, I'll create the ACPI patch, and do some pre-/post- tracing to confirm the happy reduction in interrupts :-) Also, as for this patch, Reviewed-by: Andrew Jones <drjones@redhat.com> drew
On 19 January 2016 at 11:49, Christoffer Dall <christoffer.dall@linaro.org> wrote: > The virt board has an arch timer, which is always on. Emit the > "always-on" property to indicate to Linux that it can switch off the > periodic timer and reduces the amount of interrupts injected into a > guest. > > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Applied to target-arm.next, thanks. -- PMM
On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote: > On 19/01/16 13:32, Andrew Jones wrote: > > On Tue, Jan 19, 2016 at 01:43:41PM +0100, Christoffer Dall wrote: > >> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote: > >>> On Tue, Jan 19, 2016 at 12:49:18PM +0100, Christoffer Dall wrote: > >>>> The virt board has an arch timer, which is always on. Emit the > >>>> "always-on" property to indicate to Linux that it can switch off the > >>>> periodic timer and reduces the amount of interrupts injected into a > >>>> guest. > >>>> > >>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> > >>>> --- > >>>> hw/arm/virt.c | 1 + > >>>> 1 file changed, 1 insertion(+) > >>>> > >>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c > >>>> index 05f9087..265fe9a 100644 > >>>> --- a/hw/arm/virt.c > >>>> +++ b/hw/arm/virt.c > >>>> @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype) > >>>> qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible", > >>>> "arm,armv7-timer"); > >>>> } > >>>> + qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0); > >>>> qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts", > >>>> GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags, > >>>> GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags, > >>>> -- > >>>> 2.1.2.330.g565301e.dirty > >>>> > >>>> > >>> > >>> Hi Christoffer, > >>> > >>> We should also patch the ACPI generation at the same time. I think > >>> something like > >>> > >>> - gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE; > >>> + gtdt->non_secure_el1_flags = ACPI_EDGE_SENSITIVE | ACPI_GTDT_ALWAYS_ON; > >> > >> I'm really not familiar enough with ACPI to be comfortable writing code > >> for this or testing this. > >> > >> But if someone can pick this up and add the ACPI bits or can post a > >> follow-up patch, then I'm all for it :) > > > > I can post a follow-up patch. > > > >> > >>> > >>> should do it. > >>> > >>> Also, having the guest reduce the number of interrupts sounds good. Can > >>> you point me to something to read about how/why a guest may choose to do > >>> that, and what the trade-offs are? > >>> > >> Not really, but you can ask Marc. > > > > OK, CCing him. One thing I see is that without this change we're > > currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though > > it's not true. Having that set may disable the oneshot capabilityj > > necessary to switch to nohz mode? I'll just stop there with my > > speculation though, so Marc won't have to correct too much... > > You're spot on. See 82a5619 in the kernel tree. When I did a similar > change in kvmtool, I saw a massive reduction in the number of timer > interrupts injected (specially when the number of vcpu is relatively high). > > This also have interesting benefits when running on a model, where > you're trying to squeeze the last bits of "performance" from the monster... > Hmm, I'm probably testing this wrong, but I don't see any difference in the number of injected timer interrupts. My guest, which I boot with UEFI, has CONFIG_ARM_ARCH_TIMER=y CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y CONFIG_ARM_TIMER_SP804=y CONFIG_HIGH_RES_TIMERS=y CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y # CONFIG_HZ_PERIODIC is not set CONFIG_NO_HZ_IDLE=y # CONFIG_NO_HZ_FULL is not set CONFIG_NO_HZ=y CONFIG_HZ_1000=y CONFIG_HZ=1000 I've boot a guest using DT with and without this patch ---WITHOUT--- # ls /proc/device-tree/timer compatible interrupts name # cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 3: 6958 5766 5166 5187 5576 5129 4695 4398 GIC 27 Edge arch_timer # sleep 120 && cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 3: 7557 5986 5487 5265 6232 5868 5464 4438 GIC 27 Edge arch_timer ---WITH--- # ls /proc/device-tree/timer always-on compatible interrupts name # cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 3: 7005 6080 4996 5391 5165 5257 4930 4844 GIC 27 Edge arch_timer # sleep 120 && cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 3: 7523 6505 5264 6717 5273 5391 5526 4901 GIC 27 Edge arch_timer And kvm trace data has ---WITHOUT--- $ grep kvm_timer_update_irq trace.out | wc -l 94336 ---WITH--- $ grep kvm_timer_update_irq trace.out | wc -l 95838 Any suggestions? Thanks, drew
On Tue, Jan 19, 2016 at 07:48:14PM +0100, Andrew Jones wrote: > On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote: > > >> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote: > > > OK, CCing him. One thing I see is that without this change we're > > > currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though > > > it's not true. Having that set may disable the oneshot capabilityj > > > necessary to switch to nohz mode? I'll just stop there with my > > > speculation though, so Marc won't have to correct too much... > > > > You're spot on. See 82a5619 in the kernel tree. When I did a similar > > change in kvmtool, I saw a massive reduction in the number of timer > > interrupts injected (specially when the number of vcpu is relatively high). > > > > This also have interesting benefits when running on a model, where > > you're trying to squeeze the last bits of "performance" from the monster... > > > > Hmm, I'm probably testing this wrong, but I don't see any difference in > the number of injected timer interrupts. My guest, which I boot with > UEFI, has > > CONFIG_ARM_ARCH_TIMER=y > CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y > CONFIG_ARM_TIMER_SP804=y > CONFIG_HIGH_RES_TIMERS=y > CONFIG_TICK_ONESHOT=y > CONFIG_NO_HZ_COMMON=y > # CONFIG_HZ_PERIODIC is not set > CONFIG_NO_HZ_IDLE=y > # CONFIG_NO_HZ_FULL is not set > CONFIG_NO_HZ=y > CONFIG_HZ_1000=y > CONFIG_HZ=1000 > > I've boot a guest using DT with and without this patch > > ---WITHOUT--- > > # ls /proc/device-tree/timer > compatible interrupts name > # cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > 3: 6958 5766 5166 5187 5576 5129 4695 4398 GIC 27 Edge arch_timer > # sleep 120 && cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > 3: 7557 5986 5487 5265 6232 5868 5464 4438 GIC 27 Edge arch_timer > > ---WITH--- > > # ls /proc/device-tree/timer > always-on compatible interrupts name > # cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > 3: 7005 6080 4996 5391 5165 5257 4930 4844 GIC 27 Edge arch_timer > # sleep 120 && cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > 3: 7523 6505 5264 6717 5273 5391 5526 4901 GIC 27 Edge arch_timer > > > > And kvm trace data has > > ---WITHOUT--- > $ grep kvm_timer_update_irq trace.out | wc -l > 94336 > ---WITH--- > $ grep kvm_timer_update_irq trace.out | wc -l > 95838 > > Must be how I'm looking, because I just tried kvmtool with/without Marc's patch that adds always-on, but don't see any reduction of interrupts there either. I used a defconfig guest kernel. Also, not that I think it should matter, but my host kernel is 4.4-rc4 based. I'd like to be able to see a difference with/without this always-on patch, not because I don't think we should take it anyway, but because I need a test case for the ACPI counterpart. Thanks, drew
On Wed, Jan 20, 2016 at 02:28:05PM +0000, Marc Zyngier wrote: > On 20/01/16 14:01, Andrew Jones wrote: > > On Tue, Jan 19, 2016 at 07:48:14PM +0100, Andrew Jones wrote: > >> On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote: > >>>>> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote: > >>>> OK, CCing him. One thing I see is that without this change we're > >>>> currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though > >>>> it's not true. Having that set may disable the oneshot capabilityj > >>>> necessary to switch to nohz mode? I'll just stop there with my > >>>> speculation though, so Marc won't have to correct too much... > >>> > >>> You're spot on. See 82a5619 in the kernel tree. When I did a similar > >>> change in kvmtool, I saw a massive reduction in the number of timer > >>> interrupts injected (specially when the number of vcpu is relatively high). > >>> > >>> This also have interesting benefits when running on a model, where > >>> you're trying to squeeze the last bits of "performance" from the monster... > >>> > >> > >> Hmm, I'm probably testing this wrong, but I don't see any difference in > >> the number of injected timer interrupts. My guest, which I boot with > >> UEFI, has > >> > >> CONFIG_ARM_ARCH_TIMER=y > >> CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y > >> CONFIG_ARM_TIMER_SP804=y > >> CONFIG_HIGH_RES_TIMERS=y > >> CONFIG_TICK_ONESHOT=y > >> CONFIG_NO_HZ_COMMON=y > >> # CONFIG_HZ_PERIODIC is not set > >> CONFIG_NO_HZ_IDLE=y > >> # CONFIG_NO_HZ_FULL is not set > >> CONFIG_NO_HZ=y > >> CONFIG_HZ_1000=y > >> CONFIG_HZ=1000 > >> > >> I've boot a guest using DT with and without this patch > >> > >> ---WITHOUT--- > >> > >> # ls /proc/device-tree/timer > >> compatible interrupts name > >> # cat /proc/interrupts > >> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > >> 3: 6958 5766 5166 5187 5576 5129 4695 4398 GIC 27 Edge arch_timer > >> # sleep 120 && cat /proc/interrupts > >> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > >> 3: 7557 5986 5487 5265 6232 5868 5464 4438 GIC 27 Edge arch_timer > >> > >> ---WITH--- > >> > >> # ls /proc/device-tree/timer > >> always-on compatible interrupts name > >> # cat /proc/interrupts > >> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > >> 3: 7005 6080 4996 5391 5165 5257 4930 4844 GIC 27 Edge arch_timer > >> # sleep 120 && cat /proc/interrupts > >> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > >> 3: 7523 6505 5264 6717 5273 5391 5526 4901 GIC 27 Edge arch_timer > >> > >> > >> > >> And kvm trace data has > >> > >> ---WITHOUT--- > >> $ grep kvm_timer_update_irq trace.out | wc -l > >> 94336 > >> ---WITH--- > >> $ grep kvm_timer_update_irq trace.out | wc -l > >> 95838 > >> > >> > > > > Must be how I'm looking, because I just tried kvmtool with/without > > Marc's patch that adds always-on, but don't see any reduction of > > interrupts there either. I used a defconfig guest kernel. Also, > > not that I think it should matter, but my host kernel is 4.4-rc4 > > based. > > > > I'd like to be able to see a difference with/without this always-on > > patch, not because I don't think we should take it anyway, but because > > I need a test case for the ACPI counterpart. > > I just run a couple of quick tests, measuring interrupt rate (vmstat 1) > on the host, with one VM (2 vcpus) idling, and I'm seeing the following > thing: > > Without "always-on": ~380 interrupts per second > With "always-on": ~40 interrupts per second > > This is with kvmtool, 32bit host (but none of that is arch specific anyway). > For me (64bit host, one VM (8 vcpus)) of 100 'vmstat 1' samples I have the following. Without "always-on": mean=56.370 sd=33.404 min=1 max=244 With "always-on": mean=51.580 sd=33.361 min=1 max=273 I'm also using kvmtool, and my guest is idle. So a difference between 32 and 64bit hosts? Again, my guest config is now just a defconfig. My host config is not, but I'm not sure what options to look for other than what I wrote above, which are the same for my host. Thanks, drew
On Wed, Jan 20, 2016 at 04:20:03PM +0000, Marc Zyngier wrote: > On 20/01/16 15:06, Andrew Jones wrote: > > On Wed, Jan 20, 2016 at 02:28:05PM +0000, Marc Zyngier wrote: > >> On 20/01/16 14:01, Andrew Jones wrote: > >>> On Tue, Jan 19, 2016 at 07:48:14PM +0100, Andrew Jones wrote: > >>>> On Tue, Jan 19, 2016 at 01:43:07PM +0000, Marc Zyngier wrote: > >>>>>>> On Tue, Jan 19, 2016 at 01:37:16PM +0100, Andrew Jones wrote: > >>>>>> OK, CCing him. One thing I see is that without this change we're > >>>>>> currently setting the clock feature CLOCK_EVT_FEAT_C3STOP, even though > >>>>>> it's not true. Having that set may disable the oneshot capabilityj > >>>>>> necessary to switch to nohz mode? I'll just stop there with my > >>>>>> speculation though, so Marc won't have to correct too much... > >>>>> > >>>>> You're spot on. See 82a5619 in the kernel tree. When I did a similar > >>>>> change in kvmtool, I saw a massive reduction in the number of timer > >>>>> interrupts injected (specially when the number of vcpu is relatively high). > >>>>> > >>>>> This also have interesting benefits when running on a model, where > >>>>> you're trying to squeeze the last bits of "performance" from the monster... > >>>>> > >>>> > >>>> Hmm, I'm probably testing this wrong, but I don't see any difference in > >>>> the number of injected timer interrupts. My guest, which I boot with > >>>> UEFI, has > >>>> > >>>> CONFIG_ARM_ARCH_TIMER=y > >>>> CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y > >>>> CONFIG_ARM_TIMER_SP804=y > >>>> CONFIG_HIGH_RES_TIMERS=y > >>>> CONFIG_TICK_ONESHOT=y > >>>> CONFIG_NO_HZ_COMMON=y > >>>> # CONFIG_HZ_PERIODIC is not set > >>>> CONFIG_NO_HZ_IDLE=y > >>>> # CONFIG_NO_HZ_FULL is not set > >>>> CONFIG_NO_HZ=y > >>>> CONFIG_HZ_1000=y > >>>> CONFIG_HZ=1000 > >>>> > >>>> I've boot a guest using DT with and without this patch > >>>> > >>>> ---WITHOUT--- > >>>> > >>>> # ls /proc/device-tree/timer > >>>> compatible interrupts name > >>>> # cat /proc/interrupts > >>>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > >>>> 3: 6958 5766 5166 5187 5576 5129 4695 4398 GIC 27 Edge arch_timer > >>>> # sleep 120 && cat /proc/interrupts > >>>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > >>>> 3: 7557 5986 5487 5265 6232 5868 5464 4438 GIC 27 Edge arch_timer > >>>> > >>>> ---WITH--- > >>>> > >>>> # ls /proc/device-tree/timer > >>>> always-on compatible interrupts name > >>>> # cat /proc/interrupts > >>>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > >>>> 3: 7005 6080 4996 5391 5165 5257 4930 4844 GIC 27 Edge arch_timer > >>>> # sleep 120 && cat /proc/interrupts > >>>> CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 > >>>> 3: 7523 6505 5264 6717 5273 5391 5526 4901 GIC 27 Edge arch_timer > >>>> > >>>> > >>>> > >>>> And kvm trace data has > >>>> > >>>> ---WITHOUT--- > >>>> $ grep kvm_timer_update_irq trace.out | wc -l > >>>> 94336 > >>>> ---WITH--- > >>>> $ grep kvm_timer_update_irq trace.out | wc -l > >>>> 95838 > >>>> > >>>> > >>> > >>> Must be how I'm looking, because I just tried kvmtool with/without > >>> Marc's patch that adds always-on, but don't see any reduction of > >>> interrupts there either. I used a defconfig guest kernel. Also, > >>> not that I think it should matter, but my host kernel is 4.4-rc4 > >>> based. > >>> > >>> I'd like to be able to see a difference with/without this always-on > >>> patch, not because I don't think we should take it anyway, but because > >>> I need a test case for the ACPI counterpart. > >> > >> I just run a couple of quick tests, measuring interrupt rate (vmstat 1) > >> on the host, with one VM (2 vcpus) idling, and I'm seeing the following > >> thing: > >> > >> Without "always-on": ~380 interrupts per second > >> With "always-on": ~40 interrupts per second > >> > >> This is with kvmtool, 32bit host (but none of that is arch specific anyway). > >> > > > > For me (64bit host, one VM (8 vcpus)) of 100 'vmstat 1' samples I have the > > following. > > > > Without "always-on": mean=56.370 sd=33.404 min=1 max=244 > > With "always-on": mean=51.580 sd=33.361 min=1 max=273 > > > > I'm also using kvmtool, and my guest is idle. > > > > So a difference between 32 and 64bit hosts? Again, my guest config is > > now just a defconfig. My host config is not, but I'm not sure what > > options to look for other than what I wrote above, which are the same > > for my host. > > Just tried on Seattle with a 64bit guest, and there is hardly any > difference indeed. Both host and guest are "mostly" defconfig as well. > So there is a kernel configuration difference. > > Running my 32bit guest on a 64bit host definitely shows a massive > difference (with 8 vcpus): > > Without "always-on": ~1200 interrupts per second > With "always-on": ~50 interrupts per second > > [Head scratching, poking Mark] > > Right, I now know what is going on: The arm64 kernel uses > tick_setup_hrtimer_broadcast() so that it can still use the arch timer > as a broadcast timer (forcing one CPU to remain on), while the 32bit > kernel relies on the presence of a backup timer (sp804 anyone?) or the > guarantee that the timer cannot go away (always-on). > > This is probably why I'm seeing such a gain with a 32bit guest, and none > with a 64bit guest (the kernel already does the right thing). As to why > there is such a difference between the two architectures, this is a > story for another day... > Thanks Marc! I just confirmed with an AArch32 guest using QEMU, and the patch we've hijacked for this thread. Without the patch I get a megaton of interrupts (~14000/s). With the patch, after letting the guest chill for a while, I'm getting ~150/s (8 vcpus). I guess I don't have a test case for the ACPI code though. afaik we only have UEFI for AArch64 guests, and we don't have ACPI boot without UEFI. Or maybe I can hack time_init to remove the tick_setup_hrtimer_broadcast call? drew
diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 05f9087..265fe9a 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -291,6 +291,7 @@ static void fdt_add_timer_nodes(const VirtBoardInfo *vbi, int gictype) qemu_fdt_setprop_string(vbi->fdt, "/timer", "compatible", "arm,armv7-timer"); } + qemu_fdt_setprop(vbi->fdt, "/timer", "always-on", NULL, 0); qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts", GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_S_EL1_IRQ, irqflags, GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL1_IRQ, irqflags,
The virt board has an arch timer, which is always on. Emit the "always-on" property to indicate to Linux that it can switch off the periodic timer and reduces the amount of interrupts injected into a guest. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> --- hw/arm/virt.c | 1 + 1 file changed, 1 insertion(+) -- 2.1.2.330.g565301e.dirty