[5/5] arm/arm64: KVM: Turn off vcpus and flush stage-2 pgtables on system exit events

Message ID 1417113660-23610-6-git-send-email-christoffer.dall@linaro.org
State New

Commit Message

Christoffer Dall Nov. 27, 2014, 6:41 p.m. UTC
When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
should really be turned off for the VM adhering to the suggestions in
the PSCI spec, and it's the sane thing to do.

Also, to ensure a coherent icache/dcache/ram situation when restarting
with the guest MMU off, flush all stage-2 page table entries so we start
taking aborts when the guest reboots, and flush/invalidate the necessary
cache lines.

Clarify the behavior and expectations for arm/arm64 in the
KVM_EXIT_SYSTEM_EVENT case.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 Documentation/virtual/kvm/api.txt |  4 ++++
 arch/arm/kvm/psci.c               | 18 ++++++++++++++++++
 arch/arm64/include/asm/kvm_host.h |  1 +
 3 files changed, 23 insertions(+)

Comments

Peter Maydell Nov. 27, 2014, 11:10 p.m. UTC | #1
On 27 November 2014 at 18:41, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
> should really be turned off for the VM adhering to the suggestions in
> the PSCI spec, and it's the sane thing to do.
>
> Also, to ensure a coherent icache/dcache/ram situation when restarting
> with the guest MMU off, flush all stage-2 page table entries so we start
> taking aborts when the guest reboots, and flush/invalidate the necessary
> cache lines.
>
> Clarify the behavior and expectations for arm/arm64 in the
> KVM_EXIT_SYSTEM_EVENT case.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  Documentation/virtual/kvm/api.txt |  4 ++++
>  arch/arm/kvm/psci.c               | 18 ++++++++++++++++++
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  3 files changed, 23 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index fc12b4f..c67e4956 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2955,6 +2955,10 @@ HVC instruction based PSCI call from the vcpu. The 'type' field describes
>  the system-level event type. The 'flags' field describes architecture
>  specific flags for the system-level event.
>
> +In the case of ARM/ARM64, all vcpus will be powered off when requesting shutdown
> +or reset, and it is the responsibility of userspace to reinitialize the vcpus
> +using KVM_ARM_VCPU_INIT.

Heh, we're not even consistent within this patchseries about the capitalisation
of "vcpu" :-)

What happens if you try to KVM_RUN a CPU the kernel thinks is powered down?
Does the kernel just say "ok, doing nothing"?

Also, the clarification we want here should not I think be architecture
specific -- the handling of the exit system event in QEMU is in common
code. What you want to say is something like:

"Valid values for 'type' are:
  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
   VM. Userspace is not obliged to honour this, and if it does honour
   this does not need to destroy the VM synchronously (ie it may call
   KVM_RUN again before shutdown finally occurs).
  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
   As with SHUTDOWN, userspace is permitted to ignore the request, or
   to schedule the reset to occur in the future and may call KVM_RUN again."

The corollary is that it's the kernel's job to deal with any impedance
mismatch between this and whatever ABI like PSCI it's implementing, but
that's fairly obvious so doesn't really need mentioning in the docs.

(I'd like to claim that "the vcpus are powered off when requesting shutdown"
is an implementation detail of this, not part of the API. I think we can
get away with that...)

> +
>                 /* Fix the size of the union. */
>                 char padding[256];
>         };
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index 09cf377..b4ab613 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -15,11 +15,13 @@
>   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>   */
>
> +#include <linux/preempt.h>
>  #include <linux/kvm_host.h>
>  #include <linux/wait.h>
>
>  #include <asm/cputype.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  #include <asm/kvm_psci.h>
>
>  /*
> @@ -166,6 +168,22 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
>
>  static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
>  {
> +       int i;
> +       struct kvm_vcpu *tmp;
> +
> +       /* Stop all vcpus */
> +       kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> +               tmp->arch.pause = true;
> +       preempt_disable();
> +       force_vm_exit(cpu_all_mask);
> +       preempt_enable();
> +
> +       /*
> +        * Ensure a rebooted VM will fault in RAM pages and detect if the
> +        * guest MMU is turned off and flush the caches as needed.
> +        */
> +       stage2_unmap_vm(vcpu->kvm);

It seems odd to have this unmap happen on attempted system reset/powerdown,
not on cpu init/start. (I seem to remember having this conversation on
IRC, so maybe I've just forgotten why it has to be this way...)

thanks
-- PMM

Peter Maydell Dec. 1, 2014, 5:57 p.m. UTC | #2
On 27 November 2014 at 23:10, Peter Maydell <peter.maydell@linaro.org> wrote:
> It seems odd to have this unmap happen on attempted system reset/powerdown,
> not on cpu init/start.

Here's a concrete case that I think requires the unmap to be
done on cpu init:
 * start a VM and run it for a bit
 * from the QEMU monitor, use "loadvm" to load a VM snapshot

This will cause QEMU to do a system reset (including calling
VCPU_INIT to reset the CPUs), load the contents of guest
RAM from the snapshot, set guest CPU registers with a pile
of SET_ONE_REG calls, and then KVM_RUN to start the VM.

If we don't unmap stage2 on vcpu init,  then what in this
sequence causes the icaches to be flushed so we execute
the newly loaded ram contents rather than stale data
from the first VM run?

thanks
-- PMM
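
The argument in the loadvm case above can be restated as a toy model. This is only a sketch under stated assumptions, not kernel code: `struct toy_vm` and the `toy_*` helpers are hypothetical names, and the model assumes that a stage-2 fault is the only point at which the hypervisor gets to do cache maintenance for a page.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 4

/* Toy stand-in for a VM (hypothetical): one pair of flags per guest page. */
struct toy_vm {
	bool mapped[TOY_PAGES];   /* a stage-2 entry is present */
	bool coherent[TOY_PAGES]; /* caches agree with RAM contents */
};

/* Userspace rewrites guest RAM (e.g. loadvm); cached data may now be stale. */
static void toy_load_snapshot(struct toy_vm *vm)
{
	for (size_t i = 0; i < TOY_PAGES; i++)
		vm->coherent[i] = false;
}

/* Reset path: optionally drop every stage-2 mapping. */
static void toy_vcpu_init(struct toy_vm *vm, bool unmap_stage2)
{
	if (!unmap_stage2)
		return;
	for (size_t i = 0; i < TOY_PAGES; i++)
		vm->mapped[i] = false;
}

/* Guest touches a page; returns true if it observes the new RAM contents. */
static bool toy_guest_access(struct toy_vm *vm, size_t page)
{
	if (!vm->mapped[page]) {
		/* Stage-2 fault: cache maintenance happens here, then the page is mapped. */
		vm->coherent[page] = true;
		vm->mapped[page] = true;
	}
	return vm->coherent[page];
}
```

In this model, a page that stays mapped across `toy_vcpu_init(vm, false)` and `toy_load_snapshot()` never faults again, so the guest keeps seeing stale contents. Calling `toy_vcpu_init(vm, true)` forces a fault on the next access.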

Christoffer Dall Dec. 2, 2014, 1:29 p.m. UTC | #3
On Mon, Dec 01, 2014 at 05:57:53PM +0000, Peter Maydell wrote:
> On 27 November 2014 at 23:10, Peter Maydell <peter.maydell@linaro.org> wrote:
> > It seems odd to have this unmap happen on attempted system reset/powerdown,
> > not on cpu init/start.
> 
> Here's a concrete case that I think requires the unmap to be
> done on cpu init:
>  * start a VM and run it for a bit
>  * from the QEMU monitor, use "loadvm" to load a VM snapshot
> 
> This will cause QEMU to do a system reset (including calling
> VCPU_INIT to reset the CPUs), load the contents of guest
> RAM from the snapshot, set guest CPU registers with a pile
> of SET_ONE_REG calls, and then KVM_RUN to start the VM.
> 
> If we don't unmap stage2 on vcpu init,  then what in this
> sequence causes the icaches to be flushed so we execute
> the newly loaded ram contents rather than stale data
> from the first VM run?
> 

You're absolutely right that it makes more sense to stick it in
vcpu_init.  I put it only in the shutdown event handler for debugging
and forgot that was what I was doing :)

The only down-side is that we'll be trying to free memory that was never
mapped on initial startup, but it's not in the critical path and we
could add an explicit check to early-out if the vcpu has never been run,
which may increase code readability too (we already have that flag I
believe).

-Christoffer
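
The early-out mentioned above might look roughly like the following. It is a minimal sketch with hypothetical names (`struct demo_vcpu`, `has_run`, `demo_vcpu_init`), not the flag or function the kernel actually uses; the counter stands in for the cost of walking and freeing stage-2 tables.

```c
#include <stdbool.h>

/* Hypothetical per-vcpu state for illustration only. */
struct demo_vcpu {
	bool has_run;        /* set once the vcpu has entered the guest */
	unsigned int unmaps; /* how many times the stage-2 unmap was performed */
};

/* Stand-in for the first (or any later) KVM_RUN on this vcpu. */
static void demo_vcpu_run(struct demo_vcpu *vcpu)
{
	vcpu->has_run = true;
}

/* Stand-in for vcpu init: skip the unmap if nothing can have been mapped. */
static void demo_vcpu_init(struct demo_vcpu *vcpu)
{
	if (!vcpu->has_run)
		return; /* initial startup: no stage-2 mappings to tear down */

	vcpu->unmaps++; /* stand-in for unmapping the VM's stage-2 tables */
}
```

With this shape, the first init of a fresh vcpu does no unmap work, while an init that follows a run (a reset or loadvm) does.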

Christoffer Dall Dec. 2, 2014, 3:01 p.m. UTC | #4
On Thu, Nov 27, 2014 at 11:10:14PM +0000, Peter Maydell wrote:
> On 27 November 2014 at 18:41, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
> > should really be turned off for the VM adhering to the suggestions in
> > the PSCI spec, and it's the sane thing to do.
> >
> > Also, to ensure a coherent icache/dcache/ram situation when restarting
> > with the guest MMU off, flush all stage-2 page table entries so we start
> > taking aborts when the guest reboots, and flush/invalidate the necessary
> > cache lines.
> >
> > Clarify the behavior and expectations for arm/arm64 in the
> > KVM_EXIT_SYSTEM_EVENT case.
> >
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  Documentation/virtual/kvm/api.txt |  4 ++++
> >  arch/arm/kvm/psci.c               | 18 ++++++++++++++++++
> >  arch/arm64/include/asm/kvm_host.h |  1 +
> >  3 files changed, 23 insertions(+)
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index fc12b4f..c67e4956 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -2955,6 +2955,10 @@ HVC instruction based PSCI call from the vcpu. The 'type' field describes
> >  the system-level event type. The 'flags' field describes architecture
> >  specific flags for the system-level event.
> >
> > +In the case of ARM/ARM64, all vcpus will be powered off when requesting shutdown
> > +or reset, and it is the responsibility of userspace to reinitialize the vcpus
> > +using KVM_ARM_VCPU_INIT.
> 
> Heh, we're not even consistent within this patchseries about the capitalisation
> of "vcpu" :-)
> 
> What happens if you try to KVM_RUN a CPU the kernel thinks is powered down?
> Does the kernel just say "ok, doing nothing"?

yes, it blocks the vcpu execution by putting the thread on a wait-queue.
That's exactly what happens for the secondary vcpus in an SMP guest
using PSCI.

> 
> Also, the clarification we want here should not I think be architecture
> specific -- the handling of the exit system event in QEMU is in common
> code. What you want to say is something like:
> 
> "Valid values for 'type' are:
>   KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
>    VM. Userspace is not obliged to honour this, and if it does honour
>    this does not need to destroy the VM synchronously (ie it may call
>    KVM_RUN again before shutdown finally occurs).
>   KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
>    As with SHUTDOWN, userspace is permitted to ignore the request, or
>    to schedule the reset to occur in the future and may call KVM_RUN again."

ok, this is pretty good, but do we need to say that userspace is
permitted to do this or that?  The kernel never relies on user space for
correct functionality, so do you mean 'for the run a vm semantics to
still otherwise be functional'?

> 
> The corollary is that it's the kernel's job to deal with any impedance
> mismatch between this and whatever ABI like PSCI it's implementing, but
> that's fairly obvious so doesn't really need mentioning in the docs.

I didn't find it obvious (which is why I thought we'd spell it out), but
I agree that not mentioning it makes this arch-generic and we can put
the other stuff into a comment in arch/arm/kvm/psci.c.

> 
> (I'd like to claim that "the vcpus are powered off when requesting shutdown"
> is an implementation detail of this, not part of the API. I think we can
> get away with that...)
> 

ok

> > +
> >                 /* Fix the size of the union. */
> >                 char padding[256];
> >         };
> > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > index 09cf377..b4ab613 100644
> > --- a/arch/arm/kvm/psci.c
> > +++ b/arch/arm/kvm/psci.c
> > @@ -15,11 +15,13 @@
> >   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >   */
> >
> > +#include <linux/preempt.h>
> >  #include <linux/kvm_host.h>
> >  #include <linux/wait.h>
> >
> >  #include <asm/cputype.h>
> >  #include <asm/kvm_emulate.h>
> > +#include <asm/kvm_mmu.h>
> >  #include <asm/kvm_psci.h>
> >
> >  /*
> > @@ -166,6 +168,22 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
> >
> >  static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
> >  {
> > +       int i;
> > +       struct kvm_vcpu *tmp;
> > +
> > +       /* Stop all vcpus */
> > +       kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> > +               tmp->arch.pause = true;
> > +       preempt_disable();
> > +       force_vm_exit(cpu_all_mask);
> > +       preempt_enable();
> > +
> > +       /*
> > +        * Ensure a rebooted VM will fault in RAM pages and detect if the
> > +        * guest MMU is turned off and flush the caches as needed.
> > +        */
> > +       stage2_unmap_vm(vcpu->kvm);
> 
> It seems odd to have this unmap happen on attempted system reset/powerdown,
> not on cpu init/start. (I seem to remember having this conversation on
> IRC, so maybe I've just forgotten why it has to be this way...)
> 

no, as I said in the other mail, I forgot I was submitting a hack to the
list.  Nice job on my side.

I'll test an implementation that does this at init time for the next
revision.

Thanks!
-Christoffer

Peter Maydell Dec. 2, 2014, 3:42 p.m. UTC | #5
On 2 December 2014 at 15:01, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Thu, Nov 27, 2014 at 11:10:14PM +0000, Peter Maydell wrote:
>> Also, the clarification we want here should not I think be architecture
>> specific -- the handling of the exit system event in QEMU is in common
>> code. What you want to say is something like:
>>
>> "Valid values for 'type' are:
>>   KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
>>    VM. Userspace is not obliged to honour this, and if it does honour
>>    this does not need to destroy the VM synchronously (ie it may call
>>    KVM_RUN again before shutdown finally occurs).
>>   KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
>>    As with SHUTDOWN, userspace is permitted to ignore the request, or
>>    to schedule the reset to occur in the future and may call KVM_RUN again."
>
> ok, this is pretty good, but do we need to say that userspace is
> permitted to do this or that?  The kernel never relies on user space for
> correct functionality, so do you mean 'for the run a vm semantics to
> still otherwise be functional'?

I meant "permitted" in the sense of "the kernel won't kill the VM,
return errnos to subsequent KVM_RUN requests or otherwise treat
this userspace behaviour as buggy". If you want to rephrase it
somehow I don't object, as long as the docs make it clear that
it's a valid implementation strategy for userspace to do that.

-- PMM
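
The semantics proposed in this subthread can be read as a contract for the userspace run loop. Below is a minimal sketch of that reading, assuming a VMM that merely records what the guest asked for and acts on it later. The `vmm_action` enum and `classify_exit()` are hypothetical names; `struct kvm_run`, `KVM_EXIT_SYSTEM_EVENT` and the `KVM_SYSTEM_EVENT_*` constants come from `<linux/kvm.h>`.

```c
#include <linux/kvm.h>

/* Hypothetical VMM-side decision; not part of the KVM API. */
enum vmm_action {
	VMM_CONTINUE,          /* nothing to schedule, keep running */
	VMM_SCHEDULE_SHUTDOWN, /* guest asked for shutdown; may be deferred or ignored */
	VMM_SCHEDULE_RESET,    /* guest asked for reset; may be deferred or ignored */
};

/* Inspect the shared run structure after KVM_RUN returns. */
static enum vmm_action classify_exit(const struct kvm_run *run)
{
	if (run->exit_reason != KVM_EXIT_SYSTEM_EVENT)
		return VMM_CONTINUE;

	switch (run->system_event.type) {
	case KVM_SYSTEM_EVENT_SHUTDOWN:
		return VMM_SCHEDULE_SHUTDOWN;
	case KVM_SYSTEM_EVENT_RESET:
		return VMM_SCHEDULE_RESET;
	default:
		/* Unknown event types are left for the caller to handle. */
		return VMM_CONTINUE;
	}
}
```

The function only classifies the exit. Under the wording discussed above, the caller remains free to call KVM_RUN again before acting on a scheduled shutdown or reset.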

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index fc12b4f..c67e4956 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2955,6 +2955,10 @@  HVC instruction based PSCI call from the vcpu. The 'type' field describes
 the system-level event type. The 'flags' field describes architecture
 specific flags for the system-level event.
 
+In the case of ARM/ARM64, all vcpus will be powered off when requesting shutdown
+or reset, and it is the responsibility of userspace to reinitialize the vcpus
+using KVM_ARM_VCPU_INIT.
+
 		/* Fix the size of the union. */
 		char padding[256];
 	};
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 09cf377..b4ab613 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -15,11 +15,13 @@ 
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/preempt.h>
 #include <linux/kvm_host.h>
 #include <linux/wait.h>
 
 #include <asm/cputype.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 #include <asm/kvm_psci.h>
 
 /*
@@ -166,6 +168,22 @@  static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 
 static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 {
+	int i;
+	struct kvm_vcpu *tmp;
+
+	/* Stop all vcpus */
+	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
+		tmp->arch.pause = true;
+	preempt_disable();
+	force_vm_exit(cpu_all_mask);
+	preempt_enable();
+
+	/*
+	 * Ensure a rebooted VM will fault in RAM pages and detect if the
+	 * guest MMU is turned off and flush the caches as needed.
+	 */
+	stage2_unmap_vm(vcpu->kvm);
+
 	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
 	vcpu->run->system_event.type = type;
 	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2012c4b..dbd3212 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -200,6 +200,7 @@  struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
 
 u64 kvm_call_hyp(void *hypfn, ...);
+void force_vm_exit(const cpumask_t *mask);
 
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		int exception_index);