[v2,5/6] arm/arm64: KVM: Turn off vcpus on PSCI shutdown/reboot

Message ID 1417641522-29056-6-git-send-email-christoffer.dall@linaro.org
State New

Commit Message

Christoffer Dall Dec. 3, 2014, 9:18 p.m. UTC
When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, all vcpus
of the VM should be turned off, as the PSCI spec suggests. It is also
the sane thing to do.

Also, clarify the behavior and expectations for exits to user space with
the KVM_EXIT_SYSTEM_EVENT case.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 Documentation/virtual/kvm/api.txt |  9 +++++++++
 arch/arm/kvm/psci.c               | 19 +++++++++++++++++++
 arch/arm64/include/asm/kvm_host.h |  1 +
 3 files changed, 29 insertions(+)
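
For readers who want to picture the user space side of the KVM_EXIT_SYSTEM_EVENT contract described above, here is a minimal sketch. It is not taken from the patch or from any VMM: the `model_*` and `vmm_*` names are hypothetical stand-ins, and a real VMM would read `exit_reason` and `system_event.type` from the `struct kvm_run` defined in `<linux/kvm.h>`. It only illustrates that user space may ignore a request, or accept it and act on it later.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; real code would use the constants from <linux/kvm.h>. */
enum model_exit { MODEL_EXIT_OTHER, MODEL_EXIT_SYSTEM_EVENT };
enum model_event { MODEL_EVENT_SHUTDOWN = 1, MODEL_EVENT_RESET = 2 };

/* What the (hypothetical) VMM decides to do after an exit. */
enum vmm_action { VMM_CONTINUE, VMM_SCHEDULE_SHUTDOWN, VMM_SCHEDULE_RESET };

/* User space is free to honour or ignore each kind of request. */
struct vmm_policy {
	bool honour_shutdown;
	bool honour_reset;
};

/*
 * Map an exit to an action. "Schedule" means the shutdown or reset may
 * happen later; the VMM is not required to act synchronously.
 */
enum vmm_action vmm_handle_exit(const struct vmm_policy *policy,
				enum model_exit reason, uint32_t type)
{
	if (reason != MODEL_EXIT_SYSTEM_EVENT)
		return VMM_CONTINUE;

	switch (type) {
	case MODEL_EVENT_SHUTDOWN:
		return policy->honour_shutdown ? VMM_SCHEDULE_SHUTDOWN : VMM_CONTINUE;
	case MODEL_EVENT_RESET:
		return policy->honour_reset ? VMM_SCHEDULE_RESET : VMM_CONTINUE;
	default:
		/* Unknown event types are ignored in this sketch. */
		return VMM_CONTINUE;
	}
}
```

Because the VMM may return to KVM_RUN in either case, the kernel side cannot rely on user space to stop the vcpus, which is the motivation for pausing them in the patch.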

Comments

Christoffer Dall Dec. 8, 2014, 12:58 p.m. UTC | #1
On Mon, Dec 08, 2014 at 12:04:53PM +0000, Marc Zyngier wrote:
> On 03/12/14 21:18, Christoffer Dall wrote:
> > When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
> > should really be turned off for the VM adhering to the suggestions in
> > the PSCI spec, and it's the sane thing to do.
> > 
> > Also, clarify the behavior and expectations for exits to user space with
> > the KVM_EXIT_SYSTEM_EVENT case.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  Documentation/virtual/kvm/api.txt |  9 +++++++++
> >  arch/arm/kvm/psci.c               | 19 +++++++++++++++++++
> >  arch/arm64/include/asm/kvm_host.h |  1 +
> >  3 files changed, 29 insertions(+)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 81f1b97..228f9cf 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -2957,6 +2957,15 @@ HVC instruction based PSCI call from the vcpu. The 'type' field describes
> >  the system-level event type. The 'flags' field describes architecture
> >  specific flags for the system-level event.
> >  
> > +Valid values for 'type' are:
> > +  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
> > +   VM. Userspace is not obliged to honour this, and if it does honour
> > +   this does not need to destroy the VM synchronously (ie it may call
> > +   KVM_RUN again before shutdown finally occurs).
> > +  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
> > +   As with SHUTDOWN, userspace can choose to ignore the request, or
> > +   to schedule the reset to occur in the future and may call KVM_RUN again.
> > +
> >  		/* Fix the size of the union. */
> >  		char padding[256];
> >  	};
> > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > index 09cf377..ae0bb91 100644
> > --- a/arch/arm/kvm/psci.c
> > +++ b/arch/arm/kvm/psci.c
> > @@ -15,6 +15,7 @@
> >   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >   */
> >  
> > +#include <linux/preempt.h>
> >  #include <linux/kvm_host.h>
> >  #include <linux/wait.h>
> >  
> > @@ -166,6 +167,24 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
> >  
> >  static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
> >  {
> > +	int i;
> > +	struct kvm_vcpu *tmp;
> > +
> > +	/*
> > +	 * The KVM ABI specifies that a system event exit may call KVM_RUN
> > +	 * again and may perform shutdown/reboot at a later time than when the
> > +	 * actual request is made.  Since we are implementing PSCI and a
> > +	 * caller of PSCI reboot and shutdown expects that the system shuts
> > +	 * down or reboots immediately, let's make sure that VCPUs are not run
> > +	 * after this call is handled and before the VCPUs have been
> > +	 * re-initialized.
> > +	 */
> > +	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> > +		tmp->arch.pause = true;
> > +	preempt_disable();
> > +	force_vm_exit(cpu_all_mask);
> > +	preempt_enable();
> > +
> 
> I'm slightly uneasy about this force_vm_exit, as this is something that
> is directly triggered by the guest. I suppose it is almost impossible to
> find out which CPUs we're actually using...
> 
Ah, you mean we should only IPI the CPUs that are actually running a
VCPU belonging to this VM?

I guess I could replace it with:

	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
		tmp->arch.pause = true;
		kvm_vcpu_kick(tmp);
	}

or a slightly more optimized "half-open-coded-kvm_vcpu_kick":

	me = get_cpu();
	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
		tmp->arch.pause = true;
		if (tmp->cpu != me && (unsigned)tmp->cpu < nr_cpu_ids &&
		    cpu_online(tmp->cpu) && kvm_arch_vcpu_should_kick(tmp))
			smp_send_reschedule(tmp->cpu);
	}
	put_cpu();

which should save us waking up vcpu threads that are parked on
waitqueues.  Not sure it's worth it, maybe it is for 100s of vcpu
systems?

Can we actually replace force_vm_exit() with the more optimized
open-coded version?  That messes with VMID allocation so it really needs
a lot of testing though...

Preferences?

-Christoffer
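
The trade-off discussed in this message can be illustrated with a small user space model. This is only a sketch under stated assumptions: `struct model_vcpu` and the `model_*` functions are hypothetical and not kernel code, the broadcast variant is assumed to interrupt every other online host CPU, and the targeted variant is assumed to interrupt only host CPUs that currently run a vcpu of this VM. Waking up vcpu threads parked on waitqueues is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few vcpu fields used in the thread. */
struct model_vcpu {
	bool pause;	/* mirrors tmp->arch.pause */
	int cpu;	/* host CPU running the vcpu, or -1 if it is parked */
};

/* Set the pause flag on every vcpu of the modelled VM. */
static void model_pause_all(struct model_vcpu *vcpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		vcpus[i].pause = true;
}

/*
 * Broadcast variant: pause all vcpus, then count one IPI for every other
 * online host CPU, whether or not it runs a vcpu of this VM.
 */
size_t model_broadcast_exit(struct model_vcpu *vcpus, size_t n,
			    size_t nr_online_cpus)
{
	model_pause_all(vcpus, n);
	return nr_online_cpus > 0 ? nr_online_cpus - 1 : 0;
}

/*
 * Targeted variant: pause all vcpus, and count an IPI only for vcpus that
 * are currently on a host CPU other than the caller's.
 */
size_t model_targeted_kick(struct model_vcpu *vcpus, size_t n, int this_cpu)
{
	size_t ipis = 0;

	for (size_t i = 0; i < n; i++) {
		vcpus[i].pause = true;
		if (vcpus[i].cpu >= 0 && vcpus[i].cpu != this_cpu)
			ipis++;
	}
	return ipis;
}
```

In this model a 4-vcpu guest on a 64-CPU host costs 63 interrupts with the broadcast and at most 3 with the targeted kick, which is the gap the guest could otherwise trigger at will.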
Christoffer Dall Dec. 12, 2014, 7:42 p.m. UTC | #2
On Mon, Dec 08, 2014 at 01:19:15PM +0000, Marc Zyngier wrote:
> On 08/12/14 12:58, Christoffer Dall wrote:
> > On Mon, Dec 08, 2014 at 12:04:53PM +0000, Marc Zyngier wrote:
> >> On 03/12/14 21:18, Christoffer Dall wrote:
> >>> When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
> >>> should really be turned off for the VM adhering to the suggestions in
> >>> the PSCI spec, and it's the sane thing to do.
> >>>
> >>> Also, clarify the behavior and expectations for exits to user space with
> >>> the KVM_EXIT_SYSTEM_EVENT case.
> >>>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>> ---
> >>>  Documentation/virtual/kvm/api.txt |  9 +++++++++
> >>>  arch/arm/kvm/psci.c               | 19 +++++++++++++++++++
> >>>  arch/arm64/include/asm/kvm_host.h |  1 +
> >>>  3 files changed, 29 insertions(+)
> >>>
> >>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>> index 81f1b97..228f9cf 100644
> >>> --- a/Documentation/virtual/kvm/api.txt
> >>> +++ b/Documentation/virtual/kvm/api.txt
> >>> @@ -2957,6 +2957,15 @@ HVC instruction based PSCI call from the vcpu. The 'type' field describes
> >>>  the system-level event type. The 'flags' field describes architecture
> >>>  specific flags for the system-level event.
> >>>  
> >>> +Valid values for 'type' are:
> >>> +  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
> >>> +   VM. Userspace is not obliged to honour this, and if it does honour
> >>> +   this does not need to destroy the VM synchronously (ie it may call
> >>> +   KVM_RUN again before shutdown finally occurs).
> >>> +  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
> >>> +   As with SHUTDOWN, userspace can choose to ignore the request, or
> >>> +   to schedule the reset to occur in the future and may call KVM_RUN again.
> >>> +
> >>>  		/* Fix the size of the union. */
> >>>  		char padding[256];
> >>>  	};
> >>> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> >>> index 09cf377..ae0bb91 100644
> >>> --- a/arch/arm/kvm/psci.c
> >>> +++ b/arch/arm/kvm/psci.c
> >>> @@ -15,6 +15,7 @@
> >>>   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >>>   */
> >>>  
> >>> +#include <linux/preempt.h>
> >>>  #include <linux/kvm_host.h>
> >>>  #include <linux/wait.h>
> >>>  
> >>> @@ -166,6 +167,24 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
> >>>  
> >>>  static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
> >>>  {
> >>> +	int i;
> >>> +	struct kvm_vcpu *tmp;
> >>> +
> >>> +	/*
> >>> +	 * The KVM ABI specifies that a system event exit may call KVM_RUN
> >>> +	 * again and may perform shutdown/reboot at a later time than when the
> >>> +	 * actual request is made.  Since we are implementing PSCI and a
> >>> +	 * caller of PSCI reboot and shutdown expects that the system shuts
> >>> +	 * down or reboots immediately, let's make sure that VCPUs are not run
> >>> +	 * after this call is handled and before the VCPUs have been
> >>> +	 * re-initialized.
> >>> +	 */
> >>> +	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> >>> +		tmp->arch.pause = true;
> >>> +	preempt_disable();
> >>> +	force_vm_exit(cpu_all_mask);
> >>> +	preempt_enable();
> >>> +
> >>
> >> I'm slightly uneasy about this force_vm_exit, as this is something that
> >> is directly triggered by the guest. I suppose it is almost impossible to
> >> find out which CPUs we're actually using...
> >>
> > Ah, you mean we should only IPI the CPUs that are actually running a
> > VCPU belonging to this VM?
> > 
> > I guess I could replace it with:
> > 
> > 	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> > 		tmp->arch.pause = true;
> > 		kvm_vcpu_kick(tmp);
> > 	}
> 
> Ah, that's even simpler than I thought. Yeah, looks good to me.
> 
> > 
> > or a slightly more optimized "half-open-coded-kvm_vcpu_kick":
> > 
> > 	me = get_cpu();
> > 	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> > 		tmp->arch.pause = true;
> > 		if (tmp->cpu != me && (unsigned)tmp->cpu < nr_cpu_ids &&
> > 		    cpu_online(tmp->cpu)  && kvm_arch_vcpu_should_kick(tmp))
> > 			smp_send_reschedule(tmp->cpu);
> > 	}
> > 
> > which should save us waking up vcpu threads that are parked on
> > waitqueues.  Not sure it's worth it, maybe it is for 100s of vcpu
> > systems?
> 
> Probably not worth it at the moment.
> 
> > Can we actually replace force_vm_exit() with the more optimized
> > open-coded version?  That messes with VMID allocation so it really needs
> > a lot of testing though...
> 
> VMID reallocation almost never occurs, and that's a system-wide event,
> not triggered by a guest. I'd rather not mess with that just yet.
> 
> > Preferences?
> 
> I think your first version is very nice, provided that it doesn't
> introduce any unforeseen regression.
> 

ok, will respin with option #1.

Thanks,
-Christoffer
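
As a companion to the agreed approach, the intended effect of the pause flag can be shown with a toy state model. This is a sketch, not the kernel's logic: `struct toy_vcpu` and the `toy_*` functions are hypothetical, and the assumption that re-initialization clears the pause flag is taken from the comment in the patch ("before the VCPUs have been re-initialized") rather than from code shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vcpu state for the model. */
struct toy_vcpu {
	bool pause;			/* set when the VM requested off/reset */
	unsigned long guest_entries;	/* how often the guest was entered */
};

/* Models the system event path: every vcpu of the VM is marked paused. */
void toy_system_event(struct toy_vcpu *vcpus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		vcpus[i].pause = true;
}

/* Models a run attempt: the guest is entered only if the vcpu is not paused. */
bool toy_run(struct toy_vcpu *vcpu)
{
	if (vcpu->pause)
		return false;
	vcpu->guest_entries++;
	return true;
}

/* Models re-initialization by user space, which makes the vcpu runnable again. */
void toy_reinit(struct toy_vcpu *vcpu)
{
	vcpu->pause = false;
}
```

The point of the model is the ordering: even if user space calls run again right after the exit, no vcpu re-enters the guest until it has been re-initialized.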
Christoffer Dall Dec. 12, 2014, 7:49 p.m. UTC | #3
On Mon, Dec 08, 2014 at 01:19:15PM +0000, Marc Zyngier wrote:
> On 08/12/14 12:58, Christoffer Dall wrote:
> > On Mon, Dec 08, 2014 at 12:04:53PM +0000, Marc Zyngier wrote:
> >> On 03/12/14 21:18, Christoffer Dall wrote:
> >>> When a vcpu calls SYSTEM_OFF or SYSTEM_RESET with PSCI v0.2, the vcpus
> >>> should really be turned off for the VM adhering to the suggestions in
> >>> the PSCI spec, and it's the sane thing to do.
> >>>
> >>> Also, clarify the behavior and expectations for exits to user space with
> >>> the KVM_EXIT_SYSTEM_EVENT case.
> >>>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>> ---
> >>>  Documentation/virtual/kvm/api.txt |  9 +++++++++
> >>>  arch/arm/kvm/psci.c               | 19 +++++++++++++++++++
> >>>  arch/arm64/include/asm/kvm_host.h |  1 +
> >>>  3 files changed, 29 insertions(+)
> >>>
> >>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>> index 81f1b97..228f9cf 100644
> >>> --- a/Documentation/virtual/kvm/api.txt
> >>> +++ b/Documentation/virtual/kvm/api.txt
> >>> @@ -2957,6 +2957,15 @@ HVC instruction based PSCI call from the vcpu. The 'type' field describes
> >>>  the system-level event type. The 'flags' field describes architecture
> >>>  specific flags for the system-level event.
> >>>  
> >>> +Valid values for 'type' are:
> >>> +  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
> >>> +   VM. Userspace is not obliged to honour this, and if it does honour
> >>> +   this does not need to destroy the VM synchronously (ie it may call
> >>> +   KVM_RUN again before shutdown finally occurs).
> >>> +  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
> >>> +   As with SHUTDOWN, userspace can choose to ignore the request, or
> >>> +   to schedule the reset to occur in the future and may call KVM_RUN again.
> >>> +
> >>>  		/* Fix the size of the union. */
> >>>  		char padding[256];
> >>>  	};
> >>> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> >>> index 09cf377..ae0bb91 100644
> >>> --- a/arch/arm/kvm/psci.c
> >>> +++ b/arch/arm/kvm/psci.c
> >>> @@ -15,6 +15,7 @@
> >>>   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >>>   */
> >>>  
> >>> +#include <linux/preempt.h>
> >>>  #include <linux/kvm_host.h>
> >>>  #include <linux/wait.h>
> >>>  
> >>> @@ -166,6 +167,24 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
> >>>  
> >>>  static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
> >>>  {
> >>> +	int i;
> >>> +	struct kvm_vcpu *tmp;
> >>> +
> >>> +	/*
> >>> +	 * The KVM ABI specifies that a system event exit may call KVM_RUN
> >>> +	 * again and may perform shutdown/reboot at a later time than when the
> >>> +	 * actual request is made.  Since we are implementing PSCI and a
> >>> +	 * caller of PSCI reboot and shutdown expects that the system shuts
> >>> +	 * down or reboots immediately, let's make sure that VCPUs are not run
> >>> +	 * after this call is handled and before the VCPUs have been
> >>> +	 * re-initialized.
> >>> +	 */
> >>> +	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> >>> +		tmp->arch.pause = true;
> >>> +	preempt_disable();
> >>> +	force_vm_exit(cpu_all_mask);
> >>> +	preempt_enable();
> >>> +
> >>
> >> I'm slightly uneasy about this force_vm_exit, as this is something that
> >> is directly triggered by the guest. I suppose it is almost impossible to
> >> find out which CPUs we're actually using...
> >>
> > Ah, you mean we should only IPI the CPUs that are actually running a
> > VCPU belonging to this VM?
> > 
> > I guess I could replace it with:
> > 
> > 	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> > 		tmp->arch.pause = true;
> > 		kvm_vcpu_kick(tmp);
> > 	}
> 
> Ah, that's even simpler than I thought. Yeah, looks good to me.
> 
Can I take this as an ack and apply this series?

-Christoffer
Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 81f1b97..228f9cf 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2957,6 +2957,15 @@  HVC instruction based PSCI call from the vcpu. The 'type' field describes
 the system-level event type. The 'flags' field describes architecture
 specific flags for the system-level event.
 
+Valid values for 'type' are:
+  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
+   VM. Userspace is not obliged to honour this, and if it does honour
+   it, it does not need to destroy the VM synchronously (i.e. it may
+   call KVM_RUN again before shutdown finally occurs).
+  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
+   As with SHUTDOWN, userspace can choose to ignore the request, or
+   to schedule the reset to occur in the future and may call KVM_RUN again.
+
 		/* Fix the size of the union. */
 		char padding[256];
 	};
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 09cf377..ae0bb91 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -15,6 +15,7 @@ 
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/preempt.h>
 #include <linux/kvm_host.h>
 #include <linux/wait.h>
 
@@ -166,6 +167,24 @@  static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 
 static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 {
+	int i;
+	struct kvm_vcpu *tmp;
+
+	/*
+	 * The KVM ABI specifies that a system event exit may call KVM_RUN
+	 * again and may perform shutdown/reboot at a later time than when the
+	 * actual request is made.  Since we are implementing PSCI and a
+	 * caller of PSCI reboot and shutdown expects that the system shuts
+	 * down or reboots immediately, let's make sure that VCPUs are not run
+	 * after this call is handled and before the VCPUs have been
+	 * re-initialized.
+	 */
+	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
+		tmp->arch.pause = true;
+	preempt_disable();
+	force_vm_exit(cpu_all_mask);
+	preempt_enable();
+
 	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
 	vcpu->run->system_event.type = type;
 	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 65c6152..0b7dfdb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -198,6 +198,7 @@  struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
 
 u64 kvm_call_hyp(void *hypfn, ...);
+void force_vm_exit(const cpumask_t *mask);
 
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		int exception_index);