Message ID | 20220609110337.1238762-2-jaz@semihalf.com |
---|---|
State | New |
Series | x86: notify hypervisor/VMM about guest entering s2idle |

On Thu, Jun 9, 2022 at 16:55 Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Jun 09, 2022, Grzegorz Jaszczyk wrote:
> > +9. KVM_HC_SYSTEM_S2IDLE
> > +------------------------
> > +
> > +:Architecture: x86
> > +:Status: active
> > +:Purpose: Notify the hypervisor that the guest is entering s2idle state.
>
> What about exiting s2idle?  E.g.
>
>   1. VM0 enters s2idle
>   2. host notes that VM0 is in s2idle
>   3. VM0 exits s2idle
>   4. host still thinks VM0 is in s2idle
>   5. VM1 enters s2idle
>   6. host thinks all VMs are in s2idle, suspends the system

I think that this problem couldn't be solved by adding a notification
about exiting s2idle. Please consider (even after simplifying your
example to one VM):

1. VM0 enters s2idle
2. host notes that VM0 is in s2idle
3. host continues with system suspension, but in the meantime VM0 exits
   s2idle and sends a notification; by then it is already too late (the
   VM might not even manage to send the notification in time).

The above could actually be prevented if the VMM had control over guest
resumption. E.g. after the VMM receives the notification about the guest
entering s2idle state, it would park the vCPU, preventing it from
exiting s2idle without VMM intervention.

>
> > +static void s2idle_hypervisor_notify(void)
> > +{
> > +       if (static_cpu_has(X86_FEATURE_HYPERVISOR))
> > +               kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
>
> Checking the HYPERVISOR flag is not remotely sufficient.  The hypervisor may not
> be KVM, and if it is KVM, it may be an older version of KVM that doesn't support
> the hypercall.  The latter scenario won't be fatal unless KVM has been modified,
> but blindly doing a hypercall for a different hypervisor could have disastrous
> results, e.g. the registers ABIs are different, so the above will make a random
> request depending on what is in other GPRs.

Good point: we've actually thought about not confusing/breaking VMMs,
so I've introduced the KVM_CAP_X86_SYSTEM_S2IDLE VM capability in the
second patch, but not breaking different hypervisors is another story.
Would hiding it under a new 's2idle_notify_kvm' module parameter work
for upstream?:

+static bool s2idle_notify_kvm __read_mostly;
+module_param(s2idle_notify_kvm, bool, 0644);
+MODULE_PARM_DESC(s2idle_notify_kvm, "Notify hypervisor about guest entering s2idle state");
+
..
+static void s2idle_hypervisor_notify(void)
+{
+       if (static_cpu_has(X86_FEATURE_HYPERVISOR) && s2idle_notify_kvm)
+               kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
+}
+

>
> The bigger question is, why is KVM involved at all?  KVM is just a dumb pipe out
> to userspace, and not a very good one at that.  There are multiple well established
> ways to communicate with the VMM without custom hypercalls.

Could you please kindly advise about the recommended way of
communicating with the VMM, taking into account that we want to send
this notification just before entering s2idle state (please see also
the answer to the next comment), which is at a very late stage of the
suspend process with a lot of functionality already suspended?

>
>
> I bet if you're clever this can even be done without any guest changes, e.g. I
> gotta imagine acpi_sleep_run_lps0_dsm() triggers MMIO/PIO with the right ACPI
> configuration.

The problem is that between acpi_sleep_run_lps0_dsm and the place where
we introduced the hypercall there are several places where we can
actually cancel and not enter the suspend state. So trapping on
acpi_sleep_run_lps0_dsm, which triggers MMIO/PIO, would be premature.

The other reason for doing it in this place is the fact that
s2idle_enter is called from an infinite loop inside s2idle_loop, which
could be interrupted by e.g. an ACPI EC GPE (not intended to wake up the
system), so s2idle_ops->wake() would return false and s2idle_enter
would be triggered again. In this case we would want to get a
notification about the guest actually entering s2idle state again,
which wouldn't be possible if we relied on acpi_sleep_run_lps0_dsm.

Best regards,
Grzegorz
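
The parking idea from the reply above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct vm_model` and every function name below are hypothetical and do not correspond to any KVM or VMM API. It shows why the host's view cannot go stale once the VMM parks the vCPU on notification, because the guest can only leave s2idle through the VMM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VM bookkeeping kept by the VMM (not a real KVM type). */
struct vm_model {
	bool in_s2idle;   /* host's view: guest has announced s2idle entry */
	bool vcpu_parked; /* VMM refuses to run the vCPU while this is set */
};

/* VMM receives the "entering s2idle" notification and parks the vCPU. */
static void vmm_on_s2idle_notify(struct vm_model *vm)
{
	assert(vm != NULL);
	vm->in_s2idle = true;
	vm->vcpu_parked = true;
}

/* The guest tries to leave s2idle on its own; refused while parked. */
static bool guest_try_exit_s2idle(struct vm_model *vm)
{
	assert(vm != NULL);
	if (vm->vcpu_parked)
		return false;
	vm->in_s2idle = false;
	return true;
}

/* Only the VMM can resume the guest, so it can update the host's view. */
static void vmm_resume_guest(struct vm_model *vm)
{
	assert(vm != NULL);
	vm->vcpu_parked = false;
	vm->in_s2idle = false;
}

/* Host-side policy from the thread: suspend once every VM is in s2idle. */
static bool host_may_suspend(const struct vm_model *vms, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!vms[i].in_s2idle)
			return false;
	return true;
}
```

In this model a spontaneous guest wake-up is refused while the vCPU is parked, so `host_may_suspend()` always reflects a state the VMM itself has set.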

On Fri, Jun 10, 2022, Grzegorz Jaszczyk wrote:
> On Thu, Jun 9, 2022 at 16:55 Sean Christopherson <seanjc@google.com> wrote:
> The above could actually be prevented if the VMM had control over guest
> resumption. E.g. after the VMM receives the notification about the guest
> entering s2idle state, it would park the vCPU, preventing it from
> exiting s2idle without VMM intervention.

Ah, so you avoid races by assuming the VM wakes itself from s2idle any time a vCPU
is run, even if the vCPU doesn't actually have a wake event.  That would be very
useful info to put in the changelog.

> > > +static void s2idle_hypervisor_notify(void)
> > > +{
> > > +       if (static_cpu_has(X86_FEATURE_HYPERVISOR))
> > > +               kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
> >
> > Checking the HYPERVISOR flag is not remotely sufficient.  The hypervisor may not
> > be KVM, and if it is KVM, it may be an older version of KVM that doesn't support
> > the hypercall.  The latter scenario won't be fatal unless KVM has been modified,
> > but blindly doing a hypercall for a different hypervisor could have disastrous
> > results, e.g. the registers ABIs are different, so the above will make a random
> > request depending on what is in other GPRs.
>
> Good point: we've actually thought about not confusing/breaking VMMs,
> so I've introduced the KVM_CAP_X86_SYSTEM_S2IDLE VM capability in the
> second patch, but not breaking different hypervisors is another story.
> Would hiding it under a new 's2idle_notify_kvm' module parameter work
> for upstream?:

No, enumerating support via KVM_CPUID_FEATURES is the correct way to do something
like this, e.g. see KVM_FEATURE_CLOCKSOURCE2.  But honestly I wouldn't spend too
much time understanding how all of that works, because I still feel quite strongly
that getting KVM involved is completely unnecessary.  A solution that isn't KVM
specific is preferable as it can then be implemented by any VMM that enumerates
s2idle support to the guest.

> > The bigger question is, why is KVM involved at all?  KVM is just a dumb pipe out
> > to userspace, and not a very good one at that.  There are multiple well established
> > ways to communicate with the VMM without custom hypercalls.
>
> Could you please kindly advise about the recommended way of
> communicating with the VMM, taking into account that we want to send
> this notification just before entering s2idle state (please see also
> the answer to the next comment), which is at a very late stage of the
> suspend process with a lot of functionality already suspended?

MMIO or PIO for the actual exit, there's nothing special about hypercalls.  As for
enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
function?  E.g. something like

static void s2idle_hypervisor_notify(void)
{
	if (lps0_dsm_func_mask > 0)
		acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY,
					lps0_dsm_func_mask, lps0_dsm_guid);
}
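
The CPUID-based enumeration mentioned in the reply above can be sketched in plain C. The block below is a minimal illustration, not kernel code: it takes register values as arguments instead of executing CPUID, and the helper names `is_kvm_signature()` and `kvm_feature_enumerated()` are hypothetical. It assumes the KVM convention that the signature leaf returns "KVMKVMKVM" followed by three NUL bytes in EBX, ECX and EDX, and that feature flags are reported as bits in EAX of the KVM_CPUID_FEATURES leaf. No feature bit exists for s2idle in this series, so the example reuses the bit number of KVM_FEATURE_CLOCKSOURCE2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit number of an existing KVM feature flag (KVM_FEATURE_CLOCKSOURCE2). */
#define KVM_FEATURE_CLOCKSOURCE2 3

/* Unpack one 32-bit register into four bytes, least significant first. */
static void unpack_reg(uint32_t reg, char *out)
{
	for (int i = 0; i < 4; i++)
		out[i] = (char)((reg >> (8 * i)) & 0xff);
}

/* True if EBX:ECX:EDX of the signature leaf spell the KVM signature. */
static bool is_kvm_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
	char sig[12];

	unpack_reg(ebx, sig);
	unpack_reg(ecx, sig + 4);
	unpack_reg(edx, sig + 8);
	return memcmp(sig, "KVMKVMKVM\0\0\0", sizeof(sig)) == 0;
}

/*
 * Hypothetical helper: a feature counts as enumerated only if the
 * hypervisor identifies itself as KVM *and* the feature bit is set.
 * Checking the generic HYPERVISOR flag alone would satisfy neither.
 */
static bool kvm_feature_enumerated(uint32_t ebx, uint32_t ecx, uint32_t edx,
				   uint32_t features_eax, unsigned int bit)
{
	assert(bit < 32);
	if (!is_kvm_signature(ebx, ecx, edx))
		return false;
	return ((features_eax >> bit) & 1u) != 0;
}
```

Inside the guest kernel the equivalent check would normally go through the existing `kvm_para_available()` and `kvm_para_has_feature()` helpers rather than open-coded register comparisons.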

diff --git a/Documentation/virt/kvm/x86/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst
index e56fa8b9cfca..9d1836c837e3 100644
--- a/Documentation/virt/kvm/x86/hypercalls.rst
+++ b/Documentation/virt/kvm/x86/hypercalls.rst
@@ -190,3 +190,10 @@ the KVM_CAP_EXIT_HYPERCALL capability. Userspace must enable that capability
 before advertising KVM_FEATURE_HC_MAP_GPA_RANGE in the guest CPUID.  In
 addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace
 must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL.
+
+9. KVM_HC_SYSTEM_S2IDLE
+------------------------
+
+:Architecture: x86
+:Status: active
+:Purpose: Notify the hypervisor that the guest is entering s2idle state.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9473c7c7390..6ed4bd6e762b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9306,6 +9306,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
 		return 0;
 	}
+	case KVM_HC_SYSTEM_S2IDLE:
+		ret = 0;
+		break;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/drivers/acpi/x86/s2idle.c b/drivers/acpi/x86/s2idle.c
index 2963229062f8..0ae5e11380d2 100644
--- a/drivers/acpi/x86/s2idle.c
+++ b/drivers/acpi/x86/s2idle.c
@@ -18,6 +18,7 @@
 #include <linux/acpi.h>
 #include <linux/device.h>
 #include <linux/suspend.h>
+#include <uapi/linux/kvm_para.h>
 
 #include "../sleep.h"
 
@@ -520,10 +521,17 @@ void acpi_s2idle_restore_early(void)
 					lps0_dsm_func_mask, lps0_dsm_guid);
 }
 
+static void s2idle_hypervisor_notify(void)
+{
+	if (static_cpu_has(X86_FEATURE_HYPERVISOR))
+		kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
+}
+
 static const struct platform_s2idle_ops acpi_s2idle_ops_lps0 = {
 	.begin = acpi_s2idle_begin,
 	.prepare = acpi_s2idle_prepare,
 	.prepare_late = acpi_s2idle_prepare_late,
+	.hypervisor_notify = s2idle_hypervisor_notify,
 	.wake = acpi_s2idle_wake,
 	.restore_early = acpi_s2idle_restore_early,
 	.restore = acpi_s2idle_restore,
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 70f2921e2e70..42e04e0fe8b1 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -191,6 +191,7 @@ struct platform_s2idle_ops {
 	int (*begin)(void);
 	int (*prepare)(void);
 	int (*prepare_late)(void);
+	void (*hypervisor_notify)(void);
 	bool (*wake)(void);
 	void (*restore_early)(void);
 	void (*restore)(void);
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 960c7e93d1a9..072e77e40f89 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -30,6 +30,7 @@
 #define KVM_HC_SEND_IPI		10
 #define KVM_HC_SCHED_YIELD		11
 #define KVM_HC_MAP_GPA_RANGE		12
+#define KVM_HC_SYSTEM_S2IDLE		13
 
 /*
  * hypercalls use architecture specific
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 827075944d28..c641c643290b 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -100,6 +100,10 @@ static void s2idle_enter(void)
 
 	/* Push all the CPUs into the idle loop. */
 	wake_up_all_idle_cpus();
+
+	if (s2idle_ops && s2idle_ops->hypervisor_notify)
+		s2idle_ops->hypervisor_notify();
+
 	/* Make the current CPU wait so it can enter the idle loop too. */
 	swait_event_exclusive(s2idle_wait_head,
 		    s2idle_state == S2IDLE_STATE_WAKE);
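
Because the hook sits in s2idle_enter(), it fires on every pass of the suspend loop, not only on the first one. The userspace model below sketches that behaviour under simplifying assumptions: `struct s2idle_ops_model`, the scripted wake() results and `run_model()` are all hypothetical stand-ins, and the loop is only loosely shaped after the s2idle_loop() behaviour described in the thread (a spurious interrupt makes wake() return false, so s2idle is entered again).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two platform_s2idle_ops members used here. */
struct s2idle_ops_model {
	void (*hypervisor_notify)(void); /* optional, may be NULL */
	bool (*wake)(void);
};

static int notify_count;
static const bool *wake_script; /* scripted results of successive wake() calls */
static size_t wake_len, wake_pos;

static void count_notify(void)
{
	notify_count++;
}

/* Returns the next scripted result; a finished script means "real wake-up". */
static bool scripted_wake(void)
{
	return wake_pos < wake_len ? wake_script[wake_pos++] : true;
}

/* Mirrors the NULL checks of the patch: notify only if the hook is set. */
static void s2idle_enter_model(const struct s2idle_ops_model *ops)
{
	if (ops && ops->hypervisor_notify)
		ops->hypervisor_notify();
	/* The real code would now block until an interrupt arrives. */
}

/* Re-enter s2idle until wake() reports a genuine wake-up event. */
static void s2idle_loop_model(const struct s2idle_ops_model *ops)
{
	assert(ops != NULL && ops->wake != NULL);
	while (!ops->wake())
		s2idle_enter_model(ops);
}

/* Runs one scripted suspend cycle and returns how often the hook fired. */
static int run_model(const bool *script, size_t len, bool with_notify)
{
	struct s2idle_ops_model ops = {
		.hypervisor_notify = with_notify ? count_notify : NULL,
		.wake = scripted_wake,
	};

	notify_count = 0;
	wake_script = script;
	wake_len = len;
	wake_pos = 0;
	s2idle_loop_model(&ops);
	return notify_count;
}
```

With a script of two spurious interrupts followed by a real wake-up, the hook is invoked twice, which is the repeated notification the hypercall placement is meant to provide; with no hook installed the loop runs unchanged.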