From patchwork Tue Dec 15 09:51:03 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: AKASHI Takahiro
X-Patchwork-Id: 58420
Delivered-To: patch@linaro.org
Subject: Re: [PATCH v12 04/16] arm64: kvm: allows kvm cpu hotplug
To: Marc Zyngier, Ashwin Chaugule, Geoff Levand
References: <23ca498d5e28017549c6076812d60b18e86fa20e.1448403503.git.geoff@infradead.org>
 <56604A6E.60102@arm.com> <566A8404.4020507@linaro.org>
 <566AF9BE.7090806@arm.com> <566E70C7.4070700@linaro.org>
 <566EFD64.9030209@arm.com> <566FC681.2020900@linaro.org>
 <566FD32E.2090209@arm.com>
From: AKASHI Takahiro
Message-ID: <566FE287.4060505@linaro.org>
Date: Tue, 15 Dec 2015 18:51:03 +0900
In-Reply-To: <566FD32E.2090209@arm.com>
Cc: Mark Rutland, vikrams@codeaurora.org, Catalin Marinas, Will Deacon, shankerd@codeaurora.org,
 "linux-arm-kernel@lists.infradead.org", kexec@lists.infradead.org, Christoffer Dall
Sender: "linux-arm-kernel"
Errors-To: linux-arm-kernel-bounces+patch=linaro.org@lists.infradead.org

On 12/15/2015 05:45 PM, Marc Zyngier wrote:
> On 15/12/15 07:51, AKASHI Takahiro wrote:
>> On 12/15/2015 02:33 AM, Marc Zyngier wrote:
>>> On 14/12/15 07:33, AKASHI Takahiro wrote:
>>>> Marc,
>>>>
>>>> On 12/12/2015 01:28 AM, Marc Zyngier wrote:
>>>>> On 11/12/15 08:06, AKASHI Takahiro wrote:
>>>>>> Ashwin, Marc,
>>>>>>
>>>>>> On 12/03/2015 10:58 PM, Marc Zyngier wrote:
>>>>>>> On 02/12/15 22:40, Ashwin Chaugule wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> On 24 November 2015 at 17:25, Geoff Levand wrote:
>>>>>>>>> From: AKASHI Takahiro
>>>>>>>>>
>>>>>>>>> The current kvm implementation on arm64 does cpu-specific initialization
>>>>>>>>> at system boot, and has no way to gracefully shutdown a core in terms of
>>>>>>>>> kvm. This prevents, especially, kexec from rebooting the system on a boot
>>>>>>>>> core in EL2.
>>>>>>>>>
>>>>>>>>> This patch adds a cpu tear-down function and also puts an existing cpu-init
>>>>>>>>> code into a separate function, kvm_arch_hardware_disable() and
>>>>>>>>> kvm_arch_hardware_enable() respectively.
>>>>>>>>> We don't need arm64-specific cpu hotplug hook any more.
>>>>>>>>>
>>>>>>>>> Since this patch modifies common part of code between arm and arm64, one
>>>>>>>>> stub definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
>>>>>>>>> compiling errors.
>>>>>>>>>
>>>>>>>>> Signed-off-by: AKASHI Takahiro
>>>>>>>>> ---
>>>>>>>>>  arch/arm/include/asm/kvm_host.h   | 10 ++++-
>>>>>>>>>  arch/arm/include/asm/kvm_mmu.h    |  1 +
>>>>>>>>>  arch/arm/kvm/arm.c                | 79 ++++++++++++++++++---------------------
>>>>>>>>>  arch/arm/kvm/mmu.c                |  5 +++
>>>>>>>>>  arch/arm64/include/asm/kvm_host.h | 16 +++++++-
>>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h  |  1 +
>>>>>>>>>  arch/arm64/include/asm/virt.h     |  9 +++++
>>>>>>>>>  arch/arm64/kvm/hyp-init.S         | 33 ++++++++++++++++
>>>>>>>>>  arch/arm64/kvm/hyp.S              | 32 ++++++++++++++--
>>>>>>>>>  9 files changed, 138 insertions(+), 48 deletions(-)
>>>>>>>>
>>>>>>>> [..]
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  static struct notifier_block hyp_init_cpu_pm_nb = {
>>>>>>>>> @@ -1108,11 +1119,6 @@ static int init_hyp_mode(void)
>>>>>>>>>  	}
>>>>>>>>>
>>>>>>>>>  	/*
>>>>>>>>> -	 * Execute the init code on each CPU.
>>>>>>>>> -	 */
>>>>>>>>> -	on_each_cpu(cpu_init_hyp_mode, NULL, 1);
>>>>>>>>> -
>>>>>>>>> -	/*
>>>>>>>>>  	 * Init HYP view of VGIC
>>>>>>>>>  	 */
>>>>>>>>>  	err = kvm_vgic_hyp_init();
>>>>>>>>
>>>>>>>> With this flow, the cpu_init_hyp_mode() is called only at VM guest
>>>>>>>> creation, but vgic_hyp_init() is called at bootup. On a system with
>>>>>>>> GICv3, it looks like we end up with bogus values from the ICH_VTR_EL2
>>>>>>>> (to get the number of LRs), because we're not reading it from EL2
>>>>>>>> anymore.
>>>>>>
>>>>>> Thank you for pointing this out.
>>>>>> Recently I tested my kdump code on hikey, and as hikey(hi6220) has gic-400,
>>>>>> I didn't notice this problem.
>>>>>
>>>>> Because GIC-400 is a GICv2 implementation, which is entirely MMIO based.
>>>>> GICv3 uses some system registers that are only available at EL2, and KVM
>>>>> needs some information contained in these registers before being able to
>>>>> get initialized.
>>>>
>>>> I see.
>>>>
>>>>>>> Indeed, this is completely broken (I just reproduced the issue on a
>>>>>>> model). I wish this kind of detail had been checked earlier, but thanks
>>>>>>> for pointing it out.
>>>>>>>
>>>>>>>> What's the best way to fix this?
>>>>>>>> - Call kvm_arch_hardware_enable() before vgic_hyp_init() and disable later?
>>>>>>>> - Fold the VGIC init stuff back into hardware_enable()?
>>>>>>>
>>>>>>> None of that works - kvm_arch_hardware_enable() is called once per CPU,
>>>>>>> while vgic_hyp_init() can only be called once. Also,
>>>>>>> kvm_arch_hardware_enable() is called from interrupt context, and I
>>>>>>> wouldn't feel comfortable starting probing DT and allocating stuff from
>>>>>>> there.
>>>>>>
>>>>>> Do you think so?
>>>>>> How about the fixup! patch attached below?
>>>>>> The point is that, like Ashwin's first idea, we initialize cpus temporarily
>>>>>> before kvm_vgic_hyp_init() and then soon reset cpus again. Thus,
>>>>>> kvm cpu hotplug will still continue to work as before.
>>>>>> Now that cpu_init_hyp_mode() is revived as exactly the same as Marc's
>>>>>> original code, the change will not be a big jump.
>>>>>
>>>>> This seems quite complicated:
>>>>> - init EL2 on all CPUs
>>>>> - do some initialization
>>>>> - tear all CPUs EL2 down
>>>>> - let KVM drive the vectors being set or not
>>>>>
>>>>> My questions are: why do we need to do this on *all* cpus? Can't that
>>>>> work on a single one?
>>>>
>>>> I did initialize all the cpus partly because using preempt_enable/disable
>>>> looked a bit ugly and partly because we may, in the future, do additional
>>>> per-cpu initialization in kvm_vgic_hyp_init() and/or kvm_timer_hyp_init().
>>>> But if you're comfortable with preempt_*() stuff, I don't care.
>>>>
>>>>
>>>>> Also, the simple fact that we were able to get some junk value is a sign
>>>>> that something is amiss. I'd expect a splat of some sort, because we now
>>>>> have a possibility of doing things in the wrong context.
>>>>>
>>>>>>
>>>>>> If kvm_hyp_call() in vgic_v3_probe()/kvm_vgic_hyp_init() is a *problem*,
>>>>>> I hope this should work. Actually I confirmed that, with this fixup! patch,
>>>>>> we could run a kvm guest and also successfully executed kexec on model w/gic-v3.
>>>>>>
>>>>>> My only concern is the following kernel message I saw when kexec shut down
>>>>>> the kernel:
>>>>>> (Please note that I was running one kvm guest (pid=961) here.)
>>>>>>
>>>>>> ===
>>>>>> sh-4.3# ./kexec -d -e
>>>>>> kexec version: 15.11.16.11.06-g41e52e2
>>>>>> arch_process_options:112: command_line: (null)
>>>>>> arch_process_options:114: initrd: (null)
>>>>>> arch_process_options:115: dtb: (null)
>>>>>> arch_process_options:117: port: 0x0
>>>>>> kvm: exiting hardware virtualization
>>>>>> kvm [961]: Unsupported exception type: 6248304 <== this message
>>>>>
>>>>> That makes me feel very uncomfortable. It looks like we've exited a
>>>>> guest with some horrible value in X0. How is that even possible?
>>>>>
>>>>> This deserves to be investigated.
>>>>
>>>> I guess the problem is that cpu tear-down function is called even if a kvm guest
>>>> is still running in kvm_arch_vcpu_ioctl_run().
>>>> So adding a check whether cpu has been initialized or not in every iteration of
>>>> kvm_arch_vcpu_ioctl_run() will, if necessary, terminate a guest safely without entering
>>>> a guest mode. Since this check is done while interrupt is disabled, it won't
>>>> interfere with kvm_arch_hardware_disable() called via IPI.
>>>> See the attached fixup patch.
>>>>
>>>> Again, I verified the code on model.
>>>>
>>>> Thanks,
>>>> -Takahiro AKASHI
>>>>
>>>>> Thanks,
>>>>>
>>>>> M.
>>>>>
>>>>
>>>> ----8<----
>>>> From 77f273ba5e0c3dfcf75a5a8d1da8035cc390250c Mon Sep 17 00:00:00 2001
>>>> From: AKASHI Takahiro
>>>> Date: Fri, 11 Dec 2015 13:43:35 +0900
>>>> Subject: [PATCH] fixup! arm64: kvm: allows kvm cpu hotplug
>>>>
>>>> ---
>>>>  arch/arm/kvm/arm.c | 45 ++++++++++++++++++++++++++++++++++-----------
>>>>  1 file changed, 34 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index 518c3c7..d7e86fb 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -573,7 +573,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>  		/*
>>>>  		 * Re-check atomic conditions
>>>>  		 */
>>>> -		if (signal_pending(current)) {
>>>> +		if (__hyp_get_vectors() == hyp_default_vectors) {
>>>> +			/* cpu has been torn down */
>>>> +			ret = -ENOEXEC;
>>>> +			run->exit_reason = KVM_EXIT_SHUTDOWN;
>>>
>>>
>>> That feels completely overkill (and very slow). Why don't you maintain a
>>> per-cpu variable containing the CPU states, which will avoid calling
>>> __hyp_get_vectors() all the time? You should be able to reuse that
>>> construct everywhere.
>>
>> OK. Since I have introduced per-cpu variable, kvm_arm_hardware_enabled, against
>> cpuidle issue, we will be able to re-use it.
>>
>>> Also, I'm not sure about KVM_EXIT_SHUTDOWN. This looks very x86 specific
>>> (called on triple fault).
>>
>> No, I don't think so.
>
> maz@approximate:~/Work/arm-platforms$ git grep KVM_EXIT_SHUTDOWN
> arch/x86/kvm/svm.c: kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;
> arch/x86/kvm/vmx.c: vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
> arch/x86/kvm/x86.c: vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
> include/uapi/linux/kvm.h:#define KVM_EXIT_SHUTDOWN 8
>
> And that's it. No other architecture ever generates this, and this is
> an undocumented API. So I'm not going to let that in until someone actually
> defines what this thing means.
>
>> Looking at kvm_cpu_exec() in kvm-all.c of qemu, KVM_EXIT_SHUTDOWN
>> is handled in a generic way and results in a reset request.
>
> Which is not what we want. We want to indicate that the guest couldn't
> be entered. This is not due to a guest doing a triple fault (which is
> the way an x86 system gets rebooted).
>
>> On the other hand, KVM_EXIT_FAIL_ENTRY seems more arch-specific.
>
> Certainly arch specific, but actually extremely accurate. You couldn't
> enter the guest, and you describe the reason in an architecture-specific
> fashion. This is also the only exit code that describes this exact case
> we're talking about here.
>
>> In addition, if kvm_vcpu_ioctl() returns a negative value, run->exit_reason
>> will never be examined.
>> So I think
>>    ret -> 0
>>    run->exit_reason -> KVM_EXIT_SHUTDOWN
>
> ret = 0
> run->exit_reason = KVM_EXIT_FAIL_ENTRY;
> run->hardware_entry_failure_reason = (u64)-ENOEXEC;

OK.

>> or just
>>    ret -> -ENOEXEC
>> is the best.
>>
>> Either way, a guest will have no good chance to gracefully shut itself down
>> because we're kexec'ing (without waiting for threads' termination).
>
> Well, at least userspace gets a chance - and should kexec fail, we have
> a chance to recover.

Well, the current kexec implementation (on arm64) never fails except at a
very early stage :)
So please review the attached fixup patch, again.

Thanks,
-Takahiro AKASHI

> Thanks,
>
> M.
>

----8<----
From ec6c07fe80d6ba96855468f61daffa9b91cf5622 Mon Sep 17 00:00:00 2001
From: AKASHI Takahiro
Date: Fri, 11 Dec 2015 13:43:35 +0900
Subject: [PATCH] fixup!
 arm64: kvm: allows kvm cpu hotplug

---
 arch/arm/kvm/arm.c | 62 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 42 insertions(+), 20 deletions(-)

-- 
1.7.9.5

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 518c3c7..05eaa35 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -573,7 +573,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		/*
 		 * Re-check atomic conditions
 		 */
-		if (signal_pending(current)) {
+		if (!__this_cpu_read(kvm_arm_hardware_enabled)) {
+			/* cpu has been torn down */
+			ret = 0;
+			run->exit_reason = KVM_EXIT_FAIL_ENTRY;
+			run->fail_entry.hardware_entry_failure_reason
+					= (u64)-ENOEXEC;
+		} else if (signal_pending(current)) {
 			ret = -EINTR;
 			run->exit_reason = KVM_EXIT_INTR;
 		}
@@ -950,7 +956,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	}
 }
 
-int kvm_arch_hardware_enable(void)
+static void cpu_init_hyp_mode(void)
 {
 	phys_addr_t boot_pgd_ptr;
 	phys_addr_t pgd_ptr;
@@ -958,9 +964,6 @@ int kvm_arch_hardware_enable(void)
 	unsigned long stack_page;
 	unsigned long vector_ptr;
 
-	if (__hyp_get_vectors() != hyp_default_vectors)
-		return 0;
-
 	/* Switch from the HYP stub to our own HYP init vector */
 	__hyp_set_vectors(kvm_get_idmap_vector());
 
@@ -973,24 +976,38 @@ int kvm_arch_hardware_enable(void)
 	__cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr);
 
 	kvm_arm_init_debug();
-
-	return 0;
 }
 
-void kvm_arch_hardware_disable(void)
+static void cpu_reset_hyp_mode(void)
 {
 	phys_addr_t boot_pgd_ptr;
 	phys_addr_t phys_idmap_start;
 
-	if (__hyp_get_vectors() == hyp_default_vectors)
-		return;
-
 	boot_pgd_ptr = kvm_mmu_get_boot_httbr();
 	phys_idmap_start = kvm_get_idmap_start();
 
 	__cpu_reset_hyp_mode(boot_pgd_ptr, phys_idmap_start);
 }
 
+int kvm_arch_hardware_enable(void)
+{
+	if (!__this_cpu_read(kvm_arm_hardware_enabled)) {
+		cpu_init_hyp_mode();
+		__this_cpu_write(kvm_arm_hardware_enabled, 1);
+	}
+
+	return 0;
+}
+
+void kvm_arch_hardware_disable(void)
+{
+	if (!__this_cpu_read(kvm_arm_hardware_enabled))
+		return;
+
+	cpu_reset_hyp_mode();
+	__this_cpu_write(kvm_arm_hardware_enabled, 0);
+}
+
 #ifdef CONFIG_CPU_PM
 static int hyp_init_cpu_pm_notifier(struct notifier_block *self,
 				    unsigned long cmd,
@@ -998,19 +1015,13 @@ static int hyp_init_cpu_pm_notifier(struct notifier_block *self,
 {
 	switch (cmd) {
 	case CPU_PM_ENTER:
-		if (__hyp_get_vectors() != hyp_default_vectors)
-			__this_cpu_write(kvm_arm_hardware_enabled, 1);
-		else
-			__this_cpu_write(kvm_arm_hardware_enabled, 0);
-		/*
-		 * don't call kvm_arch_hardware_disable() in case of
-		 * CPU_PM_ENTER because it does't actually save any state.
-		 */
+		if (__this_cpu_read(kvm_arm_hardware_enabled))
+			cpu_reset_hyp_mode();
 
 		return NOTIFY_OK;
 	case CPU_PM_EXIT:
 		if (__this_cpu_read(kvm_arm_hardware_enabled))
-			kvm_arch_hardware_enable();
+			cpu_init_hyp_mode();
 
 		return NOTIFY_OK;
 
@@ -1114,9 +1125,20 @@ static int init_hyp_mode(void)
 	}
 
 	/*
+	 * Init this CPU temporarily to execute kvm_hyp_call()
+	 * during kvm_vgic_hyp_init().
+	 */
+	preempt_disable();
+	cpu_init_hyp_mode();
+
+	/*
 	 * Init HYP view of VGIC
 	 */
 	err = kvm_vgic_hyp_init();
+
+	cpu_reset_hyp_mode();
+	preempt_enable();
+
 	if (err)
 		goto out_free_context;
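
For readers who want to follow the state handling outside the kernel, the
guard logic of the fixup patch can be modeled in plain userspace C. This is
only a rough sketch under stated assumptions, not the kernel code: the array
hw_enabled[] stands in for the per-cpu variable kvm_arm_hardware_enabled,
the init/reset functions are stubs that merely count calls, and NR_CPUS_MODEL,
struct run_model, the MODEL_EXIT_* constants and vcpu_run_check() are
hypothetical names made up for illustration (the constants are not the real
KVM_EXIT_* values).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL		4	/* hypothetical CPU count for this model */
#define MODEL_EXIT_INTR		1	/* stand-in, not the real KVM_EXIT_INTR */
#define MODEL_EXIT_FAIL_ENTRY	2	/* stand-in, not the real KVM_EXIT_FAIL_ENTRY */

/* Minimal stand-in for the fields of struct kvm_run used in the thread. */
struct run_model {
	int exit_reason;
	uint64_t hardware_entry_failure_reason;
};

/* Models the per-cpu flag kvm_arm_hardware_enabled. */
bool hw_enabled[NR_CPUS_MODEL];
int init_calls[NR_CPUS_MODEL];
int reset_calls[NR_CPUS_MODEL];

/* Stubs: the real functions program EL2; here they only count calls. */
void cpu_init_hyp_mode(int cpu)  { init_calls[cpu]++; }
void cpu_reset_hyp_mode(int cpu) { reset_calls[cpu]++; }

/* Idempotent enable: initialize only if this CPU is not yet enabled. */
int hardware_enable(int cpu)
{
	if (!hw_enabled[cpu]) {
		cpu_init_hyp_mode(cpu);
		hw_enabled[cpu] = true;
	}
	return 0;
}

/* Idempotent disable: tear down only if this CPU is currently enabled. */
void hardware_disable(int cpu)
{
	if (!hw_enabled[cpu])
		return;
	cpu_reset_hyp_mode(cpu);
	hw_enabled[cpu] = false;
}

/*
 * Models the "re-check atomic conditions" step of the run loop.
 * Returns 1 if the guest may be entered; otherwise fills in *run and
 * returns the value the ioctl would return (0 or -EINTR).
 */
int vcpu_run_check(int cpu, bool signal_pending, struct run_model *run)
{
	if (!hw_enabled[cpu]) {
		/* cpu has been torn down: report a failed entry, ret = 0 */
		run->exit_reason = MODEL_EXIT_FAIL_ENTRY;
		run->hardware_entry_failure_reason = (uint64_t)-ENOEXEC;
		return 0;
	}
	if (signal_pending) {
		run->exit_reason = MODEL_EXIT_INTR;
		return -EINTR;
	}
	return 1;
}
```

The point of returning 0 rather than a negative errno in the torn-down case
is the one made in the discussion above: with a negative return value,
userspace would never look at exit_reason.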