
[hyperv-next,v5,00/11] arm64: hyperv: Support Virtual Trust Level Boot

Message ID 20250307220304.247725-1-romank@linux.microsoft.com

Message

Roman Kisel March 7, 2025, 10:02 p.m. UTC
This patch set allows the Hyper-V code to boot on ARM64 inside a Virtual Trust
Level. These levels are a part of the Virtual Secure Mode documented in the
Top-Level Functional Specification available at
https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/vsm.

The OpenHCL paravisor https://github.com/microsoft/openvmm/tree/main/openhcl
can serve as a practical application of these patches on ARM64.

For validation, I built kernels for the {x86_64, ARM64} x {VTL0, VTL2} set with
a small initrd embedded into the kernel and booted VMs managed by Hyper-V and
OpenVMM off of that.

Starting from V5, the patch series includes a non-functional change to KVM on
arm64 which I tested as well.

[V5]
    - Provide and use a common SMCCC-based infra for the arm64 hypervisor guests
      to detect hypervisor presence.
    ** Thank you, Arnd! **

    - Fix line wraps to follow the rest of the code.
    - Open-code getting IRQ domain parent in the ACPI case to make the code
      better.
    ** Thank you, Bjorn! **

    - Test the binding with the latest dtschema.
    - Clean up the commit title and description.
    - Use proper defines for known constants.
    ** Thank you, Krzysztof! **

    - Extend comment on why ACPI v6 is checked for.
    - Reorder patches to make sure that even with partial series application
      the compilation succeeds.
    - Report VTL the kernel runs in.
    - Use "X86_64" in Kconfig rather than "X86".
    - Extract a non-functional change for hv_get_vmbus_root_device() into
      a separate patch.
    ** Thank you, Michael! **
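
For illustration only, the UUID comparison behind the SMCCC-based detection mentioned in the V5 notes can be sketched in user-space C. This is not the code from the series: `struct smccc_res`, `res_to_uuid()` and `hypervisor_has_uuid()` are hypothetical names, the 32-bit `SMCCC_RET_NOT_SUPPORTED` value stands in for the kernel's -1 return code, and the byte order (byte 0 in the low-order bits of the first register) is assumed from the SMCCC specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the four result registers of an SMCCC call. */
struct smccc_res {
	uint32_t a0, a1, a2, a3;
};

/* -1 truncated to 32 bits; used here only for the sketch. */
#define SMCCC_RET_NOT_SUPPORTED 0xffffffffu

/* Unpack four 32-bit registers into 16 UUID bytes, low-order byte first. */
static void res_to_uuid(const struct smccc_res *res, uint8_t uuid[16])
{
	const uint32_t regs[4] = { res->a0, res->a1, res->a2, res->a3 };

	for (int i = 0; i < 16; i++)
		uuid[i] = (uint8_t)(regs[i / 4] >> (8 * (i % 4)));
}

/* Return true only if the call is supported and the UUID matches. */
static bool hypervisor_has_uuid(const struct smccc_res *res,
				const uint8_t expected[16])
{
	uint8_t uuid[16];

	assert(res && expected);

	/* Older hypervisors may not implement the call at all. */
	if (res->a0 == SMCCC_RET_NOT_SUPPORTED)
		return false;

	res_to_uuid(res, uuid);
	return memcmp(uuid, expected, sizeof(uuid)) == 0;
}
```

The explicit "not supported" check comes first so that a hypervisor without the call is never mistaken for a matching one.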

[V4]
    https://lore.kernel.org/linux-hyperv/20250212014321.1108840-1-romank@linux.microsoft.com/
    - Fixed wording to match acronyms defined in the "Terms and Abbreviations"
      section of the SMCCC specification throughout the patch series.
      **Thank you, Michael!**

    - Replaced the hypervisor ID containing ASCII with a UUID as
      required by the specification.
      **Thank you, Michael!**

    - Added an explicit check for `SMCCC_RET_NOT_SUPPORTED` when discovering the
      hypervisor presence to make the backward compatibility obvious.
      **Thank you, Saurabh!**

    - Split the fix for `get_vtl(void)` out to make it easier to backport.
    - Refactored the configuration options as requested to eliminate the risk
      of building non-functional kernels with randomly selected options.
      **Thank you, Michael!**

    - Refactored the changes not to introduce an additional file with
      a one-line function.
      **Thank you, Wei!**

    - Fixed the change description for the VMBus DeviceTree changes, used
      `scripts/get_maintainer.pl` on the latest kernel to get the up-to-date list
      of maintainers as requested.
      **Thank you, Krzysztof!**

    - Removed the added (paranoid and superfluous) checks for DMA coherence in the
      VMBus driver and instead relied on the DMA and the OF subsystem code.
      **Thank you, Arnd, Krzysztof, Michael!**

    - Used another set of APIs for discovering the hardware interrupt number
      in the VMBus driver to be able to build the driver as a module.
      **Thank you, Michael, Saurabh!**

    - Renamed the newly introduced `get_vmbus_root_device(void)` function to
      `hv_get_vmbus_root_device(void)` as requested.
      **Thank you, Wei!**

    - Applied the suggested small-scale refactoring to simplify changes to the Hyper-V
      PCI driver. I am taking the offered liberty of doing the large-scale
      refactoring in another patch series.
      **Thank you, Michael!**

    - Added a fix for the issue discovered internally where the CPU would not
      get the interrupt from a PCI device attached to VTL2 as the shared peripheral
      interrupt number (SPI) was not offset by 32 (the first valid SPI number).
      **Thank you, Brian!**
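
As a rough illustration of the SPI offset described in the last V4 item (not the code from the series), a zero-based SPI index has to be translated into a GIC interrupt ID by adding 32. The macro and function names below are hypothetical, and the upper bound of 1019 assumes the usual GIC SPI range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical names. GIC interrupt IDs 0-15 are SGIs, 16-31 are PPIs,
 * and SPIs start at 32.
 */
#define GIC_SPI_FIRST_INTID 32u
#define GIC_SPI_LAST_INTID  1019u

/* Check whether an interrupt ID falls into the SPI range. */
static bool intid_is_spi(uint32_t intid)
{
	return intid >= GIC_SPI_FIRST_INTID && intid <= GIC_SPI_LAST_INTID;
}

/* Convert a zero-based SPI index into the interrupt ID the GIC expects. */
static uint32_t spi_to_intid(uint32_t spi_index)
{
	assert(spi_index <= GIC_SPI_LAST_INTID - GIC_SPI_FIRST_INTID);
	return spi_index + GIC_SPI_FIRST_INTID;
}
```

Passing the raw index where the interrupt ID is expected would land in the SGI/PPI range, which matches the symptom of the CPU never seeing the device interrupt.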

[V3]
    https://lore.kernel.org/lkml/20240726225910.1912537-1-romank@linux.microsoft.com/
    - Employed the SMCCC function recently implemented in the Microsoft Hyper-V
      hypervisor to detect running on Hyper-V/arm64. No dependence on ACPI/DT is
      needed anymore, although the source code still falls back to ACPI because
      the new hypervisor might be available only in the Windows Insiders channel
      for now.
    - As a part of the above, refactored detecting the hypervisor via ACPI FADT.
    - There was a suggestion to explore whether it is feasible or not to express
      that ACPI must be absent for the VTL mode and present for the regular guests
      in the Hyper-V Kconfig file.
      My current conclusion is that this will require refactoring in many places.
      That becomes especially convoluted on x86_64 due to the MSI and APIC
      dependencies. I'd ask to let us tackle that in another patch series (or chalk
      it up to nice-to-haves rather than fires to put out) to separate concerns and
      decrease chances of breakage.
    - While refactoring `get_vtl(void)` and the related code, fixed the hypercall
      output address not to overlap with the input as the Hyper-V TLFS mandates:
      "The input and output parameter lists cannot overlap or cross page boundaries."
      See https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/hypercall-interface
      for more.
      Some might argue that should've been a topic for a separate patch series;
      I'd counter that the change is well-contained (one line), has no dependencies,
      and makes the code legal.
    - Made the VTL boot code (c)leaner as was suggested.
    - Set DMA cache coherency for the VMBus.
    - Updated DT bindings in the VMBus documentation (separated out into a new patch).
    - Fixed `vmbus_set_irq` to use the API that works both for the ACPI and OF.
    - Reworked setting up the vPCI MSI IRQ domain in the non-ACPI case. The logic
      looks a bit fiddly/ad-hoc as I couldn't find the API that would fit the bill.
      Added comments to explain myself.
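
The TLFS rule quoted in the V3 notes ("The input and output parameter lists cannot overlap or cross page boundaries") can be expressed as a small check. The sketch below is a user-space illustration under stated assumptions, not the kernel code: `HV_PAGE_SIZE` (assumed to be 4 KiB), `crosses_page()`, `ranges_overlap()` and `hvcall_params_valid()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed hypervisor page size for this sketch. */
#define HV_PAGE_SIZE 4096u

static_assert((HV_PAGE_SIZE & (HV_PAGE_SIZE - 1)) == 0,
	      "page size must be a power of two");

/* True if [addr, addr + len) spans more than one hypervisor page. */
static bool crosses_page(uint64_t addr, size_t len)
{
	if (len == 0)
		return false;
	return (addr / HV_PAGE_SIZE) != ((addr + len - 1) / HV_PAGE_SIZE);
}

/* True if the two half-open ranges share at least one byte. */
static bool ranges_overlap(uint64_t a, size_t a_len, uint64_t b, size_t b_len)
{
	if (a_len == 0 || b_len == 0)
		return false;
	return a < b + b_len && b < a + a_len;
}

/* Both parameter lists must stay within a page and must not overlap. */
static bool hvcall_params_valid(uint64_t in, size_t in_len,
				uint64_t out, size_t out_len)
{
	if (crosses_page(in, in_len) || crosses_page(out, out_len))
		return false;
	return !ranges_overlap(in, in_len, out, out_len);
}
```

Under this check, reusing the input buffer address for the output fails validation, which is the situation the one-line fix addresses.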

[V2]
    https://lore.kernel.org/all/20240514224508.212318-1-romank@linux.microsoft.com/
    - Decreased number of #ifdef's
    - Updated the wording in the commit messages to adhere to the guidelines
    - Sending to the correct set of maintainers and mailing lists

[V1]
    https://lore.kernel.org/all/20240510160602.1311352-1-romank@linux.microsoft.com/

Roman Kisel (11):
  arm64: kvm, smccc: Introduce and use API for detecting hypervisor
    presence
  arm64: hyperv: Use SMCCC to detect hypervisor presence
  Drivers: hv: Enable VTL mode for arm64
  Drivers: hv: Provide arch-neutral implementation of get_vtl()
  arm64: hyperv: Initialize the Virtual Trust Level field
  arm64, x86: hyperv: Report the VTL the system boots in
  dt-bindings: microsoft,vmbus: Add interrupts and DMA coherence
  Drivers: hv: vmbus: Get the IRQ number from DeviceTree
  Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device()
  ACPI: irq: Introduce acpi_get_gsi_dispatcher()
  PCI: hv: Get vPCI MSI IRQ domain from DeviceTree

 .../bindings/bus/microsoft,vmbus.yaml         |  8 +-
 arch/arm64/hyperv/mshyperv.c                  | 46 +++++++++--
 arch/arm64/kvm/hypercalls.c                   |  5 +-
 arch/x86/hyperv/hv_init.c                     | 34 --------
 arch/x86/hyperv/hv_vtl.c                      |  2 +-
 drivers/acpi/irq.c                            | 14 +++-
 drivers/firmware/smccc/kvm_guest.c            | 10 +--
 drivers/firmware/smccc/smccc.c                | 19 +++++
 drivers/hv/Kconfig                            | 10 ++-
 drivers/hv/hv_common.c                        | 31 ++++++++
 drivers/hv/vmbus_drv.c                        | 59 ++++++++++++--
 drivers/pci/controller/pci-hyperv.c           | 79 +++++++++++++++++--
 include/asm-generic/mshyperv.h                |  6 ++
 include/hyperv/hvgdk_mini.h                   |  2 +-
 include/linux/acpi.h                          |  5 +-
 include/linux/arm-smccc.h                     | 55 ++++++++++++-
 include/linux/hyperv.h                        |  2 +
 17 files changed, 308 insertions(+), 79 deletions(-)


base-commit: 3a7f7785eae7cf012af128ca9e383c91e4955354

Comments

Arnd Bergmann March 8, 2025, 9:05 p.m. UTC | #1
On Fri, Mar 7, 2025, at 23:02, Roman Kisel wrote:
> @@ -5,18 +5,20 @@ menu "Microsoft Hyper-V guest support"
>  config HYPERV
>  	tristate "Microsoft Hyper-V client drivers"
>  	depends on (X86 && X86_LOCAL_APIC && HYPERVISOR_GUEST) \
> -		|| (ACPI && ARM64 && !CPU_BIG_ENDIAN)
> +		|| (ARM64 && !CPU_BIG_ENDIAN)
> +	depends on (ACPI || HYPERV_VTL_MODE)
>  	select PARAVIRT
>  	select X86_HV_CALLBACK_VECTOR if X86
> -	select OF_EARLY_FLATTREE if OF
>  	help
>  	  Select this option to run Linux as a Hyper-V client operating
>  	  system.
> 
>  config HYPERV_VTL_MODE
>  	bool "Enable Linux to boot in VTL context"
> -	depends on X86_64 && HYPERV
> +	depends on (X86_64 || ARM64)
>  	depends on SMP
> +	select OF_EARLY_FLATTREE
> +	select OF
>  	default n
>  	help

Having the dependency below the top-level Kconfig entry feels a little
counterintuitive. You could flip that back as it was before by doing

      select HYPERV_VTL_MODE if !ACPI
      depends on ACPI || SMP

in the HYPERV option, leaving the dependency on HYPERV in
HYPERV_VTL_MODE.

Is OF_EARLY_FLATTREE actually needed on x86?

      Arnd
Roman Kisel March 10, 2025, 5:35 p.m. UTC | #2
On 3/8/2025 1:05 PM, Arnd Bergmann wrote:
> On Fri, Mar 7, 2025, at 23:02, Roman Kisel wrote:
>> @@ -5,18 +5,20 @@ menu "Microsoft Hyper-V guest support"
>>   config HYPERV
>>   	tristate "Microsoft Hyper-V client drivers"
>>   	depends on (X86 && X86_LOCAL_APIC && HYPERVISOR_GUEST) \
>> -		|| (ACPI && ARM64 && !CPU_BIG_ENDIAN)
>> +		|| (ARM64 && !CPU_BIG_ENDIAN)
>> +	depends on (ACPI || HYPERV_VTL_MODE)
>>   	select PARAVIRT
>>   	select X86_HV_CALLBACK_VECTOR if X86
>> -	select OF_EARLY_FLATTREE if OF
>>   	help
>>   	  Select this option to run Linux as a Hyper-V client operating
>>   	  system.
>>
>>   config HYPERV_VTL_MODE
>>   	bool "Enable Linux to boot in VTL context"
>> -	depends on X86_64 && HYPERV
>> +	depends on (X86_64 || ARM64)
>>   	depends on SMP
>> +	select OF_EARLY_FLATTREE
>> +	select OF
>>   	default n
>>   	help
> 
> Having the dependency below the top-level Kconfig entry feels a little
> counterintuitive. You could flip that back as it was before by doing
> 
>        select HYPERV_VTL_MODE if !ACPI
>        depends on ACPI || SMP
> 
> in the HYPERV option, leaving the dependency on HYPERV in
> HYPERV_VTL_MODE.
> 

I was implementing Michael's suggestion, and might've gone a bit
overboard, my bad. I'll fix this, thanks a lot for reviewing!

> Is OF_EARLY_FLATTREE actually needed on x86?
> 

No, it is not needed on x86. It is only needed when VTL mode is used.

>        Arnd
Michael Kelley March 10, 2025, 9:01 p.m. UTC | #3
From: Arnd Bergmann <arnd@arndb.de> Sent: Saturday, March 8, 2025 1:05 PM
> 
> On Fri, Mar 7, 2025, at 23:02, Roman Kisel wrote:
> > @@ -5,18 +5,20 @@ menu "Microsoft Hyper-V guest support"
> >  config HYPERV
> >  	tristate "Microsoft Hyper-V client drivers"
> >  	depends on (X86 && X86_LOCAL_APIC && HYPERVISOR_GUEST) \
> > -		|| (ACPI && ARM64 && !CPU_BIG_ENDIAN)
> > +		|| (ARM64 && !CPU_BIG_ENDIAN)
> > +	depends on (ACPI || HYPERV_VTL_MODE)
> >  	select PARAVIRT
> >  	select X86_HV_CALLBACK_VECTOR if X86
> > -	select OF_EARLY_FLATTREE if OF
> >  	help
> >  	  Select this option to run Linux as a Hyper-V client operating
> >  	  system.
> >
> >  config HYPERV_VTL_MODE
> >  	bool "Enable Linux to boot in VTL context"
> > -	depends on X86_64 && HYPERV
> > +	depends on (X86_64 || ARM64)
> >  	depends on SMP
> > +	select OF_EARLY_FLATTREE
> > +	select OF
> >  	default n
> >  	help
> 
> Having the dependency below the top-level Kconfig entry feels a little
> counterintuitive. You could flip that back as it was before by doing
> 
>       select HYPERV_VTL_MODE if !ACPI
>       depends on ACPI || SMP
> 
> in the HYPERV option, leaving the dependency on HYPERV in
> HYPERV_VTL_MODE.

I would argue that we don't ever want to implicitly select
HYPERV_VTL_MODE because of some other config setting or
lack thereof.  VTL mode is enough of a special case that it should
only be explicitly selected. If someone omits ACPI, then HYPERV
should not be selectable unless HYPERV_VTL_MODE is explicitly
selected.

The last line of the comment for HYPERV_VTL_MODE says
"A kernel built with this option must run at VTL2, and will not run
as a normal guest."  In other words, don't choose this unless you
100% know that VTL2 is what you want.

Michael

> 
> Is OF_EARLY_FLATTREE actually needed on x86?
> 
>       Arnd
Arnd Bergmann March 10, 2025, 9:20 p.m. UTC | #4
On Mon, Mar 10, 2025, at 22:01, Michael Kelley wrote:
> From: Arnd Bergmann <arnd@arndb.de> Sent: Saturday, March 8, 2025 1:05 PM
>> >  config HYPERV_VTL_MODE
>> >  	bool "Enable Linux to boot in VTL context"
>> > -	depends on X86_64 && HYPERV
>> > +	depends on (X86_64 || ARM64)
>> >  	depends on SMP
>> > +	select OF_EARLY_FLATTREE
>> > +	select OF
>> >  	default n
>> >  	help
>> 
>> Having the dependency below the top-level Kconfig entry feels a little
>> counterintuitive. You could flip that back as it was before by doing
>> 
>>       select HYPERV_VTL_MODE if !ACPI
>>       depends on ACPI || SMP
>> 
>> in the HYPERV option, leaving the dependency on HYPERV in
>> HYPERV_VTL_MODE.
>
> I would argue that we don't ever want to implicitly select
> HYPERV_VTL_MODE because of some other config setting or
> lack thereof.  VTL mode is enough of a special case that it should
> only be explicitly selected. If someone omits ACPI, then HYPERV
> should not be selectable unless HYPERV_VTL_MODE is explicitly
> selected.
>
> The last line of the comment for HYPERV_VTL_MODE says
> "A kernel built with this option must run at VTL2, and will not run
> as a normal guest."  In other words, don't choose this unless you
> 100% know that VTL2 is what you want.

It sounds like the latter is the real problem: enabling a feature
should never prevent something else from working. Can you describe
what VTL context is and why it requires an exception to a rather
fundamental rule here? If you build a kernel that runs on every
single piece of arm64 hardware and every hypervisor, why can't
you add HYPERV_VTL_MODE to that as an option?

      Arnd
Michael Kelley March 10, 2025, 10:18 p.m. UTC | #5
From: Arnd Bergmann <arnd@arndb.de> Sent: Monday, March 10, 2025 2:21 PM
> 
> On Mon, Mar 10, 2025, at 22:01, Michael Kelley wrote:
> > From: Arnd Bergmann <arnd@arndb.de> Sent: Saturday, March 8, 2025 1:05 PM
> >> >  config HYPERV_VTL_MODE
> >> >  	bool "Enable Linux to boot in VTL context"
> >> > -	depends on X86_64 && HYPERV
> >> > +	depends on (X86_64 || ARM64)
> >> >  	depends on SMP
> >> > +	select OF_EARLY_FLATTREE
> >> > +	select OF
> >> >  	default n
> >> >  	help
> >>
> >> Having the dependency below the top-level Kconfig entry feels a little
> >> counterintuitive. You could flip that back as it was before by doing
> >>
> >>       select HYPERV_VTL_MODE if !ACPI
> >>       depends on ACPI || SMP
> >>
> >> in the HYPERV option, leaving the dependency on HYPERV in
> >> HYPERV_VTL_MODE.
> >
> > I would argue that we don't ever want to implicitly select
> > HYPERV_VTL_MODE because of some other config setting or
> > lack thereof.  VTL mode is enough of a special case that it should
> > only be explicitly selected. If someone omits ACPI, then HYPERV
> > should not be selectable unless HYPERV_VTL_MODE is explicitly
> > selected.
> >
> > The last line of the comment for HYPERV_VTL_MODE says
> > "A kernel built with this option must run at VTL2, and will not run
> > as a normal guest."  In other words, don't choose this unless you
> > 100% know that VTL2 is what you want.
> 
> It sounds like the latter is the real problem: enabling a feature
> should never prevent something else from working. Can you describe
> what VTL context is and why it requires an exception to a rather
> fundamental rule here? If you build a kernel that runs on every
> single piece of arm64 hardware and every hypervisor, why can't
> you add HYPERV_VTL_MODE to that as an option?
> 

VTL = Virtual Trust Level, and VSM = Virtual Secure Mode, are Hyper-V's
terminology for offering multiple execution environments with
hierarchical trust in the context of a single VM. A normal guest
operating system runs at VTL 0, and there are no other VTLs in use.
But in some environments, additional software may run as a paravisor
layer between the normal guest OS and the hypervisor. This software
runs at some other VTL > 0, and has a higher privilege level within
the VM than software running at VTL 0 (which is the lowest privilege).
VTL 2 is used today in the Azure cloud with CoCo VMs to run a
paravisor, and there may be other uses in the future. See [1] if you
want more details on VSM and VTLs. Also [2] for the CoCo VM use
case.

Ideally, a Linux kernel image could detect at runtime what VTL it is
running at, and "do the right thing". Unfortunately, on x86 Linux this
has proved difficult (or perhaps impossible) because the amount of
boot-time setup required to ask the question about the current VTL
is significant. The idiosyncrasies and historical baggage of x86 require
that Linux do some x86-specific initialization steps for VTL > 0
before the question can be asked. Hence the introduction of
CONFIG_HYPERV_VTL_MODE, and the behavior that when it is
selected, the kernel image won't run normally in VTL 0.

I'll go out on a limb and say that I suspect on arm64 a runtime
determination based on querying the VTL *could* be made (though
I'm not the person writing the code). But taking advantage of that
on arm64 produces an undesirable dichotomy with x86.

Roman may have further thoughts on the topic, but that's
what I know about how we got here.

Michael

[1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/vsm
[2] https://techcommunity.microsoft.com/blog/windowsosplatform/openhcl-the-new-open-source-paravisor/4273172
Wei Liu March 12, 2025, 8:31 p.m. UTC | #6
On Wed, Mar 12, 2025 at 11:33:11AM -0700, Roman Kisel wrote:
> 
> 
> On 3/10/2025 3:18 PM, Michael Kelley wrote:
> > From: Arnd Bergmann <arnd@arndb.de> Sent: Monday, March 10, 2025 2:21 PM
> > > 
> > > On Mon, Mar 10, 2025, at 22:01, Michael Kelley wrote:
> > > > From: Arnd Bergmann <arnd@arndb.de> Sent: Saturday, March 8, 2025 1:05 PM
> > > > > >   config HYPERV_VTL_MODE
> > > > > >   	bool "Enable Linux to boot in VTL context"
> > > > > > -	depends on X86_64 && HYPERV
> > > > > > +	depends on (X86_64 || ARM64)
> > > > > >   	depends on SMP
> > > > > > +	select OF_EARLY_FLATTREE
> > > > > > +	select OF
> > > > > >   	default n
> > > > > >   	help
> > > > > 
> > > > > Having the dependency below the top-level Kconfig entry feels a little
> > > > > counterintuitive. You could flip that back as it was before by doing
> > > > > 
> > > > >        select HYPERV_VTL_MODE if !ACPI
> > > > >        depends on ACPI || SMP
> > > > > 
> > > > > in the HYPERV option, leaving the dependency on HYPERV in
> > > > > HYPERV_VTL_MODE.
> > > > 
> > > > I would argue that we don't ever want to implicitly select
> > > > HYPERV_VTL_MODE because of some other config setting or
> > > > lack thereof.  VTL mode is enough of a special case that it should
> > > > only be explicitly selected. If someone omits ACPI, then HYPERV
> > > > should not be selectable unless HYPERV_VTL_MODE is explicitly
> > > > selected.
> > > > 
> > > > The last line of the comment for HYPERV_VTL_MODE says
> > > > "A kernel built with this option must run at VTL2, and will not run
> > > > as a normal guest."  In other words, don't choose this unless you
> > > > 100% know that VTL2 is what you want.
> > > 
> > > It sounds like the latter is the real problem: enabling a feature
> > > should never prevent something else from working. Can you describe
> > > what VTL context is and why it requires an exception to a rather
> > > fundamental rule here? If you build a kernel that runs on every
> > > single piece of arm64 hardware and every hypervisor, why can't
> > > you add HYPERV_VTL_MODE to that as an option?
> > > 
> 
> In the VTL mode, we're running the kernel as secure firmware inside the
> guest (one might see VTL2 working as Intel SMM or Secure World on ARM).
> 
> [...]
> 
> > 
> > Ideally, a Linux kernel image could detect at runtime what VTL it is
> > running at, and "do the right thing". Unfortunately, on x86 Linux this
> > has proved difficult (or perhaps impossible) because the amount of
> > boot-time setup required to ask the question about the current VTL
> > is significant. The idiosyncrasies and historical baggage of x86 require
> > that Linux do some x86-specific initialization steps for VTL > 0
> > before the question can be asked. Hence the introduction of
> > CONFIG_HYPERV_VTL_MODE, and the behavior that when it is
> > selected, the kernel image won't run normally in VTL 0.
> > 
> > I'll go out on a limb and say that I suspect on arm64 a runtime
> > determination based on querying the VTL *could* be made (though
> > I'm not the person writing the code). But taking advantage of that
> > on arm64 produces an undesirable dichotomy with x86.
> 
> On arm64 that is much easier, I agree. On x86 we'd need a kludge of
> 
> static void __naked __init __aligned(4096) early_hvcall_pg(void)
> {
> 	/*
> 	 * Fill the early hvcall page with `0xF1` aka `INT1` to catch
> 	 * programming errors. The hypervisor will overlay the page with
> 	 * the vendor-specific code sequences to make hypercalls on x86(_64).
> 	 */
> 	asm (".skip 4096, 0xf1");
> }
> 
> static u8 __init early_hvcall_pg_input[4096] __attribute__((aligned(4096)));
> static u8 __init early_hvcall_pg_output[4096] __attribute__((aligned(4096)));
> 
> static void __init early_connect_to_hv(void)
> {
> 	union hv_x64_msr_hypercall_contents hypercall_msr;
> 	u64 guest_id;
> 
> 	guest_id = hv_generate_guest_id(LINUX_VERSION_CODE);
> 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> 	hypercall_msr.enable = 1;
> 	hypercall_msr.guest_physical_address =
> 		__phys_to_pfn(virt_to_phys(early_hvcall_pg));
> 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> }
> 
> or variations thereof.

OT here but what's stopping us from doing this on x86?

It seems to me there is some value in setting up the hypercall page as
early as possible. The same page can be used through the lifetime of the
partition. The early input and output pages should be reclaimed.

Also, since the hypervisor will insert an overlay page, it makes sense
to not allocate a page from Linux at all. When I ported Xen to run as
a guest on Hyper-V, I used that approach. The setup worked just fine.

All that being said, things work today, so I'm in no hurry to change things.

Wei.