[FYI,01/13] docs: describe QEMU's VMGenID design

Message ID 1442148227-17343-2-git-send-email-lersek@redhat.com
State New

Commit Message

Laszlo Ersek Sept. 13, 2015, 12:43 p.m. UTC
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gal Hammer <ghammer@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---

Notes:
    fyi:
    - move from docs/specs/ to docs/ [Eric, Paolo]
    - fix grammar [Eric]
    - clarify that requirement R1e covers ROM and MMIO too [Michael]
    - replace '"BOCHS"' with '"BOCHS "' in the DataTableRegion operator, so
      that the OEM ID argument matches ACPI_BUILD_APPNAME6 exactly
    - remove the _CRS with the IO descriptor in it, because Windows' VMGENID
      driver chokes on that (but is okay with the absence of the _CRS). See
      <http://thread.gmane.org/gmane.comp.emulators.qemu/357940/focus=2232>
      for more.
    
    rfc:
    - This is based on the super long private email discussion we had two
      months ago, plus on the IRL discussion between Michael and myself @
      the KVM Forum 2015.

 docs/vmgenid.txt | 336 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 336 insertions(+)
 create mode 100644 docs/vmgenid.txt

Patch

diff --git a/docs/vmgenid.txt b/docs/vmgenid.txt
new file mode 100644
index 0000000..4a9c1d0
--- /dev/null
+++ b/docs/vmgenid.txt
@@ -0,0 +1,336 @@ 
+Virtual Machine Generation ID Device
+====================================
+
+The Microsoft specification entitled "Virtual Machine Generation ID",
+maintained at <http://go.microsoft.com/fwlink/?LinkId=260709>, defines an ACPI
+feature that allows the guest OSPM to recognize when it has been returned "to
+an earlier point in time", e.g. by restoring from snapshot, or by incoming
+migration. Quoting the spec,
+
+    The virtual machine generation ID is a feature whereby the virtual machines
+    BIOS will expose a new ID. This is a 128-bit, cryptographically random
+    integer value identifier that will be different every time the virtual
+    machine executes from a different configuration file-such as executing from
+    a recovered snapshot, or executing after restoring from backup. [...]
+
+The document you are reading now extracts the requirements set forth by the
+VMGenID spec for hypervisors that intend to provide the feature, and describes
+QEMU's implementation. The design below targets both SeaBIOS and OVMF as
+compatible guest firmwares, without any changes to either of them.
+
+Requirements
+------------
+
+These requirements are extracted from the "How to implement virtual machine
+generation ID support in a virtualization platform" section of the
+specification, dated August 1, 2012.
+
+R1a. The generation ID shall live in an 8-byte aligned buffer.
+
+R1b. The buffer holding the generation ID shall be in guest RAM, ROM, or device
+     MMIO range.
+
+R1c. The buffer holding the generation ID shall be kept separate from areas
+     used by the operating system.
+
+R1d. The buffer shall not be covered by an AddressRangeMemory or
+     AddressRangeACPI entry in the E820 or UEFI memory map.
+
+R1e. The generation ID shall not live in a page frame that could be mapped with
+     caching disabled. (In other words, regardless of whether the generation ID
+     lives in RAM, ROM or MMIO, it shall only be mapped as cacheable.)
+
+R2 to R5. [These AML requirements are isolated well enough in the Microsoft
+          specification for us to simply refer to them here.]
+
+R6. The hypervisor shall expose a _HID (hardware identifier) object in the
+    VMGenId device's scope that is unique to the hypervisor vendor.
+
+Generation ID buffer design
+---------------------------
+
+QEMU places the generation ID buffer inside a separate fw_cfg blob that is
+exposed to the guest OS with the ACPI linker/loader.
+
+The structure of the blob is as follows. Offsets, sizes and numeric values are
+given in decimal; furthermore the latter are encoded in little endian.
+
+  Offs  Field               Size  Value
+  ----  ------------------  ----  ------------------------------------
+     0  System Description    36
+        Table Header
+     0    Signature            4                                "UEFI"
+     4    Length               4                                    62
+     8    Revision             1                                     1
+     9    Checksum             1                                     0
+    10    OEMID                6        ACPI_BUILD_APPNAME6 ("BOCHS ")
+    16    OEM Table ID         8                            "QEMUPARM"
+    24    OEM Revision         4                                     1
+    28    Creator ID           4          ACPI_BUILD_APPNAME4 ("BXPC")
+    32    Creator Revision     4                                     1
+
+    36  UEFI Table            18
+        Sub-Header
+    36    Identifier          16  417a5dff-bf4b-4abc-a839-6593bb41f452
+    52    DataOffset           2                                    54
+
+    54  ADDR base pointer      8                                    62
+  ....................................................................
+    62  OVMF SDT Header       36                                zeroes
+        probe suppressor
+    98  VMGenID alignment      6                                zeroes
+        padding
+   104  generation ID         16                       128-bit VMGenID
+   120  fw_cfg blob         3976                                zeroes
+        padding
+  4096  <end of blob>
+
+Conceptually, the fw_cfg blob is divided into two parts (separated by the
+dotted line in the diagram). The first part, up to and excluding offset 62, is
+a "UEFI" ACPI Table, governed by the UEFI specification 2.5, Appendix O. The
+second part is mainly padding, but it also contains the generation ID.
+
+The "UEFI" ACPI Table -- in the first part -- is a "normal" ACPI table whose
+generic header is defined by the ACPI specification, but for which the UEFI
+spec defines the "UEFI" signature and adds two more fixed fields, "Identifier"
+and "DataOffset".
+
+- The Identifier field carries a 128-bit GUID. It allows firmware
+  implementors to install several "UEFI" tables with different internal
+  structures, while OSPM can tell them apart based on the (Type-)Identifier
+  GUID field.
+
+  For the purposes of QEMU's VMGenID implementation, we generated a new GUID
+  with the "uuidgen" utility. It should be different from all other
+  "Identifier" values, present and future, but otherwise no other software need
+  be aware of the concrete GUID value we generated.
+
+- The DataOffset field is just an offset into the table where the actual
+  (Identifier-specific) data starts.
+
+  For the purposes of QEMU's VMGenID implementation, we simply set it to the
+  next (QEMU-specific) field, "ADDR base pointer".
+
+Linker/loader commands
+----------------------
+
+The name of the fw_cfg blob is "etc/acpi/qemuparam". The ALLOCATE command that
+instructs the guest firmware to download this fw_cfg blob specifies an
+alignment of 4096, and the blob will have size 4096 too.
+
+An ADD_POINTER command links the "UEFI" ACPI Table at the start of the blob
+into the RSDT.
+
+Another ADD_POINTER command relocates the "ADDR base pointer" field to the
+absolute address of the "OVMF SDT Header probe suppressor" field, within the
+same blob.
+
+After this relocation, an ADD_CHECKSUM command updates the Checksum field,
+covering the entire "UEFI" ACPI Table (which extends up to and excluding offset
+62).
+
+Blob behavior under SeaBIOS
+---------------------------
+
+(Most of the complexity in the blob is ignored when the guest firmware is
+SeaBIOS.)
+
+- SeaBIOS's ACPI linker/loader client allocates the blob in normal RAM
+  (satisfying R1b).
+
+- Because the ALLOCATE command prescribes an alignment of 4KB, and the blob's
+  size is also 4KB, the allocation covers a standalone page frame in full
+  (satisfying R1e).
+
+- The 128-bit VMGenID field is located at offset 104 within that page,
+  resulting in a guest-physical address divisible by 8 (satisfying R1a).
+
+- The blob is marked as Reserved in the E820 map (satisfying R1c and R1d).
+
+- The "UEFI" ACPI Table at the start of the blob is linked into the RSDT,
+  in-place.
+
+- The "ADDR" AML method (see later) is allowed to refer to the "UEFI" ACPI
+  Table with the DataTableRegion operator, because the table is located in
+  memory marked as AddressRangeReserved.
+
+- The "ADDR base pointer" field points at "OVMF SDT Header probe suppressor",
+  which is right after the "UEFI" ACPI Table inside the blob. At OSPM runtime,
+  the "ADDR" AML method reads the "ADDR base pointer" field, and adds 42, to
+  arrive at the address of the VMGenID field.
+
+  blob @ page offset 0              RSDT
+  +-----------------------+         +-----+
+  | "UEFI" ACPI Table <---------+   | ... |
+  | +-------------------+ |     |   | ... |
+  | | ...               | |     +---- ... |
+  | | ...               | |         +-----+
+  | | ADDR base pointer -----+
+  | +-------------------+ |  |
+  | probe suppressor <-------+
+  | VMGenID @ offset 104  |
+  | padding               |
+  +-----------------------+
+
+Blob behavior under OVMF
+------------------------
+
+The complexity in the blob is required by the two-pass nature of OVMF's ACPI
+linker/loader client, which in turn comes from the fact that OVMF has to
+dissect blobs into individual ACPI tables vs. "other things", tracking the
+ADD_POINTER commands, so that tables can be installed individually, with
+EFI_ACPI_TABLE_PROTOCOL.
+
+- OVMF's ACPI linker/loader client allocates the blob in normal RAM (satisfying
+  R1b).
+
+- Because the ALLOCATE command prescribes an alignment of 4KB, and the blob's
+  size is also 4KB, the allocation covers a standalone page frame in full
+  (satisfying R1e).
+
+- The 128-bit VMGenID field is located at offset 104 within that page,
+  resulting in a guest-physical address divisible by 8 (satisfying R1a).
+
+- OVMF's ACPI linker/loader allocates the blob in EfiACPIMemoryNVS type memory,
+  therefore it is marked as such in the UEFI memmap (satisfying R1c and R1d).
+
+- OVMF identifies the "UEFI" ACPI Table at the start of the blob in the second
+  pass, following the ADD_POINTER command that is meant to link the table into
+  the RSDT. OVMF installs a *copy* of the "UEFI" ACPI Table with
+  EFI_ACPI_TABLE_PROTOCOL (linking the copy into both RSDT and XSDT). Given the
+  "UEFI" signature of the table, EFI_ACPI_TABLE_PROTOCOL places the copy of the
+  table in EfiACPIMemoryNVS type memory.
+
+- The "ADDR" AML method (see later) is allowed to refer to the "UEFI" ACPI
+  Table with the DataTableRegion operator, because the table is located in
+  memory marked as AddressRangeNVS.
+
+- The "ADDR base pointer" field inside the installed table points at "OVMF SDT
+  Header probe suppressor" in the original blob. Because that area is filled
+  with zeroes, OVMF's table identification heuristic unconditionally reports a
+  negative when it tracks the relevant ADD_POINTER command to it in the second
+  pass. Therefore the blob is marked as "hosts something other than just ACPI
+  tables", and it is preserved permanently (in the same EfiACPIMemoryNVS type
+  memory where it was originally allocated).
+
+  At OSPM runtime, the "ADDR" AML method reads the "ADDR base pointer" field,
+  and adds 42, to arrive at the address of the VMGenID field.
+
+  blob @ page offset 0               RSDT         XSDT
+  +-----------------------------+    +-----+      +-----+
+  | "UEFI" ACPI Table (in blob) |    | ... |      | ... |
+  | +-------------------------+ |    | ... ---+   | ... ---------------+
+  | |XXXXXXXXXXXXXXXXXXXXXXXXX| |    +-----+  |   +-----+              |
+  | |XXXXXXX [unused] XXXXXXXX| |             |                        |
+  | |XXXXXXXXXXXXXXXXXXXXXXXXX| |             +------------------------+
+  | +-------------------------+ |                                      |
+  | probe suppressor <-------------+  "UEFI" ACPI Table (installed) <--+
+  | VMGenID @ offset 104        |  |  +---------------------------+
+  | padding                     |  |  | ...                       |
+  +-----------------------------+  |  | ...                       |
+                                   +--- ADDR base pointer         |
+                                      +---------------------------+
+
+ACPI device, control methods
+----------------------------
+
+Requirements R2 through R6 of the VMGenID specification are satisfied with the
+following ACPI logic, exposed by QEMU's ACPI generator in one of the SSDTs, and
+installed by both guest firmwares as such.
+
+The basic idea is that, when the appropriate guest driver calls the ADDR method
+(see R4), OSPM locates the generation ID field in the 4KB blob that lives in
+E820 Reserved (SeaBIOS) or EfiACPIMemoryNVS type (OVMF) memory. The
+guest-physical address of the field is communicated to QEMU via IO ports
+[0x512..0x519] inclusive. Then QEMU is cued through IO port 0x51A to refresh
+(and keep refreshing when appropriate) the generation ID at the address that
+was passed back. Finally, the method returns the address to the guest driver
+too, in the format required by R4.
+
+    Scope(\_SB) {
+        Device (VMGI) {
+            /* satisfy R2 */
+            Name (_CID, "VM_Gen_Counter")
+
+            /* satisfy R3 */
+            Name (_DDN, "VM_Gen_Counter")
+
+            /* satisfy R6 */
+            Name (_HID, "QEMU0002")
+
+            /* Device status: present, enabled & decoding resources, should be
+             * shown in the UI, functioning properly.
+             */
+            Name (_STA, 0xF)
+
+            /* Satisfy R4.
+             *
+             * This method is serialized because it creates named objects.
+             */
+            Method (ADDR, 0, Serialized) {
+                /* The 8-byte integer field defined as ADBP below is the
+                 * "ADDR base pointer" field in the UEFI ACPI Table.
+                 *
+                 * The DataTableRegion() operator locates that ACPI table by
+                 * scanning the RSDT/XSDT using the (SignatureString,
+                 * OemIDString, OemTableIDString) triplet as key.
+                 *
+                 * Windows XP would normally crash on the DataTableRegion()
+                 * operator, but it never calls the ADDR method, hence it never
+                 * reaches or evaluates DataTableRegion().
+                 */
+                DataTableRegion (TBLR, "UEFI", "BOCHS ", "QEMUPARM")
+                Field (TBLR, AnyAcc, NoLock, Preserve) {
+                  Offset (54),
+                  ADBP, 64
+                }
+
+                /* The first two 4-byte ports are used to communicate the
+                 * 64-bit guest-physical address of the actual (relocated)
+                 * 128-bit generation ID field to QEMU, in little endian
+                 * encoding, so that QEMU can rewrite that field in guest RAM.
+                 *
+                 * A write to the last 1-byte port signals that the address has
+                 * been written fully, and QEMU is free to dereference it.
+                 */
+                OperationRegion (VMGR, SystemIO, 0x512, 9)
+                Field (VMGR, DWordAcc, NoLock, Preserve) {
+                    PTLO, 32,
+                    PTHI, 32,
+                    AccessAs (ByteAcc),
+                    DONE, 8
+                }
+
+                /* The ADBP field points to the "OVMF SDT Header probe
+                 * suppressor" area in the blob, at offset 62. In order to
+                 * arrive at the generation ID field at offset 104, we must add
+                 * 42 dynamically.
+                 *
+                 * The RESU buffer below will contain the result of the
+                 * addition. The ADFU field exposes it as an 8-byte integer
+                 * (for storing the sum), while the ADLO and ADHI fields enable
+                 * us to access the result in two separate 4-byte integers.
+                 * This exact integer width is especially important for
+                 * composing the package object that the ADDR method must
+                 * return.
+                 */
+                Name (RESU, Buffer (8) {})
+                CreateQWordField (RESU, 0, ADFU)
+                CreateDWordField (RESU, 0, ADLO)
+                CreateDWordField (RESU, 4, ADHI)
+
+                Add (ADBP, 42, ADFU)
+                Store (ADLO, PTLO)
+                Store (ADHI, PTHI)
+                Store (0, DONE)
+                Return (Package (2) { ADLO, ADHI })
+            }
+        }
+    }
+
+    /* satisfy R5 */
+    Scope (\_GPE) {
+        Method (_E04) {
+            Notify (\_SB.VMGI, 0x80)
+        }
+    }
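
The layout arithmetic in docs/vmgenid.txt above can be cross-checked with a
short sketch. It is not part of the patch and is not QEMU or firmware code.
It builds the 4096-byte blob from the field table, applies simplified models
of the ADD_POINTER and ADD_CHECKSUM commands, and then mimics the ADDR method
(read ADBP, add 42, split into two dwords). The function names, the
mixed-endian GUID byte order, and the load address used in the usage note are
assumptions made for illustration.

```python
import struct
import uuid

BLOB_SIZE = 4096
TABLE_LEN = 62        # "UEFI" ACPI Table occupies offsets 0..61
CHECKSUM_OFFSET = 9   # Checksum field in the SDT header
ADBP_OFFSET = 54      # "ADDR base pointer"
PROBE_OFFSET = 62     # "OVMF SDT Header probe suppressor"
VMGENID_OFFSET = 104  # 128-bit generation ID


def build_blob():
    """Lay out the blob as in the field table, before any loader command."""
    header = struct.pack(
        "<4sIBB6s8sI4sI",
        b"UEFI", TABLE_LEN, 1, 0, b"BOCHS ", b"QEMUPARM", 1, b"BXPC", 1)
    # Assumption: the GUID is stored in the usual EFI mixed-endian form.
    ident = uuid.UUID("417a5dff-bf4b-4abc-a839-6593bb41f452").bytes_le
    sub_header = ident + struct.pack("<H", ADBP_OFFSET)
    adbp = struct.pack("<Q", PROBE_OFFSET)  # blob-relative until relocated
    first_part = header + sub_header + adbp
    assert len(first_part) == TABLE_LEN
    return bytearray(first_part.ljust(BLOB_SIZE, b"\0"))


def add_pointer(blob, offset, base):
    """Simplified ADD_POINTER: add the load address to an 8-byte offset."""
    (value,) = struct.unpack_from("<Q", blob, offset)
    struct.pack_into("<Q", blob, offset, value + base)


def add_checksum(blob, result_offset, start, length):
    """Simplified ADD_CHECKSUM: make the range's byte sum 0 modulo 256."""
    blob[result_offset] = 0
    blob[result_offset] = -sum(blob[start:start + length]) & 0xFF


def addr_method(blob):
    """Mimic ADDR: read ADBP, add 42, return the (low, high) dwords."""
    (adbp,) = struct.unpack_from("<Q", blob, ADBP_OFFSET)
    full = (adbp + 42) & 0xFFFFFFFFFFFFFFFF
    return full & 0xFFFFFFFF, full >> 32


def relocate_and_resolve(base):
    """Run the commands for a blob loaded at 'base' (4 KiB aligned)."""
    blob = build_blob()
    add_pointer(blob, ADBP_OFFSET, base)
    add_checksum(blob, CHECKSUM_OFFSET, 0, TABLE_LEN)
    lo, hi = addr_method(blob)
    return blob, lo, hi
```

For a hypothetical load address such as 0x17FFE0000, relocate_and_resolve()
yields a dword pair that recombines to the load address plus 104, which is a
multiple of 8 as required by R1a. The checksum step runs after the relocation,
matching the command order described in the document.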