Message ID | 20190321104745.28068-8-shameerali.kolothum.thodi@huawei.com |
---|---|
State | New |
Headers | show |
Series | ARM virt: ACPI memory hotplug support | expand |
Hi Shameer, [ + Laszlo, Ard, Leif ] On 3/21/19 11:47 AM, Shameer Kolothum wrote: > This is to disable/enable populating DT nodes in case > any conflict with acpi tables. The default is "off". The name of the option sounds misleading to me. Also we don't really know the scope of the disablement. At the moment this just aims to prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. > > This will be used in subsequent patch where cold plug > device-memory support is added for DT boot. I am concerned about the fact that in dt mode, by default, you won't see any PCDIMM nodes. > > If DT memory node support is added for cold-plugged device > memory, those memory will be visible to Guest kernel via > UEFI GetMemoryMap() and gets treated as early boot memory. Don't we have an issue in UEFI then. Normally the SRAT indicates whether the slots are hotpluggable or not. Shouldn't the UEFI code look at this info. > Hence memory becomes non hot-un-unpluggable even if Guest > is booted in ACPI mode. > > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> > --- > hw/arm/virt.c | 23 +++++++++++++++++++++++ > include/hw/arm/virt.h | 1 + > 2 files changed, 24 insertions(+) > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > index 13db0e9..b602151 100644 > --- a/hw/arm/virt.c > +++ b/hw/arm/virt.c > @@ -1717,6 +1717,20 @@ static void virt_set_highmem(Object *obj, bool value, Error **errp) > vms->highmem = value; > } > > +static bool virt_get_fdt(Object *obj, Error **errp) > +{ > + VirtMachineState *vms = VIRT_MACHINE(obj); > + > + return vms->use_fdt; > +} > + > +static void virt_set_fdt(Object *obj, bool value, Error **errp) > +{ > + VirtMachineState *vms = VIRT_MACHINE(obj); > + > + vms->use_fdt = value; > +} > + > static bool virt_get_its(Object *obj, Error **errp) > { > VirtMachineState *vms = VIRT_MACHINE(obj); > @@ -2005,6 +2019,15 @@ static void virt_instance_init(Object *obj) > object_property_set_description(obj, "gic-version", > "Set GIC version. " > "Valid values are 2, 3 and host", NULL); > + /* fdt is disabled by default */ > + vms->use_fdt = false; > + object_property_add_bool(obj, "fdt", virt_get_fdt, > + virt_set_fdt, NULL); > + object_property_set_description(obj, "fdt", > + "Set on/off to enable/disable device tree " > + "nodes in case any conflict with ACPI" in case of Thanks Eric > + "(eg: device memory node)", > + NULL); > > vms->highmem_ecam = !vmc->no_highmem_ecam; > > diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h > index c5e4c96..14b2e0a 100644 > --- a/include/hw/arm/virt.h > +++ b/include/hw/arm/virt.h > @@ -119,6 +119,7 @@ typedef struct { > bool highmem_ecam; > bool its; > bool virt; > + bool use_fdt; > int32_t gic_version; > VirtIOMMUType iommu; > struct arm_boot_info bootinfo; >
Hi Eric, > -----Original Message----- > From: Auger Eric [mailto:eric.auger@redhat.com] > Sent: 29 March 2019 09:32 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; > qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; > peter.maydell@linaro.org; shannon.zhaosl@gmail.com; > sameo@linux.intel.com; sebastien.boeuf@intel.com > Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; > Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel > <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> > Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" > > Hi Shameer, > > [ + Laszlo, Ard, Leif ] > > On 3/21/19 11:47 AM, Shameer Kolothum wrote: > > This is to disable/enable populating DT nodes in case > > any conflict with acpi tables. The default is "off". > The name of the option sounds misleading to me. Also we don't really > know the scope of the disablement. At the moment this just aims to > prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. Yes, I was not very happy with the name "fdt". If this is not useful other than this device memory conflict case, then we can be more specific. But I am not sure we might need this for any other future conflicts, hence a more generic name. May be "force_fdt" or "dimm_fdt"? I am open to suggestions. > > > > This will be used in subsequent patch where cold plug > > device-memory support is added for DT boot. > I am concerned about the fact that in dt mode, by default, you won't see > any PCDIMM nodes. True. But is there any other way to detect that the Guest is using DT? Thanks, Shameer > > If DT memory node support is added for cold-plugged device > > memory, those memory will be visible to Guest kernel via > > UEFI GetMemoryMap() and gets treated as early boot memory. > Don't we have an issue in UEFI then. Normally the SRAT indicates whether > the slots are hotpluggable or not. Shouldn't the UEFI code look at this > info. > > Hence memory becomes non hot-un-unpluggable even if Guest > > is booted in ACPI mode. > > > > > > > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> > > --- > > hw/arm/virt.c | 23 +++++++++++++++++++++++ > > include/hw/arm/virt.h | 1 + > > 2 files changed, 24 insertions(+) > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index 13db0e9..b602151 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -1717,6 +1717,20 @@ static void virt_set_highmem(Object *obj, bool > value, Error **errp) > > vms->highmem = value; > > } > > > > +static bool virt_get_fdt(Object *obj, Error **errp) > > +{ > > + VirtMachineState *vms = VIRT_MACHINE(obj); > > + > > + return vms->use_fdt; > > +} > > + > > +static void virt_set_fdt(Object *obj, bool value, Error **errp) > > +{ > > + VirtMachineState *vms = VIRT_MACHINE(obj); > > + > > + vms->use_fdt = value; > > +} > > + > > static bool virt_get_its(Object *obj, Error **errp) > > { > > VirtMachineState *vms = VIRT_MACHINE(obj); > > @@ -2005,6 +2019,15 @@ static void virt_instance_init(Object *obj) > > object_property_set_description(obj, "gic-version", > > "Set GIC version. " > > "Valid values are 2, 3 and host", > NULL); > > + /* fdt is disabled by default */ > > + vms->use_fdt = false; > > + object_property_add_bool(obj, "fdt", virt_get_fdt, > > + virt_set_fdt, NULL); > > + object_property_set_description(obj, "fdt", > > + "Set on/off to enable/disable > device tree " > > + "nodes in case any conflict with > ACPI" > in case of > > Thanks > > Eric > > + "(eg: device memory node)", > > + NULL); > > > > vms->highmem_ecam = !vmc->no_highmem_ecam; > > > > diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h > > index c5e4c96..14b2e0a 100644 > > --- a/include/hw/arm/virt.h > > +++ b/include/hw/arm/virt.h > > @@ -119,6 +119,7 @@ typedef struct { > > bool highmem_ecam; > > bool its; > > bool virt; > > + bool use_fdt; > > int32_t gic_version; > > VirtIOMMUType iommu; > > struct arm_boot_info bootinfo; > >
> -----Original Message----- > From: Auger Eric [mailto:eric.auger@redhat.com] > Sent: 29 March 2019 09:32 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; > qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; > peter.maydell@linaro.org; shannon.zhaosl@gmail.com; > sameo@linux.intel.com; sebastien.boeuf@intel.com > Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; > Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel > <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> > Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" > > Hi Shameer, > > [ + Laszlo, Ard, Leif ] > > On 3/21/19 11:47 AM, Shameer Kolothum wrote: > > This is to disable/enable populating DT nodes in case > > any conflict with acpi tables. The default is "off". > The name of the option sounds misleading to me. Also we don't really > know the scope of the disablement. At the moment this just aims to > prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. > > > > > This will be used in subsequent patch where cold plug > > device-memory support is added for DT boot. > I am concerned about the fact that in dt mode, by default, you won't see > any PCDIMM nodes. > > > > If DT memory node support is added for cold-plugged device > > memory, those memory will be visible to Guest kernel via > > UEFI GetMemoryMap() and gets treated as early boot memory. > Don't we have an issue in UEFI then. Normally the SRAT indicates whether > the slots are hotpluggable or not. Shouldn't the UEFI code look at this > info. Sorry I missed this part. Yes, that will be a more cleaner solution. Also, to be more clear on what happens, Guest ACPI boot with "fdt=on" , From kernel log, [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] Guest ACPI boot with "fdt=off" , [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff The hotpluggable memory node is absent from early memory nodes here. As you said, it could be possible to detect this node using SRAT in UEFI. Cheers, Shameer > > Hence memory becomes non hot-un-unpluggable even if Guest > > is booted in ACPI mode. > > > > > > > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> > > --- > > hw/arm/virt.c | 23 +++++++++++++++++++++++ > > include/hw/arm/virt.h | 1 + > > 2 files changed, 24 insertions(+) > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c > > index 13db0e9..b602151 100644 > > --- a/hw/arm/virt.c > > +++ b/hw/arm/virt.c > > @@ -1717,6 +1717,20 @@ static void virt_set_highmem(Object *obj, bool > value, Error **errp) > > vms->highmem = value; > > } > > > > +static bool virt_get_fdt(Object *obj, Error **errp) > > +{ > > + VirtMachineState *vms = VIRT_MACHINE(obj); > > + > > + return vms->use_fdt; > > +} > > + > > +static void virt_set_fdt(Object *obj, bool value, Error **errp) > > +{ > > + VirtMachineState *vms = VIRT_MACHINE(obj); > > + > > + vms->use_fdt = value; > > +} > > + > > static bool virt_get_its(Object *obj, Error **errp) > > { > > VirtMachineState *vms = VIRT_MACHINE(obj); > > @@ -2005,6 +2019,15 @@ static void virt_instance_init(Object *obj) > > object_property_set_description(obj, "gic-version", > > "Set GIC version. " > > "Valid values are 2, 3 and host", > NULL); > > + /* fdt is disabled by default */ > > + vms->use_fdt = false; > > + object_property_add_bool(obj, "fdt", virt_get_fdt, > > + virt_set_fdt, NULL); > > + object_property_set_description(obj, "fdt", > > + "Set on/off to enable/disable > device tree " > > + "nodes in case any conflict with > ACPI" > in case of > > Thanks > > Eric > > + "(eg: device memory node)", > > + NULL); > > > > vms->highmem_ecam = !vmc->no_highmem_ecam; > > > > diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h > > index c5e4c96..14b2e0a 100644 > > --- a/include/hw/arm/virt.h > > +++ b/include/hw/arm/virt.h > > @@ -119,6 +119,7 @@ typedef struct { > > bool highmem_ecam; > > bool its; > > bool virt; > > + bool use_fdt; > > int32_t gic_version; > > VirtIOMMUType iommu; > > struct arm_boot_info bootinfo; > >
Hi Shameer, On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: > > >> -----Original Message----- >> From: Auger Eric [mailto:eric.auger@redhat.com] >> Sent: 29 March 2019 09:32 >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >> sameo@linux.intel.com; sebastien.boeuf@intel.com >> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >> >> Hi Shameer, >> >> [ + Laszlo, Ard, Leif ] >> >> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>> This is to disable/enable populating DT nodes in case >>> any conflict with acpi tables. The default is "off". >> The name of the option sounds misleading to me. Also we don't really >> know the scope of the disablement. At the moment this just aims to >> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >> >>> >>> This will be used in subsequent patch where cold plug >>> device-memory support is added for DT boot. >> I am concerned about the fact that in dt mode, by default, you won't see >> any PCDIMM nodes. >>> >>> If DT memory node support is added for cold-plugged device >>> memory, those memory will be visible to Guest kernel via >>> UEFI GetMemoryMap() and gets treated as early boot memory. >> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >> info. > > Sorry I missed this part. Yes, that will be a more cleaner solution. > > Also, to be more clear on what happens, > > Guest ACPI boot with "fdt=on" , > > From kernel log, > > [ 0.000000] Early memory node ranges > [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] > [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] > [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] > [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] > [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] > [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] > [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] > [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] > [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] > [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. > [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] > > > Guest ACPI boot with "fdt=off" , > > [ 0.000000] Movable zone start for each node > [ 0.000000] Early memory node ranges > [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] > [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] > [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] > [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] > [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] > [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] > [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] > [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] > [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] > [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff > > The hotpluggable memory node is absent from early memory nodes here. OK thank you for the example illustrating the concern. > > As you said, it could be possible to detect this node using SRAT in UEFI. Let's wait for EDK2 experts on this. Thanks Eric > > Cheers, > Shameer > >>> Hence memory becomes non hot-un-unpluggable even if Guest >>> is booted in ACPI mode. >> >> >> >>> >>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> >>> --- >>> hw/arm/virt.c | 23 +++++++++++++++++++++++ >>> include/hw/arm/virt.h | 1 + >>> 2 files changed, 24 insertions(+) >>> >>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c >>> index 13db0e9..b602151 100644 >>> --- a/hw/arm/virt.c >>> +++ b/hw/arm/virt.c >>> @@ -1717,6 +1717,20 @@ static void virt_set_highmem(Object *obj, bool >> value, Error **errp) >>> vms->highmem = value; >>> } >>> >>> +static bool virt_get_fdt(Object *obj, Error **errp) >>> +{ >>> + VirtMachineState *vms = VIRT_MACHINE(obj); >>> + >>> + return vms->use_fdt; >>> +} >>> + >>> +static void virt_set_fdt(Object *obj, bool value, Error **errp) >>> +{ >>> + VirtMachineState *vms = VIRT_MACHINE(obj); >>> + >>> + vms->use_fdt = value; >>> +} >>> + >>> static bool virt_get_its(Object *obj, Error **errp) >>> { >>> VirtMachineState *vms = VIRT_MACHINE(obj); >>> @@ -2005,6 +2019,15 @@ static void virt_instance_init(Object *obj) >>> object_property_set_description(obj, "gic-version", >>> "Set GIC version. " >>> "Valid values are 2, 3 and host", >> NULL); >>> + /* fdt is disabled by default */ >>> + vms->use_fdt = false; >>> + object_property_add_bool(obj, "fdt", virt_get_fdt, >>> + virt_set_fdt, NULL); >>> + object_property_set_description(obj, "fdt", >>> + "Set on/off to enable/disable >> device tree " >>> + "nodes in case any conflict with >> ACPI" >> in case of >> >> Thanks >> >> Eric >>> + "(eg: device memory node)", >>> + NULL); >>> >>> vms->highmem_ecam = !vmc->no_highmem_ecam; >>> >>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h >>> index c5e4c96..14b2e0a 100644 >>> --- a/include/hw/arm/virt.h >>> +++ b/include/hw/arm/virt.h >>> @@ -119,6 +119,7 @@ typedef struct { >>> bool highmem_ecam; >>> bool its; >>> bool virt; >>> + bool use_fdt; >>> int32_t gic_version; >>> VirtIOMMUType iommu; >>> struct arm_boot_info bootinfo; >>>
On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: > > Hi Shameer, > > On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: > > > > > >> -----Original Message----- > >> From: Auger Eric [mailto:eric.auger@redhat.com] > >> Sent: 29 March 2019 09:32 > >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; > >> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; > >> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; > >> sameo@linux.intel.com; sebastien.boeuf@intel.com > >> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; > >> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel > >> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> > >> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" > >> > >> Hi Shameer, > >> > >> [ + Laszlo, Ard, Leif ] > >> > >> On 3/21/19 11:47 AM, Shameer Kolothum wrote: > >>> This is to disable/enable populating DT nodes in case > >>> any conflict with acpi tables. The default is "off". > >> The name of the option sounds misleading to me. Also we don't really > >> know the scope of the disablement. At the moment this just aims to > >> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. > >> > >>> > >>> This will be used in subsequent patch where cold plug > >>> device-memory support is added for DT boot. > >> I am concerned about the fact that in dt mode, by default, you won't see > >> any PCDIMM nodes. > >>> > >>> If DT memory node support is added for cold-plugged device > >>> memory, those memory will be visible to Guest kernel via > >>> UEFI GetMemoryMap() and gets treated as early boot memory. > >> Don't we have an issue in UEFI then. Normally the SRAT indicates whether > >> the slots are hotpluggable or not. Shouldn't the UEFI code look at this > >> info. > > > > Sorry I missed this part. Yes, that will be a more cleaner solution. > > > > Also, to be more clear on what happens, > > > > Guest ACPI boot with "fdt=on" , > > > > From kernel log, > > > > [ 0.000000] Early memory node ranges > > [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] > > [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] > > [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] > > [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] > > [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] > > [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] > > [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] > > [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] > > [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] > > [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. > > [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > > [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] > > > > > > Guest ACPI boot with "fdt=off" , > > > > [ 0.000000] Movable zone start for each node > > [ 0.000000] Early memory node ranges > > [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] > > [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] > > [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] > > [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] > > [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] > > [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] > > [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] > > [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] > > [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] > > [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > > [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff > > > > The hotpluggable memory node is absent from early memory nodes here. > > OK thank you for the example illustrating the concern. > > > > As you said, it could be possible to detect this node using SRAT in UEFI. > > Let's wait for EDK2 experts on this. > Happy to chime in, but I need a bit more context here. What is the problem, how does this path try to solve it, and why is that a bad idea?
Hi Shameer, On 3/29/19 10:41 AM, Shameerali Kolothum Thodi wrote: > Hi Eric, > >> -----Original Message----- >> From: Auger Eric [mailto:eric.auger@redhat.com] >> Sent: 29 March 2019 09:32 >> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >> sameo@linux.intel.com; sebastien.boeuf@intel.com >> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >> >> Hi Shameer, >> >> [ + Laszlo, Ard, Leif ] >> >> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>> This is to disable/enable populating DT nodes in case >>> any conflict with acpi tables. The default is "off". >> The name of the option sounds misleading to me. Also we don't really >> know the scope of the disablement. At the moment this just aims to >> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. > > Yes, I was not very happy with the name "fdt". If this is not useful other than > this device memory conflict case, then we can be more specific. But I am not sure > we might need this for any other future conflicts, hence a more generic name. > > May be "force_fdt" or "dimm_fdt"? I am open to suggestions. mask_spurious_dt_nodes. But I am unsure this is the way we should go. > >>> >>> This will be used in subsequent patch where cold plug >>> device-memory support is added for DT boot. >> I am concerned about the fact that in dt mode, by default, you won't see >> any PCDIMM nodes. > > True. But is there any other way to detect that the Guest is using DT? I don't know any in machvirt_init, there is firmware_loaded that tells you whether you have a FW image. If this one is not set, you can induce dt. But if there is a FW it can be either DT or ACPI booted. You also have the acpi_enabled knob. Thanks Eric > > Thanks, > Shameer > >>> If DT memory node support is added for cold-plugged device >>> memory, those memory will be visible to Guest kernel via >>> UEFI GetMemoryMap() and gets treated as early boot memory. >> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >> info. >>> Hence memory becomes non hot-un-unpluggable even if Guest >>> is booted in ACPI mode. >> >> >> >>> >>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> >>> --- >>> hw/arm/virt.c | 23 +++++++++++++++++++++++ >>> include/hw/arm/virt.h | 1 + >>> 2 files changed, 24 insertions(+) >>> >>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c >>> index 13db0e9..b602151 100644 >>> --- a/hw/arm/virt.c >>> +++ b/hw/arm/virt.c >>> @@ -1717,6 +1717,20 @@ static void virt_set_highmem(Object *obj, bool >> value, Error **errp) >>> vms->highmem = value; >>> } >>> >>> +static bool virt_get_fdt(Object *obj, Error **errp) >>> +{ >>> + VirtMachineState *vms = VIRT_MACHINE(obj); >>> + >>> + return vms->use_fdt; >>> +} >>> + >>> +static void virt_set_fdt(Object *obj, bool value, Error **errp) >>> +{ >>> + VirtMachineState *vms = VIRT_MACHINE(obj); >>> + >>> + vms->use_fdt = value; >>> +} >>> + >>> static bool virt_get_its(Object *obj, Error **errp) >>> { >>> VirtMachineState *vms = VIRT_MACHINE(obj); >>> @@ -2005,6 +2019,15 @@ static void virt_instance_init(Object *obj) >>> object_property_set_description(obj, "gic-version", >>> "Set GIC version. " >>> "Valid values are 2, 3 and host", >> NULL); >>> + /* fdt is disabled by default */ >>> + vms->use_fdt = false; >>> + object_property_add_bool(obj, "fdt", virt_get_fdt, >>> + virt_set_fdt, NULL); >>> + object_property_set_description(obj, "fdt", >>> + "Set on/off to enable/disable >> device tree " >>> + "nodes in case any conflict with >> ACPI" >> in case of >> >> Thanks >> >> Eric >>> + "(eg: device memory node)", >>> + NULL); >>> >>> vms->highmem_ecam = !vmc->no_highmem_ecam; >>> >>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h >>> index c5e4c96..14b2e0a 100644 >>> --- a/include/hw/arm/virt.h >>> +++ b/include/hw/arm/virt.h >>> @@ -119,6 +119,7 @@ typedef struct { >>> bool highmem_ecam; >>> bool its; >>> bool virt; >>> + bool use_fdt; >>> int32_t gic_version; >>> VirtIOMMUType iommu; >>> struct arm_boot_info bootinfo; >>>
Hi Ard, On 3/29/19 2:14 PM, Ard Biesheuvel wrote: > On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: >> >> Hi Shameer, >> >> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>> >>> >>>> -----Original Message----- >>>> From: Auger Eric [mailto:eric.auger@redhat.com] >>>> Sent: 29 March 2019 09:32 >>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >>>> sameo@linux.intel.com; sebastien.boeuf@intel.com >>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >>>> >>>> Hi Shameer, >>>> >>>> [ + Laszlo, Ard, Leif ] >>>> >>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>> This is to disable/enable populating DT nodes in case >>>>> any conflict with acpi tables. The default is "off". >>>> The name of the option sounds misleading to me. Also we don't really >>>> know the scope of the disablement. At the moment this just aims to >>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >>>> >>>>> >>>>> This will be used in subsequent patch where cold plug >>>>> device-memory support is added for DT boot. >>>> I am concerned about the fact that in dt mode, by default, you won't see >>>> any PCDIMM nodes. >>>>> >>>>> If DT memory node support is added for cold-plugged device >>>>> memory, those memory will be visible to Guest kernel via >>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >>>> info. >>> >>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>> >>> Also, to be more clear on what happens, >>> >>> Guest ACPI boot with "fdt=on" , >>> >>> From kernel log, >>> >>> [ 0.000000] Early memory node ranges >>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. >>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] >>> >>> >>> Guest ACPI boot with "fdt=off" , >>> >>> [ 0.000000] Movable zone start for each node >>> [ 0.000000] Early memory node ranges >>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff >>> >>> The hotpluggable memory node is absent from early memory nodes here. >> >> OK thank you for the example illustrating the concern. >>> >>> As you said, it could be possible to detect this node using SRAT in UEFI. >> >> Let's wait for EDK2 experts on this. >> > > Happy to chime in, but I need a bit more context here. > > What is the problem, how does this path try to solve it, and why is > that a bad idea? > Sure, sorry. This series: - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, https://patchwork.kernel.org/cover/10863301/ aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the SRAT and DSDT parts and relies on GED to trigger the hotplug. We noticed that if we build the hotpluggable memory dt nodes on top of the above ACPI tables, the DIMM slots are interpreted as not hotpluggable memory slots (at least we think so). We think the EDK2 GetMemoryMap() uses the dt node info and ignores the fact that those slots are exposed as hotpluggable in the SRAT for example. So in this series, we are forced to not generate the hotpluggable memory dt nodes if we want the DIMM slots to be effectively recognized as hotpluggable. Could you confirm we have a correct understanding of the EDK2 behaviour and if so, would there be any solution for EDK2 to absorb both the DT nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. At qemu level, detecting we are booting in ACPI mode and purposely removing the above mentioned DT nodes does not look straightforward. Hope this clarifies Thanks Eric
> -----Original Message----- > From: Auger Eric [mailto:eric.auger@redhat.com] > Sent: 29 March 2019 13:56 > To: Ard Biesheuvel <ard.biesheuvel@linaro.org> > Cc: peter.maydell@linaro.org; sameo@linux.intel.com; > qemu-devel@nongnu.org; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; Linuxarm > <linuxarm@huawei.com>; shannon.zhaosl@gmail.com; > qemu-arm@nongnu.org; xuwei (O) <xuwei5@huawei.com>; > imammedo@redhat.com; sebastien.boeuf@intel.com; Laszlo Ersek > <lersek@redhat.com>; Leif Lindholm <Leif.Lindholm@arm.com> > Subject: Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in > feature "fdt" > > Hi Ard, > > On 3/29/19 2:14 PM, Ard Biesheuvel wrote: > > On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: > >> > >> Hi Shameer, > >> > >> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: > >>> > >>> > >>>> -----Original Message----- > >>>> From: Auger Eric [mailto:eric.auger@redhat.com] > >>>> Sent: 29 March 2019 09:32 > >>>> To: Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; > >>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; > imammedo@redhat.com; > >>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; > >>>> sameo@linux.intel.com; sebastien.boeuf@intel.com > >>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) > <xuwei5@huawei.com>; > >>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel > >>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> > >>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" > >>>> > >>>> Hi Shameer, > >>>> > >>>> [ + Laszlo, Ard, Leif ] > >>>> > >>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: > >>>>> This is to disable/enable populating DT nodes in case > >>>>> any conflict with acpi tables. The default is "off". > >>>> The name of the option sounds misleading to me. Also we don't really > >>>> know the scope of the disablement. At the moment this just aims to > >>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI > mode. > >>>> > >>>>> > >>>>> This will be used in subsequent patch where cold plug > >>>>> device-memory support is added for DT boot. > >>>> I am concerned about the fact that in dt mode, by default, you won't see > >>>> any PCDIMM nodes. > >>>>> > >>>>> If DT memory node support is added for cold-plugged device > >>>>> memory, those memory will be visible to Guest kernel via > >>>>> UEFI GetMemoryMap() and gets treated as early boot memory. > >>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether > >>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this > >>>> info. > >>> > >>> Sorry I missed this part. Yes, that will be a more cleaner solution. > >>> > >>> Also, to be more clear on what happens, > >>> > >>> Guest ACPI boot with "fdt=on" , > >>> > >>> From kernel log, > >>> > >>> [ 0.000000] Early memory node ranges > >>> [ 0.000000] node 0: [mem > 0x0000000040000000-0x00000000bbf5ffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bbf60000-0x00000000bbffffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bc000000-0x00000000bc02ffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bc030000-0x00000000bc36ffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bc370000-0x00000000bf64ffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bf650000-0x00000000bf6dffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bf6e0000-0x00000000bf6effff] > >>> [ 0.000000] node 0: [mem > 0x00000000bf6f0000-0x00000000bf80ffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bf810000-0x00000000bfffffff] > >>> [ 0.000000] node 0: [mem > 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory > node from DT. > >>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > >>> [ 0.000000] Initmem setup node 0 [mem > 0x0000000040000000-0x00000000ffffffff] > >>> > >>> > >>> Guest ACPI boot with "fdt=off" , > >>> > >>> [ 0.000000] Movable zone start for each node > >>> [ 0.000000] Early memory node ranges > >>> [ 0.000000] node 0: [mem > 0x0000000040000000-0x00000000bbf5ffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bbf60000-0x00000000bbffffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bc000000-0x00000000bc02ffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bc030000-0x00000000bc36ffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bc370000-0x00000000bf64ffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bf650000-0x00000000bf6dffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bf6e0000-0x00000000bf6effff] > >>> [ 0.000000] node 0: [mem > 0x00000000bf6f0000-0x00000000bf80ffff] > >>> [ 0.000000] node 0: [mem > 0x00000000bf810000-0x00000000bfffffff] > >>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > >>> [ 0.000000] Initmem setup node 0 [mem > 0x0000000040000000-0x00000000bfffffff > >>> > >>> The hotpluggable memory node is absent from early memory nodes here. > >> > >> OK thank you for the example illustrating the concern. > >>> > >>> As you said, it could be possible to detect this node using SRAT in UEFI. > >> > >> Let's wait for EDK2 experts on this. > >> > > > > Happy to chime in, but I need a bit more context here. > > > > What is the problem, how does this path try to solve it, and why is > > that a bad idea? > > > Sure, sorry. > > This series: > - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, > https://patchwork.kernel.org/cover/10863301/ > > aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the > SRAT and DSDT parts and relies on GED to trigger the hotplug. > > We noticed that if we build the hotpluggable memory dt nodes on top of > the above ACPI tables, the DIMM slots are interpreted as not > hotpluggable memory slots (at least we think so). > > We think the EDK2 GetMemoryMap() uses the dt node info and ignores the > fact that those slots are exposed as hotpluggable in the SRAT for example. > > So in this series, we are forced to not generate the hotpluggable memory > dt nodes if we want the DIMM slots to be effectively recognized as > hotpluggable. > > Could you confirm we have a correct understanding of the EDK2 behaviour > and if so, would there be any solution for EDK2 to absorb both the DT > nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. > > At qemu level, detecting we are booting in ACPI mode and purposely > removing the above mentioned DT nodes does not look straightforward. > > Hope this clarifies Thanks Eric for this. And this is where you can find the initial discussion on this, https://www.mail-archive.com/qemu-devel@nongnu.org/msg599076.html Thanks, Shameer
On 03/29/19 14:56, Auger Eric wrote: > Hi Ard, > > On 3/29/19 2:14 PM, Ard Biesheuvel wrote: >> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: >>> >>> Hi Shameer, >>> >>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>>> >>>> >>>>> -----Original Message----- >>>>> From: Auger Eric [mailto:eric.auger@redhat.com] >>>>> Sent: 29 March 2019 09:32 >>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com >>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >>>>> >>>>> Hi Shameer, >>>>> >>>>> [ + Laszlo, Ard, Leif ] >>>>> >>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>>> This is to disable/enable populating DT nodes in case >>>>>> any conflict with acpi tables. The default is "off". >>>>> The name of the option sounds misleading to me. Also we don't really >>>>> know the scope of the disablement. At the moment this just aims to >>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >>>>> >>>>>> >>>>>> This will be used in subsequent patch where cold plug >>>>>> device-memory support is added for DT boot. >>>>> I am concerned about the fact that in dt mode, by default, you won't see >>>>> any PCDIMM nodes. >>>>>> >>>>>> If DT memory node support is added for cold-plugged device >>>>>> memory, those memory will be visible to Guest kernel via >>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >>>>> info. >>>> >>>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>>> >>>> Also, to be more clear on what happens, >>>> >>>> Guest ACPI boot with "fdt=on" , >>>> >>>> From kernel log, >>>> >>>> [ 0.000000] Early memory node ranges >>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. >>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] >>>> >>>> >>>> Guest ACPI boot with "fdt=off" , >>>> >>>> [ 0.000000] Movable zone start for each node >>>> [ 0.000000] Early memory node ranges >>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff >>>> >>>> The hotpluggable memory node is absent from early memory nodes here. >>> >>> OK thank you for the example illustrating the concern. >>>> >>>> As you said, it could be possible to detect this node using SRAT in UEFI. >>> >>> Let's wait for EDK2 experts on this. >>> >> >> Happy to chime in, but I need a bit more context here. >> >> What is the problem, how does this path try to solve it, and why is >> that a bad idea? >> > Sure, sorry. > > This series: > - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, > https://patchwork.kernel.org/cover/10863301/ > > aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the > SRAT and DSDT parts and relies on GED to trigger the hotplug. > > We noticed that if we build the hotpluggable memory dt nodes on top of > the above ACPI tables, the DIMM slots are interpreted as not > hotpluggable memory slots (at least we think so). > > We think the EDK2 GetMemoryMap() uses the dt node info and ignores the > fact that those slots are exposed as hotpluggable in the SRAT for example. > > So in this series, we are forced to not generate the hotpluggable memory > dt nodes if we want the DIMM slots to be effectively recognized as > hotpluggable. > > Could you confirm we have a correct understanding of the EDK2 behaviour > and if so, would there be any solution for EDK2 to absorb both the DT > nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. > > At qemu level, detecting we are booting in ACPI mode and purposely > removing the above mentioned DT nodes does not look straightforward. The firmware is not enlightened about the ACPI content that comes from QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, as instructed through the ACPI linker/loader script, in order to install the ACPI content for the OS. No actual information is consumed by the firmware from the ACPI payload -- and that's a feature. The firmware does consume DT: - If you start QEMU *with* "-no-acpi", then the DT is both consumed by the firmware (for its own information needs), and passed on to the OS. - If you start QEMU *without* "-no-acpi" (the default), then the DT is consumed only by the firmware (for its own information needs), and the DT is hidden from the OS. The OS gets only the ACPI content (processed/prepared as described above). In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the base/size pairs in all the memory nodes in the DT. For each such base address that is currently tracked as "nonexistent" in the GCD memory space map, the driver currently adds the base/size range as "system memory". This in turn is reflected by the UEFI memmap that the OS gets to see as "conventional memory". If you need some memory ranges to show up as "special" in the UEFI memmap, then you need to distinguish them somehow from the "regular" memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the firmware, so that it act upon the discriminator that you set in the DT. Now... from a brief look at the Platform Init and UEFI specs, my impression is that the hotpluggable (but presently not plugged) DIMM ranges should simply be *absent* from the UEFI memmap; is that correct? (I didn't check the ACPI spec, maybe it specifies the expected behavior in full.) If my impression is correct, then two options (alternatives) exist: (1) Hide the affected memory nodes -- or at least the affected base/size pairs -- from the DT, in case you boot without "-no-acpi" but with an external firmware loaded. Then the firmware will not expose those ranges as "conventional memory" in the UEFI memmap. This approach requires no changes to edk2. This option is precisely what Eric described up-thread, at <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: > in machvirt_init, there is firmware_loaded that tells you whether you > have a FW image. If this one is not set, you can induce dt. But if > there is a FW it can be either DT or ACPI booted. You also have the > acpi_enabled knob. (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in "vl.c"). So, the condition for hiding the hotpluggable memory nodes in question from the DT is: (aarch64 && firmware_loaded && acpi_enabled) (2) Invent and set an "ignore me, firmware" property for the hotpluggable memory nodes in the DT, and update the firmware to honor that property. Thanks Laszlo
On Mon, 1 Apr 2019 15:07:05 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 03/29/19 14:56, Auger Eric wrote: > > Hi Ard, > > > > On 3/29/19 2:14 PM, Ard Biesheuvel wrote: > >> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: > >>> > >>> Hi Shameer, > >>> > >>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: > >>>> > >>>> > >>>>> -----Original Message----- > >>>>> From: Auger Eric [mailto:eric.auger@redhat.com] > >>>>> Sent: 29 March 2019 09:32 > >>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; > >>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; > >>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; > >>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com > >>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; > >>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel > >>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> > >>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" > >>>>> > >>>>> Hi Shameer, > >>>>> > >>>>> [ + Laszlo, Ard, Leif ] > >>>>> > >>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: > >>>>>> This is to disable/enable populating DT nodes in case > >>>>>> any conflict with acpi tables. The default is "off". > >>>>> The name of the option sounds misleading to me. Also we don't really > >>>>> know the scope of the disablement. At the moment this just aims to > >>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. > >>>>> > >>>>>> > >>>>>> This will be used in subsequent patch where cold plug > >>>>>> device-memory support is added for DT boot. > >>>>> I am concerned about the fact that in dt mode, by default, you won't see > >>>>> any PCDIMM nodes. > >>>>>> > >>>>>> If DT memory node support is added for cold-plugged device > >>>>>> memory, those memory will be visible to Guest kernel via > >>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. > >>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether > >>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this > >>>>> info. > >>>> > >>>> Sorry I missed this part. Yes, that will be a more cleaner solution. > >>>> > >>>> Also, to be more clear on what happens, > >>>> > >>>> Guest ACPI boot with "fdt=on" , > >>>> > >>>> From kernel log, > >>>> > >>>> [ 0.000000] Early memory node ranges > >>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] > >>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] > >>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. > >>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > >>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] > >>>> > >>>> > >>>> Guest ACPI boot with "fdt=off" , > >>>> > >>>> [ 0.000000] Movable zone start for each node > >>>> [ 0.000000] Early memory node ranges > >>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] > >>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] > >>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] > >>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > >>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff > >>>> > >>>> The hotpluggable memory node is absent from early memory nodes here. > >>> > >>> OK thank you for the example illustrating the concern. > >>>> > >>>> As you said, it could be possible to detect this node using SRAT in UEFI. > >>> > >>> Let's wait for EDK2 experts on this. > >>> > >> > >> Happy to chime in, but I need a bit more context here. > >> > >> What is the problem, how does this path try to solve it, and why is > >> that a bad idea? > >> > > Sure, sorry. > > > > This series: > > - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, > > https://patchwork.kernel.org/cover/10863301/ > > > > aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the > > SRAT and DSDT parts and relies on GED to trigger the hotplug. > > > > We noticed that if we build the hotpluggable memory dt nodes on top of > > the above ACPI tables, the DIMM slots are interpreted as not > > hotpluggable memory slots (at least we think so). > > > > We think the EDK2 GetMemoryMap() uses the dt node info and ignores the > > fact that those slots are exposed as hotpluggable in the SRAT for example. > > > > So in this series, we are forced to not generate the hotpluggable memory > > dt nodes if we want the DIMM slots to be effectively recognized as > > hotpluggable. > > > > Could you confirm we have a correct understanding of the EDK2 behaviour > > and if so, would there be any solution for EDK2 to absorb both the DT > > nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. > > > > At qemu level, detecting we are booting in ACPI mode and purposely > > removing the above mentioned DT nodes does not look straightforward. > > The firmware is not enlightened about the ACPI content that comes from > QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, > as instructed through the ACPI linker/loader script, in order to install > the ACPI content for the OS. No actual information is consumed by the > firmware from the ACPI payload -- and that's a feature. > > The firmware does consume DT: > > - If you start QEMU *with* "-no-acpi", then the DT is both consumed by > the firmware (for its own information needs), and passed on to the OS. > > - If you start QEMU *without* "-no-acpi" (the default), then the DT is > consumed only by the firmware (for its own information needs), and the > DT is hidden from the OS. The OS gets only the ACPI content > (processed/prepared as described above). > > > In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the > base/size pairs in all the memory nodes in the DT. For each such base > address that is currently tracked as "nonexistent" in the GCD memory > space map, the driver currently adds the base/size range as "system > memory". This in turn is reflected by the UEFI memmap that the OS gets > to see as "conventional memory". > > If you need some memory ranges to show up as "special" in the UEFI > memmap, then you need to distinguish them somehow from the "regular" > memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the > firmware, so that it act upon the discriminator that you set in the DT. > > > Now... from a brief look at the Platform Init and UEFI specs, my > impression is that the hotpluggable (but presently not plugged) DIMM > ranges should simply be *absent* from the UEFI memmap; is that correct? > (I didn't check the ACPI spec, maybe it specifies the expected behavior > in full.) If my impression is correct, then two options (alternatives) > exist: > > (1) Hide the affected memory nodes -- or at least the affected base/size > pairs -- from the DT, in case you boot without "-no-acpi" but with an > external firmware loaded. Then the firmware will not expose those ranges > as "conventional memory" in the UEFI memmap. This approach requires no > changes to edk2. > > This option is precisely what Eric described up-thread, at > <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: > > > in machvirt_init, there is firmware_loaded that tells you whether you > > have a FW image. If this one is not set, you can induce dt. But if > > there is a FW it can be either DT or ACPI booted. You also have the > > acpi_enabled knob. > > (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in > "vl.c"). > > So, the condition for hiding the hotpluggable memory nodes in question > from the DT is: > > (aarch64 && firmware_loaded && acpi_enabled) I'd go with this one, though I have a question for firmware side. Let's assume we would want in future to expose hotpluggable & present memory via GetMemoryMap() (like bare-metal does) (guest OS theoretically can avoid using it for Normal zone based on hint from SRAT table early at boot), but what about firmware can it inspect SRAT table and not use hotpluggable ranges for its own use (or at least do not canibalize them permanently)? > > (2) Invent and set an "ignore me, firmware" property for the > hotpluggable memory nodes in the DT, and update the firmware to honor > that property. > > Thanks > Laszlo
On Fri, 29 Mar 2019 at 20:56, Auger Eric <eric.auger@redhat.com> wrote: > This series: > - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, > https://patchwork.kernel.org/cover/10863301/ > > aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the > SRAT and DSDT parts and relies on GED to trigger the hotplug. > > We noticed that if we build the hotpluggable memory dt nodes on top of > the above ACPI tables, the DIMM slots are interpreted as not > hotpluggable memory slots (at least we think so). > > We think the EDK2 GetMemoryMap() uses the dt node info and ignores the > fact that those slots are exposed as hotpluggable in the SRAT for example. > > So in this series, we are forced to not generate the hotpluggable memory > dt nodes if we want the DIMM slots to be effectively recognized as > hotpluggable. > > Could you confirm we have a correct understanding of the EDK2 behaviour > and if so, would there be any solution for EDK2 to absorb both the DT > nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. > > At qemu level, detecting we are booting in ACPI mode and purposely > removing the above mentioned DT nodes does not look straightforward. My initial response would be to say that hotpluggable memory should be suitably marked up in both the DTB and the ACPI tables so that guest software that cares can tell that it is hotplugged whether it is choosing to consume the DT or the ACPI tables. QEMU should definitely not be reporting the hardware as looking different to the guest based on some guess about what guest software it is booting. thanks -- PMM
On 04/02/19 09:42, Igor Mammedov wrote: > On Mon, 1 Apr 2019 15:07:05 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: > >> On 03/29/19 14:56, Auger Eric wrote: >>> Hi Ard, >>> >>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: >>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: >>>>> >>>>> Hi Shameer, >>>>> >>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com] >>>>>>> Sent: 29 March 2019 09:32 >>>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >>>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >>>>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >>>>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com >>>>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >>>>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >>>>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >>>>>>> >>>>>>> Hi Shameer, >>>>>>> >>>>>>> [ + Laszlo, Ard, Leif ] >>>>>>> >>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>>>>> This is to disable/enable populating DT nodes in case >>>>>>>> any conflict with acpi tables. The default is "off". >>>>>>> The name of the option sounds misleading to me. Also we don't really >>>>>>> know the scope of the disablement. At the moment this just aims to >>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >>>>>>> >>>>>>>> >>>>>>>> This will be used in subsequent patch where cold plug >>>>>>>> device-memory support is added for DT boot. >>>>>>> I am concerned about the fact that in dt mode, by default, you won't see >>>>>>> any PCDIMM nodes. >>>>>>>> >>>>>>>> If DT memory node support is added for cold-plugged device >>>>>>>> memory, those memory will be visible to Guest kernel via >>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >>>>>>> info. >>>>>> >>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>>>>> >>>>>> Also, to be more clear on what happens, >>>>>> >>>>>> Guest ACPI boot with "fdt=on" , >>>>>> >>>>>> From kernel log, >>>>>> >>>>>> [ 0.000000] Early memory node ranges >>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. >>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] >>>>>> >>>>>> >>>>>> Guest ACPI boot with "fdt=off" , >>>>>> >>>>>> [ 0.000000] Movable zone start for each node >>>>>> [ 0.000000] Early memory node ranges >>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff >>>>>> >>>>>> The hotpluggable memory node is absent from early memory nodes here. >>>>> >>>>> OK thank you for the example illustrating the concern. >>>>>> >>>>>> As you said, it could be possible to detect this node using SRAT in UEFI. >>>>> >>>>> Let's wait for EDK2 experts on this. >>>>> >>>> >>>> Happy to chime in, but I need a bit more context here. >>>> >>>> What is the problem, how does this path try to solve it, and why is >>>> that a bad idea? >>>> >>> Sure, sorry. >>> >>> This series: >>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, >>> https://patchwork.kernel.org/cover/10863301/ >>> >>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the >>> SRAT and DSDT parts and relies on GED to trigger the hotplug. >>> >>> We noticed that if we build the hotpluggable memory dt nodes on top of >>> the above ACPI tables, the DIMM slots are interpreted as not >>> hotpluggable memory slots (at least we think so). >>> >>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the >>> fact that those slots are exposed as hotpluggable in the SRAT for example. >>> >>> So in this series, we are forced to not generate the hotpluggable memory >>> dt nodes if we want the DIMM slots to be effectively recognized as >>> hotpluggable. >>> >>> Could you confirm we have a correct understanding of the EDK2 behaviour >>> and if so, would there be any solution for EDK2 to absorb both the DT >>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. >>> >>> At qemu level, detecting we are booting in ACPI mode and purposely >>> removing the above mentioned DT nodes does not look straightforward. >> >> The firmware is not enlightened about the ACPI content that comes from >> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, >> as instructed through the ACPI linker/loader script, in order to install >> the ACPI content for the OS. No actual information is consumed by the >> firmware from the ACPI payload -- and that's a feature. >> >> The firmware does consume DT: >> >> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by >> the firmware (for its own information needs), and passed on to the OS. >> >> - If you start QEMU *without* "-no-acpi" (the default), then the DT is >> consumed only by the firmware (for its own information needs), and the >> DT is hidden from the OS. The OS gets only the ACPI content >> (processed/prepared as described above). >> >> >> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the >> base/size pairs in all the memory nodes in the DT. For each such base >> address that is currently tracked as "nonexistent" in the GCD memory >> space map, the driver currently adds the base/size range as "system >> memory". This in turn is reflected by the UEFI memmap that the OS gets >> to see as "conventional memory". >> >> If you need some memory ranges to show up as "special" in the UEFI >> memmap, then you need to distinguish them somehow from the "regular" >> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the >> firmware, so that it act upon the discriminator that you set in the DT. >> >> >> Now... from a brief look at the Platform Init and UEFI specs, my >> impression is that the hotpluggable (but presently not plugged) DIMM >> ranges should simply be *absent* from the UEFI memmap; is that correct? >> (I didn't check the ACPI spec, maybe it specifies the expected behavior >> in full.) If my impression is correct, then two options (alternatives) >> exist: >> >> (1) Hide the affected memory nodes -- or at least the affected base/size >> pairs -- from the DT, in case you boot without "-no-acpi" but with an >> external firmware loaded. Then the firmware will not expose those ranges >> as "conventional memory" in the UEFI memmap. This approach requires no >> changes to edk2. >> >> This option is precisely what Eric described up-thread, at >> <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: >> >>> in machvirt_init, there is firmware_loaded that tells you whether you >>> have a FW image. If this one is not set, you can induce dt. But if >>> there is a FW it can be either DT or ACPI booted. You also have the >>> acpi_enabled knob. >> >> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in >> "vl.c"). >> >> So, the condition for hiding the hotpluggable memory nodes in question >> from the DT is: >> >> (aarch64 && firmware_loaded && acpi_enabled) > I'd go with this one, though I have a question for firmware side. > Let's assume we would want in future to expose hotpluggable & present > memory via GetMemoryMap() (like bare-metal does) (guest OS theoretically > can avoid using it for Normal zone based on hint from SRAT table early > at boot), but what about firmware can it inspect SRAT table and not use > hotpluggable ranges for its own use (or at least do not canibalize > them permanently)? This is actually two questions: (a) Can the firmware inspect SRAT? If the SRAT table structure isn't very complex, this is technically doable, but the wrong thing to do, IMO. First, we've tried hard to avoid enlightening the firmware about the semantics of QEMU's ACPI tables. Second, this would introduce an ordering constraint (or callbacks) in the firmware, between the driver that processes & installs the ACPI tables, and the driver that translates the memory nodes of the DT to the memory ranges known to UEFI and the OS. If we need such hinting, then option (2) below (from earlier context) would be better: - If it's OK to use an arm/aarch64 specific solution, then new DT properties should work. - If it should be arch-independent, then a dedicated fw_cfg file would be better. (b) Assuming we have the information from some source, can the firmware expose some memory ranges as "usable RAM" to the OS, while staying away from them for its own (firmware) purposes? After consulting Table 25. Memory Type Usage before ExitBootServices() Table 26. Memory Type Usage after ExitBootServices() in UEFI-2.7, I would say that the firmware driver that installs these ranges to the memory (space) map should also allocate the ranges right after, as EfiBootServicesData. This will prevent other drivers / applications in the firmware from allocating chunks out of those areas, and the OS will be at liberty to release and repurpose the ranges after ExitBootServices(). Thanks, Laszlo >> (2) Invent and set an "ignore me, firmware" property for the >> hotpluggable memory nodes in the DT, and update the firmware to honor >> that property. >> >> Thanks >> Laszlo >
Hi Laszlo, > -----Original Message----- > From: Laszlo Ersek [mailto:lersek@redhat.com] > Sent: 01 April 2019 14:07 > To: Auger Eric <eric.auger@redhat.com>; Ard Biesheuvel > <ard.biesheuvel@linaro.org> > Cc: peter.maydell@linaro.org; sameo@linux.intel.com; > qemu-devel@nongnu.org; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; Linuxarm > <linuxarm@huawei.com>; shannon.zhaosl@gmail.com; > qemu-arm@nongnu.org; xuwei (O) <xuwei5@huawei.com>; > imammedo@redhat.com; sebastien.boeuf@intel.com; Leif Lindholm > <Leif.Lindholm@arm.com> > Subject: Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in > feature "fdt" > > On 03/29/19 14:56, Auger Eric wrote: [...] > > Sure, sorry. > > > > This series: > > - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, > > https://patchwork.kernel.org/cover/10863301/ > > > > aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the > > SRAT and DSDT parts and relies on GED to trigger the hotplug. > > > > We noticed that if we build the hotpluggable memory dt nodes on top of > > the above ACPI tables, the DIMM slots are interpreted as not > > hotpluggable memory slots (at least we think so). > > > > We think the EDK2 GetMemoryMap() uses the dt node info and ignores the > > fact that those slots are exposed as hotpluggable in the SRAT for example. > > > > So in this series, we are forced to not generate the hotpluggable memory > > dt nodes if we want the DIMM slots to be effectively recognized as > > hotpluggable. > > > > Could you confirm we have a correct understanding of the EDK2 behaviour > > and if so, would there be any solution for EDK2 to absorb both the DT > > nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. > > > > At qemu level, detecting we are booting in ACPI mode and purposely > > removing the above mentioned DT nodes does not look straightforward. > > The firmware is not enlightened about the ACPI content that comes from > QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, > as instructed through the ACPI linker/loader script, in order to install > the ACPI content for the OS. No actual information is consumed by the > firmware from the ACPI payload -- and that's a feature. > > The firmware does consume DT: > > - If you start QEMU *with* "-no-acpi", then the DT is both consumed by > the firmware (for its own information needs), and passed on to the OS. > > - If you start QEMU *without* "-no-acpi" (the default), then the DT is > consumed only by the firmware (for its own information needs), and the > DT is hidden from the OS. The OS gets only the ACPI content > (processed/prepared as described above). > > > In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the > base/size pairs in all the memory nodes in the DT. For each such base > address that is currently tracked as "nonexistent" in the GCD memory > space map, the driver currently adds the base/size range as "system > memory". This in turn is reflected by the UEFI memmap that the OS gets > to see as "conventional memory". > > If you need some memory ranges to show up as "special" in the UEFI > memmap, then you need to distinguish them somehow from the "regular" > memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the > firmware, so that it act upon the discriminator that you set in the DT. > > > Now... from a brief look at the Platform Init and UEFI specs, my > impression is that the hotpluggable (but presently not plugged) DIMM > ranges should simply be *absent* from the UEFI memmap; is that correct? > (I didn't check the ACPI spec, maybe it specifies the expected behavior > in full.) If my impression is correct, then two options (alternatives) > exist: > > (1) Hide the affected memory nodes -- or at least the affected base/size > pairs -- from the DT, in case you boot without "-no-acpi" but with an > external firmware loaded. Then the firmware will not expose those ranges > as "conventional memory" in the UEFI memmap. This approach requires no > changes to edk2. > > This option is precisely what Eric described up-thread, at > <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redh > at.com>: > > > in machvirt_init, there is firmware_loaded that tells you whether you > > have a FW image. If this one is not set, you can induce dt. But if > > there is a FW it can be either DT or ACPI booted. You also have the > > acpi_enabled knob. > > (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in > "vl.c"). > > So, the condition for hiding the hotpluggable memory nodes in question > from the DT is: > > (aarch64 && firmware_loaded && acpi_enabled) Thanks for your explanation and suggestions. I had a quick run with the above and it seems to do the job. I will drop this extra opt-in feature patch from this series and instead have this check. Thanks, Shameer > > > (2) Invent and set an "ignore me, firmware" property for the > hotpluggable memory nodes in the DT, and update the firmware to honor > that property. > > Thanks > Laszlo
Hi Laszlo, On 4/1/19 3:07 PM, Laszlo Ersek wrote: > On 03/29/19 14:56, Auger Eric wrote: >> Hi Ard, >> >> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: >>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: >>>> >>>> Hi Shameer, >>>> >>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Auger Eric [mailto:eric.auger@redhat.com] >>>>>> Sent: 29 March 2019 09:32 >>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >>>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >>>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com >>>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >>>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >>>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >>>>>> >>>>>> Hi Shameer, >>>>>> >>>>>> [ + Laszlo, Ard, Leif ] >>>>>> >>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>>>> This is to disable/enable populating DT nodes in case >>>>>>> any conflict with acpi tables. The default is "off". >>>>>> The name of the option sounds misleading to me. Also we don't really >>>>>> know the scope of the disablement. At the moment this just aims to >>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >>>>>> >>>>>>> >>>>>>> This will be used in subsequent patch where cold plug >>>>>>> device-memory support is added for DT boot. >>>>>> I am concerned about the fact that in dt mode, by default, you won't see >>>>>> any PCDIMM nodes. >>>>>>> >>>>>>> If DT memory node support is added for cold-plugged device >>>>>>> memory, those memory will be visible to Guest kernel via >>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >>>>>> info. >>>>> >>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>>>> >>>>> Also, to be more clear on what happens, >>>>> >>>>> Guest ACPI boot with "fdt=on" , >>>>> >>>>> From kernel log, >>>>> >>>>> [ 0.000000] Early memory node ranges >>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. >>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] >>>>> >>>>> >>>>> Guest ACPI boot with "fdt=off" , >>>>> >>>>> [ 0.000000] Movable zone start for each node >>>>> [ 0.000000] Early memory node ranges >>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff >>>>> >>>>> The hotpluggable memory node is absent from early memory nodes here. >>>> >>>> OK thank you for the example illustrating the concern. >>>>> >>>>> As you said, it could be possible to detect this node using SRAT in UEFI. >>>> >>>> Let's wait for EDK2 experts on this. >>>> >>> >>> Happy to chime in, but I need a bit more context here. >>> >>> What is the problem, how does this path try to solve it, and why is >>> that a bad idea? >>> >> Sure, sorry. >> >> This series: >> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, >> https://patchwork.kernel.org/cover/10863301/ >> >> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the >> SRAT and DSDT parts and relies on GED to trigger the hotplug. >> >> We noticed that if we build the hotpluggable memory dt nodes on top of >> the above ACPI tables, the DIMM slots are interpreted as not >> hotpluggable memory slots (at least we think so). >> >> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the >> fact that those slots are exposed as hotpluggable in the SRAT for example. >> >> So in this series, we are forced to not generate the hotpluggable memory >> dt nodes if we want the DIMM slots to be effectively recognized as >> hotpluggable. >> >> Could you confirm we have a correct understanding of the EDK2 behaviour >> and if so, would there be any solution for EDK2 to absorb both the DT >> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. >> >> At qemu level, detecting we are booting in ACPI mode and purposely >> removing the above mentioned DT nodes does not look straightforward. > > The firmware is not enlightened about the ACPI content that comes from > QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, > as instructed through the ACPI linker/loader script, in order to install > the ACPI content for the OS. No actual information is consumed by the > firmware from the ACPI payload -- and that's a feature. > > The firmware does consume DT: > > - If you start QEMU *with* "-no-acpi", then the DT is both consumed by > the firmware (for its own information needs), and passed on to the OS. > > - If you start QEMU *without* "-no-acpi" (the default), then the DT is > consumed only by the firmware (for its own information needs), and the > DT is hidden from the OS. The OS gets only the ACPI content > (processed/prepared as described above). > > > In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the > base/size pairs in all the memory nodes in the DT. For each such base > address that is currently tracked as "nonexistent" in the GCD memory > space map, the driver currently adds the base/size range as "system > memory". This in turn is reflected by the UEFI memmap that the OS gets > to see as "conventional memory". > > If you need some memory ranges to show up as "special" in the UEFI > memmap, then you need to distinguish them somehow from the "regular" > memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the > firmware, so that it act upon the discriminator that you set in the DT. > > > Now... from a brief look at the Platform Init and UEFI specs, my > impression is that the hotpluggable (but presently not plugged) DIMM > ranges should simply be *absent* from the UEFI memmap; is that correct? > (I didn't check the ACPI spec, maybe it specifies the expected behavior > in full.) If my impression is correct, then two options (alternatives) > exist: > > (1) Hide the affected memory nodes -- or at least the affected base/size > pairs -- from the DT, in case you boot without "-no-acpi" but with an > external firmware loaded. Then the firmware will not expose those ranges > as "conventional memory" in the UEFI memmap. This approach requires no > changes to edk2. > > This option is precisely what Eric described up-thread, at > <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: > >> in machvirt_init, there is firmware_loaded that tells you whether you >> have a FW image. If this one is not set, you can induce dt. But if >> there is a FW it can be either DT or ACPI booted. You also have the >> acpi_enabled knob. > > (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in > "vl.c"). > > So, the condition for hiding the hotpluggable memory nodes in question > from the DT is: > > (aarch64 && firmware_loaded && acpi_enabled) Thanks a lot for all those inputs! I don't get why we test aarch64 in above condition (this was useful for high ECAM range as the aarch32 FW was not supporting it but here, is it still meaningful?) Thanks Eric > > > (2) Invent and set an "ignore me, firmware" property for the > hotpluggable memory nodes in the DT, and update the firmware to honor > that property. > > Thanks > Laszlo >
On 04/02/19 17:29, Auger Eric wrote: > Hi Laszlo, > > On 4/1/19 3:07 PM, Laszlo Ersek wrote: >> On 03/29/19 14:56, Auger Eric wrote: >>> Hi Ard, >>> >>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: >>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: >>>>> >>>>> Hi Shameer, >>>>> >>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com] >>>>>>> Sent: 29 March 2019 09:32 >>>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >>>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >>>>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >>>>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com >>>>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >>>>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >>>>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >>>>>>> >>>>>>> Hi Shameer, >>>>>>> >>>>>>> [ + Laszlo, Ard, Leif ] >>>>>>> >>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>>>>> This is to disable/enable populating DT nodes in case >>>>>>>> any conflict with acpi tables. The default is "off". >>>>>>> The name of the option sounds misleading to me. Also we don't really >>>>>>> know the scope of the disablement. At the moment this just aims to >>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >>>>>>> >>>>>>>> >>>>>>>> This will be used in subsequent patch where cold plug >>>>>>>> device-memory support is added for DT boot. >>>>>>> I am concerned about the fact that in dt mode, by default, you won't see >>>>>>> any PCDIMM nodes. >>>>>>>> >>>>>>>> If DT memory node support is added for cold-plugged device >>>>>>>> memory, those memory will be visible to Guest kernel via >>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >>>>>>> info. >>>>>> >>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>>>>> >>>>>> Also, to be more clear on what happens, >>>>>> >>>>>> Guest ACPI boot with "fdt=on" , >>>>>> >>>>>> From kernel log, >>>>>> >>>>>> [ 0.000000] Early memory node ranges >>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. >>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] >>>>>> >>>>>> >>>>>> Guest ACPI boot with "fdt=off" , >>>>>> >>>>>> [ 0.000000] Movable zone start for each node >>>>>> [ 0.000000] Early memory node ranges >>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff >>>>>> >>>>>> The hotpluggable memory node is absent from early memory nodes here. >>>>> >>>>> OK thank you for the example illustrating the concern. >>>>>> >>>>>> As you said, it could be possible to detect this node using SRAT in UEFI. >>>>> >>>>> Let's wait for EDK2 experts on this. >>>>> >>>> >>>> Happy to chime in, but I need a bit more context here. >>>> >>>> What is the problem, how does this path try to solve it, and why is >>>> that a bad idea? >>>> >>> Sure, sorry. >>> >>> This series: >>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, >>> https://patchwork.kernel.org/cover/10863301/ >>> >>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the >>> SRAT and DSDT parts and relies on GED to trigger the hotplug. >>> >>> We noticed that if we build the hotpluggable memory dt nodes on top of >>> the above ACPI tables, the DIMM slots are interpreted as not >>> hotpluggable memory slots (at least we think so). >>> >>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the >>> fact that those slots are exposed as hotpluggable in the SRAT for example. >>> >>> So in this series, we are forced to not generate the hotpluggable memory >>> dt nodes if we want the DIMM slots to be effectively recognized as >>> hotpluggable. >>> >>> Could you confirm we have a correct understanding of the EDK2 behaviour >>> and if so, would there be any solution for EDK2 to absorb both the DT >>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. >>> >>> At qemu level, detecting we are booting in ACPI mode and purposely >>> removing the above mentioned DT nodes does not look straightforward. >> >> The firmware is not enlightened about the ACPI content that comes from >> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, >> as instructed through the ACPI linker/loader script, in order to install >> the ACPI content for the OS. No actual information is consumed by the >> firmware from the ACPI payload -- and that's a feature. >> >> The firmware does consume DT: >> >> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by >> the firmware (for its own information needs), and passed on to the OS. >> >> - If you start QEMU *without* "-no-acpi" (the default), then the DT is >> consumed only by the firmware (for its own information needs), and the >> DT is hidden from the OS. The OS gets only the ACPI content >> (processed/prepared as described above). >> >> >> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the >> base/size pairs in all the memory nodes in the DT. For each such base >> address that is currently tracked as "nonexistent" in the GCD memory >> space map, the driver currently adds the base/size range as "system >> memory". This in turn is reflected by the UEFI memmap that the OS gets >> to see as "conventional memory". >> >> If you need some memory ranges to show up as "special" in the UEFI >> memmap, then you need to distinguish them somehow from the "regular" >> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the >> firmware, so that it act upon the discriminator that you set in the DT. >> >> >> Now... from a brief look at the Platform Init and UEFI specs, my >> impression is that the hotpluggable (but presently not plugged) DIMM >> ranges should simply be *absent* from the UEFI memmap; is that correct? >> (I didn't check the ACPI spec, maybe it specifies the expected behavior >> in full.) If my impression is correct, then two options (alternatives) >> exist: >> >> (1) Hide the affected memory nodes -- or at least the affected base/size >> pairs -- from the DT, in case you boot without "-no-acpi" but with an >> external firmware loaded. Then the firmware will not expose those ranges >> as "conventional memory" in the UEFI memmap. This approach requires no >> changes to edk2. >> >> This option is precisely what Eric described up-thread, at >> <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: >> >>> in machvirt_init, there is firmware_loaded that tells you whether you >>> have a FW image. If this one is not set, you can induce dt. But if >>> there is a FW it can be either DT or ACPI booted. You also have the >>> acpi_enabled knob. >> >> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in >> "vl.c"). >> >> So, the condition for hiding the hotpluggable memory nodes in question >> from the DT is: > >> >> (aarch64 && firmware_loaded && acpi_enabled) > > Thanks a lot for all those inputs! > > I don't get why we test aarch64 in above condition (this was useful for > high ECAM range as the aarch32 FW was not supporting it but here, is it > still meaningful?) Sorry, I should have clarified that. Yes, it is meaningful: While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but not the reverse, on ARM.) So if you run the 32-bit build of the ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with the OS is the DT. This "bitness distinction" is implemented in the firmware already. If you hid the memory nodes from the DT under the condition (!aarch64 && firmware_loaded && acpi_enabled) then the nodes would not be seen by the OS at all (because "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and all the OS can ever get is DT). Thanks, Laszlo > > Thanks > > Eric > >> >> >> (2) Invent and set an "ignore me, firmware" property for the >> hotpluggable memory nodes in the DT, and update the firmware to honor >> that property. >> >> Thanks >> Laszlo >>
Hi Laszlo, On 4/2/19 12:33 PM, Laszlo Ersek wrote: > On 04/02/19 09:42, Igor Mammedov wrote: >> On Mon, 1 Apr 2019 15:07:05 +0200 >> Laszlo Ersek <lersek@redhat.com> wrote: >> >>> On 03/29/19 14:56, Auger Eric wrote: >>>> Hi Ard, >>>> >>>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: >>>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: >>>>>> >>>>>> Hi Shameer, >>>>>> >>>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com] >>>>>>>> Sent: 29 March 2019 09:32 >>>>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >>>>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >>>>>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >>>>>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com >>>>>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >>>>>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >>>>>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >>>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >>>>>>>> >>>>>>>> Hi Shameer, >>>>>>>> >>>>>>>> [ + Laszlo, Ard, Leif ] >>>>>>>> >>>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>>>>>> This is to disable/enable populating DT nodes in case >>>>>>>>> any conflict with acpi tables. The default is "off". >>>>>>>> The name of the option sounds misleading to me. Also we don't really >>>>>>>> know the scope of the disablement. At the moment this just aims to >>>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >>>>>>>> >>>>>>>>> >>>>>>>>> This will be used in subsequent patch where cold plug >>>>>>>>> device-memory support is added for DT boot. >>>>>>>> I am concerned about the fact that in dt mode, by default, you won't see >>>>>>>> any PCDIMM nodes. >>>>>>>>> >>>>>>>>> If DT memory node support is added for cold-plugged device >>>>>>>>> memory, those memory will be visible to Guest kernel via >>>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >>>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >>>>>>>> info. >>>>>>> >>>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>>>>>> >>>>>>> Also, to be more clear on what happens, >>>>>>> >>>>>>> Guest ACPI boot with "fdt=on" , >>>>>>> >>>>>>> From kernel log, >>>>>>> >>>>>>> [ 0.000000] Early memory node ranges >>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. >>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] >>>>>>> >>>>>>> >>>>>>> Guest ACPI boot with "fdt=off" , >>>>>>> >>>>>>> [ 0.000000] Movable zone start for each node >>>>>>> [ 0.000000] Early memory node ranges >>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff >>>>>>> >>>>>>> The hotpluggable memory node is absent from early memory nodes here. >>>>>> >>>>>> OK thank you for the example illustrating the concern. >>>>>>> >>>>>>> As you said, it could be possible to detect this node using SRAT in UEFI. >>>>>> >>>>>> Let's wait for EDK2 experts on this. >>>>>> >>>>> >>>>> Happy to chime in, but I need a bit more context here. >>>>> >>>>> What is the problem, how does this path try to solve it, and why is >>>>> that a bad idea? >>>>> >>>> Sure, sorry. >>>> >>>> This series: >>>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, >>>> https://patchwork.kernel.org/cover/10863301/ >>>> >>>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the >>>> SRAT and DSDT parts and relies on GED to trigger the hotplug. >>>> >>>> We noticed that if we build the hotpluggable memory dt nodes on top of >>>> the above ACPI tables, the DIMM slots are interpreted as not >>>> hotpluggable memory slots (at least we think so). >>>> >>>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the >>>> fact that those slots are exposed as hotpluggable in the SRAT for example. >>>> >>>> So in this series, we are forced to not generate the hotpluggable memory >>>> dt nodes if we want the DIMM slots to be effectively recognized as >>>> hotpluggable. >>>> >>>> Could you confirm we have a correct understanding of the EDK2 behaviour >>>> and if so, would there be any solution for EDK2 to absorb both the DT >>>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. >>>> >>>> At qemu level, detecting we are booting in ACPI mode and purposely >>>> removing the above mentioned DT nodes does not look straightforward. >>> >>> The firmware is not enlightened about the ACPI content that comes from >>> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, >>> as instructed through the ACPI linker/loader script, in order to install >>> the ACPI content for the OS. No actual information is consumed by the >>> firmware from the ACPI payload -- and that's a feature. >>> >>> The firmware does consume DT: >>> >>> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by >>> the firmware (for its own information needs), and passed on to the OS. >>> >>> - If you start QEMU *without* "-no-acpi" (the default), then the DT is >>> consumed only by the firmware (for its own information needs), and the >>> DT is hidden from the OS. The OS gets only the ACPI content >>> (processed/prepared as described above). I am confused by the above statement actually. In the above case what does happen if you pass the acpi=off in the kernel boot parameters? Thanks Eric >>> >>> >>> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the >>> base/size pairs in all the memory nodes in the DT. For each such base >>> address that is currently tracked as "nonexistent" in the GCD memory >>> space map, the driver currently adds the base/size range as "system >>> memory". This in turn is reflected by the UEFI memmap that the OS gets >>> to see as "conventional memory". >>> >>> If you need some memory ranges to show up as "special" in the UEFI >>> memmap, then you need to distinguish them somehow from the "regular" >>> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the >>> firmware, so that it act upon the discriminator that you set in the DT. >>> >>> >>> Now... from a brief look at the Platform Init and UEFI specs, my >>> impression is that the hotpluggable (but presently not plugged) DIMM >>> ranges should simply be *absent* from the UEFI memmap; is that correct? >>> (I didn't check the ACPI spec, maybe it specifies the expected behavior >>> in full.) If my impression is correct, then two options (alternatives) >>> exist: >>> >>> (1) Hide the affected memory nodes -- or at least the affected base/size >>> pairs -- from the DT, in case you boot without "-no-acpi" but with an >>> external firmware loaded. Then the firmware will not expose those ranges >>> as "conventional memory" in the UEFI memmap. This approach requires no >>> changes to edk2. >>> >>> This option is precisely what Eric described up-thread, at >>> <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: >>> >>>> in machvirt_init, there is firmware_loaded that tells you whether you >>>> have a FW image. If this one is not set, you can induce dt. But if >>>> there is a FW it can be either DT or ACPI booted. You also have the >>>> acpi_enabled knob. >>> >>> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in >>> "vl.c"). >>> >>> So, the condition for hiding the hotpluggable memory nodes in question >>> from the DT is: >>> >>> (aarch64 && firmware_loaded && acpi_enabled) >> I'd go with this one, though I have a question for firmware side. >> Let's assume we would want in future to expose hotpluggable & present >> memory via GetMemoryMap() (like bare-metal does) (guest OS theoretically >> can avoid using it for Normal zone based on hint from SRAT table early >> at boot), but what about firmware can it inspect SRAT table and not use >> hotpluggable ranges for its own use (or at least do not canibalize >> them permanently)? > > This is actually two questions: > > (a) Can the firmware inspect SRAT? > > If the SRAT table structure isn't very complex, this is technically > doable, but the wrong thing to do, IMO. > > First, we've tried hard to avoid enlightening the firmware about the > semantics of QEMU's ACPI tables. > > Second, this would introduce an ordering constraint (or callbacks) in > the firmware, between the driver that processes & installs the ACPI > tables, and the driver that translates the memory nodes of the DT to the > memory ranges known to UEFI and the OS. > > If we need such hinting, then option (2) below (from earlier context) > would be better: > - If it's OK to use an arm/aarch64 specific solution, then new DT > properties should work. > - If it should be arch-independent, then a dedicated fw_cfg file would > be better. > > (b) Assuming we have the information from some source, can the firmware > expose some memory ranges as "usable RAM" to the OS, while staying away > from them for its own (firmware) purposes? > > After consulting > > Table 25. Memory Type Usage before ExitBootServices() > Table 26. Memory Type Usage after ExitBootServices() > > in UEFI-2.7, I would say that the firmware driver that installs these > ranges to the memory (space) map should also allocate the ranges right > after, as EfiBootServicesData. This will prevent other drivers / > applications in the firmware from allocating chunks out of those areas, > and the OS will be at liberty to release and repurpose the ranges after > ExitBootServices(). > > Thanks, > Laszlo > >>> (2) Invent and set an "ignore me, firmware" property for the >>> hotpluggable memory nodes in the DT, and update the firmware to honor >>> that property. >>> >>> Thanks >>> Laszlo >> >
Laszlo, On 4/2/19 5:38 PM, Laszlo Ersek wrote: > On 04/02/19 17:29, Auger Eric wrote: >> Hi Laszlo, >> >> On 4/1/19 3:07 PM, Laszlo Ersek wrote: >>> On 03/29/19 14:56, Auger Eric wrote: >>>> Hi Ard, >>>> >>>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: >>>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: >>>>>> >>>>>> Hi Shameer, >>>>>> >>>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com] >>>>>>>> Sent: 29 March 2019 09:32 >>>>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >>>>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >>>>>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >>>>>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com >>>>>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >>>>>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >>>>>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >>>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >>>>>>>> >>>>>>>> Hi Shameer, >>>>>>>> >>>>>>>> [ + Laszlo, Ard, Leif ] >>>>>>>> >>>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>>>>>> This is to disable/enable populating DT nodes in case >>>>>>>>> any conflict with acpi tables. The default is "off". >>>>>>>> The name of the option sounds misleading to me. Also we don't really >>>>>>>> know the scope of the disablement. At the moment this just aims to >>>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >>>>>>>> >>>>>>>>> >>>>>>>>> This will be used in subsequent patch where cold plug >>>>>>>>> device-memory support is added for DT boot. >>>>>>>> I am concerned about the fact that in dt mode, by default, you won't see >>>>>>>> any PCDIMM nodes. >>>>>>>>> >>>>>>>>> If DT memory node support is added for cold-plugged device >>>>>>>>> memory, those memory will be visible to Guest kernel via >>>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >>>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >>>>>>>> info. >>>>>>> >>>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>>>>>> >>>>>>> Also, to be more clear on what happens, >>>>>>> >>>>>>> Guest ACPI boot with "fdt=on" , >>>>>>> >>>>>>> From kernel log, >>>>>>> >>>>>>> [ 0.000000] Early memory node ranges >>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. >>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] >>>>>>> >>>>>>> >>>>>>> Guest ACPI boot with "fdt=off" , >>>>>>> >>>>>>> [ 0.000000] Movable zone start for each node >>>>>>> [ 0.000000] Early memory node ranges >>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff >>>>>>> >>>>>>> The hotpluggable memory node is absent from early memory nodes here. >>>>>> >>>>>> OK thank you for the example illustrating the concern. >>>>>>> >>>>>>> As you said, it could be possible to detect this node using SRAT in UEFI. >>>>>> >>>>>> Let's wait for EDK2 experts on this. >>>>>> >>>>> >>>>> Happy to chime in, but I need a bit more context here. >>>>> >>>>> What is the problem, how does this path try to solve it, and why is >>>>> that a bad idea? >>>>> >>>> Sure, sorry. >>>> >>>> This series: >>>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, >>>> https://patchwork.kernel.org/cover/10863301/ >>>> >>>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the >>>> SRAT and DSDT parts and relies on GED to trigger the hotplug. >>>> >>>> We noticed that if we build the hotpluggable memory dt nodes on top of >>>> the above ACPI tables, the DIMM slots are interpreted as not >>>> hotpluggable memory slots (at least we think so). >>>> >>>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the >>>> fact that those slots are exposed as hotpluggable in the SRAT for example. >>>> >>>> So in this series, we are forced to not generate the hotpluggable memory >>>> dt nodes if we want the DIMM slots to be effectively recognized as >>>> hotpluggable. >>>> >>>> Could you confirm we have a correct understanding of the EDK2 behaviour >>>> and if so, would there be any solution for EDK2 to absorb both the DT >>>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. >>>> >>>> At qemu level, detecting we are booting in ACPI mode and purposely >>>> removing the above mentioned DT nodes does not look straightforward. >>> >>> The firmware is not enlightened about the ACPI content that comes from >>> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, >>> as instructed through the ACPI linker/loader script, in order to install >>> the ACPI content for the OS. No actual information is consumed by the >>> firmware from the ACPI payload -- and that's a feature. >>> >>> The firmware does consume DT: >>> >>> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by >>> the firmware (for its own information needs), and passed on to the OS. >>> >>> - If you start QEMU *without* "-no-acpi" (the default), then the DT is >>> consumed only by the firmware (for its own information needs), and the >>> DT is hidden from the OS. The OS gets only the ACPI content >>> (processed/prepared as described above). >>> >>> >>> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the >>> base/size pairs in all the memory nodes in the DT. For each such base >>> address that is currently tracked as "nonexistent" in the GCD memory >>> space map, the driver currently adds the base/size range as "system >>> memory". This in turn is reflected by the UEFI memmap that the OS gets >>> to see as "conventional memory". >>> >>> If you need some memory ranges to show up as "special" in the UEFI >>> memmap, then you need to distinguish them somehow from the "regular" >>> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the >>> firmware, so that it act upon the discriminator that you set in the DT. >>> >>> >>> Now... from a brief look at the Platform Init and UEFI specs, my >>> impression is that the hotpluggable (but presently not plugged) DIMM >>> ranges should simply be *absent* from the UEFI memmap; is that correct? >>> (I didn't check the ACPI spec, maybe it specifies the expected behavior >>> in full.) If my impression is correct, then two options (alternatives) >>> exist: >>> >>> (1) Hide the affected memory nodes -- or at least the affected base/size >>> pairs -- from the DT, in case you boot without "-no-acpi" but with an >>> external firmware loaded. Then the firmware will not expose those ranges >>> as "conventional memory" in the UEFI memmap. This approach requires no >>> changes to edk2. >>> >>> This option is precisely what Eric described up-thread, at >>> <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: >>> >>>> in machvirt_init, there is firmware_loaded that tells you whether you >>>> have a FW image. If this one is not set, you can induce dt. But if >>>> there is a FW it can be either DT or ACPI booted. You also have the >>>> acpi_enabled knob. >>> >>> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in >>> "vl.c"). >>> >>> So, the condition for hiding the hotpluggable memory nodes in question >>> from the DT is: >> >>> >>> (aarch64 && firmware_loaded && acpi_enabled) >> >> Thanks a lot for all those inputs! >> >> I don't get why we test aarch64 in above condition (this was useful for >> high ECAM range as the aarch32 FW was not supporting it but here, is it >> still meaningful?) > > Sorry, I should have clarified that. Yes, it is meaningful: > > While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a > 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but > not the reverse, on ARM.) So if you run the 32-bit build of the > ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with > the OS is the DT. OK. Thank you for the clarification! Eric > > This "bitness distinction" is implemented in the firmware already. If > you hid the memory nodes from the DT under the condition > > (!aarch64 && firmware_loaded && acpi_enabled) > > then the nodes would not be seen by the OS at all (because > "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and > all the OS can ever get is DT). > > Thanks, > Laszlo > >> >> Thanks >> >> Eric >> >>> >>> >>> (2) Invent and set an "ignore me, firmware" property for the >>> hotpluggable memory nodes in the DT, and update the firmware to honor >>> that property. >>> >>> Thanks >>> Laszlo >>> >
On 04/02/19 17:42, Auger Eric wrote: > Hi Laszlo, > > On 4/2/19 12:33 PM, Laszlo Ersek wrote: >> On 04/02/19 09:42, Igor Mammedov wrote: >>> On Mon, 1 Apr 2019 15:07:05 +0200 >>> Laszlo Ersek <lersek@redhat.com> wrote: >>> >>>> On 03/29/19 14:56, Auger Eric wrote: >>>>> Hi Ard, >>>>> >>>>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: >>>>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: >>>>>>> >>>>>>> Hi Shameer, >>>>>>> >>>>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>>>>>>> >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com] >>>>>>>>> Sent: 29 March 2019 09:32 >>>>>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >>>>>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >>>>>>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >>>>>>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com >>>>>>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >>>>>>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >>>>>>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >>>>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >>>>>>>>> >>>>>>>>> Hi Shameer, >>>>>>>>> >>>>>>>>> [ + Laszlo, Ard, Leif ] >>>>>>>>> >>>>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>>>>>>> This is to disable/enable populating DT nodes in case >>>>>>>>>> any conflict with acpi tables. The default is "off". >>>>>>>>> The name of the option sounds misleading to me. Also we don't really >>>>>>>>> know the scope of the disablement. At the moment this just aims to >>>>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> This will be used in subsequent patch where cold plug >>>>>>>>>> device-memory support is added for DT boot. >>>>>>>>> I am concerned about the fact that in dt mode, by default, you won't see >>>>>>>>> any PCDIMM nodes. >>>>>>>>>> >>>>>>>>>> If DT memory node support is added for cold-plugged device >>>>>>>>>> memory, those memory will be visible to Guest kernel via >>>>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >>>>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >>>>>>>>> info. >>>>>>>> >>>>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>>>>>>> >>>>>>>> Also, to be more clear on what happens, >>>>>>>> >>>>>>>> Guest ACPI boot with "fdt=on" , >>>>>>>> >>>>>>>> From kernel log, >>>>>>>> >>>>>>>> [ 0.000000] Early memory node ranges >>>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. >>>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] >>>>>>>> >>>>>>>> >>>>>>>> Guest ACPI boot with "fdt=off" , >>>>>>>> >>>>>>>> [ 0.000000] Movable zone start for each node >>>>>>>> [ 0.000000] Early memory node ranges >>>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff >>>>>>>> >>>>>>>> The hotpluggable memory node is absent from early memory nodes here. >>>>>>> >>>>>>> OK thank you for the example illustrating the concern. >>>>>>>> >>>>>>>> As you said, it could be possible to detect this node using SRAT in UEFI. >>>>>>> >>>>>>> Let's wait for EDK2 experts on this. >>>>>>> >>>>>> >>>>>> Happy to chime in, but I need a bit more context here. >>>>>> >>>>>> What is the problem, how does this path try to solve it, and why is >>>>>> that a bad idea? >>>>>> >>>>> Sure, sorry. >>>>> >>>>> This series: >>>>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, >>>>> https://patchwork.kernel.org/cover/10863301/ >>>>> >>>>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the >>>>> SRAT and DSDT parts and relies on GED to trigger the hotplug. >>>>> >>>>> We noticed that if we build the hotpluggable memory dt nodes on top of >>>>> the above ACPI tables, the DIMM slots are interpreted as not >>>>> hotpluggable memory slots (at least we think so). >>>>> >>>>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the >>>>> fact that those slots are exposed as hotpluggable in the SRAT for example. >>>>> >>>>> So in this series, we are forced to not generate the hotpluggable memory >>>>> dt nodes if we want the DIMM slots to be effectively recognized as >>>>> hotpluggable. >>>>> >>>>> Could you confirm we have a correct understanding of the EDK2 behaviour >>>>> and if so, would there be any solution for EDK2 to absorb both the DT >>>>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. >>>>> >>>>> At qemu level, detecting we are booting in ACPI mode and purposely >>>>> removing the above mentioned DT nodes does not look straightforward. >>>> >>>> The firmware is not enlightened about the ACPI content that comes from >>>> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, >>>> as instructed through the ACPI linker/loader script, in order to install >>>> the ACPI content for the OS. No actual information is consumed by the >>>> firmware from the ACPI payload -- and that's a feature. >>>> >>>> The firmware does consume DT: >>>> >>>> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by >>>> the firmware (for its own information needs), and passed on to the OS. >>>> >>>> - If you start QEMU *without* "-no-acpi" (the default), then the DT is >>>> consumed only by the firmware (for its own information needs), and the >>>> DT is hidden from the OS. The OS gets only the ACPI content >>>> (processed/prepared as described above). > I am confused by the above statement actually. In the above case what > does happen if you pass the acpi=off in the kernel boot parameters? If you launch QEMU with "-no-acpi" and you pass "acpi=off" to the guest kernel, then the kernel will not boot successfully, as it will not get DT from the firmware, and it will ignore the ACPI tables that it does get from the firmware. Thanks Laszlo > > Thanks > > Eric >>>> >>>> >>>> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the >>>> base/size pairs in all the memory nodes in the DT. For each such base >>>> address that is currently tracked as "nonexistent" in the GCD memory >>>> space map, the driver currently adds the base/size range as "system >>>> memory". This in turn is reflected by the UEFI memmap that the OS gets >>>> to see as "conventional memory". >>>> >>>> If you need some memory ranges to show up as "special" in the UEFI >>>> memmap, then you need to distinguish them somehow from the "regular" >>>> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the >>>> firmware, so that it act upon the discriminator that you set in the DT. >>>> >>>> >>>> Now... from a brief look at the Platform Init and UEFI specs, my >>>> impression is that the hotpluggable (but presently not plugged) DIMM >>>> ranges should simply be *absent* from the UEFI memmap; is that correct? >>>> (I didn't check the ACPI spec, maybe it specifies the expected behavior >>>> in full.) If my impression is correct, then two options (alternatives) >>>> exist: >>>> >>>> (1) Hide the affected memory nodes -- or at least the affected base/size >>>> pairs -- from the DT, in case you boot without "-no-acpi" but with an >>>> external firmware loaded. Then the firmware will not expose those ranges >>>> as "conventional memory" in the UEFI memmap. This approach requires no >>>> changes to edk2. >>>> >>>> This option is precisely what Eric described up-thread, at >>>> <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: >>>> >>>>> in machvirt_init, there is firmware_loaded that tells you whether you >>>>> have a FW image. If this one is not set, you can induce dt. But if >>>>> there is a FW it can be either DT or ACPI booted. You also have the >>>>> acpi_enabled knob. >>>> >>>> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in >>>> "vl.c"). >>>> >>>> So, the condition for hiding the hotpluggable memory nodes in question >>>> from the DT is: >>>> >>>> (aarch64 && firmware_loaded && acpi_enabled) >>> I'd go with this one, though I have a question for firmware side. >>> Let's assume we would want in future to expose hotpluggable & present >>> memory via GetMemoryMap() (like bare-metal does) (guest OS theoretically >>> can avoid using it for Normal zone based on hint from SRAT table early >>> at boot), but what about firmware can it inspect SRAT table and not use >>> hotpluggable ranges for its own use (or at least do not canibalize >>> them permanently)? >> >> This is actually two questions: >> >> (a) Can the firmware inspect SRAT? >> >> If the SRAT table structure isn't very complex, this is technically >> doable, but the wrong thing to do, IMO. >> >> First, we've tried hard to avoid enlightening the firmware about the >> semantics of QEMU's ACPI tables. >> >> Second, this would introduce an ordering constraint (or callbacks) in >> the firmware, between the driver that processes & installs the ACPI >> tables, and the driver that translates the memory nodes of the DT to the >> memory ranges known to UEFI and the OS. >> >> If we need such hinting, then option (2) below (from earlier context) >> would be better: >> - If it's OK to use an arm/aarch64 specific solution, then new DT >> properties should work. >> - If it should be arch-independent, then a dedicated fw_cfg file would >> be better. >> >> (b) Assuming we have the information from some source, can the firmware >> expose some memory ranges as "usable RAM" to the OS, while staying away >> from them for its own (firmware) purposes? >> >> After consulting >> >> Table 25. Memory Type Usage before ExitBootServices() >> Table 26. Memory Type Usage after ExitBootServices() >> >> in UEFI-2.7, I would say that the firmware driver that installs these >> ranges to the memory (space) map should also allocate the ranges right >> after, as EfiBootServicesData. This will prevent other drivers / >> applications in the firmware from allocating chunks out of those areas, >> and the OS will be at liberty to release and repurpose the ranges after >> ExitBootServices(). >> >> Thanks, >> Laszlo >> >>>> (2) Invent and set an "ignore me, firmware" property for the >>>> hotpluggable memory nodes in the DT, and update the firmware to honor >>>> that property. >>>> >>>> Thanks >>>> Laszlo >>> >>
On 04/02/19 17:52, Laszlo Ersek wrote: > On 04/02/19 17:42, Auger Eric wrote: >>>>> The firmware does consume DT: >>>>> >>>>> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by >>>>> the firmware (for its own information needs), and passed on to the OS. >>>>> >>>>> - If you start QEMU *without* "-no-acpi" (the default), then the DT is >>>>> consumed only by the firmware (for its own information needs), and the >>>>> DT is hidden from the OS. The OS gets only the ACPI content >>>>> (processed/prepared as described above). > >> I am confused by the above statement actually. In the above case what >> does happen if you pass the acpi=off in the kernel boot parameters? > > If you launch QEMU with "-no-acpi" and you pass "acpi=off" to the guest > kernel, then the kernel will not boot successfully, as it will not get > DT from the firmware, and it will ignore the ACPI tables that it does > get from the firmware. Sorry, I ended up answering "what happens when you run QEMU *without* -no-acpi and pass acpi=off to the guest kernel". To explain what happens when you boot *with* -no-acpi: in that case, "acpi=off" doesn't matter, since the guest kernel doesn't get ACPI tables anyway. The kernel will go for DT. Thanks Laszlo
Laszlo, On 4/2/19 5:52 PM, Laszlo Ersek wrote: > On 04/02/19 17:42, Auger Eric wrote: >> Hi Laszlo, >> >> On 4/2/19 12:33 PM, Laszlo Ersek wrote: >>> On 04/02/19 09:42, Igor Mammedov wrote: >>>> On Mon, 1 Apr 2019 15:07:05 +0200 >>>> Laszlo Ersek <lersek@redhat.com> wrote: >>>> >>>>> On 03/29/19 14:56, Auger Eric wrote: >>>>>> Hi Ard, >>>>>> >>>>>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: >>>>>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: >>>>>>>> >>>>>>>> Hi Shameer, >>>>>>>> >>>>>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com] >>>>>>>>>> Sent: 29 March 2019 09:32 >>>>>>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >>>>>>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >>>>>>>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >>>>>>>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com >>>>>>>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >>>>>>>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >>>>>>>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >>>>>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >>>>>>>>>> >>>>>>>>>> Hi Shameer, >>>>>>>>>> >>>>>>>>>> [ + Laszlo, Ard, Leif ] >>>>>>>>>> >>>>>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>>>>>>>> This is to disable/enable populating DT nodes in case >>>>>>>>>>> any conflict with acpi tables. The default is "off". >>>>>>>>>> The name of the option sounds misleading to me. Also we don't really >>>>>>>>>> know the scope of the disablement. At the moment this just aims to >>>>>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> This will be used in subsequent patch where cold plug >>>>>>>>>>> device-memory support is added for DT boot. >>>>>>>>>> I am concerned about the fact that in dt mode, by default, you won't see >>>>>>>>>> any PCDIMM nodes. >>>>>>>>>>> >>>>>>>>>>> If DT memory node support is added for cold-plugged device >>>>>>>>>>> memory, those memory will be visible to Guest kernel via >>>>>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>>>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >>>>>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >>>>>>>>>> info. >>>>>>>>> >>>>>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>>>>>>>> >>>>>>>>> Also, to be more clear on what happens, >>>>>>>>> >>>>>>>>> Guest ACPI boot with "fdt=on" , >>>>>>>>> >>>>>>>>> From kernel log, >>>>>>>>> >>>>>>>>> [ 0.000000] Early memory node ranges >>>>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. >>>>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] >>>>>>>>> >>>>>>>>> >>>>>>>>> Guest ACPI boot with "fdt=off" , >>>>>>>>> >>>>>>>>> [ 0.000000] Movable zone start for each node >>>>>>>>> [ 0.000000] Early memory node ranges >>>>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff >>>>>>>>> >>>>>>>>> The hotpluggable memory node is absent from early memory nodes here. >>>>>>>> >>>>>>>> OK thank you for the example illustrating the concern. >>>>>>>>> >>>>>>>>> As you said, it could be possible to detect this node using SRAT in UEFI. >>>>>>>> >>>>>>>> Let's wait for EDK2 experts on this. >>>>>>>> >>>>>>> >>>>>>> Happy to chime in, but I need a bit more context here. >>>>>>> >>>>>>> What is the problem, how does this path try to solve it, and why is >>>>>>> that a bad idea? >>>>>>> >>>>>> Sure, sorry. >>>>>> >>>>>> This series: >>>>>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, >>>>>> https://patchwork.kernel.org/cover/10863301/ >>>>>> >>>>>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the >>>>>> SRAT and DSDT parts and relies on GED to trigger the hotplug. >>>>>> >>>>>> We noticed that if we build the hotpluggable memory dt nodes on top of >>>>>> the above ACPI tables, the DIMM slots are interpreted as not >>>>>> hotpluggable memory slots (at least we think so). >>>>>> >>>>>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the >>>>>> fact that those slots are exposed as hotpluggable in the SRAT for example. >>>>>> >>>>>> So in this series, we are forced to not generate the hotpluggable memory >>>>>> dt nodes if we want the DIMM slots to be effectively recognized as >>>>>> hotpluggable. >>>>>> >>>>>> Could you confirm we have a correct understanding of the EDK2 behaviour >>>>>> and if so, would there be any solution for EDK2 to absorb both the DT >>>>>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. >>>>>> >>>>>> At qemu level, detecting we are booting in ACPI mode and purposely >>>>>> removing the above mentioned DT nodes does not look straightforward. >>>>> >>>>> The firmware is not enlightened about the ACPI content that comes from >>>>> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, >>>>> as instructed through the ACPI linker/loader script, in order to install >>>>> the ACPI content for the OS. No actual information is consumed by the >>>>> firmware from the ACPI payload -- and that's a feature. >>>>> >>>>> The firmware does consume DT: >>>>> >>>>> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by >>>>> the firmware (for its own information needs), and passed on to the OS. >>>>> >>>>> - If you start QEMU *without* "-no-acpi" (the default), then the DT is >>>>> consumed only by the firmware (for its own information needs), and the >>>>> DT is hidden from the OS. The OS gets only the ACPI content >>>>> (processed/prepared as described above). > >> I am confused by the above statement actually. In the above case what >> does happen if you pass the acpi=off in the kernel boot parameters? > > If you launch QEMU with "-no-acpi" and you pass "acpi=off" to the guest > kernel, then the kernel will not boot successfully, as it will not get > DT from the firmware, and it will ignore the ACPI tables that it does > get from the firmware. Yup. Sorry this was hidden in my launch scripts. Thanks! Eric > > Thanks > Laszlo > >> >> Thanks >> >> Eric >>>>> >>>>> >>>>> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the >>>>> base/size pairs in all the memory nodes in the DT. For each such base >>>>> address that is currently tracked as "nonexistent" in the GCD memory >>>>> space map, the driver currently adds the base/size range as "system >>>>> memory". This in turn is reflected by the UEFI memmap that the OS gets >>>>> to see as "conventional memory". >>>>> >>>>> If you need some memory ranges to show up as "special" in the UEFI >>>>> memmap, then you need to distinguish them somehow from the "regular" >>>>> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the >>>>> firmware, so that it act upon the discriminator that you set in the DT. >>>>> >>>>> >>>>> Now... from a brief look at the Platform Init and UEFI specs, my >>>>> impression is that the hotpluggable (but presently not plugged) DIMM >>>>> ranges should simply be *absent* from the UEFI memmap; is that correct? >>>>> (I didn't check the ACPI spec, maybe it specifies the expected behavior >>>>> in full.) If my impression is correct, then two options (alternatives) >>>>> exist: >>>>> >>>>> (1) Hide the affected memory nodes -- or at least the affected base/size >>>>> pairs -- from the DT, in case you boot without "-no-acpi" but with an >>>>> external firmware loaded. Then the firmware will not expose those ranges >>>>> as "conventional memory" in the UEFI memmap. This approach requires no >>>>> changes to edk2. >>>>> >>>>> This option is precisely what Eric described up-thread, at >>>>> <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: >>>>> >>>>>> in machvirt_init, there is firmware_loaded that tells you whether you >>>>>> have a FW image. If this one is not set, you can induce dt. But if >>>>>> there is a FW it can be either DT or ACPI booted. You also have the >>>>>> acpi_enabled knob. >>>>> >>>>> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in >>>>> "vl.c"). >>>>> >>>>> So, the condition for hiding the hotpluggable memory nodes in question >>>>> from the DT is: >>>>> >>>>> (aarch64 && firmware_loaded && acpi_enabled) >>>> I'd go with this one, though I have a question for firmware side. >>>> Let's assume we would want in future to expose hotpluggable & present >>>> memory via GetMemoryMap() (like bare-metal does) (guest OS theoretically >>>> can avoid using it for Normal zone based on hint from SRAT table early >>>> at boot), but what about firmware can it inspect SRAT table and not use >>>> hotpluggable ranges for its own use (or at least do not canibalize >>>> them permanently)? >>> >>> This is actually two questions: >>> >>> (a) Can the firmware inspect SRAT? >>> >>> If the SRAT table structure isn't very complex, this is technically >>> doable, but the wrong thing to do, IMO. >>> >>> First, we've tried hard to avoid enlightening the firmware about the >>> semantics of QEMU's ACPI tables. >>> >>> Second, this would introduce an ordering constraint (or callbacks) in >>> the firmware, between the driver that processes & installs the ACPI >>> tables, and the driver that translates the memory nodes of the DT to the >>> memory ranges known to UEFI and the OS. >>> >>> If we need such hinting, then option (2) below (from earlier context) >>> would be better: >>> - If it's OK to use an arm/aarch64 specific solution, then new DT >>> properties should work. >>> - If it should be arch-independent, then a dedicated fw_cfg file would >>> be better. >>> >>> (b) Assuming we have the information from some source, can the firmware >>> expose some memory ranges as "usable RAM" to the OS, while staying away >>> from them for its own (firmware) purposes? >>> >>> After consulting >>> >>> Table 25. Memory Type Usage before ExitBootServices() >>> Table 26. Memory Type Usage after ExitBootServices() >>> >>> in UEFI-2.7, I would say that the firmware driver that installs these >>> ranges to the memory (space) map should also allocate the ranges right >>> after, as EfiBootServicesData. This will prevent other drivers / >>> applications in the firmware from allocating chunks out of those areas, >>> and the OS will be at liberty to release and repurpose the ranges after >>> ExitBootServices(). >>> >>> Thanks, >>> Laszlo >>> >>>>> (2) Invent and set an "ignore me, firmware" property for the >>>>> hotpluggable memory nodes in the DT, and update the firmware to honor >>>>> that property. >>>>> >>>>> Thanks >>>>> Laszlo >>>> >>> > >
On Tue, 2 Apr 2019 17:38:26 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 04/02/19 17:29, Auger Eric wrote: > > Hi Laszlo, > > > > On 4/1/19 3:07 PM, Laszlo Ersek wrote: > >> On 03/29/19 14:56, Auger Eric wrote: > >>> Hi Ard, > >>> > >>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: > >>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: > >>>>> > >>>>> Hi Shameer, > >>>>> > >>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: > >>>>>> > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com] > >>>>>>> Sent: 29 March 2019 09:32 > >>>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; > >>>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; > >>>>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; > >>>>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com > >>>>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; > >>>>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel > >>>>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> > >>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" > >>>>>>> > >>>>>>> Hi Shameer, > >>>>>>> > >>>>>>> [ + Laszlo, Ard, Leif ] > >>>>>>> > >>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: > >>>>>>>> This is to disable/enable populating DT nodes in case > >>>>>>>> any conflict with acpi tables. The default is "off". > >>>>>>> The name of the option sounds misleading to me. Also we don't really > >>>>>>> know the scope of the disablement. At the moment this just aims to > >>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. > >>>>>>> > >>>>>>>> > >>>>>>>> This will be used in subsequent patch where cold plug > >>>>>>>> device-memory support is added for DT boot. > >>>>>>> I am concerned about the fact that in dt mode, by default, you won't see > >>>>>>> any PCDIMM nodes. > >>>>>>>> > >>>>>>>> If DT memory node support is added for cold-plugged device > >>>>>>>> memory, those memory will be visible to Guest kernel via > >>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. > >>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether > >>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this > >>>>>>> info. > >>>>>> > >>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. > >>>>>> > >>>>>> Also, to be more clear on what happens, > >>>>>> > >>>>>> Guest ACPI boot with "fdt=on" , > >>>>>> > >>>>>> From kernel log, > >>>>>> > >>>>>> [ 0.000000] Early memory node ranges > >>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. > >>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > >>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] > >>>>>> > >>>>>> > >>>>>> Guest ACPI boot with "fdt=off" , > >>>>>> > >>>>>> [ 0.000000] Movable zone start for each node > >>>>>> [ 0.000000] Early memory node ranges > >>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] > >>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] > >>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > >>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff > >>>>>> > >>>>>> The hotpluggable memory node is absent from early memory nodes here. > >>>>> > >>>>> OK thank you for the example illustrating the concern. > >>>>>> > >>>>>> As you said, it could be possible to detect this node using SRAT in UEFI. > >>>>> > >>>>> Let's wait for EDK2 experts on this. > >>>>> > >>>> > >>>> Happy to chime in, but I need a bit more context here. > >>>> > >>>> What is the problem, how does this path try to solve it, and why is > >>>> that a bad idea? > >>>> > >>> Sure, sorry. > >>> > >>> This series: > >>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, > >>> https://patchwork.kernel.org/cover/10863301/ > >>> > >>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the > >>> SRAT and DSDT parts and relies on GED to trigger the hotplug. > >>> > >>> We noticed that if we build the hotpluggable memory dt nodes on top of > >>> the above ACPI tables, the DIMM slots are interpreted as not > >>> hotpluggable memory slots (at least we think so). > >>> > >>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the > >>> fact that those slots are exposed as hotpluggable in the SRAT for example. > >>> > >>> So in this series, we are forced to not generate the hotpluggable memory > >>> dt nodes if we want the DIMM slots to be effectively recognized as > >>> hotpluggable. > >>> > >>> Could you confirm we have a correct understanding of the EDK2 behaviour > >>> and if so, would there be any solution for EDK2 to absorb both the DT > >>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. > >>> > >>> At qemu level, detecting we are booting in ACPI mode and purposely > >>> removing the above mentioned DT nodes does not look straightforward. > >> > >> The firmware is not enlightened about the ACPI content that comes from > >> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, > >> as instructed through the ACPI linker/loader script, in order to install > >> the ACPI content for the OS. No actual information is consumed by the > >> firmware from the ACPI payload -- and that's a feature. > >> > >> The firmware does consume DT: > >> > >> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by > >> the firmware (for its own information needs), and passed on to the OS. > >> > >> - If you start QEMU *without* "-no-acpi" (the default), then the DT is > >> consumed only by the firmware (for its own information needs), and the > >> DT is hidden from the OS. The OS gets only the ACPI content > >> (processed/prepared as described above). > >> > >> > >> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the > >> base/size pairs in all the memory nodes in the DT. For each such base > >> address that is currently tracked as "nonexistent" in the GCD memory > >> space map, the driver currently adds the base/size range as "system > >> memory". This in turn is reflected by the UEFI memmap that the OS gets > >> to see as "conventional memory". > >> > >> If you need some memory ranges to show up as "special" in the UEFI > >> memmap, then you need to distinguish them somehow from the "regular" > >> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the > >> firmware, so that it act upon the discriminator that you set in the DT. > >> > >> > >> Now... from a brief look at the Platform Init and UEFI specs, my > >> impression is that the hotpluggable (but presently not plugged) DIMM > >> ranges should simply be *absent* from the UEFI memmap; is that correct? > >> (I didn't check the ACPI spec, maybe it specifies the expected behavior > >> in full.) If my impression is correct, then two options (alternatives) > >> exist: > >> > >> (1) Hide the affected memory nodes -- or at least the affected base/size > >> pairs -- from the DT, in case you boot without "-no-acpi" but with an > >> external firmware loaded. Then the firmware will not expose those ranges > >> as "conventional memory" in the UEFI memmap. This approach requires no > >> changes to edk2. > >> > >> This option is precisely what Eric described up-thread, at > >> <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: > >> > >>> in machvirt_init, there is firmware_loaded that tells you whether you > >>> have a FW image. If this one is not set, you can induce dt. But if > >>> there is a FW it can be either DT or ACPI booted. You also have the > >>> acpi_enabled knob. > >> > >> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in > >> "vl.c"). > >> > >> So, the condition for hiding the hotpluggable memory nodes in question > >> from the DT is: > > > >> > >> (aarch64 && firmware_loaded && acpi_enabled) > > > > Thanks a lot for all those inputs! > > > > I don't get why we test aarch64 in above condition (this was useful for > > high ECAM range as the aarch32 FW was not supporting it but here, is it > > still meaningful?) > > Sorry, I should have clarified that. Yes, it is meaningful: > > While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a > 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but > not the reverse, on ARM.) So if you run the 32-bit build of the > ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with > the OS is the DT. > > This "bitness distinction" is implemented in the firmware already. If > you hid the memory nodes from the DT under the condition > > (!aarch64 && firmware_loaded && acpi_enabled) > > then the nodes would not be seen by the OS at all (because > "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and > all the OS can ever get is DT). It's getting tricky and I don't like a bit that we are trying to carter 64 bit only UEFI build (or any other build) on QEMU side. Also Peter has a valid about guessing on QEMU side (that's usually a source of problem in the future). Perhaps we should reconsider and think about marking hotplugbbale RAM in DT and let firmware to exclude it from memory map. > Thanks, > Laszlo > > > > > Thanks > > > > Eric > > > >> > >> > >> (2) Invent and set an "ignore me, firmware" property for the > >> hotpluggable memory nodes in the DT, and update the firmware to honor > >> that property. > >> > >> Thanks > >> Laszlo > >> >
> -----Original Message----- > From: Igor Mammedov [mailto:imammedo@redhat.com] > Sent: 03 April 2019 10:49 > To: Laszlo Ersek <lersek@redhat.com> > Cc: Auger Eric <eric.auger@redhat.com>; Ard Biesheuvel > <ard.biesheuvel@linaro.org>; peter.maydell@linaro.org; > sameo@linux.intel.com; qemu-devel@nongnu.org; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com>; Linuxarm > <linuxarm@huawei.com>; shannon.zhaosl@gmail.com; > qemu-arm@nongnu.org; xuwei (O) <xuwei5@huawei.com>; > sebastien.boeuf@intel.com; Leif Lindholm <Leif.Lindholm@arm.com> > Subject: Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in > feature "fdt" > > On Tue, 2 Apr 2019 17:38:26 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: [...] > > >>> Sure, sorry. > > >>> > > >>> This series: > > >>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, > > >>> https://patchwork.kernel.org/cover/10863301/ > > >>> > > >>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the > > >>> SRAT and DSDT parts and relies on GED to trigger the hotplug. > > >>> > > >>> We noticed that if we build the hotpluggable memory dt nodes on top of > > >>> the above ACPI tables, the DIMM slots are interpreted as not > > >>> hotpluggable memory slots (at least we think so). > > >>> > > >>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores > the > > >>> fact that those slots are exposed as hotpluggable in the SRAT for > example. > > >>> > > >>> So in this series, we are forced to not generate the hotpluggable memory > > >>> dt nodes if we want the DIMM slots to be effectively recognized as > > >>> hotpluggable. > > >>> > > >>> Could you confirm we have a correct understanding of the EDK2 > behaviour > > >>> and if so, would there be any solution for EDK2 to absorb both the DT > > >>> nodes and the relevant SRAT/DSDT tables and make the slots > hotpluggable. > > >>> > > >>> At qemu level, detecting we are booting in ACPI mode and purposely > > >>> removing the above mentioned DT nodes does not look straightforward. > > >> > > >> The firmware is not enlightened about the ACPI content that comes from > > >> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, > > >> as instructed through the ACPI linker/loader script, in order to install > > >> the ACPI content for the OS. No actual information is consumed by the > > >> firmware from the ACPI payload -- and that's a feature. > > >> > > >> The firmware does consume DT: > > >> > > >> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by > > >> the firmware (for its own information needs), and passed on to the OS. > > >> > > >> - If you start QEMU *without* "-no-acpi" (the default), then the DT is > > >> consumed only by the firmware (for its own information needs), and the > > >> DT is hidden from the OS. The OS gets only the ACPI content > > >> (processed/prepared as described above). > > >> > > >> > > >> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the > > >> base/size pairs in all the memory nodes in the DT. For each such base > > >> address that is currently tracked as "nonexistent" in the GCD memory > > >> space map, the driver currently adds the base/size range as "system > > >> memory". This in turn is reflected by the UEFI memmap that the OS gets > > >> to see as "conventional memory". > > >> > > >> If you need some memory ranges to show up as "special" in the UEFI > > >> memmap, then you need to distinguish them somehow from the "regular" > > >> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in > the > > >> firmware, so that it act upon the discriminator that you set in the DT. > > >> > > >> > > >> Now... from a brief look at the Platform Init and UEFI specs, my > > >> impression is that the hotpluggable (but presently not plugged) DIMM > > >> ranges should simply be *absent* from the UEFI memmap; is that > correct? > > >> (I didn't check the ACPI spec, maybe it specifies the expected behavior > > >> in full.) If my impression is correct, then two options (alternatives) > > >> exist: > > >> > > >> (1) Hide the affected memory nodes -- or at least the affected base/size > > >> pairs -- from the DT, in case you boot without "-no-acpi" but with an > > >> external firmware loaded. Then the firmware will not expose those ranges > > >> as "conventional memory" in the UEFI memmap. This approach requires > no > > >> changes to edk2. > > >> > > >> This option is precisely what Eric described up-thread, at > > >> > <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redh > at.com>: > > >> > > >>> in machvirt_init, there is firmware_loaded that tells you whether you > > >>> have a FW image. If this one is not set, you can induce dt. But if > > >>> there is a FW it can be either DT or ACPI booted. You also have the > > >>> acpi_enabled knob. > > >> > > >> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in > > >> "vl.c"). > > >> > > >> So, the condition for hiding the hotpluggable memory nodes in question > > >> from the DT is: > > > > > >> > > >> (aarch64 && firmware_loaded && acpi_enabled) > > > > > > Thanks a lot for all those inputs! > > > > > > I don't get why we test aarch64 in above condition (this was useful for > > > high ECAM range as the aarch32 FW was not supporting it but here, is it > > > still meaningful?) > > > > Sorry, I should have clarified that. Yes, it is meaningful: > > > > While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a > > 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but > > not the reverse, on ARM.) So if you run the 32-bit build of the > > ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with > > the OS is the DT. Just to confirm, does that mean with 32-bit build of the UEFI, the OS cannot boot with ACPI and uses DT only. So, If ((aarch64 && firmware_loaded && acpi_enabled) { Hide_hotpluggable_memory_nodes() } else { Add_ hotpluggable_memory_nodes() } should work for all cases? > > This "bitness distinction" is implemented in the firmware already. If > > you hid the memory nodes from the DT under the condition > > > > (!aarch64 && firmware_loaded && acpi_enabled) > > > > then the nodes would not be seen by the OS at all (because > > "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and > > all the OS can ever get is DT). > > It's getting tricky and I don't like a bit that we are trying to carter > 64 bit only UEFI build (or any other build) on QEMU side. Also Peter has > a valid about guessing on QEMU side (that's usually a source of problem > in the future). If the above is correct(with 32-bit variant of UEFI, OS cannot have ACPI boot), then do we really have the issue of memory becoming non hot-un-unpluggable? May be I am missing something. Thanks, Shameer > Perhaps we should reconsider and think about marking hotplugbbale RAM > in DT and let firmware to exclude it from memory map. > > > Thanks, > > Laszlo > > > > > > > > Thanks > > > > > > Eric > > > > > >> > > >> > > >> (2) Invent and set an "ignore me, firmware" property for the > > >> hotpluggable memory nodes in the DT, and update the firmware to honor > > >> that property. > > >> > > >> Thanks > > >> Laszlo > > >> > >
On 04/03/19 11:49, Igor Mammedov wrote: > On Tue, 2 Apr 2019 17:38:26 +0200 > Laszlo Ersek <lersek@redhat.com> wrote: > >> On 04/02/19 17:29, Auger Eric wrote: >>> Hi Laszlo, >>> >>> On 4/1/19 3:07 PM, Laszlo Ersek wrote: >>>> On 03/29/19 14:56, Auger Eric wrote: >>>>> Hi Ard, >>>>> >>>>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: >>>>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: >>>>>>> >>>>>>> Hi Shameer, >>>>>>> >>>>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: >>>>>>>> >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com] >>>>>>>>> Sent: 29 March 2019 09:32 >>>>>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; >>>>>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; >>>>>>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; >>>>>>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com >>>>>>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; >>>>>>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel >>>>>>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> >>>>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" >>>>>>>>> >>>>>>>>> Hi Shameer, >>>>>>>>> >>>>>>>>> [ + Laszlo, Ard, Leif ] >>>>>>>>> >>>>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: >>>>>>>>>> This is to disable/enable populating DT nodes in case >>>>>>>>>> any conflict with acpi tables. The default is "off". >>>>>>>>> The name of the option sounds misleading to me. Also we don't really >>>>>>>>> know the scope of the disablement. At the moment this just aims to >>>>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> This will be used in subsequent patch where cold plug >>>>>>>>>> device-memory support is added for DT boot. >>>>>>>>> I am concerned about the fact that in dt mode, by default, you won't see >>>>>>>>> any PCDIMM nodes. >>>>>>>>>> >>>>>>>>>> If DT memory node support is added for cold-plugged device >>>>>>>>>> memory, those memory will be visible to Guest kernel via >>>>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. >>>>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether >>>>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this >>>>>>>>> info. >>>>>>>> >>>>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. >>>>>>>> >>>>>>>> Also, to be more clear on what happens, >>>>>>>> >>>>>>>> Guest ACPI boot with "fdt=on" , >>>>>>>> >>>>>>>> From kernel log, >>>>>>>> >>>>>>>> [ 0.000000] Early memory node ranges >>>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. >>>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] >>>>>>>> >>>>>>>> >>>>>>>> Guest ACPI boot with "fdt=off" , >>>>>>>> >>>>>>>> [ 0.000000] Movable zone start for each node >>>>>>>> [ 0.000000] Early memory node ranges >>>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] >>>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages >>>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff >>>>>>>> >>>>>>>> The hotpluggable memory node is absent from early memory nodes here. >>>>>>> >>>>>>> OK thank you for the example illustrating the concern. >>>>>>>> >>>>>>>> As you said, it could be possible to detect this node using SRAT in UEFI. >>>>>>> >>>>>>> Let's wait for EDK2 experts on this. >>>>>>> >>>>>> >>>>>> Happy to chime in, but I need a bit more context here. >>>>>> >>>>>> What is the problem, how does this path try to solve it, and why is >>>>>> that a bad idea? >>>>>> >>>>> Sure, sorry. >>>>> >>>>> This series: >>>>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, >>>>> https://patchwork.kernel.org/cover/10863301/ >>>>> >>>>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the >>>>> SRAT and DSDT parts and relies on GED to trigger the hotplug. >>>>> >>>>> We noticed that if we build the hotpluggable memory dt nodes on top of >>>>> the above ACPI tables, the DIMM slots are interpreted as not >>>>> hotpluggable memory slots (at least we think so). >>>>> >>>>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the >>>>> fact that those slots are exposed as hotpluggable in the SRAT for example. >>>>> >>>>> So in this series, we are forced to not generate the hotpluggable memory >>>>> dt nodes if we want the DIMM slots to be effectively recognized as >>>>> hotpluggable. >>>>> >>>>> Could you confirm we have a correct understanding of the EDK2 behaviour >>>>> and if so, would there be any solution for EDK2 to absorb both the DT >>>>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. >>>>> >>>>> At qemu level, detecting we are booting in ACPI mode and purposely >>>>> removing the above mentioned DT nodes does not look straightforward. >>>> >>>> The firmware is not enlightened about the ACPI content that comes from >>>> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, >>>> as instructed through the ACPI linker/loader script, in order to install >>>> the ACPI content for the OS. No actual information is consumed by the >>>> firmware from the ACPI payload -- and that's a feature. >>>> >>>> The firmware does consume DT: >>>> >>>> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by >>>> the firmware (for its own information needs), and passed on to the OS. >>>> >>>> - If you start QEMU *without* "-no-acpi" (the default), then the DT is >>>> consumed only by the firmware (for its own information needs), and the >>>> DT is hidden from the OS. The OS gets only the ACPI content >>>> (processed/prepared as described above). >>>> >>>> >>>> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the >>>> base/size pairs in all the memory nodes in the DT. For each such base >>>> address that is currently tracked as "nonexistent" in the GCD memory >>>> space map, the driver currently adds the base/size range as "system >>>> memory". This in turn is reflected by the UEFI memmap that the OS gets >>>> to see as "conventional memory". >>>> >>>> If you need some memory ranges to show up as "special" in the UEFI >>>> memmap, then you need to distinguish them somehow from the "regular" >>>> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the >>>> firmware, so that it act upon the discriminator that you set in the DT. >>>> >>>> >>>> Now... from a brief look at the Platform Init and UEFI specs, my >>>> impression is that the hotpluggable (but presently not plugged) DIMM >>>> ranges should simply be *absent* from the UEFI memmap; is that correct? >>>> (I didn't check the ACPI spec, maybe it specifies the expected behavior >>>> in full.) If my impression is correct, then two options (alternatives) >>>> exist: >>>> >>>> (1) Hide the affected memory nodes -- or at least the affected base/size >>>> pairs -- from the DT, in case you boot without "-no-acpi" but with an >>>> external firmware loaded. Then the firmware will not expose those ranges >>>> as "conventional memory" in the UEFI memmap. This approach requires no >>>> changes to edk2. >>>> >>>> This option is precisely what Eric described up-thread, at >>>> <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: >>>> >>>>> in machvirt_init, there is firmware_loaded that tells you whether you >>>>> have a FW image. If this one is not set, you can induce dt. But if >>>>> there is a FW it can be either DT or ACPI booted. You also have the >>>>> acpi_enabled knob. >>>> >>>> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in >>>> "vl.c"). >>>> >>>> So, the condition for hiding the hotpluggable memory nodes in question >>>> from the DT is: >>> >>>> >>>> (aarch64 && firmware_loaded && acpi_enabled) >>> >>> Thanks a lot for all those inputs! >>> >>> I don't get why we test aarch64 in above condition (this was useful for >>> high ECAM range as the aarch32 FW was not supporting it but here, is it >>> still meaningful?) >> >> Sorry, I should have clarified that. Yes, it is meaningful: >> >> While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a >> 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but >> not the reverse, on ARM.) So if you run the 32-bit build of the >> ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with >> the OS is the DT. >> >> This "bitness distinction" is implemented in the firmware already. If >> you hid the memory nodes from the DT under the condition >> >> (!aarch64 && firmware_loaded && acpi_enabled) >> >> then the nodes would not be seen by the OS at all (because >> "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and >> all the OS can ever get is DT). > > It's getting tricky and I don't like a bit that we are trying to carter > 64 bit only UEFI build (or any other build) on QEMU side. Also Peter has > a valid about guessing on QEMU side (that's usually a source of problem > in the future). > > Perhaps we should reconsider and think about marking hotplugbbale RAM > in DT and let firmware to exclude it from memory map. I'm fine either way. (I'm glad to continue discussing either option; that shouldn't be taken as a preference on my end.) With option (2), please consider the new version dependency between QEMU and the firmware -- this may or may not affect migration. (Thinking about migration is difficult, so I'll leave that to you all :) ) Thanks Laszlo >>>> (2) Invent and set an "ignore me, firmware" property for the >>>> hotpluggable memory nodes in the DT, and update the firmware to honor >>>> that property. >>>> >>>> Thanks >>>> Laszlo >>>> >> >
On 04/03/19 14:10, Shameerali Kolothum Thodi wrote: >>>>> So, the condition for hiding the hotpluggable memory nodes in question >>>>> from the DT is: >>>> >>>>> >>>>> (aarch64 && firmware_loaded && acpi_enabled) >>> While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a >>> 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but >>> not the reverse, on ARM.) So if you run the 32-bit build of the >>> ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with >>> the OS is the DT. > > Just to confirm, does that mean with 32-bit build of the UEFI, the OS cannot > boot with ACPI and uses DT only. Indeed. > So, > > If ((aarch64 && firmware_loaded && acpi_enabled) { > Hide_hotpluggable_memory_nodes() > } else { > Add_ hotpluggable_memory_nodes() > } > > should work for all cases? Yes. Here's what happens when any one of the subconditions evaluates to false: - ARM32 has no ACPI bindings, so the guest kernel can only use DT. - On AARCH64, if you don't "load the firmware" (= don't use UEFI), then there won't be an ACPI entry point for the OS to locate (the RSD PTR is defined by the ACPI spec in UEFI terms, for AARCH64). So the guest kernel can only use DT. - When on AARCH64 and using UEFI, but asking QEMU not to generate ACPI content, the firmware will not install any ACPI tables, so the guest kernel can only use DT. >>> This "bitness distinction" is implemented in the firmware already. If >>> you hid the memory nodes from the DT under the condition >>> >>> (!aarch64 && firmware_loaded && acpi_enabled) >>> >>> then the nodes would not be seen by the OS at all (because >>> "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and >>> all the OS can ever get is DT). >> >> It's getting tricky and I don't like a bit that we are trying to carter >> 64 bit only UEFI build (or any other build) on QEMU side. Also Peter has >> a valid about guessing on QEMU side (that's usually a source of problem >> in the future). > > If the above is correct(with 32-bit variant of UEFI, OS cannot have ACPI boot), > then do we really have the issue of memory becoming non hot-un-unpluggable? > May be I am missing something. I think Igor and Peter dislike adding complex logic to QEMU that reflects the behavior of a specific firmware. AIUI their objection isn't that it wouldn't work, but that it's not the right thing to do, from a design perspective. Thanks, Laszlo
Hi Laszlo, > -----Original Message----- > From: Laszlo Ersek [mailto:lersek@redhat.com] > Sent: 03 April 2019 14:29 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; > Igor Mammedov <imammedo@redhat.com> > Cc: Auger Eric <eric.auger@redhat.com>; Ard Biesheuvel > <ard.biesheuvel@linaro.org>; peter.maydell@linaro.org; > sameo@linux.intel.com; qemu-devel@nongnu.org; Linuxarm > <linuxarm@huawei.com>; shannon.zhaosl@gmail.com; > qemu-arm@nongnu.org; xuwei (O) <xuwei5@huawei.com>; > sebastien.boeuf@intel.com; Leif Lindholm <Leif.Lindholm@arm.com> > Subject: Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in > feature "fdt" > > On 04/03/19 14:10, Shameerali Kolothum Thodi wrote: > > >>>>> So, the condition for hiding the hotpluggable memory nodes in question > >>>>> from the DT is: > >>>> > >>>>> > >>>>> (aarch64 && firmware_loaded && acpi_enabled) > > >>> While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a > >>> 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but > >>> not the reverse, on ARM.) So if you run the 32-bit build of the > >>> ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with > >>> the OS is the DT. > > > > Just to confirm, does that mean with 32-bit build of the UEFI, the OS cannot > > boot with ACPI and uses DT only. > > Indeed. > > > So, > > > > If ((aarch64 && firmware_loaded && acpi_enabled) { > > Hide_hotpluggable_memory_nodes() > > } else { > > Add_ hotpluggable_memory_nodes() > > } > > > > should work for all cases? > > Yes. > > Here's what happens when any one of the subconditions evaluates to false: > > - ARM32 has no ACPI bindings, so the guest kernel can only use DT. > > - On AARCH64, if you don't "load the firmware" (= don't use UEFI), then > there won't be an ACPI entry point for the OS to locate (the RSD PTR > is defined by the ACPI spec in UEFI terms, for AARCH64). So the guest > kernel can only use DT. > > - When on AARCH64 and using UEFI, but asking QEMU not to generate ACPI > content, the firmware will not install any ACPI tables, so the guest > kernel can only use DT. > Thanks. That makes it very clear. Much appreciated. > >>> This "bitness distinction" is implemented in the firmware already. If > >>> you hid the memory nodes from the DT under the condition > >>> > >>> (!aarch64 && firmware_loaded && acpi_enabled) > >>> > >>> then the nodes would not be seen by the OS at all (because > >>> "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and > >>> all the OS can ever get is DT). > >> > >> It's getting tricky and I don't like a bit that we are trying to carter > >> 64 bit only UEFI build (or any other build) on QEMU side. Also Peter has > >> a valid about guessing on QEMU side (that's usually a source of problem > >> in the future). > > > > If the above is correct(with 32-bit variant of UEFI, OS cannot have ACPI boot), > > then do we really have the issue of memory becoming non > hot-un-unpluggable? > > May be I am missing something. > > I think Igor and Peter dislike adding complex logic to QEMU that > reflects the behavior of a specific firmware. AIUI their objection isn't > that it wouldn't work, but that it's not the right thing to do, from a > design perspective. Understood. Hope we can converge on something soon. Cheers, Shameer
On Wed, 3 Apr 2019 16:25:49 +0000 Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote: > Hi Laszlo, > > > -----Original Message----- > > From: Laszlo Ersek [mailto:lersek@redhat.com] > > Sent: 03 April 2019 14:29 > > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; > > Igor Mammedov <imammedo@redhat.com> > > Cc: Auger Eric <eric.auger@redhat.com>; Ard Biesheuvel > > <ard.biesheuvel@linaro.org>; peter.maydell@linaro.org; > > sameo@linux.intel.com; qemu-devel@nongnu.org; Linuxarm > > <linuxarm@huawei.com>; shannon.zhaosl@gmail.com; > > qemu-arm@nongnu.org; xuwei (O) <xuwei5@huawei.com>; > > sebastien.boeuf@intel.com; Leif Lindholm <Leif.Lindholm@arm.com> > > Subject: Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in > > feature "fdt" > > > > On 04/03/19 14:10, Shameerali Kolothum Thodi wrote: > > > > >>>>> So, the condition for hiding the hotpluggable memory nodes in question > > >>>>> from the DT is: > > >>>> > > >>>>> > > >>>>> (aarch64 && firmware_loaded && acpi_enabled) > > > > >>> While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a > > >>> 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but > > >>> not the reverse, on ARM.) So if you run the 32-bit build of the > > >>> ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with > > >>> the OS is the DT. > > > > > > Just to confirm, does that mean with 32-bit build of the UEFI, the OS cannot > > > boot with ACPI and uses DT only. > > > > Indeed. > > > > > So, > > > > > > If ((aarch64 && firmware_loaded && acpi_enabled) { > > > Hide_hotpluggable_memory_nodes() > > > } else { > > > Add_ hotpluggable_memory_nodes() > > > } > > > > > > should work for all cases? > > > > Yes. > > > > Here's what happens when any one of the subconditions evaluates to false: > > > > - ARM32 has no ACPI bindings, so the guest kernel can only use DT. > > > > - On AARCH64, if you don't "load the firmware" (= don't use UEFI), then > > there won't be an ACPI entry point for the OS to locate (the RSD PTR > > is defined by the ACPI spec in UEFI terms, for AARCH64). So the guest > > kernel can only use DT. > > > > - When on AARCH64 and using UEFI, but asking QEMU not to generate ACPI > > content, the firmware will not install any ACPI tables, so the guest > > kernel can only use DT. > > > > Thanks. That makes it very clear. Much appreciated. > > > >>> This "bitness distinction" is implemented in the firmware already. If > > >>> you hid the memory nodes from the DT under the condition > > >>> > > >>> (!aarch64 && firmware_loaded && acpi_enabled) > > >>> > > >>> then the nodes would not be seen by the OS at all (because > > >>> "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and > > >>> all the OS can ever get is DT). > > >> > > >> It's getting tricky and I don't like a bit that we are trying to carter > > >> 64 bit only UEFI build (or any other build) on QEMU side. Also Peter has > > >> a valid about guessing on QEMU side (that's usually a source of problem > > >> in the future). > > > > > > If the above is correct(with 32-bit variant of UEFI, OS cannot have ACPI boot), > > > then do we really have the issue of memory becoming non > > hot-un-unpluggable? > > > May be I am missing something. > > > > I think Igor and Peter dislike adding complex logic to QEMU that > > reflects the behavior of a specific firmware. AIUI their objection isn't > > that it wouldn't work, but that it's not the right thing to do, from a > > design perspective. > > Understood. Hope we can converge on something soon. Lets try adding a parameter to memory descriptors in DT that would mark them as hotpluggable. > Cheers, > Shameer
On Wed, 3 Apr 2019 15:19:52 +0200 Laszlo Ersek <lersek@redhat.com> wrote: > On 04/03/19 11:49, Igor Mammedov wrote: > > On Tue, 2 Apr 2019 17:38:26 +0200 > > Laszlo Ersek <lersek@redhat.com> wrote: > > > >> On 04/02/19 17:29, Auger Eric wrote: > >>> Hi Laszlo, > >>> > >>> On 4/1/19 3:07 PM, Laszlo Ersek wrote: > >>>> On 03/29/19 14:56, Auger Eric wrote: > >>>>> Hi Ard, > >>>>> > >>>>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote: > >>>>>> On Fri, 29 Mar 2019 at 14:12, Auger Eric <eric.auger@redhat.com> wrote: > >>>>>>> > >>>>>>> Hi Shameer, > >>>>>>> > >>>>>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>>> -----Original Message----- > >>>>>>>>> From: Auger Eric [mailto:eric.auger@redhat.com] > >>>>>>>>> Sent: 29 March 2019 09:32 > >>>>>>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>; > >>>>>>>>> qemu-devel@nongnu.org; qemu-arm@nongnu.org; imammedo@redhat.com; > >>>>>>>>> peter.maydell@linaro.org; shannon.zhaosl@gmail.com; > >>>>>>>>> sameo@linux.intel.com; sebastien.boeuf@intel.com > >>>>>>>>> Cc: Linuxarm <linuxarm@huawei.com>; xuwei (O) <xuwei5@huawei.com>; > >>>>>>>>> Laszlo Ersek <lersek@redhat.com>; Ard Biesheuvel > >>>>>>>>> <ard.biesheuvel@linaro.org>; Leif Lindholm <Leif.Lindholm@arm.com> > >>>>>>>>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt" > >>>>>>>>> > >>>>>>>>> Hi Shameer, > >>>>>>>>> > >>>>>>>>> [ + Laszlo, Ard, Leif ] > >>>>>>>>> > >>>>>>>>> On 3/21/19 11:47 AM, Shameer Kolothum wrote: > >>>>>>>>>> This is to disable/enable populating DT nodes in case > >>>>>>>>>> any conflict with acpi tables. The default is "off". > >>>>>>>>> The name of the option sounds misleading to me. Also we don't really > >>>>>>>>> know the scope of the disablement. At the moment this just aims to > >>>>>>>>> prevent the hotpluggable dt nodes from being added if we boot in ACPI mode. > >>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> This will be used in subsequent patch where cold plug > >>>>>>>>>> device-memory support is added for DT boot. > >>>>>>>>> I am concerned about the fact that in dt mode, by default, you won't see > >>>>>>>>> any PCDIMM nodes. > >>>>>>>>>> > >>>>>>>>>> If DT memory node support is added for cold-plugged device > >>>>>>>>>> memory, those memory will be visible to Guest kernel via > >>>>>>>>>> UEFI GetMemoryMap() and gets treated as early boot memory. > >>>>>>>>> Don't we have an issue in UEFI then. Normally the SRAT indicates whether > >>>>>>>>> the slots are hotpluggable or not. Shouldn't the UEFI code look at this > >>>>>>>>> info. > >>>>>>>> > >>>>>>>> Sorry I missed this part. Yes, that will be a more cleaner solution. > >>>>>>>> > >>>>>>>> Also, to be more clear on what happens, > >>>>>>>> > >>>>>>>> Guest ACPI boot with "fdt=on" , > >>>>>>>> > >>>>>>>> From kernel log, > >>>>>>>> > >>>>>>>> [ 0.000000] Early memory node ranges > >>>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000c0000000-0x00000000ffffffff] --> This is the hotpluggable memory node from DT. > >>>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > >>>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000ffffffff] > >>>>>>>> > >>>>>>>> > >>>>>>>> Guest ACPI boot with "fdt=off" , > >>>>>>>> > >>>>>>>> [ 0.000000] Movable zone start for each node > >>>>>>>> [ 0.000000] Early memory node ranges > >>>>>>>> [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bbf5ffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bbf60000-0x00000000bbffffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc000000-0x00000000bc02ffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc030000-0x00000000bc36ffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bc370000-0x00000000bf64ffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf650000-0x00000000bf6dffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6e0000-0x00000000bf6effff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf6f0000-0x00000000bf80ffff] > >>>>>>>> [ 0.000000] node 0: [mem 0x00000000bf810000-0x00000000bfffffff] > >>>>>>>> [ 0.000000] Zeroed struct page in unavailable ranges: 1040 pages > >>>>>>>> [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff > >>>>>>>> > >>>>>>>> The hotpluggable memory node is absent from early memory nodes here. > >>>>>>> > >>>>>>> OK thank you for the example illustrating the concern. > >>>>>>>> > >>>>>>>> As you said, it could be possible to detect this node using SRAT in UEFI. > >>>>>>> > >>>>>>> Let's wait for EDK2 experts on this. > >>>>>>> > >>>>>> > >>>>>> Happy to chime in, but I need a bit more context here. > >>>>>> > >>>>>> What is the problem, how does this path try to solve it, and why is > >>>>>> that a bad idea? > >>>>>> > >>>>> Sure, sorry. > >>>>> > >>>>> This series: > >>>>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support, > >>>>> https://patchwork.kernel.org/cover/10863301/ > >>>>> > >>>>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the > >>>>> SRAT and DSDT parts and relies on GED to trigger the hotplug. > >>>>> > >>>>> We noticed that if we build the hotpluggable memory dt nodes on top of > >>>>> the above ACPI tables, the DIMM slots are interpreted as not > >>>>> hotpluggable memory slots (at least we think so). > >>>>> > >>>>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the > >>>>> fact that those slots are exposed as hotpluggable in the SRAT for example. > >>>>> > >>>>> So in this series, we are forced to not generate the hotpluggable memory > >>>>> dt nodes if we want the DIMM slots to be effectively recognized as > >>>>> hotpluggable. > >>>>> > >>>>> Could you confirm we have a correct understanding of the EDK2 behaviour > >>>>> and if so, would there be any solution for EDK2 to absorb both the DT > >>>>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable. > >>>>> > >>>>> At qemu level, detecting we are booting in ACPI mode and purposely > >>>>> removing the above mentioned DT nodes does not look straightforward. > >>>> > >>>> The firmware is not enlightened about the ACPI content that comes from > >>>> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware, > >>>> as instructed through the ACPI linker/loader script, in order to install > >>>> the ACPI content for the OS. No actual information is consumed by the > >>>> firmware from the ACPI payload -- and that's a feature. > >>>> > >>>> The firmware does consume DT: > >>>> > >>>> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by > >>>> the firmware (for its own information needs), and passed on to the OS. > >>>> > >>>> - If you start QEMU *without* "-no-acpi" (the default), then the DT is > >>>> consumed only by the firmware (for its own information needs), and the > >>>> DT is hidden from the OS. The OS gets only the ACPI content > >>>> (processed/prepared as described above). > >>>> > >>>> > >>>> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the > >>>> base/size pairs in all the memory nodes in the DT. For each such base > >>>> address that is currently tracked as "nonexistent" in the GCD memory > >>>> space map, the driver currently adds the base/size range as "system > >>>> memory". This in turn is reflected by the UEFI memmap that the OS gets > >>>> to see as "conventional memory". > >>>> > >>>> If you need some memory ranges to show up as "special" in the UEFI > >>>> memmap, then you need to distinguish them somehow from the "regular" > >>>> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the > >>>> firmware, so that it act upon the discriminator that you set in the DT. > >>>> > >>>> > >>>> Now... from a brief look at the Platform Init and UEFI specs, my > >>>> impression is that the hotpluggable (but presently not plugged) DIMM > >>>> ranges should simply be *absent* from the UEFI memmap; is that correct? > >>>> (I didn't check the ACPI spec, maybe it specifies the expected behavior > >>>> in full.) If my impression is correct, then two options (alternatives) > >>>> exist: > >>>> > >>>> (1) Hide the affected memory nodes -- or at least the affected base/size > >>>> pairs -- from the DT, in case you boot without "-no-acpi" but with an > >>>> external firmware loaded. Then the firmware will not expose those ranges > >>>> as "conventional memory" in the UEFI memmap. This approach requires no > >>>> changes to edk2. > >>>> > >>>> This option is precisely what Eric described up-thread, at > >>>> <http://mid.mail-archive.com/3f0a5793-dd35-a497-2248-8eb0cd3c3a16@redhat.com>: > >>>> > >>>>> in machvirt_init, there is firmware_loaded that tells you whether you > >>>>> have a FW image. If this one is not set, you can induce dt. But if > >>>>> there is a FW it can be either DT or ACPI booted. You also have the > >>>>> acpi_enabled knob. > >>>> > >>>> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in > >>>> "vl.c"). > >>>> > >>>> So, the condition for hiding the hotpluggable memory nodes in question > >>>> from the DT is: > >>> > >>>> > >>>> (aarch64 && firmware_loaded && acpi_enabled) > >>> > >>> Thanks a lot for all those inputs! > >>> > >>> I don't get why we test aarch64 in above condition (this was useful for > >>> high ECAM range as the aarch32 FW was not supporting it but here, is it > >>> still meaningful?) > >> > >> Sorry, I should have clarified that. Yes, it is meaningful: > >> > >> While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a > >> 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but > >> not the reverse, on ARM.) So if you run the 32-bit build of the > >> ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with > >> the OS is the DT. > >> > >> This "bitness distinction" is implemented in the firmware already. If > >> you hid the memory nodes from the DT under the condition > >> > >> (!aarch64 && firmware_loaded && acpi_enabled) > >> > >> then the nodes would not be seen by the OS at all (because > >> "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and > >> all the OS can ever get is DT). > > > > It's getting tricky and I don't like a bit that we are trying to carter > > 64 bit only UEFI build (or any other build) on QEMU side. Also Peter has > > a valid about guessing on QEMU side (that's usually a source of problem > > in the future). > > > > Perhaps we should reconsider and think about marking hotplugbbale RAM > > in DT and let firmware to exclude it from memory map. > > I'm fine either way. > > (I'm glad to continue discussing either option; that shouldn't be taken > as a preference on my end.) > > With option (2), please consider the new version dependency between QEMU > and the firmware -- this may or may not affect migration. (Thinking > about migration is difficult, so I'll leave that to you all :) ) I don't see any issues with migrations so far, it will change the size of DT but it's all new CLI so any existing machine should not have new options hence it would keep OLD DT size. > > Thanks > Laszlo > > >>>> (2) Invent and set an "ignore me, firmware" property for the > >>>> hotpluggable memory nodes in the DT, and update the firmware to honor > >>>> that property. > >>>> > >>>> Thanks > >>>> Laszlo > >>>> > >> > > >
> -----Original Message----- > From: Igor Mammedov [mailto:imammedo@redhat.com] > Sent: 08 April 2019 09:12 > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: Laszlo Ersek <lersek@redhat.com>; Auger Eric <eric.auger@redhat.com>; > Ard Biesheuvel <ard.biesheuvel@linaro.org>; peter.maydell@linaro.org; > sameo@linux.intel.com; qemu-devel@nongnu.org; Linuxarm > <linuxarm@huawei.com>; shannon.zhaosl@gmail.com; > qemu-arm@nongnu.org; xuwei (O) <xuwei5@huawei.com>; > sebastien.boeuf@intel.com; Leif Lindholm <Leif.Lindholm@arm.com> > Subject: Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in > feature "fdt" [...] > > > > If the above is correct(with 32-bit variant of UEFI, OS cannot have ACPI > boot), > > > > then do we really have the issue of memory becoming non > > > hot-un-unpluggable? > > > > May be I am missing something. > > > > > > I think Igor and Peter dislike adding complex logic to QEMU that > > > reflects the behavior of a specific firmware. AIUI their objection isn't > > > that it wouldn't work, but that it's not the right thing to do, from a > > > design perspective. > > > > Understood. Hope we can converge on something soon. > Lets try adding a parameter to memory descriptors in DT that would mark > them as hotpluggable. Just send out v4 incorporating this. Please take a look and let me know. Thanks, Shameer
diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 13db0e9..b602151 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1717,6 +1717,20 @@ static void virt_set_highmem(Object *obj, bool value, Error **errp) vms->highmem = value; } +static bool virt_get_fdt(Object *obj, Error **errp) +{ + VirtMachineState *vms = VIRT_MACHINE(obj); + + return vms->use_fdt; +} + +static void virt_set_fdt(Object *obj, bool value, Error **errp) +{ + VirtMachineState *vms = VIRT_MACHINE(obj); + + vms->use_fdt = value; +} + static bool virt_get_its(Object *obj, Error **errp) { VirtMachineState *vms = VIRT_MACHINE(obj); @@ -2005,6 +2019,15 @@ static void virt_instance_init(Object *obj) object_property_set_description(obj, "gic-version", "Set GIC version. " "Valid values are 2, 3 and host", NULL); + /* fdt is disabled by default */ + vms->use_fdt = false; + object_property_add_bool(obj, "fdt", virt_get_fdt, + virt_set_fdt, NULL); + object_property_set_description(obj, "fdt", + "Set on/off to enable/disable device tree " + "nodes in case any conflict with ACPI" + "(eg: device memory node)", + NULL); vms->highmem_ecam = !vmc->no_highmem_ecam; diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index c5e4c96..14b2e0a 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -119,6 +119,7 @@ typedef struct { bool highmem_ecam; bool its; bool virt; + bool use_fdt; int32_t gic_version; VirtIOMMUType iommu; struct arm_boot_info bootinfo;
This is to disable/enable populating DT nodes in case any conflict with acpi tables. The default is "off". This will be used in subsequent patch where cold plug device-memory support is added for DT boot. If DT memory node support is added for cold-plugged device memory, those memory will be visible to Guest kernel via UEFI GetMemoryMap() and gets treated as early boot memory. Hence memory becomes non hot-un-unpluggable even if Guest is booted in ACPI mode. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> --- hw/arm/virt.c | 23 +++++++++++++++++++++++ include/hw/arm/virt.h | 1 + 2 files changed, 24 insertions(+) -- 2.7.4