diff mbox

[v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

Message ID 20161115015049.2735-1-lersek@redhat.com
State New
Headers show

Commit Message

Laszlo Ersek Nov. 15, 2016, 1:50 a.m. UTC
The generic edk2 SMM infrastructure prefers
EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If
Trigger() only brings the current processor into SMM, then edk2 handles it
in the following ways:

(1) If Trigger() is executed by the BSP (which is guaranteed before
    ExitBootServices(), but is not necessarily true at runtime), then:

    (a) If edk2 has been configured for "traditional" SMM synchronization,
        then the BSP sends directed SMIs to the APs with APIC delivery,
        bringing them into SMM individually. Then the BSP runs the SMI
        handler / dispatcher.

    (b) If edk2 has been configured for "relaxed" SMM synchronization,
        then the APs that are not already in SMM are not brought in, and
        the BSP runs the SMI handler / dispatcher.

(2) If Trigger() is executed by an AP (which is possible after
    ExitBootServices(), and can be forced e.g. by "taskset -c 1
    efibootmgr"), then the AP in question brings in the BSP with a
    directed SMI, and the BSP runs the SMI handler / dispatcher.

The smaller problem with (1a) and (2) is that the BSP and AP
synchronization is slow. For example, the "taskset -c 1 efibootmgr"
command from (2) can take more than 3 seconds to complete, because
efibootmgr accesses non-volatile UEFI variables intensively.

The larger problem is that QEMU's current behavior diverges from the
behavior usually seen on physical hardware, and that keeps exposing
obscure corner cases, race conditions and other instabilities in edk2,
which generally expects / prefers a software SMI to affect all CPUs at
once.

Therefore introduce a special APM_STS value (0x51) that causes QEMU to
inject the SMI on all VCPUs. OVMF's EFI_SMM_CONTROL2_PROTOCOL.Trigger()
can utilize this to accommodate edk2's preference about "broadcast" SMI.

SeaBIOS uses values 0x00 and 0x01 for APM_STS (called PORT_SMI_STATUS in
the SeaBIOS code), so this change should be transparent to it.

While the original posting of this patch
<http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html>
only intended to speed up (2), based on our recent "stress testing" of SMM
this patch actually provides functional improvements. (There are no code
changes relative to the original posting.)

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Also-suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>

---
 hw/isa/lpc_ich9.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

-- 
2.9.2

Comments

Paolo Bonzini Nov. 15, 2016, 1:59 p.m. UTC | #1
On 15/11/2016 02:50, Laszlo Ersek wrote:
> The generic edk2 SMM infrastructure prefers

> EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If

> Trigger() only brings the current processor into SMM, then edk2 handles it

> in the following ways:

> 

> (1) If Trigger() is executed by the BSP (which is guaranteed before

>     ExitBootServices(), but is not necessarily true at runtime), then:

> 

>     (a) If edk2 has been configured for "traditional" SMM synchronization,

>         then the BSP sends directed SMIs to the APs with APIC delivery,

>         bringing them into SMM individually. Then the BSP runs the SMI

>         handler / dispatcher.

> 

>     (b) If edk2 has been configured for "relaxed" SMM synchronization,

>         then the APs that are not already in SMM are not brought in, and

>         the BSP runs the SMI handler / dispatcher.

> 

> (2) If Trigger() is executed by an AP (which is possible after

>     ExitBootServices(), and can be forced e.g. by "taskset -c 1

>     efibootmgr"), then the AP in question brings in the BSP with a

>     directed SMI, and the BSP runs the SMI handler / dispatcher.

> 

> The smaller problem with (1a) and (2) is that the BSP and AP

> synchronization is slow. For example, the "taskset -c 1 efibootmgr"

> command from (2) can take more than 3 seconds to complete, because

> efibootmgr accesses non-volatile UEFI variables intensively.

> 

> The larger problem is that QEMU's current behavior diverges from the

> behavior usually seen on physical hardware, and that keeps exposing

> obscure corner cases, race conditions and other instabilities in edk2,

> which generally expects / prefers a software SMI to affect all CPUs at

> once.

> 

> Therefore introduce a special APM_STS value (0x51) that causes QEMU to

> inject the SMI on all VCPUs. OVMF's EFI_SMM_CONTROL2_PROTOCOL.Trigger()

> can utilize this to accommodate edk2's preference about "broadcast" SMI.

> 

> SeaBIOS uses values 0x00 and 0x01 for APM_STS (called PORT_SMI_STATUS in

> the SeaBIOS code), so this change should be transparent to it.

> 

> While the original posting of this patch

> <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html>

> only intended to speed up (2), based on our recent "stress testing" of SMM

> this patch actually provides functional improvements. (There are no code

> changes relative to the original posting.)

> 

> Cc: "Michael S. Tsirkin" <mst@redhat.com>

> Cc: Paolo Bonzini <pbonzini@redhat.com>

> Also-suggested-by: Paolo Bonzini <pbonzini@redhat.com>

> Signed-off-by: Laszlo Ersek <lersek@redhat.com>


I'm queuing this for 2.8, but I have a question---should this feature be
detectable, and if so how?

Paolo

> ---

>  hw/isa/lpc_ich9.c | 12 +++++++++++-

>  1 file changed, 11 insertions(+), 1 deletion(-)

> 

> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c

> index 10d1ee8b9310..f2fe644fdaa4 100644

> --- a/hw/isa/lpc_ich9.c

> +++ b/hw/isa/lpc_ich9.c

> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled)

>  

>  /* APM */

>  

> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'

> +

>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

>  {

>      ICH9LPCState *lpc = arg;

> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

>  

>      /* SMI_EN = PMBASE + 30. SMI control and enable register */

>      if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {

> -        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

> +        if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {

> +            CPUState *cs;

> +

> +            CPU_FOREACH(cs) {

> +                cpu_interrupt(cs, CPU_INTERRUPT_SMI);

> +            }

> +        } else {

> +            cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

> +        }

>      }

>  }

>  

>
Laszlo Ersek Nov. 15, 2016, 3:39 p.m. UTC | #2
On 11/15/16 14:59, Paolo Bonzini wrote:
> 

> 

> On 15/11/2016 02:50, Laszlo Ersek wrote:

>> The generic edk2 SMM infrastructure prefers

>> EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If

>> Trigger() only brings the current processor into SMM, then edk2 handles it

>> in the following ways:

>>

>> (1) If Trigger() is executed by the BSP (which is guaranteed before

>>     ExitBootServices(), but is not necessarily true at runtime), then:

>>

>>     (a) If edk2 has been configured for "traditional" SMM synchronization,

>>         then the BSP sends directed SMIs to the APs with APIC delivery,

>>         bringing them into SMM individually. Then the BSP runs the SMI

>>         handler / dispatcher.

>>

>>     (b) If edk2 has been configured for "relaxed" SMM synchronization,

>>         then the APs that are not already in SMM are not brought in, and

>>         the BSP runs the SMI handler / dispatcher.

>>

>> (2) If Trigger() is executed by an AP (which is possible after

>>     ExitBootServices(), and can be forced e.g. by "taskset -c 1

>>     efibootmgr"), then the AP in question brings in the BSP with a

>>     directed SMI, and the BSP runs the SMI handler / dispatcher.

>>

>> The smaller problem with (1a) and (2) is that the BSP and AP

>> synchronization is slow. For example, the "taskset -c 1 efibootmgr"

>> command from (2) can take more than 3 seconds to complete, because

>> efibootmgr accesses non-volatile UEFI variables intensively.

>>

>> The larger problem is that QEMU's current behavior diverges from the

>> behavior usually seen on physical hardware, and that keeps exposing

>> obscure corner cases, race conditions and other instabilities in edk2,

>> which generally expects / prefers a software SMI to affect all CPUs at

>> once.

>>

>> Therefore introduce a special APM_STS value (0x51) that causes QEMU to

>> inject the SMI on all VCPUs. OVMF's EFI_SMM_CONTROL2_PROTOCOL.Trigger()

>> can utilize this to accommodate edk2's preference about "broadcast" SMI.

>>

>> SeaBIOS uses values 0x00 and 0x01 for APM_STS (called PORT_SMI_STATUS in

>> the SeaBIOS code), so this change should be transparent to it.

>>

>> While the original posting of this patch

>> <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html>

>> only intended to speed up (2), based on our recent "stress testing" of SMM

>> this patch actually provides functional improvements. (There are no code

>> changes relative to the original posting.)

>>

>> Cc: "Michael S. Tsirkin" <mst@redhat.com>

>> Cc: Paolo Bonzini <pbonzini@redhat.com>

>> Also-suggested-by: Paolo Bonzini <pbonzini@redhat.com>

>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>

> 

> I'm queuing this for 2.8,


Thank you!

> but I have a question---should this feature be

> detectable, and if so how?


That's the exact question I wanted to ask you. :)

... How about an fw_cfg file? For example:

- name: etc/broadcast_smi
- value type: uint8_t
- role: if the file exists, the broadcast SMI capability exists. Read
the uint8_t value from the fw_cfg file. Write the uint8_t value read to
ICH9_APM_STS first, before triggering the SMI via ICH9_APM_CNT, to
request a broadcast SMI. The values 0 and 1 are reserved for SeaBIOS, so
if the fw_cfg file exists, those values will never be provided.

This would allow me to make the OVMF patches more robust:
- first, I don't have to hardcode the value 'Q' in SmmControl2Dxe,
- second, I can check for the presence of "etc/broadcast_smi" in
PlatformPei, and set PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode
dynamically.

I would quite like this approach, as simply reverting the
PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode settings in OVMF will break
SMM hard on QEMUs that do not support the broadcast SMI feature. The
default would remain the current setting.

Also, if we add the fw_cfg item, we don't need to rush this into 2.8 I
think (of course I wouldn't mind making 2.8 nevertheless).

Do you think we should make the broadcast SMI capability (and the
descriptor fw_cfg file) machine-type dependent? I do think so (if for
nothing else then for "rather safe than sorry"), but I always struggle
with this kind of question, so any advice is most welcome...

Thank you!
Laszlo

> 

> Paolo

> 

>> ---

>>  hw/isa/lpc_ich9.c | 12 +++++++++++-

>>  1 file changed, 11 insertions(+), 1 deletion(-)

>>

>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c

>> index 10d1ee8b9310..f2fe644fdaa4 100644

>> --- a/hw/isa/lpc_ich9.c

>> +++ b/hw/isa/lpc_ich9.c

>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled)

>>  

>>  /* APM */

>>  

>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'

>> +

>>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

>>  {

>>      ICH9LPCState *lpc = arg;

>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

>>  

>>      /* SMI_EN = PMBASE + 30. SMI control and enable register */

>>      if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {

>> -        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

>> +        if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {

>> +            CPUState *cs;

>> +

>> +            CPU_FOREACH(cs) {

>> +                cpu_interrupt(cs, CPU_INTERRUPT_SMI);

>> +            }

>> +        } else {

>> +            cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

>> +        }

>>      }

>>  }

>>  

>>
Michael S. Tsirkin Nov. 15, 2016, 3:45 p.m. UTC | #3
On Tue, Nov 15, 2016 at 04:39:04PM +0100, Laszlo Ersek wrote:
> On 11/15/16 14:59, Paolo Bonzini wrote:

> > 

> > 

> > On 15/11/2016 02:50, Laszlo Ersek wrote:

> >> The generic edk2 SMM infrastructure prefers

> >> EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If

> >> Trigger() only brings the current processor into SMM, then edk2 handles it

> >> in the following ways:

> >>

> >> (1) If Trigger() is executed by the BSP (which is guaranteed before

> >>     ExitBootServices(), but is not necessarily true at runtime), then:

> >>

> >>     (a) If edk2 has been configured for "traditional" SMM synchronization,

> >>         then the BSP sends directed SMIs to the APs with APIC delivery,

> >>         bringing them into SMM individually. Then the BSP runs the SMI

> >>         handler / dispatcher.

> >>

> >>     (b) If edk2 has been configured for "relaxed" SMM synchronization,

> >>         then the APs that are not already in SMM are not brought in, and

> >>         the BSP runs the SMI handler / dispatcher.

> >>

> >> (2) If Trigger() is executed by an AP (which is possible after

> >>     ExitBootServices(), and can be forced e.g. by "taskset -c 1

> >>     efibootmgr"), then the AP in question brings in the BSP with a

> >>     directed SMI, and the BSP runs the SMI handler / dispatcher.

> >>

> >> The smaller problem with (1a) and (2) is that the BSP and AP

> >> synchronization is slow. For example, the "taskset -c 1 efibootmgr"

> >> command from (2) can take more than 3 seconds to complete, because

> >> efibootmgr accesses non-volatile UEFI variables intensively.

> >>

> >> The larger problem is that QEMU's current behavior diverges from the

> >> behavior usually seen on physical hardware, and that keeps exposing

> >> obscure corner cases, race conditions and other instabilities in edk2,

> >> which generally expects / prefers a software SMI to affect all CPUs at

> >> once.

> >>

> >> Therefore introduce a special APM_STS value (0x51) that causes QEMU to

> >> inject the SMI on all VCPUs. OVMF's EFI_SMM_CONTROL2_PROTOCOL.Trigger()

> >> can utilize this to accommodate edk2's preference about "broadcast" SMI.

> >>

> >> SeaBIOS uses values 0x00 and 0x01 for APM_STS (called PORT_SMI_STATUS in

> >> the SeaBIOS code), so this change should be transparent to it.

> >>

> >> While the original posting of this patch

> >> <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html>

> >> only intended to speed up (2), based on our recent "stress testing" of SMM

> >> this patch actually provides functional improvements. (There are no code

> >> changes relative to the original posting.)

> >>

> >> Cc: "Michael S. Tsirkin" <mst@redhat.com>

> >> Cc: Paolo Bonzini <pbonzini@redhat.com>

> >> Also-suggested-by: Paolo Bonzini <pbonzini@redhat.com>

> >> Signed-off-by: Laszlo Ersek <lersek@redhat.com>

> > 

> > I'm queuing this for 2.8,

> 

> Thank you!

> 

> > but I have a question---should this feature be

> > detectable, and if so how?

> 

> That's the exact question I wanted to ask you. :)

> 

> ... How about an fw_cfg file? For example:

> 

> - name: etc/broadcast_smi

> - value type: uint8_t

> - role: if the file exists, the broadcast SMI capability exists. Read

> the uint8_t value from the fw_cfg file. Write the uint8_t value read to

> ICH9_APM_STS first, before triggering the SMI via ICH9_APM_CNT, to

> request a broadcast SMI. The values 0 and 1 are reserved for SeaBIOS, so

> if the fw_cfg file exists, those values will never be provided.

> 

> This would allow me to make the OVMF patches more robust:

> - first, I don't have to hardcode the value 'Q' in SmmControl2Dxe,

> - second, I can check for the presence of "etc/broadcast_smi" in

> PlatformPei, and set PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode

> dynamically.

> 

> I would quite like this approach, as simply reverting the

> PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode settings in OVMF will break

> SMM hard on QEMUs that do not support the broadcast SMI feature. The

> default would remain the current setting.

> 

> Also, if we add the fw_cfg item, we don't need to rush this into 2.8 I

> think (of course I wouldn't mind making 2.8 nevertheless).

> 

> Do you think we should make the broadcast SMI capability (and the

> descriptor fw_cfg file) machine-type dependent? I do think so (if for

> nothing else then for "rather safe than sorry"), but I always struggle

> with this kind of question, so any advice is most welcome...

> 

> Thank you!

> Laszlo


Hmm it's a bugfix so why not just backport the fix?
I don't see why do we want work-arounds in UEFI -
does not seem much easier.


> > 

> > Paolo

> > 

> >> ---

> >>  hw/isa/lpc_ich9.c | 12 +++++++++++-

> >>  1 file changed, 11 insertions(+), 1 deletion(-)

> >>

> >> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c

> >> index 10d1ee8b9310..f2fe644fdaa4 100644

> >> --- a/hw/isa/lpc_ich9.c

> >> +++ b/hw/isa/lpc_ich9.c

> >> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled)

> >>  

> >>  /* APM */

> >>  

> >> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'

> >> +

> >>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

> >>  {

> >>      ICH9LPCState *lpc = arg;

> >> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

> >>  

> >>      /* SMI_EN = PMBASE + 30. SMI control and enable register */

> >>      if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {

> >> -        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

> >> +        if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {

> >> +            CPUState *cs;

> >> +

> >> +            CPU_FOREACH(cs) {

> >> +                cpu_interrupt(cs, CPU_INTERRUPT_SMI);

> >> +            }

> >> +        } else {

> >> +            cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

> >> +        }

> >>      }

> >>  }

> >>  

> >>
Laszlo Ersek Nov. 15, 2016, 4:40 p.m. UTC | #4
On 11/15/16 16:45, Michael S. Tsirkin wrote:
> On Tue, Nov 15, 2016 at 04:39:04PM +0100, Laszlo Ersek wrote:

>> On 11/15/16 14:59, Paolo Bonzini wrote:

>>>

>>>

>>> On 15/11/2016 02:50, Laszlo Ersek wrote:

>>>> The generic edk2 SMM infrastructure prefers

>>>> EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor. If

>>>> Trigger() only brings the current processor into SMM, then edk2 handles it

>>>> in the following ways:

>>>>

>>>> (1) If Trigger() is executed by the BSP (which is guaranteed before

>>>>     ExitBootServices(), but is not necessarily true at runtime), then:

>>>>

>>>>     (a) If edk2 has been configured for "traditional" SMM synchronization,

>>>>         then the BSP sends directed SMIs to the APs with APIC delivery,

>>>>         bringing them into SMM individually. Then the BSP runs the SMI

>>>>         handler / dispatcher.

>>>>

>>>>     (b) If edk2 has been configured for "relaxed" SMM synchronization,

>>>>         then the APs that are not already in SMM are not brought in, and

>>>>         the BSP runs the SMI handler / dispatcher.

>>>>

>>>> (2) If Trigger() is executed by an AP (which is possible after

>>>>     ExitBootServices(), and can be forced e.g. by "taskset -c 1

>>>>     efibootmgr"), then the AP in question brings in the BSP with a

>>>>     directed SMI, and the BSP runs the SMI handler / dispatcher.

>>>>

>>>> The smaller problem with (1a) and (2) is that the BSP and AP

>>>> synchronization is slow. For example, the "taskset -c 1 efibootmgr"

>>>> command from (2) can take more than 3 seconds to complete, because

>>>> efibootmgr accesses non-volatile UEFI variables intensively.

>>>>

>>>> The larger problem is that QEMU's current behavior diverges from the

>>>> behavior usually seen on physical hardware, and that keeps exposing

>>>> obscure corner cases, race conditions and other instabilities in edk2,

>>>> which generally expects / prefers a software SMI to affect all CPUs at

>>>> once.

>>>>

>>>> Therefore introduce a special APM_STS value (0x51) that causes QEMU to

>>>> inject the SMI on all VCPUs. OVMF's EFI_SMM_CONTROL2_PROTOCOL.Trigger()

>>>> can utilize this to accommodate edk2's preference about "broadcast" SMI.

>>>>

>>>> SeaBIOS uses values 0x00 and 0x01 for APM_STS (called PORT_SMI_STATUS in

>>>> the SeaBIOS code), so this change should be transparent to it.

>>>>

>>>> While the original posting of this patch

>>>> <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html>

>>>> only intended to speed up (2), based on our recent "stress testing" of SMM

>>>> this patch actually provides functional improvements. (There are no code

>>>> changes relative to the original posting.)

>>>>

>>>> Cc: "Michael S. Tsirkin" <mst@redhat.com>

>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>

>>>> Also-suggested-by: Paolo Bonzini <pbonzini@redhat.com>

>>>> Signed-off-by: Laszlo Ersek <lersek@redhat.com>

>>>

>>> I'm queuing this for 2.8,

>>

>> Thank you!

>>

>>> but I have a question---should this feature be

>>> detectable, and if so how?

>>

>> That's the exact question I wanted to ask you. :)

>>

>> ... How about an fw_cfg file? For example:

>>

>> - name: etc/broadcast_smi

>> - value type: uint8_t

>> - role: if the file exists, the broadcast SMI capability exists. Read

>> the uint8_t value from the fw_cfg file. Write the uint8_t value read to

>> ICH9_APM_STS first, before triggering the SMI via ICH9_APM_CNT, to

>> request a broadcast SMI. The values 0 and 1 are reserved for SeaBIOS, so

>> if the fw_cfg file exists, those values will never be provided.

>>

>> This would allow me to make the OVMF patches more robust:

>> - first, I don't have to hardcode the value 'Q' in SmmControl2Dxe,

>> - second, I can check for the presence of "etc/broadcast_smi" in

>> PlatformPei, and set PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode

>> dynamically.

>>

>> I would quite like this approach, as simply reverting the

>> PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode settings in OVMF will break

>> SMM hard on QEMUs that do not support the broadcast SMI feature. The

>> default would remain the current setting.

>>

>> Also, if we add the fw_cfg item, we don't need to rush this into 2.8 I

>> think (of course I wouldn't mind making 2.8 nevertheless).

>>

>> Do you think we should make the broadcast SMI capability (and the

>> descriptor fw_cfg file) machine-type dependent? I do think so (if for

>> nothing else then for "rather safe than sorry"), but I always struggle

>> with this kind of question, so any advice is most welcome...

>>

>> Thank you!

>> Laszlo

> 

> Hmm it's a bugfix so why not just backport the fix?

> I don't see why do we want work-arounds in UEFI -

> does not seem much easier.


If the consensus is that the patch is a QEMU bugfix (as opposed to a
feature) and that it is eligible for the currently supported upstream
stable branches, that's the best, no doubt.

For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The
SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually
correct; when I was writing the OVMF docs, I must have misunderstood the
requirements and needlessly required 2.5+; 2.4+ should have been fine.)

Which means the fix should be backported as far as stable-2.4.

Should we proceed with that? CC'ing Mike Roth and the stable list.

Thanks!
Laszlo

> 

> 

>>>

>>> Paolo

>>>

>>>> ---

>>>>  hw/isa/lpc_ich9.c | 12 +++++++++++-

>>>>  1 file changed, 11 insertions(+), 1 deletion(-)

>>>>

>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c

>>>> index 10d1ee8b9310..f2fe644fdaa4 100644

>>>> --- a/hw/isa/lpc_ich9.c

>>>> +++ b/hw/isa/lpc_ich9.c

>>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled)

>>>>  

>>>>  /* APM */

>>>>  

>>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'

>>>> +

>>>>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

>>>>  {

>>>>      ICH9LPCState *lpc = arg;

>>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

>>>>  

>>>>      /* SMI_EN = PMBASE + 30. SMI control and enable register */

>>>>      if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {

>>>> -        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

>>>> +        if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {

>>>> +            CPUState *cs;

>>>> +

>>>> +            CPU_FOREACH(cs) {

>>>> +                cpu_interrupt(cs, CPU_INTERRUPT_SMI);

>>>> +            }

>>>> +        } else {

>>>> +            cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

>>>> +        }

>>>>      }

>>>>  }

>>>>  

>>>>
Paolo Bonzini Nov. 16, 2016, 12:47 p.m. UTC | #5
> If the consensus is that the patch is a QEMU bugfix (as opposed to a

> feature) and that it is eligible for the currently supported upstream

> stable branches, that's the best, no doubt.


The currently supported upstream stable branches is just 2.7. :)

I'm okay with bending the rules and including it in 2.8, but it's
worrisome that you also needed to go back from relaxed to traditional
delivery, meaning that old QEMU + new OVMF will take ages to boot.

If this is the case, I still think this needs some kind of discovery
mechanism, unless OVMF can just say "things were too broken, stop
supporting SMM on QEMUs older than 2.8".

For example:

- OVMF should keep on using 0x00 (no broadcast) if the relaxed AP
setting is used for the PCD; this would be backwards compatibility mode.

- we could have another magic 0xB2 value, which is implemented directly
in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
to detect the new feature.  It can fail to start if using traditional
AP and the new feature is not there.

By the way, in case OVMF needs to use SmmSwDispatch in the future, I
would make QEMU use broadcast behavior for all values in the 0x10-0xff
range, or something like that.

Paolo

> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The

> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually

> correct; when I was writing the OVMF docs, I must have misunderstood the

> requirements and needlessly required 2.5+; 2.4+ should have been fine.)

> 

> Which means the fix should be backported as far as stable-2.4.

> 

> Should we proceed with that? CC'ing Mike Roth and the stable list.

> 

> Thanks!

> Laszlo

> 

> > 

> > 

> >>>

> >>> Paolo

> >>>

> >>>> ---

> >>>>  hw/isa/lpc_ich9.c | 12 +++++++++++-

> >>>>  1 file changed, 11 insertions(+), 1 deletion(-)

> >>>>

> >>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c

> >>>> index 10d1ee8b9310..f2fe644fdaa4 100644

> >>>> --- a/hw/isa/lpc_ich9.c

> >>>> +++ b/hw/isa/lpc_ich9.c

> >>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool

> >>>> smm_enabled)

> >>>>  

> >>>>  /* APM */

> >>>>  

> >>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'

> >>>> +

> >>>>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

> >>>>  {

> >>>>      ICH9LPCState *lpc = arg;

> >>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val,

> >>>> void *arg)

> >>>>  

> >>>>      /* SMI_EN = PMBASE + 30. SMI control and enable register */

> >>>>      if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {

> >>>> -        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

> >>>> +        if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {

> >>>> +            CPUState *cs;

> >>>> +

> >>>> +            CPU_FOREACH(cs) {

> >>>> +                cpu_interrupt(cs, CPU_INTERRUPT_SMI);

> >>>> +            }

> >>>> +        } else {

> >>>> +            cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

> >>>> +        }

> >>>>      }

> >>>>  }

> >>>>  

> >>>>

> 

>
Michael S. Tsirkin Nov. 16, 2016, 1:18 p.m. UTC | #6
On Wed, Nov 16, 2016 at 07:47:42AM -0500, Paolo Bonzini wrote:
> 

> > If the consensus is that the patch is a QEMU bugfix (as opposed to a

> > feature) and that it is eligible for the currently supported upstream

> > stable branches, that's the best, no doubt.

> 

> The currently supported upstream stable branches is just 2.7. :)

> 

> I'm okay with bending the rules and including it in 2.8, but it's

> worrisome that you also needed to go back from relaxed to traditional

> delivery, meaning that old QEMU + new OVMF will take ages to boot.

> 

> If this is the case, I still think this needs some kind of discovery

> mechanism, unless OVMF can just say "things were too broken, stop

> supporting SMM on QEMUs older than 2.8".

> 

> For example:

> 

> - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP

> setting is used for the PCD; this would be backwards compatibility mode.

> 

> - we could have another magic 0xB2 value, which is implemented directly

> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

> to detect the new feature.  It can fail to start if using traditional

> AP and the new feature is not there.


If we keep collecting these magic values, should architect it
and do a host/guest bitmap like virtio does?

> By the way, in case OVMF needs to use SmmSwDispatch in the future, I

> would make QEMU use broadcast behavior for all values in the 0x10-0xff

> range, or something like that.

> 

> Paolo


It bothers me with all these ideas is that it's PV.
Unavoidable?

> > For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The

> > SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually

> > correct; when I was writing the OVMF docs, I must have misunderstood the

> > requirements and needlessly required 2.5+; 2.4+ should have been fine.)

> > 

> > Which means the fix should be backported as far as stable-2.4.

> > 

> > Should we proceed with that? CC'ing Mike Roth and the stable list.

> > 

> > Thanks!

> > Laszlo

> > 

> > > 

> > > 

> > >>>

> > >>> Paolo

> > >>>

> > >>>> ---

> > >>>>  hw/isa/lpc_ich9.c | 12 +++++++++++-

> > >>>>  1 file changed, 11 insertions(+), 1 deletion(-)

> > >>>>

> > >>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c

> > >>>> index 10d1ee8b9310..f2fe644fdaa4 100644

> > >>>> --- a/hw/isa/lpc_ich9.c

> > >>>> +++ b/hw/isa/lpc_ich9.c

> > >>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool

> > >>>> smm_enabled)

> > >>>>  

> > >>>>  /* APM */

> > >>>>  

> > >>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'

> > >>>> +

> > >>>>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

> > >>>>  {

> > >>>>      ICH9LPCState *lpc = arg;

> > >>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val,

> > >>>> void *arg)

> > >>>>  

> > >>>>      /* SMI_EN = PMBASE + 30. SMI control and enable register */

> > >>>>      if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {

> > >>>> -        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

> > >>>> +        if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {

> > >>>> +            CPUState *cs;

> > >>>> +

> > >>>> +            CPU_FOREACH(cs) {

> > >>>> +                cpu_interrupt(cs, CPU_INTERRUPT_SMI);

> > >>>> +            }

> > >>>> +        } else {

> > >>>> +            cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

> > >>>> +        }

> > >>>>      }

> > >>>>  }

> > >>>>  

> > >>>>

> > 

> >
Paolo Bonzini Nov. 16, 2016, 2:05 p.m. UTC | #7
On 16/11/2016 14:18, Michael S. Tsirkin wrote:
> > - we could have another magic 0xB2 value, which is implemented directly

> > in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

> > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

> > to detect the new feature.  It can fail to start if using traditional

> > AP and the new feature is not there.

> 

> If we keep collecting these magic values, should architect it

> and do a host/guest bitmap like virtio does?


The value written in 0xB3 can certainly be a feature bitmap.  For now we
would have for example

bit 0	if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI
bit 1-7	zero

Paolo
Laszlo Ersek Nov. 16, 2016, 5:37 p.m. UTC | #8
On 11/16/16 13:47, Paolo Bonzini wrote:
> 

>> If the consensus is that the patch is a QEMU bugfix (as opposed to a

>> feature) and that it is eligible for the currently supported upstream

>> stable branches, that's the best, no doubt.

> 

> The currently supported upstream stable branches is just 2.7. :)

> 

> I'm okay with bending the rules and including it in 2.8, but it's

> worrisome that you also needed to go back from relaxed to traditional

> delivery, meaning that old QEMU + new OVMF will take ages to boot.

> 

> If this is the case, I still think this needs some kind of discovery

> mechanism, unless OVMF can just say "things were too broken, stop

> supporting SMM on QEMUs older than 2.8".

> 

> For example:

> 

> - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP

> setting is used for the PCD; this would be backwards compatibility mode.


Okay, but this still means that the PCD has to become dynamic, and we
must set the PCD earlier (likely in PlatformPei) based on something.

I guess that's what the next paragraph is about:

> - we could have another magic 0xB2 value, which is implemented directly

> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

> to detect the new feature.  It can fail to start if using traditional

> AP and the new feature is not there.


Please explain in more detail. If I write to 0xB2 (by invoking the
Trigger() method or somehow else), then on old QEMU's that will raise a
sync / unicast SMI. The SMI handler in edk2 will run, but no request
parameters will have been set up by OVMF, so the SMI handler will do...
no clue what. I don't think this is a good idea.

My preference is fw_cfg ATM. It provides a prove, flexible and
extensible interface (it's easy to add new files for future features).
If we expect more knobs in the area, I can modify my proposal to use
"etc/smi/broadcast", so we can add "etc/smi/XXXX" later.

Do you have any specific arguments against fw_cfg? As I suggested in my
previous email, with fw_cfg I can implement the change in OVMF such that
the default behavior wouldn't change -- the default delivery would
remain relaxed, and the broadcast wouldn't be requested, unless the
fw_cfg file told OVMF otherwise.

> By the way, in case OVMF needs to use SmmSwDispatch in the future, I

> would make QEMU use broadcast behavior for all values in the 0x10-0xff

> range, or something like that.


Are we talking control/command (0xB2) or scratch/data (0xB3) register
values? My patches currently use the scratch/data register to provide
the hint to QEMU; that register is less likely to interfere with
anything the SMM core in edk2 does. I seem to recall that SmmSwDispatch
uses command/control values to distinguish the called functions. Should
we keep the broadcast / unicast decision separate from the
control/command value ?

Thanks
Laszlo

> 

> Paolo

> 

>> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The

>> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually

>> correct; when I was writing the OVMF docs, I must have misunderstood the

>> requirements and needlessly required 2.5+; 2.4+ should have been fine.)

>>

>> Which means the fix should be backported as far as stable-2.4.

>>

>> Should we proceed with that? CC'ing Mike Roth and the stable list.

>>

>> Thanks!

>> Laszlo

>>

>>>

>>>

>>>>>

>>>>> Paolo

>>>>>

>>>>>> ---

>>>>>>  hw/isa/lpc_ich9.c | 12 +++++++++++-

>>>>>>  1 file changed, 11 insertions(+), 1 deletion(-)

>>>>>>

>>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c

>>>>>> index 10d1ee8b9310..f2fe644fdaa4 100644

>>>>>> --- a/hw/isa/lpc_ich9.c

>>>>>> +++ b/hw/isa/lpc_ich9.c

>>>>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool

>>>>>> smm_enabled)

>>>>>>  

>>>>>>  /* APM */

>>>>>>  

>>>>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'

>>>>>> +

>>>>>>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

>>>>>>  {

>>>>>>      ICH9LPCState *lpc = arg;

>>>>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val,

>>>>>> void *arg)

>>>>>>  

>>>>>>      /* SMI_EN = PMBASE + 30. SMI control and enable register */

>>>>>>      if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {

>>>>>> -        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

>>>>>> +        if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {

>>>>>> +            CPUState *cs;

>>>>>> +

>>>>>> +            CPU_FOREACH(cs) {

>>>>>> +                cpu_interrupt(cs, CPU_INTERRUPT_SMI);

>>>>>> +            }

>>>>>> +        } else {

>>>>>> +            cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

>>>>>> +        }

>>>>>>      }

>>>>>>  }

>>>>>>  

>>>>>>

>>

>>
Laszlo Ersek Nov. 16, 2016, 5:56 p.m. UTC | #9
On 11/16/16 14:18, Michael S. Tsirkin wrote:
> On Wed, Nov 16, 2016 at 07:47:42AM -0500, Paolo Bonzini wrote:

>>

>>> If the consensus is that the patch is a QEMU bugfix (as opposed to a

>>> feature) and that it is eligible for the currently supported upstream

>>> stable branches, that's the best, no doubt.

>>

>> The currently supported upstream stable branches is just 2.7. :)

>>

>> I'm okay with bending the rules and including it in 2.8, but it's

>> worrisome that you also needed to go back from relaxed to traditional

>> delivery, meaning that old QEMU + new OVMF will take ages to boot.

>>

>> If this is the case, I still think this needs some kind of discovery

>> mechanism, unless OVMF can just say "things were too broken, stop

>> supporting SMM on QEMUs older than 2.8".

>>

>> For example:

>>

>> - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP

>> setting is used for the PCD; this would be backwards compatibility mode.

>>

>> - we could have another magic 0xB2 value, which is implemented directly

>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

>> to detect the new feature.  It can fail to start if using traditional

>> AP and the new feature is not there.

> 

> If we keep collecting these magic values, should architect it

> and do a host/guest bitmap like virtio does?


A feature bitmap is not a bad idea; I can modify my proposal to say,
'"etc/smi/features" is a little-endian uint64_t feature bitmap, where
bit #0 is the availability of broadcast SMIs. Request it by writing 'Q'
to STS before triggering an SMI via writing CNT'.

Another example where we use a feature bitmap is fw_cfg itself (the DMA
capability is signaled by bit 1).

However, feature *negotiation* is overkill, in my opinion.

> 

>> By the way, in case OVMF needs to use SmmSwDispatch in the future, I

>> would make QEMU use broadcast behavior for all values in the 0x10-0xff

>> range, or something like that.

>>

>> Paolo

> 

> It bothers me with all these ideas is that it's PV.

> Unavoidable?


It seems so, yes -- as I understand it, the software-initiated SMI on
bare metal Q35 is meant to be broadcast unconditionally, but we had
diverged from that in our Q35 implementation, historically. SeaBIOS came
to rely on the unicast nature of QEMU's SMI (AIUI) and now we have to
invent a way to select the non-historical broadcast.

(

BTW, I foresee further Frankensteinization of Q35, as the maximum amount
of SMRAM (TSEG) it provides, by spec, is 8MB, and that might not be
enough for a very large VCPU count.

(The SMM stack was originally tested against 255 VCPUs, yes, but the
VCPU max continues to grow, plus edk2 developers keep adding SMM
features that require more SMRAM -- sometimes more SMRAM even per CPU.)

We have one unused bit pattern left in the TSEG_SZ bit field of the
ESMRAMC register, namely binary 11, which stands for "reserved". We
might want to commandeer that down the line, and associate a really
large SMRAM / TSEG size with it -- 128MB or 256MB, for example. Or, we
could use it to signal some other way for TSEG size configuration.

The TSEG is carved out of the end of the <4GB RAM, so larger TSEGs than
8MB should fit, as long as the guest is started with enough memory.

Anyway, I digress...

)

Thanks
Laszlo

> 

>>> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The

>>> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually

>>> correct; when I was writing the OVMF docs, I must have misunderstood the

>>> requirements and needlessly required 2.5+; 2.4+ should have been fine.)

>>>

>>> Which means the fix should be backported as far as stable-2.4.

>>>

>>> Should we proceed with that? CC'ing Mike Roth and the stable list.

>>>

>>> Thanks!

>>> Laszlo

>>>

>>>>

>>>>

>>>>>>

>>>>>> Paolo

>>>>>>

>>>>>>> ---

>>>>>>>  hw/isa/lpc_ich9.c | 12 +++++++++++-

>>>>>>>  1 file changed, 11 insertions(+), 1 deletion(-)

>>>>>>>

>>>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c

>>>>>>> index 10d1ee8b9310..f2fe644fdaa4 100644

>>>>>>> --- a/hw/isa/lpc_ich9.c

>>>>>>> +++ b/hw/isa/lpc_ich9.c

>>>>>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool

>>>>>>> smm_enabled)

>>>>>>>  

>>>>>>>  /* APM */

>>>>>>>  

>>>>>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'

>>>>>>> +

>>>>>>>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

>>>>>>>  {

>>>>>>>      ICH9LPCState *lpc = arg;

>>>>>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val,

>>>>>>> void *arg)

>>>>>>>  

>>>>>>>      /* SMI_EN = PMBASE + 30. SMI control and enable register */

>>>>>>>      if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {

>>>>>>> -        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

>>>>>>> +        if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {

>>>>>>> +            CPUState *cs;

>>>>>>> +

>>>>>>> +            CPU_FOREACH(cs) {

>>>>>>> +                cpu_interrupt(cs, CPU_INTERRUPT_SMI);

>>>>>>> +            }

>>>>>>> +        } else {

>>>>>>> +            cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

>>>>>>> +        }

>>>>>>>      }

>>>>>>>  }

>>>>>>>  

>>>>>>>

>>>

>>>
Laszlo Ersek Nov. 16, 2016, 6:03 p.m. UTC | #10
On 11/16/16 15:05, Paolo Bonzini wrote:
> 

> 

> On 16/11/2016 14:18, Michael S. Tsirkin wrote:

>>> - we could have another magic 0xB2 value, which is implemented directly

>>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

>>> to detect the new feature.  It can fail to start if using traditional

>>> AP and the new feature is not there.

>>

>> If we keep collecting these magic values, should architect it

>> and do a host/guest bitmap like virtio does?

> 

> The value written in 0xB3 can certainly be a feature bitmap.  For now we

> would have for example

> 

> bit 0	if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI

> bit 1-7	zero


Doable, but:
- doesn't address how OVMF learns about the broadcast SMI availability,
- the command value OVMF currently writes is 0.

How about this:
- etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0
stands for broadcast SMI availability
- 0xB2 is the command value (independent of 0xB3)
- 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS
reserves bit#0 already (uses values 0 and 1), so we can use the
remaining 7 bits for requesting features. Bit#1 (value 2) could be the
broadcast SMI.

This does resemble a kind of feature negotiation, except the host cannot
signal back an error (unsupported combination of features), like
virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags.

Thanks
Laszlo
Paolo Bonzini Nov. 16, 2016, 6:04 p.m. UTC | #11
> I guess that's what the next paragraph is about:

> 

> > - we could have another magic 0xB2 value, which is implemented directly

> > in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

> > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

> > to detect the new feature.  It can fail to start if using traditional

> > AP and the new feature is not there.

> 

> Please explain in more detail. If I write to 0xB2 (by invoking the

> Trigger() method or somehow else), then on old QEMU's that will raise a

> sync / unicast SMI. The SMI handler in edk2 will run, but no request

> parameters will have been set up by OVMF, so the SMI handler will do...

> no clue what.


It should hopefully do nothing.  A spurious SMI (such as the one caused
by the write to 0xB2) should not crash OVMF.

SMBASE relocation uses IPIs, so my hope was to use the
SmmCpuFeaturesSmmRelocationComplete hook.

> My preference is fw_cfg ATM. It provides a prove, flexible and

> extensible interface (it's easy to add new files for future features).

> If we expect more knobs in the area, I can modify my proposal to use

> "etc/smi/broadcast", so we can add "etc/smi/XXXX" later.


Did you know there are 16 entries only for fw_cfg files? :)  And we're
using already 20 in the worst case:

    genroms/linuxboot.bin
    genroms/kvmvapic.bin
    NVDIMM_DSM_MEM_FILE
    "etc/smbios/smbios-tables"
    "etc/smbios/smbios-anchor"
    "etc/acpi/tables"
    "etc/table-loader"
    ACPI_BUILD_TPMLOG_FILE
    ACPI_BUILD_RSDP_FILE
    "etc/e820"
    "etc/msr_feature_control"
    "etc/reserved-memory-end"
    "etc/pvpanic-port"
    "etc/boot-menu-wait"
    "bootsplash.jpg"
    "etc/boot-fail-wait"
    "etc/igd-opregion"
    "etc/igd-bdsm-size"
    "etc/extra-pci-roots"
    "bootorder"

Therefore, so close to the release I'm a bit worried about doing
changes to fw_cfg or adding more fw_cfg files.  Though we just got
rid of one file for the number of CPUs, so I guess we might not care.

> Do you have any specific arguments against fw_cfg? As I suggested in my

> previous email, with fw_cfg I can implement the change in OVMF such that

> the default behavior wouldn't change -- the default delivery would

> remain relaxed, and the broadcast wouldn't be requested, unless the

> fw_cfg file told OVMF otherwise.

> 

> > By the way, in case OVMF needs to use SmmSwDispatch in the future, I

> > would make QEMU use broadcast behavior for all values in the 0x10-0xff

> > range, or something like that.

> 

> Are we talking control/command (0xB2) or scratch/data (0xB3) register

> values? My patches currently use the scratch/data register to provide

> the hint to QEMU; that register is less likely to interfere with

> anything the SMM core in edk2 does.


Sorry I confused the two registers.  0xb3 is more or less unused as far
as I can see indeed.

Paolo
Laszlo Ersek Nov. 16, 2016, 6:50 p.m. UTC | #12
On 11/16/16 19:04, Paolo Bonzini wrote:
>> I guess that's what the next paragraph is about:

>>

>>> - we could have another magic 0xB2 value, which is implemented directly

>>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

>>> to detect the new feature.  It can fail to start if using traditional

>>> AP and the new feature is not there.

>>

>> Please explain in more detail. If I write to 0xB2 (by invoking the

>> Trigger() method or somehow else), then on old QEMU's that will raise a

>> sync / unicast SMI. The SMI handler in edk2 will run, but no request

>> parameters will have been set up by OVMF, so the SMI handler will do...

>> no clue what.

> 

> It should hopefully do nothing.  A spurious SMI (such as the one caused

> by the write to 0xB2) should not crash OVMF.

> 

> SMBASE relocation uses IPIs, so my hope was to use the

> SmmCpuFeaturesSmmRelocationComplete hook.


From a cursory look, SmmCpuFeaturesSmmRelocationComplete() seems to be
called early enough from PiSmmCpuDxeSmm that we might be able to call
PcdSet() from it, for updating PcdCpuSmmApSyncTimeout and
PcdCpuSmmSyncMode. I perceive it a bit too close to the edge :)

>> My preference is fw_cfg ATM. It provides a prove, flexible and

>> extensible interface (it's easy to add new files for future features).

>> If we expect more knobs in the area, I can modify my proposal to use

>> "etc/smi/broadcast", so we can add "etc/smi/XXXX" later.

> 

> Did you know there are 16 entries only for fw_cfg files? :)


Yes, I've known that, but it can be changed by redefining
FW_CFG_FILE_SLOTS, can't it? The key type for fw_cfg is uint16_t, so we
should have some reserves.

> And we're

> using already 20 in the worst case:

> 

>     genroms/linuxboot.bin

>     genroms/kvmvapic.bin

>     NVDIMM_DSM_MEM_FILE

>     "etc/smbios/smbios-tables"

>     "etc/smbios/smbios-anchor"

>     "etc/acpi/tables"

>     "etc/table-loader"

>     ACPI_BUILD_TPMLOG_FILE

>     ACPI_BUILD_RSDP_FILE

>     "etc/e820"

>     "etc/msr_feature_control"

>     "etc/reserved-memory-end"

>     "etc/pvpanic-port"

>     "etc/boot-menu-wait"

>     "bootsplash.jpg"

>     "etc/boot-fail-wait"

>     "etc/igd-opregion"

>     "etc/igd-bdsm-size"

>     "etc/extra-pci-roots"

>     "bootorder"

> 

> Therefore, so close to the release I'm a bit worried about doing

> changes to fw_cfg or adding more fw_cfg files.  Though we just got

> rid of one file for the number of CPUs, so I guess we might not care.


I agree with your caution about this. I'm also perfectly fine if this
update misses 2.8. :)

> 

>> Do you have any specific arguments against fw_cfg? As I suggested in my

>> previous email, with fw_cfg I can implement the change in OVMF such that

>> the default behavior wouldn't change -- the default delivery would

>> remain relaxed, and the broadcast wouldn't be requested, unless the

>> fw_cfg file told OVMF otherwise.

>>

>>> By the way, in case OVMF needs to use SmmSwDispatch in the future, I

>>> would make QEMU use broadcast behavior for all values in the 0x10-0xff

>>> range, or something like that.

>>

>> Are we talking control/command (0xB2) or scratch/data (0xB3) register

>> values? My patches currently use the scratch/data register to provide

>> the hint to QEMU; that register is less likely to interfere with

>> anything the SMM core in edk2 does.

> 

> Sorry I confused the two registers.  0xb3 is more or less unused as far

> as I can see indeed.


Thanks
Laszlo
Michael S. Tsirkin Nov. 16, 2016, 8:27 p.m. UTC | #13
On Wed, Nov 16, 2016 at 07:03:27PM +0100, Laszlo Ersek wrote:
> On 11/16/16 15:05, Paolo Bonzini wrote:

> > 

> > 

> > On 16/11/2016 14:18, Michael S. Tsirkin wrote:

> >>> - we could have another magic 0xB2 value, which is implemented directly

> >>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

> >>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

> >>> to detect the new feature.  It can fail to start if using traditional

> >>> AP and the new feature is not there.

> >>

> >> If we keep collecting these magic values, should architect it

> >> and do a host/guest bitmap like virtio does?

> > 

> > The value written in 0xB3 can certainly be a feature bitmap.  For now we

> > would have for example

> > 

> > bit 0	if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI

> > bit 1-7	zero

> 

> Doable, but:

> - doesn't address how OVMF learns about the broadcast SMI availability,

> - the command value OVMF currently writes is 0.

> 

> How about this:

> - etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0

> stands for broadcast SMI availability

> - 0xB2 is the command value (independent of 0xB3)

> - 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS

> reserves bit#0 already (uses values 0 and 1), so we can use the

> remaining 7 bits for requesting features. Bit#1 (value 2) could be the

> broadcast SMI.

> 

> This does resemble a kind of feature negotiation, except the host cannot

> signal back an error (unsupported combination of features), like

> virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags.

> 

> Thanks

> Laszlo


I think that if you are going to do it, do it like 1.0:
- same bitmap for host and guest. how about a writeable fw cfg file?
- use 0XB3 bit for FEATURES_OK

-- 
MST
Michael S. Tsirkin Nov. 16, 2016, 8:32 p.m. UTC | #14
On Wed, Nov 16, 2016 at 06:37:30PM +0100, Laszlo Ersek wrote:
> On 11/16/16 13:47, Paolo Bonzini wrote:

> > 

> >> If the consensus is that the patch is a QEMU bugfix (as opposed to a

> >> feature) and that it is eligible for the currently supported upstream

> >> stable branches, that's the best, no doubt.

> > 

> > The currently supported upstream stable branches is just 2.7. :)

> > 

> > I'm okay with bending the rules and including it in 2.8, but it's

> > worrisome that you also needed to go back from relaxed to traditional

> > delivery, meaning that old QEMU + new OVMF will take ages to boot.

> > 

> > If this is the case, I still think this needs some kind of discovery

> > mechanism, unless OVMF can just say "things were too broken, stop

> > supporting SMM on QEMUs older than 2.8".

> > 

> > For example:

> > 

> > - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP

> > setting is used for the PCD; this would be backwards compatibility mode.

> 

> Okay, but this still means that the PCD has to become dynamic, and we

> must set the PCD earlier (likely in PlatformPei) based on something.

> 

> I guess that's what the next paragraph is about:

> 

> > - we could have another magic 0xB2 value, which is implemented directly

> > in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

> > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

> > to detect the new feature.  It can fail to start if using traditional

> > AP and the new feature is not there.

> 

> Please explain in more detail. If I write to 0xB2 (by invoking the

> Trigger() method or somehow else), then on old QEMU's that will raise a

> sync / unicast SMI. The SMI handler in edk2 will run, but no request

> parameters will have been set up by OVMF, so the SMI handler will do...

> no clue what. I don't think this is a good idea.

> 

> My preference is fw_cfg ATM. It provides a prove, flexible and

> extensible interface (it's easy to add new files for future features).

> If we expect more knobs in the area, I can modify my proposal to use

> "etc/smi/broadcast", so we can add "etc/smi/XXXX" later.

> 

> Do you have any specific arguments against fw_cfg? As I suggested in my

> previous email, with fw_cfg I can implement the change in OVMF such that

> the default behavior wouldn't change -- the default delivery would

> remain relaxed, and the broadcast wouldn't be requested, unless the

> fw_cfg file told OVMF otherwise.


Only thing is, I think it's a good idea in the future to be able
to build OVMF without legacy QEMU support. E.g. there are all
people that want to speed up boot.
Add some ifdefs in code for that?
And add comments to document which version needs these hacks.



> > By the way, in case OVMF needs to use SmmSwDispatch in the future, I

> > would make QEMU use broadcast behavior for all values in the 0x10-0xff

> > range, or something like that.

> 

> Are we talking control/command (0xB2) or scratch/data (0xB3) register

> values? My patches currently use the scratch/data register to provide

> the hint to QEMU; that register is less likely to interfere with

> anything the SMM core in edk2 does. I seem to recall that SmmSwDispatch

> uses command/control values to distinguish the called functions. Should

> we keep the broadcast / unicast decision separate from the

> control/command value ?

> 

> Thanks

> Laszlo

> 

> > 

> > Paolo

> > 

> >> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The

> >> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually

> >> correct; when I was writing the OVMF docs, I must have misunderstood the

> >> requirements and needlessly required 2.5+; 2.4+ should have been fine.)

> >>

> >> Which means the fix should be backported as far as stable-2.4.

> >>

> >> Should we proceed with that? CC'ing Mike Roth and the stable list.

> >>

> >> Thanks!

> >> Laszlo

> >>

> >>>

> >>>

> >>>>>

> >>>>> Paolo

> >>>>>

> >>>>>> ---

> >>>>>>  hw/isa/lpc_ich9.c | 12 +++++++++++-

> >>>>>>  1 file changed, 11 insertions(+), 1 deletion(-)

> >>>>>>

> >>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c

> >>>>>> index 10d1ee8b9310..f2fe644fdaa4 100644

> >>>>>> --- a/hw/isa/lpc_ich9.c

> >>>>>> +++ b/hw/isa/lpc_ich9.c

> >>>>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool

> >>>>>> smm_enabled)

> >>>>>>  

> >>>>>>  /* APM */

> >>>>>>  

> >>>>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'

> >>>>>> +

> >>>>>>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)

> >>>>>>  {

> >>>>>>      ICH9LPCState *lpc = arg;

> >>>>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val,

> >>>>>> void *arg)

> >>>>>>  

> >>>>>>      /* SMI_EN = PMBASE + 30. SMI control and enable register */

> >>>>>>      if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {

> >>>>>> -        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

> >>>>>> +        if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {

> >>>>>> +            CPUState *cs;

> >>>>>> +

> >>>>>> +            CPU_FOREACH(cs) {

> >>>>>> +                cpu_interrupt(cs, CPU_INTERRUPT_SMI);

> >>>>>> +            }

> >>>>>> +        } else {

> >>>>>> +            cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);

> >>>>>> +        }

> >>>>>>      }

> >>>>>>  }

> >>>>>>  

> >>>>>>

> >>

> >>
Michael S. Tsirkin Nov. 16, 2016, 8:38 p.m. UTC | #15
On Wed, Nov 16, 2016 at 01:04:00PM -0500, Paolo Bonzini wrote:
> > I guess that's what the next paragraph is about:

> > 

> > > - we could have another magic 0xB2 value, which is implemented directly

> > > in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

> > > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

> > > to detect the new feature.  It can fail to start if using traditional

> > > AP and the new feature is not there.

> > 

> > Please explain in more detail. If I write to 0xB2 (by invoking the

> > Trigger() method or somehow else), then on old QEMU's that will raise a

> > sync / unicast SMI. The SMI handler in edk2 will run, but no request

> > parameters will have been set up by OVMF, so the SMI handler will do...

> > no clue what.

> 

> It should hopefully do nothing.  A spurious SMI (such as the one caused

> by the write to 0xB2) should not crash OVMF.

> 

> SMBASE relocation uses IPIs, so my hope was to use the

> SmmCpuFeaturesSmmRelocationComplete hook.

> 

> > My preference is fw_cfg ATM. It provides a prove, flexible and

> > extensible interface (it's easy to add new files for future features).

> > If we expect more knobs in the area, I can modify my proposal to use

> > "etc/smi/broadcast", so we can add "etc/smi/XXXX" later.

> 

> Did you know there are 16 entries only for fw_cfg files? :)  And we're

> using already 20 in the worst case:

> 

>     genroms/linuxboot.bin

>     genroms/kvmvapic.bin

>     NVDIMM_DSM_MEM_FILE

>     "etc/smbios/smbios-tables"

>     "etc/smbios/smbios-anchor"

>     "etc/acpi/tables"

>     "etc/table-loader"

>     ACPI_BUILD_TPMLOG_FILE

>     ACPI_BUILD_RSDP_FILE

>     "etc/e820"

>     "etc/msr_feature_control"

>     "etc/reserved-memory-end"

>     "etc/pvpanic-port"

>     "etc/boot-menu-wait"

>     "bootsplash.jpg"

>     "etc/boot-fail-wait"

>     "etc/igd-opregion"

>     "etc/igd-bdsm-size"

>     "etc/extra-pci-roots"

>     "bootorder"

> 

> Therefore, so close to the release I'm a bit worried about doing

> changes to fw_cfg or adding more fw_cfg files.


Indeed. Is an unconditional thing so bad?
What would be the observed behaviour with new OVMF on old QEMU?
Note you need to migrate during boot to notice this.

> Though we just got

> rid of one file for the number of CPUs, so I guess we might not care.

> 

> > Do you have any specific arguments against fw_cfg? As I suggested in my

> > previous email, with fw_cfg I can implement the change in OVMF such that

> > the default behavior wouldn't change -- the default delivery would

> > remain relaxed, and the broadcast wouldn't be requested, unless the

> > fw_cfg file told OVMF otherwise.

> > 

> > > By the way, in case OVMF needs to use SmmSwDispatch in the future, I

> > > would make QEMU use broadcast behavior for all values in the 0x10-0xff

> > > range, or something like that.

> > 

> > Are we talking control/command (0xB2) or scratch/data (0xB3) register

> > values? My patches currently use the scratch/data register to provide

> > the hint to QEMU; that register is less likely to interfere with

> > anything the SMM core in edk2 does.

> 

> Sorry I confused the two registers.  0xb3 is more or less unused as far

> as I can see indeed.

> 

> Paolo
Laszlo Ersek Nov. 17, 2016, 9:26 a.m. UTC | #16
On 11/16/16 21:38, Michael S. Tsirkin wrote:
> On Wed, Nov 16, 2016 at 01:04:00PM -0500, Paolo Bonzini wrote:

>>> I guess that's what the next paragraph is about:

>>>

>>>> - we could have another magic 0xB2 value, which is implemented directly

>>>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

>>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

>>>> to detect the new feature.  It can fail to start if using traditional

>>>> AP and the new feature is not there.

>>>

>>> Please explain in more detail. If I write to 0xB2 (by invoking the

>>> Trigger() method or somehow else), then on old QEMU's that will raise a

>>> sync / unicast SMI. The SMI handler in edk2 will run, but no request

>>> parameters will have been set up by OVMF, so the SMI handler will do...

>>> no clue what.

>>

>> It should hopefully do nothing.  A spurious SMI (such as the one caused

>> by the write to 0xB2) should not crash OVMF.

>>

>> SMBASE relocation uses IPIs, so my hope was to use the

>> SmmCpuFeaturesSmmRelocationComplete hook.

>>

>>> My preference is fw_cfg ATM. It provides a prove, flexible and

>>> extensible interface (it's easy to add new files for future features).

>>> If we expect more knobs in the area, I can modify my proposal to use

>>> "etc/smi/broadcast", so we can add "etc/smi/XXXX" later.

>>

>> Did you know there are 16 entries only for fw_cfg files? :)  And we're

>> using already 20 in the worst case:

>>

>>     genroms/linuxboot.bin

>>     genroms/kvmvapic.bin

>>     NVDIMM_DSM_MEM_FILE

>>     "etc/smbios/smbios-tables"

>>     "etc/smbios/smbios-anchor"

>>     "etc/acpi/tables"

>>     "etc/table-loader"

>>     ACPI_BUILD_TPMLOG_FILE

>>     ACPI_BUILD_RSDP_FILE

>>     "etc/e820"

>>     "etc/msr_feature_control"

>>     "etc/reserved-memory-end"

>>     "etc/pvpanic-port"

>>     "etc/boot-menu-wait"

>>     "bootsplash.jpg"

>>     "etc/boot-fail-wait"

>>     "etc/igd-opregion"

>>     "etc/igd-bdsm-size"

>>     "etc/extra-pci-roots"

>>     "bootorder"

>>

>> Therefore, so close to the release I'm a bit worried about doing

>> changes to fw_cfg or adding more fw_cfg files.

> 

> Indeed. Is an unconditional thing so bad?

> What would be the observed behaviour with new OVMF on old QEMU?


The SMM stack would expect broadcast SMIs but only unicast SMIs would
occur. This would
- introduce very long delays in the handling on each SMI as the
synchronization algorithms time out and then force individual APs into
SMM by LAPIC writes,
- expose obscure synchronization bugs in the SMM stack, especially
during S3 resume.

The directed / unicast SMI model is less tested in edk2 and a number of
super-obscure corner cases remain.

Thanks
Laszlo

> Note you need to migrate during boot to notice this.

> 

>> Though we just got

>> rid of one file for the number of CPUs, so I guess we might not care.

>>

>>> Do you have any specific arguments against fw_cfg? As I suggested in my

>>> previous email, with fw_cfg I can implement the change in OVMF such that

>>> the default behavior wouldn't change -- the default delivery would

>>> remain relaxed, and the broadcast wouldn't be requested, unless the

>>> fw_cfg file told OVMF otherwise.

>>>

>>>> By the way, in case OVMF needs to use SmmSwDispatch in the future, I

>>>> would make QEMU use broadcast behavior for all values in the 0x10-0xff

>>>> range, or something like that.

>>>

>>> Are we talking control/command (0xB2) or scratch/data (0xB3) register

>>> values? My patches currently use the scratch/data register to provide

>>> the hint to QEMU; that register is less likely to interfere with

>>> anything the SMM core in edk2 does.

>>

>> Sorry I confused the two registers.  0xb3 is more or less unused as far

>> as I can see indeed.

>>

>> Paolo
Laszlo Ersek Nov. 17, 2016, 1:16 p.m. UTC | #17
On 11/16/16 21:27, Michael S. Tsirkin wrote:
> On Wed, Nov 16, 2016 at 07:03:27PM +0100, Laszlo Ersek wrote:

>> On 11/16/16 15:05, Paolo Bonzini wrote:

>>>

>>>

>>> On 16/11/2016 14:18, Michael S. Tsirkin wrote:

>>>>> - we could have another magic 0xB2 value, which is implemented directly

>>>>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

>>>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

>>>>> to detect the new feature.  It can fail to start if using traditional

>>>>> AP and the new feature is not there.

>>>>

>>>> If we keep collecting these magic values, should architect it

>>>> and do a host/guest bitmap like virtio does?

>>>

>>> The value written in 0xB3 can certainly be a feature bitmap.  For now we

>>> would have for example

>>>

>>> bit 0	if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI

>>> bit 1-7	zero

>>

>> Doable, but:

>> - doesn't address how OVMF learns about the broadcast SMI availability,

>> - the command value OVMF currently writes is 0.

>>

>> How about this:

>> - etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0

>> stands for broadcast SMI availability

>> - 0xB2 is the command value (independent of 0xB3)

>> - 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS

>> reserves bit#0 already (uses values 0 and 1), so we can use the

>> remaining 7 bits for requesting features. Bit#1 (value 2) could be the

>> broadcast SMI.

>>

>> This does resemble a kind of feature negotiation, except the host cannot

>> signal back an error (unsupported combination of features), like

>> virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags.

>>

>> Thanks

>> Laszlo

> 

> I think that if you are going to do it, do it like 1.0:

> - same bitmap for host and guest. how about a writeable fw cfg file?


fw_cfg files are not writeable since qemu 2.4 (see commits 023e3148567ac
and 6cec43e178cde).

How about this alternative, in STS:
- bit 0: read and written transparently
- bit 1: on write:
         0 -- set features in bits 2-7
         1 -- query host features into bits 2-7
         on read:
         - after querying features:
           - reads back as 0 if the interface is supported
           - reads back as 1 if the interface is missing
         - after setting features:
           - reads back as 0 if the feature subset is valid
           - reads back as 1 otherwise
- bit 2: on write:
         - when setting features: request broadcast SMI
         - when querying features: ignored
         on read:
         - after setting features: zero
         - after querying features: broadcast SMI availability (1 if
           available)

- bit 3-7: future features (I think 5 more features for SMI handling
           should suffice), working similarly to bit 2

SeaBIOS writes values 0x00 and 0x01, and expects to find the same when
reading back. Bit pattern 0000_000?b  translates to "clear all
features", which always succeeds and results in behavior identical to
the current one, hence bits 1-7 read back as zero.

OVMF:
- write 0x02, read back value:
  - if bit 1 is set, interface is missing
  - otherwise feature bitmap was returned in bits 2-7
- select requested features in bits 2-7, set bit 1 to 0, write value,
  read back value
  - if bit 1 is set, the feature subset is invalid
  - okay otherwise

Thanks
Laszlo

> - use 0XB3 bit for FEATURES_OK

>
Michael S. Tsirkin Nov. 17, 2016, 5:46 p.m. UTC | #18
On Thu, Nov 17, 2016 at 02:16:35PM +0100, Laszlo Ersek wrote:
> On 11/16/16 21:27, Michael S. Tsirkin wrote:

> > On Wed, Nov 16, 2016 at 07:03:27PM +0100, Laszlo Ersek wrote:

> >> On 11/16/16 15:05, Paolo Bonzini wrote:

> >>>

> >>>

> >>> On 16/11/2016 14:18, Michael S. Tsirkin wrote:

> >>>>> - we could have another magic 0xB2 value, which is implemented directly

> >>>>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

> >>>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

> >>>>> to detect the new feature.  It can fail to start if using traditional

> >>>>> AP and the new feature is not there.

> >>>>

> >>>> If we keep collecting these magic values, should architect it

> >>>> and do a host/guest bitmap like virtio does?

> >>>

> >>> The value written in 0xB3 can certainly be a feature bitmap.  For now we

> >>> would have for example

> >>>

> >>> bit 0	if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI

> >>> bit 1-7	zero

> >>

> >> Doable, but:

> >> - doesn't address how OVMF learns about the broadcast SMI availability,

> >> - the command value OVMF currently writes is 0.

> >>

> >> How about this:

> >> - etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0

> >> stands for broadcast SMI availability

> >> - 0xB2 is the command value (independent of 0xB3)

> >> - 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS

> >> reserves bit#0 already (uses values 0 and 1), so we can use the

> >> remaining 7 bits for requesting features. Bit#1 (value 2) could be the

> >> broadcast SMI.

> >>

> >> This does resemble a kind of feature negotiation, except the host cannot

> >> signal back an error (unsupported combination of features), like

> >> virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags.

> >>

> >> Thanks

> >> Laszlo

> > 

> > I think that if you are going to do it, do it like 1.0:

> > - same bitmap for host and guest. how about a writeable fw cfg file?

> 

> fw_cfg files are not writeable since qemu 2.4 (see commits 023e3148567ac

> and 6cec43e178cde).

> 

> How about this alternative, in STS:

> - bit 0: read and written transparently

> - bit 1: on write:

>          0 -- set features in bits 2-7

>          1 -- query host features into bits 2-7

>          on read:

>          - after querying features:

>            - reads back as 0 if the interface is supported

>            - reads back as 1 if the interface is missing

>          - after setting features:

>            - reads back as 0 if the feature subset is valid

>            - reads back as 1 otherwise

> - bit 2: on write:

>          - when setting features: request broadcast SMI

>          - when querying features: ignored

>          on read:

>          - after setting features: zero

>          - after querying features: broadcast SMI availability (1 if

>            available)

> 

> - bit 3-7: future features (I think 5 more features for SMI handling

>            should suffice), working similarly to bit 2

> 

> SeaBIOS writes values 0x00 and 0x01, and expects to find the same when

> reading back. Bit pattern 0000_000?b  translates to "clear all

> features", which always succeeds and results in behavior identical to

> the current one, hence bits 1-7 read back as zero.

> 

> OVMF:

> - write 0x02, read back value:

>   - if bit 1 is set, interface is missing

>   - otherwise feature bitmap was returned in bits 2-7

> - select requested features in bits 2-7, set bit 1 to 0, write value,

>   read back value

>   - if bit 1 is set, the feature subset is invalid

>   - okay otherwise

> 

> Thanks

> Laszlo



It's all fine, or we can make fw cfg writeable again (I posted
a patch for that a while ago), but it's all a bit too much
for this release.

Let's just defer it, or do you have a better idea?

> > - use 0XB3 bit for FEATURES_OK

> >
Laszlo Ersek Nov. 17, 2016, 6:45 p.m. UTC | #19
On 11/17/16 18:46, Michael S. Tsirkin wrote:
> On Thu, Nov 17, 2016 at 02:16:35PM +0100, Laszlo Ersek wrote:

>> On 11/16/16 21:27, Michael S. Tsirkin wrote:

>>> On Wed, Nov 16, 2016 at 07:03:27PM +0100, Laszlo Ersek wrote:

>>>> On 11/16/16 15:05, Paolo Bonzini wrote:

>>>>>

>>>>>

>>>>> On 16/11/2016 14:18, Michael S. Tsirkin wrote:

>>>>>>> - we could have another magic 0xB2 value, which is implemented directly

>>>>>>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it

>>>>>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)

>>>>>>> to detect the new feature.  It can fail to start if using traditional

>>>>>>> AP and the new feature is not there.

>>>>>>

>>>>>> If we keep collecting these magic values, should architect it

>>>>>> and do a host/guest bitmap like virtio does?

>>>>>

>>>>> The value written in 0xB3 can certainly be a feature bitmap.  For now we

>>>>> would have for example

>>>>>

>>>>> bit 0	if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI

>>>>> bit 1-7	zero

>>>>

>>>> Doable, but:

>>>> - doesn't address how OVMF learns about the broadcast SMI availability,

>>>> - the command value OVMF currently writes is 0.

>>>>

>>>> How about this:

>>>> - etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0

>>>> stands for broadcast SMI availability

>>>> - 0xB2 is the command value (independent of 0xB3)

>>>> - 0XB3 is a guest feature bitmap (valid for the next request). SeaBIOS

>>>> reserves bit#0 already (uses values 0 and 1), so we can use the

>>>> remaining 7 bits for requesting features. Bit#1 (value 2) could be the

>>>> broadcast SMI.

>>>>

>>>> This does resemble a kind of feature negotiation, except the host cannot

>>>> signal back an error (unsupported combination of features), like

>>>> virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags.

>>>>

>>>> Thanks

>>>> Laszlo

>>>

>>> I think that if you are going to do it, do it like 1.0:

>>> - same bitmap for host and guest. how about a writeable fw cfg file?

>>

>> fw_cfg files are not writeable since qemu 2.4 (see commits 023e3148567ac

>> and 6cec43e178cde).

>>

>> How about this alternative, in STS:

>> - bit 0: read and written transparently

>> - bit 1: on write:

>>          0 -- set features in bits 2-7

>>          1 -- query host features into bits 2-7

>>          on read:

>>          - after querying features:

>>            - reads back as 0 if the interface is supported

>>            - reads back as 1 if the interface is missing

>>          - after setting features:

>>            - reads back as 0 if the feature subset is valid

>>            - reads back as 1 otherwise

>> - bit 2: on write:

>>          - when setting features: request broadcast SMI

>>          - when querying features: ignored

>>          on read:

>>          - after setting features: zero

>>          - after querying features: broadcast SMI availability (1 if

>>            available)

>>

>> - bit 3-7: future features (I think 5 more features for SMI handling

>>            should suffice), working similarly to bit 2

>>

>> SeaBIOS writes values 0x00 and 0x01, and expects to find the same when

>> reading back. Bit pattern 0000_000?b  translates to "clear all

>> features", which always succeeds and results in behavior identical to

>> the current one, hence bits 1-7 read back as zero.

>>

>> OVMF:

>> - write 0x02, read back value:

>>   - if bit 1 is set, interface is missing

>>   - otherwise feature bitmap was returned in bits 2-7

>> - select requested features in bits 2-7, set bit 1 to 0, write value,

>>   read back value

>>   - if bit 1 is set, the feature subset is invalid

>>   - okay otherwise

>>

>> Thanks

>> Laszlo

> 

> 

> It's all fine, or we can make fw cfg writeable again (I posted

> a patch for that a while ago), but it's all a bit too much

> for this release.

> 

> Let's just defer it, or do you have a better idea?


I'm writing patches for the above proposal (including a document under
docs/specs/), and I plan to post them soon. They're definitely 2.9
material though -- I don't mind if I have to wait a bit even just to get
feedback on those patches :)

So, to make it formal for the patch that started this thread:

Self-NAK

Will post something better / more flexible soon, for 2.9.

(Hopefully I'll remember to put "[for-2.9]" in the subject.)

Thanks!
Laszlo

>>> - use 0XB3 bit for FEATURES_OK

>>>
diff mbox

Patch

diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
index 10d1ee8b9310..f2fe644fdaa4 100644
--- a/hw/isa/lpc_ich9.c
+++ b/hw/isa/lpc_ich9.c
@@ -372,6 +372,8 @@  void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled)
 
 /* APM */
 
+#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'
+
 static void ich9_apm_ctrl_changed(uint32_t val, void *arg)
 {
     ICH9LPCState *lpc = arg;
@@ -386,7 +388,15 @@  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)
 
     /* SMI_EN = PMBASE + 30. SMI control and enable register */
     if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {
-        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
+        if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {
+            CPUState *cs;
+
+            CPU_FOREACH(cs) {
+                cpu_interrupt(cs, CPU_INTERRUPT_SMI);
+            }
+        } else {
+            cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
+        }
     }
 }