Message ID | 20200914134321.958079-3-pizhenwei@bytedance.com |
---|---|
State | New |
Series | add MEMORY_FAILURE event |
On Mon, 14 Sep 2020 at 14:53, zhenwei pi <pizhenwei@bytedance.com> wrote:
>
> Introduce 4 memory failure events for a guest. Then uplayer could
> know when/why/what happened to a guest during hitting a hardware
> memory failure.
>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
> +##
> +# @MemoryFailureAction:
> +#
> +# Host memory failure occurs, handled by QEMU.
> +#
> +# @hypervisor-ignore: action optional memory failure at QEMU process
> +#                     addressspace (none PC-RAM), QEMU could ignore this
> +#                     hardware memory failure.
> +#
> +# @hypervisor-stop: action required memory failure at QEMU process address
> +#                   space (none PC-RAM), QEMU has to stop itself.

I'm not entirely clear what the descriptions here are trying to say.
These would be for memory failure events which are reported by the
host and which are not in guest RAM but are in the memory QEMU itself
is using ? ("PC-RAM" is a bit x86-specific.)

> +#
> +# @guest-mce: action required memory failure at PC-RAM, and guest enables MCE
> +#             handling, QEMU injects MCE to guest.
> +#
> +# @guest-triple-fault: action required memory failure at PC-RAM, but guest does
> +#                      not enable MCE handling. QEMU raises triple fault and
> +#                      shutdown/reset. Also see detailed info in QEMU log.

"triple fault" sounds rather x86-specific; other architectures
also have support for host memory failure notifications, so we
should design the QAPI events to have architecture-neutral
definitions and descriptions.

I think the four cases you're trying to distinguish here are:
 (1) action-optional memory failure in memory used by the hypervisor
     (which QEMU has ignored other than to report this event)
 (2) action-required memory failure in memory used by the hypervisor
     (QEMU is stopping)
 (3) action-required memory failure in guest memory, which QEMU
     has reported to the guest
 (4) action-required memory failure in guest memory, but the
     guest OS does not support a mechanism for reporting it

Is that right?

Anyway, I think we should try to find names for the failure
types that are not x86-specific.

thanks
-- PMM

On 9/21/20 8:48 PM, Peter Maydell wrote:
> On Mon, 14 Sep 2020 at 14:53, zhenwei pi <pizhenwei@bytedance.com> wrote:
>>
>> Introduce 4 memory failure events for a guest. Then uplayer could
>> know when/why/what happened to a guest during hitting a hardware
>> memory failure.
>>
>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
>> ---
>> +##
>> +# @MemoryFailureAction:
>> +#
>> +# Host memory failure occurs, handled by QEMU.
>> +#
>> +# @hypervisor-ignore: action optional memory failure at QEMU process
>> +#                     addressspace (none PC-RAM), QEMU could ignore this
>> +#                     hardware memory failure.
>> +#
>> +# @hypervisor-stop: action required memory failure at QEMU process address
>> +#                   space (none PC-RAM), QEMU has to stop itself.
>
> I'm not entirely clear what the descriptions here are trying to say.
> These would be for memory failure events which are reported by the
> host and which are not in guest RAM but are in the memory QEMU itself
> is using ? ("PC-RAM" is a bit x86-specific.)
>
>> +#
>> +# @guest-mce: action required memory failure at PC-RAM, and guest enables MCE
>> +#             handling, QEMU injects MCE to guest.
>> +#
>> +# @guest-triple-fault: action required memory failure at PC-RAM, but guest does
>> +#                      not enable MCE handling. QEMU raises triple fault and
>> +#                      shutdown/reset. Also see detailed info in QEMU log.
>
> "triple fault" sounds rather x86-specific; other architectures
> also have support for host memory failure notifications, so we
> should design the QAPI events to have architecture-neutral
> definitions and descriptions.
>
> I think the four cases you're trying to distinguish here are:
>  (1) action-optional memory failure in memory used by the hypervisor
>      (which QEMU has ignored other than to report this event)
>  (2) action-required memory failure in memory used by the hypervisor
>      (QEMU is stopping)
>  (3) action-required memory failure in guest memory, which QEMU
>      has reported to the guest
>  (4) action-required memory failure in guest memory, but the
>      guest OS does not support a mechanism for reporting it
>
> Is that right?
>
> Anyway, I think we should try to find names for the failure
> types that are not x86-specific.
>
> thanks
> -- PMM
>

Right, to make architecture-neutral, how about these changes:
'PC-RAM' -> 'guest-memory'
'guest-mce' -> 'guest-mce-inject'
'guest-triple-fault' -> 'guest-mce-fault'

--
zhenwei pi

On 21/09/20 15:10, zhenwei pi wrote:
>>
> Right, to make architecture-neutral, how about these changes:
> 'PC-RAM' -> 'guest-memory'
> 'guest-mce' -> 'guest-mce-inject'
> 'guest-triple-fault' -> 'guest-mce-fault'

Perhaps we should have three fields

1) recipient: 'hypervisor' or 'guest'
2) action: 'ignore', 'inject', 'fatal'
3) kind: 'action-optional' or 'action-required'

And possibly:

4) recursive: true or false

On x86 "recursive" would be set if MCIP=1.

Paolo
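
A minimal sketch of how the three-field proposal might relate to the four enum values in the patch, written in Python purely for illustration. The mapping is an assumption inferred from the descriptions in this thread and is not confirmed by anyone in it; `MemoryFailureFlags`, `DECOMPOSED`, and `decompose` are hypothetical names. The optional `recursive` field is left out, since the thread only says when it would be set on x86.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryFailureFlags:
    """Hypothetical container for the three proposed fields."""
    recipient: str  # 'hypervisor' or 'guest'
    action: str     # 'ignore', 'inject', or 'fatal'
    kind: str       # 'action-optional' or 'action-required'


# Assumed mapping from the patch's enum values to the proposed fields.
# It follows the four cases listed in the review; it is not taken from QEMU.
DECOMPOSED = {
    "hypervisor-ignore": MemoryFailureFlags("hypervisor", "ignore", "action-optional"),
    "hypervisor-stop": MemoryFailureFlags("hypervisor", "fatal", "action-required"),
    "guest-mce": MemoryFailureFlags("guest", "inject", "action-required"),
    "guest-triple-fault": MemoryFailureFlags("guest", "fatal", "action-required"),
}


def decompose(old_action: str) -> MemoryFailureFlags:
    """Look up the three-field form of one of the patch's enum values."""
    return DECOMPOSED[old_action]
```

Under this reading the x86-specific names disappear: both `hypervisor-stop` and `guest-triple-fault` become a `fatal` action, distinguished only by the recipient.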

diff --git a/qapi/run-state.json b/qapi/run-state.json
index 7cc9f96a5b..fdc39ce262 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -475,3 +475,49 @@
             'psw-mask': 'uint64',
             'psw-addr': 'uint64',
             'reason': 'S390CrashReason' } }
+
+##
+# @MEMORY_FAILURE:
+#
+# Emitted when a memory failure occurs on host side.
+#
+# @action: action that has been taken. action is defined as @MemoryFailureAction.
+#
+# Since: 5.2
+#
+# Example:
+#
+# <- { "event": "MEMORY_FAILURE",
+#      "data": { "action": "guest-mce" } }
+#
+##
+{ 'event': 'MEMORY_FAILURE',
+  'data': { 'action': 'MemoryFailureAction'} }
+
+##
+# @MemoryFailureAction:
+#
+# Host memory failure occurs, handled by QEMU.
+#
+# @hypervisor-ignore: action optional memory failure at QEMU process
+#                     addressspace (none PC-RAM), QEMU could ignore this
+#                     hardware memory failure.
+#
+# @hypervisor-stop: action required memory failure at QEMU process address
+#                   space (none PC-RAM), QEMU has to stop itself.
+#
+# @guest-mce: action required memory failure at PC-RAM, and guest enables MCE
+#             handling, QEMU injects MCE to guest.
+#
+# @guest-triple-fault: action required memory failure at PC-RAM, but guest does
+#                      not enable MCE handling. QEMU raises triple fault and
+#                      shutdown/reset. Also see detailed info in QEMU log.
+#
+# Since: 5.2
+#
+##
+{ 'enum': 'MemoryFailureAction',
+  'data': [ 'hypervisor-ignore',
+            'hypervisor-stop',
+            'guest-mce',
+            'guest-triple-fault' ] }

Introduce 4 memory failure events for a guest. Then uplayer could
know when/why/what happened to a guest during hitting a hardware
memory failure.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 qapi/run-state.json | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)
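
A minimal sketch of how an upper layer might consume the event, assuming it receives QMP messages as one JSON object per line and that the event carries the `action` field exactly as in the patch's example. `ACTION_SUMMARIES` and `describe_memory_failure` are hypothetical names; the summaries paraphrase the four cases listed in the review and are not QEMU output.

```python
import json

# Summaries paraphrased from the four cases in the review (assumed wording).
ACTION_SUMMARIES = {
    "hypervisor-ignore": "action-optional failure in hypervisor memory; ignored",
    "hypervisor-stop": "action-required failure in hypervisor memory; QEMU stops",
    "guest-mce": "action-required failure in guest memory; reported to the guest",
    "guest-triple-fault": "action-required failure in guest memory; guest cannot be notified",
}


def describe_memory_failure(qmp_line: str):
    """Return a summary for a MEMORY_FAILURE event, or None for other messages."""
    message = json.loads(qmp_line)
    if message.get("event") != "MEMORY_FAILURE":
        return None
    action = message.get("data", {}).get("action")
    # Unknown values are reported, not rejected, so that a renamed or
    # extended enum does not break the consumer.
    return ACTION_SUMMARIES.get(action, f"unknown action: {action}")


if __name__ == "__main__":
    example = '{"event": "MEMORY_FAILURE", "data": {"action": "guest-mce"}}'
    print(describe_memory_failure(example))
```

Tolerating unknown action strings matters here because the enum names are still under discussion in this thread.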