[RFC] KVM: arm64: don't single-step for non-emulated faults

Message ID: 20181107171031.22573-1-alex.bennee@linaro.org
State: New
From: Alex Bennée <alex.bennee@linaro.org>
To: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, christoffer.dall@linaro.org, marc.zyngier@arm.com
Cc: Alex Bennée <alex.bennee@linaro.org>, Christoffer Dall <christoffer.dall@arm.com>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will.deacon@arm.com>, linux-kernel@vger.kernel.org (open list)
Subject: [RFC PATCH] KVM: arm64: don't single-step for non-emulated faults
Date: Wed, 7 Nov 2018 17:10:31 +0000

Alex Bennée Nov. 7, 2018, 5:10 p.m. UTC

Not all faults handled by handle_exit are instruction emulations. For
example, an ESR_ELx_EC_IABT will result in the page tables being updated
but the instruction that triggered the fault hasn't actually executed
yet. We use the simple heuristic of checking for a changed PC before
seeing if kvm_arm_handle_step_debug wants to claim we stepped an
instruction.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
 arch/arm64/kvm/handle_exit.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.17.1

Peter Maydell Nov. 7, 2018, 5:39 p.m. UTC | #1

On 7 November 2018 at 17:10, Alex Bennée <alex.bennee@linaro.org> wrote:
> Not all faults handled by handle_exit are instruction emulations. For
> example a ESR_ELx_EC_IABT will result in the page tables being updated
> but the instruction that triggered the fault hasn't actually executed
> yet. We use the simple heuristic of checking for a changed PC before
> seeing if kvm_arm_handle_step_debug wants to claim we stepped an
> instruction.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

What's the rationale for this change? Presumably it's fixing
something, but the commit message doesn't really say what...

This feels to me like it's working around the fact that
we've separated two things ("advance pc (or set it if we're
going to make the guest take an exception)" and "notice that
we have completed a single step") that should be handled
at one point in the code.

thanks
-- PMM

Peter Maydell Nov. 7, 2018, 5:53 p.m. UTC | #2

On 7 November 2018 at 17:39, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 7 November 2018 at 17:10, Alex Bennée <alex.bennee@linaro.org> wrote:
>> Not all faults handled by handle_exit are instruction emulations. For
>> example a ESR_ELx_EC_IABT will result in the page tables being updated
>> but the instruction that triggered the fault hasn't actually executed
>> yet. We use the simple heuristic of checking for a changed PC before
>> seeing if kvm_arm_handle_step_debug wants to claim we stepped an
>> instruction.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> What's the rationale for this change? Presumably it's fixing
> something, but the commit message doesn't really say what...
>
> This feels to me like it's working around the fact that
> we've separated two things ("advance pc (or set it if we're
> going to make the guest take an exception)" and "notice that
> we have completed a single step") that should be handled
> at one point in the code.


...so for instance if your guest PC is at the entrypoint for
an exception, and you singlestep and take the same exception
again, this should count as a single step completed, even
though the PC has not changed. Granted, that's a little
contrived, but it can happen in cases where the guest gets
completely confused and is sitting in a tight loop taking
exceptions because there's no ram at the vector table
address, or whatever.

thanks
-- PMM

Mark Rutland Nov. 7, 2018, 6:01 p.m. UTC | #3

On Wed, Nov 07, 2018 at 05:10:31PM +0000, Alex Bennée wrote:
> Not all faults handled by handle_exit are instruction emulations. For
> example a ESR_ELx_EC_IABT will result in the page tables being updated
> but the instruction that triggered the fault hasn't actually executed
> yet. We use the simple heuristic of checking for a changed PC before
> seeing if kvm_arm_handle_step_debug wants to claim we stepped an
> instruction.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  arch/arm64/kvm/handle_exit.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index e5e741bfffe1..b8252e72f882 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -214,6 +214,7 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
>  static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
>  	int handled;
> +        unsigned long old_pc = *vcpu_pc(vcpu);
>
>  	/*
>  	 * See ARM ARM B1.14.1: "Hyp traps on instructions
> @@ -233,7 +234,8 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	 * kvm_arm_handle_step_debug() sets the exit_reason on the kvm_run
>  	 * structure if we need to return to userspace.
>  	 */
> -	if (handled > 0 && kvm_arm_handle_step_debug(vcpu, run))
> +	if (handled > 0 && *vcpu_pc(vcpu) != old_pc &&


This doesn't work if the emulation is equivalent to a branch-to-self, so
I don't think that we want to do this.
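
To make the objection concrete, here is a minimal user-space sketch, not
kernel code. The toy_* names, the two trap kinds and the struct are all
hypothetical stand-ins for illustration only. It models the proposed
"did the PC change?" test and shows it giving the wrong answer when an
emulated branch targets its own address:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a vCPU; only the PC matters for this illustration. */
struct toy_vcpu {
	uint64_t pc;
};

/* Hypothetical trap kinds, not real ESR exception classes. */
enum toy_trap {
	TOY_EMULATED_BRANCH,	/* instruction fully emulated, PC set to target */
	TOY_PAGE_FAULT_FIXUP,	/* page tables updated, instruction not yet run */
};

/* Apply the trap's effect; return true if an instruction really completed. */
static bool toy_handle_trap(struct toy_vcpu *vcpu, enum toy_trap trap,
			    uint64_t target)
{
	assert(vcpu);
	if (trap == TOY_EMULATED_BRANCH) {
		vcpu->pc = target;
		return true;
	}
	return false;	/* fix-up only: the guest must re-execute the instruction */
}

/* The heuristic under discussion: claim a step only if the PC moved. */
static bool toy_step_by_pc_heuristic(struct toy_vcpu *vcpu, enum toy_trap trap,
				     uint64_t target)
{
	uint64_t old_pc = vcpu->pc;

	toy_handle_trap(vcpu, trap, target);
	return vcpu->pc != old_pc;
}
```

With the PC at 0x2000, toy_handle_trap() reports a completed instruction
for a branch to 0x2000, while toy_step_by_pc_heuristic() reports no step
for the same input.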

When are we failing to advance the single-step state machine correctly?

Thanks,
Mark.

> +	    kvm_arm_handle_step_debug(vcpu, run))
>  		handled = 0;
>
>  	return handled;
> --
> 2.17.1
>

Mark Rutland Nov. 7, 2018, 6:08 p.m. UTC | #4

On Wed, Nov 07, 2018 at 06:01:20PM +0000, Mark Rutland wrote:
> On Wed, Nov 07, 2018 at 05:10:31PM +0000, Alex Bennée wrote:

> > Not all faults handled by handle_exit are instruction emulations. For

> > example a ESR_ELx_EC_IABT will result in the page tables being updated

> > but the instruction that triggered the fault hasn't actually executed

> > yet. We use the simple heuristic of checking for a changed PC before

> > seeing if kvm_arm_handle_step_debug wants to claim we stepped an

> > instruction.

> > 

> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> > ---

> >  arch/arm64/kvm/handle_exit.c | 4 +++-

> >  1 file changed, 3 insertions(+), 1 deletion(-)

> > 

> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c

> > index e5e741bfffe1..b8252e72f882 100644

> > --- a/arch/arm64/kvm/handle_exit.c

> > +++ b/arch/arm64/kvm/handle_exit.c

> > @@ -214,6 +214,7 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)

> >  static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)

> >  {

> >  	int handled;

> > +        unsigned long old_pc = *vcpu_pc(vcpu);

> >  

> >  	/*

> >  	 * See ARM ARM B1.14.1: "Hyp traps on instructions

> > @@ -233,7 +234,8 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)

> >  	 * kvm_arm_handle_step_debug() sets the exit_reason on the kvm_run

> >  	 * structure if we need to return to userspace.

> >  	 */

> > -	if (handled > 0 && kvm_arm_handle_step_debug(vcpu, run))

> > +	if (handled > 0 && *vcpu_pc(vcpu) != old_pc &&

> 

> This doesn't work if the emulation is equivalent to a branch-to-self, so

> I don't think that we want to do this.

> 

> When are we failing to advance the single-step state machine correctly?


I don't understand how this is intended to work currently.

Surely kvm_skip_instr() should advance the state machine as necessary,
so that we can rely on the HW to generate any necessary single-step
exception when we next return to the guest?

... and if userspace decides to emulate something, it's up to it to
advance the state machine consistently.

Thanks,
Mark.

Alex Bennée Nov. 8, 2018, 12:26 p.m. UTC | #5

Peter Maydell <peter.maydell@linaro.org> writes:

> On 7 November 2018 at 17:39, Peter Maydell <peter.maydell@linaro.org> wrote:

>> On 7 November 2018 at 17:10, Alex Bennée <alex.bennee@linaro.org> wrote:

>>> Not all faults handled by handle_exit are instruction emulations. For

>>> example a ESR_ELx_EC_IABT will result in the page tables being updated

>>> but the instruction that triggered the fault hasn't actually executed

>>> yet. We use the simple heuristic of checking for a changed PC before

>>> seeing if kvm_arm_handle_step_debug wants to claim we stepped an

>>> instruction.

>>>

>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>>

>> What's the rationale for this change? Presumably it's fixing

>> something, but the commit message doesn't really say what...

>>

>> This feels to me like it's working around the fact that

>> we've separated two things ("advance pc (or set it if we're

>> going to make the guest take an exception)" and "notice that

>> we have completed a single step") that should be handled

>> at one point in the code.

>

> ...so for instance if your guest PC is at the entrypoint for

> an exception, and you singlestep and take the same exception

> again, this should count as a single step completed, even

> though the PC has not changed. Granted, that's a little

> contrived, but it can happen in cases where the guest gets

> completely confused and is sitting in a tight loop taking

> exceptions because there's no ram at the vector table

> address, or whatever.


The alternative I thought of as I was hacking^H^H^H^H^H^H carefully
engineering this was to expand arm_exit_handlers[] and tag each handler
that was an instruction emulation and gate on that.

--
Alex Bennée

Alex Bennée Nov. 8, 2018, 12:40 p.m. UTC | #6

Mark Rutland <mark.rutland@arm.com> writes:

> On Wed, Nov 07, 2018 at 06:01:20PM +0000, Mark Rutland wrote:

>> On Wed, Nov 07, 2018 at 05:10:31PM +0000, Alex Bennée wrote:

>> > Not all faults handled by handle_exit are instruction emulations. For

>> > example a ESR_ELx_EC_IABT will result in the page tables being updated

>> > but the instruction that triggered the fault hasn't actually executed

>> > yet. We use the simple heuristic of checking for a changed PC before

>> > seeing if kvm_arm_handle_step_debug wants to claim we stepped an

>> > instruction.

>> >

>> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>> > ---

>> >  arch/arm64/kvm/handle_exit.c | 4 +++-

>> >  1 file changed, 3 insertions(+), 1 deletion(-)

>> >

>> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c

>> > index e5e741bfffe1..b8252e72f882 100644

>> > --- a/arch/arm64/kvm/handle_exit.c

>> > +++ b/arch/arm64/kvm/handle_exit.c

>> > @@ -214,6 +214,7 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)

>> >  static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)

>> >  {

>> >  	int handled;

>> > +        unsigned long old_pc = *vcpu_pc(vcpu);

>> >

>> >  	/*

>> >  	 * See ARM ARM B1.14.1: "Hyp traps on instructions

>> > @@ -233,7 +234,8 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)

>> >  	 * kvm_arm_handle_step_debug() sets the exit_reason on the kvm_run

>> >  	 * structure if we need to return to userspace.

>> >  	 */

>> > -	if (handled > 0 && kvm_arm_handle_step_debug(vcpu, run))

>> > +	if (handled > 0 && *vcpu_pc(vcpu) != old_pc &&

>>

>> This doesn't work if the emulation is equivalent to a branch-to-self, so

>> I don't think that we want to do this.

>>

>> When are we failing to advance the single-step state machine

>> correctly?


When the trap is not actually an instruction emulation - e.g. setting up
the page tables on a fault. Because we are in the act of single-stepping
an instruction that didn't actually execute we erroneously return to
userspace pretending we did even though we shouldn't.

>

> I don't understand how this is intended to work currently.

>

> Surely kvm_skip_instr() should advance the state machine as necessary,

> so that we can rely on the HW to generate any necessary single-step

> exception when we next return to the guest?


It doesn't currently (at least for aarch64; the aarch32 skip code does
more messing about). But the decision isn't really about futzing with
the single-step flags; it is about returning to userspace so the
single-step is seen. That means changing a return value > 0 (return to
the guest) into 0 (exit to userspace) while setting the exit reason.
>

> ... and if userspace decides to emulate something, it's up to it to

> advance the state machine consistently.


Well that's a little more complex. We actually exit to handle the MMIO
stuff and then return so it can complete before exiting again for the
step (see virt/kvm/arm/arm.c):

	if (run->exit_reason == KVM_EXIT_MMIO) {
		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
		if (ret)
			return ret;
		if (kvm_arm_handle_step_debug(vcpu, vcpu->run))
			return 0;
	}


>

> Thanks,

> Mark.



--
Alex Bennée

Mark Rutland Nov. 8, 2018, 1:51 p.m. UTC | #7

On Thu, Nov 08, 2018 at 12:40:11PM +0000, Alex Bennée wrote:
> Mark Rutland <mark.rutland@arm.com> writes:

> > On Wed, Nov 07, 2018 at 06:01:20PM +0000, Mark Rutland wrote:

> >> On Wed, Nov 07, 2018 at 05:10:31PM +0000, Alex Bennée wrote:

> >> > Not all faults handled by handle_exit are instruction emulations. For

> >> > example a ESR_ELx_EC_IABT will result in the page tables being updated

> >> > but the instruction that triggered the fault hasn't actually executed

> >> > yet. We use the simple heuristic of checking for a changed PC before

> >> > seeing if kvm_arm_handle_step_debug wants to claim we stepped an

> >> > instruction.

> >> >

> >> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> >> > ---

> >> >  arch/arm64/kvm/handle_exit.c | 4 +++-

> >> >  1 file changed, 3 insertions(+), 1 deletion(-)

> >> >

> >> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c

> >> > index e5e741bfffe1..b8252e72f882 100644

> >> > --- a/arch/arm64/kvm/handle_exit.c

> >> > +++ b/arch/arm64/kvm/handle_exit.c

> >> > @@ -214,6 +214,7 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)

> >> >  static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)

> >> >  {

> >> >  	int handled;

> >> > +        unsigned long old_pc = *vcpu_pc(vcpu);

> >> >

> >> >  	/*

> >> >  	 * See ARM ARM B1.14.1: "Hyp traps on instructions

> >> > @@ -233,7 +234,8 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)

> >> >  	 * kvm_arm_handle_step_debug() sets the exit_reason on the kvm_run

> >> >  	 * structure if we need to return to userspace.

> >> >  	 */

> >> > -	if (handled > 0 && kvm_arm_handle_step_debug(vcpu, run))

> >> > +	if (handled > 0 && *vcpu_pc(vcpu) != old_pc &&

> >>

> >> This doesn't work if the emulation is equivalent to a branch-to-self, so

> >> I don't think that we want to do this.

> >>

> >> When are we failing to advance the single-step state machine

> >> correctly?

> 

> When the trap is not actually an instruction emulation - e.g. setting up

> the page tables on a fault. Because we are in the act of single-stepping

> an instruction that didn't actually execute we erroneously return to

> userspace pretending we did even though we shouldn't.

I think one problem here is that we're trying to use one bit of state
(the KVM_GUESTDBG_SINGLESTEP) when we actually need two.

I had expected that we'd follow the architectural single-step state
machine, and have three states:

* inactive/disabled: not single stepping

* active-not-pending: the current instruction will be stepped, and we'll
  transition to active-pending before executing the next instruction.

* active-pending: the current instruction will raise a software step
  debug exception, before being executed.

For that to work, all we have to do is advance the state machine when we
emulate/skip an instruction, and the HW will raise the exception for us
when we enter the guest (which is the only place we have to handle the
step exception).
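
As a rough illustration of the three states above, here is a
self-contained sketch. The enum and the ss_* helpers are made-up names
for this example, not anything that exists in KVM, and ss_rearm() is an
assumption about how a debugger would request the next step:

```c
#include <assert.h>
#include <stdbool.h>

/* The three software-step states described above. */
enum ss_state {
	SS_INACTIVE,		/* not single-stepping */
	SS_ACTIVE_NOT_PENDING,	/* the current instruction will be stepped */
	SS_ACTIVE_PENDING,	/* step exception fires before the next instruction */
};

/* Advance when an instruction completes (executed, emulated or skipped). */
static enum ss_state ss_instruction_completed(enum ss_state s)
{
	return s == SS_ACTIVE_NOT_PENDING ? SS_ACTIVE_PENDING : s;
}

/* On guest entry: does a step exception fire before any instruction runs? */
static bool ss_exception_on_entry(enum ss_state s)
{
	return s == SS_ACTIVE_PENDING;
}

/* Hypothetical helper: the debugger re-arms stepping after handling a step. */
static enum ss_state ss_rearm(enum ss_state s)
{
	assert(s != SS_INACTIVE);
	return SS_ACTIVE_NOT_PENDING;
}
```

The point of the sketch is that a page-table fix-up never calls
ss_instruction_completed(), so the state stays active-not-pending and no
step is reported until the instruction really runs.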

We need two bits of internal state for that, but KVM only gives us a
single KVM_GUESTDBG_SINGLESTEP flag, and we might exit to userspace
mid-emulation (e.g. for MMIO). To avoid that resulting in skipping two
instructions at a time, we currently add explicit
kvm_arm_handle_step_debug() checks everywhere after we've (possibly)
emulated an instruction, but these seem to hit too often.

One problem is that I couldn't spot when we advance the PC for an MMIO
trap. I presume we do that in the kernel, *after* the MMIO trap, but I
can't see where that happens.

Thanks,
Mark.

Alex Bennée Nov. 8, 2018, 2:28 p.m. UTC | #8

Mark Rutland <mark.rutland@arm.com> writes:

> On Thu, Nov 08, 2018 at 12:40:11PM +0000, Alex Bennée wrote:

>> Mark Rutland <mark.rutland@arm.com> writes:

>> > On Wed, Nov 07, 2018 at 06:01:20PM +0000, Mark Rutland wrote:

>> >> On Wed, Nov 07, 2018 at 05:10:31PM +0000, Alex Bennée wrote:

>> >> > Not all faults handled by handle_exit are instruction emulations. For

>> >> > example a ESR_ELx_EC_IABT will result in the page tables being updated

>> >> > but the instruction that triggered the fault hasn't actually executed

>> >> > yet. We use the simple heuristic of checking for a changed PC before

>> >> > seeing if kvm_arm_handle_step_debug wants to claim we stepped an

>> >> > instruction.

>> >> >

>> >> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

<snip>
>> >> > @@ -233,7 +234,8 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)

>> >> >  	 * kvm_arm_handle_step_debug() sets the exit_reason on the kvm_run

>> >> >  	 * structure if we need to return to userspace.

>> >> >  	 */

>> >> > -	if (handled > 0 && kvm_arm_handle_step_debug(vcpu, run))

>> >> > +	if (handled > 0 && *vcpu_pc(vcpu) != old_pc &&

>> >>

<snip>
>> >> When are we failing to advance the single-step state machine

>> >> correctly?

>>

>> When the trap is not actually an instruction emulation - e.g. setting up

>> the page tables on a fault. Because we are in the act of single-stepping

>> an instruction that didn't actually execute we erroneously return to

>> userspace pretending we did even though we shouldn't.

>

> I think one problem here is that we're trying to use one bit of state

> (the KVM_GUESTDBG_SINGLESTEP) when we actually need two.

>

> I had expected that we'd follow the architectural single-step state

> machine, and have three states:

>

> * inactive/disabled: not single stepping

>

> * active-not-pending: the current instruction will be stepped, and we'll

>   transition to active-pending before executing the next instruction.

>

> * active-pending: the current instruction will raise a software step

>   debug exception, before being executed.

>

> For that to work, all we have to do is advence the state machine when we

> emulate/skip an instruction, and the HW will raise the exception for us

> when we enter the guest (which is the only place we have to handle the

> step exception).


We also elide the fact that single-stepping is happening from the guest
here by piggy backing the step bit onto cpsr() as we enter KVM rather
than just tracking the state of the bit.

The current flow of guest debug is very much "as I enter what do I need
to set" rather than tracking state between VCPU_RUN events.

> We need two bits of internal state for that, but KVM only gives us a

> single KVM_GUESTDBG_SINGLESTEP flag, and we might exit to userspace

> mid-emulation (e.g. for MMIO). To avoid that resulting in skipping two

> instructions at a time, we currently add explicit

> kvm_arm_handle_step_debug() checks everywhere after we've (possibly)

> emulated an instruction, but these seem to hit too often.


Yes - treating all exits as potential emulations is problematic and we
are increasing complexity to track which exits are and aren't
actual *completed* instruction emulations which can also be a
multi-stage thing split between userspace and the kernel.

> One problem is that I couldn't spot when we advance the PC for an MMIO

> trap. I presume we do that in the kernel, *after* the MMIO trap, but I

> can't see where that happens.


Nope, it gets done beforehand, during decode_hsr() in mmio.c:

	/*
	 * The MMIO instruction is emulated and should not be re-executed
	 * in the guest.
	 */
	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));

That is a little non-obvious, but before guest debug support was added it
made sense, as the whole trap->kernel->user->kernel->guest cycle is
"atomic" w.r.t. the guest. It's also common code for
in-kernel/in-userspace emulation.

For single-step we just built on that and completed the single-step
after mmio was complete.

>

> Thanks,

> Mark.



--
Alex Bennée

Peter Maydell Nov. 8, 2018, 2:38 p.m. UTC | #9

On 8 November 2018 at 14:28, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Mark Rutland <mark.rutland@arm.com> writes:
>> One problem is that I couldn't spot when we advance the PC for an MMIO
>> trap. I presume we do that in the kernel, *after* the MMIO trap, but I
>> can't see where that happens.
>
> Nope it gets done before during decode_hsr in mmio.c:
>
>         /*
>          * The MMIO instruction is emulated and should not be re-executed
>          * in the guest.
>          */
>         kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));

I think that this attempt to do the PC-advance early is
probably an underlying problem that is not helping the
code structure here.

An enhancement that's been floated previously is that the
MMIO emulation in userspace should be able to report back
to KVM "nope, that access should generate a guest synchronous
external abort (with ESR_EL1.EA = 0/1)".
If we have that, then we definitely need to not advance the
PC until after userspace has done the emulation and told
us whether the memory access succeeded or not...
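
A rough sketch of that ordering, purely as a user-space model: the
struct, its fields and the toy_mmio_* functions are hypothetical, and no
such userspace-reported abort interface is described in this thread
beyond the idea itself. The PC is left alone at trap time and only
advanced once the outcome of the access is known:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy vCPU state; field names are illustrative, not KVM's. */
struct toy_mmio_vcpu {
	uint64_t pc;
	bool mmio_pending;	/* an MMIO access is waiting on userspace */
	bool abort_injected;	/* stands in for raising a guest external abort */
};

/* Trap path: record the pending access but leave the PC alone. */
static void toy_mmio_trap(struct toy_mmio_vcpu *vcpu)
{
	assert(!vcpu->mmio_pending);
	vcpu->mmio_pending = true;
}

/* Return path: only now choose between "instruction done" and "abort". */
static void toy_mmio_return(struct toy_mmio_vcpu *vcpu, bool access_ok)
{
	assert(vcpu->mmio_pending);
	vcpu->mmio_pending = false;

	if (access_ok)
		vcpu->pc += 4;			/* A64 instructions are 4 bytes */
	else
		vcpu->abort_injected = true;	/* PC still at the faulting instruction */
}
```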

thanks
-- PMM

Mark Rutland Nov. 9, 2018, 11:56 a.m. UTC | #10

On Thu, Nov 08, 2018 at 02:38:43PM +0000, Peter Maydell wrote:
> On 8 November 2018 at 14:28, Alex Bennée <alex.bennee@linaro.org> wrote:

> >

> > Mark Rutland <mark.rutland@arm.com> writes:

> >> One problem is that I couldn't spot when we advance the PC for an MMIO

> >> trap. I presume we do that in the kernel, *after* the MMIO trap, but I

> >> can't see where that happens.

> >

> > Nope it gets done before during decode_hsr in mmio.c:

> >

> >         /*

> >          * The MMIO instruction is emulated and should not be re-executed

> >          * in the guest.

> >          */

> >         kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));

> 

> I think that this attempt to do the PC-advance early is

> probably an underlying problem that is not helping the

> code structure here.

> 

> An enhancement that's been floated previously is that the

> MMIO emulation in userspace should be able to report back

> to KVM "nope, that access should generate a guest synchronous

> external abort (with ESR_EL1.EA = 0/1)".

> If we have that, then we definitely need to not advance the

> PC until after userspace has done the emulation and told

> us whether the memory access succeeded or not...

Yup.

I think that we absolutely want to do all the CPU state advancement (PC,
SS bit, etc) at the point we apply the effects of the instruction. Not
before we emulate the instruction, and not higher/lower in the call
stack.

We have a big problem in that guest-directed singlestep and
host-directed singlestep don't compose, and given that host-directed
singlestep doesn't work reliably today I'd be tempted to rip that out
until we've fixed guest-directed singlestep.

We should have a story for how host-directed debug is handled
transparently from the PoV of a guest using guest-directed debug.

Thanks,
Mark.

Alex Bennée Nov. 9, 2018, 12:24 p.m. UTC | #11

Mark Rutland <mark.rutland@arm.com> writes:

> On Thu, Nov 08, 2018 at 02:38:43PM +0000, Peter Maydell wrote:

>> On 8 November 2018 at 14:28, Alex Bennée <alex.bennee@linaro.org> wrote:

>> >

>> > Mark Rutland <mark.rutland@arm.com> writes:

>> >> One problem is that I couldn't spot when we advance the PC for an MMIO

>> >> trap. I presume we do that in the kernel, *after* the MMIO trap, but I

>> >> can't see where that happens.

>> >

>> > Nope it gets done before during decode_hsr in mmio.c:

>> >

>> >         /*

>> >          * The MMIO instruction is emulated and should not be re-executed

>> >          * in the guest.

>> >          */

>> >         kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));

>>

>> I think that this attempt to do the PC-advance early is

>> probably an underlying problem that is not helping the

>> code structure here.

>>

>> An enhancement that's been floated previously is that the

>> MMIO emulation in userspace should be able to report back

>> to KVM "nope, that access should generate a guest synchronous

>> external abort (with ESR_EL1.EA = 0/1)".

>> If we have that, then we definitely need to not advance the

>> PC until after userspace has done the emulation and told

>> us whether the memory access succeeded or not...

>

> Yup.

>

> I think that we absolutely want to do all the CPU state advancement (PC,

> SS bit, etc) at the point we apply the effects of the instruction. Not

> before we emulate the instruction, and not higher/lower in the call

> stack.

There is certainly an argument to abstract some of the advancement logic
so we can make debug related decisions in one place. I don't know how
much churn we would need to get there.

Currently most of the guest debug decisions are made in one place as we
head into the guest run. Generally I don't think the emulation code wants
to know or care about the SS bit or what debug is currently happening,
although I guess the presence of the SS bit could be used to decide on
exactly what exit type you are going to do - back to guest or out to
user space. Currently kvm_arm_handle_step_debug only cares about host
directed debug but we could expand it to raise the appropriate guest
exception if required.

> We have a big problem in that guest-directed singlestep and

> host-directed singlestep don't compose, and given that host-directed

> singlestep doesn't work reliably today I'd be tempted to rip that out

> until we've fixed guest-directed singlestep.

Getting host and guest debug to run at the same time is tricky given we
end up subverting guest state when the host debug is in control. It did
make my head spin when I worked on it originally which led to the
acceptance that guest debug couldn't be expected to work transparently
while host directed debug was in effect. If we can support it without
adding complexity then that would be great but it's a pretty niche use
case.

I'd be loath to rip out the current single step support as it does
actually work pretty well - it's just corner cases with emulated MMIO
and first step that are failing. Last I looked these were cases x86
didn't even get right and no one has called to remove its single step
support AFAIK.

> We should have a story for how host-directed debug is handled

> transparently from the PoV of a guest using guest-directed debug.

What's the use case for this apart from having a cleaner abstraction? Do
users really spend time running multiple gdbs at different levels in
the stack?

>

> Thanks,

> Mark.

--
Alex Bennée

Mark Rutland Nov. 9, 2018, 12:49 p.m. UTC | #12

On Fri, Nov 09, 2018 at 12:24:41PM +0000, Alex Bennée wrote:
> Mark Rutland <mark.rutland@arm.com> writes:

> > On Thu, Nov 08, 2018 at 02:38:43PM +0000, Peter Maydell wrote:

> >> On 8 November 2018 at 14:28, Alex Bennée <alex.bennee@linaro.org> wrote:

> >> >

> >> > Mark Rutland <mark.rutland@arm.com> writes:

> >> >> One problem is that I couldn't spot when we advance the PC for an MMIO

> >> >> trap. I presume we do that in the kernel, *after* the MMIO trap, but I

> >> >> can't see where that happens.

> >> >

> >> > Nope it gets done before during decode_hsr in mmio.c:

> >> >

> >> >         /*

> >> >          * The MMIO instruction is emulated and should not be re-executed

> >> >          * in the guest.

> >> >          */

> >> >         kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));

> >>

> >> I think that this attempt to do the PC-advance early is

> >> probably an underlying problem that is not helping the

> >> code structure here.

> >>

> >> An enhancement that's been floated previously is that the

> >> MMIO emulation in userspace should be able to report back

> >> to KVM "nope, that access should generate a guest synchronous

> >> external abort (with ESR_EL1.EA = 0/1)".

> >> If we have that, then we definitely need to not advance the

> >> PC until after userspace has done the emulation and told

> >> us whether the memory access succeeded or not...

> >

> > Yup.

> >

> > I think that we absolutely want to do all the CPU state advancement (PC,

> > SS bit, etc) at the point we apply the effects of the instruction. Not

> > before we emulate the instruction, and not higher/lower in the call

> > stack.

> 

> There is certainly an argument to abstract some of the advancement logic

> so we can make debug related decisions in one place. I don't know how

> much churn we would need to get there.


I'm not saying anything about *decisions*. I'm saying that we can make
the state consistent by advancing the singlestep state in the same way
that HW does, at the instant it advances the PC.

i.e. do that in kvm_skip_instr(), as I've done in my local tree.

That mirrors the HW, and we don't need to special-case any handling for
emulated vs non-emulated instructions.

> Currently most of the guest debug decisions are made in one place as we

> head into the guest run. Generally I don't think the emulation code want

> to know or care about the SS bit or what debug is currently happening

> although I guess the presence of the SS bit could be used to decide on

> exactly what exit type you are going to do - back to guest or out to

> user space. Currently kvm_arm_handle_step_debug on cares about host

> directed debug but we could expand it to raise the appropriate guest

> exception if required.


So long as we consistently advance the singlestep state when we emulate
an instruction, we shouldn't need kvm_arm_handle_step_debug() at all. If
we emulate an instruction, we'll return to the guest with PSTATE.SS
clear, and the HW will generate the exception for us.

This is *vastly* simpler to reason about.
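
To illustrate the idea (this is a toy model, not the local patches
mentioned here, which are not shown in this thread): the struct and the
toy_* names are hypothetical, bit 21 is where the SS bit sits in the
saved program status, and MDSCR_EL1.SS is assumed to be set throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PSTATE.SS as it appears in the saved program status (bit 21). */
#define TOY_PSTATE_SS	(UINT64_C(1) << 21)

/* Toy CPU state; stands in for the vCPU's PC and saved PSTATE. */
struct toy_cpu {
	uint64_t pc;
	uint64_t pstate;
};

/* Skip an emulated instruction: advance the PC and the step state together. */
static void toy_skip_instr(struct toy_cpu *cpu, bool is_wide_instr)
{
	assert(cpu);
	cpu->pc += is_wide_instr ? 4 : 2;
	cpu->pstate &= ~TOY_PSTATE_SS;	/* step done; HW raises the exception on entry */
}

/* With stepping enabled, a clear SS bit means a step exception is pending. */
static bool toy_step_pending(const struct toy_cpu *cpu)
{
	return !(cpu->pstate & TOY_PSTATE_SS);
}
```

In this model a fault fix-up that never calls toy_skip_instr() leaves SS
set, so nothing is reported until the instruction has actually run.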

I have local patches which I intend to post shortly.

> > We have a big problem in that guest-directed singlestep and

> > host-directed singlestep don't compose, and given that host-directed

> > singlestep doesn't work reliably today I'd be tempted to rip that out

> > until we've fixed guest-directed singlestep.

> 

> Getting host and guest debug to run at the same time is tricky given we

> end up subverting guest state when the host debug is in control. It did

> make my head spin when I worked on it originally which led to the

> acceptance that guest debug couldn't be expected to work transparently

> while host directed debug was in effect. If we can support it without

> adding complexity then that would be great but it's a pretty niche use

> case.


At the very least we need to define whether the kernel or userspace is
responsible for this. If we say it's userspace's responsibility to
virtualize this when it takes control of guest debug, but QEMU doesn't
do so, that's fine by me.

> I'd be loathed to rip out the current single step support as it does

> actually work pretty well - it's just corner cases with emulated MMIO

> and first step that are failing. Last I looked these were cases x86

> didn't even get right and no one has called to remove it's single step

> support AFAIK.

> 

> > We should have a story for how host-directed debug is handled

> > transparently from the PoV of a guest using guest-directed debug.

> 

> What's the use case for this apart from having a cleaner abstraction?


Providing the architecturally mandated behaviour to the guest.

> Do users really spend time running multiple gdbs at different levels

> in the stack?


It's not just about GDB. Things like kprobes, live patching, etc in a
guest may use singlestep, and this may be *critical* to the correct
operation of a given guest.

People *will* want to debug such features under a hypervisor. I know
that I personally want to be able to do so.

In general, a debugger shouldn't silently corrupt guest state or
behaviour, as KVM_GUESTDBG_SINGLESTEP behaviour effectively does today.

Thanks,
Mark.

Peter Maydell Nov. 9, 2018, 12:56 p.m. UTC | #13

On 9 November 2018 at 12:49, Mark Rutland <mark.rutland@arm.com> wrote:
> I'm not saying anything about *decisions*. I'm saying that we can make
> the state consistent by advancing the singlestep state in the same way
> that HW does, at the instant it advances the PC.
>
> i.e. do that in kvm_skip_instr(), as I've done in my local tree.
>
> That mirrors the HW, and we don't need to special-case any handling for
> emulated vs non-emulated instructions.

You also need to do it in the "set PC because we're making the guest
take an exception" code path, which doesn't go through kvm_skip_instr().
This corresponds to the two kinds of "step completed" in hardware as
noted in DDI0487D.a D2.12.3 fig D2-3 footnote b:
 * executing the instruction to be stepped without taking an exception
 * taking an exception to an exception level that debug exceptions
   are enabled from [ie guest EL1 in our case]
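
The second kind can be sketched as follows. This is a deliberately
simplified user-space model: the struct, the single vector field and the
toy_* names are hypothetical, and the saved status register is left out.
It shows an injected exception completing the step even when the PC ends
up unchanged because the guest was already sitting at the vector:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SS_BIT	(UINT64_C(1) << 21)	/* PSTATE.SS position in saved status */

/* Toy guest EL1 state; one vector address stands in for the whole table. */
struct toy_el1 {
	uint64_t pc;
	uint64_t pstate;
	uint64_t elr;		/* return address saved on exception entry */
	uint64_t vector;	/* hypothetical exception entry point */
};

/* Make the guest take an exception; this also completes the step. */
static void toy_inject_exception(struct toy_el1 *c)
{
	assert(c);
	c->elr = c->pc;
	c->pc = c->vector;
	c->pstate &= ~TOY_SS_BIT;
}

/* Step completion is read from the step state, never from the PC. */
static bool toy_step_completed(const struct toy_el1 *c)
{
	return !(c->pstate & TOY_SS_BIT);
}
```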

thanks
-- PMM

Mark Rutland Nov. 9, 2018, 1:29 p.m. UTC | #14

On Fri, Nov 09, 2018 at 12:56:54PM +0000, Peter Maydell wrote:
> On 9 November 2018 at 12:49, Mark Rutland <mark.rutland@arm.com> wrote:
> > I'm not saying anything about *decisions*. I'm saying that we can make
> > the state consistent by advancing the singlestep state in the same way
> > that HW does, at the instant it advances the PC.
> >
> > i.e. do that in kvm_skip_instr(), as I've done in my local tree.
> >
> > That mirrors the HW, and we don't need to special-case any handling for
> > emulated vs non-emulated instructions.
>
> You also need to do it in the "set PC because we're making the guest
> take an exception" code path, which doesn't go through kvm_skip_instr().


Sure.

> This corresponds to the two kinds of "step completed" in hardware as
> noted in DDI0487D.a D2.12.3 fig D2-3 footnote b:
>  * executing the instruction to be stepped without taking an exception
>  * taking an exception to an exception level that debug exceptions
>    are enabled from [ie guest EL1 in our case]


Thanks for the pointer!

Mark.
