diff mbox series

cpus.c: Fix race condition in cpu_stop_current()

Message ID 20181207155911.12710-1-peter.maydell@linaro.org
State Superseded
Headers show
Series cpus.c: Fix race condition in cpu_stop_current() | expand

Commit Message

Peter Maydell Dec. 7, 2018, 3:59 p.m. UTC
We use cpu_stop_current() to ensure the current CPU has stopped
from places like qemu_system_reset_request(). Unfortunately its
current implementation has a race. It calls qemu_cpu_stop(),
which sets cpu->stopped to true even though the CPU hasn't
actually stopped yet. The main thread will look at the flags
set by qemu_system_reset_request() and call pause_all_vcpus().
pause_all_vcpus() waits for every cpu to have cpu->stopped true,
so it can continue (and we will start the system reset operation)
before the vcpu thread has got back to its top level loop.

Instead, just set cpu->stop and call cpu_exit(). This will
cause the vcpu to exit back to the top level loop, and there
(as part of the wait_io_event code) it will call qemu_cpu_stop().

This fixes bugs where the reset request appeared to be ignored
or the CPU misbehaved because the reset operation started
to change vcpu state while the vcpu thread was still using it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

---
We discussed this a little while back:
https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00154.html
and Jaap reported a bug which I suspect of being the same thing:
https://lists.gnu.org/archive/html/qemu-discuss/2018-10/msg00014.html

Annoyingly I have lost the test case that demonstrated this
race, but I analysed it at the time and this should definitely
fix it. I have opted not to try to address any of the other
possible cleanup here (eg vm_stop() has a potential similar
race if called from a vcpu thread I suspect), since it gets
pretty tangled.

Jaap: could you test whether this patch fixes the issue you
were seeing, please?
---
 cpus.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.19.2

Comments

Jaap Crezee Dec. 8, 2018, 8:47 a.m. UTC | #1
Hi all,

On 12/7/18 4:59 PM, Peter Maydell wrote:
> Jaap: could you test whether this patch fixes the issue you

> were seeing, please?


I have applied the patch and started my test tool against it. It will need some time as I have also seen cases where it only failed after 600 reboots.
My testtool logs into the VM over ssh and reboots it. Meanwhile it will check the device going offline and coming back online again and starts over
(including a random wait).

I will keep you updated. Thanks!

Jaap
Jaap Crezee Dec. 10, 2018, 7:43 a.m. UTC | #2
Hello all,

On 12/7/18 4:59 PM, Peter Maydell wrote:
> Jaap: could you test whether this patch fixes the issue you

> were seeing, please?



My test is going very well. With the patch applied, I have no longer been able to freeze/hang the VM. Currently at 7024 reboots and counting over
runtime 1 day 23 hours. I will start testing on my production environment as well.

Tested-by: Jaap Crezee <jaap@jcz.nl>



regards,


Jaap
Alex Bennée Dec. 10, 2018, 11:05 a.m. UTC | #3
Peter Maydell <peter.maydell@linaro.org> writes:

> We use cpu_stop_current() to ensure the current CPU has stopped

> from places like qemu_system_reset_request(). Unfortunately its

> current implementation has a race. It calls qemu_cpu_stop(),

> which sets cpu->stopped to true even though the CPU hasn't

> actually stopped yet. The main thread will look at the flags

> set by qemu_system_reset_request() and call pause_all_vcpus().

> pause_all_vcpus() waits for every cpu to have cpu->stopped true,

> so it can continue (and we will start the system reset operation)

> before the vcpu thread has got back to its top level loop.

>

> Instead, just set cpu->stop and call cpu_exit(). This will

> cause the vcpu to exit back to the top level loop, and there

> (as part of the wait_io_event code) it will call qemu_cpu_stop().

>

> This fixes bugs where the reset request appeared to be ignored

> or the CPU misbehaved because the reset operation started

> to change vcpu state while the vcpu thread was still using it.

>

> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

> ---

> We discussed this a little while back:

> https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00154.html

> and Jaap reported a bug which I suspect of being the same thing:

> https://lists.gnu.org/archive/html/qemu-discuss/2018-10/msg00014.html

>

> Annoyingly I have lost the test case that demonstrated this

> race, but I analysed it at the time and this should definitely

> fix it. I have opted not to try to address any of the other

> possible cleanup here (eg vm_stop() has a potential similar

> race if called from a vcpu thread I suspect), since it gets

> pretty tangled.

>

> Jaap: could you test whether this patch fixes the issue you

> were seeing, please?

> ---

>  cpus.c | 3 ++-

>  1 file changed, 2 insertions(+), 1 deletion(-)

>

> diff --git a/cpus.c b/cpus.c

> index 0ddeeefc14f..b09b7027126 100644

> --- a/cpus.c

> +++ b/cpus.c

> @@ -2100,7 +2100,8 @@ void qemu_init_vcpu(CPUState *cpu)

>  void cpu_stop_current(void)

>  {

>      if (current_cpu) {

> -        qemu_cpu_stop(current_cpu, true);

> +        current_cpu->stop = true;

> +        cpu_exit(current_cpu);


Should the FIXME in vm_stop also be fixed?

        /*
         * FIXME: should not return to device code in case
         * vm_stop() has been requested.
         */
        cpu_stop_current();
        return 0;


>      }

>  }



--
Alex Bennée
Peter Maydell Dec. 10, 2018, 11:17 a.m. UTC | #4
On Mon, 10 Dec 2018 at 11:06, Alex Bennée <alex.bennee@linaro.org> wrote:
>

>

> Peter Maydell <peter.maydell@linaro.org> writes:

> > We discussed this a little while back:

> > https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00154.html

> > and Jaap reported a bug which I suspect of being the same thing:

> > https://lists.gnu.org/archive/html/qemu-discuss/2018-10/msg00014.html

> >

> > Annoyingly I have lost the test case that demonstrated this

> > race, but I analysed it at the time and this should definitely

> > fix it. I have opted not to try to address any of the other

> > possible cleanup here (eg vm_stop() has a potential similar

> > race if called from a vcpu thread I suspect), since it gets

> > pretty tangled.

> >

> > Jaap: could you test whether this patch fixes the issue you

> > were seeing, please?

> > ---

> >  cpus.c | 3 ++-

> >  1 file changed, 2 insertions(+), 1 deletion(-)

> >

> > diff --git a/cpus.c b/cpus.c

> > index 0ddeeefc14f..b09b7027126 100644

> > --- a/cpus.c

> > +++ b/cpus.c

> > @@ -2100,7 +2100,8 @@ void qemu_init_vcpu(CPUState *cpu)

> >  void cpu_stop_current(void)

> >  {

> >      if (current_cpu) {

> > -        qemu_cpu_stop(current_cpu, true);

> > +        current_cpu->stop = true;

> > +        cpu_exit(current_cpu);

>

> Should the FIXME in vm_stop also be fixed?

>

>         /*

>          * FIXME: should not return to device code in case

>          * vm_stop() has been requested.

>          */

>         cpu_stop_current();

>         return 0;


This is one of the things I had in mind with:
> > I have opted not to try to address any of the other

> > possible cleanup here (eg vm_stop() has a potential similar

> > race if called from a vcpu thread I suspect), since it gets

> > pretty tangled.


though I might actually have meant pause_all_vcpus().
(For pause_all_vcpus() I think the correct thing is to
fix the hw/i386/kvmvapic.c code to work in some other way,
and then assert that pause_all_vcpus() is never called from
the VCPU thread.)

At any rate, this code is quite complicated, so I think
it's worth just fixing this specific race without getting
tangled up in everything else we could potentially refactor.

I'm not even sure how we would arrange for vm_stop() to
avoid returning to device emulation code if it has been
called from there -- I would favour instead changing/defining
the semantics to be like the shutdown-request and reset-request
where the device code expects that control will return but
the VM stop happens at the next opportunity, ie as soon
as the execution of the insn which wrote to the device
register has completed.

thanks
-- PMM
Alex Bennée Dec. 10, 2018, 12:15 p.m. UTC | #5
Peter Maydell <peter.maydell@linaro.org> writes:

> On Mon, 10 Dec 2018 at 11:06, Alex Bennée <alex.bennee@linaro.org> wrote:

>>

>>

>> Peter Maydell <peter.maydell@linaro.org> writes:

>> > We discussed this a little while back:

>> > https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00154.html

>> > and Jaap reported a bug which I suspect of being the same thing:

>> > https://lists.gnu.org/archive/html/qemu-discuss/2018-10/msg00014.html

>> >

>> > Annoyingly I have lost the test case that demonstrated this

>> > race, but I analysed it at the time and this should definitely

>> > fix it. I have opted not to try to address any of the other

>> > possible cleanup here (eg vm_stop() has a potential similar

>> > race if called from a vcpu thread I suspect), since it gets

>> > pretty tangled.

>> >

>> > Jaap: could you test whether this patch fixes the issue you

>> > were seeing, please?

>> > ---

>> >  cpus.c | 3 ++-

>> >  1 file changed, 2 insertions(+), 1 deletion(-)

>> >

>> > diff --git a/cpus.c b/cpus.c

>> > index 0ddeeefc14f..b09b7027126 100644

>> > --- a/cpus.c

>> > +++ b/cpus.c

>> > @@ -2100,7 +2100,8 @@ void qemu_init_vcpu(CPUState *cpu)

>> >  void cpu_stop_current(void)

>> >  {

>> >      if (current_cpu) {

>> > -        qemu_cpu_stop(current_cpu, true);

>> > +        current_cpu->stop = true;

>> > +        cpu_exit(current_cpu);

>>

>> Should the FIXME in vm_stop also be fixed?

>>

>>         /*

>>          * FIXME: should not return to device code in case

>>          * vm_stop() has been requested.

>>          */

>>         cpu_stop_current();

>>         return 0;

>

> This is one of the things I had in mind with:

>> > I have opted not to try to address any of the other

>> > possible cleanup here (eg vm_stop() has a potential similar

>> > race if called from a vcpu thread I suspect), since it gets

>> > pretty tangled.

>

> though I might actually have meant pause_all_vcpus().

> (For pause_all_vcpus() I think the correct thing is to

> fix the hw/i386/kvmvapic.c code to work in some other way,

> and then assert that pause_all_vcpus() is never called from

> the VCPU thread.)


I thought we had already fixed this, but it is yet another case. See the
patch_instruction code in the same file. The only niggle is ensuring
that either the helper is called from last instruction in the block or
forcing a cpu_exit after queuing the work.

The i386 helpers are tricky because they seem to be very deeply nested
so its hard to be sure everything really is done.

But yes I agree that pause_all_vcpus() should be reserved for non-vCPU
contexts only.

> At any rate, this code is quite complicated, so I think

> it's worth just fixing this specific race without getting

> tangled up in everything else we could potentially refactor.


Fair enough.

>

> I'm not even sure how we would arrange for vm_stop() to

> avoid returning to device emulation code if it has been

> called from there -- I would favour instead changing/defining

> the semantics to be like the shutdown-request and reset-request

> where the device code expects that control will return but

> the VM stop happens at the next opportunity, ie as soon

> as the execution of the insn which wrote to the device

> register has completed.


Is there anyway we can assert we are in a helper that has caused all
globals to be saved before the call? Otherwise we need to make similar
guarantees that the ARM TLB flushes have that they are always the last
in a block of instructions.

prep_port0092_write in PPC seems to do a similar thing. Perhaps we need
to expose some common helpers for vcpus?

--
Alex Bennée
Peter Maydell Dec. 10, 2018, 1:07 p.m. UTC | #6
On Mon, 10 Dec 2018 at 12:15, Alex Bennée <alex.bennee@linaro.org> wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:

> > though I might actually have meant pause_all_vcpus().

> > (For pause_all_vcpus() I think the correct thing is to

> > fix the hw/i386/kvmvapic.c code to work in some other way,

> > and then assert that pause_all_vcpus() is never called from

> > the VCPU thread.)

>

> I thought we had already fixed this, but it is yet another case. See the

> patch_instruction code in the same file. The only niggle is ensuring

> that either the helper is called from last instruction in the block or

> forcing a cpu_exit after queuing the work.


Note that the call to pause_all_vcpus() is inside an
"if (kvm_enabled())" so it doesn't matter what the TCG
code does in the way of dividing code up into blocks.
(Though the comment suggests that making it work in TCG
might be nice in theory.)

thanks
-- PMM
KONRAD Frederic Dec. 10, 2018, 2:30 p.m. UTC | #7
Hi Peter,

Thanks for that patch!

I'm seeing the same kind of issue when I run 8 qemu in parallel but it doesn't
seem to be fixed by this patch. Is it supposed to fix the issue when we are
doing a reset_request through a MMIO device?

It happens (rarely) with this kind of guest code:

exit:
   write to the register to reset the device
loop:
   branch loop

The code after the reset is executed.. can't we exit the loop directly with
cpu_loop_exit after cpu_exit?

Thanks,
Fred

Le 12/7/18 à 4:59 PM, Peter Maydell a écrit :
> We use cpu_stop_current() to ensure the current CPU has stopped

> from places like qemu_system_reset_request(). Unfortunately its

> current implementation has a race. It calls qemu_cpu_stop(),

> which sets cpu->stopped to true even though the CPU hasn't

> actually stopped yet. The main thread will look at the flags

> set by qemu_system_reset_request() and call pause_all_vcpus().

> pause_all_vcpus() waits for every cpu to have cpu->stopped true,

> so it can continue (and we will start the system reset operation)

> before the vcpu thread has got back to its top level loop.

> 

> Instead, just set cpu->stop and call cpu_exit(). This will

> cause the vcpu to exit back to the top level loop, and there

> (as part of the wait_io_event code) it will call qemu_cpu_stop().

> 

> This fixes bugs where the reset request appeared to be ignored

> or the CPU misbehaved because the reset operation started

> to change vcpu state while the vcpu thread was still using it.

> 

> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

> ---

> We discussed this a little while back:

> https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00154.html

> and Jaap reported a bug which I suspect of being the same thing:

> https://lists.gnu.org/archive/html/qemu-discuss/2018-10/msg00014.html

> 

> Annoyingly I have lost the test case that demonstrated this

> race, but I analysed it at the time and this should definitely

> fix it. I have opted not to try to address any of the other

> possible cleanup here (eg vm_stop() has a potential similar

> race if called from a vcpu thread I suspect), since it gets

> pretty tangled.

> 

> Jaap: could you test whether this patch fixes the issue you

> were seeing, please?

> ---

>   cpus.c | 3 ++-

>   1 file changed, 2 insertions(+), 1 deletion(-)

> 

> diff --git a/cpus.c b/cpus.c

> index 0ddeeefc14f..b09b7027126 100644

> --- a/cpus.c

> +++ b/cpus.c

> @@ -2100,7 +2100,8 @@ void qemu_init_vcpu(CPUState *cpu)

>   void cpu_stop_current(void)

>   {

>       if (current_cpu) {

> -        qemu_cpu_stop(current_cpu, true);

> +        current_cpu->stop = true;

> +        cpu_exit(current_cpu);

>       }

>   }

>   

>
Peter Maydell Dec. 10, 2018, 2:39 p.m. UTC | #8
On Mon, 10 Dec 2018 at 14:30, KONRAD Frederic
<frederic.konrad@adacore.com> wrote:
>

> Hi Peter,

>

> Thanks for that patch!

>

> I'm seeing the same kind of issue when I run 8 qemu in parallel but it doesn't

> seem to be fixed by this patch. Is it supposed to fix the issue when we are

> doing a reset_request through a MMIO device?

>

> It happens (rarely) with this kind of guest code:

>

> exit:

>    write to the register to reset the device

> loop:

>    branch loop

>

> The code after the reset is executed.. can't we exit the loop directly with

> cpu_loop_exit after cpu_exit?


cpu_loop_exit would abort the execution of the store instruction
that writes to the reset register. I'm not sure that's a great
idea. My thought was more that we should just make sure that insn
is the last one in the TB, so effectively we execute that insn and
then reset the system before executing any further insns. Thinking
it over though I'm not sure that we do do anything that could
avoid having more insns following in the same TB, unless you're
using singlestep or icount...

thanks
-- PMM
KONRAD Frederic Dec. 10, 2018, 2:52 p.m. UTC | #9
Le 12/10/18 à 3:39 PM, Peter Maydell a écrit :
> On Mon, 10 Dec 2018 at 14:30, KONRAD Frederic

> <frederic.konrad@adacore.com> wrote:

>>

>> Hi Peter,

>>

>> Thanks for that patch!

>>

>> I'm seeing the same kind of issue when I run 8 qemu in parallel but it doesn't

>> seem to be fixed by this patch. Is it supposed to fix the issue when we are

>> doing a reset_request through a MMIO device?

>>

>> It happens (rarely) with this kind of guest code:

>>

>> exit:

>>     write to the register to reset the device

>> loop:

>>     branch loop

>>

>> The code after the reset is executed.. can't we exit the loop directly with

>> cpu_loop_exit after cpu_exit?

> 

> cpu_loop_exit would abort the execution of the store instruction

> that writes to the reset register. I'm not sure that's a great

> idea. My thought was more that we should just make sure that insn

> is the last one in the TB, so effectively we execute that insn and

> then reset the system before executing any further insns. Thinking

> it over though I'm not sure that we do do anything that could

> avoid having more insns following in the same TB, unless you're

> using singlestep or icount...


Exactly I think we don't do anything for that.. But we can't guess which IO will
require the loop to be exited though..

Fred

> 

> thanks

> -- PMM

>
Jaap Crezee Dec. 10, 2018, 8:58 p.m. UTC | #10
Hi again,

On 12/7/18 4:59 PM, Peter Maydell wrote:
> We use cpu_stop_current() to ensure the current CPU has stopped

> from places like qemu_system_reset_request(). Unfortunately its

> current implementation has a race. It calls qemu_cpu_stop(),

> which sets cpu->stopped to true even though the CPU hasn't

> actually stopped yet. The main thread will look at the flags

> set by qemu_system_reset_request() and call pause_all_vcpus().

> pause_all_vcpus() waits for every cpu to have cpu->stopped true,

> so it can continue (and we will start the system reset operation)

> before the vcpu thread has got back to its top level loop.

> 

> Instead, just set cpu->stop and call cpu_exit(). This will

> cause the vcpu to exit back to the top level loop, and there

> (as part of the wait_io_event code) it will call qemu_cpu_stop().

> 

> This fixes bugs where the reset request appeared to be ignored

> or the CPU misbehaved because the reset operation started

> to change vcpu state while the vcpu thread was still using it.

> 

> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

> ---

> We discussed this a little while back:

> https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg00154.html

> and Jaap reported a bug which I suspect of being the same thing:

> https://lists.gnu.org/archive/html/qemu-discuss/2018-10/msg00014.html

> 

> Annoyingly I have lost the test case that demonstrated this

> race, but I analysed it at the time and this should definitely

> fix it. I have opted not to try to address any of the other

> possible cleanup here (eg vm_stop() has a potential similar

> race if called from a vcpu thread I suspect), since it gets

> pretty tangled.

> 

> Jaap: could you test whether this patch fixes the issue you

> were seeing, please?

> ---

>  cpus.c | 3 ++-

>  1 file changed, 2 insertions(+), 1 deletion(-)

> 

> diff --git a/cpus.c b/cpus.c

> index 0ddeeefc14f..b09b7027126 100644

> --- a/cpus.c

> +++ b/cpus.c

> @@ -2100,7 +2100,8 @@ void qemu_init_vcpu(CPUState *cpu)

>  void cpu_stop_current(void)

>  {

>      if (current_cpu) {

> -        qemu_cpu_stop(current_cpu, true);

> +        current_cpu->stop = true;

> +        cpu_exit(current_cpu);

>      }

>  }

>  

> 



The patch also fixed the issue on my production system. Thanks!

regards,


Jaap
Emilio Cota Dec. 11, 2018, 1:06 a.m. UTC | #11
On Fri, Dec 07, 2018 at 15:59:11 +0000, Peter Maydell wrote:
> We use cpu_stop_current() to ensure the current CPU has stopped

> from places like qemu_system_reset_request(). Unfortunately its

> current implementation has a race. It calls qemu_cpu_stop(),

> which sets cpu->stopped to true even though the CPU hasn't

> actually stopped yet. The main thread will look at the flags

> set by qemu_system_reset_request() and call pause_all_vcpus().

> pause_all_vcpus() waits for every cpu to have cpu->stopped true,

> so it can continue (and we will start the system reset operation)

> before the vcpu thread has got back to its top level loop.

> 

> Instead, just set cpu->stop and call cpu_exit(). This will

> cause the vcpu to exit back to the top level loop, and there

> (as part of the wait_io_event code) it will call qemu_cpu_stop().

> 

> This fixes bugs where the reset request appeared to be ignored

> or the CPU misbehaved because the reset operation started

> to change vcpu state while the vcpu thread was still using it.

> 

> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>


Reviewed-by: Emilio G. Cota <cota@braap.org>


Thanks,

		E.
Peter Maydell Jan. 4, 2019, 3:36 p.m. UTC | #12
On Fri, 7 Dec 2018 at 16:00, Peter Maydell <peter.maydell@linaro.org> wrote:
>

> We use cpu_stop_current() to ensure the current CPU has stopped

> from places like qemu_system_reset_request(). Unfortunately its

> current implementation has a race. It calls qemu_cpu_stop(),

> which sets cpu->stopped to true even though the CPU hasn't

> actually stopped yet. The main thread will look at the flags

> set by qemu_system_reset_request() and call pause_all_vcpus().

> pause_all_vcpus() waits for every cpu to have cpu->stopped true,

> so it can continue (and we will start the system reset operation)

> before the vcpu thread has got back to its top level loop.

>

> Instead, just set cpu->stop and call cpu_exit(). This will

> cause the vcpu to exit back to the top level loop, and there

> (as part of the wait_io_event code) it will call qemu_cpu_stop().

>

> This fixes bugs where the reset request appeared to be ignored

> or the CPU misbehaved because the reset operation started

> to change vcpu state while the vcpu thread was still using it.

>

> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>


I'm going to put this in via target-arm.next, unless anybody
would like to suggest another route.

thanks
-- PMM
diff mbox series

Patch

diff --git a/cpus.c b/cpus.c
index 0ddeeefc14f..b09b7027126 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2100,7 +2100,8 @@  void qemu_init_vcpu(CPUState *cpu)
 void cpu_stop_current(void)
 {
     if (current_cpu) {
-        qemu_cpu_stop(current_cpu, true);
+        current_cpu->stop = true;
+        cpu_exit(current_cpu);
     }
 }