[v2,04/11] translate: downgrade IRQ BQL asserts to tcg_debug_assert

Message ID 20170302195337.31558-5-alex.bennee@linaro.org
State New
Headers show
Series
  • MTTCG fixups for 2.9
Related show

Commit Message

Alex Bennée March 2, 2017, 7:53 p.m.
While on MTTCG hosts it is very important that updates to
cpu->interrupt_request are protected by the BQL not all guests have
been converted to the correct locking yet. As a result we are seeing
breaking on non-MTTCG enabled guests in production builds.

The locking in the guests needs to be fixed but while running single
threaded they will continue to work. By moving the asserts to
tcg_debug_asserts() they will still be useful during conversion
work (much like the existing assert_memory_lock/assert_tb_lock
asserts).

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
 translate-all.c    | 2 +-
 translate-common.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

-- 
2.11.0

Comments

Peter Maydell March 3, 2017, 10:08 a.m. | #1
On 2 March 2017 at 19:53, Alex Bennée <alex.bennee@linaro.org> wrote:
> While on MTTCG hosts it is very important that updates to

> cpu->interrupt_request are protected by the BQL not all guests have

> been converted to the correct locking yet. As a result we are seeing

> breaking on non-MTTCG enabled guests in production builds.

>

> The locking in the guests needs to be fixed but while running single

> threaded they will continue to work. By moving the asserts to

> tcg_debug_asserts() they will still be useful during conversion

> work (much like the existing assert_memory_lock/assert_tb_lock

> asserts).

>

> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> ---

>  translate-all.c    | 2 +-

>  translate-common.c | 3 ++-

>  2 files changed, 3 insertions(+), 2 deletions(-)

>

> diff --git a/translate-all.c b/translate-all.c

> index 9bac061c9b..7ee273410d 100644

> --- a/translate-all.c

> +++ b/translate-all.c

> @@ -1928,7 +1928,7 @@ void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)

>

>  void cpu_interrupt(CPUState *cpu, int mask)

>  {

> -    g_assert(qemu_mutex_iothread_locked());

> +    tcg_debug_assert(qemu_mutex_iothread_locked());


If CONFIG_DEBUG_TCG isn't enabled then tcg_debug_assert()
turns into "if (!(X)) { __builtin_unreachable(); }", which
means that instead of asserting we now run straight
into compiler undefined behaviour, don't we?
If what we want is "don't actually check this condition in
the non-tcg-debug config" then we should do something
that means we don't actually check the condition...

thanks
-- PMM
Alex Bennée March 3, 2017, 11:05 a.m. | #2
Peter Maydell <peter.maydell@linaro.org> writes:

> On 2 March 2017 at 19:53, Alex Bennée <alex.bennee@linaro.org> wrote:

>> While on MTTCG hosts it is very important that updates to

>> cpu->interrupt_request are protected by the BQL not all guests have

>> been converted to the correct locking yet. As a result we are seeing

>> breaking on non-MTTCG enabled guests in production builds.

>>

>> The locking in the guests needs to be fixed but while running single

>> threaded they will continue to work. By moving the asserts to

>> tcg_debug_asserts() they will still be useful during conversion

>> work (much like the existing assert_memory_lock/assert_tb_lock

>> asserts).

>>

>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>> ---

>>  translate-all.c    | 2 +-

>>  translate-common.c | 3 ++-

>>  2 files changed, 3 insertions(+), 2 deletions(-)

>>

>> diff --git a/translate-all.c b/translate-all.c

>> index 9bac061c9b..7ee273410d 100644

>> --- a/translate-all.c

>> +++ b/translate-all.c

>> @@ -1928,7 +1928,7 @@ void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)

>>

>>  void cpu_interrupt(CPUState *cpu, int mask)

>>  {

>> -    g_assert(qemu_mutex_iothread_locked());

>> +    tcg_debug_assert(qemu_mutex_iothread_locked());

>

> If CONFIG_DEBUG_TCG isn't enabled then tcg_debug_assert()

> turns into "if (!(X)) { __builtin_unreachable(); }", which

> means that instead of asserting we now run straight

> into compiler undefined behaviour, don't we?


According to the commit that added it
(c552d6c038f7cf4058d1fd5987118ffd41e0e050) it is meant to be a hint to
the compiler. Reading the GCC notes however seems to contradict that.

FWIW I did test it in both builds and we do used tese for a bunch of
other asserts and they haven't blown up.

> If what we want is "don't actually check this condition in

> the non-tcg-debug config" then we should do something

> that means we don't actually check the condition...


Hmm:

28	intptr_t qemu_real_host_page_mask;
29
30	#ifndef CONFIG_USER_ONLY
31	/* mask must never be zero, except for A20 change call */
32	static void tcg_handle_interrupt(CPUState *cpu, int mask)
33	{
34	    int old_mask;
35	    tcg_debug_assert(qemu_mutex_iothread_locked());
36
37	    old_mask = cpu->interrupt_request;
Line 34 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0a <tcg_handle_interrupt+10> but contains no code.
Line 35 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0a <tcg_handle_interrupt+10> and ends at 0x24db0f <tcg_handle_interrupt+15>.
Line 36 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0f <tcg_handle_interrupt+15> but contains no code.
Line 37 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0f <tcg_handle_interrupt+15> and ends at 0x24db15 <tcg_handle_interrupt+21>.
   0x24db0a <tcg_handle_interrupt+10>:	callq  0x27a570 <qemu_mutex_iothread_locked>
   0x24db0f <tcg_handle_interrupt+15>:	mov    0xa8(%rbx),%ebp
   0x24db15 <tcg_handle_interrupt+21>:	mov    %r12d,%eax
   0x24db18 <tcg_handle_interrupt+24>:	mov    %rbx,%rdi
   0x24db1b <tcg_handle_interrupt+27>:	or     %ebp,%eax
   0x24db1d <tcg_handle_interrupt+29>:	mov    %eax,0xa8(%rbx)
   0x24db23 <tcg_handle_interrupt+35>:	callq  0x27a530 <qemu_cpu_is_self>

It certainly looks as though it makes the call but ignores the result?

>

> thanks

> -- PMM



--
Alex Bennée
Peter Maydell March 3, 2017, 11:19 a.m. | #3
On 3 March 2017 at 11:05, Alex Bennée <alex.bennee@linaro.org> wrote:
> According to the commit that added it

> (c552d6c038f7cf4058d1fd5987118ffd41e0e050) it is meant to be a hint to

> the compiler. Reading the GCC notes however seems to contradict that.

>

> FWIW I did test it in both builds and we do used tese for a bunch of

> other asserts and they haven't blown up.

>

>> If what we want is "don't actually check this condition in

>> the non-tcg-debug config" then we should do something

>> that means we don't actually check the condition...

>

> Hmm:

>

> 28      intptr_t qemu_real_host_page_mask;

> 29

> 30      #ifndef CONFIG_USER_ONLY

> 31      /* mask must never be zero, except for A20 change call */

> 32      static void tcg_handle_interrupt(CPUState *cpu, int mask)

> 33      {

> 34          int old_mask;

> 35          tcg_debug_assert(qemu_mutex_iothread_locked());

> 36

> 37          old_mask = cpu->interrupt_request;

> Line 34 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0a <tcg_handle_interrupt+10> but contains no code.

> Line 35 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0a <tcg_handle_interrupt+10> and ends at 0x24db0f <tcg_handle_interrupt+15>.

> Line 36 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0f <tcg_handle_interrupt+15> but contains no code.

> Line 37 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0f <tcg_handle_interrupt+15> and ends at 0x24db15 <tcg_handle_interrupt+21>.

>    0x24db0a <tcg_handle_interrupt+10>:  callq  0x27a570 <qemu_mutex_iothread_locked>

>    0x24db0f <tcg_handle_interrupt+15>:  mov    0xa8(%rbx),%ebp

>    0x24db15 <tcg_handle_interrupt+21>:  mov    %r12d,%eax

>    0x24db18 <tcg_handle_interrupt+24>:  mov    %rbx,%rdi

>    0x24db1b <tcg_handle_interrupt+27>:  or     %ebp,%eax

>    0x24db1d <tcg_handle_interrupt+29>:  mov    %eax,0xa8(%rbx)

>    0x24db23 <tcg_handle_interrupt+35>:  callq  0x27a530 <qemu_cpu_is_self>

>

> It certainly looks as though it makes the call but ignores the result?


That is one permitted implementation of the undefined behaviour,
certainly. Not the only one, though. In particular you're telling
the compiler's optimization passes "this can never be reached"
which can result in the optimizer making significantly different
decisions (especially if it manages to inline things or it can
look "inside" the condition being asserted here, etc etc).

thanks
-- PMM
Paolo Bonzini March 3, 2017, 11:49 a.m. | #4
On 03/03/2017 11:08, Peter Maydell wrote:
> On 2 March 2017 at 19:53, Alex Bennée <alex.bennee@linaro.org> wrote:

>> While on MTTCG hosts it is very important that updates to

>> cpu->interrupt_request are protected by the BQL not all guests have

>> been converted to the correct locking yet. As a result we are seeing

>> breaking on non-MTTCG enabled guests in production builds.

>>

>> The locking in the guests needs to be fixed but while running single

>> threaded they will continue to work. By moving the asserts to

>> tcg_debug_asserts() they will still be useful during conversion

>> work (much like the existing assert_memory_lock/assert_tb_lock

>> asserts).

>>

>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>> ---

>>  translate-all.c    | 2 +-

>>  translate-common.c | 3 ++-

>>  2 files changed, 3 insertions(+), 2 deletions(-)

>>

>> diff --git a/translate-all.c b/translate-all.c

>> index 9bac061c9b..7ee273410d 100644

>> --- a/translate-all.c

>> +++ b/translate-all.c

>> @@ -1928,7 +1928,7 @@ void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)

>>

>>  void cpu_interrupt(CPUState *cpu, int mask)

>>  {

>> -    g_assert(qemu_mutex_iothread_locked());

>> +    tcg_debug_assert(qemu_mutex_iothread_locked());

> 

> If CONFIG_DEBUG_TCG isn't enabled then tcg_debug_assert()

> turns into "if (!(X)) { __builtin_unreachable(); }", which

> means that instead of asserting we now run straight

> into compiler undefined behaviour, don't we?

> If what we want is "don't actually check this condition in

> the non-tcg-debug config" then we should do something

> that means we don't actually check the condition...


I think we should fix it for good and leave this as a hard assertion.

Paolo
Richard Henderson March 3, 2017, 7:35 p.m. | #5
On 03/03/2017 10:19 PM, Peter Maydell wrote:
> On 3 March 2017 at 11:05, Alex Bennée <alex.bennee@linaro.org> wrote:

>> According to the commit that added it

>> (c552d6c038f7cf4058d1fd5987118ffd41e0e050) it is meant to be a hint to

>> the compiler. Reading the GCC notes however seems to contradict that.

>>

>> FWIW I did test it in both builds and we do used tese for a bunch of

>> other asserts and they haven't blown up.

>>

>>> If what we want is "don't actually check this condition in

>>> the non-tcg-debug config" then we should do something

>>> that means we don't actually check the condition...

>>

>> Hmm:

>>

>> 28      intptr_t qemu_real_host_page_mask;

>> 29

>> 30      #ifndef CONFIG_USER_ONLY

>> 31      /* mask must never be zero, except for A20 change call */

>> 32      static void tcg_handle_interrupt(CPUState *cpu, int mask)

>> 33      {

>> 34          int old_mask;

>> 35          tcg_debug_assert(qemu_mutex_iothread_locked());

>> 36

>> 37          old_mask = cpu->interrupt_request;

>> Line 34 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0a <tcg_handle_interrupt+10> but contains no code.

>> Line 35 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0a <tcg_handle_interrupt+10> and ends at 0x24db0f <tcg_handle_interrupt+15>.

>> Line 36 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0f <tcg_handle_interrupt+15> but contains no code.

>> Line 37 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0f <tcg_handle_interrupt+15> and ends at 0x24db15 <tcg_handle_interrupt+21>.

>>    0x24db0a <tcg_handle_interrupt+10>:  callq  0x27a570 <qemu_mutex_iothread_locked>

>>    0x24db0f <tcg_handle_interrupt+15>:  mov    0xa8(%rbx),%ebp

>>    0x24db15 <tcg_handle_interrupt+21>:  mov    %r12d,%eax

>>    0x24db18 <tcg_handle_interrupt+24>:  mov    %rbx,%rdi

>>    0x24db1b <tcg_handle_interrupt+27>:  or     %ebp,%eax

>>    0x24db1d <tcg_handle_interrupt+29>:  mov    %eax,0xa8(%rbx)

>>    0x24db23 <tcg_handle_interrupt+35>:  callq  0x27a530 <qemu_cpu_is_self>

>>

>> It certainly looks as though it makes the call but ignores the result?

>

> That is one permitted implementation of the undefined behaviour,

> certainly. Not the only one, though. In particular you're telling

> the compiler's optimization passes "this can never be reached"

> which can result in the optimizer making significantly different

> decisions (especially if it manages to inline things or it can

> look "inside" the condition being asserted here, etc etc).


Yep, Peter's right.

Which is exactly the point when you have a condition like (X > 0); letting the 
compiler have the same information for the production build that it would have 
gleaned from the debug build.

But that's not the same as dropping the assert, which is what you wanted.

On the other hand, isn't "assert" instead of "g_assert" exactly what you 
wanted?  Don't we define NDEBUG for production builds?


r~
Eric Blake March 3, 2017, 7:47 p.m. | #6
On 03/03/2017 01:35 PM, Richard Henderson wrote:
> 

> Which is exactly the point when you have a condition like (X > 0);

> letting the compiler have the same information for the production build

> that it would have gleaned from the debug build.

> 

> But that's not the same as dropping the assert, which is what you wanted.

> 

> On the other hand, isn't "assert" instead of "g_assert" exactly what you

> wanted?  Don't we define NDEBUG for production builds?


No - no one in their right mind defines NDEBUG for qemu.  For better or
worse, there are too many places where we liberally use assert(), and
where the code WILL crash and burn when it falls through to subsequent
code if a failed assert() is not equivalent to a fatal exit.  (I still
try to make sure we avoid any new side-effects in assert in my code
reviews, but there's no way you'll convince me to audit the code base
for NDEBUG-safety violations).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Eric Blake March 3, 2017, 7:48 p.m. | #7
On 03/03/2017 01:47 PM, Eric Blake wrote:
> On 03/03/2017 01:35 PM, Richard Henderson wrote:

>>

>> Which is exactly the point when you have a condition like (X > 0);

>> letting the compiler have the same information for the production build

>> that it would have gleaned from the debug build.

>>

>> But that's not the same as dropping the assert, which is what you wanted.

>>

>> On the other hand, isn't "assert" instead of "g_assert" exactly what you

>> wanted?  Don't we define NDEBUG for production builds?

> 

> No - no one in their right mind defines NDEBUG for qemu.  For better or

> worse, there are too many places where we liberally use assert(), and

> where the code WILL crash and burn when it falls through to subsequent

> code if a failed assert() is not equivalent to a fatal exit.  (I still

> try to make sure we avoid any new side-effects in assert in my code

> reviews, but there's no way you'll convince me to audit the code base

> for NDEBUG-safety violations).


A quick git grep shows, among others:

hw/scsi/mptsas.c:     * When we do, we might be able to re-enable NDEBUG
below.
hw/scsi/mptsas.c:#ifdef NDEBUG
hw/scsi/mptsas.c:#error building with NDEBUG is not supported
hw/virtio/virtio.c:     * When we do, we might be able to re-enable
NDEBUG below.
hw/virtio/virtio.c:#ifdef NDEBUG
hw/virtio/virtio.c:#error building with NDEBUG is not supported


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Patch

diff --git a/translate-all.c b/translate-all.c
index 9bac061c9b..7ee273410d 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1928,7 +1928,7 @@  void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)
 
 void cpu_interrupt(CPUState *cpu, int mask)
 {
-    g_assert(qemu_mutex_iothread_locked());
+    tcg_debug_assert(qemu_mutex_iothread_locked());
     cpu->interrupt_request |= mask;
     cpu->tcg_exit_req = 1;
 }
diff --git a/translate-common.c b/translate-common.c
index d504dd0d33..24e05c077a 100644
--- a/translate-common.c
+++ b/translate-common.c
@@ -22,6 +22,7 @@ 
 #include "qom/cpu.h"
 #include "sysemu/cpus.h"
 #include "qemu/main-loop.h"
+#include "tcg.h"
 
 uintptr_t qemu_real_host_page_size;
 intptr_t qemu_real_host_page_mask;
@@ -31,7 +32,7 @@  intptr_t qemu_real_host_page_mask;
 static void tcg_handle_interrupt(CPUState *cpu, int mask)
 {
     int old_mask;
-    g_assert(qemu_mutex_iothread_locked());
+    tcg_debug_assert(qemu_mutex_iothread_locked());
 
     old_mask = cpu->interrupt_request;
     cpu->interrupt_request |= mask;