Message ID | 20170302195337.31558-5-alex.bennee@linaro.org |
---|---|
State | New |
Headers | show |
Series | MTTCG fixups for 2.9 | expand |
On 2 March 2017 at 19:53, Alex Bennée <alex.bennee@linaro.org> wrote: > While on MTTCG hosts it is very important that updates to > cpu->interrupt_request are protected by the BQL not all guests have > been converted to the correct locking yet. As a result we are seeing > breaking on non-MTTCG enabled guests in production builds. > > The locking in the guests needs to be fixed but while running single > threaded they will continue to work. By moving the asserts to > tcg_debug_asserts() they will still be useful during conversion > work (much like the existing assert_memory_lock/assert_tb_lock > asserts). > > Signed-off-by: Alex Bennée <alex.bennee@linaro.org> > --- > translate-all.c | 2 +- > translate-common.c | 3 ++- > 2 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/translate-all.c b/translate-all.c > index 9bac061c9b..7ee273410d 100644 > --- a/translate-all.c > +++ b/translate-all.c > @@ -1928,7 +1928,7 @@ void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf) > > void cpu_interrupt(CPUState *cpu, int mask) > { > - g_assert(qemu_mutex_iothread_locked()); > + tcg_debug_assert(qemu_mutex_iothread_locked()); If CONFIG_DEBUG_TCG isn't enabled then tcg_debug_assert() turns into "if (!(X)) { __builtin_unreachable(); }", which means that instead of asserting we now run straight into compiler undefined behaviour, don't we? If what we want is "don't actually check this condition in the non-tcg-debug config" then we should do something that means we don't actually check the condition... thanks -- PMM
Peter Maydell <peter.maydell@linaro.org> writes: > On 2 March 2017 at 19:53, Alex Bennée <alex.bennee@linaro.org> wrote: >> While on MTTCG hosts it is very important that updates to >> cpu->interrupt_request are protected by the BQL not all guests have >> been converted to the correct locking yet. As a result we are seeing >> breaking on non-MTTCG enabled guests in production builds. >> >> The locking in the guests needs to be fixed but while running single >> threaded they will continue to work. By moving the asserts to >> tcg_debug_asserts() they will still be useful during conversion >> work (much like the existing assert_memory_lock/assert_tb_lock >> asserts). >> >> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> >> --- >> translate-all.c | 2 +- >> translate-common.c | 3 ++- >> 2 files changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/translate-all.c b/translate-all.c >> index 9bac061c9b..7ee273410d 100644 >> --- a/translate-all.c >> +++ b/translate-all.c >> @@ -1928,7 +1928,7 @@ void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf) >> >> void cpu_interrupt(CPUState *cpu, int mask) >> { >> - g_assert(qemu_mutex_iothread_locked()); >> + tcg_debug_assert(qemu_mutex_iothread_locked()); > > If CONFIG_DEBUG_TCG isn't enabled then tcg_debug_assert() > turns into "if (!(X)) { __builtin_unreachable(); }", which > means that instead of asserting we now run straight > into compiler undefined behaviour, don't we? According to the commit that added it (c552d6c038f7cf4058d1fd5987118ffd41e0e050) it is meant to be a hint to the compiler. Reading the GCC notes however seems to contradict that. FWIW I did test it in both builds and we do used tese for a bunch of other asserts and they haven't blown up. > If what we want is "don't actually check this condition in > the non-tcg-debug config" then we should do something > that means we don't actually check the condition... Hmm: 28 intptr_t qemu_real_host_page_mask; 29 30 #ifndef CONFIG_USER_ONLY 31 /* mask must never be zero, except for A20 change call */ 32 static void tcg_handle_interrupt(CPUState *cpu, int mask) 33 { 34 int old_mask; 35 tcg_debug_assert(qemu_mutex_iothread_locked()); 36 37 old_mask = cpu->interrupt_request; Line 34 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0a <tcg_handle_interrupt+10> but contains no code. Line 35 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0a <tcg_handle_interrupt+10> and ends at 0x24db0f <tcg_handle_interrupt+15>. Line 36 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0f <tcg_handle_interrupt+15> but contains no code. Line 37 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0f <tcg_handle_interrupt+15> and ends at 0x24db15 <tcg_handle_interrupt+21>. 0x24db0a <tcg_handle_interrupt+10>: callq 0x27a570 <qemu_mutex_iothread_locked> 0x24db0f <tcg_handle_interrupt+15>: mov 0xa8(%rbx),%ebp 0x24db15 <tcg_handle_interrupt+21>: mov %r12d,%eax 0x24db18 <tcg_handle_interrupt+24>: mov %rbx,%rdi 0x24db1b <tcg_handle_interrupt+27>: or %ebp,%eax 0x24db1d <tcg_handle_interrupt+29>: mov %eax,0xa8(%rbx) 0x24db23 <tcg_handle_interrupt+35>: callq 0x27a530 <qemu_cpu_is_self> It certainly looks as though it makes the call but ignores the result? > > thanks > -- PMM -- Alex Bennée
On 3 March 2017 at 11:05, Alex Bennée <alex.bennee@linaro.org> wrote: > According to the commit that added it > (c552d6c038f7cf4058d1fd5987118ffd41e0e050) it is meant to be a hint to > the compiler. Reading the GCC notes however seems to contradict that. > > FWIW I did test it in both builds and we do used tese for a bunch of > other asserts and they haven't blown up. > >> If what we want is "don't actually check this condition in >> the non-tcg-debug config" then we should do something >> that means we don't actually check the condition... > > Hmm: > > 28 intptr_t qemu_real_host_page_mask; > 29 > 30 #ifndef CONFIG_USER_ONLY > 31 /* mask must never be zero, except for A20 change call */ > 32 static void tcg_handle_interrupt(CPUState *cpu, int mask) > 33 { > 34 int old_mask; > 35 tcg_debug_assert(qemu_mutex_iothread_locked()); > 36 > 37 old_mask = cpu->interrupt_request; > Line 34 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0a <tcg_handle_interrupt+10> but contains no code. > Line 35 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0a <tcg_handle_interrupt+10> and ends at 0x24db0f <tcg_handle_interrupt+15>. > Line 36 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0f <tcg_handle_interrupt+15> but contains no code. > Line 37 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0f <tcg_handle_interrupt+15> and ends at 0x24db15 <tcg_handle_interrupt+21>. > 0x24db0a <tcg_handle_interrupt+10>: callq 0x27a570 <qemu_mutex_iothread_locked> > 0x24db0f <tcg_handle_interrupt+15>: mov 0xa8(%rbx),%ebp > 0x24db15 <tcg_handle_interrupt+21>: mov %r12d,%eax > 0x24db18 <tcg_handle_interrupt+24>: mov %rbx,%rdi > 0x24db1b <tcg_handle_interrupt+27>: or %ebp,%eax > 0x24db1d <tcg_handle_interrupt+29>: mov %eax,0xa8(%rbx) > 0x24db23 <tcg_handle_interrupt+35>: callq 0x27a530 <qemu_cpu_is_self> > > It certainly looks as though it makes the call but ignores the result? That is one permitted implementation of the undefined behaviour, certainly. Not the only one, though. In particular you're telling the compiler's optimization passes "this can never be reached" which can result in the optimizer making significantly different decisions (especially if it manages to inline things or it can look "inside" the condition being asserted here, etc etc). thanks -- PMM
On 03/03/2017 11:08, Peter Maydell wrote: > On 2 March 2017 at 19:53, Alex Bennée <alex.bennee@linaro.org> wrote: >> While on MTTCG hosts it is very important that updates to >> cpu->interrupt_request are protected by the BQL not all guests have >> been converted to the correct locking yet. As a result we are seeing >> breaking on non-MTTCG enabled guests in production builds. >> >> The locking in the guests needs to be fixed but while running single >> threaded they will continue to work. By moving the asserts to >> tcg_debug_asserts() they will still be useful during conversion >> work (much like the existing assert_memory_lock/assert_tb_lock >> asserts). >> >> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> >> --- >> translate-all.c | 2 +- >> translate-common.c | 3 ++- >> 2 files changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/translate-all.c b/translate-all.c >> index 9bac061c9b..7ee273410d 100644 >> --- a/translate-all.c >> +++ b/translate-all.c >> @@ -1928,7 +1928,7 @@ void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf) >> >> void cpu_interrupt(CPUState *cpu, int mask) >> { >> - g_assert(qemu_mutex_iothread_locked()); >> + tcg_debug_assert(qemu_mutex_iothread_locked()); > > If CONFIG_DEBUG_TCG isn't enabled then tcg_debug_assert() > turns into "if (!(X)) { __builtin_unreachable(); }", which > means that instead of asserting we now run straight > into compiler undefined behaviour, don't we? > If what we want is "don't actually check this condition in > the non-tcg-debug config" then we should do something > that means we don't actually check the condition... I think we should fix it for good and leave this as a hard assertion. Paolo
On 03/03/2017 10:19 PM, Peter Maydell wrote: > On 3 March 2017 at 11:05, Alex Bennée <alex.bennee@linaro.org> wrote: >> According to the commit that added it >> (c552d6c038f7cf4058d1fd5987118ffd41e0e050) it is meant to be a hint to >> the compiler. Reading the GCC notes however seems to contradict that. >> >> FWIW I did test it in both builds and we do used tese for a bunch of >> other asserts and they haven't blown up. >> >>> If what we want is "don't actually check this condition in >>> the non-tcg-debug config" then we should do something >>> that means we don't actually check the condition... >> >> Hmm: >> >> 28 intptr_t qemu_real_host_page_mask; >> 29 >> 30 #ifndef CONFIG_USER_ONLY >> 31 /* mask must never be zero, except for A20 change call */ >> 32 static void tcg_handle_interrupt(CPUState *cpu, int mask) >> 33 { >> 34 int old_mask; >> 35 tcg_debug_assert(qemu_mutex_iothread_locked()); >> 36 >> 37 old_mask = cpu->interrupt_request; >> Line 34 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0a <tcg_handle_interrupt+10> but contains no code. >> Line 35 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0a <tcg_handle_interrupt+10> and ends at 0x24db0f <tcg_handle_interrupt+15>. >> Line 36 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" is at address 0x24db0f <tcg_handle_interrupt+15> but contains no code. >> Line 37 of "/home/alex/lsrc/qemu/qemu.git/translate-common.c" starts at address 0x24db0f <tcg_handle_interrupt+15> and ends at 0x24db15 <tcg_handle_interrupt+21>. >> 0x24db0a <tcg_handle_interrupt+10>: callq 0x27a570 <qemu_mutex_iothread_locked> >> 0x24db0f <tcg_handle_interrupt+15>: mov 0xa8(%rbx),%ebp >> 0x24db15 <tcg_handle_interrupt+21>: mov %r12d,%eax >> 0x24db18 <tcg_handle_interrupt+24>: mov %rbx,%rdi >> 0x24db1b <tcg_handle_interrupt+27>: or %ebp,%eax >> 0x24db1d <tcg_handle_interrupt+29>: mov %eax,0xa8(%rbx) >> 0x24db23 <tcg_handle_interrupt+35>: callq 0x27a530 <qemu_cpu_is_self> >> >> It certainly looks as though it makes the call but ignores the result? > > That is one permitted implementation of the undefined behaviour, > certainly. Not the only one, though. In particular you're telling > the compiler's optimization passes "this can never be reached" > which can result in the optimizer making significantly different > decisions (especially if it manages to inline things or it can > look "inside" the condition being asserted here, etc etc). Yep, Peter's right. Which is exactly the point when you have a condition like (X > 0); letting the compiler have the same information for the production build that it would have gleaned from the debug build. But that's not the same as dropping the assert, which is what you wanted. On the other hand, isn't "assert" instead of "g_assert" exactly what you wanted? Don't we define NDEBUG for production builds? r~
On 03/03/2017 01:35 PM, Richard Henderson wrote: > > Which is exactly the point when you have a condition like (X > 0); > letting the compiler have the same information for the production build > that it would have gleaned from the debug build. > > But that's not the same as dropping the assert, which is what you wanted. > > On the other hand, isn't "assert" instead of "g_assert" exactly what you > wanted? Don't we define NDEBUG for production builds? No - no one in their right mind defines NDEBUG for qemu. For better or worse, there are too many places where we liberally use assert(), and where the code WILL crash and burn when it falls through to subsequent code if a failed assert() is not equivalent to a fatal exit. (I still try to make sure we avoid any new side-effects in assert in my code reviews, but there's no way you'll convince me to audit the code base for NDEBUG-safety violations). -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
On 03/03/2017 01:47 PM, Eric Blake wrote: > On 03/03/2017 01:35 PM, Richard Henderson wrote: >> >> Which is exactly the point when you have a condition like (X > 0); >> letting the compiler have the same information for the production build >> that it would have gleaned from the debug build. >> >> But that's not the same as dropping the assert, which is what you wanted. >> >> On the other hand, isn't "assert" instead of "g_assert" exactly what you >> wanted? Don't we define NDEBUG for production builds? > > No - no one in their right mind defines NDEBUG for qemu. For better or > worse, there are too many places where we liberally use assert(), and > where the code WILL crash and burn when it falls through to subsequent > code if a failed assert() is not equivalent to a fatal exit. (I still > try to make sure we avoid any new side-effects in assert in my code > reviews, but there's no way you'll convince me to audit the code base > for NDEBUG-safety violations). A quick git grep shows, among others: hw/scsi/mptsas.c: * When we do, we might be able to re-enable NDEBUG below. hw/scsi/mptsas.c:#ifdef NDEBUG hw/scsi/mptsas.c:#error building with NDEBUG is not supported hw/virtio/virtio.c: * When we do, we might be able to re-enable NDEBUG below. hw/virtio/virtio.c:#ifdef NDEBUG hw/virtio/virtio.c:#error building with NDEBUG is not supported -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
diff --git a/translate-all.c b/translate-all.c index 9bac061c9b..7ee273410d 100644 --- a/translate-all.c +++ b/translate-all.c @@ -1928,7 +1928,7 @@ void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf) void cpu_interrupt(CPUState *cpu, int mask) { - g_assert(qemu_mutex_iothread_locked()); + tcg_debug_assert(qemu_mutex_iothread_locked()); cpu->interrupt_request |= mask; cpu->tcg_exit_req = 1; } diff --git a/translate-common.c b/translate-common.c index d504dd0d33..24e05c077a 100644 --- a/translate-common.c +++ b/translate-common.c @@ -22,6 +22,7 @@ #include "qom/cpu.h" #include "sysemu/cpus.h" #include "qemu/main-loop.h" +#include "tcg.h" uintptr_t qemu_real_host_page_size; intptr_t qemu_real_host_page_mask; @@ -31,7 +32,7 @@ intptr_t qemu_real_host_page_mask; static void tcg_handle_interrupt(CPUState *cpu, int mask) { int old_mask; - g_assert(qemu_mutex_iothread_locked()); + tcg_debug_assert(qemu_mutex_iothread_locked()); old_mask = cpu->interrupt_request; cpu->interrupt_request |= mask;
While on MTTCG hosts it is very important that updates to cpu->interrupt_request are protected by the BQL not all guests have been converted to the correct locking yet. As a result we are seeing breaking on non-MTTCG enabled guests in production builds. The locking in the guests needs to be fixed but while running single threaded they will continue to work. By moving the asserts to tcg_debug_asserts() they will still be useful during conversion work (much like the existing assert_memory_lock/assert_tb_lock asserts). Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- translate-all.c | 2 +- translate-common.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) -- 2.11.0