Message ID | 20180119084417.7100.69568.stgit@pasha-VirtualBox
---|---
State | New
On 19/01/2018 09:44, Pavel Dovgalyuk wrote:
>      while (all_cpu_threads_idle()) {
> +        qemu_mutex_lock_iothread();
>          stop_tcg_kick_timer();
>          qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
> +        qemu_mutex_unlock_iothread();
>      }

cpu_has_work cannot be called outside the BQL yet.  You first need to access
cpu->interrupt_request with atomics.

In general, testing the condition outside the mutex is a very dangerous
pattern (and I'm usually the one who enjoys dangerous patterns).

But also, taking a slightly wider look:

>  static void qemu_tcg_rr_wait_io_event(CPUState *cpu)
>  {
>      while (all_cpu_threads_idle()) {
> +        qemu_mutex_lock_iothread();
>          stop_tcg_kick_timer();
>          qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
> +        qemu_mutex_unlock_iothread();
>      }
>
>      start_tcg_kick_timer();
>
>      qemu_wait_io_event_common(cpu);
> -
> -    qemu_mutex_unlock_iothread();
>  }

You are adding a qemu_mutex_lock_iothread() to a function where it wasn't
there before.  Either it was broken before, or it is now.

Paolo
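Spelling out the lost-wakeup window in the proposed loop (the code is quoted
from the patch above; the comments are illustrative only):

while (all_cpu_threads_idle()) {        /* predicate evaluated with the BQL *not* held   */
    qemu_mutex_lock_iothread();         /* another thread can make work available and    */
    stop_tcg_kick_timer();              /* broadcast halt_cond in this window; since     */
    qemu_cond_wait(cpu->halt_cond,      /* nobody is waiting on the condition yet, the   */
                   &qemu_global_mutex); /* wakeup is lost and qemu_cond_wait() then      */
    qemu_mutex_unlock_iothread();       /* sleeps on state that is already stale.        */
}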
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 19/01/2018 09:44, Pavel Dovgalyuk wrote:
> >      while (all_cpu_threads_idle()) {
> > +        qemu_mutex_lock_iothread();
> >          stop_tcg_kick_timer();
> >          qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
> > +        qemu_mutex_unlock_iothread();
> >      }
>
> cpu_has_work cannot be called outside the BQL yet.  You first need to access
> cpu->interrupt_request with atomics.
>
> In general, testing the condition outside the mutex is a very dangerous
> pattern (and I'm usually the one who enjoys dangerous patterns).

It means that I'll have to fix all the has_work functions to avoid races,
because x86_cpu_has_work may have them?

static bool x86_cpu_has_work(CPUState *cs)
{
    X86CPU *cpu = X86_CPU(cs);
    CPUX86State *env = &cpu->env;

    return ((cs->interrupt_request & (CPU_INTERRUPT_HARD |
                                      CPU_INTERRUPT_POLL)) &&
            (env->eflags & IF_MASK)) ||
           (cs->interrupt_request & (CPU_INTERRUPT_NMI |
                                     CPU_INTERRUPT_INIT |
                                     CPU_INTERRUPT_SIPI |
                                     CPU_INTERRUPT_MCE)) ||
           ((cs->interrupt_request & CPU_INTERRUPT_SMI) &&
            !(env->hflags & HF_SMM_MASK));
}

Pavel Dovgalyuk
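For reference, the atomics conversion Paolo mentions might look roughly like
this for the x86 hook in target/i386/cpu.c.  This is only a sketch using
QEMU's atomic_read(), not part of the series, and as the replies below point
out it is not sufficient on its own:

static bool x86_cpu_has_work(CPUState *cs)
{
    X86CPU *cpu = X86_CPU(cs);
    CPUX86State *env = &cpu->env;
    /* Read the field once, atomically; but env->eflags and env->hflags are
     * still plain vCPU state, so this alone does not make the function safe
     * to call from another thread without the BQL. */
    uint32_t interrupt_request = atomic_read(&cs->interrupt_request);

    return ((interrupt_request & (CPU_INTERRUPT_HARD |
                                  CPU_INTERRUPT_POLL)) &&
            (env->eflags & IF_MASK)) ||
           (interrupt_request & (CPU_INTERRUPT_NMI |
                                 CPU_INTERRUPT_INIT |
                                 CPU_INTERRUPT_SIPI |
                                 CPU_INTERRUPT_MCE)) ||
           ((interrupt_request & CPU_INTERRUPT_SMI) &&
            !(env->hflags & HF_SMM_MASK));
}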
On 19/01/2018 13:05, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> On 19/01/2018 09:44, Pavel Dovgalyuk wrote:
>>>      while (all_cpu_threads_idle()) {
>>> +        qemu_mutex_lock_iothread();
>>>          stop_tcg_kick_timer();
>>>          qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>>> +        qemu_mutex_unlock_iothread();
>>>      }
>>
>> cpu_has_work cannot be called outside the BQL yet.  You first need to access
>> cpu->interrupt_request with atomics.
>>
>> In general, testing the condition outside the mutex is a very dangerous
>> pattern (and I'm usually the one who enjoys dangerous patterns).
>
> It means that I'll have to fix all the has_work functions to avoid races,
> because x86_cpu_has_work may have them?

Why only x86_cpu_has_work?

Even reading cs->interrupt_request outside the mutex is unsafe.

Paolo

> static bool x86_cpu_has_work(CPUState *cs)
> {
>     X86CPU *cpu = X86_CPU(cs);
>     CPUX86State *env = &cpu->env;
>
>     return ((cs->interrupt_request & (CPU_INTERRUPT_HARD |
>                                       CPU_INTERRUPT_POLL)) &&
>             (env->eflags & IF_MASK)) ||
>            (cs->interrupt_request & (CPU_INTERRUPT_NMI |
>                                      CPU_INTERRUPT_INIT |
>                                      CPU_INTERRUPT_SIPI |
>                                      CPU_INTERRUPT_MCE)) ||
>            ((cs->interrupt_request & CPU_INTERRUPT_SMI) &&
>             !(env->hflags & HF_SMM_MASK));
> }
>
> Pavel Dovgalyuk
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Subject: Re: [RFC PATCH v4 13/23] cpus: only take BQL for sleeping threads
>
> On 19/01/2018 13:05, Pavel Dovgalyuk wrote:
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> On 19/01/2018 09:44, Pavel Dovgalyuk wrote:
> >>>      while (all_cpu_threads_idle()) {
> >>> +        qemu_mutex_lock_iothread();
> >>>          stop_tcg_kick_timer();
> >>>          qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
> >>> +        qemu_mutex_unlock_iothread();
> >>>      }
> >>
> >> cpu_has_work cannot be called outside the BQL yet.  You first need to access
> >> cpu->interrupt_request with atomics.
> >>
> >> In general, testing the condition outside the mutex is a very dangerous
> >> pattern (and I'm usually the one who enjoys dangerous patterns).
> >
> > It means that I'll have to fix all the has_work functions to avoid races,
> > because x86_cpu_has_work may have them?
>
> Why only x86_cpu_has_work?
>
> Even reading cs->interrupt_request outside the mutex is unsafe.

All the vcpu functions that access interrupt controller or peripheral state may be unsafe?
How can it work safely then?

Pavel Dovgalyuk
On 19/01/2018 13:25, Pavel Dovgalyuk wrote:
>>> It means that I'll have to fix all the has_work functions to avoid races,
>>> because x86_cpu_has_work may have them?
>> Why only x86_cpu_has_work?
>>
>> Even reading cs->interrupt_request outside the mutex is unsafe.
> All the vcpu functions that access interrupt controller or peripheral state may be unsafe?
> How can it work safely then?

They do it inside the big QEMU lock.  But here you're calling cpu_has_work
(via all_cpu_threads_idle) outside the lock.

Paolo
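In other words, the pattern that keeps those accesses safe today is simply
that they happen with the BQL held; very roughly (a generic sketch, not a
quote of any particular function):

qemu_mutex_lock_iothread();     /* take the BQL (qemu_global_mutex) */
/* ... interrupt-controller or device state can be touched here; for example,
 * a device model raising an interrupt ends up setting bits in
 * cs->interrupt_request via cpu_interrupt() while the lock is held ... */
qemu_mutex_unlock_iothread();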
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Subject: Re: [RFC PATCH v4 13/23] cpus: only take BQL for sleeping threads
>
> On 19/01/2018 13:25, Pavel Dovgalyuk wrote:
> >>> It means that I'll have to fix all the has_work functions to avoid races,
> >>> because x86_cpu_has_work may have them?
> >> Why only x86_cpu_has_work?
> >>
> >> Even reading cs->interrupt_request outside the mutex is unsafe.
> > All the vcpu functions that access interrupt controller or peripheral state may be unsafe?
> > How can it work safely then?
>
> They do it inside the big QEMU lock.

Right. Without these patches.
They are within the replay lock, and the BQL does not cover vcpu execution with these patches.
Therefore RR will be ok and regular execution may encounter races?
It means that I missed something in Alex's ideas, because he prepared the initial patches.

> But here you're calling cpu_has_work (via all_cpu_threads_idle) outside the lock.

Yes, I see, but what do we have to do?

Pavel Dovgalyuk
> -----Original Message-----
> From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru]
> Subject: RE: [RFC PATCH v4 13/23] cpus: only take BQL for sleeping threads
>
> > From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> > Subject: Re: [RFC PATCH v4 13/23] cpus: only take BQL for sleeping threads
> >
> > On 19/01/2018 13:25, Pavel Dovgalyuk wrote:
> > >>> It means that I'll have to fix all the has_work functions to avoid races,
> > >>> because x86_cpu_has_work may have them?
> > >> Why only x86_cpu_has_work?
> > >>
> > >> Even reading cs->interrupt_request outside the mutex is unsafe.
> > > All the vcpu functions that access interrupt controller or peripheral state may be unsafe?
> > > How can it work safely then?
> >
> > They do it inside the big QEMU lock.
>
> Right. Without these patches.

Ah, I forgot about the unlocking in tcg_cpu_exec.
Now I see that the vcpu takes the lock when trying to access the peripherals.
Therefore, I have to fix all of the *_cpu_has_work functions, right?

> They are within the replay lock, and the BQL does not cover vcpu execution with these patches.
> Therefore RR will be ok and regular execution may encounter races?
> It means that I missed something in Alex's ideas, because he prepared the initial patches.
>
> > But here you're calling cpu_has_work (via all_cpu_threads_idle) outside the lock.
>
> Yes, I see, but what do we have to do?

Pavel Dovgalyuk
On 19/01/2018 13:36, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> Subject: Re: [RFC PATCH v4 13/23] cpus: only take BQL for sleeping threads
>>
>> On 19/01/2018 13:25, Pavel Dovgalyuk wrote:
>>>>> It means that I'll have to fix all the has_work functions to avoid races,
>>>>> because x86_cpu_has_work may have them?
>>>> Why only x86_cpu_has_work?
>>>>
>>>> Even reading cs->interrupt_request outside the mutex is unsafe.
>>> All the vcpu functions that access interrupt controller or peripheral state may be unsafe?
>>> How can it work safely then?
>>
>> They do it inside the big QEMU lock.
>
> Right. Without these patches.
> They are within the replay lock, and the BQL does not cover vcpu execution with these patches.
> Therefore RR will be ok and regular execution may encounter races?
> It means that I missed something in Alex's ideas, because he prepared the initial patches.

Yes.

>> But here you're calling cpu_has_work (via all_cpu_threads_idle) outside the lock.
>
> Yes, I see, but what do we have to do?

I don't know.  But the idiom in these patches,

    while (...) {
        lock()
        cond_wait()
        unlock()
    }

is unsafe as well, so the issue is more than just cpu_has_work.

Paolo
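For readers following the thread, the difference between the two idioms in
plain pthreads terms (a self-contained sketch; the names are made up for
illustration):

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool idle = true;           /* stands in for all_cpu_threads_idle() */

/* The shape used in the patch: the predicate is read without the mutex.
 * Another thread can clear 'idle' and signal 'cond' between the check and
 * pthread_cond_wait() acquiring the mutex; that wakeup is lost and the
 * waiter can sleep forever even though the condition is already false. */
static void wait_unsafe(void)
{
    while (idle) {
        pthread_mutex_lock(&lock);
        pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
    }
}

/* The conventional shape: the predicate is only evaluated with the mutex
 * held, and pthread_cond_wait() atomically releases the mutex while going
 * to sleep, so no wakeup can slip through the gap. */
static void wait_safe(void)
{
    pthread_mutex_lock(&lock);
    while (idle) {
        pthread_cond_wait(&cond, &lock);
    }
    pthread_mutex_unlock(&lock);
}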
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 19/01/2018 13:36, Pavel Dovgalyuk wrote:
> >> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> >> On 19/01/2018 13:25, Pavel Dovgalyuk wrote:
> >>>>> It means that I'll have to fix all the has_work functions to avoid races,
> >>>>> because x86_cpu_has_work may have them?
> >>>> Why only x86_cpu_has_work?
> >>>>
> >>>> Even reading cs->interrupt_request outside the mutex is unsafe.
> >>> All the vcpu functions that access interrupt controller or peripheral state may be unsafe?
> >>> How can it work safely then?
> >>
> >> They do it inside the big QEMU lock.
> >
> > Right. Without these patches.
> > They are within the replay lock, and the BQL does not cover vcpu execution with these patches.
> > Therefore RR will be ok and regular execution may encounter races?
> > It means that I missed something in Alex's ideas, because he prepared the initial patches.
>
> Yes.
>
> >> But here you're calling cpu_has_work (via all_cpu_threads_idle) outside the lock.
> >
> > Yes, I see, but what do we have to do?
>
> I don't know.  But the idiom in these patches,
>
>     while (...) {
>         lock()
>         cond_wait()
>         unlock()
>     }
>
> is unsafe as well, so the issue is more than just cpu_has_work.

Maybe it's better to omit this patch?
It seems that replay and regular execution are ok without it.

Pavel Dovgalyuk
diff --git a/cpus-common.c b/cpus-common.c
index 59f751e..64661c3 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -310,6 +310,11 @@ void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func,
     queue_work_on_cpu(cpu, wi);
 }
 
+/* Work items run outside of the BQL. This is essential for avoiding a
+ * deadlock for exclusive work but also applies to non-exclusive work.
+ * If the work requires cross-vCPU changes then it should use the
+ * exclusive mechanism.
+ */
 void process_queued_cpu_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
@@ -327,17 +332,9 @@ void process_queued_cpu_work(CPUState *cpu)
         }
         qemu_mutex_unlock(&cpu->work_mutex);
         if (wi->exclusive) {
-            /* Running work items outside the BQL avoids the following deadlock:
-             * 1) start_exclusive() is called with the BQL taken while another
-             * CPU is running; 2) cpu_exec in the other CPU tries to takes the
-             * BQL, so it goes to sleep; start_exclusive() is sleeping too, so
-             * neither CPU can proceed.
-             */
-            qemu_mutex_unlock_iothread();
             start_exclusive();
             wi->func(cpu, wi->data);
             end_exclusive();
-            qemu_mutex_lock_iothread();
         } else {
             wi->func(cpu, wi->data);
         }
diff --git a/cpus.c b/cpus.c
index ca86d9f..c841333 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1143,23 +1143,24 @@ static void qemu_wait_io_event_common(CPUState *cpu)
 
 static void qemu_tcg_rr_wait_io_event(CPUState *cpu)
 {
     while (all_cpu_threads_idle()) {
+        qemu_mutex_lock_iothread();
         stop_tcg_kick_timer();
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
+        qemu_mutex_unlock_iothread();
     }
 
     start_tcg_kick_timer();
 
     qemu_wait_io_event_common(cpu);
-
-    qemu_mutex_unlock_iothread();
 }
 
 static void qemu_wait_io_event(CPUState *cpu)
 {
-    qemu_mutex_lock_iothread();
     while (cpu_thread_is_idle(cpu)) {
+        qemu_mutex_lock_iothread();
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
+        qemu_mutex_unlock_iothread();
     }
 
 #ifdef _WIN32