[v2] tcg: Really fix cpu_io_recompile


Message ID

20180319031545.29359-1-richard.henderson@linaro.org

State

Superseded

Headers

Received-SPF: pass (google.com: domain of
	qemu-devel-bounces+patch=linaro.org@nongnu.org designates
	2001:4830:134:3::11 as permitted sender)
	client-ip=2001:4830:134:3::11; 
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Date: Mon, 19 Mar 2018 11:15:45 +0800
Message-Id: <20180319031545.29359-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v2] tcg: Really fix cpu_io_recompile
Precedence: list
Cc: peter.maydell@linaro.org, Pavel.Dovgaluk@ispras.ru, pbonzini@redhat.com
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>

Series

[v2] tcg: Really fix cpu_io_recompile

Commit Message

Richard Henderson March 19, 2018, 3:15 a.m. UTC

We have confused the number of instructions that have been
executed in the TB with the number of instructions needed
to repeat the I/O instruction.

We have used cpu_restore_state_from_tb, which means that
the guest pc is pointing to the I/O instruction.  The only
time the answer to the latter question is not 1 is when
MIPS or SH4 need to re-execute the branch for the delay
slot as well.
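
The rule in the paragraph above can be sketched in isolation. This is a minimal model, not QEMU code: `RecompileState`, its fields, and `insns_to_repeat` are hypothetical names invented for illustration, and the sketch leaves out the icount bookkeeping that the real function also adjusts. It assumes the guest pc has already been restored to the I/O instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified state; these are not QEMU's structures. */
typedef struct {
    uint64_t tb_pc;         /* guest address of the first insn in the TB */
    uint64_t insn_pc;       /* restored guest pc: points at the I/O insn */
    bool     in_delay_slot; /* I/O insn sits in a branch delay slot */
    uint32_t branch_size;   /* size in bytes of the preceding branch */
} RecompileState;

/*
 * Return how many guest insns the replacement TB must contain.
 * Normally only the I/O insn itself (1).  If the I/O insn is in a
 * delay slot and is not the first insn of the TB, back the pc up to
 * the branch so that branch and delay slot are re-executed (2).
 */
static uint32_t insns_to_repeat(RecompileState *s)
{
    assert(s != NULL);
    if (s->in_delay_slot && s->insn_pc != s->tb_pc) {
        s->insn_pc -= s->branch_size;  /* restart at the branch */
        s->in_delay_slot = false;      /* the branch has not run yet */
        return 2;
    }
    return 1;
}
```

Note that the result does not depend on how many instructions ran before the I/O instruction, which is the quantity the old code computed.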

We must rely on cpu->cflags_next_tb to generate the next TB,
as otherwise we have a race condition with other guest cpus
within the TB cache.
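
As a rough illustration of the per-cpu request described above, the sketch below models a one-shot flags word that the cpu's own execution loop consumes when it builds its next TB. Everything here is an assumption made for the example: `ExCpu`, the `EX_*` constants and their bit values, and both helper functions are hypothetical stand-ins, not QEMU's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout only; the values are assumptions. */
#define EX_COUNT_MASK 0x00007fffu  /* low bits: max insns in the TB */
#define EX_LAST_IO    0x00008000u  /* last insn may be an I/O access */
#define EX_NO_REQUEST UINT32_MAX   /* sentinel: nothing requested */

/* Hypothetical per-cpu state holding the pending request. */
typedef struct {
    uint32_t cflags_next_tb;
} ExCpu;

/* Record that the next TB for this cpu must hold exactly n insns. */
static void ex_request_io_tb(ExCpu *cpu, uint32_t base_cflags, uint32_t n)
{
    assert(n >= 1 && n <= EX_COUNT_MASK);
    cpu->cflags_next_tb = (base_cflags & ~EX_COUNT_MASK) | EX_LAST_IO | n;
}

/* Called by the cpu's own loop: take the request once, then clear it. */
static uint32_t ex_take_cflags(ExCpu *cpu, uint32_t default_cflags)
{
    uint32_t cflags = cpu->cflags_next_tb;
    if (cflags == EX_NO_REQUEST) {
        return default_cflags;
    }
    cpu->cflags_next_tb = EX_NO_REQUEST;
    return cflags;
}
```

Because the request lives in per-cpu state and is consumed by that cpu alone, the recompile path in this model never has to create a block on the cpu's behalf while other cpus are using the shared cache.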

Fixes: 0790f86861079b1932679d0f011e431aaf4ee9e2
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

---

My v1 raced with Paolo's pull request, so v2 now fixes Pavel's fix.


r~

---
 accel/tcg/translate-all.c | 37 ++++++++++---------------------------
 1 file changed, 10 insertions(+), 27 deletions(-)

-- 
2.14.3

Comments

Pavel Dovgalyuk March 19, 2018, 6:30 a.m. UTC | #1

> From: Richard Henderson [mailto:richard.henderson@linaro.org]
> We have confused the number of instructions that have been
> executed in the TB with the number of instructions needed
> to repeat the I/O instruction.
>
> We have used cpu_restore_state_from_tb, which means that
> the guest pc is pointing to the I/O instruction.  The only
> time the answer to the latter question is not 1 is when
> MIPS or SH4 need to re-execute the branch for the delay
> slot as well.
>
> We must rely on cpu->cflags_next_tb to generate the next TB,
> as otherwise we have a race condition with other guest cpus
> within the TB cache.
>
> Fixes: 0790f86861079b1932679d0f011e431aaf4ee9e2
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>
> My v1 raced with Paolo's pull request, so v2 now fixes Pavel's fix.
>


Works for Ciro's ARM sample and doesn't break icount and replay for i386.
Tested-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>



Pavel Dovgalyuk

Paolo Bonzini March 19, 2018, 3:54 p.m. UTC | #2

On 19/03/2018 04:15, Richard Henderson wrote:
> We have confused the number of instructions that have been
> executed in the TB with the number of instructions needed
> to repeat the I/O instruction.
>
> We have used cpu_restore_state_from_tb, which means that
> the guest pc is pointing to the I/O instruction.  The only
> time the answer to the latter question is not 1 is when
> MIPS or SH4 need to re-execute the branch for the delay
> slot as well.
>
> We must rely on cpu->cflags_next_tb to generate the next TB,
> as otherwise we have a race condition with other guest cpus
> within the TB cache.
>
> Fixes: 0790f86861079b1932679d0f011e431aaf4ee9e2
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>
> My v1 raced with Paolo's pull request, so v2 now fixes Pavel's fix.


Thanks, let me know if you prefer to send a pull request yourself, or if
I should include it in the next.

Thanks,

Paolo


Richard Henderson March 20, 2018, 12:39 a.m. UTC | #3

On 03/19/2018 11:54 PM, Paolo Bonzini wrote:
> On 19/03/2018 04:15, Richard Henderson wrote:
>> We have confused the number of instructions that have been
>> executed in the TB with the number of instructions needed
>> to repeat the I/O instruction.
>>
>> We have used cpu_restore_state_from_tb, which means that
>> the guest pc is pointing to the I/O instruction.  The only
>> time the answer to the latter question is not 1 is when
>> MIPS or SH4 need to re-execute the branch for the delay
>> slot as well.
>>
>> We must rely on cpu->cflags_next_tb to generate the next TB,
>> as otherwise we have a race condition with other guest cpus
>> within the TB cache.
>>
>> Fixes: 0790f86861079b1932679d0f011e431aaf4ee9e2
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>
>> My v1 raced with Paolo's pull request, so v2 now fixes Pavel's fix.
>
> Thanks, let me know if you prefer to send a pull request yourself, or if
> I should include it in the next.


I'm at Linaro Connect this week.  Please include this in your next.


r~

Philippe Mathieu-Daudé March 20, 2018, 12:52 a.m. UTC | #4

On 03/19/2018 04:15 AM, Richard Henderson wrote:
> We have confused the number of instructions that have been
> executed in the TB with the number of instructions needed
> to repeat the I/O instruction.
>
> We have used cpu_restore_state_from_tb, which means that
> the guest pc is pointing to the I/O instruction.  The only
> time the answer to the latter question is not 1 is when
> MIPS or SH4 need to re-execute the branch for the delay
> slot as well.
>
> We must rely on cpu->cflags_next_tb to generate the next TB,
> as otherwise we have a race condition with other guest cpus
> within the TB cache.
>
> Fixes: 0790f86861079b1932679d0f011e431aaf4ee9e2
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>



diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5ad1b919bc..d4190602d1 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1728,8 +1728,7 @@  void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     CPUArchState *env = cpu->env_ptr;
 #endif
     TranslationBlock *tb;
-    uint32_t n, flags;
-    target_ulong pc, cs_base;
+    uint32_t n;
 
     tb_lock();
     tb = tb_find_pc(retaddr);
@@ -1737,44 +1736,33 @@  void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
         cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
                   (void *)retaddr);
     }
-    n = cpu->icount_decr.u16.low + tb->icount;
     cpu_restore_state_from_tb(cpu, tb, retaddr);
-    /* Calculate how many instructions had been executed before the fault
-       occurred.  */
-    n = n - cpu->icount_decr.u16.low;
-    /* Generate a new TB ending on the I/O insn.  */
-    n++;
+
     /* On MIPS and SH, delay slot instructions can only be restarted if
        they were already the first instruction in the TB.  If this is not
        the first instruction in a TB then re-execute the preceding
        branch.  */
+    n = 1;
 #if defined(TARGET_MIPS)
-    if ((env->hflags & MIPS_HFLAG_BMASK) != 0 && n > 1) {
+    if ((env->hflags & MIPS_HFLAG_BMASK) != 0
+        && env->active_tc.PC != tb->pc) {
         env->active_tc.PC -= (env->hflags & MIPS_HFLAG_B16 ? 2 : 4);
         cpu->icount_decr.u16.low++;
         env->hflags &= ~MIPS_HFLAG_BMASK;
+        n = 2;
     }
 #elif defined(TARGET_SH4)
     if ((env->flags & ((DELAY_SLOT | DELAY_SLOT_CONDITIONAL))) != 0
-            && n > 1) {
+        && env->pc != tb->pc) {
         env->pc -= 2;
         cpu->icount_decr.u16.low++;
         env->flags &= ~(DELAY_SLOT | DELAY_SLOT_CONDITIONAL);
+        n = 2;
     }
 #endif
-    /* This should never happen.  */
-    if (n > CF_COUNT_MASK) {
-        cpu_abort(cpu, "TB too big during recompile");
-    }
 
-    pc = tb->pc;
-    cs_base = tb->cs_base;
-    flags = tb->flags;
-    tb_phys_invalidate(tb, -1);
-
-    /* Execute one IO instruction without caching
-       instead of creating large TB. */
-    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | CF_NOCACHE | 1;
+    /* Generate a new TB executing the I/O insn.  */
+    cpu->cflags_next_tb = curr_cflags() | CF_LAST_IO | n;
 
     if (tb->cflags & CF_NOCACHE) {
         if (tb->orig_tb) {
@@ -1785,11 +1773,6 @@  void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
         tb_remove(tb);
     }
 
-    /* Generate new TB instead of the current one. */
-    /* FIXME: In theory this could raise an exception.  In practice
-       we have already translated the block once so it's probably ok.  */
-    tb_gen_code(cpu, pc, cs_base, flags, curr_cflags() | CF_LAST_IO | n);
-
     /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
      * the first in the TB) then we end up generating a whole new TB and
      *  repeating the fault, which is horribly inefficient.
