diff mbox series

[PULL,18/23] accel/tcg: re-factor non-RAM execution code

Message ID 20210218094706.23038-19-alex.bennee@linaro.org
State Accepted
Commit 873d64ac30e64a5d0d91fca7320f4bcba19bf230
Headers show
Series plugin updates (hwprofile, CF_NOCACHE, io_recompile) | expand

Commit Message

Alex Bennée Feb. 18, 2021, 9:47 a.m. UTC
There is no real need to use CF_NOCACHE here. As long as the TB isn't
linked to other TBs or included in the QHT or jump cache then it will
only get executed once.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Message-Id: <20210213130325.14781-19-alex.bennee@linaro.org>

-- 
2.20.1

Comments

Peter Maydell April 15, 2021, 1:18 p.m. UTC | #1
On Thu, 18 Feb 2021 at 09:47, Alex Bennée <alex.bennee@linaro.org> wrote:
>

> There is no real need to use CF_NOCACHE here. As long as the TB isn't

> linked to other TBs or included in the QHT or jump cache then it will

> only get executed once.

>

> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

> Message-Id: <20210213130325.14781-19-alex.bennee@linaro.org>


Hi; I've just noticed that this commit seems to break the case of:
 * execution of code not from a RAM block
 * when icount is enabled
 * and an instruction is an IO insn that triggers io-recompile

because:

> @@ -2097,6 +2086,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

>          tb_reset_jump(tb, 1);

>      }

>

> +    /*

> +     * If the TB is not associated with a physical RAM page then

> +     * it must be a temporary one-insn TB, and we have nothing to do

> +     * except fill in the page_addr[] fields. Return early before

> +     * attempting to link to other TBs or add to the lookup table.

> +     */

> +    if (phys_pc == -1) {

> +        tb->page_addr[0] = tb->page_addr[1] = -1;

> +        return tb;

> +    }


we used to fall through here, which meant we called
tcg_tb_insert(tb). No we no longer do. That's bad, because
cpu_io_recompile() does:

    tb = tcg_tb_lookup(retaddr);
    if (!tb) {
        cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
                  (void *)retaddr);
    }

and since it can no longer find the TB, QEMU aborts.

thanks
-- PMM
Peter Maydell April 15, 2021, 1:37 p.m. UTC | #2
On Thu, 15 Apr 2021 at 14:18, Peter Maydell <peter.maydell@linaro.org> wrote:
>

> On Thu, 18 Feb 2021 at 09:47, Alex Bennée <alex.bennee@linaro.org> wrote:

> >

> > There is no real need to use CF_NOCACHE here. As long as the TB isn't

> > linked to other TBs or included in the QHT or jump cache then it will

> > only get executed once.

> >

> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> > Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

> > Message-Id: <20210213130325.14781-19-alex.bennee@linaro.org>

>

> Hi; I've just noticed that this commit seems to break the case of:

>  * execution of code not from a RAM block

>  * when icount is enabled

>  * and an instruction is an IO insn that triggers io-recompile

>

> because:

>

> > @@ -2097,6 +2086,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

> >          tb_reset_jump(tb, 1);

> >      }

> >

> > +    /*

> > +     * If the TB is not associated with a physical RAM page then

> > +     * it must be a temporary one-insn TB, and we have nothing to do

> > +     * except fill in the page_addr[] fields. Return early before

> > +     * attempting to link to other TBs or add to the lookup table.

> > +     */

> > +    if (phys_pc == -1) {

> > +        tb->page_addr[0] = tb->page_addr[1] = -1;

> > +        return tb;

> > +    }

>

> we used to fall through here, which meant we called

> tcg_tb_insert(tb). No we no longer do. That's bad, because

> cpu_io_recompile() does:

>

>     tb = tcg_tb_lookup(retaddr);

>     if (!tb) {

>         cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",

>                   (void *)retaddr);

>     }

>

> and since it can no longer find the TB, QEMU aborts.


Adding the tcg_tb_insert() call to the early exit path:

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ba6ab09790e..6014285e4dc 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -2081,6 +2081,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
      */
     if (phys_pc == -1) {
         tb->page_addr[0] = tb->page_addr[1] = -1;
+        tcg_tb_insert(tb);
         return tb;
     }

seems to fix my test case, but I don't know enough about the new
design here to know if that has undesirable side effects.

-- PMM
Alex Bennée April 15, 2021, 2:31 p.m. UTC | #3
--8<---------------cut here---------------start------------->8---

Peter Maydell <peter.maydell@linaro.org> writes:

> On Thu, 15 Apr 2021 at 14:18, Peter Maydell <peter.maydell@linaro.org> wrote:

>>

>> On Thu, 18 Feb 2021 at 09:47, Alex Bennée <alex.bennee@linaro.org> wrote:

>> >

>> > There is no real need to use CF_NOCACHE here. As long as the TB isn't

>> > linked to other TBs or included in the QHT or jump cache then it will

>> > only get executed once.

>> >

>> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>> > Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

>> > Message-Id: <20210213130325.14781-19-alex.bennee@linaro.org>

>>

>> Hi; I've just noticed that this commit seems to break the case of:

>>  * execution of code not from a RAM block

>>  * when icount is enabled

>>  * and an instruction is an IO insn that triggers io-recompile

>>

>> because:

>>

>> > @@ -2097,6 +2086,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

>> >          tb_reset_jump(tb, 1);

>> >      }

>> >

>> > +    /*

>> > +     * If the TB is not associated with a physical RAM page then

>> > +     * it must be a temporary one-insn TB, and we have nothing to do

>> > +     * except fill in the page_addr[] fields. Return early before

>> > +     * attempting to link to other TBs or add to the lookup table.

>> > +     */

>> > +    if (phys_pc == -1) {

>> > +        tb->page_addr[0] = tb->page_addr[1] = -1;

>> > +        return tb;

>> > +    }

>>

>> we used to fall through here, which meant we called

>> tcg_tb_insert(tb). No we no longer do. That's bad, because

>> cpu_io_recompile() does:

>>

>>     tb = tcg_tb_lookup(retaddr);

>>     if (!tb) {

>>         cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",

>>                   (void *)retaddr);

>>     }

>>

>> and since it can no longer find the TB, QEMU aborts.

>

> Adding the tcg_tb_insert() call to the early exit path:

>

> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c

> index ba6ab09790e..6014285e4dc 100644

> --- a/accel/tcg/translate-all.c

> +++ b/accel/tcg/translate-all.c

> @@ -2081,6 +2081,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

>       */

>      if (phys_pc == -1) {

>          tb->page_addr[0] = tb->page_addr[1] = -1;

> +        tcg_tb_insert(tb);

>          return tb;

>      }

>

> seems to fix my test case, but I don't know enough about the new

> design here to know if that has undesirable side effects.


No we don't want to do that as the comment says above. However as it's a
single instruction block it can do IO so could you try this instead
please:

--8<---------------cut here---------------start------------->8---
accel/tcg: avoid re-translating one-shot instructions

By definition a single instruction is capable of being an IO
instruction. This avoids a problem of triggering a cpu_io_recompile on
a non-cached translation which would only do exactly this anyway.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>


1 file changed, 1 insertion(+), 1 deletion(-)
accel/tcg/translate-all.c | 2 +-

modified   accel/tcg/translate-all.c
@@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
     if (phys_pc == -1) {
         /* Generate a one-shot TB with 1 insn in it */
-        cflags = (cflags & ~CF_COUNT_MASK) | 1;
+        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;
     }
 
     max_insns = cflags & CF_COUNT_MASK;
--8<---------------cut here---------------end--------------->8---


-- 
Alex Bennée
Peter Maydell April 15, 2021, 2:54 p.m. UTC | #4
On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:
> --8<---------------cut here---------------start------------->8---

> accel/tcg: avoid re-translating one-shot instructions

>

> By definition a single instruction is capable of being an IO

> instruction. This avoids a problem of triggering a cpu_io_recompile on

> a non-cached translation which would only do exactly this anyway.

>

> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>

> 1 file changed, 1 insertion(+), 1 deletion(-)

> accel/tcg/translate-all.c | 2 +-

>

> modified   accel/tcg/translate-all.c

> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

>

>      if (phys_pc == -1) {

>          /* Generate a one-shot TB with 1 insn in it */

> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;

> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;

>      }

>

>      max_insns = cflags & CF_COUNT_MASK;

> --8<---------------cut here---------------end--------------->8---


Yes, this fixes the problem. Do we want to put this in for 6.0? My
feeling is that executing from non-RAM is pretty niche, so maybe
if we need an rc4 anyway, but this isn't important enough to cause an
rc4 itself.

-- PMM
Philippe Mathieu-Daudé April 15, 2021, 3:55 p.m. UTC | #5
On 4/15/21 4:54 PM, Peter Maydell wrote:
> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:

>> --8<---------------cut here---------------start------------->8---

>> accel/tcg: avoid re-translating one-shot instructions

>>

>> By definition a single instruction is capable of being an IO

>> instruction. This avoids a problem of triggering a cpu_io_recompile on

>> a non-cached translation which would only do exactly this anyway.

>>

>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>>

>> 1 file changed, 1 insertion(+), 1 deletion(-)

>> accel/tcg/translate-all.c | 2 +-

>>

>> modified   accel/tcg/translate-all.c

>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

>>

>>      if (phys_pc == -1) {

>>          /* Generate a one-shot TB with 1 insn in it */

>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;

>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;

>>      }

>>

>>      max_insns = cflags & CF_COUNT_MASK;

>> --8<---------------cut here---------------end--------------->8---

> 

> Yes, this fixes the problem. Do we want to put this in for 6.0? My

> feeling is that executing from non-RAM is pretty niche, so maybe

> if we need an rc4 anyway, but this isn't important enough to cause an

> rc4 itself.


Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).
Cédric Le Goater April 15, 2021, 5:18 p.m. UTC | #6
On 4/15/21 5:55 PM, Philippe Mathieu-Daudé wrote:
> On 4/15/21 4:54 PM, Peter Maydell wrote:

>> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:

>>> --8<---------------cut here---------------start------------->8---

>>> accel/tcg: avoid re-translating one-shot instructions

>>>

>>> By definition a single instruction is capable of being an IO

>>> instruction. This avoids a problem of triggering a cpu_io_recompile on

>>> a non-cached translation which would only do exactly this anyway.

>>>

>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>>>

>>> 1 file changed, 1 insertion(+), 1 deletion(-)

>>> accel/tcg/translate-all.c | 2 +-

>>>

>>> modified   accel/tcg/translate-all.c

>>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

>>>

>>>      if (phys_pc == -1) {

>>>          /* Generate a one-shot TB with 1 insn in it */

>>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;

>>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;

>>>      }

>>>

>>>      max_insns = cflags & CF_COUNT_MASK;

>>> --8<---------------cut here---------------end--------------->8---

>>

>> Yes, this fixes the problem. Do we want to put this in for 6.0? My

>> feeling is that executing from non-RAM is pretty niche, so maybe

>> if we need an rc4 anyway, but this isn't important enough to cause an

>> rc4 itself.

> 

> Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).


You need to set the 'execute-in-place' machine option to load/execute the
instructions from the AHB window of CE0. It's not on by default because 
boot can be really slow with some recent u-boot which heavily trash the TBs.

But this seems to work fine with -rc3. 

C.
Peter Maydell April 15, 2021, 5:34 p.m. UTC | #7
On Thu, 15 Apr 2021 at 18:18, Cédric Le Goater <clg@kaod.org> wrote:
>

> On 4/15/21 5:55 PM, Philippe Mathieu-Daudé wrote:

> > On 4/15/21 4:54 PM, Peter Maydell wrote:

> >> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:

> >>> --8<---------------cut here---------------start------------->8---

> >>> accel/tcg: avoid re-translating one-shot instructions

> >>>

> >>> By definition a single instruction is capable of being an IO

> >>> instruction. This avoids a problem of triggering a cpu_io_recompile on

> >>> a non-cached translation which would only do exactly this anyway.

> >>>

> >>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> >>>

> >>> 1 file changed, 1 insertion(+), 1 deletion(-)

> >>> accel/tcg/translate-all.c | 2 +-

> >>>

> >>> modified   accel/tcg/translate-all.c

> >>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

> >>>

> >>>      if (phys_pc == -1) {

> >>>          /* Generate a one-shot TB with 1 insn in it */

> >>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;

> >>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;

> >>>      }

> >>>

> >>>      max_insns = cflags & CF_COUNT_MASK;

> >>> --8<---------------cut here---------------end--------------->8---

> >>

> >> Yes, this fixes the problem. Do we want to put this in for 6.0? My

> >> feeling is that executing from non-RAM is pretty niche, so maybe

> >> if we need an rc4 anyway, but this isn't important enough to cause an

> >> rc4 itself.

> >

> > Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).

>

> You need to set the 'execute-in-place' machine option to load/execute the

> instructions from the AHB window of CE0. It's not on by default because

> boot can be really slow with some recent u-boot which heavily trash the TBs.

>

> But this seems to work fine with -rc3.


Triggering the bug requires both execute-in-place and -icount -- did
you test with -icount enabled?

thanks
-- PMM
Cédric Le Goater April 16, 2021, 7:55 a.m. UTC | #8
On 4/15/21 7:34 PM, Peter Maydell wrote:
> On Thu, 15 Apr 2021 at 18:18, Cédric Le Goater <clg@kaod.org> wrote:

>>

>> On 4/15/21 5:55 PM, Philippe Mathieu-Daudé wrote:

>>> On 4/15/21 4:54 PM, Peter Maydell wrote:

>>>> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:

>>>>> --8<---------------cut here---------------start------------->8---

>>>>> accel/tcg: avoid re-translating one-shot instructions

>>>>>

>>>>> By definition a single instruction is capable of being an IO

>>>>> instruction. This avoids a problem of triggering a cpu_io_recompile on

>>>>> a non-cached translation which would only do exactly this anyway.

>>>>>

>>>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>>>>>

>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)

>>>>> accel/tcg/translate-all.c | 2 +-

>>>>>

>>>>> modified   accel/tcg/translate-all.c

>>>>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

>>>>>

>>>>>      if (phys_pc == -1) {

>>>>>          /* Generate a one-shot TB with 1 insn in it */

>>>>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;

>>>>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;

>>>>>      }

>>>>>

>>>>>      max_insns = cflags & CF_COUNT_MASK;

>>>>> --8<---------------cut here---------------end--------------->8---

>>>>

>>>> Yes, this fixes the problem. Do we want to put this in for 6.0? My

>>>> feeling is that executing from non-RAM is pretty niche, so maybe

>>>> if we need an rc4 anyway, but this isn't important enough to cause an

>>>> rc4 itself.

>>>

>>> Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).

>>

>> You need to set the 'execute-in-place' machine option to load/execute the

>> instructions from the AHB window of CE0. It's not on by default because

>> boot can be really slow with some recent u-boot which heavily trash the TBs.

>>

>> But this seems to work fine with -rc3.

> 

> Triggering the bug requires both execute-in-place and -icount -- did

> you test with -icount enabled?


It crashes.

Thanks,

C. 

$ qemu-system-arm -M romulus-bmc,execute-in-place=true -icount auto -drive file=./flash-romulus,format=raw,if=mtd  -serial mon:stdio
qemu: fatal: cpu_io_recompile: could not find TB for pc=0x7efbcc001992
R00=0005107a R01=00000000 R02=00000000 R03=00000000
R04=00000350 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=00000000 R14=00000350 R15=00000c70
PSR=400001d3 -Z-- A S svc32
s00=00000000 s01=00000000 d00=0000000000000000
s02=00000000 s03=00000000 d01=0000000000000000
s04=00000000 s05=00000000 d02=0000000000000000
s06=00000000 s07=00000000 d03=0000000000000000
s08=00000000 s09=00000000 d04=0000000000000000
s10=00000000 s11=00000000 d05=0000000000000000
s12=00000000 s13=00000000 d06=0000000000000000
s14=00000000 s15=00000000 d07=0000000000000000
s16=00000000 s17=00000000 d08=0000000000000000
s18=00000000 s19=00000000 d09=0000000000000000
s20=00000000 s21=00000000 d10=0000000000000000
s22=00000000 s23=00000000 d11=0000000000000000
s24=00000000 s25=00000000 d12=0000000000000000
s26=00000000 s27=00000000 d13=0000000000000000
s28=00000000 s29=00000000 d14=0000000000000000
s30=00000000 s31=00000000 d15=0000000000000000
FPSCR: 00000000
Aborted (core dumped)
Alex Bennée April 16, 2021, 9:14 a.m. UTC | #9
Cédric Le Goater <clg@kaod.org> writes:

> On 4/15/21 7:34 PM, Peter Maydell wrote:

>> On Thu, 15 Apr 2021 at 18:18, Cédric Le Goater <clg@kaod.org> wrote:

>>>

>>> On 4/15/21 5:55 PM, Philippe Mathieu-Daudé wrote:

>>>> On 4/15/21 4:54 PM, Peter Maydell wrote:

>>>>> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:

>>>>>> --8<---------------cut here---------------start------------->8---

>>>>>> accel/tcg: avoid re-translating one-shot instructions

>>>>>>

>>>>>> By definition a single instruction is capable of being an IO

>>>>>> instruction. This avoids a problem of triggering a cpu_io_recompile on

>>>>>> a non-cached translation which would only do exactly this anyway.

>>>>>>

>>>>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>>>>>>

>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)

>>>>>> accel/tcg/translate-all.c | 2 +-

>>>>>>

>>>>>> modified   accel/tcg/translate-all.c

>>>>>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

>>>>>>

>>>>>>      if (phys_pc == -1) {

>>>>>>          /* Generate a one-shot TB with 1 insn in it */

>>>>>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;

>>>>>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;

>>>>>>      }

>>>>>>

>>>>>>      max_insns = cflags & CF_COUNT_MASK;

>>>>>> --8<---------------cut here---------------end--------------->8---

>>>>>

>>>>> Yes, this fixes the problem. Do we want to put this in for 6.0? My

>>>>> feeling is that executing from non-RAM is pretty niche, so maybe

>>>>> if we need an rc4 anyway, but this isn't important enough to cause an

>>>>> rc4 itself.

>>>>

>>>> Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).

>>>

>>> You need to set the 'execute-in-place' machine option to load/execute the

>>> instructions from the AHB window of CE0. It's not on by default because

>>> boot can be really slow with some recent u-boot which heavily trash the TBs.

>>>

>>> But this seems to work fine with -rc3.

>> 

>> Triggering the bug requires both execute-in-place and -icount -- did

>> you test with -icount enabled?

>

> It crashes.



Without the above patch? I've re-posted as a proper patch here:

  Subject: [RFC PATCH] accel/tcg: avoid re-translating one-shot instructions
  Date: Thu, 15 Apr 2021 17:24:53 +0100
  Message-Id: <20210415162454.22056-1-alex.bennee@linaro.org>

-- 
Alex Bennée
Cédric Le Goater April 16, 2021, 10:14 a.m. UTC | #10
On 4/16/21 11:14 AM, Alex Bennée wrote:
> 

> Cédric Le Goater <clg@kaod.org> writes:

> 

>> On 4/15/21 7:34 PM, Peter Maydell wrote:

>>> On Thu, 15 Apr 2021 at 18:18, Cédric Le Goater <clg@kaod.org> wrote:

>>>>

>>>> On 4/15/21 5:55 PM, Philippe Mathieu-Daudé wrote:

>>>>> On 4/15/21 4:54 PM, Peter Maydell wrote:

>>>>>> On Thu, 15 Apr 2021 at 15:32, Alex Bennée <alex.bennee@linaro.org> wrote:

>>>>>>> --8<---------------cut here---------------start------------->8---

>>>>>>> accel/tcg: avoid re-translating one-shot instructions

>>>>>>>

>>>>>>> By definition a single instruction is capable of being an IO

>>>>>>> instruction. This avoids a problem of triggering a cpu_io_recompile on

>>>>>>> a non-cached translation which would only do exactly this anyway.

>>>>>>>

>>>>>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>>>>>>>

>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)

>>>>>>> accel/tcg/translate-all.c | 2 +-

>>>>>>>

>>>>>>> modified   accel/tcg/translate-all.c

>>>>>>> @@ -1863,7 +1863,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

>>>>>>>

>>>>>>>      if (phys_pc == -1) {

>>>>>>>          /* Generate a one-shot TB with 1 insn in it */

>>>>>>> -        cflags = (cflags & ~CF_COUNT_MASK) | 1;

>>>>>>> +        cflags = (cflags & ~CF_COUNT_MASK) | CF_LAST_IO | 1;

>>>>>>>      }

>>>>>>>

>>>>>>>      max_insns = cflags & CF_COUNT_MASK;

>>>>>>> --8<---------------cut here---------------end--------------->8---

>>>>>>

>>>>>> Yes, this fixes the problem. Do we want to put this in for 6.0? My

>>>>>> feeling is that executing from non-RAM is pretty niche, so maybe

>>>>>> if we need an rc4 anyway, but this isn't important enough to cause an

>>>>>> rc4 itself.

>>>>>

>>>>> Isn't it the default for Aspeed machines (with U-Boot)? (Cc'ing Cédric).

>>>>

>>>> You need to set the 'execute-in-place' machine option to load/execute the

>>>> instructions from the AHB window of CE0. It's not on by default because

>>>> boot can be really slow with some recent u-boot which heavily trash the TBs.

>>>>

>>>> But this seems to work fine with -rc3.

>>>

>>> Triggering the bug requires both execute-in-place and -icount -- did

>>> you test with -icount enabled?

>>

>> It crashes.

> 

> 

> Without the above patch? I've re-posted as a proper patch here:

> 

>   Subject: [RFC PATCH] accel/tcg: avoid re-translating one-shot instructions

>   Date: Thu, 15 Apr 2021 17:24:53 +0100

>   Message-Id: <20210415162454.22056-1-alex.bennee@linaro.org>

> 



This patch does not fix the crash for the aspeed machines.

C.
diff mbox series

Patch

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c0b98e76b9..72b3c663c5 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1779,7 +1779,8 @@  static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
 #endif
 }
 
-/* add a new TB and link it to the physical page tables. phys_page2 is
+/*
+ * Add a new TB and link it to the physical page tables. phys_page2 is
  * (-1) to indicate that only one page contains the TB.
  *
  * Called with mmap_lock held for user-mode emulation.
@@ -1798,17 +1799,6 @@  tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 
     assert_memory_lock();
 
-    if (phys_pc == -1) {
-        /*
-         * If the TB is not associated with a physical RAM page then
-         * it must be a temporary one-insn TB, and we have nothing to do
-         * except fill in the page_addr[] fields.
-         */
-        assert(tb->cflags & CF_NOCACHE);
-        tb->page_addr[0] = tb->page_addr[1] = -1;
-        return tb;
-    }
-
     /*
      * Add the TB to the page list, acquiring first the pages's locks.
      * We keep the locks held until after inserting the TB in the hash table,
@@ -1881,9 +1871,8 @@  TranslationBlock *tb_gen_code(CPUState *cpu,
     phys_pc = get_page_addr_code(env, pc);
 
     if (phys_pc == -1) {
-        /* Generate a temporary TB with 1 insn in it */
-        cflags &= ~CF_COUNT_MASK;
-        cflags |= CF_NOCACHE | 1;
+        /* Generate a one-shot TB with 1 insn in it */
+        cflags = (cflags & ~CF_COUNT_MASK) | 1;
     }
 
     cflags &= ~CF_CLUSTER_MASK;
@@ -2097,6 +2086,17 @@  TranslationBlock *tb_gen_code(CPUState *cpu,
         tb_reset_jump(tb, 1);
     }
 
+    /*
+     * If the TB is not associated with a physical RAM page then
+     * it must be a temporary one-insn TB, and we have nothing to do
+     * except fill in the page_addr[] fields. Return early before
+     * attempting to link to other TBs or add to the lookup table.
+     */
+    if (phys_pc == -1) {
+        tb->page_addr[0] = tb->page_addr[1] = -1;
+        return tb;
+    }
+
     /* check next page if needed */
     virt_page2 = (pc + tb->size - 1) & TARGET_PAGE_MASK;
     phys_page2 = -1;