mbox series

[v8,00/22] target/arm: Reduce overhead of cpu_get_tb_cpu_state

Message ID 20191018174431.1784-1-richard.henderson@linaro.org
Headers show
Series target/arm: Reduce overhead of cpu_get_tb_cpu_state | expand

Message

Richard Henderson Oct. 18, 2019, 5:44 p.m. UTC
Changes since v7:
  * Rebuild hflags for all successful nvic writes (Peter).
  * Rebuild hflags for Xscale sctlr writes (Peter).

Changes since v6:
  * Regen hflags in two more places for m-profile (patch 19).

Changes since v5:
  * Fix the debug assertion ifdef in the final patch.
  * Add more calls to arm_rebuild_hflags: CPSR and M-profile
    These become two new patches, 18 & 19.
  * Update some comments per review. (Alex)

Changes since v4:
  * Split patch 1 into 15 smaller patches.
  * Cache the new DEBUG_TARGET_EL field.
  * Split out m-profile hflags separately from a-profile 32-bit.
  * Move around non-cached tb flags as well, avoiding repetitive
    checks for m-profile or other mutually exclusive conditions.

  I haven't officially re-run the performance test quoted in the
  last patch, but I have eyeballed "perf top", and have dug into
  the compiled code a bit, which resulted in a few of the new
  cleanup patches (e.g. cs_base, arm_mmu_idx_el, and
  arm_cpu_data_is_big_endian).
...


r~


Richard Henderson (22):
  target/arm: Split out rebuild_hflags_common
  target/arm: Split out rebuild_hflags_a64
  target/arm: Split out rebuild_hflags_common_32
  target/arm: Split arm_cpu_data_is_big_endian
  target/arm: Split out rebuild_hflags_m32
  target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
  target/arm: Split out rebuild_hflags_a32
  target/arm: Split out rebuild_hflags_aprofile
  target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in
    cpu_get_tb_cpu_state
  target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
  target/arm: Hoist computation of TBFLAG_A32.VFPEN
  target/arm: Add arm_rebuild_hflags
  target/arm: Split out arm_mmu_idx_el
  target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
  target/arm: Add HELPER(rebuild_hflags_{a32,a64,m32})
  target/arm: Rebuild hflags at EL changes
  target/arm: Rebuild hflags at MSR writes
  target/arm: Rebuild hflags at CPSR writes
  target/arm: Rebuild hflags at Xscale SCTLR writes
  target/arm: Rebuild hflags for M-profile
  target/arm: Rebuild hflags for M-profile NVIC
  target/arm: Rely on hflags correct in cpu_get_tb_cpu_state

 target/arm/cpu.h           |  84 +++++---
 target/arm/helper.h        |   4 +
 target/arm/internals.h     |   9 +
 hw/intc/armv7m_nvic.c      |  22 ++-
 linux-user/syscall.c       |   1 +
 target/arm/cpu.c           |   1 +
 target/arm/helper-a64.c    |   3 +
 target/arm/helper.c        | 393 ++++++++++++++++++++++++-------------
 target/arm/m_helper.c      |   6 +
 target/arm/machine.c       |   1 +
 target/arm/op_helper.c     |   4 +
 target/arm/translate-a64.c |  13 +-
 target/arm/translate.c     |  33 +++-
 13 files changed, 390 insertions(+), 184 deletions(-)

-- 
2.17.1

Comments

Peter Maydell Oct. 22, 2019, 12:47 p.m. UTC | #1
On Fri, 18 Oct 2019 at 18:44, Richard Henderson
<richard.henderson@linaro.org> wrote:
>

> Changes since v7:

>   * Rebuild hflags for all successful nvic writes (Peter).

>   * Rebuild hflags for Xscale sctlr writes (Peter).

>

> Changes since v6:

>   * Regen hflags in two more places for m-profile (patch 19).

>

> Changes since v5:

>   * Fix the debug assertion ifdef in the final patch.

>   * Add more calls to arm_rebuild_hflags: CPSR and M-profile

>     These become two new patches, 18 & 19.

>   * Update some comments per review. (Alex)

>

> Changes since v4:

>   * Split patch 1 into 15 smaller patches.

>   * Cache the new DEBUG_TARGET_EL field.

>   * Split out m-profile hflags separately from a-profile 32-bit.

>   * Move around non-cached tb flags as well, avoiding repetitive

>     checks for m-profile or other mutually exclusive conditions.

>

>   I haven't officially re-run the performance test quoted in the

>   last patch, but I have eyeballed "perf top", and have dug into

>   the compiled code a bit, which resulted in a few of the new

>   cleanup patches (e.g. cs_base, arm_mmu_idx_el, and

>   arm_cpu_data_is_big_endian).

> ...

>

>

> r~




Applied to target-arm.next, thanks.

-- PMM
Peter Maydell Oct. 22, 2019, 3:38 p.m. UTC | #2
On Tue, 22 Oct 2019 at 13:47, Peter Maydell <peter.maydell@linaro.org> wrote:
>

> On Fri, 18 Oct 2019 at 18:44, Richard Henderson

> <richard.henderson@linaro.org> wrote:

> >

> > Changes since v7:

> >   * Rebuild hflags for all successful nvic writes (Peter).

> >   * Rebuild hflags for Xscale sctlr writes (Peter).

> >

> > Changes since v6:

> >   * Regen hflags in two more places for m-profile (patch 19).

> >

> > Changes since v5:

> >   * Fix the debug assertion ifdef in the final patch.

> >   * Add more calls to arm_rebuild_hflags: CPSR and M-profile

> >     These become two new patches, 18 & 19.

> >   * Update some comments per review. (Alex)

> >

> > Changes since v4:

> >   * Split patch 1 into 15 smaller patches.

> >   * Cache the new DEBUG_TARGET_EL field.

> >   * Split out m-profile hflags separately from a-profile 32-bit.

> >   * Move around non-cached tb flags as well, avoiding repetitive

> >     checks for m-profile or other mutually exclusive conditions.

> >

> >   I haven't officially re-run the performance test quoted in the

> >   last patch, but I have eyeballed "perf top", and have dug into

> >   the compiled code a bit, which resulted in a few of the new

> >   cleanup patches (e.g. cs_base, arm_mmu_idx_el, and

> >   arm_cpu_data_is_big_endian).

> > ...

> >

> >

> > r~

>

>

>

> Applied to target-arm.next, thanks.


Turns out this asserts in qemu-armeb :-(

/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/armeb-linux-user/qemu-armeb
-L ./gnemul/qemu-armeb armeb/ls -l dummyfile
qemu-armeb: /home/petmay01/linaro/qemu-for-merges/target/arm/helper.c:11267:
cpu_get_tb_cpu_state: Assertion `flags ==
rebuild_hflags_internal(env)' failed.
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
Segmentation fault (core dumped)

Dropping this series again for the moment.

thanks
-- PMM
Richard Henderson Oct. 23, 2019, 2:49 p.m. UTC | #3
On 10/22/19 11:38 AM, Peter Maydell wrote:
> Turns out this asserts in qemu-armeb :-(

> 

> /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/armeb-linux-user/qemu-armeb

> -L ./gnemul/qemu-armeb armeb/ls -l dummyfile

> qemu-armeb: /home/petmay01/linaro/qemu-for-merges/target/arm/helper.c:11267:

> cpu_get_tb_cpu_state: Assertion `flags ==

> rebuild_hflags_internal(env)' failed.

> qemu: uncaught target signal 11 (Segmentation fault) - core dumped

> Segmentation fault (core dumped)

> 

> Dropping this series again for the moment.


Argh!  I had forgotten that we have no testing of armeb in check-tcg.

Yes, I see now that we need a recompute in linux-user/{aarch64,arm}/cpu_loop.c
specific to TARGET_WORDS_BIGENDIAN.


r~
Alex Bennée Oct. 23, 2019, 3:17 p.m. UTC | #4
Richard Henderson <richard.henderson@linaro.org> writes:

> On 10/22/19 11:38 AM, Peter Maydell wrote:

>> Turns out this asserts in qemu-armeb :-(

>>

>> /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/armeb-linux-user/qemu-armeb

>> -L ./gnemul/qemu-armeb armeb/ls -l dummyfile

>> qemu-armeb: /home/petmay01/linaro/qemu-for-merges/target/arm/helper.c:11267:

>> cpu_get_tb_cpu_state: Assertion `flags ==

>> rebuild_hflags_internal(env)' failed.

>> qemu: uncaught target signal 11 (Segmentation fault) - core dumped

>> Segmentation fault (core dumped)

>>

>> Dropping this series again for the moment.

> Argh!  I had forgotten that we have no testing of armeb in check-tcg.


Does it need it's own toolchain or can it be done with flags?

>

> Yes, I see now that we need a recompute in linux-user/{aarch64,arm}/cpu_loop.c

> specific to TARGET_WORDS_BIGENDIAN.

>

>

> r~



--
Alex Bennée
Richard Henderson Oct. 23, 2019, 4:13 p.m. UTC | #5
On 10/23/19 11:17 AM, Alex Bennée wrote:
>>> Dropping this series again for the moment.

>> Argh!  I had forgotten that we have no testing of armeb in check-tcg.

> 

> Does it need it's own toolchain or can it be done with flags?


I think the compiler only needs flags, so we might be able to gin something up
for {aarch64_eb,armeb}-linux-user from __start.


r~
Alex Bennée Oct. 23, 2019, 6:06 p.m. UTC | #6
For system emulation? Sure. I think linux-user is harder because you need
the be crt and libs packaged.

On Wed, 23 Oct 2019, 17:13 Richard Henderson, <richard.henderson@linaro.org>
wrote:

> On 10/23/19 11:17 AM, Alex Bennée wrote:

> >>> Dropping this series again for the moment.

> >> Argh!  I had forgotten that we have no testing of armeb in check-tcg.

> >

> > Does it need it's own toolchain or can it be done with flags?

>

> I think the compiler only needs flags, so we might be able to gin

> something up

> for {aarch64_eb,armeb}-linux-user from __start.

>

>

> r~

>
<div dir="auto">For system emulation? Sure. I think linux-user is harder because you need the be crt and libs packaged.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 23 Oct 2019, 17:13 Richard Henderson, &lt;<a href="mailto:richard.henderson@linaro.org">richard.henderson@linaro.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 10/23/19 11:17 AM, Alex Bennée wrote:<br>
&gt;&gt;&gt; Dropping this series again for the moment.<br>
&gt;&gt; Argh!  I had forgotten that we have no testing of armeb in check-tcg.<br>
&gt; <br>
&gt; Does it need it&#39;s own toolchain or can it be done with flags?<br>
<br>
I think the compiler only needs flags, so we might be able to gin something up<br>
for {aarch64_eb,armeb}-linux-user from __start.<br>
<br>
<br>
r~<br>
</blockquote></div>