mbox series

[v6,00/20] target/arm: Reduce overhead of cpu_get_tb_cpu_state

Message ID 20191011155546.14342-1-richard.henderson@linaro.org
Headers show
Series target/arm: Reduce overhead of cpu_get_tb_cpu_state | expand

Message

Richard Henderson Oct. 11, 2019, 3:55 p.m. UTC
Changes since v5:
  * Fix the debug assertion ifdef in the final patch.
  * Add more calls to arm_rebuild_hflags: CPSR and M-profile
    These become two new patches, 18 & 19.
  * Update some comments per review. (Alex)

Changes since v4:
  * Split patch 1 into 15 smaller patches.
  * Cache the new DEBUG_TARGET_EL field.
  * Split out m-profile hflags separately from a-profile 32-bit.
  * Move around non-cached tb flags as well, avoiding repetitive
    checks for m-profile or other mutually exclusive conditions.

  I haven't officially re-run the performance test quoted in the
  last patch, but I have eyeballed "perf top", and have dug into
  the compiled code a bit, which resulted in a few of the new
  cleanup patches (e.g. cs_base, arm_mmu_idx_el, and
  arm_cpu_data_is_big_endian).

Changes since v3:
  * Rebase.
  * Do not cache XSCALE_CPAR now that it overlaps VECSTRIDE.
  * Leave the new v7m bits as uncached.  I haven't figured
    out all of the ways fpccr is modified.

Changes since v2:
  * Do not cache VECLEN, VECSTRIDE, VFPEN.
    These variables come from VFP_FPSCR and VFP_FPEXC, not from
    system control registers.
  * Move HANDLER and STACKCHECK to rebuild_hflags_a32,
    instead of building them in rebuild_hflags_common.

Changes since v1:
  * Apparently I had started a last-minute API change, and failed to
    covert all of the users, and also failed to re-test afterward.
  * Retain assertions for --enable-debug-tcg.


Richard Henderson (20):
  target/arm: Split out rebuild_hflags_common
  target/arm: Split out rebuild_hflags_a64
  target/arm: Split out rebuild_hflags_common_32
  target/arm: Split arm_cpu_data_is_big_endian
  target/arm: Split out rebuild_hflags_m32
  target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state
  target/arm: Split out rebuild_hflags_a32
  target/arm: Split out rebuild_hflags_aprofile
  target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in
    cpu_get_tb_cpu_state
  target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state
  target/arm: Hoist computation of TBFLAG_A32.VFPEN
  target/arm: Add arm_rebuild_hflags
  target/arm: Split out arm_mmu_idx_el
  target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state
  target/arm: Add HELPER(rebuild_hflags_{a32,a64,m32})
  target/arm: Rebuild hflags at EL changes
  target/arm: Rebuild hflags at MSR writes
  target/arm: Rebuild hflags at CPSR writes
  target/arm: Rebuild hflags for M-profile.
  target/arm: Rely on hflags correct in cpu_get_tb_cpu_state

 target/arm/cpu.h           |  84 +++++---
 target/arm/helper.h        |   4 +
 target/arm/internals.h     |   9 +
 linux-user/syscall.c       |   1 +
 target/arm/cpu.c           |   1 +
 target/arm/helper-a64.c    |   3 +
 target/arm/helper.c        | 383 ++++++++++++++++++++++++-------------
 target/arm/m_helper.c      |   6 +
 target/arm/machine.c       |   1 +
 target/arm/op_helper.c     |   4 +
 target/arm/translate-a64.c |  13 +-
 target/arm/translate.c     |  28 ++-
 12 files changed, 363 insertions(+), 174 deletions(-)

-- 
2.17.1

Comments

Peter Maydell Oct. 17, 2019, 3:26 p.m. UTC | #1
On Fri, 11 Oct 2019 at 16:55, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Richard Henderson (20):

>   target/arm: Split out rebuild_hflags_common

>   target/arm: Split out rebuild_hflags_a64

>   target/arm: Split out rebuild_hflags_common_32

>   target/arm: Split arm_cpu_data_is_big_endian

>   target/arm: Split out rebuild_hflags_m32

>   target/arm: Reduce tests vs M-profile in cpu_get_tb_cpu_state

>   target/arm: Split out rebuild_hflags_a32

>   target/arm: Split out rebuild_hflags_aprofile

>   target/arm: Hoist XSCALE_CPAR, VECLEN, VECSTRIDE in

>     cpu_get_tb_cpu_state

>   target/arm: Simplify set of PSTATE_SS in cpu_get_tb_cpu_state

>   target/arm: Hoist computation of TBFLAG_A32.VFPEN

>   target/arm: Add arm_rebuild_hflags

>   target/arm: Split out arm_mmu_idx_el

>   target/arm: Hoist store to cs_base in cpu_get_tb_cpu_state

>   target/arm: Add HELPER(rebuild_hflags_{a32,a64,m32})

>   target/arm: Rebuild hflags at EL changes

>   target/arm: Rebuild hflags at MSR writes

>   target/arm: Rebuild hflags at CPSR writes

>   target/arm: Rebuild hflags for M-profile.

>   target/arm: Rely on hflags correct in cpu_get_tb_cpu_state


Don't we also need to do something to rebuild the hflags
for M-profile writes to the memory mapped system registers?
For instance rebuild_hflags_m32() bakes in state which
cares about env->v7m.ccr, which is set via nvic_writel(),
but I don't see anything whereby the write to the NVIC
register triggers a rebuild of the hflags value. Maybe I
missed it?

thanks
-- PMM
Richard Henderson Oct. 17, 2019, 4:25 p.m. UTC | #2
On 10/17/19 8:26 AM, Peter Maydell wrote:
> Don't we also need to do something to rebuild the hflags

> for M-profile writes to the memory mapped system registers?

> For instance rebuild_hflags_m32() bakes in state which

> cares about env->v7m.ccr, which is set via nvic_writel(),

> but I don't see anything whereby the write to the NVIC

> register triggers a rebuild of the hflags value. Maybe I

> missed it?


How do you end the TB after nvic_writel(), so that the new behaviour can be
noticed right now?  We can call arm_rebuild_hflags() within nvic_writel(), but
it still needs to be recognized somehow.  I would expect to place one rebuild
in the place you end the TB...


r~
Peter Maydell Oct. 17, 2019, 5:01 p.m. UTC | #3
On Thu, 17 Oct 2019 at 17:25, Richard Henderson
<richard.henderson@linaro.org> wrote:
>

> On 10/17/19 8:26 AM, Peter Maydell wrote:

> > Don't we also need to do something to rebuild the hflags

> > for M-profile writes to the memory mapped system registers?

> > For instance rebuild_hflags_m32() bakes in state which

> > cares about env->v7m.ccr, which is set via nvic_writel(),

> > but I don't see anything whereby the write to the NVIC

> > register triggers a rebuild of the hflags value. Maybe I

> > missed it?

>

> How do you end the TB after nvic_writel(), so that the new behaviour can be

> noticed right now?  We can call arm_rebuild_hflags() within nvic_writel(), but

> it still needs to be recognized somehow.  I would expect to place one rebuild

> in the place you end the TB...


To copy my reply from irc, just for the record:

we don't end the TB specifically after a write to the NVIC.
This is permitted by the architecture: rules R_BMLS and R_WQQB from
DDI05533B.h v8M Arm ARM:

# R_BMLS The side effects of any access to the SCS that performs
# a context-altering operation take effect when the access
# completes. A DSB instruction can be used to guarantee completion
# of a previous SCS access.
#
# R_WQQB A context synchronization event guarantees that the side
# effects of a previous SCS access are visible to all instructions
# in program order following the context synchronization event.

A context synchronization event is basically an ISB or an
exception entry/exit. Since we do end the TLB on an ISB
or exception entry/exit I think we're within the architectural
rules, but we do need to do a rebuild-hflags somewhere.
At the end of  nvic_sysreg_write() seems like the best place as
it means that QEMU is internally consistent and not likely to
trip over itself.

thanks
-- PMM