[v2,00/12] target/arm: Implement FEAT_MOPS

Message ID 20230912140434.1333369-1-peter.maydell@linaro.org

Peter Maydell Sept. 12, 2023, 2:04 p.m. UTC
This patchset implements the Arm FEAT_MOPS architectural feature,
which is a set of instructions to implement memory copy and set
operations. The new instructions come in sets of three:
 * SETP, SETM, SETE -- memory set
 * SETGP, SETGM, SETGE -- memory set with MTE tag setting
 * CPYP, CPYM, CPYE -- memory copy
In each case the copy or set is divided between the "prologue",
"main" and "epilogue" instructions in an implementation-defined
way; in guest code they are expected to always appear in order.
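
As an illustration only, here is a minimal host-side sketch of one way
a memory set could be divided between the three instructions. The
split shown (align to 16 bytes, then whole 16-byte blocks, then the
tail) is an assumption made for the example; the real division is
implementation-defined, and SetState, set_prologue(), set_main(),
set_epilogue() and model_set() are hypothetical names, not functions
from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the state a SET* sequence carries forward. */
typedef struct {
    uint8_t *dst;     /* next byte to write */
    size_t remaining; /* bytes still to set */
    uint8_t value;    /* fill byte */
} SetState;

/* Set n bytes and advance the state. */
static void set_bytes(SetState *s, size_t n)
{
    memset(s->dst, s->value, n);
    s->dst += n;
    s->remaining -= n;
}

/* Prologue (assumed policy): set bytes until dst is 16-byte aligned. */
static void set_prologue(SetState *s)
{
    size_t n = (16 - ((uintptr_t)s->dst & 15)) & 15;
    set_bytes(s, n < s->remaining ? n : s->remaining);
}

/* Main (assumed policy): set as many whole 16-byte blocks as remain. */
static void set_main(SetState *s)
{
    set_bytes(s, s->remaining & ~(size_t)15);
}

/* Epilogue: set whatever is left over. */
static void set_epilogue(SetState *s)
{
    set_bytes(s, s->remaining);
}

/* The three steps always run in this order, as guest code is expected to. */
static void model_set(uint8_t *dst, size_t len, uint8_t value)
{
    SetState s = { dst, len, value };
    set_prologue(&s);
    set_main(&s);
    set_epilogue(&s);
    assert(s.remaining == 0);
}
```

Whatever policy is chosen, the three steps together must cover exactly
the requested range, which is what the final assert checks.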

Based-on: 20230911135340.1139553-1-peter.maydell@linaro.org
("target/arm: hwcaps updates, FEAT_HBC")
so we can set the MOPS hwcap in the last patch.

Changes v1->v2:
 * one patch already upstream
 * patches 7, 9, 11 updated to have separate helper functions
   for SET vs SETG and CPY vs CPYF
 * use cpu_st16_mmu() for SETG memory set
 * fix CPYFP saturation limit
 * CPYFM and CPYFE now correctly always copy forwards
 * patch 12 now sets the MOPS hwcap bit

Patches still needing review: 11, 12

There are two things in this patchset that are not currently ideal:

 (1) the MTE tag checking is correct, but not optimal for
 performance, because it reuses the existing checkN() function,
 which was designed to work on small memory areas and so
 prefers to read tag memory a byte at a time rather than in
 larger chunks that then need masking. I have opted to leave
 this as a TODO comment in the code for future improvement
 rather than try to address it in the initial submission.
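
 To make the tradeoff concrete, here is a rough standalone sketch
 (not the code in this series) of the two styles of tag comparison.
 It assumes a tag memory layout of two 4-bit tags per byte with the
 even-numbered granule in the low nibble; tag_of(),
 match_len_simple() and match_len_chunked() are hypothetical names
 invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: two 4-bit tags per byte, even granule in the low nibble. */
static int tag_of(const uint8_t *tagmem, size_t granule)
{
    return (tagmem[granule / 2] >> ((granule & 1) * 4)) & 0xf;
}

/*
 * Simple form: count how many granules, starting at 'first', carry
 * the tag 'want', looking at one tag at a time.
 */
static size_t match_len_simple(const uint8_t *tagmem, size_t first,
                               size_t n, int want)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tag_of(tagmem, first + i) != want) {
            break;
        }
    }
    return i;
}

/*
 * Chunked form: same result, but compares 16 granules (8 tag bytes)
 * per step once the position is byte-aligned in tag memory.
 */
static size_t match_len_chunked(const uint8_t *tagmem, size_t first,
                                size_t n, int want)
{
    /* Every nibble equal to 'want'; all bytes identical, so endian-neutral. */
    const uint64_t pattern = 0x1111111111111111ULL * (uint64_t)want;
    size_t i = 0;

    /* A leading odd granule shares its byte with one outside the range. */
    if (n > 0 && (first & 1)) {
        if (tag_of(tagmem, first) != want) {
            return 0;
        }
        i = 1;
    }
    while (n - i >= 16) {
        uint64_t chunk;

        memcpy(&chunk, tagmem + (first + i) / 2, sizeof(chunk));
        if (chunk != pattern) {
            break; /* mismatch is somewhere inside this chunk */
        }
        i += 16;
    }
    /* Finish the tail, or locate the exact mismatch, one tag at a time. */
    return i + match_len_simple(tagmem, first + i, n - i, want);
}
```

 The chunked form pays for its speed with the extra edge handling at
 either end of the range, which is the sort of complexity being
 deferred here.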

 (2) we use the same approach the s390 memcopy instruction
 does, of checking for interrupts periodically so that a
 memcopy of 2GB doesn't stall the whole system. This doesn't
 work for icount mode, because there interrupts are all timed
 to the number of instructions executed, and the memcopy is still
 only a single insn regardless of how long it takes. I've
 not tried to tackle this because I'm not totally sure of
 what the right thing is, and also because it's a preexisting
 problem with the s390 equivalent insn anyway...
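
 The general shape of the periodic check can be sketched as below.
 This is a standalone model, not the helper code from the series:
 CHUNK_BYTES, CopyState, copy_step_interruptible() and the
 interrupt_pending callback are all hypothetical, and the step size
 is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES 4096 /* assumed step size, chosen for the example */

/* Hypothetical progress state, standing in for the guest registers. */
typedef struct {
    uint8_t *dst;
    const uint8_t *src;
    size_t remaining;
} CopyState;

/* Demo stub: reports "no interrupt" *opaque times, then "pending". */
static bool pending_after_countdown(void *opaque)
{
    int *countdown = opaque;

    if (*countdown > 0) {
        (*countdown)--;
        return false;
    }
    return true;
}

/*
 * Copy in bounded chunks, polling for interrupts between chunks.
 * Returns true when the copy is complete. Returns false if it stopped
 * early; 's' then records the progress made, so calling again resumes
 * where it left off.
 */
static bool copy_step_interruptible(CopyState *s,
                                    bool (*interrupt_pending)(void *),
                                    void *opaque)
{
    assert(s && interrupt_pending);

    while (s->remaining > 0) {
        size_t n = s->remaining < CHUNK_BYTES ? s->remaining : CHUNK_BYTES;

        memcpy(s->dst, s->src, n);
        s->dst += n;
        s->src += n;
        s->remaining -= n;

        if (s->remaining > 0 && interrupt_pending(opaque)) {
            return false;
        }
    }
    return true;
}
```

 Note that however many chunks it takes, the whole thing still counts
 as one instruction, which is exactly why this scheme and icount's
 instruction-based timing don't fit together.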

I think it's OK for this to go as-is, and we can think about
those problems later, but am open to other opinions on that.

thanks
-- PMM

Peter Maydell (12):
  target/arm: Don't skip MTE checks for LDRT/STRT at EL0
  target/arm: Implement FEAT_MOPS enable bits
  target/arm: Pass unpriv bool to get_a64_user_mem_index()
  target/arm: Define syndrome function for MOPS exceptions
  target/arm: New function allocation_tag_mem_probe()
  target/arm: Implement MTE tag-checking functions for FEAT_MOPS
  target/arm: Implement the SET* instructions
  target/arm: Define new TB flag for ATA0
  target/arm: Implement the SETG* instructions
  target/arm: Implement MTE tag-checking functions for FEAT_MOPS copies
  target/arm: Implement the CPY* instructions
  target/arm: Enable FEAT_MOPS for CPU 'max'

 docs/system/arm/emulation.rst  |   1 +
 target/arm/cpu.h               |   7 +
 target/arm/internals.h         |  55 +++
 target/arm/syndrome.h          |  12 +
 target/arm/tcg/helper-a64.h    |  14 +
 target/arm/tcg/translate.h     |   4 +-
 target/arm/tcg/a64.decode      |  35 ++
 linux-user/elfload.c           |   1 +
 target/arm/helper.c            |  28 +-
 target/arm/tcg/cpu64.c         |   1 +
 target/arm/tcg/helper-a64.c    | 878 +++++++++++++++++++++++++++++++++
 target/arm/tcg/hflags.c        |  21 +
 target/arm/tcg/mte_helper.c    | 241 ++++++++-
 target/arm/tcg/translate-a64.c | 160 +++++-
 14 files changed, 1419 insertions(+), 39 deletions(-)