
[v3,0/5] Add support for AArch64 MOPS instructions

Message ID 20240510052408.2173579-1-thiago.bauermann@linaro.org
Series: Add support for AArch64 MOPS instructions

Message

Thiago Jung Bauermann May 10, 2024, 5:24 a.m. UTC
Hello,

This version is to adapt to Luis' clarification that MOPS instructions
don't need to be treated as atomic sequences and can be single-stepped.
If the OS reschedules the inferior to a different CPU while a main or
epilogue instruction is executed, it will reset the sequence back to the
prologue instruction.

Therefore patch 1 is now much smaller and only disables displaced stepping
on MOPS instructions, since they do need to appear consecutively in
memory.  Luis suggested relocating the whole sequence as a block.  I will
implement that suggestion in the near future, but in the meantime I would
like to suggest this approach.
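A small C sketch of why displaced stepping is refused. Displaced stepping copies only the instruction at PC to a scratch location and runs it there, so a copied MOPS instruction would no longer be followed by the rest of its prologue/main/epilogue run. This is not the code in aarch64-tdep.c: `enum insn_class`, `mops_sequence_at` and `can_displaced_step` are hypothetical names, and instruction decoding is assumed to have happened already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of decoding one AArch64 instruction.  Real code
   would derive this from the instruction encoding.  */
enum insn_class
{
  INSN_OTHER,
  INSN_MOPS_PROLOGUE, /* e.g. CPYP, SETP */
  INSN_MOPS_MAIN,     /* e.g. CPYM, SETM */
  INSN_MOPS_EPILOGUE  /* e.g. CPYE, SETE */
};

static bool
is_mops (enum insn_class c)
{
  return c == INSN_MOPS_PROLOGUE || c == INSN_MOPS_MAIN
	 || c == INSN_MOPS_EPILOGUE;
}

/* Return true if INSNS[I] starts a prologue, main, epilogue run that
   occupies consecutive instruction slots.  */
static bool
mops_sequence_at (const enum insn_class *insns, size_t n, size_t i)
{
  assert (i < n);
  return i + 2 < n
	 && insns[i] == INSN_MOPS_PROLOGUE
	 && insns[i + 1] == INSN_MOPS_MAIN
	 && insns[i + 2] == INSN_MOPS_EPILOGUE;
}

/* A MOPS instruction executed from a scratch slot would be separated
   from the rest of its sequence, so refuse displaced stepping and let
   the caller single-step it in place instead.  */
static bool
can_displaced_step (enum insn_class c)
{
  return !is_mops (c);
}
```

Stepping MOPS instructions in place keeps `mops_sequence_at` true for the code actually being executed, which is the property displaced stepping would break.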

Patch 4 is the only other one that was changed. The
gdb.arch/aarch64-mops-atomic-inst.exp testcase was renamed to
gdb.arch/aarch64-mops-single-step.exp, and adjusted to expect the MOPS
sequence to reset back to the prologue instruction. Also, a small bug was
fixed in its corresponding C file (the bug didn't affect the effectiveness
of the test).
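The adjusted expectation can be written as a predicate: after single-stepping a main or epilogue instruction, the PC is either on the next instruction or back on the prologue. This C sketch only models those two outcomes; it is not the testcase (which is a .exp file), and `struct mops_seq`, `mops_step_ok` and the assumption that the three instructions sit at consecutive 4-byte addresses are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: prologue at PROLOGUE_PC, main at +4 and epilogue
   at +8, since each AArch64 instruction is 4 bytes.  */
struct mops_seq
{
  uint64_t prologue_pc;
};

/* Return true if a single step from PC_BEFORE that stopped at PC_AFTER is
   one of the accepted outcomes: either the step completed normally (PC
   advanced by 4), or a main or epilogue instruction was reset back to
   the prologue.  */
static bool
mops_step_ok (const struct mops_seq *seq, uint64_t pc_before,
	      uint64_t pc_after)
{
  uint64_t p = seq->prologue_pc;

  assert (pc_before >= p && pc_before <= p + 8);

  if (pc_after == pc_before + 4)
    return true; /* Normal completion.  */

  /* Only the main and epilogue instructions may restart at the
     prologue.  */
  return pc_before != p && pc_after == p;
}
```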

The other patches are unchanged from v2.

Here is the original cover letter for convenience:

This patch series implements GDB support for the new instructions in
AArch64's MOPS feature.  Patch 1 has a short overview of the feature.

What is needed from GDB is recognizing the MOPS sequences of instructions
as atomic so that they can be stepped over during instruction single
stepping, and also to avoid doing displaced stepping with them.  This is
done in patch 1.

Patch 2 adds support for the new instructions to the record and replay
target.

The other patches add testcases to test each of the aspects above, plus
one testcase to verify the interaction of the MOPS instructions with
watchpoints.

Tested on Ubuntu 23.10 aarch64-linux-gnu with no regressions, using the
Arm FVP emulator as well as QEMU v8.2.

Thiago Jung Bauermann (5):
  gdb/aarch64: Disable displaced single-step for MOPS instructions
  gdb/aarch64: Add record support for MOPS instructions.
  gdb/testsuite: Add gdb.arch/aarch64-mops-watchpoint.exp
  gdb/testsuite: Add gdb.arch/aarch64-mops-single-step.exp
  gdb/testsuite: Add gdb.reverse/aarch64-mops.exp

 gdb/aarch64-tdep.c                            |  92 +++++++++-
 .../gdb.arch/aarch64-mops-single-step.c       |  73 ++++++++
 .../gdb.arch/aarch64-mops-single-step.exp     | 132 ++++++++++++++
 .../gdb.arch/aarch64-mops-watchpoint.c        |  66 +++++++
 .../gdb.arch/aarch64-mops-watchpoint.exp      |  79 ++++++++
 gdb/testsuite/gdb.reverse/aarch64-mops.c      |  71 ++++++++
 gdb/testsuite/gdb.reverse/aarch64-mops.exp    | 171 ++++++++++++++++++
 gdb/testsuite/lib/gdb.exp                     |  61 +++++++
 8 files changed, 742 insertions(+), 3 deletions(-)
 create mode 100644 gdb/testsuite/gdb.arch/aarch64-mops-single-step.c
 create mode 100644 gdb/testsuite/gdb.arch/aarch64-mops-single-step.exp
 create mode 100644 gdb/testsuite/gdb.arch/aarch64-mops-watchpoint.c
 create mode 100644 gdb/testsuite/gdb.arch/aarch64-mops-watchpoint.exp
 create mode 100644 gdb/testsuite/gdb.reverse/aarch64-mops.c
 create mode 100644 gdb/testsuite/gdb.reverse/aarch64-mops.exp


base-commit: 5021daf303393722f58f4422d7ad53d526aa2d50

Comments

Pedro Alves May 10, 2024, 2:16 p.m. UTC | #1
On 2024-05-10 06:24, Thiago Jung Bauermann wrote:
> Hello,
> 
> This version is to adapt to Luis' clarification that MOPS instructions
> don't need to be treated as atomic sequences and can be single-stepped.
> If the OS reschedules the inferior to a different CPU while a main or
> epilogue instruction is executed, it will reset the sequence back to the
> prologue instruction.

Curious -- if you single step each of the instructions, then there will
be kernel code executed on the CPU in between each of the instructions
in the sequence, and other userspace code (of other tasks too, like the
debugger itself, potentially).  So the kernel is free to context switch
in between the instructions in the sequence, and _only_ restarts the sequence
when the task is moved to another CPU?  Weird that it can context switch
without losing state on the same CPU but not to a different CPU.

But then again, I have no idea what the instructions themselves do.  :-)

Thiago Jung Bauermann May 21, 2024, 10:18 p.m. UTC | #2
Pedro Alves <pedro@palves.net> writes:

> On 2024-05-10 06:24, Thiago Jung Bauermann wrote:
>> Hello,
>> 
>> This version is to adapt to Luis' clarification that MOPS instructions
>> don't need to be treated as atomic sequences and can be single-stepped.
>> If the OS reschedules the inferior to a different CPU while a main or
>> epilogue instruction is executed, it will reset the sequence back to the
>> prologue instruction.
>
> Curious -- if you single step each of the instructions, then there will
> be kernel code executed on the CPU in between each of the instructions
> in the sequence, and other userspace code (of other tasks too, like the
> debugger itself, potentially).  So the kernel is free to context switch
> in between the instructions in the sequence, and _only_ restarts the sequence
> when the task is moved to another CPU?  Weird that it can context switch
> without losing state on the same CPU but not to a different CPU.

The kernel commits implementing this behaviour actually have a good
explanation of this:

$ git log --reverse -n2 8cd076a67dc8 
commit 8536ceaa747174ded7983f13906b225e0c33ac51
Author:     Kristina Martsenko <kristina.martsenko@arm.com>
AuthorDate: Tue May 9 15:22:31 2023 +0100
Commit:     Catalin Marinas <catalin.marinas@arm.com>
CommitDate: Mon Jun 5 17:05:41 2023 +0100

    arm64: mops: handle MOPS exceptions
    
    The memory copy/set instructions added as part of FEAT_MOPS can take an
    exception (e.g. page fault) part-way through their execution and resume
    execution afterwards.
    
    If however the task is re-scheduled and execution resumes on a different
    CPU, then the CPU may take a new type of exception to indicate this.
    This is because the architecture allows two options (Option A and Option
    B) to implement the instructions and a heterogeneous system can have
    different implementations between CPUs.
    
    In this case the OS has to reset the registers and restart execution
    from the prologue instruction. The algorithm for doing this is provided
    as part of the Arm ARM.
    
    Add an exception handler for the new exception and wire it up for
    userspace tasks.
    
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
    Link: https://lore.kernel.org/r/20230509142235.3284028-8-kristina.martsenko@arm.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit 8cd076a67dc8eac5d613b3258f656efa7a54412e
Author:     Kristina Martsenko <kristina.martsenko@arm.com>
AuthorDate: Tue May 9 15:22:32 2023 +0100
Commit:     Catalin Marinas <catalin.marinas@arm.com>
CommitDate: Mon Jun 5 17:05:41 2023 +0100

    arm64: mops: handle single stepping after MOPS exception
    
    When a MOPS main or epilogue instruction is being executed, the task may
    get scheduled on a different CPU and restart execution from the prologue
    instruction. If the main or epilogue instruction is being single stepped
    then it makes sense to finish the step and take the step exception
    before starting to execute the next (prologue) instruction. So
    fast-forward the single step state machine when taking a MOPS exception.
    
    This means that if a main or epilogue instruction is single stepped with
    ptrace, the debugger will sometimes observe the PC moving back to the
    prologue instruction. (As already mentioned, this should be rare as it
    only happens when the task is scheduled to another CPU during the step.)
    
    This also ensures that perf breakpoints count prologue instructions
    consistently (i.e. every time they are executed), rather than skipping
    them when there also happens to be a breakpoint on a main or epilogue
    instruction.
    
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
    Link: https://lore.kernel.org/r/20230509142235.3284028-9-kristina.martsenko@arm.com
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
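A toy model may help with the question about context switches. As the first commit says, the instructions can take an exception part-way and resume, because the progress of the copy is held in registers that the kernel saves and restores. What can differ between CPUs in a heterogeneous system is the implementation option (Option A or Option B), so only migration forces a restart. This is a hypothetical C simulation, not the Arm ARM algorithm or the kernel's code. `struct regs`, `step` and `mops_reset` are made-up names, the copy amounts are arbitrary, and a single register layout is used for both options. As a result, the "reset" here only moves the PC back to the prologue, whereas the real reset also rewrites the registers into the form the prologue expects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PROLOGUE, MAIN, EPILOGUE, DONE };

/* Toy register file.  All progress of the copy lives here.  */
struct regs
{
  unsigned char *dst;
  const unsigned char *src;
  size_t n;    /* Bytes still to copy.  */
  int pc;      /* PROLOGUE, MAIN, EPILOGUE or DONE.  */
  int option;  /* Implementation option chosen by the prologue.  */
};

/* Copy at most CHUNK bytes and advance the registers.  */
static void
copy_some (struct regs *r, size_t chunk)
{
  size_t k = r->n < chunk ? r->n : chunk;

  memcpy (r->dst, r->src, k);
  r->dst += k;
  r->src += k;
  r->n -= k;
}

/* Toy stand-in for the kernel's MOPS exception handling: restart from
   the prologue.  The registers already describe the remaining copy in
   this model, so only the PC changes.  */
static void
mops_reset (struct regs *r)
{
  r->pc = PROLOGUE;
}

/* Execute one instruction of the sequence on a CPU using CPU_OPTION.  */
static void
step (struct regs *r, int cpu_option)
{
  switch (r->pc)
    {
    case PROLOGUE:
      r->option = cpu_option; /* The prologue picks this CPU's option.  */
      copy_some (r, 3);       /* Arbitrary amount.  */
      r->pc = MAIN;
      break;
    case MAIN:
    case EPILOGUE:
      if (r->option != cpu_option)
	{
	  /* Migrated to a CPU using the other option.  */
	  mops_reset (r);
	  break;
	}
      if (r->pc == MAIN)
	{
	  while (r->n >= 8)
	    copy_some (r, 8);
	  r->pc = EPILOGUE;
	}
      else
	{
	  copy_some (r, r->n);
	  r->pc = DONE;
	}
      break;
    default:
      assert (!"stepped past the end of the sequence");
      break;
    }
}
```

Stepping one instruction at a time on the same CPU never changes `option`, so nothing is lost between steps. A mismatch only appears after migration, and because the registers already record how much was copied, restarting at the prologue does not copy any byte twice.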


> But then again, I have no idea what the instructions themselves do.  :-)

From the Arm ARM:

"CPYP performs some preconditioning of the arguments suitable for using
the CPYM instruction, and performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYM performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYE performs the last part of the memory copy."

The same holds for the other kinds of prologue, main and epilogue
instructions.  I would say that the prologue instruction copies some
poorly aligned bytes at the beginning of the memory region, then the main
instruction copies the memory in chunks that are convenient for the
processor implementation, and finally the epilogue instruction copies the
remaining poorly aligned bytes at the end.
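The informal description above can be illustrated with an ordinary software memcpy split into the same three phases. This is only an analogy, because the amount each MOPS instruction copies is IMPLEMENTATION DEFINED. `CHUNK`, `split_copy` and `phased_memcpy`, and the choice to align on the destination address, are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chunk size standing in for "convenient for the processor
   implementation".  */
#define CHUNK 16

struct phases
{
  size_t head; /* "Prologue": bytes needed to reach CHUNK alignment.  */
  size_t body; /* "Main": whole aligned chunks.  */
  size_t tail; /* "Epilogue": remaining bytes.  */
};

static struct phases
split_copy (const void *dst, size_t n)
{
  struct phases ph;
  size_t misalign = (uintptr_t) dst % CHUNK;

  ph.head = misalign ? CHUNK - misalign : 0;
  if (ph.head > n)
    ph.head = n;
  ph.body = (n - ph.head) / CHUNK * CHUNK;
  ph.tail = n - ph.head - ph.body;
  assert (ph.head + ph.body + ph.tail == n);
  return ph;
}

static void
phased_memcpy (void *dst, const void *src, size_t n)
{
  struct phases ph = split_copy (dst, n);
  unsigned char *d = dst;
  const unsigned char *s = src;

  memcpy (d, s, ph.head);                     /* Prologue.  */
  for (size_t i = 0; i < ph.body; i += CHUNK) /* Main.  */
    memcpy (d + ph.head + i, s + ph.head + i, CHUNK);
  memcpy (d + ph.head + ph.body, s + ph.head + ph.body, ph.tail); /* Epilogue.  */
}
```

For example, a 40-byte copy to an address that is 3 bytes past a 16-byte boundary splits into 13 head bytes, one 16-byte chunk and 11 tail bytes.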