[v9,29/29] arm64: mte: Add Memory Tagging Extension documentation

Message ID 20200904103029.32083-30-catalin.marinas@arm.com
State New

Commit Message

Catalin Marinas Sept. 4, 2020, 10:30 a.m. UTC
From: Vincenzo Frascino <vincenzo.frascino@arm.com>


Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
a mechanism to detect the sources of memory related errors which
may be vulnerable to exploitation, including bounds violations,
use-after-free, use-after-return, use-out-of-scope and use before
initialization errors.

Add Memory Tagging Extension documentation for the arm64 linux
kernel support.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Cc: Will Deacon <will@kernel.org>
---

Notes:
    v7:
    - Add information on ptrace() regset access (NT_ARM_TAGGED_ADDR_CTRL).
    
    v4:
    - Document behaviour of madvise(MADV_DONTNEED/MADV_FREE).
    - Document the initial process state on fork/execve.
    - Clarify when the kernel uaccess checks the tags.
    - Minor updates to the example code.
    - A few other minor clean-ups following review.
    
    v3:
    - Modify the uaccess checking conditions: only when the sync mode is
      selected by the user. In async mode, the kernel uaccesses are not
      checked.
    - Clarify that an include mask of 0 (exclude mask 0xffff) results in
      always generating tag 0.
    - Document the ptrace() interface.
    
    v2:
    - Documented the uaccess kernel tag checking mode.
    - Removed the BTI definitions from cpu-feature-registers.rst.
    - Removed the paragraph stating that MTE depends on the tagged address
      ABI (while the Kconfig entry does, there is no requirement for the
      user to enable both).
    - Changed the GCR_EL1.Exclude handling description following the change
      in the prctl() interface (include vs exclude mask).
    - Updated the example code.

 Documentation/arm64/cpu-feature-registers.rst |   2 +
 Documentation/arm64/elf_hwcaps.rst            |   4 +
 Documentation/arm64/index.rst                 |   1 +
 .../arm64/memory-tagging-extension.rst        | 305 ++++++++++++++++++
 4 files changed, 312 insertions(+)
 create mode 100644 Documentation/arm64/memory-tagging-extension.rst

Comments

Will Deacon Sept. 17, 2020, 8:11 a.m. UTC | #1
On Fri, Sep 04, 2020 at 11:30:29AM +0100, Catalin Marinas wrote:
> From: Vincenzo Frascino <vincenzo.frascino@arm.com>
>
> Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
> a mechanism to detect the sources of memory related errors which
> may be vulnerable to exploitation, including bounds violations,
> use-after-free, use-after-return, use-out-of-scope and use before
> initialization errors.
>
> Add Memory Tagging Extension documentation for the arm64 linux
> kernel support.
>
> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

I'm taking this to mean that Szabolcs is happy with the proposed ABI --
please shout if that's not the case!

Wasn't there a man page kicking around too? Would be good to see that
go upstream (to the manpages project, of course).

Will

Catalin Marinas Sept. 17, 2020, 9:02 a.m. UTC | #2
On Thu, Sep 17, 2020 at 09:11:08AM +0100, Will Deacon wrote:
> On Fri, Sep 04, 2020 at 11:30:29AM +0100, Catalin Marinas wrote:
> > From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> >
> > Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
> > a mechanism to detect the sources of memory related errors which
> > may be vulnerable to exploitation, including bounds violations,
> > use-after-free, use-after-return, use-out-of-scope and use before
> > initialization errors.
> >
> > Add Memory Tagging Extension documentation for the arm64 linux
> > kernel support.
> >
> > Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
>
> I'm taking this to mean that Szabolcs is happy with the proposed ABI --
> please shout if that's not the case!

I think Szabolcs is still on holiday. To summarise the past threads,
AFAICT he's happy with this per-thread control ABI but the discussion
went on whether to expand it in the future (with a new bit) to
synchronise the tag checking mode across all threads of a process. This
adds some complications for the kernel as it needs an IPI to the other
CPUs to set SCTLR_EL1 and it's also racy with multiple threads
requesting different modes.

Now, in the glibc land, if the tag check mode is controlled via
environment variables, the dynamic loader can set this at process start
while still in single-threaded mode and not touch it at run-time. The
MTE checking can still be enabled at run-time, per mapped memory range
via the PROT_MTE flag. This approach doesn't require any additional
changes to the current patches. But it's for Szabolcs to confirm once
he's back.
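
As a rough sketch of the start-up approach described above (this is not
glibc's actual implementation; the environment variable name MTE_TCF_MODE
and the helper names are made up for illustration), the loader-side logic
could map a user-supplied string to prctl() flags once, while the process
is still single-threaded:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values as defined by this series in include/uapi/linux/prctl.h. */
#define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
#define PR_MTE_TCF_SHIFT       1
#define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TAG_SHIFT       3

/*
 * Map the value of a hypothetical environment variable (say, MTE_TCF_MODE)
 * to a tag check fault mode. Unset or unrecognised values keep the kernel
 * default of ignoring tag check faults.
 */
static inline unsigned long mte_tcf_from_env_value(const char *value)
{
        if (value == NULL)
                return PR_MTE_TCF_NONE;
        if (strcmp(value, "sync") == 0)
                return PR_MTE_TCF_SYNC;
        if (strcmp(value, "async") == 0)
                return PR_MTE_TCF_ASYNC;
        return PR_MTE_TCF_NONE;
}

/*
 * Compose the flags for a single start-up call to
 * prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0).
 * include_mask has one bit per tag (bit n set: tag n may be generated).
 */
static inline unsigned long mte_startup_flags(const char *mode,
                                              unsigned long include_mask)
{
        assert(include_mask <= 0xffffUL);
        return PR_TAGGED_ADDR_ENABLE | mte_tcf_from_env_value(mode) |
               (include_mask << PR_MTE_TAG_SHIFT);
}
```

The loader would call mte_startup_flags(getenv("MTE_TCF_MODE"), 0xfffe)
before any thread is created and not touch the setting afterwards, so no
cross-thread synchronisation is needed.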

> Wasn't there a man page kicking around too? Would be good to see that
> go upstream (to the manpages project, of course).

Dave started writing one for the tagged address ABI, not sure where that
is. For the MTE additions, we are waiting for the ABI to be upstreamed.

-- 
Catalin

Dave Martin Sept. 17, 2020, 4:15 p.m. UTC | #3
On Thu, Sep 17, 2020 at 10:02:30AM +0100, Catalin Marinas wrote:
> On Thu, Sep 17, 2020 at 09:11:08AM +0100, Will Deacon wrote:
> > On Fri, Sep 04, 2020 at 11:30:29AM +0100, Catalin Marinas wrote:
> > > From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > >
> > > Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
> > > a mechanism to detect the sources of memory related errors which
> > > may be vulnerable to exploitation, including bounds violations,
> > > use-after-free, use-after-return, use-out-of-scope and use before
> > > initialization errors.
> > >
> > > Add Memory Tagging Extension documentation for the arm64 linux
> > > kernel support.
> > >
> > > Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > > Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> >
> > I'm taking this to mean that Szabolcs is happy with the proposed ABI --
> > please shout if that's not the case!
>
> I think Szabolcs is still on holiday. To summarise the past threads,
> AFAICT he's happy with this per-thread control ABI but the discussion
> went on whether to expand it in the future (with a new bit) to
> synchronise the tag checking mode across all threads of a process. This
> adds some complications for the kernel as it needs an IPI to the other
> CPUs to set SCTLR_EL1 and it's also racy with multiple threads
> requesting different modes.
>
> Now, in the glibc land, if the tag check mode is controlled via
> environment variables, the dynamic loader can set this at process start
> while still in single-threaded mode and not touch it at run-time. The
> MTE checking can still be enabled at run-time, per mapped memory range
> via the PROT_MTE flag. This approach doesn't require any additional
> changes to the current patches. But it's for Szabolcs to confirm once
> he's back.
>
> > Wasn't there a man page kicking around too? Would be good to see that
> > go upstream (to the manpages project, of course).
>
> Dave started writing one for the tagged address ABI, not sure where that
> is. For the MTE additions, we are waiting for the ABI to be upstreamed.

The tagged address ABI control stuff is upstream in the man-pages-5.08
release.

I don't think anyone drafted anything for MTE yet.  Do we consider the
MTE ABI to be sufficiently stable now for it to be worth starting
drafting something?

Cheers
---Dave

Will Deacon Sept. 18, 2020, 8:30 a.m. UTC | #4
On Thu, Sep 17, 2020 at 05:15:53PM +0100, Dave Martin wrote:
> On Thu, Sep 17, 2020 at 10:02:30AM +0100, Catalin Marinas wrote:
> > On Thu, Sep 17, 2020 at 09:11:08AM +0100, Will Deacon wrote:
> > > On Fri, Sep 04, 2020 at 11:30:29AM +0100, Catalin Marinas wrote:
> > > > From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > >
> > > > Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
> > > > a mechanism to detect the sources of memory related errors which
> > > > may be vulnerable to exploitation, including bounds violations,
> > > > use-after-free, use-after-return, use-out-of-scope and use before
> > > > initialization errors.
> > > >
> > > > Add Memory Tagging Extension documentation for the arm64 linux
> > > > kernel support.
> > > >
> > > > Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> > >
> > > I'm taking this to mean that Szabolcs is happy with the proposed ABI --
> > > please shout if that's not the case!
> >
> > I think Szabolcs is still on holiday. To summarise the past threads,
> > AFAICT he's happy with this per-thread control ABI but the discussion
> > went on whether to expand it in the future (with a new bit) to
> > synchronise the tag checking mode across all threads of a process. This
> > adds some complications for the kernel as it needs an IPI to the other
> > CPUs to set SCTLR_EL1 and it's also racy with multiple threads
> > requesting different modes.
> >
> > Now, in the glibc land, if the tag check mode is controlled via
> > environment variables, the dynamic loader can set this at process start
> > while still in single-threaded mode and not touch it at run-time. The
> > MTE checking can still be enabled at run-time, per mapped memory range
> > via the PROT_MTE flag. This approach doesn't require any additional
> > changes to the current patches. But it's for Szabolcs to confirm once
> > he's back.
> >
> > > Wasn't there a man page kicking around too? Would be good to see that
> > > go upstream (to the manpages project, of course).
> >
> > Dave started writing one for the tagged address ABI, not sure where that
> > is. For the MTE additions, we are waiting for the ABI to be upstreamed.
>
> The tagged address ABI control stuff is upstream in the man-pages-5.08
> release.
>
> I don't think anyone drafted anything for MTE yet.  Do we consider the
> MTE ABI to be sufficiently stable now for it to be worth starting
> drafting something?

I think so, yes. I'm hoping to queue it for 5.10, once I have an Ack from
the Android tools side on the per-thread ABI.

Will

Andrey Konovalov Sept. 22, 2020, 12:22 p.m. UTC | #5
On Fri, Sep 4, 2020 at 12:31 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
> index f28853f80089..328e0c454fbd 100644
> --- a/Documentation/arm64/cpu-feature-registers.rst
> +++ b/Documentation/arm64/cpu-feature-registers.rst
> @@ -175,6 +175,8 @@ infrastructure:
>       +------------------------------+---------+---------+
>       | Name                         |  bits   | visible |
>       +------------------------------+---------+---------+
> +     | MTE                          | [11-8]  |    y    |
> +     +------------------------------+---------+---------+
>       | SSBS                         | [7-4]   |    y    |
>       +------------------------------+---------+---------+
>       | BT                           | [3-0]   |    y    |
> diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
> index 84a9fd2d41b4..bbd9cf54db6c 100644
> --- a/Documentation/arm64/elf_hwcaps.rst
> +++ b/Documentation/arm64/elf_hwcaps.rst
> @@ -240,6 +240,10 @@ HWCAP2_BTI
>
>      Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001.
>
> +HWCAP2_MTE
> +
> +    Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
> +    by Documentation/arm64/memory-tagging-extension.rst.
>
>  4. Unused AT_HWCAP bits
>  -----------------------
> diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
> index d9665d83c53a..43b0939d384e 100644
> --- a/Documentation/arm64/index.rst
> +++ b/Documentation/arm64/index.rst
> @@ -14,6 +14,7 @@ ARM64 Architecture
>      hugetlbpage
>      legacy_instructions
>      memory
> +    memory-tagging-extension
>      perf
>      pointer-authentication
>      silicon-errata
> diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
> new file mode 100644
> index 000000000000..e3709b536b89
> --- /dev/null
> +++ b/Documentation/arm64/memory-tagging-extension.rst
> @@ -0,0 +1,305 @@
> +===============================================
> +Memory Tagging Extension (MTE) in AArch64 Linux
> +===============================================
> +
> +Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
> +         Catalin Marinas <catalin.marinas@arm.com>
> +
> +Date: 2020-02-25
> +
> +This document describes the provision of the Memory Tagging Extension
> +functionality in AArch64 Linux.
> +
> +Introduction
> +============
> +
> +ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
> +feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
> +(Top Byte Ignore) feature and allows software to access a 4-bit
> +allocation tag for each 16-byte granule in the physical address space.
> +Such memory range must be mapped with the Normal-Tagged memory
> +attribute. A logical tag is derived from bits 59-56 of the virtual
> +address used for the memory access. A CPU with MTE enabled will compare
> +the logical tag against the allocation tag and potentially raise an
> +exception on mismatch, subject to system registers configuration.
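
For readers following along, a minimal sketch of the address arithmetic
the introduction describes, assuming a plain 64-bit virtual address: the
logical tag sits in bits 59-56 and each allocation tag covers one 16-byte
granule. The helper names are illustrative only and are not part of the
patch or of any kernel header.

```c
#include <assert.h>
#include <stdint.h>

#define MTE_TAG_SHIFT     56        /* logical tag occupies bits 59-56 */
#define MTE_TAG_MASK      0xfULL    /* tags are 4 bits wide */
#define MTE_GRANULE_SIZE  16ULL     /* one allocation tag per 16 bytes */

/* Extract the 4-bit logical tag from a 64-bit virtual address. */
static inline unsigned int mte_logical_tag(uint64_t addr)
{
        return (unsigned int)((addr >> MTE_TAG_SHIFT) & MTE_TAG_MASK);
}

/* Return addr with its logical tag replaced by tag. */
static inline uint64_t mte_set_logical_tag(uint64_t addr, unsigned int tag)
{
        assert(tag <= MTE_TAG_MASK);
        addr &= ~(MTE_TAG_MASK << MTE_TAG_SHIFT);
        return addr | ((uint64_t)tag << MTE_TAG_SHIFT);
}

/* Index of the 16-byte granule that addr falls in, ignoring the top byte. */
static inline uint64_t mte_granule_index(uint64_t addr)
{
        return (addr & 0x00ffffffffffffffULL) / MTE_GRANULE_SIZE;
}
```

On real hardware the tag is inserted with IRG/ADDG and the comparison is
done by the CPU; the helpers above only show which bits are involved.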

> +
> +Userspace Support
> +=================
> +
> +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
> +supported by the hardware, the kernel advertises the feature to
> +userspace via ``HWCAP2_MTE``.
> +
> +PROT_MTE
> +--------
> +
> +To access the allocation tags, a user process must enable the Tagged
> +memory attribute on an address range using a new ``prot`` flag for
> +``mmap()`` and ``mprotect()``:
> +
> +``PROT_MTE`` - Pages allow access to the MTE allocation tags.
> +
> +The allocation tag is set to 0 when such pages are first mapped in the
> +user address space and preserved on copy-on-write. ``MAP_SHARED`` is
> +supported and the allocation tags can be shared between processes.
> +
> +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
> +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
> +types of mapping will result in ``-EINVAL`` returned by these system
> +calls.
> +
> +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
> +be cleared by ``mprotect()``.
> +
> +**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
> +``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
> +point after the system call.
> +
> +Tag Check Faults
> +----------------
> +
> +When ``PROT_MTE`` is enabled on an address range and a mismatch between
> +the logical and allocation tags occurs on access, there are three
> +configurable behaviours:
> +
> +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
> +  tag check fault.
> +
> +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
> +  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
> +  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
> +  by the offending thread, the containing process is terminated with a
> +  ``coredump``.
> +
> +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
> +  thread, asynchronously following one or multiple tag check faults,
> +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
> +  address is unknown).
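
A small sketch of how a signal handler might tell the two reporting modes
apart from si_code. The numeric fallbacks are the values I believe the
series adds to include/uapi/asm-generic/siginfo.h; they are only used if
the installed libc headers do not provide them yet. The enum and function
names are invented for this example.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Fallbacks for libc headers that predate MTE support. */
#ifndef SEGV_MTEAERR
#define SEGV_MTEAERR 8  /* asynchronous MTE tag check fault */
#endif
#ifndef SEGV_MTESERR
#define SEGV_MTESERR 9  /* synchronous MTE tag check fault */
#endif

enum mte_fault_kind { MTE_FAULT_NONE, MTE_FAULT_SYNC, MTE_FAULT_ASYNC };

/* Classify a signal by the signo/si_code pair taken from its siginfo_t. */
static inline enum mte_fault_kind mte_classify_segv(int signo, int si_code)
{
        if (signo != SIGSEGV)
                return MTE_FAULT_NONE;
        switch (si_code) {
        case SEGV_MTESERR:
                return MTE_FAULT_SYNC;   /* si_addr holds the fault address */
        case SEGV_MTEAERR:
                return MTE_FAULT_ASYNC;  /* si_addr is 0, address unknown */
        default:
                return MTE_FAULT_NONE;
        }
}

/* Short description of a fault kind, e.g. for a diagnostic message. */
static inline const char *mte_fault_name(enum mte_fault_kind kind)
{
        static const char *const names[] = {
                "not an MTE fault",
                "synchronous MTE tag check fault",
                "asynchronous MTE tag check fault",
        };

        assert((size_t)kind < sizeof(names) / sizeof(names[0]));
        return names[kind];
}
```

An SA_SIGINFO handler would pass its signo argument and info->si_code to
mte_classify_segv() and only trust info->si_addr in the synchronous case.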

> +
> +The user can select the above modes, per thread, using the
> +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
> +``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> +bit-field:
> +
> +- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
> +- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
> +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
> +
> +The current tag check fault mode can be read using the
> +``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
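
Since the mode shares its flags word with the other tagged address
settings, changing only the mode is a read-modify-write. A sketch of the
bit manipulation, using the constants from the example code at the end of
the document; the two helper names are hypothetical.

```c
#include <assert.h>

/* Values as defined by this series in include/uapi/linux/prctl.h. */
#define PR_MTE_TCF_SHIFT  1
#define PR_MTE_TCF_NONE   (0UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_SYNC   (1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_ASYNC  (2UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_MASK   (3UL << PR_MTE_TCF_SHIFT)

/* Extract the tag check fault mode from a PR_GET_TAGGED_ADDR_CTRL result. */
static inline unsigned long mte_ctrl_tcf_mode(unsigned long ctrl)
{
        return ctrl & PR_MTE_TCF_MASK;
}

/*
 * Return ctrl with only the tag check fault mode replaced, leaving the
 * tagged address enable bit and the tag include mask untouched.
 */
static inline unsigned long mte_ctrl_with_tcf_mode(unsigned long ctrl,
                                                   unsigned long mode)
{
        assert((mode & ~PR_MTE_TCF_MASK) == 0);
        return (ctrl & ~PR_MTE_TCF_MASK) | mode;
}
```

A thread would feed the (non-negative) return value of
prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0) to mte_ctrl_with_tcf_mode()
and pass the result back via PR_SET_TAGGED_ADDR_CTRL.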

> +
> +Tag checking can also be disabled for a user thread by setting the
> +``PSTATE.TCO`` bit with ``MSR TCO, #1``.
> +
> +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
> +irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
> +``sigreturn()``.
> +
> +**Note**: There are no *match-all* logical tags available for user
> +applications.
> +
> +**Note**: Kernel accesses to the user address space (e.g. ``read()``
> +system call) are not checked if the user thread tag checking mode is
> +``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
> +``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
> +address accesses, however it cannot always guarantee it.
> +
> +Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
> +-----------------------------------------------------------------
> +
> +The architecture allows excluding certain tags from being randomly
> +generated via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux
> +excludes all tags other than 0. A user thread can enable specific tags
> +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
> +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
> +in the ``PR_MTE_TAG_MASK`` bit-field.
> +
> +**Note**: The hardware uses an exclude mask but the ``prctl()``
> +interface provides an include mask. An include mask of ``0`` (exclusion
> +mask ``0xffff``) results in the CPU always generating tag ``0``.
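
To make the include/exclude relationship concrete, here is a sketch of
the conversion, assuming the 16-bit mask has one bit per tag value (bit n
for tag n). The function names are made up; the kernel does the
equivalent internally when programming GCR_EL1.Exclude.

```c
#include <assert.h>
#include <stdint.h>

#define MTE_NUM_TAGS 16

/* Hardware exclude mask corresponding to a prctl() include mask. */
static inline uint16_t mte_exclude_from_include(uint16_t include_mask)
{
        return (uint16_t)~include_mask;
}

/* Non-zero if tag may be produced by IRG under include_mask. */
static inline int mte_tag_included(uint16_t include_mask, unsigned int tag)
{
        assert(tag < MTE_NUM_TAGS);
        return (include_mask >> tag) & 1;
}

/*
 * Number of distinct tags IRG can produce. With an empty include mask
 * the CPU always generates tag 0, so the answer is 1 rather than 0.
 */
static inline unsigned int mte_irg_tag_count(uint16_t include_mask)
{
        unsigned int n = 0;

        for (unsigned int tag = 0; tag < MTE_NUM_TAGS; tag++)
                n += (unsigned int)mte_tag_included(include_mask, tag);
        return n ? n : 1;
}
```

The example program's include mask 0xfffe therefore maps to an exclude
mask of 0x0001: only tag 0 is kept out of the random set.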

> +
> +Initial process state
> +---------------------
> +
> +On ``execve()``, the new process has the following configuration:
> +
> +- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
> +- Tag checking mode set to ``PR_MTE_TCF_NONE``
> +- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
> +- ``PSTATE.TCO`` set to 0
> +- ``PROT_MTE`` not set on any of the initial memory maps
> +
> +On ``fork()``, the new process inherits the parent's configuration and
> +memory map attributes with the exception of the ``madvise()`` ranges
> +with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
> +to 0).
> +
> +The ``ptrace()`` interface
> +--------------------------
> +
> +``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
> +the tags from or set the tags to a tracee's address space. The
> +``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
> +data)`` where:
> +
> +- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_PEEKMTETAGS``.

Typo here, one of those should be POKE.

> +- ``pid`` - the tracee's PID.
> +- ``addr`` - address in the tracee's address space.
> +- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
> +  a buffer of ``iov_len`` length in the tracer's address space.
> +
> +The tags in the tracer's ``iov_base`` buffer are represented as one
> +4-bit tag per byte and correspond to a 16-byte MTE tag granule in the
> +tracee's address space.
> +
> +**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
> +will use the corresponding aligned address.
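
A sketch of the sizing arithmetic a tracer would need before calling
PTRACE_PEEKMTETAGS, assuming it wants the tags for every granule touched
by a byte range in the tracee. One tag byte is needed per 16-byte
granule; the helper names are illustrative and not part of the interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTE_GRANULE_SIZE 16UL

/* Address the kernel effectively uses: addr rounded down to a granule. */
static inline uintptr_t mte_granule_align(uintptr_t addr)
{
        return addr & ~(uintptr_t)(MTE_GRANULE_SIZE - 1);
}

/*
 * Number of tag bytes (the iov_len to request) needed to cover the
 * tracee range [addr, addr + len): one byte per granule touched.
 */
static inline size_t mte_tags_for_range(uintptr_t addr, size_t len)
{
        uintptr_t start, end;

        if (len == 0)
                return 0;
        assert(addr <= UINTPTR_MAX - (len - 1));  /* range must not wrap */
        start = mte_granule_align(addr);
        end = mte_granule_align(addr + len - 1);
        return (size_t)((end - start) / MTE_GRANULE_SIZE) + 1;
}
```

For a 4096-byte page this gives 256 tag bytes; a 2-byte range straddling
a granule boundary needs 2.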

> +
> +``ptrace()`` return value:
> +
> +- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
> +  number of tags transferred. This may be smaller than the requested
> +  ``iov_len`` if the requested address range in the tracee's or the
> +  tracer's space cannot be accessed or does not have valid tags.
> +- ``-EPERM`` - the specified process cannot be traced.
> +- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
> +  address) and no tags copied. ``iov_len`` not updated.
> +- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
> +  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
> +- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
> +  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.
> +
> +**Note**: There are no transient errors for the requests above, so user
> +programs should not retry in case of a non-zero system call return.
> +
> +``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr ==
> +NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
> +address ABI control and MTE configuration of a process as per the
> +``prctl()`` options described in
> +Documentation/arm64/tagged-address-abi.rst and above. The corresponding
> +``regset`` is 1 element of 8 bytes (``sizeof(long)``).
> +
> +Example of correct usage
> +========================
> +
> +*MTE Example code*
> +
> +.. code-block:: c
> +
> +    /*
> +     * To be compiled with -march=armv8.5-a+memtag
> +     */
> +    #include <errno.h>
> +    #include <stdint.h>
> +    #include <stdio.h>
> +    #include <stdlib.h>
> +    #include <unistd.h>
> +    #include <sys/auxv.h>
> +    #include <sys/mman.h>
> +    #include <sys/prctl.h>
> +
> +    /*
> +     * From arch/arm64/include/uapi/asm/hwcap.h
> +     */
> +    #define HWCAP2_MTE              (1 << 18)
> +
> +    /*
> +     * From arch/arm64/include/uapi/asm/mman.h
> +     */
> +    #define PROT_MTE                 0x20
> +
> +    /*
> +     * From include/uapi/linux/prctl.h
> +     */
> +    #define PR_SET_TAGGED_ADDR_CTRL 55
> +    #define PR_GET_TAGGED_ADDR_CTRL 56
> +    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
> +    # define PR_MTE_TCF_SHIFT       1
> +    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
> +    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
> +    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
> +    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
> +    # define PR_MTE_TAG_SHIFT       3
> +    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)
> +
> +    /*
> +     * Insert a random logical tag into the given pointer.
> +     */
> +    #define insert_random_tag(ptr) ({                       \
> +            uint64_t __val;                                 \
> +            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
> +            __val;                                          \
> +    })
> +
> +    /*
> +     * Set the allocation tag on the destination address.
> +     */
> +    #define set_tag(tagged_addr) do {                                      \
> +            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
> +    } while (0)
> +
> +    int main()
> +    {
> +            unsigned char *a;
> +            unsigned long page_sz = sysconf(_SC_PAGESIZE);
> +            unsigned long hwcap2 = getauxval(AT_HWCAP2);
> +
> +            /* check if MTE is present */
> +            if (!(hwcap2 & HWCAP2_MTE))
> +                    return EXIT_FAILURE;
> +
> +            /*
> +             * Enable the tagged address ABI, synchronous MTE tag check faults and
> +             * allow all non-zero tags in the randomly generated set.
> +             */
> +            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
> +                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT),
> +                      0, 0, 0)) {
> +                    perror("prctl() failed");
> +                    return EXIT_FAILURE;
> +            }
> +
> +            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
> +                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +            if (a == MAP_FAILED) {
> +                    perror("mmap() failed");
> +                    return EXIT_FAILURE;
> +            }
> +
> +            /*
> +             * Enable MTE on the above anonymous mmap. The flag could be passed
> +             * directly to mmap() and skip this step.
> +             */
> +            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
> +                    perror("mprotect() failed");
> +                    return EXIT_FAILURE;
> +            }
> +
> +            /* access with the default tag (0) */
> +            a[0] = 1;
> +            a[1] = 2;
> +
> +            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);
> +
> +            /* set the logical and allocation tags */
> +            a = (unsigned char *)insert_random_tag(a);
> +            set_tag(a);
> +
> +            printf("%p\n", a);
> +
> +            /* non-zero tag access */
> +            a[0] = 3;
> +            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);
> +
> +            /*
> +             * If MTE is enabled correctly the next instruction will generate an
> +             * exception.
> +             */
> +            printf("Expecting SIGSEGV...\n");
> +            a[16] = 0xdd;
> +
> +            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
> +            printf("...haven't got one\n");
> +
> +            return EXIT_FAILURE;
> +    }

Acked-by: Andrey Konovalov <andreyknvl@google.com>


Thanks!

Szabolcs Nagy Sept. 22, 2020, 3:52 p.m. UTC | #6
The 09/17/2020 10:02, Catalin Marinas wrote:
> On Thu, Sep 17, 2020 at 09:11:08AM +0100, Will Deacon wrote:
> > On Fri, Sep 04, 2020 at 11:30:29AM +0100, Catalin Marinas wrote:
> > > From: Vincenzo Frascino <vincenzo.frascino@arm.com>
...
> > > Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> >
> > I'm taking this to mean that Szabolcs is happy with the proposed ABI --
> > please shout if that's not the case!
>
> I think Szabolcs is still on holiday. To summarise the past threads,
> AFAICT he's happy with this per-thread control ABI but the discussion
> went on whether to expand it in the future (with a new bit) to
> synchronise the tag checking mode across all threads of a process. This
> adds some complications for the kernel as it needs an IPI to the other
> CPUs to set SCTLR_EL1 and it's also racy with multiple threads
> requesting different modes.
>
> Now, in the glibc land, if the tag check mode is controlled via
> environment variables, the dynamic loader can set this at process start
> while still in single-threaded mode and not touch it at run-time. The
> MTE checking can still be enabled at run-time, per mapped memory range
> via the PROT_MTE flag. This approach doesn't require any additional
> changes to the current patches. But it's for Szabolcs to confirm once
> he's back.

my thinking now is that for PROT_MTE use outside
of libc we will need a way to enable tag checks
early so user code does not have to worry about
tag check settings across threads (coordinating
the setting at runtime seems problematic, same
for the irg exclusion set).

if we add a kernel level opt-in mechanism for tag
checks later (e.g. elf marking) or if the settings
are exclusively owned by early libc code then i
think the proposed abi is ok (this is our current
agreement and works as long as no late runtime
change is needed to the settings).

i'm now wondering about the default tag check mode:
it may be better to enable sync tag checks in the
kernel. it's not clear to me what would break with
that. this is probably late to discuss now and libc
would need ways to override the default no matter
what, but i'd like to know if somebody sees problems
or risks with unconditional sync tag checks turned on
(sorry i don't remember if we went through this before).
i assume it would have no effect on a process that
never uses PROT_MTE.

Catalin Marinas Sept. 22, 2020, 4:04 p.m. UTC | #7
On Thu, Sep 17, 2020 at 05:15:53PM +0100, Dave P Martin wrote:
> On Thu, Sep 17, 2020 at 10:02:30AM +0100, Catalin Marinas wrote:
> > On Thu, Sep 17, 2020 at 09:11:08AM +0100, Will Deacon wrote:
> > > Wasn't there a man page kicking around too? Would be good to see that
> > > go upstream (to the manpages project, of course).
> >
> > Dave started writing one for the tagged address ABI, not sure where that
> > is. For the MTE additions, we are waiting for the ABI to be upstreamed.
>
> The tagged address ABI control stuff is upstream in the man-pages-5.08
> release.
>
> I don't think anyone drafted anything for MTE yet.  Do we consider the
> MTE ABI to be sufficiently stable now for it to be worth starting
> drafting something?

Yes, the ABI is stable. The patches are in linux-next and, unless
something broken is found, we aim for 5.10.

-- 
Catalin

Catalin Marinas Sept. 22, 2020, 4:55 p.m. UTC | #8
Hi Szabolcs,

On Tue, Sep 22, 2020 at 04:52:49PM +0100, Szabolcs Nagy wrote:
> The 09/17/2020 10:02, Catalin Marinas wrote:
> > On Thu, Sep 17, 2020 at 09:11:08AM +0100, Will Deacon wrote:
> > > On Fri, Sep 04, 2020 at 11:30:29AM +0100, Catalin Marinas wrote:
> > > > From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> ...
> > > > Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> > >
> > > I'm taking this to mean that Szabolcs is happy with the proposed ABI --
> > > please shout if that's not the case!
> >
> > I think Szabolcs is still on holiday. To summarise the past threads,
> > AFAICT he's happy with this per-thread control ABI but the discussion
> > went on whether to expand it in the future (with a new bit) to
> > synchronise the tag checking mode across all threads of a process. This
> > adds some complications for the kernel as it needs an IPI to the other
> > CPUs to set SCTLR_EL1 and it's also racy with multiple threads
> > requesting different modes.
> >
> > Now, in the glibc land, if the tag check mode is controlled via
> > environment variables, the dynamic loader can set this at process start
> > while still in single-threaded mode and not touch it at run-time. The
> > MTE checking can still be enabled at run-time, per mapped memory range
> > via the PROT_MTE flag. This approach doesn't require any additional
> > changes to the current patches. But it's for Szabolcs to confirm once
> > he's back.
>
> my thinking now is that for PROT_MTE use outside of libc we will need
> a way to enable tag checks early so user code does not have to worry
> about tag check settings across threads (coordinating the setting at
> runtime seems problematic, same for the irg exclusion set).

Yeah, such settings are better set at process start time.

We can explore synchronising across threads with an additional PR_* flag
but given the interaction with stack tagging and other potential races,
it will need better coordination with user space and agree on which
settings can be changed (e.g. exclusion mask may not be allowed).
However, at this point, I don't see a strong case for such ABI addition
as long as the application starts with some sane defaults, potentially
driven by the user.
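
To illustrate setting this once at process start, here is a minimal sketch. It assumes a hypothetical user-supplied setting string ("sync" or "async", for instance read from an environment variable) and a hypothetical helper name, mte_flags_from_setting(); neither is part of glibc or of the kernel ABI. The flag values are the ones listed in the example program in the patch.

```c
#include <string.h>

/* Flag values copied from the example program in the patch. */
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif
#ifndef PR_MTE_TCF_SYNC
#define PR_MTE_TCF_NONE  (0UL << 1)
#define PR_MTE_TCF_SYNC  (1UL << 1)
#define PR_MTE_TCF_ASYNC (2UL << 1)
#endif
#ifndef PR_MTE_TAG_SHIFT
#define PR_MTE_TAG_SHIFT 3
#endif

/*
 * Map a HYPOTHETICAL user setting to the flags argument of
 * prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0).
 * NULL or an unrecognised value keeps the execve() defaults (all zero).
 */
unsigned long mte_flags_from_setting(const char *setting)
{
        unsigned long tcf;

        if (setting && strcmp(setting, "sync") == 0)
                tcf = PR_MTE_TCF_SYNC;
        else if (setting && strcmp(setting, "async") == 0)
                tcf = PR_MTE_TCF_ASYNC;
        else
                return 0;

        /* Tagged address ABI + chosen mode + all non-zero tags for IRG. */
        return PR_TAGGED_ADDR_ENABLE | tcf | (0xfffeUL << PR_MTE_TAG_SHIFT);
}
```

Early start-up code, while still single-threaded, could pass the result to prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0) and leave it untouched afterwards, so threads created later start from the same settings.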

> if we add a kernel level opt-in mechanism for tag checks later (e.g.
> elf marking) or if the settings are exclusively owned by early libc
> code then i think the proposed abi is ok (this is our current
> agreement and works as long as no late runtime change is needed to the
> settings).

In the Android case, run-time changes to the tag checking mode I think
are expected (usually via signal handlers), though per-thread.

> i'm now wondering about the default tag check mode: it may be better
> to enable sync tag checks in the kernel. it's not clear to me what
> would break with that. this is probably late to discuss now and libc
> would need ways to override the default no matter what, but i'd like
> to know if somebody sees problems or risks with unconditional sync tag
> checks turned on (sorry i don't remember if we went through this
> before). i assume it would have no effect on a process that never uses
> PROT_MTE.

I don't think it helps much. We already have a requirement that to be
able to pass tagged pointers to kernel syscalls, the user needs a
prctl(PR_TAGGED_ADDR_ENABLE) call (code already in mainline). Using
PROT_MTE without tagged pointers won't be of much use. So if we are to
set different tag check defaults, we should also enable the tagged addr
ABI automatically.
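
As a sketch of that dependency: given a value as returned by prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0), the check below reports whether both the tagged address ABI and a tag check mode are enabled. The helper name mte_checks_usable() is hypothetical, and the masks are the ones from the example program in the patch.

```c
#include <stdbool.h>

/* Bit-field layout copied from the example program in the patch. */
#ifndef PR_TAGGED_ADDR_ENABLE
#define PR_TAGGED_ADDR_ENABLE (1UL << 0)
#endif
#ifndef PR_MTE_TCF_MASK
#define PR_MTE_TCF_MASK       (3UL << 1)
#endif

/*
 * True only if tagged pointers may be passed to syscalls AND tag check
 * faults are reported (sync or async); either one alone is of little use.
 */
bool mte_checks_usable(unsigned long ctrl)
{
        bool tagged_addr = (ctrl & PR_TAGGED_ADDR_ENABLE) != 0;
        bool checking = (ctrl & PR_MTE_TCF_MASK) != 0; /* 0 is TCF_NONE */

        return tagged_addr && checking;
}
```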

That said, I still have a preference for MTE and tagged addr ABI to be
explicitly requested by the (human) user either via environment
variables or marked in an ELF note as "safe with/using tags". Given the
recent mremap() issue we caused in glibc, I'm worried that other things
may break with enabling the tagged addr ABI everywhere.

Another aspect is that sync mode by default in a distro where glibc is
MTE-aware will lead to performance regressions. That's another case in
favour of the user explicitly asking for tag checking.

Anyway, I'm open to having a debate on changing the defaults.

-- 
Catalin
Szabolcs Nagy Sept. 23, 2020, 9:10 a.m. UTC | #9
The 09/22/2020 17:55, Catalin Marinas wrote:
> On Tue, Sep 22, 2020 at 04:52:49PM +0100, Szabolcs Nagy wrote:
> > if we add a kernel level opt-in mechanism for tag checks later (e.g.
> > elf marking) or if the settings are exclusively owned by early libc
> > code then i think the proposed abi is ok (this is our current
> > agreement and works as long as no late runtime change is needed to the
> > settings).
>
> In the Android case, run-time changes to the tag checking mode I think
> are expected (usually via signal handlers), though per-thread.

ok that works, but does not help allocators or
runtimes that don't own the signal handlers.

> > i'm now wondering about the default tag check mode: it may be better
> > to enable sync tag checks in the kernel. it's not clear to me what
> > would break with that. this is probably late to discuss now and libc
> > would need ways to override the default no matter what, but i'd like
> > to know if somebody sees problems or risks with unconditional sync tag
> > checks turned on (sorry i don't remember if we went through this
> > before). i assume it would have no effect on a process that never uses
> > PROT_MTE.
>
> I don't think it helps much. We already have a requirement that to be
> able to pass tagged pointers to kernel syscalls, the user needs a
> prctl(PR_TAGGED_ADDR_ENABLE) call (code already in mainline). Using
> PROT_MTE without tagged pointers won't be of much use. So if we are to
> set different tag check defaults, we should also enable the tagged addr
> ABI automatically.
>
> That said, I still have a preference for MTE and tagged addr ABI to be
> explicitly requested by the (human) user either via environment
> variables or marked in an ELF note as "safe with/using tags". Given the
> recent mremap() issue we caused in glibc, I'm worried that other things
> may break with enabling the tagged addr ABI everywhere.
>
> Another aspect is that sync mode by default in a distro where glibc is
> MTE-aware will lead to performance regressions. That's another case in
> favour of the user explicitly asking for tag checking.

ok this all makes sense to me.

>
> Anyway, I'm open to having a debate on changing the defaults.
>
> -- 
> Catalin
Peter Collingbourne Oct. 14, 2020, 11:43 p.m. UTC | #10
On Fri, Sep 18, 2020 at 1:30 AM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Sep 17, 2020 at 05:15:53PM +0100, Dave Martin wrote:
> > On Thu, Sep 17, 2020 at 10:02:30AM +0100, Catalin Marinas wrote:
> > > On Thu, Sep 17, 2020 at 09:11:08AM +0100, Will Deacon wrote:
> > > > On Fri, Sep 04, 2020 at 11:30:29AM +0100, Catalin Marinas wrote:
> > > > > From: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > > >
> > > > > Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
> > > > > a mechanism to detect the sources of memory related errors which
> > > > > may be vulnerable to exploitation, including bounds violations,
> > > > > use-after-free, use-after-return, use-out-of-scope and use before
> > > > > initialization errors.
> > > > >
> > > > > Add Memory Tagging Extension documentation for the arm64 linux
> > > > > kernel support.
> > > > >
> > > > > Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> > > > > Co-developed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > > Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> > > >
> > > > I'm taking this to mean that Szabolcs is happy with the proposed ABI --
> > > > please shout if that's not the case!
> > >
> > > I think Szabolcs is still on holiday. To summarise the past threads,
> > > AFAICT he's happy with this per-thread control ABI but the discussion
> > > went on whether to expand it in the future (with a new bit) to
> > > synchronise the tag checking mode across all threads of a process. This
> > > adds some complications for the kernel as it needs an IPI to the other
> > > CPUs to set SCTLR_EL1 and it's also racy with multiple threads
> > > requesting different modes.
> > >
> > > Now, in the glibc land, if the tag check mode is controlled via
> > > environment variables, the dynamic loader can set this at process start
> > > while still in single-threaded mode and not touch it at run-time. The
> > > MTE checking can still be enabled at run-time, per mapped memory range
> > > via the PROT_MTE flag. This approach doesn't require any additional
> > > changes to the current patches. But it's for Szabolcs to confirm once
> > > he's back.
> > >
> > > > Wasn't there a man page kicking around too? Would be good to see that
> > > > go upstream (to the manpages project, of course).
> > >
> > > Dave started writing one for the tagged address ABI, not sure where that
> > > is. For the MTE additions, we are waiting for the ABI to be upstreamed.
> >
> > The tagged address ABI control stuff is upstream in the man-pages-5.08
> > release.
> >
> > I don't think anyone drafted anything for MTE yet.  Do we consider the
> > MTE ABI to be sufficiently stable now for it to be worth starting
> > drafting something?
>
> I think so, yes. I'm hoping to queue it for 5.10, once I have an Ack from
> the Android tools side on the per-thread ABI.

Our main requirement on the Android side is to provide an API for
changing the tag checking mode in all threads in a process while
multiple threads are running. I think we've been able to accomplish
this [1] by using a libc private real-time signal which is sent to all
threads. The implementation has been tested on FVP via the included
unit tests. The code has also been tested on real hardware in a
multi-threaded app process (of course we don't have MTE-enabled
hardware, so the implementation was tested on hardware by hacking it
to disable the tagged address ABI instead of changing the tag checking
mode, and then verifying via ptrace(PTRACE_GETREGSET) that the tagged
address ABI was disabled in all threads).

That being said, as with any code at the nexus of concurrency and
POSIX signals, the implementation is quite tricky so I would say it
falls more into the category of "no obvious problems" than "obviously
no problems". It also relies on changes to the implementations of
pthread APIs so it wouldn't catch threads created directly via clone()
rather than via pthread_create(). I think we would be able to ignore
such threads on Android without causing compatibility issues because
we can require the process to not create threads via clone() before
calling the function. I imagine this may not necessarily work for
other libcs like glibc, though, but as I understand it glibc has no
plan to offer such an API.

I feel confident enough in the kernel API though that I think that
it's reasonable as a starting point at least, and that if a problem
with the API is discovered I would expect it to be fixable by adding
new APIs, so:

Acked-by: Peter Collingbourne <pcc@google.com>


Peter

[1] https://android-review.googlesource.com/c/platform/bionic/+/1427377
Will Deacon Oct. 15, 2020, 8:57 a.m. UTC | #11
On Wed, Oct 14, 2020 at 04:43:23PM -0700, Peter Collingbourne wrote:
> On Fri, Sep 18, 2020 at 1:30 AM Will Deacon <will@kernel.org> wrote:
> > I think so, yes. I'm hoping to queue it for 5.10, once I have an Ack from
> > the Android tools side on the per-thread ABI.
>
> Our main requirement on the Android side is to provide an API for
> changing the tag checking mode in all threads in a process while
> multiple threads are running. I think we've been able to accomplish
> this [1] by using a libc private real-time signal which is sent to all
> threads. The implementation has been tested on FVP via the included
> unit tests. The code has also been tested on real hardware in a
> multi-threaded app process (of course we don't have MTE-enabled
> hardware, so the implementation was tested on hardware by hacking it
> to disable the tagged address ABI instead of changing the tag checking
> mode, and then verifying via ptrace(PTRACE_GETREGSET) that the tagged
> address ABI was disabled in all threads).
>
> That being said, as with any code at the nexus of concurrency and
> POSIX signals, the implementation is quite tricky so I would say it
> falls more into the category of "no obvious problems" than "obviously
> no problems". It also relies on changes to the implementations of
> pthread APIs so it wouldn't catch threads created directly via clone()
> rather than via pthread_create(). I think we would be able to ignore
> such threads on Android without causing compatibility issues because
> we can require the process to not create threads via clone() before
> calling the function. I imagine this may not necessarily work for
> other libcs like glibc, though, but as I understand it glibc has no
> plan to offer such an API.
>
> I feel confident enough in the kernel API though that I think that
> it's reasonable as a starting point at least, and that if a problem
> with the API is discovered I would expect it to be fixable by adding
> new APIs, so:
>
> Acked-by: Peter Collingbourne <pcc@google.com>

Thanks, Peter. This series has already landed upstream, so I'm unable to
add your Ack now, but the text above is very helpful.

Cheers,

Will

> [1] https://android-review.googlesource.com/c/platform/bionic/+/1427377
Szabolcs Nagy Oct. 15, 2020, 11:14 a.m. UTC | #12
The 10/14/2020 16:43, Peter Collingbourne wrote:
> On Fri, Sep 18, 2020 at 1:30 AM Will Deacon <will@kernel.org> wrote:
> > I think so, yes. I'm hoping to queue it for 5.10, once I have an Ack from
> > the Android tools side on the per-thread ABI.
>
> Our main requirement on the Android side is to provide an API for
> changing the tag checking mode in all threads in a process while
> multiple threads are running. I think we've been able to accomplish
> this [1] by using a libc private real-time signal which is sent to all
> threads. The implementation has been tested on FVP via the included
> unit tests. The code has also been tested on real hardware in a
> multi-threaded app process (of course we don't have MTE-enabled
> hardware, so the implementation was tested on hardware by hacking it
> to disable the tagged address ABI instead of changing the tag checking
> mode, and then verifying via ptrace(PTRACE_GETREGSET) that the tagged
> address ABI was disabled in all threads).
>
> That being said, as with any code at the nexus of concurrency and
> POSIX signals, the implementation is quite tricky so I would say it
> falls more into the category of "no obvious problems" than "obviously
> no problems". It also relies on changes to the implementations of
> pthread APIs so it wouldn't catch threads created directly via clone()
> rather than via pthread_create(). I think we would be able to ignore
> such threads on Android without causing compatibility issues because
> we can require the process to not create threads via clone() before
> calling the function. I imagine this may not necessarily work for
> other libcs like glibc, though, but as I understand it glibc has no
> plan to offer such an API.

no immediate plans.

to make such api useful we would have to expose it to
users (e.g. custom allocators) which is tricky.

note that glibc has the necessary infrastructure to do
the internal signaling, but it had issues in the past.

i think it had problems with qemu-user and golang c ffi
and libc internal issues around multi-threaded fork/vfork
or simply stack overflow because of small thread stacks
and growing signal frames that are more likely to hit
at the wrong time if libc uses more internal signals.

so i think such per process operation is easier to handle
correctly in the kernel.

doing this outside of the libc (e.g. in a custom allocator)
is not possible (without relying on new libc apis) which i
thought was a reasonable use-case, but likely glibc will
enable sync tag checks early and leave it that way (the only
tricky bit is to have an opt-in/-out mechanism for binaries
that are not compatible with the tagged address abi and
i don't know yet how that will work).

> [1] https://android-review.googlesource.com/c/platform/bionic/+/1427377


btw in the bionic implementation there are writes to
globals (g_tcf, g_arg, g_func) that are later read in
signal handlers of other threads without atomics. i'm
not sure if that's enough synchronization (can we
assume that tgkill synchronizes with signal handlers?).
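
One way to avoid depending on the answer is to publish the request through C11 atomics. The following is only a sketch under stated assumptions, not the bionic code: the names requested_flags, applied_flags, request_mode_change() and last_applied_flags() are hypothetical, SIGUSR1 stands in for a libc-private real-time signal, only the calling thread is signalled (a real broadcast would signal every thread), and the handler records the value instead of calling prctl().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical: request published by the thread asking for the change. */
static _Atomic unsigned long requested_flags;
/* Stand-in for the per-thread state a real handler would set via prctl(). */
static _Atomic unsigned long applied_flags;

static void apply_request(int sig)
{
        unsigned long flags;

        (void)sig;
        /* Acquire load pairs with the release store in request_mode_change(). */
        flags = atomic_load_explicit(&requested_flags, memory_order_acquire);
        /* A real handler would call prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0). */
        atomic_store_explicit(&applied_flags, flags, memory_order_release);
}

/* Publish the flags, then deliver the signal. Returns 0 on success, -1 on error. */
int request_mode_change(unsigned long flags)
{
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = apply_request;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGUSR1, &sa, NULL))
                return -1;

        /* The store happens before the signal is sent to any handler. */
        atomic_store_explicit(&requested_flags, flags, memory_order_release);

        /* Simplification: signal only the calling thread. */
        return raise(SIGUSR1) ? -1 : 0;
}

unsigned long last_applied_flags(void)
{
        return atomic_load_explicit(&applied_flags, memory_order_acquire);
}
```

The release store and acquire load make the ordering explicit in the program itself, so the handler does not have to rely on any synchronization the signal delivery may or may not provide.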

Patch

diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
index f28853f80089..328e0c454fbd 100644
--- a/Documentation/arm64/cpu-feature-registers.rst
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -175,6 +175,8 @@  infrastructure:
      +------------------------------+---------+---------+
      | Name                         |  bits   | visible |
      +------------------------------+---------+---------+
+     | MTE                          | [11-8]  |    y    |
+     +------------------------------+---------+---------+
      | SSBS                         | [7-4]   |    y    |
      +------------------------------+---------+---------+
      | BT                           | [3-0]   |    y    |
diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
index 84a9fd2d41b4..bbd9cf54db6c 100644
--- a/Documentation/arm64/elf_hwcaps.rst
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -240,6 +240,10 @@  HWCAP2_BTI
 
     Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001.
 
+HWCAP2_MTE
+
+    Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described
+    by Documentation/arm64/memory-tagging-extension.rst.
 
 4. Unused AT_HWCAP bits
 -----------------------
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index d9665d83c53a..43b0939d384e 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -14,6 +14,7 @@  ARM64 Architecture
     hugetlbpage
     legacy_instructions
     memory
+    memory-tagging-extension
     perf
     pointer-authentication
     silicon-errata
diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
new file mode 100644
index 000000000000..e3709b536b89
--- /dev/null
+++ b/Documentation/arm64/memory-tagging-extension.rst
@@ -0,0 +1,305 @@ 
+===============================================
+Memory Tagging Extension (MTE) in AArch64 Linux
+===============================================
+
+Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
+         Catalin Marinas <catalin.marinas@arm.com>
+
+Date: 2020-02-25
+
+This document describes the provision of the Memory Tagging Extension
+functionality in AArch64 Linux.
+
+Introduction
+============
+
+ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
+feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
+(Top Byte Ignore) feature and allows software to access a 4-bit
+allocation tag for each 16-byte granule in the physical address space.
+Such a memory range must be mapped with the Normal-Tagged memory
+attribute. A logical tag is derived from bits 59-56 of the virtual
+address used for the memory access. A CPU with MTE enabled will compare
+the logical tag against the allocation tag and potentially raise an
+exception on mismatch, subject to system registers configuration.
+
+Userspace Support
+=================
+
+When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
+supported by the hardware, the kernel advertises the feature to
+userspace via ``HWCAP2_MTE``.
+
+PROT_MTE
+--------
+
+To access the allocation tags, a user process must enable the Tagged
+memory attribute on an address range using a new ``prot`` flag for
+``mmap()`` and ``mprotect()``:
+
+``PROT_MTE`` - Pages allow access to the MTE allocation tags.
+
+The allocation tag is set to 0 when such pages are first mapped in the
+user address space and preserved on copy-on-write. ``MAP_SHARED`` is
+supported and the allocation tags can be shared between processes.
+
+**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
+RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
+types of mapping will result in ``-EINVAL`` returned by these system
+calls.
+
+**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
+be cleared by ``mprotect()``.
+
+**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
+``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
+point after the system call.
+
+Tag Check Faults
+----------------
+
+When ``PROT_MTE`` is enabled on an address range and a mismatch between
+the logical and allocation tags occurs on access, there are three
+configurable behaviours:
+
+- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
+  tag check fault.
+
+- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
+  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
+  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
+  by the offending thread, the containing process is terminated with a
+  ``coredump``.
+
+- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
+  thread, asynchronously following one or multiple tag check faults,
+  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
+  address is unknown).
+
+The user can select the above modes, per thread, using the
+``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
+``flags`` contains one of the following values in the ``PR_MTE_TCF_MASK``
+bit-field:
+
+- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
+- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
+- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
+
+The current tag check fault mode can be read using the
+``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
+
+Tag checking can also be disabled for a user thread by setting the
+``PSTATE.TCO`` bit with ``MSR TCO, #1``.
+
+**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
+irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
+``sigreturn()``.
+
+**Note**: There are no *match-all* logical tags available for user
+applications.
+
+**Note**: Kernel accesses to the user address space (e.g. ``read()``
+system call) are not checked if the user thread tag checking mode is
+``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
+``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
+address accesses, however it cannot always guarantee it.
+
+Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
+-----------------------------------------------------------------
+
+The architecture allows excluding certain tags from being randomly
+generated via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux
+excludes all tags other than 0. A user thread can enable specific tags
+in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL,
+flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap
+in the ``PR_MTE_TAG_MASK`` bit-field.
+
+**Note**: The hardware uses an exclude mask but the ``prctl()``
+interface provides an include mask. An include mask of ``0`` (exclusion
+mask ``0xffff``) results in the CPU always generating tag ``0``.
+
+Initial process state
+---------------------
+
+On ``execve()``, the new process has the following configuration:
+
+- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
+- Tag checking mode set to ``PR_MTE_TCF_NONE``
+- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
+- ``PSTATE.TCO`` set to 0
+- ``PROT_MTE`` not set on any of the initial memory maps
+
+On ``fork()``, the new process inherits the parent's configuration and
+memory map attributes with the exception of the ``madvise()`` ranges
+with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
+to 0).
+
+The ``ptrace()`` interface
+--------------------------
+
+``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
+the tags from or set the tags to a tracee's address space. The
+``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
+data)`` where:
+
+- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
+- ``pid`` - the tracee's PID.
+- ``addr`` - address in the tracee's address space.
+- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
+  a buffer of ``iov_len`` length in the tracer's address space.
+
+The tags in the tracer's ``iov_base`` buffer are represented as one
+4-bit tag per byte and correspond to a 16-byte MTE tag granule in the
+tracee's address space.
+
+**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
+will use the corresponding aligned address.
+
+``ptrace()`` return value:
+
+- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
+  number of tags transferred. This may be smaller than the requested
+  ``iov_len`` if the requested address range in the tracee's or the
+  tracer's space cannot be accessed or does not have valid tags.
+- ``-EPERM`` - the specified process cannot be traced.
+- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
+  address) and no tags copied. ``iov_len`` not updated.
+- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
+  or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated.
+- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
+  mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated.
+
+**Note**: There are no transient errors for the requests above, so user
+programs should not retry in case of a non-zero system call return.
+
+``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr ==
+NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
+address ABI control and MTE configuration of a process as per the
+``prctl()`` options described in
+Documentation/arm64/tagged-address-abi.rst and above. The corresponding
+``regset`` is 1 element of 8 bytes (``sizeof(long)``).
+
+Example of correct usage
+========================
+
+*MTE Example code*
+
+.. code-block:: c
+
+    /*
+     * To be compiled with -march=armv8.5-a+memtag
+     */
+    #include <errno.h>
+    #include <stdint.h>
+    #include <stdio.h>
+    #include <stdlib.h>
+    #include <unistd.h>
+    #include <sys/auxv.h>
+    #include <sys/mman.h>
+    #include <sys/prctl.h>
+
+    /*
+     * From arch/arm64/include/uapi/asm/hwcap.h
+     */
+    #define HWCAP2_MTE              (1 << 18)
+
+    /*
+     * From arch/arm64/include/uapi/asm/mman.h
+     */
+    #define PROT_MTE                 0x20
+
+    /*
+     * From include/uapi/linux/prctl.h
+     */
+    #define PR_SET_TAGGED_ADDR_CTRL 55
+    #define PR_GET_TAGGED_ADDR_CTRL 56
+    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
+    # define PR_MTE_TCF_SHIFT       1
+    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
+    # define PR_MTE_TAG_SHIFT       3
+    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)
+
+    /*
+     * Insert a random logical tag into the given pointer.
+     */
+    #define insert_random_tag(ptr) ({                       \
+            uint64_t __val;                                 \
+            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
+            __val;                                          \
+    })
+
+    /*
+     * Set the allocation tag on the destination address.
+     */
+    #define set_tag(tagged_addr) do {                                      \
+            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
+    } while (0)
+
+    int main(void)
+    {
+            unsigned char *a;
+            unsigned long page_sz = sysconf(_SC_PAGESIZE);
+            unsigned long hwcap2 = getauxval(AT_HWCAP2);
+
+            /* check if MTE is present */
+            if (!(hwcap2 & HWCAP2_MTE))
+                    return EXIT_FAILURE;
+
+            /*
+             * Enable the tagged address ABI, synchronous MTE tag check faults and
+             * allow all non-zero tags in the randomly generated set.
+             */
+            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
+                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | (0xfffe << PR_MTE_TAG_SHIFT),
+                      0, 0, 0)) {
+                    perror("prctl() failed");
+                    return EXIT_FAILURE;
+            }
+
+            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
+                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+            if (a == MAP_FAILED) {
+                    perror("mmap() failed");
+                    return EXIT_FAILURE;
+            }
+
+            /*
+             * Enable MTE on the above anonymous mmap. The flag could be passed
+             * directly to mmap() and skip this step.
+             */
+            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
+                    perror("mprotect() failed");
+                    return EXIT_FAILURE;
+            }
+
+            /* access with the default tag (0) */
+            a[0] = 1;
+            a[1] = 2;
+
+            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);
+
+            /* set the logical and allocation tags */
+            a = (unsigned char *)insert_random_tag(a);
+            set_tag(a);
+
+            printf("%p\n", a);
+
+            /* non-zero tag access */
+            a[0] = 3;
+            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);
+
+            /*
+             * If MTE is enabled correctly the next instruction will generate an
+             * exception.
+             */
+            printf("Expecting SIGSEGV...\n");
+            a[16] = 0xdd;
+
+            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
+            printf("...haven't got one\n");
+
+            return EXIT_FAILURE;
+    }
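
The address and mask arithmetic described in the documentation above can be sketched in portable C. This is an illustration only: the helper names (logical_tag(), with_logical_tag(), granule_base(), exclude_from_include()) are hypothetical and not part of the patch. It assumes what the text states: the logical tag lives in bits 59-56 of the virtual address, tag granules are 16 bytes, and the hardware exclude mask is the complement of the 16-bit ``prctl()`` include mask.

```c
#include <stdint.h>

/* Logical tag: bits 59-56 of the virtual address. */
unsigned int logical_tag(uint64_t addr)
{
        return (unsigned int)((addr >> 56) & 0xf);
}

/* Return addr with its logical tag replaced by the low 4 bits of tag. */
uint64_t with_logical_tag(uint64_t addr, unsigned int tag)
{
        uint64_t cleared = addr & ~(UINT64_C(0xf) << 56);

        return cleared | ((uint64_t)(tag & 0xf) << 56);
}

/* Start of the 16-byte tag granule containing addr. */
uint64_t granule_base(uint64_t addr)
{
        return addr & ~UINT64_C(0xf);
}

/* prctl() include mask -> value for the GCR_EL1.Exclude bit-field. */
uint16_t exclude_from_include(uint16_t include)
{
        return (uint16_t)~include;
}
```

With these helpers, the include mask ``0xfffe`` used in the example program corresponds to an exclude mask of ``0x0001`` (only tag 0 excluded), and an include mask of 0 corresponds to ``0xffff``.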