
[v5,06/30] arm64: context switch POR_EL0 register

Message ID 20240822151113.1479789-7-joey.gouly@arm.com
State Accepted
Commit 160a8e13de6c36270e8c6537b8a944f4e73d2362
Series: Permission Overlay Extension

Commit Message

Joey Gouly Aug. 22, 2024, 3:10 p.m. UTC
POR_EL0 is a register that can be modified by userspace directly,
so it must be context switched.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/include/asm/cpufeature.h |  6 ++++++
 arch/arm64/include/asm/processor.h  |  1 +
 arch/arm64/include/asm/sysreg.h     |  3 +++
 arch/arm64/kernel/process.c         | 28 ++++++++++++++++++++++++++++
 4 files changed, 38 insertions(+)

Comments

Will Deacon Aug. 23, 2024, 2:45 p.m. UTC | #1
On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> POR_EL0 is a register that can be modified by userspace directly,
> so it must be context switched.
> 
> Signed-off-by: Joey Gouly <joey.gouly@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> ---
>  arch/arm64/include/asm/cpufeature.h |  6 ++++++
>  arch/arm64/include/asm/processor.h  |  1 +
>  arch/arm64/include/asm/sysreg.h     |  3 +++
>  arch/arm64/kernel/process.c         | 28 ++++++++++++++++++++++++++++
>  4 files changed, 38 insertions(+)

[...]

> +static void permission_overlay_switch(struct task_struct *next)
> +{
> +	if (!system_supports_poe())
> +		return;
> +
> +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> +	if (current->thread.por_el0 != next->thread.por_el0) {
> +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */

nit: typo "chaning".

But more substantially, is this just to prevent spurious faults in the
context of a new thread using a stale value for POR_EL0?

Will
Catalin Marinas Aug. 23, 2024, 4:41 p.m. UTC | #2
On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > +static void permission_overlay_switch(struct task_struct *next)
> > +{
> > +	if (!system_supports_poe())
> > +		return;
> > +
> > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> 
> nit: typo "chaning".
> 
> But more substantially, is this just to prevent spurious faults in the
> context of a new thread using a stale value for POR_EL0?

Not just prevent faults but enforce the permissions from the new
thread's POR_EL0. The kernel may continue with a uaccess routine from
here, we can't tell.
Will Deacon Aug. 23, 2024, 5:08 p.m. UTC | #3
On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > +static void permission_overlay_switch(struct task_struct *next)
> > > +{
> > > +	if (!system_supports_poe())
> > > +		return;
> > > +
> > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > 
> > nit: typo "chaning".
> > 
> > But more substantially, is this just to prevent spurious faults in the
> > context of a new thread using a stale value for POR_EL0?
> 
> Not just prevent faults but enforce the permissions from the new
> thread's POR_EL0. The kernel may continue with a uaccess routine from
> here, we can't tell.

Hmm, I wondered if that was the case. It's a bit weird though, because:

  - There's a window between switch_mm() and switch_to() where you might
    reasonably expect to be able to execute uaccess routines

  - kthread_use_mm() doesn't/can't look at this at all

  - GUP obviously doesn't care

So what do we actually gain by having the uaccess routines honour this?

Will
Catalin Marinas Aug. 23, 2024, 6:40 p.m. UTC | #4
On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > +{
> > > > +	if (!system_supports_poe())
> > > > +		return;
> > > > +
> > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > 
> > > nit: typo "chaning".
> > > 
> > > But more substantially, is this just to prevent spurious faults in the
> > > context of a new thread using a stale value for POR_EL0?
> > 
> > Not just prevent faults but enforce the permissions from the new
> > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > here, we can't tell.
> 
> Hmm, I wondered if that was the case. It's a bit weird though, because:
> 
>   - There's a window between switch_mm() and switch_to() where you might
>     reasonably expect to be able to execute uaccess routines

I don't think we can have any uaccess between these two switches (a
uaccess could fault, that's a pretty weird state between these two).

>   - kthread_use_mm() doesn't/can't look at this at all

No, but a kthread would have its own, most permissive, POR_EL0.

>   - GUP obviously doesn't care
> 
> So what do we actually gain by having the uaccess routines honour this?

I guess where it matters is more like not accidentally faulting because
the previous thread had more restrictive permissions.
Will Deacon Aug. 27, 2024, 11:38 a.m. UTC | #5
On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > +{
> > > > > +	if (!system_supports_poe())
> > > > > +		return;
> > > > > +
> > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > 
> > > > nit: typo "chaning".
> > > > 
> > > > But more substantially, is this just to prevent spurious faults in the
> > > > context of a new thread using a stale value for POR_EL0?
> > > 
> > > Not just prevent faults but enforce the permissions from the new
> > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > here, we can't tell.
> > 
> > Hmm, I wondered if that was the case. It's a bit weird though, because:
> > 
> >   - There's a window between switch_mm() and switch_to() where you might
> >     reasonably expect to be able to execute uaccess routines
> 
> I don't think we can have any uaccess between these two switches (a
> uaccess could fault, that's a pretty weird state between these two).
> 
> >   - kthread_use_mm() doesn't/can't look at this at all
> 
> No, but a kthread would have its own, most permissive, POR_EL0.
> 
> >   - GUP obviously doesn't care
> > 
> > So what do we actually gain by having the uaccess routines honour this?
> 
> I guess where it matters is more like not accidentally faulting because
> the previous thread had more restrictive permissions.

That's what I wondered initially, but won't the fault handler retry in
that case?

Will
Catalin Marinas Sept. 2, 2024, 7:08 p.m. UTC | #6
On Tue, Aug 27, 2024 at 12:38:04PM +0100, Will Deacon wrote:
> On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> > On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > > +{
> > > > > > +	if (!system_supports_poe())
> > > > > > +		return;
> > > > > > +
> > > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > > 
> > > > > nit: typo "chaning".
> > > > > 
> > > > > But more substantially, is this just to prevent spurious faults in the
> > > > > context of a new thread using a stale value for POR_EL0?
> > > > 
> > > > Not just prevent faults but enforce the permissions from the new
> > > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > > here, we can't tell.
[...]
> > > So what do we actually gain by having the uaccess routines honour this?
> > 
> > I guess where it matters is more like not accidentally faulting because
> > the previous thread had more restrictive permissions.
> 
> That's what I wondered initially, but won't the fault handler retry in
> that case?

Yes, it will retry and this should be fine (I assume you are only
talking about dropping the ISB in the context switch).

For the case of running with a more permissive stale POR_EL0, arguably it's
slightly more predictable for the user but, OTOH, some syscalls like
readv() could be routed through GUP with no checks. As with MTE, we
don't guarantee uaccesses honour the user permissions.

That said, at some point we should sanitise this path anyway and have a
single ISB at the end. In the meantime, I'm fine with dropping the ISB
here.
Will Deacon Sept. 4, 2024, 10:22 a.m. UTC | #7
On Tue, Sep 03, 2024 at 03:54:13PM +0100, Joey Gouly wrote:
> On Mon, Sep 02, 2024 at 08:08:08PM +0100, Catalin Marinas wrote:
> > On Tue, Aug 27, 2024 at 12:38:04PM +0100, Will Deacon wrote:
> > > On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> > > > On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > > > > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > > > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > > > > +{
> > > > > > > > +	if (!system_supports_poe())
> > > > > > > > +		return;
> > > > > > > > +
> > > > > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > > > >
> > > > > > > nit: typo "chaning".
> > > > > > >
> > > > > > > But more substantially, is this just to prevent spurious faults in the
> > > > > > > context of a new thread using a stale value for POR_EL0?
> > > > > >
> > > > > > Not just prevent faults but enforce the permissions from the new
> > > > > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > > > > here, we can't tell.
> > [...]
> > > > > So what do we actually gain by having the uaccess routines honour this?
> > > >
> > > > I guess where it matters is more like not accidentally faulting because
> > > > the previous thread had more restrictive permissions.
> > >
> > > That's what I wondered initially, but won't the fault handler retry in
> > > that case?
> >
> > Yes, it will retry and this should be fine (I assume you are only
> > talking about dropping the ISB in the context switch).
> >
> > For the case of running with a more permissive stale POR_EL0, arguably it's
> > slightly more predictable for the user but, OTOH, some syscalls like
> > readv() could be routed through GUP with no checks. As with MTE, we
> > don't guarantee uaccesses honour the user permissions.
> >
> > That said, at some point we should sanitise this path anyway and have a
> > single ISB at the end. In the meantime, I'm fine with dropping the ISB
> > here.
> >
> 
> commit 3141fb86bee8d48ae47cab1594dad54f974a8899
> Author: Joey Gouly <joey.gouly@arm.com>
> Date:   Tue Sep 3 15:47:26 2024 +0100
> 
>     fixup! arm64: context switch POR_EL0 register
> 
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index a3a61ecdb165..c224b0955f1a 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -515,11 +515,8 @@ static void permission_overlay_switch(struct task_struct *next)
>                 return;
> 
>         current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> -       if (current->thread.por_el0 != next->thread.por_el0) {
> +       if (current->thread.por_el0 != next->thread.por_el0)
>                 write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> -               /* ISB required for kernel uaccess routines when chaning POR_EL0 */
> -               isb();
> -       }
>  }

What about the one in flush_poe()? I'm inclined to drop that as well.

> Will, do you want me to re-send the series with this and the permissions
> diff from the other thread [1],
> or are you ok with applying them when you pull it in?

I'll have a crack now, but if it fails miserably then I'll let you know.

Will
Will Deacon Sept. 4, 2024, 11:43 a.m. UTC | #8
On Wed, Sep 04, 2024 at 12:32:21PM +0100, Joey Gouly wrote:
> On Wed, Sep 04, 2024 at 11:22:54AM +0100, Will Deacon wrote:
> > On Tue, Sep 03, 2024 at 03:54:13PM +0100, Joey Gouly wrote:
> > > On Mon, Sep 02, 2024 at 08:08:08PM +0100, Catalin Marinas wrote:
> > > > On Tue, Aug 27, 2024 at 12:38:04PM +0100, Will Deacon wrote:
> > > > > On Fri, Aug 23, 2024 at 07:40:52PM +0100, Catalin Marinas wrote:
> > > > > > On Fri, Aug 23, 2024 at 06:08:36PM +0100, Will Deacon wrote:
> > > > > > > On Fri, Aug 23, 2024 at 05:41:06PM +0100, Catalin Marinas wrote:
> > > > > > > > On Fri, Aug 23, 2024 at 03:45:32PM +0100, Will Deacon wrote:
> > > > > > > > > On Thu, Aug 22, 2024 at 04:10:49PM +0100, Joey Gouly wrote:
> > > > > > > > > > +static void permission_overlay_switch(struct task_struct *next)
> > > > > > > > > > +{
> > > > > > > > > > +	if (!system_supports_poe())
> > > > > > > > > > +		return;
> > > > > > > > > > +
> > > > > > > > > > +	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > > > > > > > +	if (current->thread.por_el0 != next->thread.por_el0) {
> > > > > > > > > > +		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > > > > > > > > +		/* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > > > > > > >
> > > > > > > > > nit: typo "chaning".
> > > > > > > > >
> > > > > > > > > But more substantially, is this just to prevent spurious faults in the
> > > > > > > > > context of a new thread using a stale value for POR_EL0?
> > > > > > > >
> > > > > > > > Not just prevent faults but enforce the permissions from the new
> > > > > > > > thread's POR_EL0. The kernel may continue with a uaccess routine from
> > > > > > > > here, we can't tell.
> > > > [...]
> > > > > > > So what do we actually gain by having the uaccess routines honour this?
> > > > > >
> > > > > > I guess where it matters is more like not accidentally faulting because
> > > > > > the previous thread had more restrictive permissions.
> > > > >
> > > > > That's what I wondered initially, but won't the fault handler retry in
> > > > > that case?
> > > >
> > > > Yes, it will retry and this should be fine (I assume you are only
> > > > talking about dropping the ISB in the context switch).
> > > >
> > > > For the case of running with a more permissive stale POR_EL0, arguably it's
> > > > slightly more predictable for the user but, OTOH, some syscalls like
> > > > readv() could be routed through GUP with no checks. As with MTE, we
> > > > don't guarantee uaccesses honour the user permissions.
> > > >
> > > > That said, at some point we should sanitise this path anyway and have a
> > > > single ISB at the end. In the meantime, I'm fine with dropping the ISB
> > > > here.
> > > >
> > > 
> > > commit 3141fb86bee8d48ae47cab1594dad54f974a8899
> > > Author: Joey Gouly <joey.gouly@arm.com>
> > > Date:   Tue Sep 3 15:47:26 2024 +0100
> > > 
> > >     fixup! arm64: context switch POR_EL0 register
> > > 
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index a3a61ecdb165..c224b0955f1a 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > @@ -515,11 +515,8 @@ static void permission_overlay_switch(struct task_struct *next)
> > >                 return;
> > > 
> > >         current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > -       if (current->thread.por_el0 != next->thread.por_el0) {
> > > +       if (current->thread.por_el0 != next->thread.por_el0)
> > >                 write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
> > > -               /* ISB required for kernel uaccess routines when chaning POR_EL0 */
> > > -               isb();
> > > -       }
> > >  }
> > 
> > What about the one in flush_poe()? I'm inclined to drop that as well.
> 
> Yes I guess that one can be removed too. Catalin any comments?
> 
> > 
> > > Will, do you want me to re-send the series with this and the permissions
> > > diff from the other thread [1],
> > > or are you ok with applying them when you pull it in?
> > 
> > I'll have a crack now, but if it fails miserably then I'll let you know.
> 
> Thanks! Just to make sure, you should pick the patch up from
> 
> 	https://lore.kernel.org/linux-arm-kernel/20240903152937.GA3768522@e124191.cambridge.arm.com/
> 
> Not the one I linked to in [1] in my previous e-mail.

Right, there's quite a lot I need to do:

- Uncorrupt your patches
- Fix the conflict in the kvm selftests
- Drop the unnecessary ISBs
- Fix the ESR checking
- Fix the el2_setup labels
- Reorder the patches
- Drop the patch that is already in kvmarm

Working on it...

Will
Marc Zyngier Sept. 4, 2024, 5:05 p.m. UTC | #9
On Wed, 04 Sep 2024 17:17:58 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> On Wed, Sep 04, 2024 at 01:55:03PM +0100, Joey Gouly wrote:
> > On Wed, Sep 04, 2024 at 12:43:02PM +0100, Will Deacon wrote:
> > > Right, there's quite a lot I need to do:
> > > 
> > > - Uncorrupt your patches
> > > - Fix the conflict in the kvm selftests
> > > - Drop the unnecessary ISBs
> > > - Fix the ESR checking
> > > - Fix the el2_setup labels
> > > - Reorder the patches
> > > - Drop the patch that is already in kvmarm
> > > 
> > > Working on it...
> > 
> > Sorry! I'm happy to rebase onto some arm64 branch if that will help, just let me know.
> 
> Please have a look at for-next/poe (also merged into for-next/core and
> for-kernelci) and let me know what I got wrong!
> 
> For Marc: I reordered the series so the KVM bits (and deps) are all at the
> beginning, should you need them. The branch is based on a merge of the
> shared branch you created previously.

I just had a quick check, and while there is a small conflict with
kvmarm/next, it is extremely minor (small clash in the vcpu_sysreg,
for which the resolving order doesn't matter), and not worth dragging
additional patches in the shared branch.

However, if KVM's own S1PIE series [1] ends up being merged (which I'd
really like), I will definitely have to pull the prefix in, as this is
a bit more involved conflict wise.

Thanks,

	M.

[1] http://lore.kernel.org/all/20240903153834.1909472-1-maz@kernel.org
Kevin Brodsky Sept. 11, 2024, 3:01 p.m. UTC | #10
On 22/08/2024 17:10, Joey Gouly wrote:
> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  		if (system_supports_tpidr2())
>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
>  
> +		if (system_supports_poe())
> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);

Here we are only reloading POR_EL0's value if the target is a user
thread. However, as this series stands, POR_EL0 is also relevant to
kthreads, because any uaccess or GUP done from a kthread will also be
checked against POR_EL0. This is especially important in cases like the
io_uring kthread, which accesses the memory of the user process that
spawned it. To prevent such a kthread from inheriting a stale value of
POR_EL0, it seems that we should reload POR_EL0's value in all cases
(user and kernel thread).

Other approaches could also be considered (e.g. resetting POR_EL0 to
unrestricted when creating a kthread), see my reply on v4 [1].

Kevin

[1]
https://lore.kernel.org/linux-arm-kernel/b4f8b351-4c83-43b4-bfbe-8f67f3f56fb9@arm.com/
Dave Hansen Sept. 11, 2024, 3:33 p.m. UTC | #11
On 9/11/24 08:01, Kevin Brodsky wrote:
> On 22/08/2024 17:10, Joey Gouly wrote:
>> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>>  		if (system_supports_tpidr2())
>>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
>>  
>> +		if (system_supports_poe())
>> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> Here we are only reloading POR_EL0's value if the target is a user
> thread. However, as this series stands, POR_EL0 is also relevant to
> kthreads, because any uaccess or GUP done from a kthread will also be
> checked against POR_EL0. This is especially important in cases like the
> io_uring kthread, which accesses the memory of the user process that
> spawned it. To prevent such a kthread from inheriting a stale value of
> POR_EL0, it seems that we should reload POR_EL0's value in all cases
> (user and kernel thread).

The problem with this is trying to figure out which POR_EL0 to use.  The
kthread could have been spawned ages ago and might have a POR_EL0
which is very different from the current value of any of the threads in
the process right now.

There's also no great way for a kthread to reach out and grab an updated
value.  It's all completely inherently racy.

> Other approaches could also be considered (e.g. resetting POR_EL0 to
> unrestricted when creating a kthread), see my reply on v4 [1].

I kinda think this is the only way to go.  It's the only sensible,
predictable way.  I _think_ it's what x86 will end up doing with PKRU,
but there's been enough churn there that I'd need to go double check
what happens in practice.

Either way, it would be nice to get an io_uring test in here that
actually spawns kthreads:

	tools/testing/selftests/mm/protection_keys.c
Will Deacon Sept. 12, 2024, 10:50 a.m. UTC | #12
Hi Dave,

On Wed, Sep 11, 2024 at 08:33:54AM -0700, Dave Hansen wrote:
> On 9/11/24 08:01, Kevin Brodsky wrote:
> > On 22/08/2024 17:10, Joey Gouly wrote:
> >> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> >>  		if (system_supports_tpidr2())
> >>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
> >>  
> >> +		if (system_supports_poe())
> >> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > Here we are only reloading POR_EL0's value if the target is a user
> > thread. However, as this series stands, POR_EL0 is also relevant to
> > kthreads, because any uaccess or GUP done from a kthread will also be
> > checked against POR_EL0. This is especially important in cases like the
> > io_uring kthread, which accesses the memory of the user process that
> > spawned it. To prevent such a kthread from inheriting a stale value of
> > POR_EL0, it seems that we should reload POR_EL0's value in all cases
> > (user and kernel thread).
> 
> The problem with this is trying to figure out which POR_EL0 to use.  The
> kthread could have been spawned ages ago and might have a POR_EL0
> which is very different from the current value of any of the threads in
> the process right now.
> 
> There's also no great way for a kthread to reach out and grab an updated
> value.  It's all completely inherently racy.
> 
> > Other approaches could also be considered (e.g. resetting POR_EL0 to
> > unrestricted when creating a kthread), see my reply on v4 [1].
> 
> I kinda think this is the only way to go.  It's the only sensible,
> predictable way.  I _think_ it's what x86 will end up doing with PKRU,
> but there's been enough churn there that I'd need to go double check
> what happens in practice.

I agree.

> Either way, it would be nice to get an io_uring test in here that
> actually spawns kthreads:
> 
> 	tools/testing/selftests/mm/protection_keys.c

It would be good to update Documentation/core-api/protection-keys.rst
as well, since the example with read() raises more questions than it
answers!

Kevin, Joey -- I've got this series queued in arm64 as-is, so perhaps
you could send some patches on top so we can iron this out in time for
6.12? I'll also be at LPC next week if you're about.

Cheers,

Will
Joey Gouly Sept. 12, 2024, 12:48 p.m. UTC | #13
On Thu, Sep 12, 2024 at 11:50:18AM +0100, Will Deacon wrote:
> Hi Dave,
> 
> On Wed, Sep 11, 2024 at 08:33:54AM -0700, Dave Hansen wrote:
> > On 9/11/24 08:01, Kevin Brodsky wrote:
> > > On 22/08/2024 17:10, Joey Gouly wrote:
> > >> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > >>  		if (system_supports_tpidr2())
> > >>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
> > >>  
> > >> +		if (system_supports_poe())
> > >> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > Here we are only reloading POR_EL0's value if the target is a user
> > > thread. However, as this series stands, POR_EL0 is also relevant to
> > > kthreads, because any uaccess or GUP done from a kthread will also be
> > > checked against POR_EL0. This is especially important in cases like the
> > > io_uring kthread, which accesses the memory of the user process that
> > > spawned it. To prevent such a kthread from inheriting a stale value of
> > > POR_EL0, it seems that we should reload POR_EL0's value in all cases
> > > (user and kernel thread).
> > 
> > The problem with this is trying to figure out which POR_EL0 to use.  The
> > kthread could have been spawned ages ago and might have a POR_EL0
> > which is very different from the current value of any of the threads in
> > the process right now.
> > 
> > There's also no great way for a kthread to reach out and grab an updated
> > value.  It's all completely inherently racy.
> > 
> > > Other approaches could also be considered (e.g. resetting POR_EL0 to
> > > unrestricted when creating a kthread), see my reply on v4 [1].
> > 
> > I kinda think this is the only way to go.  It's the only sensible,
> > predictable way.  I _think_ it's what x86 will end up doing with PKRU,
> > but there's been enough churn there that I'd need to go double check
> > what happens in practice.
> 
> I agree.
> 
> > Either way, it would be nice to get an io_uring test in here that
> > actually spawns kthreads:
> > 
> > 	tools/testing/selftests/mm/protection_keys.c
> 
> It would be good to update Documentation/core-api/protection-keys.rst
> as well, since the example with read() raises more questions than it
> answers!
> 
> Kevin, Joey -- I've got this series queued in arm64 as-is, so perhaps
> you could send some patches on top so we can iron this out in time for
> 6.12? I'll also be at LPC next week if you're about.

I found the code in arch/x86 that does this, I must have missed this previously.

arch/x86/kernel/process.c: int copy_thread()

        /* Kernel thread ? */
        if (unlikely(p->flags & PF_KTHREAD)) {
                p->thread.pkru = pkru_get_init_value();
                memset(childregs, 0, sizeof(struct pt_regs));
                kthread_frame_init(frame, args->fn, args->fn_arg);
                return 0;
        }

I can send a similar patch for arm64.  I have no idea how to write io_uring
code, so I'm looking for examples I can work with to get a test written. Might just
send the arm64 fix first, if that's fine?

Thanks,
Joey
Will Deacon Sept. 13, 2024, 3:14 p.m. UTC | #14
On Thu, Sep 12, 2024 at 01:48:35PM +0100, Joey Gouly wrote:
> On Thu, Sep 12, 2024 at 11:50:18AM +0100, Will Deacon wrote:
> > On Wed, Sep 11, 2024 at 08:33:54AM -0700, Dave Hansen wrote:
> > > On 9/11/24 08:01, Kevin Brodsky wrote:
> > > > On 22/08/2024 17:10, Joey Gouly wrote:
> > > >> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
> > > >>  		if (system_supports_tpidr2())
> > > >>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
> > > >>  
> > > >> +		if (system_supports_poe())
> > > >> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
> > > > Here we are only reloading POR_EL0's value if the target is a user
> > > > thread. However, as this series stands, POR_EL0 is also relevant to
> > > > kthreads, because any uaccess or GUP done from a kthread will also be
> > > > checked against POR_EL0. This is especially important in cases like the
> > > > io_uring kthread, which accesses the memory of the user process that
> > > > spawned it. To prevent such a kthread from inheriting a stale value of
> > > > POR_EL0, it seems that we should reload POR_EL0's value in all cases
> > > > (user and kernel thread).
> > > 
> > > The problem with this is trying to figure out which POR_EL0 to use.  The
> > > kthread could have been spawned ages ago and might have a POR_EL0
> > > which is very different from the current value of any of the threads in
> > > the process right now.
> > > 
> > > There's also no great way for a kthread to reach out and grab an updated
> > > value.  It's all completely inherently racy.
> > > 
> > > > Other approaches could also be considered (e.g. resetting POR_EL0 to
> > > > unrestricted when creating a kthread), see my reply on v4 [1].
> > > 
> > > I kinda think this is the only way to go.  It's the only sensible,
> > > predictable way.  I _think_ it's what x86 will end up doing with PKRU,
> > > but there's been enough churn there that I'd need to go double check
> > > what happens in practice.
> > 
> > I agree.
> > 
> > > Either way, it would be nice to get an io_uring test in here that
> > > actually spawns kthreads:
> > > 
> > > 	tools/testing/selftests/mm/protection_keys.c
> > 
> > It would be good to update Documentation/core-api/protection-keys.rst
> > as well, since the example with read() raises more questions than it
> > answers!
> > 
> > Kevin, Joey -- I've got this series queued in arm64 as-is, so perhaps
> > you could send some patches on top so we can iron this out in time for
> > 6.12? I'll also be at LPC next week if you're about.
> 
> I found the code in arch/x86 that does this, I must have missed this previously.
> 
> arch/x86/kernel/process.c: int copy_thread()
> 
>         /* Kernel thread ? */
>         if (unlikely(p->flags & PF_KTHREAD)) {
>                 p->thread.pkru = pkru_get_init_value();
>                 memset(childregs, 0, sizeof(struct pt_regs));
>                 kthread_frame_init(frame, args->fn, args->fn_arg);
>                 return 0;
>         }
> 
> I can send a similar patch for arm64.  I have no idea how to write io_uring
> code, so I'm looking for examples I can work with to get a test written.
> Might just send the arm64 fix first, if that's fine?

I think fix + documentation is what we need before 6.12, but you've still
got plenty of time after the merge window.

Cheers,

Will
Aneesh Kumar K.V Sept. 22, 2024, 5:49 a.m. UTC | #15
Dave Hansen <dave.hansen@intel.com> writes:

> On 9/11/24 08:01, Kevin Brodsky wrote:
>> On 22/08/2024 17:10, Joey Gouly wrote:
>>> @@ -371,6 +382,9 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>>>  		if (system_supports_tpidr2())
>>>  			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
>>>  
>>> +		if (system_supports_poe())
>>> +			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
>> Here we are only reloading POR_EL0's value if the target is a user
>> thread. However, as this series stands, POR_EL0 is also relevant to
>> kthreads, because any uaccess or GUP done from a kthread will also be
>> checked against POR_EL0. This is especially important in cases like the
>> io_uring kthread, which accesses the memory of the user process that
>> spawned it. To prevent such a kthread from inheriting a stale value of
>> POR_EL0, it seems that we should reload POR_EL0's value in all cases
>> (user and kernel thread).
>
> The problem with this is trying to figure out which POR_EL0 to use.  The
> kthread could have been spawned ages ago and might now have a POR_EL0
> which is very different from the current value of any of the threads in
> the process right now.
>
> There's also no great way for a kthread to reach out and grab an updated
> value.  It's all completely inherently racy.
>
>> Other approaches could also be considered (e.g. resetting POR_EL0 to
>> unrestricted when creating a kthread), see my reply on v4 [1].
>
> I kinda think this is the only way to go.  It's the only sensible,
> predictable way.  I _think_ it's what x86 will end up doing with PKRU,
> but there's been enough churn there that I'd need to go double check
> what happens in practice.
>


That is also what powerpc does:

/* usage of kthread_use_mm() should inherit the
 * AMR value of the operating address space. But, the AMR value is
 * thread-specific and we inherit the address space and not thread
 * access restrictions. Because of this ignore AMR value when accessing
 * userspace via kernel thread.
 */
static __always_inline u64 current_thread_amr(void)
{
	if (current->thread.regs)
		return current->thread.regs->amr;
	return default_amr;
}


-aneesh
diff mbox series

Patch

diff --git arch/arm64/include/asm/cpufeature.h arch/arm64/include/asm/cpufeature.h
index 558434267271..3d261cc123c1 100644
--- arch/arm64/include/asm/cpufeature.h
+++ arch/arm64/include/asm/cpufeature.h
@@ -832,6 +832,12 @@  static inline bool system_supports_lpa2(void)
 	return cpus_have_final_cap(ARM64_HAS_LPA2);
 }
 
+static inline bool system_supports_poe(void)
+{
+	return IS_ENABLED(CONFIG_ARM64_POE) &&
+		alternative_has_cap_unlikely(ARM64_HAS_S1POE);
+}
+
 int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);
 bool try_emulate_mrs(struct pt_regs *regs, u32 isn);
 
diff --git arch/arm64/include/asm/processor.h arch/arm64/include/asm/processor.h
index f77371232d8c..e6376f979273 100644
--- arch/arm64/include/asm/processor.h
+++ arch/arm64/include/asm/processor.h
@@ -184,6 +184,7 @@  struct thread_struct {
 	u64			sctlr_user;
 	u64			svcr;
 	u64			tpidr2_el0;
+	u64			por_el0;
 };
 
 static inline unsigned int thread_get_vl(struct thread_struct *thread,
diff --git arch/arm64/include/asm/sysreg.h arch/arm64/include/asm/sysreg.h
index 4a9ea103817e..494e9efd856f 100644
--- arch/arm64/include/asm/sysreg.h
+++ arch/arm64/include/asm/sysreg.h
@@ -1077,6 +1077,9 @@ 
 #define POE_RXW		UL(0x7)
 #define POE_MASK	UL(0xf)
 
+/* Initial value for Permission Overlay Extension for EL0 */
+#define POR_EL0_INIT	POE_RXW
+
 #define ARM64_FEATURE_FIELD_BITS	4
 
 /* Defined for compatibility only, do not add new users. */
diff --git arch/arm64/kernel/process.c arch/arm64/kernel/process.c
index 4ae31b7af6c3..a3a61ecdb165 100644
--- arch/arm64/kernel/process.c
+++ arch/arm64/kernel/process.c
@@ -271,12 +271,23 @@  static void flush_tagged_addr_state(void)
 		clear_thread_flag(TIF_TAGGED_ADDR);
 }
 
+static void flush_poe(void)
+{
+	if (!system_supports_poe())
+		return;
+
+	write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);
+	/* ISB required for kernel uaccess routines when changing POR_EL0 */
+	isb();
+}
+
 void flush_thread(void)
 {
 	fpsimd_flush_thread();
 	tls_thread_flush();
 	flush_ptrace_hw_breakpoint(current);
 	flush_tagged_addr_state();
+	flush_poe();
 }
 
 void arch_release_task_struct(struct task_struct *tsk)
@@ -371,6 +382,9 @@  int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 		if (system_supports_tpidr2())
 			p->thread.tpidr2_el0 = read_sysreg_s(SYS_TPIDR2_EL0);
 
+		if (system_supports_poe())
+			p->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
+
 		if (stack_start) {
 			if (is_compat_thread(task_thread_info(p)))
 				childregs->compat_sp = stack_start;
@@ -495,6 +509,19 @@  static void erratum_1418040_new_exec(void)
 	preempt_enable();
 }
 
+static void permission_overlay_switch(struct task_struct *next)
+{
+	if (!system_supports_poe())
+		return;
+
+	current->thread.por_el0 = read_sysreg_s(SYS_POR_EL0);
+	if (current->thread.por_el0 != next->thread.por_el0) {
+		write_sysreg_s(next->thread.por_el0, SYS_POR_EL0);
+		/* ISB required for kernel uaccess routines when changing POR_EL0 */
+		isb();
+	}
+}
+
 /*
  * __switch_to() checks current->thread.sctlr_user as an optimisation. Therefore
  * this function must be called with preemption disabled and the update to
@@ -530,6 +557,7 @@  struct task_struct *__switch_to(struct task_struct *prev,
 	ssbs_thread_switch(next);
 	erratum_1418040_thread_switch(next);
 	ptrauth_thread_switch_user(next);
+	permission_overlay_switch(next);
 
 	/*
 	 * Complete any pending TLB or cache maintenance on this CPU in case