diff mbox

mm: only enable sys_pkey* when ARCH_HAS_PKEYS

Message ID 1477958904-9903-1-git-send-email-mark.rutland@arm.com
State New
Headers show

Commit Message

Mark Rutland Nov. 1, 2016, 12:08 a.m. UTC
When an architecture does not select CONFIG_ARCH_HAS_PKEYS, the pkey_alloc
syscall will return -ENOSPC for all (otherwise well-formed) requests, as the
generic implementation of mm_pkey_alloc() returns -1. The other pkey syscalls
perform some work before always failing, in a similar fashion.

This implies the absence of keys, but otherwise functional pkey support. This
is odd, since the architecture provides no such support. Instead, it would be
preferable to indicate that the syscall is not implemented, since this is
effectively the case.

This patch updates the pkey_* syscalls to return -ENOSYS on architectures
without pkey support.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: torvalds@linux-foundation.org
---
 mm/mprotect.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Hi,

In eyeballing some recent commits I spotted 6127d124ee4eb9c3 ("ARM: wire up new
pkey syscalls"), and in looking into that, I realised that the common pkey code
looks somewhat suspicious.

Many architectures don't have user-modifiable pkey support, and for those, we
perform some unnecessary work before returning unclear error codes.

As the pkey went in this merge window, there's stil time to tighten that up.

Thanks,
Mark.

-- 
2.7.4

Comments

Dave Hansen Nov. 2, 2016, 7:15 p.m. UTC | #1
On 10/31/2016 05:08 PM, Mark Rutland wrote:
> When an architecture does not select CONFIG_ARCH_HAS_PKEYS, the pkey_alloc

> syscall will return -ENOSPC for all (otherwise well-formed) requests, as the

> generic implementation of mm_pkey_alloc() returns -1. The other pkey syscalls

> perform some work before always failing, in a similar fashion.

> 

> This implies the absence of keys, but otherwise functional pkey support. This

> is odd, since the architecture provides no such support. Instead, it would be

> preferable to indicate that the syscall is not implemented, since this is

> effectively the case.


This makes the behavior of an x86 cpu without pkeys and an arm cpu
without pkeys differ.  Is that what we want?  An application that
_wants_ to use protection keys but can't needs to handle -ENOSPC anyway.

On an architecture that will never support pkeys, it makes sense to do
-ENOSYS, but that's not the case for arm, right?
Mark Rutland Nov. 4, 2016, 11:44 p.m. UTC | #2
On Wed, Nov 02, 2016 at 12:15:50PM -0700, Dave Hansen wrote:
> On 10/31/2016 05:08 PM, Mark Rutland wrote:

> > When an architecture does not select CONFIG_ARCH_HAS_PKEYS, the pkey_alloc

> > syscall will return -ENOSPC for all (otherwise well-formed) requests, as the

> > generic implementation of mm_pkey_alloc() returns -1. The other pkey syscalls

> > perform some work before always failing, in a similar fashion.

> > 

> > This implies the absence of keys, but otherwise functional pkey support. This

> > is odd, since the architecture provides no such support. Instead, it would be

> > preferable to indicate that the syscall is not implemented, since this is

> > effectively the case.

> 

> This makes the behavior of an x86 cpu without pkeys and an arm cpu

> without pkeys differ.  Is that what we want?


My rationale was that we have no idea whether architectures will have pkey
support in future, and if/when they do, we may have to apply additional checks
anyhow. i.e. in cases we'd return -ENOSPC today, we might want to return
another error code.

Returning -ENOSYS retains the current behaviour, and allows us to handle that
ABI issue when we know what architecture support looks like.

Other architectures not using the generic syscalls seem to handle this with
-ENOSYS, e.g. parisc with commit 18088db042dd9ae2, so there's differing
behaviour regardless of arm specifically.

> An application that _wants_ to use protection keys but can't needs to handle

> -ENOSPC anyway.


Sure, and that application *also* has to handle -ENOSYS, given current kernels.

> On an architecture that will never support pkeys, it makes sense to do

> -ENOSYS, but that's not the case for arm, right?


I don't know whether arm or other architectures will have (user-accessible)
pkey-like suport.

Thanks,
Mark.
Heiko Carstens Nov. 8, 2016, 9:30 a.m. UTC | #3
On Fri, Nov 04, 2016 at 11:44:59PM +0000, Mark Rutland wrote:
> On Wed, Nov 02, 2016 at 12:15:50PM -0700, Dave Hansen wrote:

> > On 10/31/2016 05:08 PM, Mark Rutland wrote:

> > > When an architecture does not select CONFIG_ARCH_HAS_PKEYS, the pkey_alloc

> > > syscall will return -ENOSPC for all (otherwise well-formed) requests, as the

> > > generic implementation of mm_pkey_alloc() returns -1. The other pkey syscalls

> > > perform some work before always failing, in a similar fashion.

> > > 

> > > This implies the absence of keys, but otherwise functional pkey support. This

> > > is odd, since the architecture provides no such support. Instead, it would be

> > > preferable to indicate that the syscall is not implemented, since this is

> > > effectively the case.

> > 

> > This makes the behavior of an x86 cpu without pkeys and an arm cpu

> > without pkeys differ.  Is that what we want?

> 

> My rationale was that we have no idea whether architectures will have pkey

> support in future, and if/when they do, we may have to apply additional checks

> anyhow. i.e. in cases we'd return -ENOSPC today, we might want to return

> another error code.

> 

> Returning -ENOSYS retains the current behaviour, and allows us to handle that

> ABI issue when we know what architecture support looks like.

> 

> Other architectures not using the generic syscalls seem to handle this with

> -ENOSYS, e.g. parisc with commit 18088db042dd9ae2, so there's differing

> behaviour regardless of arm specifically.


The three system calls won't return -ENOSYS on architectures which decided
to ignore them (like with with above mentioned commit), since they haven't
allocated a system call number at all.

Right now we have one architecture where these three system calls work if
the cpu supports the feature (x86).

Two architectures (arm, mips) have wired them up and thus allocated system
call numbers, even though they don't have ARCH_HAS_PKEYS set. Which seems a
bit pointless.

Three architectures (parisc, powerpc, s390) decided to ignore the system
calls completely, but still have the pkey code linked into the kernel
image.

imho the generic pkey code should be ifdef'ed with CONFIG_ARCH_HAS_PKEYS.
Otherwise only dead code will be linked and increase the kernel image size
for no good reason.
Russell King (Oracle) Nov. 8, 2016, 10:41 a.m. UTC | #4
On Tue, Nov 08, 2016 at 10:30:42AM +0100, Heiko Carstens wrote:
> Two architectures (arm, mips) have wired them up and thus allocated system

> call numbers, even though they don't have ARCH_HAS_PKEYS set. Which seems a

> bit pointless.


I don't think it's pointless at all.  First, read the LWN article for
the userspace side of the interface: https://lwn.net/Articles/689395/

From reading this, it seems (at least to me) that these pkey syscalls
are going to be the application level API - which means applications
are probably going to want to make these calls.

Sure, they'll have to go through glibc, and glibc can provide stubs,
but the problem with that is if we do get hardware pkey support (eg,
due to pressure to increase security) then we're going to end up
needing both kernel changes and glibc changes to add the calls.

Since one of the design goals of pkeys is to allow them to work when
there is no underlying hardware support, I see no reason not to wire
them up in architecture syscall tables today, so that we have a cross-
architecture kernel version where the pkey syscalls become available.
glibc (and other libcs) don't then have to mess around with per-
architecture recording of which kernel version the pkey syscalls were
added.

Not wiring up the syscalls doesn't really gain anything: the code
present when !ARCH_HAS_PKEYS will still be part of the kernel image,
it just won't be callable.

So, on balance, I've decided to wire them up on ARM, even though the
hardware doesn't support them, to avoid unnecessary pain in userspace
from the ARM side of things.

Obviously what other architectures do is their own business.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
Heiko Carstens Nov. 8, 2016, 11:24 a.m. UTC | #5
On Tue, Nov 08, 2016 at 10:41:12AM +0000, Russell King - ARM Linux wrote:
> On Tue, Nov 08, 2016 at 10:30:42AM +0100, Heiko Carstens wrote:

> > Two architectures (arm, mips) have wired them up and thus allocated system

> > call numbers, even though they don't have ARCH_HAS_PKEYS set. Which seems a

> > bit pointless.

> 

> I don't think it's pointless at all.  First, read the LWN article for

> the userspace side of the interface: https://lwn.net/Articles/689395/

> 

> From reading this, it seems (at least to me) that these pkey syscalls

> are going to be the application level API - which means applications

> are probably going to want to make these calls.

> 

> Sure, they'll have to go through glibc, and glibc can provide stubs,

> but the problem with that is if we do get hardware pkey support (eg,

> due to pressure to increase security) then we're going to end up

> needing both kernel changes and glibc changes to add the calls.

> 

> Since one of the design goals of pkeys is to allow them to work when

> there is no underlying hardware support, I see no reason not to wire

> them up in architecture syscall tables today, so that we have a cross-

> architecture kernel version where the pkey syscalls become available.

> glibc (and other libcs) don't then have to mess around with per-

> architecture recording of which kernel version the pkey syscalls were

> added.

> 

> Not wiring up the syscalls doesn't really gain anything: the code

> present when !ARCH_HAS_PKEYS will still be part of the kernel image,

> it just won't be callable.


That can be easily solved (see below).

> So, on balance, I've decided to wire them up on ARM, even though the

> hardware doesn't support them, to avoid unnecessary pain in userspace

> from the ARM side of things.

> 

> Obviously what other architectures do is their own business.


It would make sense if this would be handled the same across architectures.

We could simply ifdef the small pkey block in mprotect.c and rely on the
pkey cond_syscalls added with e2753293ac4b. The result would actually be
the same what Mark proposed, except with less generated code.

Something like this:diff --git a/mm/mprotect.c b/mm/mprotect.c
index 11936526b08b..9fb86b107e49 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -484,6 +484,8 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 	return do_mprotect_pkey(start, len, prot, -1);
 }
 
+#ifdef CONFIG_ARCH_HAS_PKEYS
+
 SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot, int, pkey)
 {
@@ -534,3 +536,4 @@ SYSCALL_DEFINE1(pkey_free, int, pkey)
 	 */
 	return ret;
 }
+#endif /* CONFIG_ARCH_HAS_PKEYS */

Arnd Bergmann Nov. 8, 2016, 11:39 a.m. UTC | #6
On Tuesday, November 8, 2016 10:30:42 AM CET Heiko Carstens wrote:
> Three architectures (parisc, powerpc, s390) decided to ignore the system

> calls completely, but still have the pkey code linked into the kernel

> image.


Wouldn't it actually make sense to hook this up to the storage keys
in the s390 page tables?

	Arnd
Heiko Carstens Nov. 8, 2016, 12:05 p.m. UTC | #7
On Tue, Nov 08, 2016 at 12:39:28PM +0100, Arnd Bergmann wrote:
> On Tuesday, November 8, 2016 10:30:42 AM CET Heiko Carstens wrote:

> > Three architectures (parisc, powerpc, s390) decided to ignore the system

> > calls completely, but still have the pkey code linked into the kernel

> > image.

> 

> Wouldn't it actually make sense to hook this up to the storage keys

> in the s390 page tables?


We have storage keys per _physical_ page. Not per page within the the table
entries. So this doesn't work unfortunately.
Dave Hansen Nov. 8, 2016, 6:39 p.m. UTC | #8
On 11/08/2016 03:24 AM, Heiko Carstens wrote:
> Something like this:

> 

> diff --git a/mm/mprotect.c b/mm/mprotect.c

> index 11936526b08b..9fb86b107e49 100644

> --- a/mm/mprotect.c

> +++ b/mm/mprotect.c

> @@ -484,6 +484,8 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,

>  	return do_mprotect_pkey(start, len, prot, -1);

>  }

>  

> +#ifdef CONFIG_ARCH_HAS_PKEYS

> +

>  SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,

>  		unsigned long, prot, int, pkey)

>  {

> @@ -534,3 +536,4 @@ SYSCALL_DEFINE1(pkey_free, int, pkey)

>  	 */

>  	return ret;

>  }

> +#endif /* CONFIG_ARCH_HAS_PKEYS */


That's fine with me, fwiw.  It ends up meaning that the config option
changes whether we get -ENOSPC vs. -ENOSYS, so the x86_32 behavior will
change, for instance.  But, I _think_ that's OK.
diff mbox

Patch

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 1193652..cda3abf 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -487,6 +487,9 @@  SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
 		unsigned long, prot, int, pkey)
 {
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_PKEYS))
+		return -ENOSYS;
+
 	return do_mprotect_pkey(start, len, prot, pkey);
 }
 
@@ -495,6 +498,9 @@  SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
 	int pkey;
 	int ret;
 
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_PKEYS))
+		return -ENOSYS;
+
 	/* No flags supported yet. */
 	if (flags)
 		return -EINVAL;
@@ -524,6 +530,9 @@  SYSCALL_DEFINE1(pkey_free, int, pkey)
 {
 	int ret;
 
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_PKEYS))
+		return -ENOSYS;
+
 	down_write(&current->mm->mmap_sem);
 	ret = mm_pkey_free(current->mm, pkey);
 	up_write(&current->mm->mmap_sem);