mbox series

[RFC,0/2] arm64: optional paranoid __{get,put}_user checks

Message ID 20171026090942.7041-1-mark.rutland@arm.com
Headers show
Series arm64: optional paranoid __{get,put}_user checks | expand

Message

Mark Rutland Oct. 26, 2017, 9:09 a.m. UTC
Hi,

In Prague, Kees mentioned that it would be nice to have a mechanism to
catch bad __{get,put}_user uses, such as the recent CVE-2017-5123 [1,2]
issue with unsafe_put_user() in waitid().

These patches allow an optional access_ok() check to be dropped in
arm64's __{get,put}_user() primitives. These will then BUG() if a bad
user pointer is passed (which should only happen in the absence of an
earlier access_ok() check).

The first patch rewrites the arm64 access_ok() check in C. This gives
the compiler the visibility it needs to elide redundant access_ok()
checks, so in the common case:

  get_user()
    access_ok()
    __get_user()
      BUG_ON(!access_ok())
      <uaccess asm>

... the compiler can determine that the second access_ok() must return
true, and can elide it along with the BUG_ON(), leaving:

  get_user()
    access_ok()
      __get_user()
        <uaccess asm>

... and thus this sanity check can have no cost in the common case.

The compiler doesn't always have the visibility to do this (e.g. if the
two access_ok() checks are in different compilation units), but it seems
to manage to do this most of the time -- In testing with v4.14-rc5
defconfig this only increases the total Image size by 4KiB.

I had a go at turning this into a BUILD_BUG_ON(), to see if we could
catch this issue at compile time. However, my GCC wasn't able to remove
the BUILD_BUG() from some {get,put}_user cases. Maybe we can fix that,
or maybe we can have some static analysis catch this at build time.

It's entirely possible that I've made some catastrophic mistake in these
patches; I've only build-tested them so far, and I don't currently have
access to hardware to test on.

I also haven't yet modified __copy_{to,from}_user and friends along
similar lines, so this is incomplete. If there aren't any major
objections to this approach, I can fold those in for the next spin.

Thanks,
Mark.

[1] https://lwn.net/Articles/736348/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96ca579a1ecc943b75beba58bebb0356f6cc4b51


Mark Rutland (2):
  arm64: write __range_ok() in C
  arm64: allow paranoid __{get,put}user

 arch/arm64/Kconfig               |  9 +++++++++
 arch/arm64/include/asm/uaccess.h | 27 +++++++++++++++++++--------
 2 files changed, 28 insertions(+), 8 deletions(-)

-- 
2.11.0

Comments

Will Deacon Oct. 27, 2017, 3:41 p.m. UTC | #1
On Thu, Oct 26, 2017 at 10:09:40AM +0100, Mark Rutland wrote:
> Hi,

> 

> In Prague, Kees mentioned that it would be nice to have a mechanism to

> catch bad __{get,put}_user uses, such as the recent CVE-2017-5123 [1,2]

> issue with unsafe_put_user() in waitid().

> 

> These patches allow an optional access_ok() check to be dropped in

> arm64's __{get,put}_user() primitives. These will then BUG() if a bad

> user pointer is passed (which should only happen in the absence of an

> earlier access_ok() check).

> 

> The first patch rewrites the arm64 access_ok() check in C. This gives

> the compiler the visibility it needs to elide redundant access_ok()

> checks, so in the common case:

> 

>   get_user()

>     access_ok()

>     __get_user()

>       BUG_ON(!access_ok())

>       <uaccess asm>

> 

> ... the compiler can determine that the second access_ok() must return

> true, and can elide it along with the BUG_ON(), leaving:

> 

>   get_user()

>     access_ok()

>       __get_user()

>         <uaccess asm>

> 

> ... and thus this sanity check can have no cost in the common case.


Probably a stupid question, but why not just move the access_ok check
into __{get,put}_user and remove it from {get,put}_user? We can also
then move the uaccess_{enable,disable}_not_uao calls out from the __
variants so that we can implement user_access_{begin,end}.

Will
Mark Rutland Oct. 27, 2017, 8:44 p.m. UTC | #2
On Fri, Oct 27, 2017 at 04:41:13PM +0100, Will Deacon wrote:
> On Thu, Oct 26, 2017 at 10:09:40AM +0100, Mark Rutland wrote:

> > Hi,

> > 

> > In Prague, Kees mentioned that it would be nice to have a mechanism to

> > catch bad __{get,put}_user uses, such as the recent CVE-2017-5123 [1,2]

> > issue with unsafe_put_user() in waitid().

> > 

> > These patches allow an optional access_ok() check to be dropped in

> > arm64's __{get,put}_user() primitives. These will then BUG() if a bad

> > user pointer is passed (which should only happen in the absence of an

> > earlier access_ok() check).

> > 

> > The first patch rewrites the arm64 access_ok() check in C. This gives

> > the compiler the visibility it needs to elide redundant access_ok()

> > checks, so in the common case:

> > 

> >   get_user()

> >     access_ok()

> >     __get_user()

> >       BUG_ON(!access_ok())

> >       <uaccess asm>

> > 

> > ... the compiler can determine that the second access_ok() must return

> > true, and can elide it along with the BUG_ON(), leaving:

> > 

> >   get_user()

> >     access_ok()

> >       __get_user()

> >         <uaccess asm>

> > 

> > ... and thus this sanity check can have no cost in the common case.

> 

> Probably a stupid question, but why not just move the access_ok check

> into __{get,put}_user and remove it from {get,put}_user?


Good question.

I was considering this as a debug option, making it possible to catch unsafe
__{get,put}_user() uses via fuzzing or at build time.

As a hardening option, it would make more sense to always have the check in
__{get,put}_user().

> We can also then move the uaccess_{enable,disable}_not_uao calls out from the

> __ variants so that we can implement user_access_{begin,end}.


Mhmm. I'll take a look at this for v2, afer I've figured out precisely what
I've broken with this RFC.

I'd still like the option to scream on unsafe __{get,put}_user() calls, but it
should be possible to handle both cases with minimal IS_ENABLED() usage.

Thanks,
Mark.
Russell King (Oracle) Oct. 28, 2017, 8:47 a.m. UTC | #3
On Fri, Oct 27, 2017 at 04:41:13PM +0100, Will Deacon wrote:
> Probably a stupid question, but why not just move the access_ok check

> into __{get,put}_user and remove it from {get,put}_user? We can also

> then move the uaccess_{enable,disable}_not_uao calls out from the __

> variants so that we can implement user_access_{begin,end}.


The intent of __{get,put}_user() is to have a fast accessor compared
to {get,put}_user() which does all the full checks.

However, with the uaccess stuff we have now by default, I don't think
it makes much sense - maybe we're better off using copy_{to,from}_user()
in those code paths and fixing up the struct in kernel space rather than
__{get,put}_user()?

I suspect that if we do have the full checks in __{get,put}_user() that
makes the case stronger for doing that - and maybe killing the __
accessors entirely.

Take a look at kernel/signal.c to see a typical usage of the __
accessors.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
Laura Abbott Oct. 31, 2017, 11:56 p.m. UTC | #4
On 10/26/2017 02:09 AM, Mark Rutland wrote:
> Hi,

> 

> In Prague, Kees mentioned that it would be nice to have a mechanism to

> catch bad __{get,put}_user uses, such as the recent CVE-2017-5123 [1,2]

> issue with unsafe_put_user() in waitid().

> 

> These patches allow an optional access_ok() check to be dropped in

> arm64's __{get,put}_user() primitives. These will then BUG() if a bad

> user pointer is passed (which should only happen in the absence of an

> earlier access_ok() check).

> 

> The first patch rewrites the arm64 access_ok() check in C. This gives

> the compiler the visibility it needs to elide redundant access_ok()

> checks, so in the common case:

> 

>   get_user()

>     access_ok()

>     __get_user()

>       BUG_ON(!access_ok())

>       <uaccess asm>

> 

> ... the compiler can determine that the second access_ok() must return

> true, and can elide it along with the BUG_ON(), leaving:

> 

>   get_user()

>     access_ok()

>       __get_user()

>         <uaccess asm>

> 

> ... and thus this sanity check can have no cost in the common case.

> 

> The compiler doesn't always have the visibility to do this (e.g. if the

> two access_ok() checks are in different compilation units), but it seems

> to manage to do this most of the time -- In testing with v4.14-rc5

> defconfig this only increases the total Image size by 4KiB.

> 

> I had a go at turning this into a BUILD_BUG_ON(), to see if we could

> catch this issue at compile time. However, my GCC wasn't able to remove

> the BUILD_BUG() from some {get,put}_user cases. Maybe we can fix that,

> or maybe we can have some static analysis catch this at build time.

> 

> It's entirely possible that I've made some catastrophic mistake in these

> patches; I've only build-tested them so far, and I don't currently have

> access to hardware to test on.

> 

> I also haven't yet modified __copy_{to,from}_user and friends along

> similar lines, so this is incomplete. If there aren't any major

> objections to this approach, I can fold those in for the next spin.

> 

> Thanks,

> Mark.

> 

> [1] https://lwn.net/Articles/736348/

> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96ca579a1ecc943b75beba58bebb0356f6cc4b51

> 

> 

> Mark Rutland (2):

>   arm64: write __range_ok() in C

>   arm64: allow paranoid __{get,put}user

> 

>  arch/arm64/Kconfig               |  9 +++++++++

>  arch/arm64/include/asm/uaccess.h | 27 +++++++++++++++++++--------

>  2 files changed, 28 insertions(+), 8 deletions(-)

> 


Turning on the option fails as soon as we hit userspace. On my buildroot
based environment I get the help text for ld.so (????) and then a message
about attempting to kill init. I get a crash in init on the Hikey Android
environment as well. It almost seems like the __range_ok re-write
is triggering an error but it only seems to happen when the option is
enabled even when I take out the BUG. I'll see if I can get more useful
information.

Thanks,
Laura
Mark Rutland Nov. 1, 2017, 12:05 p.m. UTC | #5
On Tue, Oct 31, 2017 at 04:56:39PM -0700, Laura Abbott wrote:
> On 10/26/2017 02:09 AM, Mark Rutland wrote:

> > In Prague, Kees mentioned that it would be nice to have a mechanism to

> > catch bad __{get,put}_user uses, such as the recent CVE-2017-5123 [1,2]

> > issue with unsafe_put_user() in waitid().

> > 

> > These patches allow an optional access_ok() check to be dropped in

> > arm64's __{get,put}_user() primitives. These will then BUG() if a bad

> > user pointer is passed (which should only happen in the absence of an

> > earlier access_ok() check).


> Turning on the option fails as soon as we hit userspace. On my buildroot

> based environment I get the help text for ld.so (????) and then a message

> about attempting to kill init. 


Ouch. Thanks for the report, and sorry about this.

The problem is that I evaluate the ptr argument twice in
__{get,put}_user(), and this may have side effects.

e.g. when the ELF loader does things like:

  __put_user((elf_addr_t)p, sp++)

... we increment sp twice, and write to the wrong user address, leaving
sp corrupt.

I have an additional patch [1] to fix this, which is in my
arm64/access-ok branch [2].

Thanks,
Mark.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/access-ok&id=ebb7ff83eb53b8810395d5cf48712a4ae6d678543
[2] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/access-ok
Laura Abbott Nov. 1, 2017, 9:13 p.m. UTC | #6
On 11/01/2017 05:05 AM, Mark Rutland wrote:
> On Tue, Oct 31, 2017 at 04:56:39PM -0700, Laura Abbott wrote:

>> On 10/26/2017 02:09 AM, Mark Rutland wrote:

>>> In Prague, Kees mentioned that it would be nice to have a mechanism to

>>> catch bad __{get,put}_user uses, such as the recent CVE-2017-5123 [1,2]

>>> issue with unsafe_put_user() in waitid().

>>>

>>> These patches allow an optional access_ok() check to be dropped in

>>> arm64's __{get,put}_user() primitives. These will then BUG() if a bad

>>> user pointer is passed (which should only happen in the absence of an

>>> earlier access_ok() check).

> 

>> Turning on the option fails as soon as we hit userspace. On my buildroot

>> based environment I get the help text for ld.so (????) and then a message

>> about attempting to kill init. 

> 

> Ouch. Thanks for the report, and sorry about this.

> 

> The problem is that I evaluate the ptr argument twice in

> __{get,put}_user(), and this may have side effects.

> 

> e.g. when the ELF loader does things like:

> 

>   __put_user((elf_addr_t)p, sp++)

> 

> ... we increment sp twice, and write to the wrong user address, leaving

> sp corrupt.

> 

> I have an additional patch [1] to fix this, which is in my

> arm64/access-ok branch [2].

> 

> Thanks,

> Mark.

> 

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/access-ok&id=ebb7ff83eb53b8810395d5cf48712a4ae6d678543

> [2] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/access-ok

> 


Thanks, the updated patch works. I wrote an LKDTM test to verify
the expected behavior (__{get,put}_user panic whereas {get,put}_user
do not). You're welcome to add Tested-by or I can wait for v2.

Thanks,
Laura
Kees Cook Nov. 1, 2017, 10:28 p.m. UTC | #7
On Wed, Nov 1, 2017 at 2:13 PM, Laura Abbott <labbott@redhat.com> wrote:
> On 11/01/2017 05:05 AM, Mark Rutland wrote:

>> On Tue, Oct 31, 2017 at 04:56:39PM -0700, Laura Abbott wrote:

>>> On 10/26/2017 02:09 AM, Mark Rutland wrote:

>>>> In Prague, Kees mentioned that it would be nice to have a mechanism to

>>>> catch bad __{get,put}_user uses, such as the recent CVE-2017-5123 [1,2]

>>>> issue with unsafe_put_user() in waitid().

>>>>

>>>> These patches allow an optional access_ok() check to be dropped in

>>>> arm64's __{get,put}_user() primitives. These will then BUG() if a bad

>>>> user pointer is passed (which should only happen in the absence of an

>>>> earlier access_ok() check).

>>

>>> Turning on the option fails as soon as we hit userspace. On my buildroot

>>> based environment I get the help text for ld.so (????) and then a message

>>> about attempting to kill init.

>>

>> Ouch. Thanks for the report, and sorry about this.

>>

>> The problem is that I evaluate the ptr argument twice in

>> __{get,put}_user(), and this may have side effects.

>>

>> e.g. when the ELF loader does things like:

>>

>>   __put_user((elf_addr_t)p, sp++)

>>

>> ... we increment sp twice, and write to the wrong user address, leaving

>> sp corrupt.

>>

>> I have an additional patch [1] to fix this, which is in my

>> arm64/access-ok branch [2].

>>

>> Thanks,

>> Mark.

>>

>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/access-ok&id=ebb7ff83eb53b8810395d5cf48712a4ae6d678543

>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/access-ok

>>

>

> Thanks, the updated patch works. I wrote an LKDTM test to verify

> the expected behavior (__{get,put}_user panic whereas {get,put}_user

> do not). You're welcome to add Tested-by or I can wait for v2.


Nice. :) Out of curiosity, can you check if this correctly BUG()s on a
waitid() call when the fixes are reverted?

96ca579a1ecc ("waitid(): Avoid unbalanced user_access_end() on
access_ok() error")
1c9fec470b81 ("waitid(): Add missing access_ok() checks")

-Kees

-- 
Kees Cook
Pixel Security
Laura Abbott Nov. 1, 2017, 11:05 p.m. UTC | #8
On 11/01/2017 03:28 PM, Kees Cook wrote:
> On Wed, Nov 1, 2017 at 2:13 PM, Laura Abbott <labbott@redhat.com> wrote:

>> On 11/01/2017 05:05 AM, Mark Rutland wrote:

>>> On Tue, Oct 31, 2017 at 04:56:39PM -0700, Laura Abbott wrote:

>>>> On 10/26/2017 02:09 AM, Mark Rutland wrote:

>>>>> In Prague, Kees mentioned that it would be nice to have a mechanism to

>>>>> catch bad __{get,put}_user uses, such as the recent CVE-2017-5123 [1,2]

>>>>> issue with unsafe_put_user() in waitid().

>>>>>

>>>>> These patches allow an optional access_ok() check to be dropped in

>>>>> arm64's __{get,put}_user() primitives. These will then BUG() if a bad

>>>>> user pointer is passed (which should only happen in the absence of an

>>>>> earlier access_ok() check).

>>>

>>>> Turning on the option fails as soon as we hit userspace. On my buildroot

>>>> based environment I get the help text for ld.so (????) and then a message

>>>> about attempting to kill init.

>>>

>>> Ouch. Thanks for the report, and sorry about this.

>>>

>>> The problem is that I evaluate the ptr argument twice in

>>> __{get,put}_user(), and this may have side effects.

>>>

>>> e.g. when the ELF loader does things like:

>>>

>>>   __put_user((elf_addr_t)p, sp++)

>>>

>>> ... we increment sp twice, and write to the wrong user address, leaving

>>> sp corrupt.

>>>

>>> I have an additional patch [1] to fix this, which is in my

>>> arm64/access-ok branch [2].

>>>

>>> Thanks,

>>> Mark.

>>>

>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/access-ok&id=ebb7ff83eb53b8810395d5cf48712a4ae6d678543

>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/access-ok

>>>

>>

>> Thanks, the updated patch works. I wrote an LKDTM test to verify

>> the expected behavior (__{get,put}_user panic whereas {get,put}_user

>> do not). You're welcome to add Tested-by or I can wait for v2.

> 

> Nice. :) Out of curiosity, can you check if this correctly BUG()s on a

> waitid() call when the fixes are reverted?

> 

> 96ca579a1ecc ("waitid(): Avoid unbalanced user_access_end() on

> access_ok() error")

> 1c9fec470b81 ("waitid(): Add missing access_ok() checks")

> 

> -Kees

> 


Yep, we get a nice bug:

[   34.783912] ------------[ cut here ]------------
[   34.784484] kernel BUG at kernel/exit.c:1614!
[   34.785016] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[   34.785572] Modules linked in:
[   34.786177] CPU: 0 PID: 1324 Comm: a.out Not tainted 4.14.0-rc5-00005-ga3bb7b0f72d3 #69
[   34.786657] Hardware name: linux,dummy-virt (DT)
[   34.787093] task: ffff80003c4ed400 task.stack: ffff00000ade0000
[   34.788196] PC is at SyS_waitid+0x1d4/0x210
[   34.788534] LR is at SyS_waitid+0x20/0x210
[   34.788839] pc : [<ffff0000080cde1c>] lr : [<ffff0000080cdc68>] pstate: a0000145
[   34.789310] sp : ffff00000ade3e00
[   34.789578] x29: ffff00000ade3e00 x28: ffff80003c4ed400 
[   34.790039] x27: ffff0000089e1000 x26: 000000000000005f 
[   34.790397] x25: 0000000000000124 x24: 0000000000000015 
[   34.790649] x23: 0000000080000000 x22: 0000ffffb1eb6b24 
[   34.790897] x21: 00000000ffffffff x20: 000080003600c000 
[   34.791149] x19: ffff800000000000 x18: 0000000000000007 
[   34.791397] x17: 0000000000000001 x16: 0000000000000019 
[   34.791648] x15: 0000000000000033 x14: 000000000000004c 
[   34.791903] x13: 0000000000000068 x12: ffff000008af69d0 
[   34.792156] x11: ffff00000ade3c20 x10: 0000000000000000 
[   34.792451] x9 : 0000000000000000 x8 : 0000000000000000 
[   34.792706] x7 : 0000000000000000 x6 : ffff000008f42f18 
[   34.792965] x5 : dead000000000100 x4 : 0000000000000011 
[   34.793214] x3 : ffff80003c4ed400 x2 : ffff800000000004 
[   34.793462] x1 : 0001000000000000 x0 : 0000000000000000 
[   34.793743] Process a.out (pid: 1324, stack limit = 0xffff00000ade0000)
[   34.794098] Call trace:
[   34.794351] Exception stack(0xffff00000ade3cc0 to 0xffff00000ade3e00)
[   34.794722] 3cc0: 0000000000000000 0001000000000000 ffff800000000004 ffff80003c4ed400
[   34.795034] 3ce0: 0000000000000011 dead000000000100 ffff000008f42f18 0000000000000000
[   34.795297] 3d00: 0000000000000000 0000000000000000 0000000000000000 ffff00000ade3c20
[   34.795549] 3d20: ffff000008af69d0 0000000000000068 000000000000004c 0000000000000033
[   34.795803] 3d40: 0000000000000019 0000000000000001 0000000000000007 ffff800000000000
[   34.796066] 3d60: 000080003600c000 00000000ffffffff 0000ffffb1eb6b24 0000000080000000
[   34.796277] 3d80: 0000000000000015 0000000000000124 000000000000005f ffff0000089e1000
[   34.796477] 3da0: ffff80003c4ed400 ffff00000ade3e00 ffff0000080cdc68 ffff00000ade3e00
[   34.796677] 3dc0: ffff0000080cde1c 00000000a0000145 0000000000000000 ffff80003c4ed400
[   34.796884] 3de0: 0001000000000000 dead000000000100 ffff00000ade3e00 ffff0000080cde1c
[   34.797153] [<ffff0000080cde1c>] SyS_waitid+0x1d4/0x210
[   34.797298] Exception stack(0xffff00000ade3ec0 to 0xffff00000ade4000)
[   34.797470] 3ec0: 0000000000000000 0000000000000000 ffff800000000000 0000000000000004
[   34.797670] 3ee0: 0000000000000000 0400000055550400 0000ffffb1e0a011 0000000000000cbf
[   34.797873] 3f00: 000000000000005f 0000000000000a3b 0000000000000000 16170f120a1a1311
[   34.798073] 3f20: 00000000000001a8 0000ffffb1f8ecb8 0000ffffb1e1c0e0 0000000000000002
[   34.798272] 3f40: 0000ffffb1eb6af0 0000000000420018 0000000000000000 0000000000400740
[   34.798471] 3f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   34.798668] 3f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   34.798870] 3fa0: 0000000000000000 0000fffff7294200 00000000004006fc 0000fffff7294200
[   34.799068] 3fc0: 0000ffffb1eb6b24 0000000080000000 0000000000000000 000000000000005f
[   34.799265] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[   34.799474] [<ffff0000080837b0>] el0_svc_naked+0x24/0x28
[   34.799715] Code: f9400fb4 17ffff9a d503201f f9000fb4 (d4210000) 
[   34.800121] ---[ end trace a14ca5cd5d8f9b30 ]---
Kees Cook Nov. 1, 2017, 11:29 p.m. UTC | #9
On Wed, Nov 1, 2017 at 4:05 PM, Laura Abbott <labbott@redhat.com> wrote:
> On 11/01/2017 03:28 PM, Kees Cook wrote:

>> On Wed, Nov 1, 2017 at 2:13 PM, Laura Abbott <labbott@redhat.com> wrote:

>>> On 11/01/2017 05:05 AM, Mark Rutland wrote:

>>>> On Tue, Oct 31, 2017 at 04:56:39PM -0700, Laura Abbott wrote:

>>>>> On 10/26/2017 02:09 AM, Mark Rutland wrote:

>>>>>> In Prague, Kees mentioned that it would be nice to have a mechanism to

>>>>>> catch bad __{get,put}_user uses, such as the recent CVE-2017-5123 [1,2]

>>>>>> issue with unsafe_put_user() in waitid().

>>>>>>

>>>>>> These patches allow an optional access_ok() check to be dropped in

>>>>>> arm64's __{get,put}_user() primitives. These will then BUG() if a bad

>>>>>> user pointer is passed (which should only happen in the absence of an

>>>>>> earlier access_ok() check).

>>>>

>>>>> Turning on the option fails as soon as we hit userspace. On my buildroot

>>>>> based environment I get the help text for ld.so (????) and then a message

>>>>> about attempting to kill init.

>>>>

>>>> Ouch. Thanks for the report, and sorry about this.

>>>>

>>>> The problem is that I evaluate the ptr argument twice in

>>>> __{get,put}_user(), and this may have side effects.

>>>>

>>>> e.g. when the ELF loader does things like:

>>>>

>>>>   __put_user((elf_addr_t)p, sp++)

>>>>

>>>> ... we increment sp twice, and write to the wrong user address, leaving

>>>> sp corrupt.

>>>>

>>>> I have an additional patch [1] to fix this, which is in my

>>>> arm64/access-ok branch [2].

>>>>

>>>> Thanks,

>>>> Mark.

>>>>

>>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/access-ok&id=ebb7ff83eb53b8810395d5cf48712a4ae6d678543

>>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/access-ok

>>>>

>>>

>>> Thanks, the updated patch works. I wrote an LKDTM test to verify

>>> the expected behavior (__{get,put}_user panic whereas {get,put}_user

>>> do not). You're welcome to add Tested-by or I can wait for v2.

>>

>> Nice. :) Out of curiosity, can you check if this correctly BUG()s on a

>> waitid() call when the fixes are reverted?

>>

>> 96ca579a1ecc ("waitid(): Avoid unbalanced user_access_end() on

>> access_ok() error")

>> 1c9fec470b81 ("waitid(): Add missing access_ok() checks")

>>

>> -Kees

>>

>

> Yep, we get a nice bug:

>

> [   34.783912] ------------[ cut here ]------------

> [   34.784484] kernel BUG at kernel/exit.c:1614!


Awesome! :)

I wonder how hard it might be to make this happen on x86 too (or
generically). Hmmm

-Kees

> [   34.785016] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP

> [   34.785572] Modules linked in:

> [   34.786177] CPU: 0 PID: 1324 Comm: a.out Not tainted 4.14.0-rc5-00005-ga3bb7b0f72d3 #69

> [   34.786657] Hardware name: linux,dummy-virt (DT)

> [   34.787093] task: ffff80003c4ed400 task.stack: ffff00000ade0000

> [   34.788196] PC is at SyS_waitid+0x1d4/0x210

> [   34.788534] LR is at SyS_waitid+0x20/0x210

> [   34.788839] pc : [<ffff0000080cde1c>] lr : [<ffff0000080cdc68>] pstate: a0000145

> [   34.789310] sp : ffff00000ade3e00

> [   34.789578] x29: ffff00000ade3e00 x28: ffff80003c4ed400

> [   34.790039] x27: ffff0000089e1000 x26: 000000000000005f

> [   34.790397] x25: 0000000000000124 x24: 0000000000000015

> [   34.790649] x23: 0000000080000000 x22: 0000ffffb1eb6b24

> [   34.790897] x21: 00000000ffffffff x20: 000080003600c000

> [   34.791149] x19: ffff800000000000 x18: 0000000000000007

> [   34.791397] x17: 0000000000000001 x16: 0000000000000019

> [   34.791648] x15: 0000000000000033 x14: 000000000000004c

> [   34.791903] x13: 0000000000000068 x12: ffff000008af69d0

> [   34.792156] x11: ffff00000ade3c20 x10: 0000000000000000

> [   34.792451] x9 : 0000000000000000 x8 : 0000000000000000

> [   34.792706] x7 : 0000000000000000 x6 : ffff000008f42f18

> [   34.792965] x5 : dead000000000100 x4 : 0000000000000011

> [   34.793214] x3 : ffff80003c4ed400 x2 : ffff800000000004

> [   34.793462] x1 : 0001000000000000 x0 : 0000000000000000

> [   34.793743] Process a.out (pid: 1324, stack limit = 0xffff00000ade0000)

> [   34.794098] Call trace:

> [   34.794351] Exception stack(0xffff00000ade3cc0 to 0xffff00000ade3e00)

> [   34.794722] 3cc0: 0000000000000000 0001000000000000 ffff800000000004 ffff80003c4ed400

> [   34.795034] 3ce0: 0000000000000011 dead000000000100 ffff000008f42f18 0000000000000000

> [   34.795297] 3d00: 0000000000000000 0000000000000000 0000000000000000 ffff00000ade3c20

> [   34.795549] 3d20: ffff000008af69d0 0000000000000068 000000000000004c 0000000000000033

> [   34.795803] 3d40: 0000000000000019 0000000000000001 0000000000000007 ffff800000000000

> [   34.796066] 3d60: 000080003600c000 00000000ffffffff 0000ffffb1eb6b24 0000000080000000

> [   34.796277] 3d80: 0000000000000015 0000000000000124 000000000000005f ffff0000089e1000

> [   34.796477] 3da0: ffff80003c4ed400 ffff00000ade3e00 ffff0000080cdc68 ffff00000ade3e00

> [   34.796677] 3dc0: ffff0000080cde1c 00000000a0000145 0000000000000000 ffff80003c4ed400

> [   34.796884] 3de0: 0001000000000000 dead000000000100 ffff00000ade3e00 ffff0000080cde1c

> [   34.797153] [<ffff0000080cde1c>] SyS_waitid+0x1d4/0x210

> [   34.797298] Exception stack(0xffff00000ade3ec0 to 0xffff00000ade4000)

> [   34.797470] 3ec0: 0000000000000000 0000000000000000 ffff800000000000 0000000000000004

> [   34.797670] 3ee0: 0000000000000000 0400000055550400 0000ffffb1e0a011 0000000000000cbf

> [   34.797873] 3f00: 000000000000005f 0000000000000a3b 0000000000000000 16170f120a1a1311

> [   34.798073] 3f20: 00000000000001a8 0000ffffb1f8ecb8 0000ffffb1e1c0e0 0000000000000002

> [   34.798272] 3f40: 0000ffffb1eb6af0 0000000000420018 0000000000000000 0000000000400740

> [   34.798471] 3f60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000

> [   34.798668] 3f80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000

> [   34.798870] 3fa0: 0000000000000000 0000fffff7294200 00000000004006fc 0000fffff7294200

> [   34.799068] 3fc0: 0000ffffb1eb6b24 0000000080000000 0000000000000000 000000000000005f

> [   34.799265] 3fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000

> [   34.799474] [<ffff0000080837b0>] el0_svc_naked+0x24/0x28

> [   34.799715] Code: f9400fb4 17ffff9a d503201f f9000fb4 (d4210000)

> [   34.800121] ---[ end trace a14ca5cd5d8f9b30 ]---




-- 
Kees Cook
Pixel Security
Laura Abbott Nov. 2, 2017, 1:25 a.m. UTC | #10
On 11/01/2017 04:29 PM, Kees Cook wrote:
> On Wed, Nov 1, 2017 at 4:05 PM, Laura Abbott <labbott@redhat.com> wrote:

>> On 11/01/2017 03:28 PM, Kees Cook wrote:

>>> On Wed, Nov 1, 2017 at 2:13 PM, Laura Abbott <labbott@redhat.com> wrote:

>>>> On 11/01/2017 05:05 AM, Mark Rutland wrote:

>>>>> On Tue, Oct 31, 2017 at 04:56:39PM -0700, Laura Abbott wrote:

>>>>>> On 10/26/2017 02:09 AM, Mark Rutland wrote:

>>>>>>> In Prague, Kees mentioned that it would be nice to have a mechanism to

>>>>>>> catch bad __{get,put}_user uses, such as the recent CVE-2017-5123 [1,2]

>>>>>>> issue with unsafe_put_user() in waitid().

>>>>>>>

>>>>>>> These patches allow an optional access_ok() check to be dropped in

>>>>>>> arm64's __{get,put}_user() primitives. These will then BUG() if a bad

>>>>>>> user pointer is passed (which should only happen in the absence of an

>>>>>>> earlier access_ok() check).

>>>>>

>>>>>> Turning on the option fails as soon as we hit userspace. On my buildroot

>>>>>> based environment I get the help text for ld.so (????) and then a message

>>>>>> about attempting to kill init.

>>>>>

>>>>> Ouch. Thanks for the report, and sorry about this.

>>>>>

>>>>> The problem is that I evaluate the ptr argument twice in

>>>>> __{get,put}_user(), and this may have side effects.

>>>>>

>>>>> e.g. when the ELF loader does things like:

>>>>>

>>>>>   __put_user((elf_addr_t)p, sp++)

>>>>>

>>>>> ... we increment sp twice, and write to the wrong user address, leaving

>>>>> sp corrupt.

>>>>>

>>>>> I have an additional patch [1] to fix this, which is in my

>>>>> arm64/access-ok branch [2].

>>>>>

>>>>> Thanks,

>>>>> Mark.

>>>>>

>>>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/access-ok&id=ebb7ff83eb53b8810395d5cf48712a4ae6d678543

>>>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/access-ok

>>>>>

>>>>

>>>> Thanks, the updated patch works. I wrote an LKDTM test to verify

>>>> the expected behavior (__{get,put}_user panic whereas {get,put}_user

>>>> do not). You're welcome to add Tested-by or I can wait for v2.

>>>

>>> Nice. :) Out of curiosity, can you check if this correctly BUG()s on a

>>> waitid() call when the fixes are reverted?

>>>

>>> 96ca579a1ecc ("waitid(): Avoid unbalanced user_access_end() on

>>> access_ok() error")

>>> 1c9fec470b81 ("waitid(): Add missing access_ok() checks")

>>>

>>> -Kees

>>>

>>

>> Yep, we get a nice bug:

>>

>> [   34.783912] ------------[ cut here ]------------

>> [   34.784484] kernel BUG at kernel/exit.c:1614!

> 

> Awesome! :)

> 

> I wonder how hard it might be to make this happen on x86 too (or

> generically). Hmmm

x86 looks like it needs the same ptr_argument fixup as arm64 but
seems to have a separate unsafe path so it's actually easier to
fix up. I have version of this that seems to work so I'll clean
it up and send it out tomorrow.

Thanks,
Laura