diff mbox

--enable-stack-protector for glibc, v4, now with arm

Message ID 56D76644.8050906@linaro.org
State New
Headers show

Commit Message

Adhemerval Zanella March 2, 2016, 10:16 p.m. UTC
On 02-03-2016 15:28, Nix wrote:
> On 28 Feb 2016, nix@esperi.org.uk spake thusly:

> 

>> This is version 4 of the stack-protected glibc patch, incorporating all review

>> comments to date (unless I missed some), and adding a degree of arm support

>> (i.e. "I know nothing about the platform but the tests passed").

> 

> So... other than the changelog, is there anything else I need to do to

> get review of the trickier parts of this? I've exhausted all the

> platforms I have access to and fixed all regressions on such platforms,

> but am happy to test/fix on more if people give me access to them.

> 


I did not review the patch themselves in detail, but I gave them a try on
aarch64 and ppc64le using the default --enable-stack-protector. I saw 
only one regression, which I also saw on i686 and x86_64 and (and makes
me question either why you did not see it or if you just overlook it):

FAIL: elf/check-localplt

$ cat elf/check-localplt.out 
Extra PLT reference: libc.so: __stack_chk_fail

Since there is no information indicating __stack_chk_fail is local,
the linker creates internal PLT calls.  We can solve it by the same
strategy we used on memcpy/memmove compiler call generation by using
asm symbol hack directive:


This seems to fix x86_64 and powerpc64le (also if we decide to add this
patch please add a comments explaining the issue).

And I also agree with Szabolcs Nagy about patch 15/16, we need to understand
better what is happening on the mentioned assembly routines before adding
hacks. Have you tried remove the assembly implementations for i386 and
let is use the default C implementation to check if it is something
related to the asm routines themselves?

Comments

Adhemerval Zanella March 3, 2016, 6:25 p.m. UTC | #1
On 03-03-2016 14:34, Nix wrote:
> On 2 Mar 2016, Adhemerval Zanella stated:

> 

>> On 02-03-2016 15:28, Nix wrote:

>>> On 28 Feb 2016, nix@esperi.org.uk spake thusly:

>>>

>>>> This is version 4 of the stack-protected glibc patch, incorporating all review

>>>> comments to date (unless I missed some), and adding a degree of arm support

>>>> (i.e. "I know nothing about the platform but the tests passed").

>>>

>>> So... other than the changelog, is there anything else I need to do to

>>> get review of the trickier parts of this? I've exhausted all the

>>> platforms I have access to and fixed all regressions on such platforms,

>>> but am happy to test/fix on more if people give me access to them.

>>

>> I did not review the patch themselves in detail, but I gave them a try on

>> aarch64 and ppc64le using the default --enable-stack-protector. I saw 

>> only one regression,

> 

> OK, if it worked that well not only on 64-bit ARM but also on a platform

> I never went near, I think we can say all the major holes are covered!

> (I honestly expected a bit of per-arch protection to be needed on most

> arches, as was needed on SPARC and x86_64 both -- but none was needed on

> 32-bit ARM either, so maybe I was being too paranoid.)


I thought too, although based on the patchset the level of arch-specific
hackery is still limited on some specific implementations.

I see also there are some arch specific implementations that issues
stack allocation (for instance strcspn for powerpc64) which won't have
instrumented code.  However I do not see how to accomplish unless it
it is being explicit added on such implementations.  I do no see this
as an impeding for patch, but I think it might require some notes in
documentation.

> 

>>                      which I also saw on i686 and x86_64 and (and makes

>> me question either why you did not see it or if you just overlook it):

>>

>> FAIL: elf/check-localplt

>>

>> $ cat elf/check-localplt.out 

>> Extra PLT reference: libc.so: __stack_chk_fail

>>

>> Since there is no information indicating __stack_chk_fail is local,

>> the linker creates internal PLT calls.

> 

> I left that in specifically because I wasn't sure which way to go (we

> see it on all platforms, including SPARC and ARM). It is quite possibly

> desirable to de-PLTify this (because these calls are frequently made),

> but it's *also* quite possibly desirable not to do that, because, like

> malloc(), __stack_chk_fail() might reasonably be something that people

> might want to intercept, to change what happens on stack overrun.

> 

> I'm not sure which is preferable. I guess if the fortification hooks

> aren't overrideable, neither should these be.  BUt then, we can't really

> use __fortify_fail() as a model, because glibc *implements*

> fortification, so it contains all the calls to __fortify_fail() itself:

> the same is not true of stack-protection, where random user programs can

> and will also call __stack_chk_fail().


I am not sure either, although imho I would prefer to not allow user to
override *internal* GLIBC stack overrun.  So I am inclined in de-PLTify
internal libc.so usage of __stack_chk_fail().

> 

>>                                        We can solve it by the same

>> strategy we used on memcpy/memmove compiler call generation by using

>> asm symbol hack directive:

> 

> Yeah, that would work. __stack_chk_fail would still be there, and

> callable by everyone else, and overrideable for users outside of glibc

> (because __stack_chk_fail() still *has* a PLT). Of course, if any users

> do override it, they'll see rather inconsistent behaviour, because libc

> will still use the old one, but libresolv, for example, will use the

> overridden one! Is that inconsistency something that we should just

> ignore?


One option is add a private symbol in libc.so, say __stack_chk_fail_glibc,
and add the alias for other libraries to this symbol.  This won't prevent
an user to really override with non-default options (like implementing
the __stack_chk_fail_glibc as well).

Another option will be to link each library with stack_chk_fail.o and
pulling a local copy of it and thus preventing overriding. 

> 

>> This seems to fix x86_64 and powerpc64le (also if we decide to add this

>> patch please add a comments explaining the issue).

> 

> Definitely!

> 

>> And I also agree with Szabolcs Nagy about patch 15/16, we need to understand

>> better what is happening on the mentioned assembly routines before adding

>> hacks. Have you tried remove the assembly implementations for i386 and

>> let is use the default C implementation to check if it is something

>> related to the asm routines themselves?

> 

> It must be: other platforms don't have those implementations, and also

> don't need the hacks (they didn't exist before I started working on this

> patch series, and x86_64 was fine: only x86_32 saw test failures).

> 

> In the case of __sigjmp_save, the cause is clear: it's sibcalled to from

> assembler and clearly must be (One of my worries is that other arches

> may have assembler that does sibcalls in similar fashion without

> mentioning it in comments or anything so my greps failed to find it...

> but I hope that if any such exists, it'll pop up as a bug so we can fix

> it, or someone here will know and mention it.)

> 

> Hm.

> 

> Looking at the prototypes of the functions in question, we see (dropping

> my additions):

> 

> int

> internal_function attribute_hidden

> __pthread_mutex_unlock_usercnt (pthread_mutex_t *mutex, int decr)

> 

> void

> internal_function

> __pthread_mutex_cond_lock_adjust (pthread_mutex_t *mutex)

> 

> and looking at the assembler in

> sysdeps/unix/sysv/linux/i386/pthread_cond_timedwait.S, it is obvious

> enough even to someone with my extremely minimal x86 asm experience that

> this thing is not exactly feeling constrained by the normal way one

> calls functions: the fast unlock path is calling

> __pthread_mutex_unlock_usercnt() without having at any point adjusted

> %esp downwards or anything, it does that afterwards on a slow path --

> essentially the call is overlaid on top of the stack frame of

> pthread_cond_timedwait() stack frame, like a weird handrolled sibcall

> which returns conentionally or something. I'm not surprised that adding

> a canary to something called like *that* makes things explode: I suspect

> that adding any extra local variables to

> __pthread_mutex_unlock_usercnt() that weren't optimized out would do the

> same thing. (If so, there should probably be a warning comment in that

> function!)

> 

> __pthread_mutex_cond_lock_adjust() can also be called in a similar

> fashion (jumping via labels 1, 2, 5): __pthread_mutex_cond_lock() is

> also called the same way but the function returns immediately

> afterwards, and it also gets called more conventionally, so I guess no

> harm is done there.)

> 

> Now if anyone who's actually done more than a trivial amount of x86

> assembler in the last fifteen years could confirm that, I'd be very

> happy... since it's quite possible I'm misreading it all. :)

> 


IMHO I would just prefer to get rid of this specific assembly 
implementation.  In fact there are some patches that do intend to do
it (new cond var from Tovarlds and my cancellation refactor) and your
work is another reason to avoid large reimplementation in assembly
with a very strong performance reason to do so.
diff mbox

Patch

diff --git a/sysdeps/generic/symbol-hacks.h b/sysdeps/generic/symbol-hacks.h
index ce576c9..12829cc 100644
--- a/sysdeps/generic/symbol-hacks.h
+++ b/sysdeps/generic/symbol-hacks.h
@@ -5,3 +5,7 @@  asm ("memmove = __GI_memmove");
 asm ("memset = __GI_memset");
 asm ("memcpy = __GI_memcpy");
 #endif
+
+#if !defined __ASSEMBLER__ && IS_IN (libc)
+asm ("__stack_chk_fail = __stack_chk_fail_local");
+#endif