[GLIBC,AARCH64] Rewrite elf_machine_load_address using _DYNAMIC symbol

Message ID 581C57FF.2090901@foss.arm.com
State New

Commit Message

Renlin Li Nov. 4, 2016, 9:42 a.m. UTC
Hi all,

This patch rewrites the aarch64 elf_machine_load_address to use the special
_DYNAMIC symbol instead of _dl_start.

The static address of the _DYNAMIC symbol is stored in the first GOT entry.
Here is the binutils change that makes this solution work:
https://sourceware.org/ml/binutils/2013-06/msg00248.html

The i386 and x86_64 targets use the same method.

The original implementation relies on a trick: the R_AARCH64_ABS32 relocation
is resolved at link time and the static address fits in 32 bits.
However, in LP64 an address is normally 64 bits wide.
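
As an illustration of why this is fragile, here is a sketch only (the helper
name is made up and this is not glibc code): a 64-bit static address survives
being stored in a 32-bit word only if its upper 32 bits are zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check, not glibc code: does a 64-bit link-time address
   round-trip through the 32-bit slot that a ".word sym" provides?  */
static bool
fits_in_32bit_word (uint64_t static_addr)
{
  /* Truncate to 32 bits, widen again, and compare with the original.  */
  return (uint64_t) (uint32_t) static_addr == static_addr;
}
```

A typical ld.so link-time address is small and passes this check, which is
why the old code worked in practice; nothing in LP64 guarantees it.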

Additionally, the original inline assembly is not optimal: it uses 4
instructions, including a jump.

Optimally, the new implementation here is just two instructions:
ldr %1, _GLOBAL_OFFSET_TABLE_
adr %2, _DYNAMIC

The size of ld.so is around 130K, so it's safe to use ldr and adr to get the
address. The address range of those two instructions is +/-1MB.
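
For reference, the +/-1MB figure follows from the instruction encodings: adr
takes a signed 21-bit byte offset, and the literal form of ldr takes a signed
19-bit offset scaled by 4. The following is only a sketch of that range check
in plain C; the helper names are hypothetical and not part of glibc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not glibc code.  */

/* adr: signed 21-bit byte offset from the instruction, i.e. [-1MB, +1MB).  */
static bool
adr_reaches (uint64_t pc, uint64_t target)
{
  int64_t off = (int64_t) (target - pc);
  return off >= -(INT64_C (1) << 20) && off < (INT64_C (1) << 20);
}

/* ldr (literal): signed 19-bit offset scaled by 4, so the same range,
   but the offset must also be a multiple of 4.  */
static bool
ldr_literal_reaches (uint64_t pc, uint64_t target)
{
  int64_t off = (int64_t) (target - pc);
  return adr_reaches (pc, target) && (off & 3) == 0;
}
```

With an image of roughly 130K, any two locations inside it are well within
this range of each other.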

And by the way, this method is ILP32 safe as well.
aarch64 linux toolchain regression test OK. OK to commit?

Regards,
Renlin Li


ChangeLog:

2016-11-04  Renlin Li  <renlin.li@arm.com>

	* sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
	_DYNAMIC symbol to calculate load address.

Comments

Roland McGrath Nov. 4, 2016, 9:24 p.m. UTC | #1
On many or perhaps all machines, elf_machine_load_address could now be
implemented purely in C by using a link-time trick.

In C, just:

	static inline ElfW(Addr) __attribute__ ((unused))
	elf_machine_load_address (void)
	{
	  extern const char _BASE[] __attribute__ ((visibility ("hidden")));
	  return (ElfW(Addr)) _BASE;
	}

Then add a trivial input linker script to the ld.so link:

	PROVIDE_HIDDEN(_BASE = 0);

I know this works for x86_64 and aarch64, and does not require a load.
(On x86_64 it's a single lea; on aarch64 it's a single adr+add pair.)
Szabolcs Nagy Nov. 7, 2016, 3:15 p.m. UTC | #2
On 04/11/16 21:24, Roland McGrath wrote:
> On many or perhaps all machines, elf_machine_load_address could now be
> implemented purely in C by using a link-time trick.
>
> In C, just:
>
> 	static inline ElfW(Addr) __attribute__ ((unused))
> 	elf_machine_load_address (void)
> 	{
> 	  extern const char _BASE[] __attribute__ ((visibility ("hidden")));
> 	  return (ElfW(Addr)) _BASE;
> 	}
>
> Then add a trivial input linker script to the ld.so link:
>
> 	PROVIDE_HIDDEN(_BASE = 0);
>
> I know this works for x86_64 and aarch64, and does not require a load.
> (On x86_64 it's a single lea; on aarch64 it's a single adr+add pair.)


this is less maintenance work, because code can be shared,
but it is not a portable solution: it relies on linker
scripts and on the compiler not doing anything silly.

i think asm is preferable, unless we know that all
supported linkers handle this on all targets.
Szabolcs Nagy Nov. 7, 2016, 3:23 p.m. UTC | #3
On 07/11/16 15:15, Szabolcs Nagy wrote:
> On 04/11/16 21:24, Roland McGrath wrote:
>> On many or perhaps all machines, elf_machine_load_address could now be
>> implemented purely in C by using a link-time trick.
>>
>> In C, just:
>>
>> 	static inline ElfW(Addr) __attribute__ ((unused))
>> 	elf_machine_load_address (void)
>> 	{
>> 	  extern const char _BASE[] __attribute__ ((visibility ("hidden")));
>> 	  return (ElfW(Addr)) _BASE;
>> 	}
>>
>> Then add a trivial input linker script to the ld.so link:
>>
>> 	PROVIDE_HIDDEN(_BASE = 0);


on second thought:
why is it not ok to use _DYNAMIC instead of _BASE?

then no linker script is needed (_DYNAMIC is in the elf spec).
Szabolcs Nagy Nov. 7, 2016, 3:51 p.m. UTC | #4
On 07/11/16 15:23, Szabolcs Nagy wrote:
> On 07/11/16 15:15, Szabolcs Nagy wrote:
>> On 04/11/16 21:24, Roland McGrath wrote:
>>> On many or perhaps all machines, elf_machine_load_address could now be
>>> implemented purely in C by using a link-time trick.
>>>
>>> In C, just:
>>>
>>> 	static inline ElfW(Addr) __attribute__ ((unused))
>>> 	elf_machine_load_address (void)
>>> 	{
>>> 	  extern const char _BASE[] __attribute__ ((visibility ("hidden")));
>>> 	  return (ElfW(Addr)) _BASE;
>>> 	}
>>>
>>> Then add a trivial input linker script to the ld.so link:
>>>
>>> 	PROVIDE_HIDDEN(_BASE = 0);
>
> on a second thought:
> why is it not ok to use _DYNAMIC instead of _BASE?
>
> then no linker script is needed (_DYNAMIC is in the elf spec).


hidden symbol is not accessed with direct pc relative addressing on mips
so this approach does not work in general.
Roland McGrath Nov. 8, 2016, 9:28 p.m. UTC | #5
There is plenty more reliance on the compiler not doing the wrong things.
I don't see any new issue there.

The use of a linker script here also does not concern me.  This only
affects building ld.so itself, so there is no issue about general linker
compatibility.  We have plenty more use of fancy linker features and only a
few linkers are capable of building libc already.

I never said I was sure this technique works on all machines.  
It certainly works on aarch64.

Show me the code you have in mind using _DYNAMIC.  The scheme using a
linker-defined symbol with value 0 is the only one I'm aware of that
reduces to the minimal number of assembly instructions, with none of them
being a load.
Szabolcs Nagy Nov. 9, 2016, 2:48 p.m. UTC | #6
On 08/11/16 21:28, Roland McGrath wrote:
> Show me the code you have in mind using _DYNAMIC.  The scheme using a
> linker-defined symbol with value 0 is the only one I'm aware of that
> reduces to the minimal number of assembly instructions, with none of them
> being a load.


well the current x86_64 code is already doing what i had in mind.

i assumed GOT[0] is used elsewhere so it has to be computed anyway
and then doing (_DYNAMIC-GOT[0]) should be the same as _BASE using
an extra sub.
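
To make the arithmetic concrete, here is a small simulation, not the actual
dl-machine.h code: it models an image whose GOT[0] holds the link-time address
of _DYNAMIC and recovers the load address by subtraction, next to the _BASE
variant where the link-time value is 0. The struct and function names are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a loaded image; not glibc code.  */
struct image
{
  uint64_t load_addr;       /* where the loader placed the image */
  uint64_t dynamic_static;  /* link-time address of _DYNAMIC */
  uint64_t got0;            /* unrelocated GOT[0], equal to dynamic_static */
};

/* Run-time address of _DYNAMIC, as a PC-relative reference would yield.  */
static uint64_t
dynamic_runtime (const struct image *img)
{
  return img->load_addr + img->dynamic_static;
}

/* load_addr = dynamic_addr (_DYNAMIC) - static_addr (_DYNAMIC).
   Needs one load (GOT[0]) and one subtraction.  */
static uint64_t
load_address_via_got0 (const struct image *img)
{
  return dynamic_runtime (img) - img->got0;
}

/* The _BASE variant: a symbol whose link-time value is 0 has a
   run-time address equal to the load address, so nothing is subtracted.  */
static uint64_t
load_address_via_base (const struct image *img)
{
  return img->load_addr + 0;
}
```

Both functions return the same value for any consistent image; the difference
discussed in this thread is only in how many instructions each one costs.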
Maciej W. Rozycki Nov. 9, 2016, 7:06 p.m. UTC | #7
On Mon, 7 Nov 2016, Szabolcs Nagy wrote:

> hidden symbol is not accessed with direct pc relative addressing on mips


 Well, the regular MIPS ISA has no PC-relative addressing mode (except
for branch instructions), so this can't be done with that instruction set
(the MIPS16 and microMIPS ISAs do have some forms of PC-relative
addressing, which can be used to access hidden and internal symbols
bypassing GOT in PIC code if the compiler is smart enough).

  Maciej
Roland McGrath Nov. 9, 2016, 10:02 p.m. UTC | #8
> On 08/11/16 21:28, Roland McGrath wrote:
> > Show me the code you have in mind using _DYNAMIC.  The scheme using a
> > linker-defined symbol with value 0 is the only one I'm aware of that
> > reduces to the minimal number of assembly instructions, with none of them
> > being a load.
>
> well the current x86_64 code is already doing what i had in mind.

And it is more costly than using _BASE.

> i assumed GOT[0] is used elsewhere so it has to be computed anyway
> and then doing (_DYNAMIC-GOT[0]) should be the same as _BASE using
> an extra sub.

Of course all the methods that work get the same result!
The point is that the _BASE method does it the most efficiently.
Szabolcs Nagy Oct. 17, 2017, 3:41 p.m. UTC | #9
On 04/11/16 09:42, Renlin Li wrote:
> Hi all,
>
> This patch rewrites aarch64 elf_machine_load_address to use special _DYNAMIC
> symbol instead of _dl_start.
>
> The static address of _DYNAMIC symbol is stored in the first GOT entry.
> Here is the change which makes this solution work.
> https://sourceware.org/ml/binutils/2013-06/msg00248.html
>
> i386, x86_64 targets use the same method to do this as well.
>
> The original implementation relies on a trick that R_AARCH64_ABS32 relocation
> being resolved at link time and the static address fits in the 32bits.
> However, in LP64, normally, the address is defined to be 64 bit.
>
> Additionally, the original inline assembly is not optimized. It uses 4
> instructions including a jump.
>
> Optimally, the new implementation here is just two instructions:
> ldr %1, _GLOBAL_OFFSET_TABLE_
> adr %2, _DYNAMIC
>
> The size of ld.so is around 130K, so it's save to use ldr, adr to get the address.
> The address range for those two instruction is +/-1MB.
>
> And by the way, this method is ILP32 safe as well.
> aarch64 linux toolchain regression test OK. OK to commit?
>
> Regards,
> Renlin Li
>
>
> ChangeLog:
>
> 2016-11-04  Renlin Li  <renlin.li@arm.com>
>
>     * sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
>     _DYNAMIC symbol to calculate load address.


This is OK.

(Roland notes that introducing a _BASE symbol with a
linker script would even avoid loading GOT[0], but
that can be done separately across targets)
Szabolcs Nagy Oct. 17, 2017, 4:28 p.m. UTC | #10
On 17/10/17 16:41, Szabolcs Nagy wrote:
> On 04/11/16 09:42, Renlin Li wrote:
>> Hi all,
>>
>> This patch rewrites aarch64 elf_machine_load_address to use special _DYNAMIC
>> symbol instead of _dl_start.
>>
>> The static address of _DYNAMIC symbol is stored in the first GOT entry.
>> Here is the change which makes this solution work.
>> https://sourceware.org/ml/binutils/2013-06/msg00248.html
>>
>> i386, x86_64 targets use the same method to do this as well.
>>
>> The original implementation relies on a trick that R_AARCH64_ABS32 relocation
>> being resolved at link time and the static address fits in the 32bits.
>> However, in LP64, normally, the address is defined to be 64 bit.
>>
>> Additionally, the original inline assembly is not optimized. It uses 4
>> instructions including a jump.
>>
>> Optimally, the new implementation here is just two instructions:
>> ldr %1, _GLOBAL_OFFSET_TABLE_
>> adr %2, _DYNAMIC
>>
>> The size of ld.so is around 130K, so it's save to use ldr, adr to get the address.
>> The address range for those two instruction is +/-1MB.
>>
>> And by the way, this method is ILP32 safe as well.
>> aarch64 linux toolchain regression test OK. OK to commit?
>>
>> Regards,
>> Renlin Li
>>
>>
>> ChangeLog:
>>
>> 2016-11-04  Renlin Li  <renlin.li@arm.com>
>>
>>     * sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
>>     _DYNAMIC symbol to calculate load address.
>
> This is OK.
>
> (Roland notes that introducing a BASE symbol with a
> linker script would even avoid loading GOT[0], but
> that can be done separately across targets)


please wait with this.

looking at the static pie patches, it seems static pie also needs
to compute the base address and that cannot assume -mcmodel=tiny.
i don't remember if there was a particular reason -mcmodel=large
would be problematic. if inline asm was only used to save a
few instructions then please resend the patch using c code
(like what x86_64 is doing); that's less fragile.
Renlin Li Oct. 18, 2017, 10:32 a.m. UTC | #11
Hi Szabolcs,

Here is the C version, which should be portable in all cases.
The aarch64 native glibc regression test passes.

Regards,
Renlin

ChangeLog:

2017-10-18  Renlin Li  <renlin.li@arm.com>

	* sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
	_DYNAMIC symbol to calculate load address.


On 17/10/17 17:28, Szabolcs Nagy wrote:
> On 17/10/17 16:41, Szabolcs Nagy wrote:
>> On 04/11/16 09:42, Renlin Li wrote:
>>> Hi all,
>>>
>>> This patch rewrites aarch64 elf_machine_load_address to use special _DYNAMIC
>>> symbol instead of _dl_start.
>>>
>>> The static address of _DYNAMIC symbol is stored in the first GOT entry.
>>> Here is the change which makes this solution work.
>>> https://sourceware.org/ml/binutils/2013-06/msg00248.html
>>>
>>> i386, x86_64 targets use the same method to do this as well.
>>>
>>> The original implementation relies on a trick that R_AARCH64_ABS32 relocation
>>> being resolved at link time and the static address fits in the 32bits.
>>> However, in LP64, normally, the address is defined to be 64 bit.
>>>
>>> Additionally, the original inline assembly is not optimized. It uses 4
>>> instructions including a jump.
>>>
>>> Optimally, the new implementation here is just two instructions:
>>> ldr %1, _GLOBAL_OFFSET_TABLE_
>>> adr %2, _DYNAMIC
>>>
>>> The size of ld.so is around 130K, so it's save to use ldr, adr to get the address.
>>> The address range for those two instruction is +/-1MB.
>>>
>>> And by the way, this method is ILP32 safe as well.
>>> aarch64 linux toolchain regression test OK. OK to commit?
>>>
>>> Regards,
>>> Renlin Li
>>>
>>>
>>> ChangeLog:
>>>
>>> 2016-11-04  Renlin Li  <renlin.li@arm.com>
>>>
>>>      * sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
>>>      _DYNAMIC symbol to calculate load address.
>>
>> This is OK.
>>
>> (Roland notes that introducing a BASE symbol with a
>> linker script would even avoid loading GOT[0], but
>> that can be done separately across targets)
>
> please wait with this.
>
> looking at the static pie patches, it seems that also needs
> to compute the base address and that cannot assume -mcmodel=tiny,
> i don't remember if there was a particular reason -mcmodel=large
> would be problematic, if inline asm was only used to save a
> few instructions then please resend the patch but using c code
> (like what x86_64 is doing), that's less fragile.
>
diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index b124547..e765612 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -51,40 +51,11 @@ elf_machine_load_address (void)
   /* To figure out the load address we use the definition that for any symbol:
      dynamic_addr(symbol) = static_addr(symbol) + load_addr
 
-     The choice of symbol is arbitrary. The static address we obtain
-     by constructing a non GOT reference to the symbol, the dynamic
-     address of the symbol we compute using adrp/add to compute the
-     symbol's address relative to the PC.
-     This depends on 32/16bit relocations being resolved at link time
-     and that the static address fits in the 32/16 bits.  */
-
-  ElfW(Addr) static_addr;
-  ElfW(Addr) dynamic_addr;
-
-  asm ("					\n"
-"	adrp	%1, _dl_start;			\n"
-#ifdef __LP64__
-"	add	%1, %1, #:lo12:_dl_start	\n"
-#else
-"	add	%w1, %w1, #:lo12:_dl_start	\n"
-#endif
-"	ldr	%w0, 1f				\n"
-"	b	2f				\n"
-"1:						\n"
-#ifdef __LP64__
-"	.word	_dl_start			\n"
-#else
-# ifdef __AARCH64EB__
-"	.short  0                               \n"
-# endif
-"	.short  _dl_start                       \n"
-# ifndef __AARCH64EB__
-"	.short  0                               \n"
-# endif
-#endif
-"2:						\n"
-    : "=r" (static_addr),  "=r" (dynamic_addr));
-  return dynamic_addr - static_addr;
+    The _DYNAMIC symbol is used here as its link-time address is stored in
+    the special unrelocated first GOT entry.  */
+
+    extern ElfW(Dyn) _DYNAMIC[] attribute_hidden;
+    return (ElfW(Addr)) &_DYNAMIC - elf_machine_dynamic ();
 }
 
 /* Set up the loaded object described by L so its unrelocated PLT
Szabolcs Nagy Oct. 18, 2017, 4:12 p.m. UTC | #12
On 18/10/17 11:32, Renlin Li wrote:
> Hi Szabolcs,
>
> Here is the C version one which should be portable in all cases.
> aarch64 native glibc regression test checked Okay.
>
> Regards,
> Renlin
>
> ChangeLog:
>
> 2017-10-18  Renlin Li  <renlin.li@arm.com>
>
>     * sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
>     _DYNAMIC symbol to calculate load address.


This is OK to commit.
Patch

diff --git a/sysdeps/aarch64/dl-machine.h b/sysdeps/aarch64/dl-machine.h
index 217e179..b2f6618 100644
--- a/sysdeps/aarch64/dl-machine.h
+++ b/sysdeps/aarch64/dl-machine.h
@@ -49,25 +49,19 @@  elf_machine_load_address (void)
   /* To figure out the load address we use the definition that for any symbol:
      dynamic_addr(symbol) = static_addr(symbol) + load_addr
 
-     The choice of symbol is arbitrary. The static address we obtain
-     by constructing a non GOT reference to the symbol, the dynamic
-     address of the symbol we compute using adrp/add to compute the
-     symbol's address relative to the PC.
-     This depends on 32bit relocations being resolved at link time
-     and that the static address fits in the 32bits.  */
-
-  ElfW(Addr) static_addr;
-  ElfW(Addr) dynamic_addr;
+     _DYNAMIC symbol is used here as its static address is stored in
+     the special unrelocated first GOT entry.  */
 
+  ElfW(Addr) static_addr, dynamic_addr;
   asm ("					\n"
-"	adrp	%1, _dl_start;			\n"
-"	add	%1, %1, #:lo12:_dl_start	\n"
-"	ldr	%w0, 1f				\n"
-"	b	2f				\n"
-"1:						\n"
-"	.word	_dl_start			\n"
-"2:						\n"
-    : "=r" (static_addr),  "=r" (dynamic_addr));
+       "adr	%0, _DYNAMIC			\n"
+#ifdef __LP64__
+       "ldr	%1, _GLOBAL_OFFSET_TABLE_	\n"
+#else
+       "ldr	%w1, _GLOBAL_OFFSET_TABLE_	\n"
+#endif
+       : "=r" (dynamic_addr), "=r" (static_addr));
+
   return dynamic_addr - static_addr;
 }