[AARCH64] align long branch stubs

Message ID CABXYE2UOkCbiFQTJ7kijLUTMy25unww4w2Q_xSMf6icLS7xxog@mail.gmail.com
State New

Commit Message

Jim Wilson June 3, 2016, 5:16 a.m.
I got a bug report from Qualcomm that says if you set the A bit in the
SCTLR register, to trap on unaligned accesses, their code fails,
because the toolchain itself is emitting unaligned data accesses.

The problem stems from a pair of patches from Marcus Shawcroft added
over a year ago.
    https://sourceware.org/ml/binutils/2015-03/msg00344.html
    https://sourceware.org/ml/binutils/2015-03/msg00343.html
The stub support makes a bit of effort to try to align everything to
an 8 byte boundary, as the size of every stub is rounded up to a
multiple of 8 bytes.  However, this patch from Marcus adds a 4-byte
branch instruction before the first stub, and sets the section
alignment to 4 bytes instead of 8 bytes.  This causes everything to
end up with 4-byte alignment instead of 8-byte alignment.  This is a
problem for long branch stubs, as they contain a 64-bit address as
data, which should be 8-byte aligned.
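
To make the layout arithmetic concrete, here is a minimal C sketch. It is not
BFD code, and the helper names round_up8 and stub_address are made up for
illustration.  It assumes each stub is rounded up to a multiple of 8 bytes
and that some number of prefix bytes sits ahead of the first stub.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a stub size up to a multiple of 8 bytes. */
static uint64_t round_up8(uint64_t n)
{
    return (n + 7) & ~(uint64_t)7;
}

/* Address of stub number `index` (0-based), given the section start,
   the bytes emitted ahead of the first stub, and the unrounded sizes
   of the stubs.  Hypothetical helper, for illustration only. */
static uint64_t stub_address(uint64_t section_start, uint64_t prefix_bytes,
                             const uint64_t *raw_sizes, size_t index)
{
    assert(raw_sizes != NULL || index == 0);
    uint64_t addr = section_start + prefix_bytes;
    for (size_t i = 0; i < index; i++)
        addr += round_up8(raw_sizes[i]);
    return addr;
}
```

With a section start of 0x2020 and raw stub sizes of 12, 24 and 12 bytes
(numbers matching the ld-aarch64/farcall-back.d testcase), a 4-byte prefix
puts the second stub at 0x2034, which is 4 mod 8.  An 8-byte prefix would
put it at 0x2038.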

You can see the problem in the linker testcase
ld/testsuite/ld-aarch64/farcall-back.d which has
0000000000002034 <__bar3_veneer>:
    2034:       58000090        ldr     x16, 2044 <__bar3_veneer\+0x10>
    2038:       10000011        adr     x17, 2038 <__bar3_veneer\+0x4>
    203c:       8b110210        add     x16, x16, x17
    2040:       d61f0200        br      x16
    2044:       ffffffd8        .word   0xffffffd8
    2048:       00000000        .word   0x00000000
and you can see that the first ldr is loading unaligned data from 2044.
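
The load address can be checked by decoding the ldr word by hand.  A minimal
sketch, assuming the standard A64 LDR (literal) encoding for a 64-bit load
(top byte 0x58, a signed 19-bit word offset in bits 23:5); the function name
ldr_literal_target is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the address read by an A64 64-bit "ldr Xt, <label>" located
   at address `pc`.  Hypothetical helper, for illustration only. */
static uint64_t ldr_literal_target(uint32_t insn, uint64_t pc)
{
    /* 64-bit LDR (literal) has 0x58 in bits 31:24. */
    assert((insn & 0xff000000u) == 0x58000000u);

    /* imm19 is a signed word offset relative to the instruction. */
    int64_t imm19 = (int64_t)((insn >> 5) & 0x7ffffu);
    if (imm19 & 0x40000)
        imm19 -= 0x80000;

    return pc + (uint64_t)(imm19 * 4);
}
```

For the word 58000090 at 0x2034 this gives 0x2044, which is 4 mod 8.  The
same word placed at 0x2038 would load from 0x2048, which is 8-byte aligned.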

One way to fix this is to add a nop after the branch, and return
section alignment to 8 bytes.  This is fairly simple, though it
requires a lot of annoying testsuite changes.
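
As a rough illustration of this fix (a standalone sketch, not the patch
itself): the branch word is 0x14000000 | (size >> 2), and the nop is the
word 0xd503201f.  The helpers encode_b, put_le32 and write_stub_prefix are
hypothetical names; put_le32 stands in for bfd_putl32.

```c
#include <assert.h>
#include <stdint.h>

/* A64 NOP, the word disassembled as "nop" in the test dumps. */
#define A64_NOP 0xd503201fu

/* Encode an unconditional A64 branch "b" to pc + byte_offset. */
static uint32_t encode_b(int64_t byte_offset)
{
    assert((byte_offset & 3) == 0);
    assert(byte_offset >= -(INT64_C(1) << 27)
           && byte_offset < (INT64_C(1) << 27));
    return 0x14000000u
           | (uint32_t)(((uint64_t)byte_offset >> 2) & 0x3ffffffu);
}

/* Store a 32-bit word in little-endian byte order. */
static void put_le32(uint32_t value, unsigned char *p)
{
    p[0] = (unsigned char)(value & 0xff);
    p[1] = (unsigned char)((value >> 8) & 0xff);
    p[2] = (unsigned char)((value >> 16) & 0xff);
    p[3] = (unsigned char)((value >> 24) & 0xff);
}

/* Write a branch over the whole stub section followed by a nop, so the
   first stub starts 8 bytes in.  Returns the number of bytes written. */
static unsigned write_stub_prefix(unsigned char *contents,
                                  uint64_t section_size)
{
    put_le32(encode_b((int64_t)section_size), contents);
    put_le32(A64_NOP, contents + 4);
    return 8;
}
```

For a 0x20-byte stub section, encode_b gives 14000008, which is the branch
word the adjusted farcall-b.d expects at 1008.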

I'm concerned that this might be reintroducing the problem that Marcus
was trying to fix though, as now we occasionally end up with 4 bytes of
zero padding around the stub sections.  I tried adding a
bfd_arch_aarch64_nop_fill function, but apparently that only works
inside sections, not between sections.  It isn't clear why we need the
branches around the stub sections though.  It isn't normal to expect
code to fall through the bottom of a section except for
ctor/dtor/init/fini sections, but a stub section should not appear in
the middle of one of those, and if it did, it might be best to use
init_array and fini_array instead, as these are better anyways.

The attached patch implements the solution of adding a nop after the
branch.  It passes a build and make check on an aarch64-linux-gnu
system.

Jim

Comments

Jim Wilson June 3, 2016, 6:56 a.m. | #1
On Thu, Jun 2, 2016 at 10:43 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Thu, Jun 2, 2016 at 10:16 PM, Jim Wilson <jim.wilson@linaro.org> wrote:
>> I got a bug report from Qualcomm that says if you set the A bit in the
>> SCTLR register, to trap on unaligned accesses, their code fails,
>> because the toolchain itself is emitting unaligned data accesses.
>
> That should not matter.  Setting bit A on the SCTLR register is not a
> valid thing to do for A class processors.
> Not that it is a problem in general to align the data after all but it
> should not matter in general.


I think what you mean here is that it is wrong to permanently set bit
A in SCTLR on a multiuser multiprogrammed server running Linux.  But
on a single user device, being used for software development and
testing, I don't see why it is wrong to set the A bit in SCTLR,
perhaps temporarily, to verify that code has been correctly written
and compiled to avoid unaligned accesses which are slower than aligned
accesses.  This should work, and can work, if the toolchain stops
emitting unaligned data in long branch stubs.

Jim

Jim Wilson June 10, 2016, 4:30 p.m. | #2
ping

for the attachment, see
https://sourceware.org/ml/binutils/2016-06/msg00021.html


Patch

2016-06-02  Jim Wilson  <jim.wilson@linaro.org>

	bfd/
	* elfnn-aarch64.c (_bfd_aarch64_resize_stubs): Add 8 bytes for branch
	and nop instead of 4.
	(elfNN_aarch64_build_stubs): Add nop after branch.  Increase size by
	8 instead of 4.

	ld/
	* emultempl/aarch64elf.em (elf${ELFSIZE}_aarch64_add_stub_section):
	Give stub_sec 8 byte alignment.
	* testsuite/ld-aarch64/erratum835769.d: Adjust for added nop.
	* testsuite/ld-aarch64/erratum843419.d: Likewise.
	* testsuite/ld-aarch64/farcall-b-defsym.d: Likewise.
	* testsuite/ld-aarch64/farcall-b-none-function.d: Likewise.
	* testsuite/ld-aarch64/farcall-b-plt.d: Likewise.
	* testsuite/ld-aarch64/farcall-b-section.d: Likewise.
	* testsuite/ld-aarch64/farcall-b.d: Likewise.
	* testsuite/ld-aarch64/farcall-back.d: Likewise.
	* testsuite/ld-aarch64/farcall-bl-defsym.d: Likewise.
	* testsuite/ld-aarch64/farcall-bl-none-function.d: Likewise.
	* testsuite/ld-aarch64/farcall-bl-plt.d: Likewise.
	* testsuite/ld-aarch64/farcall-bl-section.d: Likewise.
	* testsuite/ld-aarch64/farcall-bl.d: Likewise.

diff --git a/bfd/elfnn-aarch64.c b/bfd/elfnn-aarch64.c
index 774364a..9d5e1a8 100644
--- a/bfd/elfnn-aarch64.c
+++ b/bfd/elfnn-aarch64.c
@@ -3680,8 +3680,10 @@  _bfd_aarch64_resize_stubs (struct elf_aarch64_link_hash_table *htab)
       if (!strstr (section->name, STUB_SUFFIX))
 	continue;
 
+      /* Add space for a branch.  Add 8 bytes to keep section 8 byte aligned,
+	 as long branch stubs contain a 64-bit address.  */
       if (section->size)
-	section->size += 4;
+	section->size += 8;
 
       /* Ensure all stub sections have a size which is a multiple of
 	 4096.  This is important in order to ensure that the insertion
@@ -4237,8 +4239,11 @@  elfNN_aarch64_build_stubs (struct bfd_link_info *info)
 	return FALSE;
       stub_sec->size = 0;
 
+      /* Add a branch around the stub section, and a nop, to keep it 8 byte
+	 aligned, as long branch stubs contain a 64-bit address.  */
       bfd_putl32 (0x14000000 | (size >> 2), stub_sec->contents);
-      stub_sec->size += 4;
+      bfd_putl32 (INSN_NOP, stub_sec->contents+4);
+      stub_sec->size += 8;
     }
 
   /* Build the stubs as directed by the stub hash table.  */
diff --git a/ld/emultempl/aarch64elf.em b/ld/emultempl/aarch64elf.em
index a17553a..900f76a 100644
--- a/ld/emultempl/aarch64elf.em
+++ b/ld/emultempl/aarch64elf.em
@@ -172,7 +172,9 @@  elf${ELFSIZE}_aarch64_add_stub_section (const char *stub_sec_name,
   if (stub_sec == NULL)
     goto err_ret;
 
-  bfd_set_section_alignment (stub_file->the_bfd, stub_sec, 2);
+  /* Long branch stubs contain a 64-bit address, so the section requires
+     8 byte alignment.  */
+  bfd_set_section_alignment (stub_file->the_bfd, stub_sec, 3);
 
   output_section = input_section->output_section;
   os = lang_output_section_get (output_section);
diff --git a/ld/testsuite/ld-aarch64/erratum835769.d b/ld/testsuite/ld-aarch64/erratum835769.d
index f3b0ed4..4d6623e 100644
--- a/ld/testsuite/ld-aarch64/erratum835769.d
+++ b/ld/testsuite/ld-aarch64/erratum835769.d
@@ -33,7 +33,9 @@  Disassembly of section .text:
 [ \t0-9a-f]+:[ \t]+aa0503e0[ \t]+mov[ \t]+x0, x5
 [ \t0-9a-f]+:[ \t]+d65f03c0[ \t]+ret
 
-[ \t0-9a-f]+:[ \t]+14000007[ \t]+b[ \t]+[0-9a-f]+ <__erratum_835769_veneer_0\+0x8>
+[ \t0-9a-f]+:[ \t]+00000000[ \t]+\.inst[ \t]+0x00000000 ; undefined
+[ \t0-9a-f]+:[ \t]+14000008[ \t]+b[ \t]+[0-9a-f]+ <__erratum_835769_veneer_0\+0x8>
+[ \t0-9a-f]+:[ \t]+d503201f[ \t]+nop
 [0-9a-f]+ <__erratum_835769_veneer_2>:
 [ \t0-9a-f]+:[ \t]+9b031885[ \t]+madd[ \t]+x5, x4, x3, x6
 [ \t0-9a-f]+:[ \t0-9a-z]+[ \t]+b[ \t]+[0-9a-f]+ <a7str\+0x[0-9a-f]+>
diff --git a/ld/testsuite/ld-aarch64/erratum843419.d b/ld/testsuite/ld-aarch64/erratum843419.d
index 4be8f9e..41e9ec7 100644
--- a/ld/testsuite/ld-aarch64/erratum843419.d
+++ b/ld/testsuite/ld-aarch64/erratum843419.d
@@ -17,23 +17,24 @@  Disassembly of section .e843419:
     20000ff8:	90100000 	adrp	x0, 40000000 <[_a-zA-z0-9]+>
     20000ffc:	f800c007 	stur	x7, \[x0,#12\]
     20001000:	d2800128 	mov	x8, #0x9                   	// #9
-    20001004:	14000008 	b	20001024 <e843419@0002_00000013_1004>
+    20001004:	14000009 	b	20001028 <e843419@0002_00000013_1004>
     20001008:	8b050020 	add	x0, x1, x5
     2000100c:	b9400fe7 	ldr	w7, \[sp,#12\]
     20001010:	0b0700e0 	add	w0, w7, w7
     20001014:	910043ff 	add	sp, sp, #0x10
-    20001018:	14000005 	b	2000102c <__e835769_veneer>
+    20001018:	14000006 	b	20001030 <__e835769_veneer>
     2000101c:	d65f03c0 	ret
-    20001020:	14000400 	b	20002020 <__e835769_veneer\+0xff4>
+    20001020:	14000400 	b	20002020 <__e835769_veneer\+0xff0>
+    20001024:	d503201f 	nop
 
-0000000020001024 <e843419@0002_00000013_1004>:
-    20001024:	f9000008 	str	x8, \[x0\]
-    20001028:	17fffff8 	b	20001008 <e843419_1\+0x10>
+0000000020001028 <e843419@0002_00000013_1004>:
+    20001028:	f9000008 	str	x8, \[x0\]
+    2000102c:	17fffff7 	b	20001008 <e843419_1\+0x10>
 
-000000002000102c <__e835769_veneer>:
-    2000102c:	f0f17ff0 	adrp	x16, 3000000 <e835769>
-    20001030:	91000210 	add	x16, x16, #0x0
-    20001034:	d61f0200 	br	x16
+0000000020001030 <__e835769_veneer>:
+    20001030:	f0f17ff0 	adrp	x16, 3000000 <e835769>
+    20001034:	91000210 	add	x16, x16, #0x0
+    20001038:	d61f0200 	br	x16
 	...
 
 Disassembly of section .e835769:
@@ -42,14 +43,15 @@  Disassembly of section .e835769:
  3000000:	b8408c87 	ldr	w7, \[x4,#8\]!
  3000004:	1b017c06 	mul	w6, w0, w1
  3000008:	f9400084 	ldr	x4, \[x4\]
- 300000c:	14000004 	b	300001c <__erratum_835769_veneer_0>
+ 300000c:	14000005 	b	3000020 <__erratum_835769_veneer_0>
  3000010:	aa0503e0 	mov	x0, x5
  3000014:	d65f03c0 	ret
- 3000018:	14000400 	b	3001018 <__erratum_835769_veneer_0\+0xffc>
+ 3000018:	14000400 	b	3001018 <__erratum_835769_veneer_0\+0xff8>
+ 300001c:	d503201f 	nop
 
-000000000300001c <__erratum_835769_veneer_0>:
- 300001c:	9b031845 	madd	x5, x2, x3, x6
- 3000020:	17fffffc 	b	3000010 <e835769\+0x10>
+0000000003000020 <__erratum_835769_veneer_0>:
+ 3000020:	9b031845 	madd	x5, x2, x3, x6
+ 3000024:	17fffffb 	b	3000010 <e835769\+0x10>
 	...
 
 Disassembly of section .text:
@@ -58,12 +60,14 @@  Disassembly of section .text:
   400000:	d10043ff 	sub	sp, sp, #0x10
   400004:	d28001a7 	mov	x7, #0xd                   	// #13
   400008:	b9000fe7 	str	w7, \[sp,#12\]
-  40000c:	14000003 	b	400018 <__e843419_veneer>
+  40000c:	14000005 	b	400020 <__e843419_veneer>
   400010:	d65f03c0 	ret
-  400014:	14000400 	b	401014 <__e843419_veneer\+0xffc>
+  400014:	00000000 	.inst	0x00000000 ; undefined
+  400018:	14000400 	b	401018 <__e843419_veneer\+0xff8>
+  40001c:	d503201f 	nop
 
-0000000000400018 <__e843419_veneer>:
-  400018:	900fe010 	adrp	x16, 20000000 <e843419>
-  40001c:	91000210 	add	x16, x16, #0x0
-  400020:	d61f0200 	br	x16
+0000000000400020 <__e843419_veneer>:
+  400020:	900fe010 	adrp	x16, 20000000 <e843419>
+  400024:	91000210 	add	x16, x16, #0x0
+  400028:	d61f0200 	br	x16
 	...
diff --git a/ld/testsuite/ld-aarch64/farcall-b-defsym.d b/ld/testsuite/ld-aarch64/farcall-b-defsym.d
index c3e1602..cc83014 100644
--- a/ld/testsuite/ld-aarch64/farcall-b-defsym.d
+++ b/ld/testsuite/ld-aarch64/farcall-b-defsym.d
@@ -8,11 +8,12 @@ 
 Disassembly of section .text:
 
 0000000000001000 <_start>:
- +1000:	14000003 	b	100c <__bar_veneer>
+ +1000:	14000004 	b	1010 <__bar_veneer>
  +1004:	d65f03c0 	ret
-[ \t]+1008:[ \t]+14000007[ \t]+b[ \t]+1024 <__bar_veneer\+0x18>
-000000000000100c <__bar_veneer>:
-    100c:	90040010 	adrp	x16, 8001000 <bar>
-    1010:	91000210 	add	x16, x16, #0x0
-    1014:	d61f0200 	br	x16
+[ \t]+1008:[ \t]+14000008[ \t]+b[ \t]+1028 <__bar_veneer\+0x18>
+[ \t]+100c:[ \t]+d503201f[ \t]+nop
+0000000000001010 <__bar_veneer>:
+    1010:	90040010 	adrp	x16, 8001000 <bar>
+    1014:	91000210 	add	x16, x16, #0x0
+    1018:	d61f0200 	br	x16
 	...
diff --git a/ld/testsuite/ld-aarch64/farcall-b-none-function.d b/ld/testsuite/ld-aarch64/farcall-b-none-function.d
index ba2981f..e06936c 100644
--- a/ld/testsuite/ld-aarch64/farcall-b-none-function.d
+++ b/ld/testsuite/ld-aarch64/farcall-b-none-function.d
@@ -8,14 +8,15 @@ 
 Disassembly of section .text:
 
 .* <_start>:
-    1000:	14000003 	b	100c <__bar_veneer>
+    1000:	14000004 	b	1010 <__bar_veneer>
     1004:	d65f03c0 	ret
-    1008:	14000007 	b	1024 <__bar_veneer\+0x18>
+    1008:	14000008 	b	1028 <__bar_veneer\+0x18>
+    100c:	d503201f 	nop
 
 .* <__bar_veneer>:
-    100c:	90040010 	adrp	x16, 8001000 <bar>
-    1010:	91000210 	add	x16, x16, #0x0
-    1014:	d61f0200 	br	x16
+    1010:	90040010 	adrp	x16, 8001000 <bar>
+    1014:	91000210 	add	x16, x16, #0x0
+    1018:	d61f0200 	br	x16
 	...
 
 Disassembly of section .foo:
diff --git a/ld/testsuite/ld-aarch64/farcall-b-plt.d b/ld/testsuite/ld-aarch64/farcall-b-plt.d
index 49c82eb..c0c7750 100644
--- a/ld/testsuite/ld-aarch64/farcall-b-plt.d
+++ b/ld/testsuite/ld-aarch64/farcall-b-plt.d
@@ -29,7 +29,9 @@  Disassembly of section .text:
 	...
 .*:	.* 	b	.* <__foo_veneer>
 .*:	d65f03c0 	ret
+.*:	.* 	.inst	0x00000000 ; undefined
 .*:	.* 	b	.* <__foo_veneer\+.*>
+.*:	.* 	nop
 
 .* <__foo_veneer>:
 .*:	.* 	adrp	x16, 0 <foo@plt.*>
diff --git a/ld/testsuite/ld-aarch64/farcall-b-section.d b/ld/testsuite/ld-aarch64/farcall-b-section.d
index 4745c0f..71e04f8 100644
--- a/ld/testsuite/ld-aarch64/farcall-b-section.d
+++ b/ld/testsuite/ld-aarch64/farcall-b-section.d
@@ -8,21 +8,23 @@ 
 Disassembly of section .text:
 
 .* <_start>:
-    1000:	14000008 	b	1020 <___veneer>
-    1004:	14000003 	b	1010 <___veneer>
+    1000:	1400000a 	b	1028 <___veneer>
+    1004:	14000005 	b	1018 <___veneer>
     1008:	d65f03c0 	ret
-    100c:	1400000d 	b	1040 <___veneer\+0x20>
+    100c:	00000000 	.inst	0x00000000 ; undefined
+    1010:	1400000e 	b	1048 <___veneer\+0x20>
+    1014:	d503201f 	nop
 
 .* <___veneer>:
-    1010:	90040010 	adrp	x16, 8001000 <bar>
-    1014:	91001210 	add	x16, x16, #0x4
-    1018:	d61f0200 	br	x16
-    101c:	00000000 	.inst	0x00000000 ; undefined
+    1018:	90040010 	adrp	x16, 8001000 <bar>
+    101c:	91001210 	add	x16, x16, #0x4
+    1020:	d61f0200 	br	x16
+    1024:	00000000 	.inst	0x00000000 ; undefined
 
 .* <___veneer>:
-    1020:	90040010 	adrp	x16, 8001000 <bar>
-    1024:	91000210 	add	x16, x16, #0x0
-    1028:	d61f0200 	br	x16
+    1028:	90040010 	adrp	x16, 8001000 <bar>
+    102c:	91000210 	add	x16, x16, #0x0
+    1030:	d61f0200 	br	x16
 	...
 
 Disassembly of section .foo:
diff --git a/ld/testsuite/ld-aarch64/farcall-b.d b/ld/testsuite/ld-aarch64/farcall-b.d
index c1a0c6f..001c640 100644
--- a/ld/testsuite/ld-aarch64/farcall-b.d
+++ b/ld/testsuite/ld-aarch64/farcall-b.d
@@ -8,13 +8,14 @@ 
 Disassembly of section .text:
 
 0000000000001000 <_start>:
- +1000:	14000003 	b	100c <__bar_veneer>
+ +1000:	14000004 	b	1010 <__bar_veneer>
  +1004:	d65f03c0 	ret
-[ \t]+1008:[ \t]+14000007[ \t]+b[ \t]+1024 <__bar_veneer\+0x18>
-000000000000100c <__bar_veneer>:
-    100c:	90040010 	adrp	x16, 8001000 <bar>
-    1010:	91000210 	add	x16, x16, #0x0
-    1014:	d61f0200 	br	x16
+[ \t]+1008:[ \t]+14000008[ \t]+b[ \t]+1028 <__bar_veneer\+0x18>
+[ \t]+100c:[ \t]+d503201f[ \t]+nop
+0000000000001010 <__bar_veneer>:
+    1010:	90040010 	adrp	x16, 8001000 <bar>
+    1014:	91000210 	add	x16, x16, #0x0
+    1018:	d61f0200 	br	x16
 	...
 
 Disassembly of section .foo:
diff --git a/ld/testsuite/ld-aarch64/farcall-back.d b/ld/testsuite/ld-aarch64/farcall-back.d
index 8b22360..fcd0a29 100644
--- a/ld/testsuite/ld-aarch64/farcall-back.d
+++ b/ld/testsuite/ld-aarch64/farcall-back.d
@@ -9,66 +9,68 @@ 
 Disassembly of section .text:
 
 0000000000001000 <_start>:
-    1000:	14000413 	b	204c <__bar1_veneer>
-    1004:	94000412 	bl	204c <__bar1_veneer>
-    1008:	14000407 	b	2024 <__bar2_veneer>
-    100c:	94000406 	bl	2024 <__bar2_veneer>
-    1010:	14000409 	b	2034 <__bar3_veneer>
-    1014:	94000408 	bl	2034 <__bar3_veneer>
+    1000:	14000414 	b	2050 <__bar1_veneer>
+    1004:	94000413 	bl	2050 <__bar1_veneer>
+    1008:	14000408 	b	2028 <__bar2_veneer>
+    100c:	94000407 	bl	2028 <__bar2_veneer>
+    1010:	1400040a 	b	2038 <__bar3_veneer>
+    1014:	94000409 	bl	2038 <__bar3_veneer>
     1018:	d65f03c0 	ret
 	...
 
 000000000000201c <_back>:
     201c:	d65f03c0 	ret
 
-[ \t]+2020:[ \t]+14000013[ \t]+b[ \t]+206c <__bar1_veneer\+0x20>
-0000000000002024 <__bar2_veneer>:
-    2024:	f07ffff0 	adrp	x16, 100001000 <bar1\+0x1000>
-    2028:	91002210 	add	x16, x16, #0x8
-    202c:	d61f0200 	br	x16
-    2030:	00000000 	.inst	0x00000000 ; undefined
+[ \t]+2020:[ \t]+14000014[ \t]+b[ \t]+2070 <__bar1_veneer\+0x20>
+[ \t]+2024:[ \t]+d503201f[ \t]+nop
+0000000000002028 <__bar2_veneer>:
+    2028:	f07ffff0 	adrp	x16, 100001000 <bar1\+0x1000>
+    202c:	91002210 	add	x16, x16, #0x8
+    2030:	d61f0200 	br	x16
+    2034:	00000000 	.inst	0x00000000 ; undefined
 
-0000000000002034 <__bar3_veneer>:
-    2034:	58000090 	ldr	x16, 2044 <__bar3_veneer\+0x10>
-    2038:	10000011 	adr	x17, 2038 <__bar3_veneer\+0x4>
-    203c:	8b110210 	add	x16, x16, x17
-    2040:	d61f0200 	br	x16
-    2044:	ffffffd8 	.word	0xffffffd8
-    2048:	00000000 	.word	0x00000000
+0000000000002038 <__bar3_veneer>:
+    2038:	58000090 	ldr	x16, 2048 <__bar3_veneer\+0x10>
+    203c:	10000011 	adr	x17, 203c <__bar3_veneer\+0x4>
+    2040:	8b110210 	add	x16, x16, x17
+    2044:	d61f0200 	br	x16
+    2048:	ffffffd4 	.word	0xffffffd4
+    204c:	00000000 	.word	0x00000000
 
-000000000000204c <__bar1_veneer>:
-    204c:	d07ffff0 	adrp	x16, 100000000 <bar1>
-    2050:	91000210 	add	x16, x16, #0x0
-    2054:	d61f0200 	br	x16
+0000000000002050 <__bar1_veneer>:
+    2050:	d07ffff0 	adrp	x16, 100000000 <bar1>
+    2054:	91000210 	add	x16, x16, #0x0
+    2058:	d61f0200 	br	x16
 	...
 
 Disassembly of section .foo:
 
 0000000100000000 <bar1>:
    100000000:	d65f03c0 	ret
-   100000004:	14000806 	b	10000201c <___start_veneer>
+   100000004:	14000807 	b	100002020 <___start_veneer>
 	...
 
 0000000100001008 <bar2>:
    100001008:	d65f03c0 	ret
-   10000100c:	14000404 	b	10000201c <___start_veneer>
+   10000100c:	14000405 	b	100002020 <___start_veneer>
 	...
 
 0000000100002010 <bar3>:
    100002010:	d65f03c0 	ret
-   100002014:	14000008 	b	100002034 <___back_veneer>
+   100002014:	14000009 	b	100002038 <___back_veneer>
 
-[ \t]+100002018:[ \t]+1400000d[ \t]+b[ \t]+10000204c <___back_veneer\+0x18>
-000000010000201c <___start_veneer>:
-   10000201c:	58000090 	ldr	x16, 10000202c <___start_veneer\+0x10>
-   100002020:	10000011 	adr	x17, 100002020 <___start_veneer\+0x4>
-   100002024:	8b110210 	add	x16, x16, x17
-   100002028:	d61f0200 	br	x16
-   10000202c:	ffffefe0 	.word	0xffffefe0
-   100002030:	fffffffe 	.word	0xfffffffe
+[ \t]+100002018:[ \t]+1400000e[ \t]+b[ \t]+100002050 <___back_veneer\+0x18>
+[ \t]+10000201c:[ \t]+d503201f[ \t]+nop
+0000000100002020 <___start_veneer>:
+   100002020:	58000090 	ldr	x16, 100002030 <___start_veneer\+0x10>
+   100002024:	10000011 	adr	x17, 100002024 <___start_veneer\+0x4>
+   100002028:	8b110210 	add	x16, x16, x17
+   10000202c:	d61f0200 	br	x16
+   100002030:	ffffefdc 	.word	0xffffefdc
+   100002034:	fffffffe 	.word	0xfffffffe
 
-0000000100002034 <___back_veneer>:
-   100002034:	90800010 	adrp	x16, 2000 <_start\+0x1000>
-   100002038:	91007210 	add	x16, x16, #0x1c
-   10000203c:	d61f0200 	br	x16
+0000000100002038 <___back_veneer>:
+   100002038:	90800010 	adrp	x16, 2000 <_start\+0x1000>
+   10000203c:	91007210 	add	x16, x16, #0x1c
+   100002040:	d61f0200 	br	x16
 	...
diff --git a/ld/testsuite/ld-aarch64/farcall-bl-defsym.d b/ld/testsuite/ld-aarch64/farcall-bl-defsym.d
index 68332bf..ce16fa1 100644
--- a/ld/testsuite/ld-aarch64/farcall-bl-defsym.d
+++ b/ld/testsuite/ld-aarch64/farcall-bl-defsym.d
@@ -8,11 +8,12 @@ 
 Disassembly of section .text:
 
 0000000000001000 <_start>:
- +1000:	94000003 	bl	100c <__bar_veneer>
+ +1000:	94000004 	bl	1010 <__bar_veneer>
  +1004:	d65f03c0 	ret
-[ \t]+1008:[ \t]+14000007[ \t]+b[ \t]+1024 <__bar_veneer\+0x18>
-000000000000100c <__bar_veneer>:
-    100c:	90040010 	adrp	x16, 8001000 <bar>
-    1010:	91000210 	add	x16, x16, #0x0
-    1014:	d61f0200 	br	x16
+[ \t]+1008:[ \t]+14000008[ \t]+b[ \t]+1028 <__bar_veneer\+0x18>
+[ \t]+100c:[ \t]+d503201f[ \t]+nop
+0000000000001010 <__bar_veneer>:
+    1010:	90040010 	adrp	x16, 8001000 <bar>
+    1014:	91000210 	add	x16, x16, #0x0
+    1018:	d61f0200 	br	x16
 	...
diff --git a/ld/testsuite/ld-aarch64/farcall-bl-none-function.d b/ld/testsuite/ld-aarch64/farcall-bl-none-function.d
index b6a4dda..4ab9c7e 100644
--- a/ld/testsuite/ld-aarch64/farcall-bl-none-function.d
+++ b/ld/testsuite/ld-aarch64/farcall-bl-none-function.d
@@ -8,14 +8,15 @@ 
 Disassembly of section .text:
 
 .* <_start>:
-    1000:	94000003 	bl	100c <__bar_veneer>
+    1000:	94000004 	bl	1010 <__bar_veneer>
     1004:	d65f03c0 	ret
-    1008:	14000007 	b	1024 <__bar_veneer\+0x18>
+    1008:	14000008 	b	1028 <__bar_veneer\+0x18>
+    100c:	d503201f 	nop
 
 .* <__bar_veneer>:
-    100c:	90040010 	adrp	x16, 8001000 <bar>
-    1010:	91000210 	add	x16, x16, #0x0
-    1014:	d61f0200 	br	x16
+    1010:	90040010 	adrp	x16, 8001000 <bar>
+    1014:	91000210 	add	x16, x16, #0x0
+    1018:	d61f0200 	br	x16
 	...
 
 Disassembly of section .foo:
diff --git a/ld/testsuite/ld-aarch64/farcall-bl-plt.d b/ld/testsuite/ld-aarch64/farcall-bl-plt.d
index 457a4fa..6d71459 100644
--- a/ld/testsuite/ld-aarch64/farcall-bl-plt.d
+++ b/ld/testsuite/ld-aarch64/farcall-bl-plt.d
@@ -29,7 +29,9 @@  Disassembly of section .text:
 	...
 .*:	.* 	bl	.* <__foo_veneer>
 .*:	d65f03c0 	ret
+.*:	.* 	.inst	0x00000000 ; undefined
 .*:	.* 	b	.* <__foo_veneer\+.*>
+.*:	.* 	nop
 
 .* <__foo_veneer>:
 .*:	.* 	adrp	x16, 0 <foo@plt.*>
diff --git a/ld/testsuite/ld-aarch64/farcall-bl-section.d b/ld/testsuite/ld-aarch64/farcall-bl-section.d
index 2bd4f85..09e84ef 100644
--- a/ld/testsuite/ld-aarch64/farcall-bl-section.d
+++ b/ld/testsuite/ld-aarch64/farcall-bl-section.d
@@ -8,21 +8,23 @@ 
 Disassembly of section .text:
 
 .* <_start>:
-    1000:	94000008 	bl	1020 <___veneer>
-    1004:	94000003 	bl	1010 <___veneer>
+    1000:	9400000a 	bl	1028 <___veneer>
+    1004:	94000005 	bl	1018 <___veneer>
     1008:	d65f03c0 	ret
-    100c:	1400000d 	b	1040 <___veneer\+0x20>
+    100c:	00000000 	.inst	0x00000000 ; undefined
+    1010:	1400000e 	b	1048 <___veneer\+0x20>
+    1014:	d503201f 	nop
 
 .* <___veneer>:
-    1010:	90040010 	adrp	x16, 8001000 <bar>
-    1014:	91001210 	add	x16, x16, #0x4
-    1018:	d61f0200 	br	x16
-    101c:	00000000 	.inst	0x00000000 ; undefined
+    1018:	90040010 	adrp	x16, 8001000 <bar>
+    101c:	91001210 	add	x16, x16, #0x4
+    1020:	d61f0200 	br	x16
+    1024:	00000000 	.inst	0x00000000 ; undefined
 
 .* <___veneer>:
-    1020:	90040010 	adrp	x16, 8001000 <bar>
-    1024:	91000210 	add	x16, x16, #0x0
-    1028:	d61f0200 	br	x16
+    1028:	90040010 	adrp	x16, 8001000 <bar>
+    102c:	91000210 	add	x16, x16, #0x0
+    1030:	d61f0200 	br	x16
 	...
 
 Disassembly of section .foo:
diff --git a/ld/testsuite/ld-aarch64/farcall-bl.d b/ld/testsuite/ld-aarch64/farcall-bl.d
index 78e94dc..d29334b 100644
--- a/ld/testsuite/ld-aarch64/farcall-bl.d
+++ b/ld/testsuite/ld-aarch64/farcall-bl.d
@@ -8,13 +8,14 @@ 
 Disassembly of section .text:
 
 0000000000001000 <_start>:
- +1000:	94000003 	bl	100c <__bar_veneer>
+ +1000:	94000004 	bl	1010 <__bar_veneer>
  +1004:	d65f03c0 	ret
-[ \t]+1008:[ \t]+14000007[ \t]+b[ \t]+1024 <__bar_veneer\+0x18>
-000000000000100c <__bar_veneer>:
-    100c:	90040010 	adrp	x16, 8001000 <bar>
-    1010:	91000210 	add	x16, x16, #0x0
-    1014:	d61f0200 	br	x16
+[ \t]+1008:[ \t]+14000008[ \t]+b[ \t]+1028 <__bar_veneer\+0x18>
+[ \t]+100c:[ \t]+d503201f[ \t]+nop
+0000000000001010 <__bar_veneer>:
+    1010:	90040010 	adrp	x16, 8001000 <bar>
+    1014:	91000210 	add	x16, x16, #0x0
+    1018:	d61f0200 	br	x16
 	...
 
 Disassembly of section .foo: