diff mbox

[edk2,1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it

Message ID 1469618762-7648-1-git-send-email-ard.biesheuvel@linaro.org
State Accepted
Commit 026a82abf0bd6268d32f4559dbede00715264f74
Headers show

Commit Message

Ard Biesheuvel July 27, 2016, 11:26 a.m. UTC
The ADRP instruction in the AArch64 ISA requires the link time and load
time offsets of a binary to be equal modulo 4 KB. The reason is that this
instruction always produces a multiple of 4 KB, and relies on a subsequent
ADD or LDR instruction to set the offset into the page. The resulting
symbol reference only produces the correct value if the symbol in question
resides at that exact offset into the page, and so loading the binary at
arbitrary offsets is not possible.

Due to the various levels of padding when packing FVs into FVs into FDs,
this alignment is very costly for XIP code, and so we would like to relax
this alignment requirement if possible.

Given that symbols that are sufficiently close (within 1 MB) of the
reference can also be reached using an ADR instruction which does not
suffer from this alignment issue, let's replace ADRP instructions with ADR
after linking if the offset can be encoded in this instruction's immediate
field. Note that this only makes sense if the section alignment is < 4 KB.
Otherwise, replacing the ADRP has no benefit, considering that the
subsequent ADD or LDR instruction is retained, and that micro-architectures
are more likely to be optimized for ADRP/ADD pairs (i.e., via micro op
fusing) than for ADR/ADD pairs, which are non-typical.

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---
 BaseTools/Source/C/GenFw/Elf64Convert.c | 51 ++++++++++++++++----
 1 file changed, 42 insertions(+), 9 deletions(-)

-- 
2.7.4

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Comments

Ard Biesheuvel Aug. 1, 2016, 11:53 a.m. UTC | #1
On 27 July 2016 at 13:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> The ADRP instruction in the AArch64 ISA requires the link time and load

> time offsets of a binary to be equal modulo 4 KB. The reason is that this

> instruction always produces a multiple of 4 KB, and relies on a subsequent

> ADD or LDR instruction to set the offset into the page. The resulting

> symbol reference only produces the correct value if the symbol in question

> resides at that exact offset into the page, and so loading the binary at

> arbitrary offsets is not possible.

>

> Due to the various levels of padding when packing FVs into FVs into FDs,

> this alignment is very costly for XIP code, and so we would like to relax

> this alignment requirement if possible.

>

> Given that symbols that are sufficiently close (within 1 MB) of the

> reference can also be reached using an ADR instruction which does not

> suffer from this alignment issue, let's replace ADRP instructions with ADR

> after linking if the offset can be encoded in this instruction's immediate

> field. Note that this only makes sense if the section alignment is < 4 KB.

> Otherwise, replacing the ADRP has no benefit, considering that the

> subsequent ADD or LDR instruction is retained, and that micro-architectures

> are more likely to be optimized for ADRP/ADD pairs (i.e., via micro op

> fusing) than for ADR/ADD pairs, which are non-typical.

>

> Contributed-under: TianoCore Contribution Agreement 1.0

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>


@Liming, @Leif:

are there any objections to these patches? I know it is unfortunate
that we need to modify instructions as part of the ELF to PE/COFF
conversion, but it is very effective

ArmVirtQemu-AARCH64 built with CLANG35:

Before:

FVMAIN_COMPACT [41%Full] 2093056 total, 868416 used, 1224640 free
FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free

After:

FVMAIN_COMPACT [36%Full] 2093056 total, 768064 used, 1324992 free
FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free

For comparision, GCC49

FVMAIN_COMPACT [35%Full] 2093056 total, 749960 used, 1343096 free
FVMAIN [99%Full] 3929088 total, 3929032 used, 56 free

and GCC5 (with LTO)

FVMAIN_COMPACT [34%Full] 2093056 total, 732400 used, 1360656 free
FVMAIN [99%Full] 3730240 total, 3730216 used, 24 free

In other words, it turns CLANG35 from a pathetic outlier into
something usable :-)

Regards,
Ard.
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Leif Lindholm Aug. 1, 2016, 2:19 p.m. UTC | #2
Apologies, lost track of this one.

On Mon, Aug 01, 2016 at 01:53:09PM +0200, Ard Biesheuvel wrote:
> On 27 July 2016 at 13:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

> > The ADRP instruction in the AArch64 ISA requires the link time and load

> > time offsets of a binary to be equal modulo 4 KB. The reason is that this

> > instruction always produces a multiple of 4 KB, and relies on a subsequent

> > ADD or LDR instruction to set the offset into the page. The resulting

> > symbol reference only produces the correct value if the symbol in question

> > resides at that exact offset into the page, and so loading the binary at

> > arbitrary offsets is not possible.

> >

> > Due to the various levels of padding when packing FVs into FVs into FDs,

> > this alignment is very costly for XIP code, and so we would like to relax

> > this alignment requirement if possible.

> >

> > Given that symbols that are sufficiently close (within 1 MB) of the

> > reference can also be reached using an ADR instruction which does not

> > suffer from this alignment issue, let's replace ADRP instructions with ADR

> > after linking if the offset can be encoded in this instruction's immediate

> > field. Note that this only makes sense if the section alignment is < 4 KB.

> > Otherwise, replacing the ADRP has no benefit, considering that the

> > subsequent ADD or LDR instruction is retained, and that micro-architectures

> > are more likely to be optimized for ADRP/ADD pairs (i.e., via micro op

> > fusing) than for ADR/ADD pairs, which are non-typical.

> >

> > Contributed-under: TianoCore Contribution Agreement 1.0

> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> 

> @Liming, @Leif:

> 

> are there any objections to these patches? I know it is unfortunate

> that we need to modify instructions as part of the ELF to PE/COFF

> conversion, but it is very effective


It's absolutely horrid, but extremely useful.
For the series:
Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>


> ArmVirtQemu-AARCH64 built with CLANG35:

> 

> Before:

> 

> FVMAIN_COMPACT [41%Full] 2093056 total, 868416 used, 1224640 free

> FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free

> 

> After:

> 

> FVMAIN_COMPACT [36%Full] 2093056 total, 768064 used, 1324992 free

> FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free

> 

> For comparision, GCC49

> 

> FVMAIN_COMPACT [35%Full] 2093056 total, 749960 used, 1343096 free

> FVMAIN [99%Full] 3929088 total, 3929032 used, 56 free

> 

> and GCC5 (with LTO)

> 

> FVMAIN_COMPACT [34%Full] 2093056 total, 732400 used, 1360656 free

> FVMAIN [99%Full] 3730240 total, 3730216 used, 24 free

> 

> In other words, it turns CLANG35 from a pathetic outlier into

> something usable :-)

> 

> Regards,

> Ard.

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Ard Biesheuvel Aug. 2, 2016, 9:03 a.m. UTC | #3
On 1 August 2016 at 16:19, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> Apologies, lost track of this one.

>

> On Mon, Aug 01, 2016 at 01:53:09PM +0200, Ard Biesheuvel wrote:

>> On 27 July 2016 at 13:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

>> > The ADRP instruction in the AArch64 ISA requires the link time and load

>> > time offsets of a binary to be equal modulo 4 KB. The reason is that this

>> > instruction always produces a multiple of 4 KB, and relies on a subsequent

>> > ADD or LDR instruction to set the offset into the page. The resulting

>> > symbol reference only produces the correct value if the symbol in question

>> > resides at that exact offset into the page, and so loading the binary at

>> > arbitrary offsets is not possible.

>> >

>> > Due to the various levels of padding when packing FVs into FVs into FDs,

>> > this alignment is very costly for XIP code, and so we would like to relax

>> > this alignment requirement if possible.

>> >

>> > Given that symbols that are sufficiently close (within 1 MB) of the

>> > reference can also be reached using an ADR instruction which does not

>> > suffer from this alignment issue, let's replace ADRP instructions with ADR

>> > after linking if the offset can be encoded in this instruction's immediate

>> > field. Note that this only makes sense if the section alignment is < 4 KB.

>> > Otherwise, replacing the ADRP has no benefit, considering that the

>> > subsequent ADD or LDR instruction is retained, and that micro-architectures

>> > are more likely to be optimized for ADRP/ADD pairs (i.e., via micro op

>> > fusing) than for ADR/ADD pairs, which are non-typical.

>> >

>> > Contributed-under: TianoCore Contribution Agreement 1.0

>> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

>>

>> @Liming, @Leif:

>>

>> are there any objections to these patches? I know it is unfortunate

>> that we need to modify instructions as part of the ELF to PE/COFF

>> conversion, but it is very effective

>

> It's absolutely horrid, but extremely useful.

> For the series:

> Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>

>


Thanks

Committed as

026a82abf0bd BaseTools/GenFw AARCH64: convert ADRP to ADR instructions
if binary size allows it
b89919ee8f8c BaseTools AARCH64: override XIP module linker alignment to 32 bytes
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
diff mbox

Patch

diff --git a/BaseTools/Source/C/GenFw/Elf64Convert.c b/BaseTools/Source/C/GenFw/Elf64Convert.c
index 024a2a0d5357..944c94b8f8b4 100644
--- a/BaseTools/Source/C/GenFw/Elf64Convert.c
+++ b/BaseTools/Source/C/GenFw/Elf64Convert.c
@@ -806,26 +806,59 @@  WriteSections64 (
           switch (ELF_R_TYPE(Rel->r_info)) {
 
           case R_AARCH64_ADR_PREL_PG_HI21:
-          case R_AARCH64_ADD_ABS_LO12_NC:
-          case R_AARCH64_LDST8_ABS_LO12_NC:
-          case R_AARCH64_LDST16_ABS_LO12_NC:
-          case R_AARCH64_LDST32_ABS_LO12_NC:
-          case R_AARCH64_LDST64_ABS_LO12_NC:
-          case R_AARCH64_LDST128_ABS_LO12_NC:
             //
             // AArch64 PG_H21 relocations are typically paired with ABS_LO12
             // relocations, where a PC-relative reference with +/- 4 GB range is
             // split into a relative high part and an absolute low part. Since
             // the absolute low part represents the offset into a 4 KB page, we
+            // either have to convert the ADRP into an ADR instruction, or we
+            // need to use a section alignment of at least 4 KB, so that the
+            // binary appears at a correct offset at runtime. In any case, we
             // have to make sure that the 4 KB relative offsets of both the
             // section containing the reference as well as the section to which
             // it refers have not been changed during PE/COFF conversion (i.e.,
             // in ScanSections64() above).
             //
+            if (mCoffAlignment < 0x1000) {
+              //
+              // Attempt to convert the ADRP into an ADR instruction.
+              // This is only possible if the symbol is within +/- 1 MB.
+              //
+              INT64 Offset;
+
+              // Decode the ADRP instruction
+              Offset = (INT32)((*(UINT32 *)Targ & 0xffffe0) << 8);
+              Offset = (Offset << (6 - 5)) | ((*(UINT32 *)Targ & 0x60000000) >> (29 - 12));
+
+              //
+              // ADRP offset is relative to the previous page boundary,
+              // whereas ADR offset is relative to the instruction itself.
+              // So fix up the offset so it points to the page containing
+              // the symbol.
+              //
+              Offset -= (UINTN)(Targ - mCoffFile) & 0xfff;
+
+              if (Offset < -0x100000 || Offset > 0xfffff) {
+                Error (NULL, 0, 3000, "Invalid", "WriteSections64(): %s  due to its size (> 1 MB), this module requires 4 KB section alignment.",
+                  mInImageName);
+                break;
+              }
+
+              // Re-encode the offset as an ADR instruction
+              *(UINT32 *)Targ &= 0x1000001f;
+              *(UINT32 *)Targ |= ((Offset & 0x1ffffc) << (5 - 2)) | ((Offset & 0x3) << 29);
+            }
+            /* fall through */
+
+          case R_AARCH64_ADD_ABS_LO12_NC:
+          case R_AARCH64_LDST8_ABS_LO12_NC:
+          case R_AARCH64_LDST16_ABS_LO12_NC:
+          case R_AARCH64_LDST32_ABS_LO12_NC:
+          case R_AARCH64_LDST64_ABS_LO12_NC:
+          case R_AARCH64_LDST128_ABS_LO12_NC:
             if (((SecShdr->sh_addr ^ SecOffset) & 0xfff) != 0 ||
-                ((SymShdr->sh_addr ^ mCoffSectionsOffset[Sym->st_shndx]) & 0xfff) != 0 ||
-                mCoffAlignment < 0x1000) {
-              Error (NULL, 0, 3000, "Invalid", "WriteSections64(): %s AARCH64 small code model requires 4 KB section alignment.",
+                ((SymShdr->sh_addr ^ mCoffSectionsOffset[Sym->st_shndx]) & 0xfff) != 0) {
+              Error (NULL, 0, 3000, "Invalid", "WriteSections64(): %s AARCH64 small code model requires identical ELF and PE/COFF section offsets modulo 4 KB.",
                 mInImageName);
               break;
             }