diff mbox series

[RFC] Add support for non-contiguous memory regions

Message ID CAKdteOaZmkDJVvwGeb=qjh5YayK6cT7oHBQ9w+GhOtU2gdqwTg@mail.gmail.com
State New
Headers show
Series [RFC] Add support for non-contiguous memory regions | expand

Commit Message

Christophe Lyon Nov. 29, 2019, 10:12 a.m. UTC
Hi,

The attached patches implement support for non-contiguous memory
regions, which was discussed in more detail in
https://sourceware.org/ml/binutils/2019-07/msg00020.html

In other words, it allows to list input sections in multiple output
regions, to help with cases were we have several memory banks, but
don't care in which one a given section goes. This avoids the painful
linker script changes needed when a memory banks become too small
after program changes (eg more data, or compiler code generation
changes).

Patch 1 contains the implementation, and patch 2 adds a new test.

This is an RFC because it lacks documentation, and I thought I should
check whether the approach is OK before polishing.

Patch 1 adds a new --enable-non-contiguous-regions linker flag which
is required to enable this new feature. Of course, we can discuss the
option name, suggestions welcome :-)
It works by allowing processing of input section multiple times until
a sufficiently large output region is found, the irrelevant ones are
removed when we detect that the candidate input section would overflow
the current output section.

At some point I thought I should insert padding when an input section
is too large to fit in the current output section, in order to make
sure it is full, but it turns out not to be necessary.

The added testcase defines 4 input sections: .data, .data.2, .data.3
and .data.4, and the linker script has
*(.data) *(.data.*)
for each of the 3 output sections .raml, .ramu and .ramz. .data.3 and
.data.3 do not fit in .raml, and are thus assigned respectively to
.ramu and .ramz.

I've run the tests with arm-none-eabi as target, and also checked that
there is no regression when forcing --enable-non-contiguous-regions:
actually only the ld/scripts/rgn-over* tests fail in such a case,
because the offending sections can be assigned to other large enough
output sections, and we no longer face the error cases these tests try
to catch.

One bit I'm not clear about in the expected results is the file offset
for .ramu and .ramz sections: why are they aligned on 0x1000, while
.ARM.attributes isn't ?

In fact, I feel my patch is surprisingly small for a feature that has
been requested for so long... :-)

Thoughts?

Thanks,

Christophe

Comments

Simon Richter Nov. 29, 2019, 10:28 a.m. UTC | #1
Hi,

On Fri, Nov 29, 2019 at 11:12:03AM +0100, Christophe Lyon wrote:

> The attached patches implement support for non-contiguous memory

> regions, which was discussed in more detail in

> https://sourceware.org/ml/binutils/2019-07/msg00020.html


How does this interact with relaxations?

   Simon
Christophe Lyon Dec. 2, 2019, 9:56 a.m. UTC | #2
On Fri, 29 Nov 2019 at 11:28, Simon Richter <Simon.Richter@hogyros.de> wrote:
>

> Hi,

>

> On Fri, Nov 29, 2019 at 11:12:03AM +0100, Christophe Lyon wrote:

>

> > The attached patches implement support for non-contiguous memory

> > regions, which was discussed in more detail in

> > https://sourceware.org/ml/binutils/2019-07/msg00020.html

>

> How does this interact with relaxations?

>


Hi,

I haven't tested that specifically yet. Do you have a target /
testcase in mind that I could check?

Christophe

>    Simon
Simon Richter Dec. 2, 2019, 11:05 a.m. UTC | #3
Hi,

On Mon, Dec 02, 2019 at 10:56:22AM +0100, Christophe Lyon wrote:

> > > The attached patches implement support for non-contiguous memory

> > > regions


> > How does this interact with relaxations?


> I haven't tested that specifically yet. Do you have a target /

> testcase in mind that I could check?


A quick testcase would be on PowerPC:

Assembler file "test.S"

		.machine "ppc"

		.section .text.one
		b       2f

		.section .text.two
	2:
		nop

Linker script "test.ld"

	MEMORY {
		one (RXAI) : ORIGIN = 0x00000000, LENGTH = 0x10000000
		two (RXAI) : ORIGIN = 0x20000000, LENGTH = 0x10000000
	}

	SECTIONS {
		one : {
			*(.text.one)
		} > one
		two : {
			*(.text.two)
		} > two
	}

Assemble:

	$ as -o test.o test.S

Disassemble:

	$ objdump -dr test.o

	test.o:     file format elf64-powerpcle


	Disassembly of section .text.one:

	0000000000000000 <.text.one>:
	   0:   00 00 00 48     b       0x0
				0: R_PPC64_REL24        .text.two

	Disassembly of section .text.two:

	0000000000000000 <.text.two>:
	   0:   00 00 00 60     nop

Link:

	$ ld -o test -T test.ld --relax test.o

Disassemble:

	$ objdump -dr test

	test:     file format elf64-powerpcle

	Disassembly of section one:

	0000000000000000 <0000001b.plt_branch.1c:5>:
	   0:   08 80 82 e9     ld      r12,-32760(r2)
	   4:   a6 03 89 7d     mtctr   r12
	   8:   20 04 80 4e     bctr
		...
	  20:   e0 ff ff 4b     b       0 <0000001b.plt_branch.1c:5>

	Disassembly of section two:

	0000000020000000 <two>:
	    20000000:   00 00 00 60     nop

The linker generates the trampoline as the branch displacement doesn't fit
the instruction, extending the section size by 12 bytes. This happens after
addresses are assigned, because that is when the linker learns that the
displacement is too large.

There is also the opposite case, where branch instructions are replaced by
shorter variants on some architectures during relaxation, but I don't have
an example immediately ready.

The desired behaviour for the combination of non-contiguous regions and
relaxations should probably be defined by someone from the core team --
probably locking out the combination (like "-r --relax" on PowerPC) or just
ensuring that it doesn't crash is the sanest thing to do here.

   Simon
Christophe Lyon Dec. 3, 2019, 1:26 p.m. UTC | #4
On Mon, 2 Dec 2019 at 12:05, Simon Richter <Simon.Richter@hogyros.de> wrote:
>

> Hi,

>

> On Mon, Dec 02, 2019 at 10:56:22AM +0100, Christophe Lyon wrote:

>

> > > > The attached patches implement support for non-contiguous memory

> > > > regions

>

> > > How does this interact with relaxations?

>

> > I haven't tested that specifically yet. Do you have a target /

> > testcase in mind that I could check?

>

> A quick testcase would be on PowerPC:

>

> Assembler file "test.S"

>

>                 .machine "ppc"

>

>                 .section .text.one

>                 b       2f

>

>                 .section .text.two

>         2:

>                 nop

>

> Linker script "test.ld"

>

>         MEMORY {

>                 one (RXAI) : ORIGIN = 0x00000000, LENGTH = 0x10000000

>                 two (RXAI) : ORIGIN = 0x20000000, LENGTH = 0x10000000

>         }

>

>         SECTIONS {

>                 one : {

>                         *(.text.one)

>                 } > one

>                 two : {

>                         *(.text.two)

>                 } > two

>         }

>

> Assemble:

>

>         $ as -o test.o test.S

>

> Disassemble:

>

>         $ objdump -dr test.o

>

>         test.o:     file format elf64-powerpcle

>

>

>         Disassembly of section .text.one:

>

>         0000000000000000 <.text.one>:

>            0:   00 00 00 48     b       0x0

>                                 0: R_PPC64_REL24        .text.two

>

>         Disassembly of section .text.two:

>

>         0000000000000000 <.text.two>:

>            0:   00 00 00 60     nop

>

> Link:

>

>         $ ld -o test -T test.ld --relax test.o

>

> Disassemble:

>

>         $ objdump -dr test

>

>         test:     file format elf64-powerpcle

>

>         Disassembly of section one:

>

>         0000000000000000 <0000001b.plt_branch.1c:5>:

>            0:   08 80 82 e9     ld      r12,-32760(r2)

>            4:   a6 03 89 7d     mtctr   r12

>            8:   20 04 80 4e     bctr

>                 ...

>           20:   e0 ff ff 4b     b       0 <0000001b.plt_branch.1c:5>

>

>         Disassembly of section two:

>

>         0000000020000000 <two>:

>             20000000:   00 00 00 60     nop

>

> The linker generates the trampoline as the branch displacement doesn't fit

> the instruction, extending the section size by 12 bytes. This happens after

> addresses are assigned, because that is when the linker learns that the

> displacement is too large.

>

> There is also the opposite case, where branch instructions are replaced by

> shorter variants on some architectures during relaxation, but I don't have

> an example immediately ready.

>

> The desired behaviour for the combination of non-contiguous regions and

> relaxations should probably be defined by someone from the core team --

> probably locking out the combination (like "-r --relax" on PowerPC) or just

> ensuring that it doesn't crash is the sanest thing to do here.

>


Whether the new option is enabled or not, my patch has no impact on
your testcase, because the memory regions are large enough to contain
all the code.

If I modify your linker script to reduce the length of memory region
"one" to 0x10, the testcase fails to link (as expected) with:
a.out section `one' will not fit in region `one'

If I use --enable-non-contiguous-regions, link succeeds, the "one"
section is silently discarded from the output (and no trampoline
either).

This is not a user-friendly behaviour :-) I'm checking how to keep the
proper error message instead.


>    Simon
Simon Richter Dec. 3, 2019, 2:25 p.m. UTC | #5
Hi,

On Tue, Dec 03, 2019 at 02:26:59PM +0100, Christophe Lyon wrote:

> If I use --enable-non-contiguous-regions, link succeeds, the "one"

> section is silently discarded from the output (and no trampoline

> either).


> This is not a user-friendly behaviour :-) I'm checking how to keep the

> proper error message instead.


That's why I asked. :)

The annoying thing is that there are lots of interesting cases in that
combination, e.g. if you add

test2.S

	.machine "ppc"
	.section .text.one

	nop
	nop
	nop
	nop

and try to fit that together with the other object into an output region
"one" that has two ranges of 0x20 bytes each. This would fit, by placing
the trampoline and the branch it belongs to in one range, and the four nops
in the other, but arriving at that conclusion will be difficult for the
linker.

With reversed order, it could be even worse: four nops fills half of the
first range, and either the branch or the trampoline alone would still fit,
but these need to go to the same output range as the entire point of
generating the trampoline in the first place was to have it near the
original branch.

Hypothetical adversarial memory map (not sure about syntax):

	MEMORY {
		one (RXAI) : ORIGIN = 0x00000000, LENGTH = 0x10
                one (RXAI) : ORIGIN = 0x10000000, LENGTH = 0x10
                two (RXAI) : ORIGIN = 0x20000000, LENGTH = 0x10
	}

Getting all of these interesting cases right sounds like a lot of work to
me, and although I would expect that there might be some overlap between
"people with weird memory maps" and "people who need trampolines to get
over the holes in weird memory maps", I'm not sure that this is necessary
as long as no incorrect output is generated.

The same goes for relaxations that shrink the code, I believe Nios II has a
bunch of these as program size directly translates to resource usage. It
will be annoying if the linker refuses to link a program that would fit
after shrinking, but still acceptable to me as a user.

   Simon
diff mbox series

Patch

From 5a6274a0db009b7e19835a0726c88360d9b9c6e8 Mon Sep 17 00:00:00 2001
From: Christophe Lyon <christophe.lyon@linaro.org>
Date: Mon, 25 Nov 2019 08:55:37 +0000
Subject: [PATCH 1/2] Add support for non-contiguous memory

2019-11-27  Christophe Lyon  <christophe.lyon@linaro.org>

	bfd/
	* bfd-in2.h: Regenerate.
	* section.c (asection): Add already_assigned field.
	(BFD_FAKE_SECTION): Add default initializer for it.

	include/
	* bfdlink.h (bfd_link_info): Add non_contiguous_regions field.

	ld/
	* ldlang.c (lang_add_section): Add support for
	non_contiguous_regions.
	(size_input_section): Likewise.
	(lang_size_sections_1): Likewise.
	* ldlex.h (option_values): Add OPTION_NON_CONTIGUOUS_REGIONS.
	* lexsup.c (ld_options): Add entry for
	--enable-non-contiguous-regions.
	(parse_args): Handle it.
---
 bfd/bfd-in2.h     |  9 +++++++-
 bfd/section.c     |  9 +++++++-
 include/bfdlink.h |  3 +++
 ld/ldlang.c       | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 ld/ldlex.h        |  1 +
 ld/lexsup.c       |  5 +++++
 6 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/bfd/bfd-in2.h b/bfd/bfd-in2.h
index a00dfa35..0e1126c 100644
--- a/bfd/bfd-in2.h
+++ b/bfd/bfd-in2.h
@@ -1177,6 +1177,10 @@  typedef struct bfd_section
     struct bfd_link_order *link_order;
     struct bfd_section *s;
   } map_head, map_tail;
+ /* Points to the output section this section is already assigned to, if any.
+    This is used when support for non-contiguous memory regions is enabled.  */
+ struct bfd_section *already_assigned;
+
 } asection;
 
 /* Relax table contains information about instructions which can
@@ -1358,7 +1362,10 @@  discarded_section (const asection *sec)
      (struct bfd_symbol *) SYM, &SEC.symbol,                           \
                                                                        \
   /* map_head, map_tail                                            */  \
-     { NULL }, { NULL }                                                \
+     { NULL }, { NULL },                                               \
+                                                                       \
+  /* already_assigned                                              */  \
+     NULL                                                              \
     }
 
 /* We use a macro to initialize the static asymbol structures because
diff --git a/bfd/section.c b/bfd/section.c
index 34e08ae..b1f0b71 100644
--- a/bfd/section.c
+++ b/bfd/section.c
@@ -536,6 +536,10 @@  CODE_FRAGMENT
 .    struct bfd_link_order *link_order;
 .    struct bfd_section *s;
 .  } map_head, map_tail;
+. {* Points to the output section this section is already assigned to, if any.
+.    This is used when support for non-contiguous memory regions is enabled.  *}
+. struct bfd_section *already_assigned;
+.
 .} asection;
 .
 .{* Relax table contains information about instructions which can
@@ -717,7 +721,10 @@  CODE_FRAGMENT
 .     (struct bfd_symbol *) SYM, &SEC.symbol,				\
 .									\
 .  {* map_head, map_tail                                            *}	\
-.     { NULL }, { NULL }						\
+.     { NULL }, { NULL },						\
+.									\
+.  {* already_assigned                                             *}	\
+.     NULL								\
 .    }
 .
 .{* We use a macro to initialize the static asymbol structures because
diff --git a/include/bfdlink.h b/include/bfdlink.h
index 32d1512..e1e7c1e 100644
--- a/include/bfdlink.h
+++ b/include/bfdlink.h
@@ -501,6 +501,9 @@  struct bfd_link_info
   /* TRUE if "-Map map" is passed to linker.  */
   unsigned int has_map_file : 1;
 
+  /* TRUE if "--enable-non-contiguous-regions" is passed to the linker.  */
+  unsigned int non_contiguous_regions : 1;
+
   /* Char that may appear as the first char of a symbol, but should be
      skipped (like symbol_leading_char) when looking up symbols in
      wrap_hash.  Used by PowerPC Linux for 'dot' symbols.  */
diff --git a/ld/ldlang.c b/ld/ldlang.c
index eedcb7f..f2acdbb 100644
--- a/ld/ldlang.c
+++ b/ld/ldlang.c
@@ -2550,7 +2550,14 @@  lang_add_section (lang_statement_list_type *ptr,
     }
 
   if (section->output_section != NULL)
-    return;
+    {
+      if (!link_info.non_contiguous_regions)
+	return;
+      /* SECTION has already been affected to an output section, but
+	 the user allows it to be mapped to another one in case it
+	 overflows. We'll later update the actual output section in
+	 size_input_section as appropriate.  */
+    }
 
   /* We don't copy the SEC_NEVER_LOAD flag from an input section
      to an output section, because we want to be able to include a
@@ -5109,11 +5116,27 @@  size_input_section
   (lang_statement_union_type **this_ptr,
    lang_output_section_statement_type *output_section_statement,
    fill_type *fill,
+   bfd_boolean *removed,
    bfd_vma dot)
 {
   lang_input_section_type *is = &((*this_ptr)->input_section);
   asection *i = is->section;
   asection *o = output_section_statement->bfd_section;
+  *removed = 0;
+
+  if (link_info.non_contiguous_regions)
+    {
+      /* If the input section I has already been successfully assigned
+	 to an output section other than O, don't bother with it and
+	 let the caller remove it from the list.  Keep processing in
+	 case we have already handled O, because the repeated passes
+	 have reinitialized its size.  */
+      if (i->already_assigned && i->already_assigned != o)
+	{
+	  *removed = 1;
+	  return dot;
+	}
+    }
 
   if (i->sec_info_type == SEC_INFO_TYPE_JUST_SYMS)
     i->output_offset = i->vma - o->vma;
@@ -5145,6 +5168,24 @@  size_input_section
 	  dot += alignment_needed;
 	}
 
+      if (link_info.non_contiguous_regions)
+	{
+	  /* If I would overflow O, let the caller remove I from the
+	     list.  */
+	  if (output_section_statement->region)
+	    {
+	      bfd_vma end = output_section_statement->region->origin
+		+ output_section_statement->region->length;
+
+	      if (dot + TO_ADDR (i->size) > end)
+		{
+		  *removed = 1;
+		  dot = end;
+		  return dot;
+		}
+	    }
+	}
+
       /* Remember where in the output section this input section goes.  */
       i->output_offset = dot - o->vma;
 
@@ -5152,6 +5193,14 @@  size_input_section
       dot += TO_ADDR (i->size);
       if (!(o->flags & SEC_FIXED_SIZE))
 	o->size = TO_SIZE (dot - o->vma);
+
+      if (link_info.non_contiguous_regions)
+	{
+	  /* Record that I was successfully assigned to O, and update
+	     its actual output section too.  */
+	  i->already_assigned = o;
+	  i->output_section = o;
+	}
     }
 
   return dot;
@@ -5436,9 +5485,10 @@  lang_size_sections_1
    bfd_boolean check_regions)
 {
   lang_statement_union_type *s;
+  lang_statement_union_type *prev_s = NULL;
 
   /* Size up the sections from their constituent parts.  */
-  for (s = *prev; s != NULL; s = s->header.next)
+  for (s = *prev; s != NULL; prev_s = s, s = s->header.next)
     {
       switch (s->header.type)
 	{
@@ -5852,6 +5902,7 @@  lang_size_sections_1
 	case lang_input_section_enum:
 	  {
 	    asection *i;
+	    bfd_boolean removed;
 
 	    i = s->input_section.section;
 	    if (relax)
@@ -5864,7 +5915,17 @@  lang_size_sections_1
 		  *relax = TRUE;
 	      }
 	    dot = size_input_section (prev, output_section_statement,
-				      fill, dot);
+				      fill, &removed, dot);
+
+	    if (link_info.non_contiguous_regions && removed)
+	      {
+		/* If I doesn't fit in the current output section,
+		   remove it from the list.  */
+		if (prev_s)
+		  prev_s->header.next=s->header.next;
+		else
+		  *prev = s->header.next;
+	      }
 	  }
 	  break;
 
diff --git a/ld/ldlex.h b/ld/ldlex.h
index 32a7a64..4e7a419 100644
--- a/ld/ldlex.h
+++ b/ld/ldlex.h
@@ -150,6 +150,7 @@  enum option_values
   OPTION_FORCE_GROUP_ALLOCATION,
   OPTION_PRINT_MAP_DISCARDED,
   OPTION_NO_PRINT_MAP_DISCARDED,
+  OPTION_NON_CONTIGUOUS_REGIONS,
 };
 
 /* The initial parser states.  */
diff --git a/ld/lexsup.c b/ld/lexsup.c
index d7766c3..13cab6d 100644
--- a/ld/lexsup.c
+++ b/ld/lexsup.c
@@ -122,6 +122,8 @@  static const struct ld_option ld_options[] =
     'E', NULL, N_("Export all dynamic symbols"), TWO_DASHES },
   { {"no-export-dynamic", no_argument, NULL, OPTION_NO_EXPORT_DYNAMIC},
     '\0', NULL, N_("Undo the effect of --export-dynamic"), TWO_DASHES },
+  { {"enable-non-contiguous-regions", no_argument, NULL, OPTION_NON_CONTIGUOUS_REGIONS},
+    '\0', NULL, N_("Enable support of non-contiguous memory regions"), TWO_DASHES },
   { {"EB", no_argument, NULL, OPTION_EB},
     '\0', NULL, N_("Link big-endian objects"), ONE_DASH },
   { {"EL", no_argument, NULL, OPTION_EL},
@@ -845,6 +847,9 @@  parse_args (unsigned argc, char **argv)
 	case OPTION_NO_EXPORT_DYNAMIC:
 	  link_info.export_dynamic = FALSE;
 	  break;
+	case OPTION_NON_CONTIGUOUS_REGIONS:
+	  link_info.non_contiguous_regions = TRUE;
+	  break;
 	case 'e':
 	  lang_add_entry (optarg, TRUE);
 	  break;
-- 
2.7.4