
aarch64: Add split-stack initial support

Message ID 1469738164-7869-1-git-send-email-adhemerval.zanella@linaro.com
State Superseded

Commit Message

Adhemerval Zanella Netto July 28, 2016, 8:36 p.m. UTC
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>


This patch adds split-stack support for aarch64 (PR #67877).  As for other
ports, this patch should be used along with the corresponding glibc and gold
support.

The support follows the approach used for other architectures: a
__private_ss field is added to the TCB in glibc, a target-specific
__morestack implementation and helper functions are added to libgcc, and
the compiler gains the required adjustments (split-stack prologue, va_start
argument handling).  I also plan to send the gold support that adjusts the
stack allocation across calls between split-stack and default code.
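As a rough illustration of the TCB part (a sketch, not the glibc or libgcc code): assuming the LP64 layout that the patch's morestack-c.c copies from glibc 2.24 (dtv, private, __private_ss), the guard sits at offset 16. That is the constant the prologue uses in "ldr x9, [x9, 16]". The type tcbhead_sketch and the helpers get_guard/set_guard are hypothetical names. The real __morestack_get_guard/__morestack_set_guard reach the TCB through __builtin_thread_pointer () instead of taking it as a parameter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the leading glibc TCB fields on LP64 aarch64.
   Only the position of __private_ss matters to the generated code.  */
typedef struct
{
  void *dtv;
  void *private;
  void *__private_ss;   /* split-stack guard: lowest usable stack address */
} tcbhead_sketch;

/* The offset the prologue hard-codes ("psso", 0x10 on LP64).  */
enum { PRIVATE_SS_OFFSET = offsetof (tcbhead_sketch, __private_ss) };

/* Hypothetical stand-ins for __morestack_get_guard/__morestack_set_guard,
   taking the TCB explicitly instead of using the thread pointer.  */
static void *
get_guard (const tcbhead_sketch *tcb)
{
  return tcb->__private_ss;
}

static void
set_guard (tcbhead_sketch *tcb, void *guard)
{
  tcb->__private_ss = guard;
}
```

Because new glibc fields are appended at the end of the structure, this offset stays stable. That is why the compiler can emit it as a literal.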

The current approach is similar to the powerpc one: at most 2 GB of stack
allocation is supported, so the stack adjustment can be done with 2
instructions (either a movn plus a nop, or a movn followed by a movk).  The
__morestack call is non-standard: x10 holds the requested stack pointer and
x11 the size of the stack arguments to be copied.  Also, function arguments
on the old stack are accessed relative to the stack pointer, so x10 is used
to hold the old stack value.  Unwinding is handled by a personality routine
that knows how to find stack segments.

The split-stack prologue on function entry is as follows (it goes before the
usual function prologue):

	mrs    x9, tpidr_el0
	mov    x10, -<required stack allocation>
	nop/movk
	add    x10, sp, x10
	ldr    x9, [x9, 16]
	cmp    x10, x9
	b.cs    enough
	mov    x11, <required arguments copy size>
	str    x30, [sp, -16]!
	bl     __morestack
	ldr    x30, [sp], 16
	ret
enough:
	# usual function prologue, modified a little at the end to set up the
	# arg_pointer in x10, starts here.  The arg_pointer is initialized,
	# if it is used, with
	mov     x11, <required stack allocation>
	add     x10, x29, x11
	b.cs    function
	mov     x10, x28
function:
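The two decisions in this prologue can be modelled in C as follows. This is only a sketch: prologue_has_enough_stack and incoming_arg_pointer are hypothetical names, the guard is the value loaded from __private_ss, and "offset" stands for the constant the patch moves into x11 (hard_fp_offset in aarch64_expand_prologue). Like the real cmp/b.cs sequence, the comparison is unsigned, so the model assumes that sp - frame_size does not wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the split-stack prologue fast path.
   sp         - incoming stack pointer
   frame_size - bytes the function will allocate
   guard      - value loaded from __private_ss ("ldr x9, [x9, 16]")  */
static bool
prologue_has_enough_stack (uintptr_t sp, uintptr_t frame_size,
                           uintptr_t guard)
{
  uintptr_t requested = sp - frame_size;  /* add x10, sp, x10 (x10 = -size) */
  return requested >= guard;              /* cmp x10, x9; b.cs enough */
}

/* Hypothetical model of the arg-pointer setup at the end of the usual
   prologue: on the fast path the incoming arguments sit above the frame
   (x29 + offset).  After __morestack they are on the old stack, whose
   address __morestack leaves in x28.  */
static uintptr_t
incoming_arg_pointer (bool fast_path, uintptr_t x29, uintptr_t offset,
                      uintptr_t x28)
{
  return fast_path ? x29 + offset : x28;
}
```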

Notes:
 1. Even if a function does not allocate a stack frame, a split-stack prologue
    is created.  This avoids issues with tail calls to external symbols,
    which might require linker adjustment (libgo/runtime/go-varargs.c).

 2. Basic-block reordering (enabled with -O2) will move split-stack TCB ldr
    to after the required stack calculation.

 3. Similar to powerpc, when the linker detects a call from split-stack to
    non-split-stack code, it adds 16k (or more) to the value found in "allocate"
    instructions (so non-split-stack code gets a larger stack).  The amount is
    tunable by a linker option.  This edit means aarch64 does not need to
    implement __morestack_non_split, which is necessary on x86 because there
    is not enough space there to edit the stack comparison code.  This feature
    is only implemented in the GNU gold linker.

 4. AArch64 does not initially handle stacks larger than 2G.  Although it is
    possible to implement, limiting the allocation to 2G allows it to be
    materialized with only 2 instructions (movn + movk), which simplifies the
    required linker adjustments.  Supporting multiple threads that each
    require more than 2G of stack is probably not that important, and would
    likely run out of memory at run time.

 5. The TCB support on GLIBC is meant to be included in version 2.25 [1].
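A minimal sketch of the arithmetic behind notes 3 and 4, assuming the linker has already decoded the frame size from the movn/movk pair. grow_split_stack_allocation, SPLIT_STACK_MAX_FRAME and NONSPLIT_EXTRA are hypothetical names, and the 16k default comes from note 3. The actual gold implementation is not part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ceiling from note 4: the allocation must still fit in MOVN + MOVK.  */
#define SPLIT_STACK_MAX_FRAME ((int64_t) 1 << 31)
/* Extra headroom for non-split callees (note 3: 16k or more, tunable).  */
#define NONSPLIT_EXTRA ((int64_t) 16 * 1024)

/* Hypothetical linker-side step: grow the allocation decoded from the
   prologue's "allocate" instructions before they are re-encoded.
   Returns false, leaving *FRAME_SIZE untouched, if the result would
   exceed the 2G ceiling.  */
static bool
grow_split_stack_allocation (int64_t *frame_size, int64_t extra)
{
  if (*frame_size < 0 || extra < 0)
    return false;
  if (*frame_size > SPLIT_STACK_MAX_FRAME - extra)
    return false;
  *frame_size += extra;
  return true;
}
```

This mail does not describe how gold reacts when the grown value no longer fits. The sketch simply refuses the edit.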

I bootstrapped on aarch64-linux-gnu and, although I am still digesting the
results, I saw no regressions.  All cgo tests pass, although, based on
previous reports for other architectures, gold support should be present to
avoid issues when split-stack code calls non-split-stack code.

libgcc/ChangeLog:

	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
	aarch64*-*-linux.
	* libgcc/config/aarch64/morestack-c.c: New file.
	* libgcc/config/aarch64/morestack.S: Likewise.
	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
	* libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
	code.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.c
	(aarch64_supports_split_stack): New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
	macro.
	* gcc/config/aarch64/aarch64-protos.h: Add
	aarch64_expand_split_stack_prologue and
	aarch64_split_stack_space_check.
	* gcc/config/aarch64/aarch64.c (aarch64_expand_prologue): Set up the
	argument pointer (x10) for split-stack.
	(aarch64_expand_builtin_va_start): Use internal argument pointer
	instead of virtual_incoming_args_rtx.
	(aarch64_expand_split_stack_prologue): New function.
	(aarch64_file_end): New function.  Emit the split-stack note sections.
	(aarch64_internal_arg_pointer): New function.
	(aarch64_live_on_entry): New function.  Mark the argument pointer as
	live on entry for split-stack.
	(aarch64_split_stack_space_check): New function.
	(TARGET_ASM_FILE_END): New macro.
	(TARGET_EXTRA_LIVE_ON_ENTRY): Likewise.
	(TARGET_INTERNAL_ARG_POINTER): Likewise.
	* gcc/config/aarch64/aarch64.h (aarch64_frame): Add
	split_stack_arg_pointer to set up the argument pointer when using
	split-stack.
	* gcc/config/aarch64/aarch64.md (UNSPEC_STACK_CHECK): New unspec.
	(UNSPECV_SPLIT_STACK_RETURN): Likewise.
	(split_stack_prologue): New expand.
	(split_stack_return): New insn.
	(split_stack_space_check): New expand.
	* gcc/testsuite/gcc.dg/split-3.c (down): Call va_end after va_start.
	* gcc/testsuite/gcc.dg/split-6.c (down): Likewise.

[1] https://sourceware.org/ml/libc-alpha/2016-07/msg00647.html

---
 gcc/ChangeLog                              |  33 ++++
 gcc/common/config/aarch64/aarch64-common.c |  16 +-
 gcc/config/aarch64/aarch64-linux.h         |   2 -
 gcc/config/aarch64/aarch64-protos.h        |   2 +
 gcc/config/aarch64/aarch64.c               | 230 +++++++++++++++++++++++-
 gcc/config/aarch64/aarch64.h               |   3 +
 gcc/config/aarch64/aarch64.md              |  32 ++++
 gcc/testsuite/gcc.dg/split-3.c             |   1 +
 gcc/testsuite/gcc.dg/split-6.c             |   1 +
 libgcc/ChangeLog                           |  11 ++
 libgcc/config.host                         |   1 +
 libgcc/config/aarch64/morestack-c.c        |  95 ++++++++++
 libgcc/config/aarch64/morestack.S          | 269 +++++++++++++++++++++++++++++
 libgcc/config/aarch64/t-stack-aarch64      |   3 +
 libgcc/generic-morestack.c                 |   1 +
 15 files changed, 696 insertions(+), 4 deletions(-)
 create mode 100644 libgcc/config/aarch64/morestack-c.c
 create mode 100644 libgcc/config/aarch64/morestack.S
 create mode 100644 libgcc/config/aarch64/t-stack-aarch64

-- 
2.7.4

Comments

Adhemerval Zanella Netto Aug. 5, 2016, 5:12 p.m. UTC | #1
Ping.

On 28/07/2016 17:36, Adhemerval Zanella wrote:

> diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
> index 08e7959..01c3239 100644
> --- a/gcc/common/config/aarch64/aarch64-common.c
> +++ b/gcc/common/config/aarch64/aarch64-common.c
> @@ -106,6 +106,21 @@ aarch64_handle_option (struct gcc_options *opts,
>      }
>  }
>  
> +/* -fsplit-stack uses a TCB field available on glibc-2.25.  GLIBC also
> +   exports a symbol, __tcb_private_ss, to signal that the field is
> +   available in the TCB.  This aims to prevent binaries linked against a
> +   newer GLIBC from running on unsupported ones.  */
> +
> +static bool
> +aarch64_supports_split_stack (bool report ATTRIBUTE_UNUSED,
> +			      struct gcc_options *opts ATTRIBUTE_UNUSED)
> +{
> +  return true;
> +}
> +
> +#undef TARGET_SUPPORTS_SPLIT_STACK
> +#define TARGET_SUPPORTS_SPLIT_STACK aarch64_supports_split_stack
> +
>  struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
>  
>  /* An ISA extension in the co-processor and main instruction set space.  */
> @@ -342,4 +357,3 @@ aarch64_rewrite_mcpu (int argc, const char **argv)
>  }
>  
>  #undef AARCH64_CPU_NAME_LENGTH
> -
> diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
> index 5fcaa59..ab3208b 100644
> --- a/gcc/config/aarch64/aarch64-linux.h
> +++ b/gcc/config/aarch64/aarch64-linux.h
> @@ -80,8 +80,6 @@
>      }						\
>    while (0)
>  
> -#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
> -
>  /* Uninitialized common symbols in non-PIE executables, even with
>     strong definitions in dependent shared libraries, will resolve
>     to COPY relocated symbol in the executable.  See PR65780.  */
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 3cdd69b..82a4e11 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -377,6 +377,8 @@ void aarch64_err_no_fpadvsimd (machine_mode, const char *);
>  void aarch64_expand_epilogue (bool);
>  void aarch64_expand_mov_immediate (rtx, rtx);
>  void aarch64_expand_prologue (void);
> +void aarch64_expand_split_stack_prologue (void);
> +void aarch64_split_stack_space_check (rtx, rtx);
>  void aarch64_expand_vector_init (rtx, rtx);
>  void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
>  				   const_tree, unsigned);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index e56398a..2cf239f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3227,6 +3227,34 @@ aarch64_expand_prologue (void)
>  	  RTX_FRAME_RELATED_P (insn) = 1;
>  	}
>      }
> +
> +  if (flag_split_stack && offset)
> +    {
> +      /* Set up the argument pointer (x10) for -fsplit-stack code.  If
> +	 __morestack was called, it leaves the argument pointer to the
> +	 old stack in x28.  Otherwise, the argument pointer is the top
> +	 of the current frame.  */
> +      rtx x10 = gen_rtx_REG (Pmode, R10_REGNUM);
> +      rtx x11 = gen_rtx_REG (Pmode, R11_REGNUM);
> +      rtx x28 = gen_rtx_REG (Pmode, R28_REGNUM);
> +      rtx x29 = gen_rtx_REG (Pmode, R29_REGNUM);
> +      rtx not_more = gen_label_rtx ();
> +      rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
> +      rtx jump;
> +
> +      emit_move_insn (x11, GEN_INT (hard_fp_offset));
> +      emit_insn (gen_add3_insn (x10, x29, x11));
> +      jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
> +				   gen_rtx_GEU (VOIDmode, cc_reg,
> +						const0_rtx),
> +				   gen_rtx_LABEL_REF (VOIDmode, not_more),
> +				   pc_rtx);
> +      jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
> +      JUMP_LABEL (jump) = not_more;
> +      LABEL_NUSES (not_more) += 1;
> +      emit_move_insn (x10, x28);
> +      emit_label (not_more);
> +    }
>  }
>  
>  /* Return TRUE if we can use a simple_return insn.
> @@ -3303,6 +3331,7 @@ aarch64_expand_epilogue (bool for_sibcall)
>        offset = offset - fp_offset;
>      }
>  
> +
>    if (offset > 0)
>      {
>        unsigned reg1 = cfun->machine->frame.wb_candidate1;
> @@ -9648,7 +9677,7 @@ aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
>    /* Emit code to initialize STACK, which points to the next varargs stack
>       argument.  CUM->AAPCS_STACK_SIZE gives the number of stack words used
>       by named arguments.  STACK is 8-byte aligned.  */
> -  t = make_tree (TREE_TYPE (stack), virtual_incoming_args_rtx);
> +  t = make_tree (TREE_TYPE (stack), crtl->args.internal_arg_pointer);
>    if (cum->aapcs_stack_size > 0)
>      t = fold_build_pointer_plus_hwi (t, cum->aapcs_stack_size * UNITS_PER_WORD);
>    t = build2 (MODIFY_EXPR, TREE_TYPE (stack), stack, t);
> @@ -14010,6 +14039,196 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
>      }
>  }
>  
> +/* -fsplit-stack support.  */
> +
> +/* A SYMBOL_REF for __morestack.  */
> +static GTY(()) rtx morestack_ref;
> +
> +/* Emit -fsplit-stack prologue, which goes before the regular function
> +   prologue.  */
> +void
> +aarch64_expand_split_stack_prologue (void)
> +{
> +  HOST_WIDE_INT frame_size, args_size;
> +  rtx_code_label *ok_label = NULL;
> +  rtx mem, ssvalue, compare, jump, insn, call_fusage;
> +  rtx reg11, reg30, temp;
> +  rtx new_cfa, cfi_ops = NULL;
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = 0x10;
> +  int ninsn;
> +
> +  gcc_assert (flag_split_stack && reload_completed);
> +
> +  /* Limit the total stack allocation to 2G so its value can be
> +     materialized with at most two instructions (movn/movk).  The linker
> +     may also adjust it to add extra space when split-stack code calls
> +     non-split-stack functions.  */
> +  frame_size = cfun->machine->frame.frame_size;
> +  if (frame_size > ((HOST_WIDE_INT) 1 << 31))
> +    {
> +      sorry ("Stack frame larger than 2G is not supported for -fsplit-stack");
> +      return;
> +    }
> +
> +  if (morestack_ref == NULL_RTX)
> +    {
> +      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
> +      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
> +					   | SYMBOL_FLAG_FUNCTION);
> +    }
> +
> +  /* Load __private_ss from TCB.  */
> +  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
> +  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
> +  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
> +  emit_move_insn (ssvalue, mem);
> +
> +  temp = gen_rtx_REG (Pmode, R10_REGNUM);
> +
> +  /* Always emit two insns to calculate the requested stack, so the linker
> +     can edit them when adjusting size for calling non-split-stack code.  */
> +  ninsn = aarch64_internal_mov_immediate (temp, GEN_INT (-frame_size), true,
> +					  Pmode);
> +  gcc_assert (ninsn == 1 || ninsn == 2);
> +  if (ninsn == 1)
> +    emit_insn (gen_nop ());
> +  emit_insn (gen_add3_insn (temp, stack_pointer_rtx, temp));
> +
> +  compare = aarch64_gen_compare_reg (LT, temp, ssvalue);
> +
> +  /* Skip the __morestack call if the current __private_ss suffices.  */
> +  ok_label = gen_label_rtx ();
> +  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
> +			       gen_rtx_GEU (VOIDmode, compare, const0_rtx),
> +			       gen_rtx_LABEL_REF (VOIDmode, ok_label),
> +			       pc_rtx);
> +  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
> +  JUMP_LABEL (jump) = ok_label;
> +
> +  /* Mark the jump as very likely to be taken.  */
> +  add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE - 100);
> +
> +  call_fusage = NULL_RTX;
> +
> +  /* Call __morestack with a non-standard call procedure: x10 will hold
> +     the requested stack pointer and x11 the required stack size to be
> +     copied.  */
> +  args_size = crtl->args.size >= 0 ? crtl->args.size : 0;
> +  reg11 = gen_rtx_REG (DImode, R11_REGNUM);
> +  emit_move_insn (reg11, GEN_INT (args_size));
> +  use_reg (&call_fusage, reg11);
> +
> +  /* Set up a minimum frame pointer to call __morestack.  The SP is not
> +     saved in x29 first, so in __morestack x29 points to the called SP.  */
> +  reg30 = gen_rtx_REG (Pmode, R30_REGNUM);
> +  aarch64_pushwb_single_reg (Pmode, R30_REGNUM, 16);
> +
> +  insn = emit_call_insn (gen_call (gen_rtx_MEM (DImode, morestack_ref),
> +				   const0_rtx, const0_rtx));
> +  add_function_usage_to (insn, call_fusage);
> +
> +  cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg30, cfi_ops);
> +  mem = plus_constant (Pmode, stack_pointer_rtx, 16);
> +  cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, stack_pointer_rtx, cfi_ops);
> +
> +  mem = gen_rtx_POST_MODIFY (Pmode, stack_pointer_rtx, mem);
> +  mem = gen_rtx_MEM (DImode, mem);
> +  insn = emit_move_insn (reg30, mem);
> +
> +  new_cfa = stack_pointer_rtx;
> +  new_cfa = plus_constant (Pmode, new_cfa, 16);
> +  cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, new_cfa, cfi_ops);
> +  REG_NOTES (insn) = cfi_ops;
> +  RTX_FRAME_RELATED_P (insn) = 1;
> +
> +  emit_insn (gen_split_stack_return ());
> +
> +  emit_label (ok_label);
> +  LABEL_NUSES (ok_label) = 1;
> +}
> +
> +/* Implement TARGET_ASM_FILE_END.  */
> +static void
> +aarch64_file_end (void)
> +{
> +  file_end_indicate_exec_stack ();
> +
> +  if (flag_split_stack)
> +    file_end_indicate_split_stack ();
> +}
> +
> +/* Return the internal arg pointer used for function incoming arguments.  */
> +static rtx
> +aarch64_internal_arg_pointer (void)
> +{
> +  if (flag_split_stack)
> +    {
> +      if (cfun->machine->frame.split_stack_arg_pointer == NULL_RTX)
> +	{
> +	  rtx pat;
> +
> +	  cfun->machine->frame.split_stack_arg_pointer = gen_reg_rtx (Pmode);
> +	  REG_POINTER (cfun->machine->frame.split_stack_arg_pointer) = 1;
> +
> +	  /* Put the pseudo initialization right after the note at the
> +	     beginning of the function.  */
> +	  pat = gen_rtx_SET (cfun->machine->frame.split_stack_arg_pointer,
> +			     gen_rtx_REG (Pmode, R10_REGNUM));
> +	  push_topmost_sequence ();
> +	  emit_insn_after (pat, get_insns ());
> +	  pop_topmost_sequence ();
> +	}
> +      return plus_constant (Pmode, cfun->machine->frame.split_stack_arg_pointer,
> +			    FIRST_PARM_OFFSET (current_function_decl));
> +    }
> +  return virtual_incoming_args_rtx;
> +}
> +
> +static void
> +aarch64_live_on_entry (bitmap regs)
> +{
> +  if (flag_split_stack)
> +    bitmap_set_bit (regs, R10_REGNUM);
> +}
> +
> +/* Emit -fsplit-stack dynamic stack allocation space check.  */
> +
> +void
> +aarch64_split_stack_space_check (rtx size, rtx label)
> +{
> +  rtx mem, ssvalue, compare, jump;
> +  rtx requested = gen_reg_rtx (Pmode);
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = 0x10;
> +
> +  /* Load __private_ss from TCB.  */
> +  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
> +  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
> +  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
> +  emit_move_insn (ssvalue, mem);
> +
> +  /* And compare it with the stack pointer minus the requested size.  */
> +  if (CONST_INT_P (size))
> +    emit_insn (gen_add3_insn (requested, stack_pointer_rtx,
> +			      GEN_INT (-INTVAL (size))));
> +  else
> +    {
> +      size = force_reg (Pmode, size);
> +      emit_move_insn (requested, gen_rtx_MINUS (Pmode, stack_pointer_rtx,
> +						size));
> +    }
> +
> +  /* Jump to LABEL if the current __private_ss suffices.  */
> +  compare = aarch64_gen_compare_reg (LT, requested, ssvalue);
> +  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
> +			       gen_rtx_GEU (VOIDmode, compare, const0_rtx),
> +			       gen_rtx_LABEL_REF (VOIDmode, label),
> +			       pc_rtx);
> +  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
> +  JUMP_LABEL (jump) = label;
> +}
> +
>  #undef TARGET_ADDRESS_COST
>  #define TARGET_ADDRESS_COST aarch64_address_cost
>  
> @@ -14036,6 +14255,9 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
>  #undef TARGET_ASM_FILE_START
>  #define TARGET_ASM_FILE_START aarch64_start_file
>  
> +#undef TARGET_ASM_FILE_END
> +#define TARGET_ASM_FILE_END aarch64_file_end
> +
>  #undef TARGET_ASM_OUTPUT_MI_THUNK
>  #define TARGET_ASM_OUTPUT_MI_THUNK aarch64_output_mi_thunk
>  
> @@ -14118,6 +14340,12 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
>  #undef TARGET_FRAME_POINTER_REQUIRED
>  #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
>  
> +#undef TARGET_EXTRA_LIVE_ON_ENTRY
> +#define TARGET_EXTRA_LIVE_ON_ENTRY aarch64_live_on_entry
> +
> +#undef TARGET_INTERNAL_ARG_POINTER
> +#define TARGET_INTERNAL_ARG_POINTER aarch64_internal_arg_pointer
> +
>  #undef TARGET_GIMPLE_FOLD_BUILTIN
>  #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
>  
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 1915980..0ba3172 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -570,6 +570,9 @@ struct GTY (()) aarch64_frame
>  
>    HOST_WIDE_INT frame_size;
>  
> +  /* Alternative internal arg pointer for -fsplit-stack.  */
> +  rtx split_stack_arg_pointer;
> +
>    bool laid_out;
>  };
>  
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 9e87a0d..8992608 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -130,6 +130,7 @@
>      UNSPEC_VSTRUCTDUMMY
>      UNSPEC_SP_SET
>      UNSPEC_SP_TEST
> +    UNSPEC_STACK_CHECK
>      UNSPEC_RSQRT
>      UNSPEC_RSQRTE
>      UNSPEC_RSQRTS
> @@ -144,6 +145,7 @@
>      UNSPECV_SET_FPSR		; Represent assign of FPSR content.
>      UNSPECV_BLOCKAGE		; Represent a blockage
>      UNSPECV_PROBE_STACK_RANGE	; Represent stack range probing.
> +    UNSPECV_SPLIT_STACK_RETURN  ; Represent a camouflaged return
>    ]
>  )
>  
> @@ -5394,3 +5396,33 @@
>  
>  ;; ldp/stp peephole patterns
>  (include "aarch64-ldpstp.md")
> +
> +;; Handle -fsplit-stack
> +(define_expand "split_stack_prologue"
> +  [(const_int 0)]
> +  ""
> +{
> +  aarch64_expand_split_stack_prologue ();
> +  DONE;
> +})
> +
> +;; A return instruction which the middle-end does not see.
> +(define_insn "split_stack_return"
> +  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_RETURN)]
> +  ""
> +  "ret"
> +  [(set_attr "type" "branch")])
> +
> +;; If there are operand 0 bytes available on the stack, jump to
> +;; operand 1.
> +(define_expand "split_stack_space_check"
> +  [(set (match_dup 2) (compare:CC (match_dup 3) (match_dup 2)))
> +   (set (pc) (if_then_else
> +	      (geu (match_dup 4) (const_int 0))
> +	      (label_ref (match_operand 1))
> +	      (pc)))]
> +  ""
> +{
> +  aarch64_split_stack_space_check (operands[0], operands[1]);
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.dg/split-3.c b/gcc/testsuite/gcc.dg/split-3.c
> index 64bbb8c..5ba7616 100644
> --- a/gcc/testsuite/gcc.dg/split-3.c
> +++ b/gcc/testsuite/gcc.dg/split-3.c
> @@ -40,6 +40,7 @@ down (int i, ...)
>        || va_arg (ap, int) != 9
>        || va_arg (ap, int) != 10)
>      abort ();
> +  va_end (ap);
>  
>    if (i > 0)
>      {
> diff --git a/gcc/testsuite/gcc.dg/split-6.c b/gcc/testsuite/gcc.dg/split-6.c
> index b32cf8d..b3016ba 100644
> --- a/gcc/testsuite/gcc.dg/split-6.c
> +++ b/gcc/testsuite/gcc.dg/split-6.c
> @@ -37,6 +37,7 @@ down (int i, ...)
>        || va_arg (ap, int) != 9
>        || va_arg (ap, int) != 10)
>      abort ();
> +  va_end (ap);
>  
>    if (i > 0)
>      {
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 4ccf25d..18f49f1 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -336,6 +336,7 @@ aarch64*-*-linux*)
>  	md_unwind_header=aarch64/linux-unwind.h
>  	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
>  	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
> +	tmake_file="${tmake_file} t-stack aarch64/t-stack-aarch64"
>  	;;
>  alpha*-*-linux*)
>  	tmake_file="${tmake_file} alpha/t-alpha alpha/t-ieee t-crtfm alpha/t-linux"
> diff --git a/libgcc/config/aarch64/morestack-c.c b/libgcc/config/aarch64/morestack-c.c
> new file mode 100644
> index 0000000..8df7895
> --- /dev/null
> +++ b/libgcc/config/aarch64/morestack-c.c
> @@ -0,0 +1,95 @@
> +/* AArch64 support for -fsplit-stack.
> + * Copyright (C) 2016 Free Software Foundation, Inc.
> + *
> + * This file is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 3, or (at your option) any
> + * later version.
> + *
> + * This file is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * Under Section 7 of GPL version 3, you are granted additional
> + * permissions described in the GCC Runtime Library Exception, version
> + * 3.1, as published by the Free Software Foundation.
> + *
> + * You should have received a copy of the GNU General Public License and
> + * a copy of the GCC Runtime Library Exception along with this program;
> + * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> + * <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef inhibit_libc
> +
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <stddef.h>
> +#include "generic-morestack.h"
> +
> +/* This is based on the GLIBC definition (version 2.24).  There is no need
> +   to keep it in sync since new fields are added at the end of the structure
> +   and do not change the '__private_ss' layout.  */
> +typedef struct
> +{
> +  void *dtv;
> +  void *private;
> +  void *__private_ss;
> +} tcbhead_t;
> +
> +#define INITIAL_STACK_SIZE  0x4000
> +#define BACKOFF             0x1000
> +
> +void __generic_morestack_set_initial_sp (void *sp, size_t len);
> +void *__morestack_get_guard (void);
> +void __morestack_set_guard (void *);
> +void *__morestack_make_guard (void *stack, size_t size);
> +void __morestack_load_mmap (void);
> +
> +/* We declare it as weak so that it fails either at link time or
> +   at run time if GLIBC does not have the required TCB field.  */
> +extern void __tcb_private_ss (void) __attribute__ ((weak));
> +
> +/* Initialize the stack guard when the program starts or when a new
> +   thread starts.  This is called from a constructor in .ctors.  */
> +void
> +__stack_split_initialize (void)
> +{
> +  __tcb_private_ss ();
> +
> +  register void* sp __asm__ ("sp");
> +  tcbhead_t *tcb = ((tcbhead_t *) __builtin_thread_pointer ());
> +  tcb->__private_ss = (void*)((uintptr_t)sp - INITIAL_STACK_SIZE);
> +  __generic_morestack_set_initial_sp (sp, INITIAL_STACK_SIZE);
> +}
> +
> +/* Return current __private_ss.  */
> +void *
> +__morestack_get_guard (void)
> +{
> +  tcbhead_t *tcb = ((tcbhead_t *) __builtin_thread_pointer ());
> +  return tcb->__private_ss;
> +}
> +
> +/* Set __private_ss to ptr.  */
> +void
> +__morestack_set_guard (void *ptr)
> +{
> +  tcbhead_t *tcb = ((tcbhead_t *) __builtin_thread_pointer ());
> +  tcb->__private_ss = ptr;
> +}
> +
> +/* Return the stack guard value for given stack.  */
> +void *
> +__morestack_make_guard (void *stack, size_t size)
> +{
> +  return (void*)((uintptr_t)stack - size + BACKOFF);
> +}
> +
> +/* Make __stack_split_initialize a high priority constructor.  */
> +static void (*const ctors [])
> +  __attribute__ ((used, section (".ctors.65535"), aligned (sizeof (void *))))
> +  = { __stack_split_initialize, __morestack_load_mmap };
> +
> +#endif /* !defined (inhibit_libc) */
> diff --git a/libgcc/config/aarch64/morestack.S b/libgcc/config/aarch64/morestack.S

> new file mode 100644

> index 0000000..5bbac4c

> --- /dev/null

> +++ b/libgcc/config/aarch64/morestack.S

> @@ -0,0 +1,269 @@

> +# AArch64 support for -fsplit-stack.

> +# Copyright (C) 2016 Free Software Foundation, Inc.

> +

> +# This file is part of GCC.

> +

> +# GCC is free software; you can redistribute it and/or modify it under

> +# the terms of the GNU General Public License as published by the Free

> +# Software Foundation; either version 3, or (at your option) any later

> +# version.

> +

> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY

> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or

> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License

> +# for more details.

> +

> +# Under Section 7 of GPL version 3, you are granted additional

> +# permissions described in the GCC Runtime Library Exception, version

> +# 3.1, as published by the Free Software Foundation.

> +

> +# You should have received a copy of the GNU General Public License and

> +# a copy of the GCC Runtime Library Exception along with this program;

> +# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see

> +# <http://www.gnu.org/licenses/>.

> +

> +/* Define an entry point visible from C.  */

> +#define ENTRY(name)						\

> +  .globl name;							\

> +  .type name,%function;						\

> +  .align 4;							\

> +  name##:

> +

> +#define END(name)						\

> +  .size name,.-name

> +

> +

> +#define MORESTACK_FRAMESIZE	112

> +/* Offset from the function's incoming stack pointer (kept in x28) to

> +   the start of the __morestack frame.  */

> +#define STACKFRAME_BASE		(-MORESTACK_FRAMESIZE - 16)

> +/* Offset from __morestack frame where the new stack size is saved and

> +   passed to __generic_morestack.  */

> +#define NEWSTACK_SAVE		88

> +/* Offset from __morestack frame where the argument size is saved and

> +   passed to __generic_morestack.  */

> +#define ARGS_SIZE_SAVE		80

> +

> +#define BACKOFF			0x2000

> +# Large excess allocated when calling non-split-stack code.

> +#define NON_SPLIT_STACK		0x100000

> +

> +# TCB offset of __private_ss

> +#define TCB_PRIVATE_SS		#16

> +

> +	.text

> +ENTRY(__morestack_non_split)

> +	.cfi_startproc

> +# We use a cleanup to restore the tcbhead_t.__private_ss if

> +# an exception is thrown through this code.

> +	add	x11, x11, NON_SPLIT_STACK

> +	.cfi_endproc

> +END(__morestack_non_split)

> +# Fall through into __morestack

> +

> +# This function is called with non-standard calling conventions.  On entry

> +# x10 is the requested stack pointer.  The split-stack prologue is in the

> +# form:

> +#

> +#	mrs    x9, tpidr_el0

> +#	mov    x10, -<required stack allocation>  (followed by nop or movk)

> +#	add    x10, sp, x10

> +#	ldr    x9, [x9, 16]

> +#	cmp    x10, x9

> +#	b.cs   enough

> +#	str    x30, [sp, -16]!

> +#	mov    x11, <required arguments copy size>

> +#	bl     __morestack

> +#	ldr    x30, [sp], 16

> +#	ret

> +# enough:

> +#

> +# The normal function prologue follows here, with a small addition at the

> +# end to set up the argument pointer.  The argument pointer is set up with:

> +#

> +#	mov     x11, <required stack allocation>

> +#	sub	sp, sp, <required stack allocation>

> +#	add	x10, x29, x11

> +#	b.cs    function

> +#	mov     x10, x28

> +# function:

> +#

> +# Note that all argument registers (x0-x7) are saved and restored, and the

> +# argument pointer is passed back in x28.  The carry flag is also cleared

> +# (by comparing x30 with x30 + 8) to indicate that __morestack was called,

> +# so the prologue addition can set up the argument pointer correctly.

> +

> +ENTRY(__morestack)

> +.LFB1:

> +	.cfi_startproc

> +

> +#ifdef __PIC__

> +	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0

> +	.cfi_lsda 0x1b,.LLSDA1

> +#else

> +	.cfi_personality 0x3,__gcc_personality_v0

> +	.cfi_lsda 0x3,.LLSDA1

> +#endif

> +

> +	# Calculate requested stack size.

> +	sub	x12, sp, x10

> +	# Save parameters

> +	stp	x29, x30, [sp, -MORESTACK_FRAMESIZE]!

> +	.cfi_def_cfa_offset MORESTACK_FRAMESIZE

> +	.cfi_offset 29, -MORESTACK_FRAMESIZE

> +	.cfi_offset 30, -MORESTACK_FRAMESIZE+8

> +	add	x29, sp, 0

> +	.cfi_def_cfa_register 29

> +	# Adjust the requested stack size for the frame pointer save.

> +	add	x12, x12, 16

> +	stp	x0, x1, [sp, 16]

> +	stp	x2, x3, [sp, 32]

> +	add	x12, x12, BACKOFF

> +	stp	x4, x5, [sp, 48]

> +	stp	x6, x7, [sp, 64]

> +	stp	x11, x12, [sp, 80]

> +	str	x28, [sp, 96]

> +

> +	# Set up x28 with the function's incoming stack pointer.  Its value

> +	# will be copied to the function's argument pointer.

> +	add	x28, sp, MORESTACK_FRAMESIZE + 16

> +

> +	# void __morestack_block_signals (void)

> +	bl	__morestack_block_signals

> +

> +	# void *__generic_morestack (size_t *pframe_size,

> +	#			     void *old_stack,

> +	#			     size_t param_size)

> +	# pframe_size: points to the required stack frame size on entry and

> +	#	       to the space remaining on the allocated stack on exit.

> +	# old_stack: points at the parameters on the old stack.

> +	# param_size: size in bytes of parameters to copy to the new stack.

> +	add	x0, x28, STACKFRAME_BASE + NEWSTACK_SAVE

> +	mov	x1, x28

> +	ldr	x2, [sp, ARGS_SIZE_SAVE]

> +	bl	__generic_morestack

> +

> +	# Start using new stack

> +	stp	x29, x30, [x0, -16]!

> +	mov	sp, x0

> +

> +	# Set __private_ss stack guard for the new stack.

> +	ldr	x9, [x28, STACKFRAME_BASE + NEWSTACK_SAVE]

> +	add	x0, x0, BACKOFF

> +	sub	x0, x0, 16

> +	sub	x0, x0, x9

> +.LEHB0:

> +	mrs	x1, tpidr_el0

> +	str	x0, [x1, TCB_PRIVATE_SS]

> +

> +	# void __morestack_unblock_signals (void)

> +	bl	__morestack_unblock_signals

> +

> +	# Set up for a call to the target function.

> +	#ldp	x29, x30, [x28, STACKFRAME_BASE]

> +	ldr	x30, [x28, STACKFRAME_BASE + 8]

> +	ldp	x0, x1, [x28, STACKFRAME_BASE + 16]

> +	ldp	x2, x3, [x28, STACKFRAME_BASE + 32]

> +	ldp	x4, x5, [x28, STACKFRAME_BASE + 48]

> +	ldp	x6, x7, [x28, STACKFRAME_BASE + 64]

> +	add	x9, x30, 8

> +	cmp	x30, x9

> +	blr	x9

> +

> +	stp	x0, x1, [x28, STACKFRAME_BASE + 16]

> +	stp	x2, x3, [x28, STACKFRAME_BASE + 32]

> +	stp	x4, x5, [x28, STACKFRAME_BASE + 48]

> +	stp	x6, x7, [x28, STACKFRAME_BASE + 64]

> +

> +	bl	__morestack_block_signals

> +

> +	# void *__generic_releasestack (size_t *pavailable)

> +	add	x0, x28, STACKFRAME_BASE + NEWSTACK_SAVE

> +	bl	__generic_releasestack

> +

> +	# Reset __private_ss stack guard to value for old stack

> +	ldr	x9, [x28, STACKFRAME_BASE + NEWSTACK_SAVE]

> +	add	x0, x0, BACKOFF

> +	sub	x0, x0, x9

> +

> +	# Update TCB split stack field

> +.LEHE0:

> +	mrs	x1, tpidr_el0

> +	str	x0, [x1, TCB_PRIVATE_SS]

> +

> +	bl __morestack_unblock_signals

> +

> +	# Use old stack again.

> +	sub	sp, x28, 16

> +

> +	ldp	x0, x1, [x28, STACKFRAME_BASE + 16]

> +	ldp	x2, x3, [x28, STACKFRAME_BASE + 32]

> +	ldp	x4, x5, [x28, STACKFRAME_BASE + 48]

> +	ldp	x6, x7, [x28, STACKFRAME_BASE + 64]

> +	ldp	x29, x30, [x28, STACKFRAME_BASE]

> +	ldr	x28, [x28, STACKFRAME_BASE + 96]

> +

> +	.cfi_remember_state

> +	.cfi_restore 30

> +	.cfi_restore 29

> +	.cfi_def_cfa 31, 0

> +

> +	ret

> +

> +# This is the cleanup code called by the stack unwinder when

> +# unwinding through code between .LEHB0 and .LEHE0 above.

> +cleanup:

> +	.cfi_restore_state

> +	str	x0, [x28, STACKFRAME_BASE]

> +	# size_t __generic_findstack (void *stack)

> +	mov	x0, x28

> +	bl	__generic_findstack

> +	sub	x0, x28, x0

> +	add	x0, x0, BACKOFF

> +	# Restore tcbhead_t.__private_ss

> +	mrs	x1, tpidr_el0

> +	str	x0, [x1, TCB_PRIVATE_SS]

> +	ldr	x0, [x28, STACKFRAME_BASE]

> +	b	_Unwind_Resume

> +        .cfi_endproc

> +END(__morestack)

> +

> +	.section .gcc_except_table,"a",@progbits

> +	.align 4

> +.LLSDA1:

> +	# @LPStart format (omit)

> +        .byte   0xff

> +	# @TType format (omit)

> +        .byte   0xff

> +	# Call-site format (uleb128)

> +        .byte   0x1

> +	# Call-site table length

> +        .uleb128 .LLSDACSE1-.LLSDACSB1

> +.LLSDACSB1:

> +	# region 0 start

> +        .uleb128 .LEHB0-.LFB1

> +	# length

> +        .uleb128 .LEHE0-.LEHB0

> +	# landing pad

> +        .uleb128 cleanup-.LFB1

> +	# no action (ie a cleanup)

> +        .uleb128 0

> +.LLSDACSE1:

> +

> +

> +	.global __gcc_personality_v0

> +#ifdef __PIC__

> +	# Build a position independent reference to the personality function.

> +	.hidden DW.ref.__gcc_personality_v0

> +	.weak   DW.ref.__gcc_personality_v0

> +	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat

> +	.type   DW.ref.__gcc_personality_v0, @object

> +	.align 3

> +DW.ref.__gcc_personality_v0:

> +	.size   DW.ref.__gcc_personality_v0, 8

> +	.quad   __gcc_personality_v0

> +#endif

> +

> +	.section .note.GNU-stack,"",@progbits

> +	.section .note.GNU-split-stack,"",@progbits

> +	.section .note.GNU-no-split-stack,"",@progbits

> diff --git a/libgcc/config/aarch64/t-stack-aarch64 b/libgcc/config/aarch64/t-stack-aarch64

> new file mode 100644

> index 0000000..4babb4e

> --- /dev/null

> +++ b/libgcc/config/aarch64/t-stack-aarch64

> @@ -0,0 +1,3 @@

> +# Makefile fragment to support -fsplit-stack for aarch64.

> +LIB2ADD_ST += $(srcdir)/config/aarch64/morestack.S \

> +	      $(srcdir)/config/aarch64/morestack-c.c

> diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c

> index b8eec4e..fe7092b 100644

> --- a/libgcc/generic-morestack.c

> +++ b/libgcc/generic-morestack.c

> @@ -943,6 +943,7 @@ __splitstack_find (void *segment_arg, void *sp, size_t *len,

>        nsp -= 2 * 160;

>  #elif defined __s390__

>        nsp -= 2 * 96;

> +#elif defined __aarch64__

>  #else

>  #error "unrecognized target"

>  #endif

>
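As a cross-check of the offsets quoted above, the 112-byte frame that __morestack builds with "stp x29, x30, [sp, -MORESTACK_FRAMESIZE]!" can be modeled as a C struct. This is a sketch under stated assumptions: the struct, its field names and the helper are hypothetical, and the layout is inferred from the stp/str instructions in morestack.S (x29/x30 at 0, x0-x7 from 16, x11 at ARGS_SIZE_SAVE, x12 at NEWSTACK_SAVE, x28 at 96, and one unused slot that keeps the frame 16-byte aligned).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MORESTACK_FRAMESIZE 112
#define STACKFRAME_BASE (-MORESTACK_FRAMESIZE - 16)

/* Hypothetical C view of the frame allocated by __morestack.  */
struct morestack_frame
{
  uint64_t x29, x30;       /* offset 0: frame record */
  uint64_t args[8];        /* offsets 16-79: x0-x7 */
  uint64_t args_size;      /* offset 80 (ARGS_SIZE_SAVE): x11 */
  uint64_t new_stack_size; /* offset 88 (NEWSTACK_SAVE): x12 */
  uint64_t saved_x28;      /* offset 96: x28 as it was on entry */
  uint64_t pad;            /* offset 104: unused, keeps 16-byte alignment */
};

_Static_assert (offsetof (struct morestack_frame, args) == 16, "x0-x7 slot");
_Static_assert (offsetof (struct morestack_frame, args_size) == 80,
                "ARGS_SIZE_SAVE");
_Static_assert (offsetof (struct morestack_frame, new_stack_size) == 88,
                "NEWSTACK_SAVE");
_Static_assert (offsetof (struct morestack_frame, saved_x28) == 96,
                "x28 slot");
_Static_assert (sizeof (struct morestack_frame) == MORESTACK_FRAMESIZE,
                "MORESTACK_FRAMESIZE");

/* x28 = frame base + MORESTACK_FRAMESIZE + 16, so x28 + STACKFRAME_BASE + OFF
   addresses the same slot as frame base + OFF.  */
static uintptr_t
frame_slot_from_x28 (uintptr_t x28, ptrdiff_t off)
{
  return x28 + (uintptr_t) (STACKFRAME_BASE + off);
}
```

Because x28 is set to sp + MORESTACK_FRAMESIZE + 16, adding STACKFRAME_BASE brings it back to the frame base, which is why __morestack can keep addressing the saved registers as [x28, STACKFRAME_BASE + offset] after it has switched to the new stack.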

Patch

diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 08e7959..01c3239 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -106,6 +106,21 @@  aarch64_handle_option (struct gcc_options *opts,
     }
 }
 
+/* -fsplit-stack uses a TCB field available on glibc-2.25.  GLIBC also
+   exports a symbol, __tcb_private_ss, to signal that the field is
+   available in the TCB.  This aims to prevent binaries linked against
+   a newer GLIBC from running on unsupported ones.  */
+
+static bool
+aarch64_supports_split_stack (bool report ATTRIBUTE_UNUSED,
+			      struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  return true;
+}
+
+#undef TARGET_SUPPORTS_SPLIT_STACK
+#define TARGET_SUPPORTS_SPLIT_STACK aarch64_supports_split_stack
+
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -342,4 +357,3 @@  aarch64_rewrite_mcpu (int argc, const char **argv)
 }
 
 #undef AARCH64_CPU_NAME_LENGTH
-
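The comment above relies on glibc exporting __tcb_private_ss as a marker, which morestack-c.c references as a weak symbol. A minimal sketch of the general weak-reference pattern follows; the symbol name is hypothetical and not part of glibc. An undefined weak function resolves to a null address, so its presence can be tested. The patch instead calls the marker unconditionally from its constructor, so the intent appears to be to fail hard rather than to fall back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker; nothing in this program defines it, so the
   weak reference resolves to a null address at link time.  */
extern void hypothetical_tcb_marker (void) __attribute__ ((weak));

/* Return true if some linked object provides the marker.  */
static bool
runtime_has_marker (void)
{
  return hypothetical_tcb_marker != NULL;
}
```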
diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 5fcaa59..ab3208b 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -80,8 +80,6 @@ 
     }						\
   while (0)
 
-#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
-
 /* Uninitialized common symbols in non-PIE executables, even with
    strong definitions in dependent shared libraries, will resolve
    to COPY relocated symbol in the executable.  See PR65780.  */
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 3cdd69b..82a4e11 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -377,6 +377,8 @@  void aarch64_err_no_fpadvsimd (machine_mode, const char *);
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx);
 void aarch64_expand_prologue (void);
+void aarch64_expand_split_stack_prologue (void);
+void aarch64_split_stack_space_check (rtx, rtx);
 void aarch64_expand_vector_init (rtx, rtx);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
 				   const_tree, unsigned);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e56398a..2cf239f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3227,6 +3227,34 @@  aarch64_expand_prologue (void)
 	  RTX_FRAME_RELATED_P (insn) = 1;
 	}
     }
+
+  if (flag_split_stack && offset)
+    {
+      /* Set up the argument pointer (x10) for -fsplit-stack code.  If
+	 __morestack was called, it will have left the argument pointer
+	 to the old stack in x28.  Otherwise, the argument pointer is the
+	 top of the current frame.  */
+      rtx x10 = gen_rtx_REG (Pmode, R10_REGNUM);
+      rtx x11 = gen_rtx_REG (Pmode, R11_REGNUM);
+      rtx x28 = gen_rtx_REG (Pmode, R28_REGNUM);
+      rtx x29 = gen_rtx_REG (Pmode, R29_REGNUM);
+      rtx not_more = gen_label_rtx ();
+      rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+      rtx jump;
+
+      emit_move_insn (x11, GEN_INT (hard_fp_offset));
+      emit_insn (gen_add3_insn (x10, x29, x11));
+      jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
+				   gen_rtx_GEU (VOIDmode, cc_reg,
+						const0_rtx),
+				   gen_rtx_LABEL_REF (VOIDmode, not_more),
+				   pc_rtx);
+      jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
+      JUMP_LABEL (jump) = not_more;
+      LABEL_NUSES (not_more) += 1;
+      emit_move_insn (x10, x28);
+      emit_label (not_more);
+    }
 }
 
 /* Return TRUE if we can use a simple_return insn.
@@ -3303,6 +3331,7 @@  aarch64_expand_epilogue (bool for_sibcall)
       offset = offset - fp_offset;
     }
 
+
   if (offset > 0)
     {
       unsigned reg1 = cfun->machine->frame.wb_candidate1;
@@ -9648,7 +9677,7 @@  aarch64_expand_builtin_va_start (tree valist, rtx nextarg ATTRIBUTE_UNUSED)
   /* Emit code to initialize STACK, which points to the next varargs stack
      argument.  CUM->AAPCS_STACK_SIZE gives the number of stack words used
      by named arguments.  STACK is 8-byte aligned.  */
-  t = make_tree (TREE_TYPE (stack), virtual_incoming_args_rtx);
+  t = make_tree (TREE_TYPE (stack), crtl->args.internal_arg_pointer);
   if (cum->aapcs_stack_size > 0)
     t = fold_build_pointer_plus_hwi (t, cum->aapcs_stack_size * UNITS_PER_WORD);
   t = build2 (MODIFY_EXPR, TREE_TYPE (stack), stack, t);
@@ -14010,6 +14039,196 @@  aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
     }
 }
 
+/* -fsplit-stack support.  */
+
+/* A SYMBOL_REF for __morestack.  */
+static GTY(()) rtx morestack_ref;
+
+/* Emit -fsplit-stack prologue, which goes before the regular function
+   prologue.  */
+void
+aarch64_expand_split_stack_prologue (void)
+{
+  HOST_WIDE_INT frame_size, args_size;
+  rtx_code_label *ok_label = NULL;
+  rtx mem, ssvalue, compare, jump, insn, call_fusage;
+  rtx reg11, reg30, temp;
+  rtx new_cfa, cfi_ops = NULL;
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = 0x10;
+  int ninsn;
+
+  gcc_assert (flag_split_stack && reload_completed);
+
+  /* Limit the total stack allocation to 2G so that its value can be
+     materialized with at most two instructions (movn/movk).  The linker
+     may also edit these instructions to add extra space when split-stack
+     code calls non-split-stack functions.  */
+  frame_size = cfun->machine->frame.frame_size;
+  if (frame_size > ((HOST_WIDE_INT) 1 << 31))
+    {
+      sorry ("stack frame larger than 2G is not supported for %<-fsplit-stack%>");
+      return;
+    }
+
+  if (morestack_ref == NULL_RTX)
+    {
+      morestack_ref = gen_rtx_SYMBOL_REF (Pmode, "__morestack");
+      SYMBOL_REF_FLAGS (morestack_ref) |= (SYMBOL_FLAG_LOCAL
+					   | SYMBOL_FLAG_FUNCTION);
+    }
+
+  /* Load __private_ss from TCB.  */
+  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
+  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
+  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
+  emit_move_insn (ssvalue, mem);
+
+  temp = gen_rtx_REG (Pmode, R10_REGNUM);
+
+  /* Always emit two insns to calculate the requested stack, so the linker
+     can edit them when adjusting size for calling non-split-stack code.  */
+  ninsn = aarch64_internal_mov_immediate (temp, GEN_INT (-frame_size), true,
+					  Pmode);
+  gcc_assert (ninsn == 1 || ninsn == 2);
+  if (ninsn == 1)
+    emit_insn (gen_nop ());
+  emit_insn (gen_add3_insn (temp, stack_pointer_rtx, temp));
+
+  compare = aarch64_gen_compare_reg (LT, temp, ssvalue);
+
+  /* Jump past the __morestack call if the current __private_ss suffices.  */
+  ok_label = gen_label_rtx ();
+  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
+			       gen_rtx_GEU (VOIDmode, compare, const0_rtx),
+			       gen_rtx_LABEL_REF (VOIDmode, ok_label),
+			       pc_rtx);
+  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
+  JUMP_LABEL (jump) = ok_label;
+
+  /* Mark the jump as very likely to be taken.  */
+  add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE * 99 / 100);
+
+  call_fusage = NULL_RTX;
+
+  /* Call __morestack with a non-standard call procedure: x10 will hold
+     the requested stack pointer and x11 the required stack size to be
+     copied.  */
+  args_size = crtl->args.size >= 0 ? crtl->args.size : 0;
+  reg11 = gen_rtx_REG (DImode, R11_REGNUM);
+  emit_move_insn (reg11, GEN_INT (args_size));
+  use_reg (&call_fusage, reg11);
+
+  /* Set up a minimal frame to call __morestack.  The SP is not saved
+     in x29 beforehand, so in __morestack x29 still holds the caller's FP.  */
+  reg30 = gen_rtx_REG (Pmode, R30_REGNUM);
+  aarch64_pushwb_single_reg (Pmode, R30_REGNUM, 16);
+
+  insn = emit_call_insn (gen_call (gen_rtx_MEM (DImode, morestack_ref),
+				   const0_rtx, const0_rtx));
+  add_function_usage_to (insn, call_fusage);
+
+  cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg30, cfi_ops);
+  mem = plus_constant (Pmode, stack_pointer_rtx, 16);
+  cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, stack_pointer_rtx, cfi_ops);
+
+  mem = gen_rtx_POST_MODIFY (Pmode, stack_pointer_rtx, mem);
+  mem = gen_rtx_MEM (DImode, mem);
+  insn = emit_move_insn (reg30, mem);
+
+  new_cfa = stack_pointer_rtx;
+  new_cfa = plus_constant (Pmode, new_cfa, 16);
+  cfi_ops = alloc_reg_note (REG_CFA_DEF_CFA, new_cfa, cfi_ops);
+  REG_NOTES (insn) = cfi_ops;
+  RTX_FRAME_RELATED_P (insn) = 1;
+
+  emit_insn (gen_split_stack_return ());
+
+  emit_label (ok_label);
+  LABEL_NUSES (ok_label) = 1;
+}
+
+/* Implement TARGET_ASM_FILE_END.  */
+static void
+aarch64_file_end (void)
+{
+  file_end_indicate_exec_stack ();
+
+  if (flag_split_stack)
+    file_end_indicate_split_stack ();
+}
+
+/* Return the internal arg pointer used for function incoming arguments.  */
+static rtx
+aarch64_internal_arg_pointer (void)
+{
+  if (flag_split_stack)
+    {
+      if (cfun->machine->frame.split_stack_arg_pointer == NULL_RTX)
+	{
+	  rtx pat;
+
+	  cfun->machine->frame.split_stack_arg_pointer = gen_reg_rtx (Pmode);
+	  REG_POINTER (cfun->machine->frame.split_stack_arg_pointer) = 1;
+
+	  /* Put the pseudo initialization right after the note at the
+	     beginning of the function.  */
+	  pat = gen_rtx_SET (cfun->machine->frame.split_stack_arg_pointer,
+			     gen_rtx_REG (Pmode, R10_REGNUM));
+	  push_topmost_sequence ();
+	  emit_insn_after (pat, get_insns ());
+	  pop_topmost_sequence ();
+	}
+      return plus_constant (Pmode, cfun->machine->frame.split_stack_arg_pointer,
+			    FIRST_PARM_OFFSET (current_function_decl));
+    }
+  return virtual_incoming_args_rtx;
+}
+
+static void
+aarch64_live_on_entry (bitmap regs)
+{
+  if (flag_split_stack)
+    bitmap_set_bit (regs, R10_REGNUM);
+}
+
+/* Emit -fsplit-stack dynamic stack allocation space check.  */
+
+void
+aarch64_split_stack_space_check (rtx size, rtx label)
+{
+  rtx mem, ssvalue, compare, jump;
+  rtx requested = gen_reg_rtx (Pmode);
+  /* Offset from thread pointer to __private_ss.  */
+  int psso = 0x10;
+
+  /* Load __private_ss from TCB.  */
+  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
+  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
+  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
+  emit_move_insn (ssvalue, mem);
+
+  /* And compare it with the stack pointer minus the requested size.  */
+  if (CONST_INT_P (size))
+     emit_insn (gen_add3_insn (requested, stack_pointer_rtx,
+			       GEN_INT (-INTVAL (size))));
+  else
+    {
+      size = force_reg (Pmode, size);
+      emit_move_insn (requested, gen_rtx_MINUS (Pmode, stack_pointer_rtx,
+						size));
+    }
+
+  /* Jump to LABEL if the current __private_ss leaves enough space.  */
+  compare = aarch64_gen_compare_reg (LT, requested, ssvalue);
+  jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
+			       gen_rtx_GEU (VOIDmode, compare, const0_rtx),
+			       gen_rtx_LABEL_REF (VOIDmode, label),
+			       pc_rtx);
+  jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
+  JUMP_LABEL (jump) = label;
+}
+
 #undef TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST aarch64_address_cost
 
@@ -14036,6 +14255,9 @@  aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
 #undef TARGET_ASM_FILE_START
 #define TARGET_ASM_FILE_START aarch64_start_file
 
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END aarch64_file_end
+
 #undef TARGET_ASM_OUTPUT_MI_THUNK
 #define TARGET_ASM_OUTPUT_MI_THUNK aarch64_output_mi_thunk
 
@@ -14118,6 +14340,12 @@  aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
 
+#undef TARGET_EXTRA_LIVE_ON_ENTRY
+#define TARGET_EXTRA_LIVE_ON_ENTRY aarch64_live_on_entry
+
+#undef TARGET_INTERNAL_ARG_POINTER
+#define TARGET_INTERNAL_ARG_POINTER aarch64_internal_arg_pointer
+
 #undef TARGET_GIMPLE_FOLD_BUILTIN
 #define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
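The split-stack prologue always spends two instructions on the frame-size immediate (MOVN followed by MOVK, or MOVN plus a NOP) so that the linker can patch them later. As a sketch of why the 2G cap keeps two instructions sufficient: for 0 < F <= 2^31 the 64-bit value -F has all-ones in bits 63..32, so a MOVN that writes bits 31..16 (leaving every other bit set) followed by a MOVK of bits 15..0 reproduces it. The helper names are hypothetical and only model the arithmetic; aarch64_internal_mov_immediate may pick a different but equivalent sequence, for example a single MOVN when one suffices.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "movn x10, #hi, lsl #16; movk x10, #lo" for -frame_size.  */
struct movn_movk
{
  uint16_t movn_imm; /* immediate for MOVN at shift 16 */
  uint16_t movk_imm; /* immediate for MOVK at shift 0 */
};

/* Split -frame_size into the two immediates.  Assumes
   0 < frame_size <= 2^31, the limit enforced by the prologue.  */
static struct movn_movk
split_neg_frame_size (uint64_t frame_size)
{
  uint64_t v = (uint64_t) 0 - frame_size; /* two's complement -frame_size */
  struct movn_movk r;
  r.movn_imm = (uint16_t) ~(v >> 16);     /* MOVN writes the inverse */
  r.movk_imm = (uint16_t) v;              /* low 16 bits inserted by MOVK */
  return r;
}

/* Execute the modeled instruction pair.  */
static uint64_t
run_movn_movk (struct movn_movk r)
{
  uint64_t x = ~((uint64_t) r.movn_imm << 16);   /* MOVN Xd, #imm, LSL #16 */
  x = (x & ~(uint64_t) 0xffff) | r.movk_imm;     /* MOVK Xd, #imm */
  return x;
}
```

For example, a 16-byte frame gives a MOVN immediate of 0 and a MOVK immediate of 0xfff0.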
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 1915980..0ba3172 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -570,6 +570,9 @@  struct GTY (()) aarch64_frame
 
   HOST_WIDE_INT frame_size;
 
+  /* Alternative internal arg pointer for -fsplit-stack.  */
+  rtx split_stack_arg_pointer;
+
   bool laid_out;
 };
 
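aarch64.h gains split_stack_arg_pointer, the pseudo that aarch64_internal_arg_pointer initializes from x10. The x10 value comes from the prologue addition in aarch64_expand_prologue, which branches on the carry flag (b.cs, i.e. GEU) left by an earlier comparison. A hedged C model of that handshake follows; the function names are hypothetical. On the normal path the flags come from the split-stack check "cmp x10, x9", which sets C because the requested SP is at or above the guard; when __morestack re-enters the function it first compares x30 with x9 = x30 + 8, which clears C. The model assumes, as the patch does, that nothing between those comparisons and the b.cs changes the flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Carry flag after AArch64 "cmp a, b" (a - b): set when no borrow
   occurs, i.e. when a >= b as unsigned values.  */
static bool
cmp_sets_carry (uint64_t a, uint64_t b)
{
  return a >= b;
}

/* Prologue addition: "add x10, x29, x11; b.cs 1f; mov x10, x28; 1:",
   with x11 holding hard_fp_offset.  */
static uint64_t
select_arg_pointer (bool carry, uint64_t x29, uint64_t hard_fp_offset,
                    uint64_t x28)
{
  return carry ? x29 + hard_fp_offset : x28;
}
```

Comparing x30 against x30 + 8 always borrows (barring address wrap-around), so the re-entry path reliably falls through to "mov x10, x28".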
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 9e87a0d..8992608 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -130,6 +130,7 @@ 
     UNSPEC_VSTRUCTDUMMY
     UNSPEC_SP_SET
     UNSPEC_SP_TEST
+    UNSPEC_STACK_CHECK
     UNSPEC_RSQRT
     UNSPEC_RSQRTE
     UNSPEC_RSQRTS
@@ -144,6 +145,7 @@ 
     UNSPECV_SET_FPSR		; Represent assign of FPSR content.
     UNSPECV_BLOCKAGE		; Represent a blockage
     UNSPECV_PROBE_STACK_RANGE	; Represent stack range probing.
+    UNSPECV_SPLIT_STACK_RETURN  ; Represent a camouflaged return
   ]
 )
 
@@ -5394,3 +5396,33 @@ 
 
 ;; ldp/stp peephole patterns
 (include "aarch64-ldpstp.md")
+
+;; Handle -fsplit-stack
+(define_expand "split_stack_prologue"
+  [(const_int 0)]
+  ""
+{
+  aarch64_expand_split_stack_prologue ();
+  DONE;
+})
+
+;; A return instruction which the middle-end does not see.
+(define_insn "split_stack_return"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPLIT_STACK_RETURN)]
+  ""
+  "ret"
+  [(set_attr "type" "branch")])
+
+;; If there are operand 0 bytes available on the stack, jump to
+;; operand 1.
+(define_expand "split_stack_space_check"
+  [(set (match_dup 2) (compare:CC (match_dup 3) (match_dup 2)))
+   (set (pc) (if_then_else
+	      (geu (match_dup 4) (const_int 0))
+	      (label_ref (match_operand 1))
+	      (pc)))]
+  ""
+{
+  aarch64_split_stack_space_check (operands[0], operands[1]);
+  DONE;
+})
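split_stack_space_check handles dynamically sized allocations: aarch64_split_stack_space_check subtracts the requested size from SP and branches to the label when the result is unsigned greater than or equal to __private_ss. The C model below mirrors that comparison; the function name is hypothetical. Like the RTL, it does not guard against a size larger than the SP value itself, which would wrap around and compare as large.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of aarch64_split_stack_space_check: returns true when the
   branch to the "enough space" label would be taken.  */
static bool
split_stack_has_space (uint64_t sp, uint64_t size, uint64_t private_ss)
{
  uint64_t requested = sp - size; /* requested = sp - size */
  return requested >= private_ss; /* cmp requested, guard; b.cs (GEU) */
}
```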
diff --git a/gcc/testsuite/gcc.dg/split-3.c b/gcc/testsuite/gcc.dg/split-3.c
index 64bbb8c..5ba7616 100644
--- a/gcc/testsuite/gcc.dg/split-3.c
+++ b/gcc/testsuite/gcc.dg/split-3.c
@@ -40,6 +40,7 @@  down (int i, ...)
       || va_arg (ap, int) != 9
       || va_arg (ap, int) != 10)
     abort ();
+  va_end (ap);
 
   if (i > 0)
     {
diff --git a/gcc/testsuite/gcc.dg/split-6.c b/gcc/testsuite/gcc.dg/split-6.c
index b32cf8d..b3016ba 100644
--- a/gcc/testsuite/gcc.dg/split-6.c
+++ b/gcc/testsuite/gcc.dg/split-6.c
@@ -37,6 +37,7 @@  down (int i, ...)
       || va_arg (ap, int) != 9
       || va_arg (ap, int) != 10)
     abort ();
+  va_end (ap);
 
   if (i > 0)
     {
diff --git a/libgcc/config.host b/libgcc/config.host
index 4ccf25d..18f49f1 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -336,6 +336,7 @@  aarch64*-*-linux*)
 	md_unwind_header=aarch64/linux-unwind.h
 	tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
 	tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
+	tmake_file="${tmake_file} t-stack aarch64/t-stack-aarch64"
 	;;
 alpha*-*-linux*)
 	tmake_file="${tmake_file} alpha/t-alpha alpha/t-ieee t-crtfm alpha/t-linux"
diff --git a/libgcc/config/aarch64/morestack-c.c b/libgcc/config/aarch64/morestack-c.c
new file mode 100644
index 0000000..8df7895
--- /dev/null
+++ b/libgcc/config/aarch64/morestack-c.c
@@ -0,0 +1,95 @@ 
+/* AArch64 support for -fsplit-stack.
+ * Copyright (C) 2016 Free Software Foundation, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 3, or (at your option) any
+ * later version.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Under Section 7 of GPL version 3, you are granted additional
+ * permissions described in the GCC Runtime Library Exception, version
+ * 3.1, as published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License and
+ * a copy of the GCC Runtime Library Exception along with this program;
+ * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+ * <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef inhibit_libc
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include "generic-morestack.h"
+
+/* This is based on the GLIBC definition (version 2.24).  There is no need
+   to keep it in sync, since new fields are added at the end of the
+   structure and do not change the '__private_ss' offset.  */
+typedef struct
+{
+  void *dtv;
+  void *private;
+  void *__private_ss;
+} tcbhead_t;
+
+#define INITIAL_STACK_SIZE  0x4000
+#define BACKOFF             0x1000
+
+void __generic_morestack_set_initial_sp (void *sp, size_t len);
+void *__morestack_get_guard (void);
+void __morestack_set_guard (void *);
+void *__morestack_make_guard (void *stack, size_t size);
+void __morestack_load_mmap (void);
+
+/* We declare it as weak so that it fails either at link time or at
+   run time if GLIBC does not provide the required TCB field.  */
+extern void __tcb_private_ss (void) __attribute__ ((weak));
+
+/* Initialize the stack guard when the program starts or when a new
+   thread starts.  This is called from a constructor in the .ctors section.  */
+void
+__stack_split_initialize (void)
+{
+  __tcb_private_ss ();
+
+  register void* sp __asm__ ("sp");
+  tcbhead_t *tcb = ((tcbhead_t *) __builtin_thread_pointer ());
+  tcb->__private_ss = (void*)((uintptr_t)sp - INITIAL_STACK_SIZE);
+  return __generic_morestack_set_initial_sp (sp, INITIAL_STACK_SIZE);
+}
+
+/* Return current __private_ss.  */
+void *
+__morestack_get_guard (void)
+{
+  tcbhead_t *tcb = ((tcbhead_t *) __builtin_thread_pointer ());
+  return tcb->__private_ss;
+}
+
+/* Set __private_ss to ptr.  */
+void
+__morestack_set_guard (void *ptr)
+{
+  tcbhead_t *tcb = ((tcbhead_t *) __builtin_thread_pointer ());
+  tcb->__private_ss = ptr;
+}
+
+/* Return the stack guard value for given stack.  */
+void *
+__morestack_make_guard (void *stack, size_t size)
+{
+  return (void*)((uintptr_t)stack - size + BACKOFF);
+}
+
+/* Make __stack_split_initialize a high priority constructor.  */
+static void (*const ctors []) 
+  __attribute__ ((used, section (".ctors.65535"), aligned (sizeof (void *))))
+  = { __stack_split_initialize, __morestack_load_mmap };
+
+#endif /* !defined (inhibit_libc) */
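The guard stored in __private_ss is the lowest SP value a split-stack prologue may request without calling __morestack. As a sketch of the arithmetic in morestack-c.c (constants copied from the file; the helper names and sample addresses are hypothetical): the constructor installs a guard INITIAL_STACK_SIZE bytes below the SP it runs on, and __morestack_make_guard, reading STACK as the high end of a downward-growing segment of SIZE bytes, places the guard BACKOFF bytes above the segment's low end.

```c
#include <assert.h>
#include <stdint.h>

#define INITIAL_STACK_SIZE 0x4000
#define BACKOFF            0x1000

/* Guard installed by __stack_split_initialize for the initial stack.  */
static uintptr_t
initial_guard (uintptr_t sp)
{
  return sp - INITIAL_STACK_SIZE;
}

/* Same arithmetic as __morestack_make_guard.  */
static uintptr_t
make_guard (uintptr_t stack, uintptr_t size)
{
  return stack - size + BACKOFF;
}
```

Note that morestack.S defines its own BACKOFF of 0x2000 for the guard it computes in assembly, so the two files do not share a single headroom value.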
diff --git a/libgcc/config/aarch64/morestack.S b/libgcc/config/aarch64/morestack.S
new file mode 100644
index 0000000..5bbac4c
--- /dev/null
+++ b/libgcc/config/aarch64/morestack.S
@@ -0,0 +1,269 @@ 
+# AArch64 support for -fsplit-stack.
+# Copyright (C) 2016 Free Software Foundation, Inc.
+
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+
+# Under Section 7 of GPL version 3, you are granted additional
+# permissions described in the GCC Runtime Library Exception, version
+# 3.1, as published by the Free Software Foundation.
+
+# You should have received a copy of the GNU General Public License and
+# a copy of the GCC Runtime Library Exception along with this program;
+# see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+/* Define an entry point visible from C.  */
+#define ENTRY(name)						\
+  .globl name;							\
+  .type name,%function;						\
+  .align 4;							\
+  name##:
+
+#define END(name)						\
+  .size name,.-name
+
+
+#define MORESTACK_FRAMESIZE	112
+/* Offset from the function's incoming stack pointer (kept in x28) to
+   the start of the __morestack frame.  */
+#define STACKFRAME_BASE		(-MORESTACK_FRAMESIZE - 16)
+/* Offset from __morestack frame where the new stack size is saved and
+   passed to __generic_morestack.  */
+#define NEWSTACK_SAVE		88
+/* Offset from __morestack frame where the argument size is saved and
+   passed to __generic_morestack.  */
+#define ARGS_SIZE_SAVE		80
+
+#define BACKOFF			0x2000
+# Large excess allocated when calling non-split-stack code.
+#define NON_SPLIT_STACK		0x100000
+
+# TCB offset of __private_ss
+#define TCB_PRIVATE_SS		#16
+
+	.text
+ENTRY(__morestack_non_split)
+	.cfi_startproc
+# We use a cleanup to restore the tcbhead_t.__private_ss if
+# an exception is thrown through this code.
+	add	x11, x11, NON_SPLIT_STACK
+	.cfi_endproc
+END(__morestack_non_split)
+# Fall through into __morestack
+
+# This function is called with non-standard calling conventions.  On entry
+# x10 is the requested stack pointer.  The split-stack prologue is in the
+# form:
+#
+#	mrs    x9, tpidr_el0
+#	mov    x10, -<required stack allocation>
+#	add    x10, sp, x10
+#	ldr    x9, [x9, 16]
+#	cmp    x10, x9
+#	b.cs    enough
+#	stp    x29, x30, [sp, -16]!
+#	mov    x11, <required arguments copy size>
+#	bl     __morestack
+#	ldp    x29, x30, [sp], 16
+#	ret
+# enough:
+#
+# The normal function prologue follows here, with a small addition at the
+# end to set up the argument pointer.  The argument pointer is set up with:
+#
+#	mov     x11, <required stack allocation>
+#	sub	sp, sp, <required stack allocation>
+#	add	x10, x29, x11
+#	b.cs    function
+#	mov     x10, x28
+# function:
+#
+# Note that all argument registers (x0-x7) are saved and the old-stack
+# argument pointer is passed in x28.  Before calling the function, the
+# carry flag is cleared (cmp x30, x9 with x9 = x30 + 8) to indicate that
+# it is called from __morestack, so the prologue sets x10 from x28.
+
+ENTRY(__morestack)
+.LFB1:
+	.cfi_startproc
+
+#ifdef __PIC__
+	.cfi_personality 0x9b,DW.ref.__gcc_personality_v0
+	.cfi_lsda 0x1b,.LLSDA1
+#else
+	.cfi_personality 0x3,__gcc_personality_v0
+	.cfi_lsda 0x3,.LLSDA1
+#endif
+
+	# Calculate requested stack size.
+	sub	x12, sp, x10
+	# Save parameters
+	stp	x29, x30, [sp, -MORESTACK_FRAMESIZE]!
+	.cfi_def_cfa_offset MORESTACK_FRAMESIZE
+	.cfi_offset 29, -MORESTACK_FRAMESIZE
+	.cfi_offset 30, -MORESTACK_FRAMESIZE+8
+	add	x29, sp, 0
+	.cfi_def_cfa_register 29
+	# Adjust the requested stack size for the frame pointer save.
+	add	x12, x12, 16
+	stp	x0, x1, [sp, 16]
+	stp	x2, x3, [sp, 32]
+	add	x12, x12, BACKOFF
+	stp	x4, x5, [sp, 48]
+	stp	x6, x7, [sp, 64]
+	stp	x11, x12, [sp, 80]
+	str	x28, [sp, 96]
+
+	# Set up x28 with the function's initial frame pointer.  Its value
+	# will be copied to the function's argument pointer.
+	add	x28, sp, MORESTACK_FRAMESIZE + 16
+
+	# void __morestack_block_signals (void)
+	bl	__morestack_block_signals
+
+	# void *__generic_morestack (size_t *pframe_size,
+	#			     void *old_stack,
+	#			     size_t param_size)
+	# pframe_size: on entry, the size of the required stack frame; on
+	#	       return, the amount of space remaining on the allocated stack.
+	# old_stack: points at the parameters on the old stack.
+	# param_size: size in bytes of parameters to copy to the new stack.
+	add	x0, x28, STACKFRAME_BASE + NEWSTACK_SAVE
+	mov	x1, x28
+	ldr	x2, [sp, ARGS_SIZE_SAVE]
+	bl	__generic_morestack
+
+	# Start using new stack
+	stp	x29, x30, [x0, -16]!
+	mov	sp, x0
+
+	# Set __private_ss stack guard for the new stack.
+	ldr	x9, [x28, STACKFRAME_BASE + NEWSTACK_SAVE]
+	add	x0, x0, BACKOFF
+	sub	x0, x0, 16
+	sub	x0, x0, x9
+.LEHB0:
+	mrs	x1, tpidr_el0
+	str	x0, [x1, TCB_PRIVATE_SS]
+
+	# void __morestack_unblock_signals (void)
+	bl	__morestack_unblock_signals
+
+	# Set up for a call to the target function.
+	#ldp	x29, x30, [x28, STACKFRAME_BASE]
+	ldr	x30, [x28, STACKFRAME_BASE + 8]
+	ldp	x0, x1, [x28, STACKFRAME_BASE + 16]
+	ldp	x2, x3, [x28, STACKFRAME_BASE + 32]
+	ldp	x4, x5, [x28, STACKFRAME_BASE + 48]
+	ldp	x6, x7, [x28, STACKFRAME_BASE + 64]
+	add	x9, x30, 8
+	cmp	x30, x9
+	blr	x9
+
+	stp	x0, x1, [x28, STACKFRAME_BASE + 16]
+	stp	x2, x3, [x28, STACKFRAME_BASE + 32]
+	stp	x4, x5, [x28, STACKFRAME_BASE + 48]
+	stp	x6, x7, [x28, STACKFRAME_BASE + 64]
+
+	bl	__morestack_block_signals
+
+	# void *__generic_releasestack (size_t *pavailable)
+	add	x0, x28, STACKFRAME_BASE + NEWSTACK_SAVE
+	bl	__generic_releasestack
+
+	# Reset __private_ss stack guard to value for old stack
+	ldr	x9, [x28, STACKFRAME_BASE + NEWSTACK_SAVE]
+	add	x0, x0, BACKOFF
+	sub	x0, x0, x9
+
+	# Update TCB split stack field
+.LEHE0:
+	mrs	x1, tpidr_el0
+	str	x0, [x1, TCB_PRIVATE_SS]
+
+	bl	__morestack_unblock_signals
+
+	# Use old stack again.
+	sub	sp, x28, 16
+
+	ldp	x0, x1, [x28, STACKFRAME_BASE + 16]
+	ldp	x2, x3, [x28, STACKFRAME_BASE + 32]
+	ldp	x4, x5, [x28, STACKFRAME_BASE + 48]
+	ldp	x6, x7, [x28, STACKFRAME_BASE + 64]
+	ldp	x29, x30, [x28, STACKFRAME_BASE]
+	ldr	x28, [x28, STACKFRAME_BASE + 96]
+
+	.cfi_remember_state
+	.cfi_restore 30
+	.cfi_restore 29
+	.cfi_def_cfa 31, 0
+
+	ret
+
+# This is the cleanup code called by the stack unwinder when
+# unwinding through code between .LEHB0 and .LEHE0 above.
+cleanup:
+	.cfi_restore_state
+	str	x0, [x28, STACKFRAME_BASE]
+	# size_t __generic_findstack (void *stack)
+	mov	x0, x28
+	bl	__generic_findstack
+	sub	x0, x28, x0
+	add	x0, x0, BACKOFF
+	# Restore tcbhead_t.__private_ss
+	mrs	x1, tpidr_el0
+	str	x0, [x1, TCB_PRIVATE_SS]
+	ldr	x0, [x28, STACKFRAME_BASE]
+	b	_Unwind_Resume
+        .cfi_endproc
+END(__morestack)
+
+	.section .gcc_except_table,"a",@progbits
+	.align 4
+.LLSDA1:
+	# @LPStart format (omit)
+        .byte   0xff
+	# @TType format (omit)
+        .byte   0xff
+	# Call-site format (uleb128)
+        .byte   0x1
+	# Call-site table length
+        .uleb128 .LLSDACSE1-.LLSDACSB1
+.LLSDACSB1:
+	# region 0 start
+        .uleb128 .LEHB0-.LFB1
+	# length
+        .uleb128 .LEHE0-.LEHB0
+	# landing pad
+        .uleb128 cleanup-.LFB1
+	# no action (i.e. a cleanup)
+        .uleb128 0
+.LLSDACSE1:
+
+
+	.global __gcc_personality_v0
+#ifdef __PIC__
+	# Build a position independent reference to the personality function.
+	.hidden DW.ref.__gcc_personality_v0
+	.weak   DW.ref.__gcc_personality_v0
+	.section .data.DW.ref.__gcc_personality_v0,"awG",@progbits,DW.ref.__gcc_personality_v0,comdat
+	.type   DW.ref.__gcc_personality_v0, @object
+	.align 3
+DW.ref.__gcc_personality_v0:
+	.size   DW.ref.__gcc_personality_v0, 8
+	.quad   __gcc_personality_v0
+#endif
+
+	.section .note.GNU-stack,"",@progbits
+	.section .note.GNU-split-stack,"",@progbits
+	.section .note.GNU-no-split-stack,"",@progbits
diff --git a/libgcc/config/aarch64/t-stack-aarch64 b/libgcc/config/aarch64/t-stack-aarch64
new file mode 100644
index 0000000..4babb4e
--- /dev/null
+++ b/libgcc/config/aarch64/t-stack-aarch64
@@ -0,0 +1,3 @@ 
+# Makefile fragment to support -fsplit-stack for aarch64.
+LIB2ADD_ST += $(srcdir)/config/aarch64/morestack.S \
+	      $(srcdir)/config/aarch64/morestack-c.c
diff --git a/libgcc/generic-morestack.c b/libgcc/generic-morestack.c
index b8eec4e..fe7092b 100644
--- a/libgcc/generic-morestack.c
+++ b/libgcc/generic-morestack.c
@@ -943,6 +943,7 @@  __splitstack_find (void *segment_arg, void *sp, size_t *len,
       nsp -= 2 * 160;
 #elif defined __s390__
       nsp -= 2 * 96;
+#elif defined __aarch64__
 #else
 #error "unrecognized target"
 #endif