diff mbox

[RFC,AARCH64] TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook

Message ID 535F1061.90908@linaro.org
State Accepted
Headers show

Commit Message

Kugan Vivekanandarajah April 29, 2014, 2:37 a.m. UTC
On 28/04/14 21:01, Ramana Radhakrishnan wrote:
> On 04/26/14 11:57, Kugan wrote:
>> Attached patch implements TARGET_ATOMIC_ASSIGN_EXPAND_FENV for AARCH64.
>> With this, atomic test-case gcc.dg/atomic/c11-atomic-exec-5.c now PASS.
>>
>> This implementation is based on SPARC and i386 implementations.
>>
>> Regression tested on qemu-aarch64 for aarch64-none-linux-gnu with no new
>> regression. Is this OK for trunk?
> 
> Again like A32 please test on hardware to make sure this behaves
> correctly with c11-atomic-exec-5.c .
> 
> If you don't have access to hardware, let us know : we'll take it for a
> spin once you update the patch according to Marcus's comments.
> 

Thanks for the review. I have updated the patch. I also have updated
hold, clear and update to be exactly as in feholdexcpt.c, fclrexcpt.c
and feupdateenv.c of glibc/ports/sysdeps/aarch64/fpu.

I have limited real hardware access and just did a bootstrap and tested
c11-atomic-exec-5.c alone to make sure that it PASS. I have also
regression tested again on qemu-aarch64 for aarch64-none-linux-gnu with
no new regressions. I will appreciate if you could do the regression
testing on real hw.

As for the ARM version of the patch, I did test the previous version for
c11-atomic-exec-5.c and did verified it on chromebook before I posted
the match . I have now updated the patch based on your review and the
full bootstrap and regression testing is now under way. I will post the
patch once the results are available.

Thanks,
Kugan

+2014-04-29  Kugan Vivekanandarajah  <kuganv@linaro.org>
+
+	* config/aarch64/aarch64.c (TARGET_ATOMIC_ASSIGN_EXPAND_FENV): New
+	define.
+	* config/aarch64/aarch64-protos.h (aarch64_atomic_assign_expand_fenv):
+	New function declaration.
+	* config/aarch64/aarch64-builtins.c (aarch64_builtins) : Add
+	AARCH64_BUILTIN_GET_FPCR, AARCH64_BUILTIN_SET_FPCR.
+	AARCH64_BUILTIN_GET_FPSR and AARCH64_BUILTIN_SET_FPSR.
+	(aarch64_init_builtins) : Initialize builtins
+	__builtins_aarch64_set_fpcr, __builtins_aarch64_get_fpcr.
+	__builtins_aarch64_set_fpsr and __builtins_aarch64_get_fpsr.
+	(aarch64_expand_builtin) : Expand builtins __builtins_aarch64_set_fpcr
+	__builtins_aarch64_get_fpcr, __builtins_aarch64_get_fpsr,
+	and __builtins_aarch64_set_fpsr.
+	(aarch64_atomic_assign_expand_fenv): New function.
+	* config/aarch64/aarch64.md (set_fpcr): New pattern.
+	(get_fpcr) : Likewise.
+	(set_fpsr) : Likewise.
+	(get_fpsr) : Likewise.
+	(unspecv): Add UNSPECV_GET_FPCR and UNSPECV_SET_FPCR, UNSPECV_GET_FPSR
+	 and UNSPECV_SET_FPSR.
+	* doc/extend.texi (AARCH64 Built-in Functions) : Document
+	__builtins_aarch64_set_fpcr, __builtins_aarch64_get_fpcr.
+	__builtins_aarch64_set_fpsr and __builtins_aarch64_get_fpsr.

Comments

Marcus Shawcroft May 2, 2014, 10:06 a.m. UTC | #1
On 29 April 2014 03:37, Kugan <kugan.vivekanandarajah@linaro.org> wrote:
>
> On 28/04/14 21:01, Ramana Radhakrishnan wrote:
>> On 04/26/14 11:57, Kugan wrote:
>>> Attached patch implements TARGET_ATOMIC_ASSIGN_EXPAND_FENV for AARCH64.
>>> With this, atomic test-case gcc.dg/atomic/c11-atomic-exec-5.c now PASS.
>>>
>>> This implementation is based on SPARC and i386 implementations.
>>>
>>> Regression tested on qemu-aarch64 for aarch64-none-linux-gnu with no new
>>> regression. Is this OK for trunk?
>>
>> Again like A32 please test on hardware to make sure this behaves
>> correctly with c11-atomic-exec-5.c .
>>
>> If you don't have access to hardware, let us know : we'll take it for a
>> spin once you update the patch according to Marcus's comments.
>>
>
> Thanks for the review. I have updated the patch. I also have updated
> hold, clear and update to be exactly as in feholdexcpt.c, fclrexcpt.c
> and feupdateenv.c of glibc/ports/sysdeps/aarch64/fpu.
>

Kugan, I've not looked at the respin in detail yet, but it has just
occurred to me that the sequence used here to set FPCR is
insufficient.  The architecture reference manual requires that any
write to FPCR must be syncrhronized by a context synchronization
operation so we need to plant an ISB after the write.   Both the write
and ISB are likely to be expensive on some implementations so it would
be good to ensure that both the write and the isb are scheduled
independently.  IIRC there si

> I have limited real hardware access and just did a bootstrap and tested
> c11-atomic-exec-5.c alone to make sure that it PASS. I have also
> regression tested again on qemu-aarch64 for aarch64-none-linux-gnu with
> no new regressions. I will appreciate if you could do the regression
> testing on real hw.

Once the ISB issue is resolved I'll give the patch a spin on HW here.
/Marcus
Marcus Shawcroft May 2, 2014, 10:15 a.m. UTC | #2
On 2 May 2014 11:06, Marcus Shawcroft <marcus.shawcroft@gmail.com> wrote:

> Kugan, I've not looked at the respin in detail yet, but it has just
> occurred to me that the sequence used here to set FPCR is
> insufficient.  The architecture reference manual requires that any
> write to FPCR must be syncrhronized by a context synchronization
> operation so we need to plant an ISB after the write.   Both the write
> and ISB are likely to be expensive on some implementations so it would
> be good to ensure that both the write and the isb are scheduled
> independently.  IIRC there si

Sorry, incomplete sentence.  I had started to write that IIRC the same
issue did not apply to FPSCR in the ARM patch.  I have doubled checked
and the FPSCR does not have the issue therefore the ARM patch is fine
in this respect.

/Marcus
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 55cfe0a..5cdc978 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -371,6 +371,12 @@  static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
 enum aarch64_builtins
 {
   AARCH64_BUILTIN_MIN,
+
+  AARCH64_BUILTIN_GET_FPCR,
+  AARCH64_BUILTIN_SET_FPCR,
+  AARCH64_BUILTIN_GET_FPSR,
+  AARCH64_BUILTIN_SET_FPSR,
+
   AARCH64_SIMD_BUILTIN_BASE,
 #include "aarch64-simd-builtins.def"
   AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_BUILTIN_BASE
@@ -752,6 +758,24 @@  aarch64_init_simd_builtins (void)
 void
 aarch64_init_builtins (void)
 {
+  tree ftype_set_fpr
+    = build_function_type_list (void_type_node, unsigned_type_node, NULL);
+  tree ftype_get_fpr
+    = build_function_type_list (unsigned_type_node, NULL);
+
+  aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPCR]
+    = add_builtin_function ("__builtin_aarch64_get_fpcr", ftype_get_fpr,
+			    AARCH64_BUILTIN_GET_FPCR, BUILT_IN_MD, NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPCR]
+    = add_builtin_function ("__builtin_aarch64_set_fpcr", ftype_set_fpr,
+			    AARCH64_BUILTIN_SET_FPCR, BUILT_IN_MD, NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPSR]
+    = add_builtin_function ("__builtin_aarch64_get_fpsr", ftype_get_fpr,
+			    AARCH64_BUILTIN_GET_FPSR, BUILT_IN_MD, NULL, NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPSR]
+    = add_builtin_function ("__builtin_aarch64_set_fpsr", ftype_set_fpr,
+			    AARCH64_BUILTIN_SET_FPSR, BUILT_IN_MD, NULL, NULL_TREE);
+
   if (TARGET_SIMD)
     aarch64_init_simd_builtins ();
 }
@@ -964,6 +988,36 @@  aarch64_expand_builtin (tree exp,
 {
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   int fcode = DECL_FUNCTION_CODE (fndecl);
+  int icode;
+  rtx pat, op0;
+  tree arg0;
+
+  switch (fcode)
+    {
+    case AARCH64_BUILTIN_GET_FPCR:
+    case AARCH64_BUILTIN_SET_FPCR:
+    case AARCH64_BUILTIN_GET_FPSR:
+    case AARCH64_BUILTIN_SET_FPSR:
+      if ((fcode == AARCH64_BUILTIN_GET_FPCR)
+	  || (fcode == AARCH64_BUILTIN_GET_FPSR))
+	{
+	  icode = (fcode == AARCH64_BUILTIN_GET_FPSR) ?
+	    CODE_FOR_get_fpsr : CODE_FOR_get_fpcr;
+	  target = gen_reg_rtx (SImode);
+	  pat = GEN_FCN (icode) (target);
+	}
+      else
+	{
+	  target = NULL_RTX;
+	  icode = (fcode == AARCH64_BUILTIN_SET_FPSR) ?
+	    CODE_FOR_set_fpsr : CODE_FOR_set_fpcr;
+	  arg0 = CALL_EXPR_ARG (exp, 0);
+	  op0 = expand_normal (arg0);
+	  pat = GEN_FCN (icode) (op0);
+	}
+      emit_insn (pat);
+      return target;
+    }
 
   if (fcode >= AARCH64_SIMD_BUILTIN_BASE)
     return aarch64_simd_expand_builtin (fcode, exp, target);
@@ -1196,6 +1250,103 @@  aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   return changed;
 }
 
+void
+aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
+{
+  const unsigned FE_INVALID = 1;
+  const unsigned FE_DIVBYZERO = 2;
+  const unsigned FE_OVERFLOW = 4;
+  const unsigned FE_UNDERFLOW = 8;
+  const unsigned FE_INEXACT = 16;
+  const unsigned HOST_WIDE_INT FE_ALL_EXCEPT = (FE_INVALID | FE_DIVBYZERO
+						| FE_OVERFLOW | FE_UNDERFLOW
+						| FE_INEXACT);
+  const unsigned HOST_WIDE_INT FE_EXCEPT_SHIFT = 8;
+  tree fenv_cr, fenv_sr, get_fpcr, set_fpcr, mask_cr, mask_sr;
+  tree ld_fenv_cr, ld_fenv_sr, masked_fenv_cr, masked_fenv_sr, hold_fnclex_cr;
+  tree hold_fnclex_sr, tmp_var, reload_fenv, restore_fnenv, get_fpsr, set_fpsr;
+  tree update_call, atomic_feraiseexcept, hold_fnclex, masked_fenv, ld_fenv;
+
+  /* Generate the equivalence of :
+       unsigned int fenv_cr;
+       fenv_cr = __builtin_aarch64_get_fpcr ();
+
+       unsigned int fenv_sr;
+       fenv_sr = __builtin_aarch64_get_fpsr ();
+
+       Now set all exceptions to non-stop
+       unsigned int mask_cr = ~(FE_ALL_EXCEPT << FE_EXCEPT_SHIFT);
+       unsigned int masked_cr;
+       masked_cr = fenv_cr & mask_cr;
+
+       And clear all exception flags
+       unsigned int maske_sr = ~FE_ALL_EXCEPT;
+       unsigned int masked_cr;
+       masked_sr = fenv_sr & mask_sr;
+
+       __builtin_aarch64_set_cr (masked_cr);
+       __builtin_aarch64_set_sr (masked_sr);  */
+
+  fenv_cr = create_tmp_var (unsigned_type_node, NULL);
+  fenv_sr = create_tmp_var (unsigned_type_node, NULL);
+
+  get_fpcr = aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPCR];
+  set_fpcr = aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPCR];
+  get_fpsr = aarch64_builtin_decls[AARCH64_BUILTIN_GET_FPSR];
+  set_fpsr = aarch64_builtin_decls[AARCH64_BUILTIN_SET_FPSR];
+
+  mask_cr = build_int_cst (unsigned_type_node,
+			   ~(FE_ALL_EXCEPT << FE_EXCEPT_SHIFT));
+  mask_sr = build_int_cst (unsigned_type_node,
+			   ~(FE_ALL_EXCEPT));
+
+  ld_fenv_cr = build2 (MODIFY_EXPR, unsigned_type_node,
+		    fenv_cr, build_call_expr (get_fpcr, 0));
+  ld_fenv_sr = build2 (MODIFY_EXPR, unsigned_type_node,
+		    fenv_sr, build_call_expr (get_fpsr, 0));
+
+  masked_fenv_cr = build2 (BIT_AND_EXPR, unsigned_type_node, fenv_cr, mask_cr);
+  masked_fenv_sr = build2 (BIT_AND_EXPR, unsigned_type_node, fenv_sr, mask_sr);
+
+  hold_fnclex_cr = build_call_expr (set_fpcr, 1, masked_fenv_cr);
+  hold_fnclex_sr = build_call_expr (set_fpsr, 1, masked_fenv_sr);
+
+  hold_fnclex = build2 (COMPOUND_EXPR, void_type_node, hold_fnclex_cr,
+			hold_fnclex_sr);
+  masked_fenv = build2 (COMPOUND_EXPR, void_type_node, masked_fenv_cr,
+			masked_fenv_sr);
+  ld_fenv = build2 (COMPOUND_EXPR, void_type_node, ld_fenv_cr, ld_fenv_sr);
+
+  *hold = build2 (COMPOUND_EXPR, void_type_node,
+		  build2 (COMPOUND_EXPR, void_type_node, masked_fenv, ld_fenv),
+		  hold_fnclex);
+
+  /* Store the value of masked_fenv to clear the exceptions:
+     __builtin_aarch64_set_fpcr (masked_sr);  */
+
+  *clear = build_call_expr (set_fpsr, 1, masked_fenv_sr);
+
+  /* Generate the equivalent of :
+       unsigned int tmp2_var;
+       tmp_var = __builtin_aarch64_get_fpsr ();
+
+       __builtin_aarch64_set_fpsr (fenv_sr);
+
+       __atomic_feraiseexcept (tmp_var);  */
+
+  tmp_var = create_tmp_var (unsigned_type_node, NULL);
+  reload_fenv = build2 (MODIFY_EXPR, unsigned_type_node,
+			tmp_var, build_call_expr (get_fpsr, 0));
+  restore_fnenv = build_call_expr (set_fpsr, 1, fenv_sr);
+  atomic_feraiseexcept = builtin_decl_implicit (BUILT_IN_ATOMIC_FERAISEEXCEPT);
+  update_call = build_call_expr (atomic_feraiseexcept, 1,
+				 fold_convert (integer_type_node, tmp_var));
+  *update = build2 (COMPOUND_EXPR, void_type_node,
+		    build2 (COMPOUND_EXPR, void_type_node,
+			    reload_fenv, restore_fnenv), update_call);
+}
+
+
 #undef AARCH64_CHECK_BUILTIN_MODE
 #undef AARCH64_FIND_FRINT_VARIANT
 #undef BUILTIN_DX
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 5542f02..f4f3f61 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -289,4 +289,5 @@  extern void aarch64_split_combinev16qi (rtx operands[3]);
 extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
 extern bool
 aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
+void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a3147ee..fbbdc23 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8488,6 +8488,10 @@  aarch64_cannot_change_mode_class (enum machine_mode from,
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   aarch64_autovectorize_vector_sizes
 
+#undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
+#define TARGET_ATOMIC_ASSIGN_EXPAND_FENV \
+  aarch64_atomic_assign_expand_fenv
+
 /* Section anchor support.  */
 
 #undef TARGET_MIN_ANCHOR_OFFSET
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c86a29d..24f235f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -107,6 +107,10 @@ 
 
 (define_c_enum "unspecv" [
     UNSPECV_EH_RETURN		; Represent EH_RETURN
+    UNSPECV_GET_FPCR		; Represent fetch of FPCR content.
+    UNSPECV_SET_FPCR		; Represent assign of FPCR content.
+    UNSPECV_GET_FPSR		; Represent fetch of FPSR content.
+    UNSPECV_SET_FPSR		; Represent assign of FPSR content.
   ]
 )
 
@@ -3635,6 +3639,37 @@ 
   DONE;
 })
 
+;; Write Floating-point Control Register.
+(define_insn "set_fpcr"
+  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] UNSPECV_SET_FPCR)]
+  ""
+  "msr\\tfpcr, %0"
+  [(set_attr "type" "mrs")])
+
+;; Read Floating-point Control Register.
+(define_insn "get_fpcr"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_FPCR))]
+  ""
+  "mrs\\t%0, fpcr"
+  [(set_attr "type" "mrs")])
+
+;; Write Floating-point Status Register.
+(define_insn "set_fpsr"
+  [(unspec_volatile [(match_operand:SI 0 "register_operand" "r")] UNSPECV_SET_FPSR)]
+  ""
+  "msr\\tfpsr, %0"
+  [(set_attr "type" "mrs")])
+
+;; Read Floating-point Status Register.
+(define_insn "get_fpsr"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_FPSR))]
+  ""
+  "mrs\\t%0, fpsr"
+  [(set_attr "type" "mrs")])
+
+
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 347a94a..8bd13f3 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9107,6 +9107,7 @@  to those machines.  Generally these generate calls to specific machine
 instructions, but allow the compiler to schedule those calls.
 
 @menu
+* AARCH64 Built-in Functions::
 * Alpha Built-in Functions::
 * Altera Nios II Built-in Functions::
 * ARC Built-in Functions::
@@ -9139,6 +9140,18 @@  instructions, but allow the compiler to schedule those calls.
 * TILEPro Built-in Functions::
 @end menu
 
+@node AARCH64 Built-in Functions
+@subsection AARCH64 Built-in Functions
+
+These built-in functions are available for the AARCH64 family of
+processors.
+@smallexample
+unsigned int __builtin_aarch64_get_fpcr ()
+void __builtin_aarch64_set_fpcr (unsigned int)
+unsigned int __builtin_aarch64_get_fpsr ()
+void __builtin_aarch64_set_fpsr (unsigned int)
+@end smallexample
+
 @node Alpha Built-in Functions
 @subsection Alpha Built-in Functions