diff mbox

[4/4,ARM] Add attribute/pragma target fpu=

Message ID 55FBD3B4.9050709@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov Sept. 18, 2015, 9:04 a.m. UTC
On 15/09/15 11:47, Christian Bruel wrote:
>
> On 09/14/2015 04:30 PM, Christian Bruel wrote:
>> Finally, the final part of the patch set does the attribute target
>> parsing and checking, redefines the preprocessor macros and implements
>> the inlining rules.
>>
>> testcases and documentation included.
>>
> new version to remove a shadowed remnant piece of code.
>
>
>   > thanks
>   >
>   > Christian
>   >

+  /* OK to inline between different modes.
+     Function with mode specific instructions, e.g using asm,
+     must be explicitely protected with noinline.  */

s/explicitely/explicitly/


+  const struct arm_fpu_desc *fpu_desc1
+    = &all_fpus[caller_opts->x_arm_fpu_index];
+  const struct arm_fpu_desc *fpu_desc2
+    = &all_fpus[callee_opts->x_arm_fpu_index];

Please call these caller_fpu and callee_fpu, it's much easier to reason about the inlining rules that way

+
+  /* Can't inline NEON extension if the caller doesn't support it.  */
+  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_NEON)
+      && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_NEON))
+    return false;
+
+  /* Can't inline CRYPTO extension if the caller doesn't support it.  */
+  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_CRYPTO)
+      && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_CRYPTO))
+    return false;
+

We also need to take into account FPU_FL_FP16...
In general what we want is for the callee FPU features to be
a subset of the callers features, similar to the way we handle
the x_aarch64_isa_flags handling in aarch64_can_inline_p from the
aarch64 port. I think that's the way to go here rather than explicitly
writing down a check for each feature.

@@ -242,6 +239,8 @@
  
        /* Update macros.  */
        gcc_assert (cur_opt->x_target_flags == target_flags);
+      /* This one can be redefined by the pragma without warning.  */
+      cpp_undef (parse_in, "__ARM_FP");
        arm_cpu_builtins (parse_in);
  
Could you elaborate why the cpp_undef here?
If you want to undefine __ARM_FP so you can redefine it to a new value
in arm_cpu_builtins then I think you should just undefine it in that function.

Comments

Kyrylo Tkachov Oct. 8, 2015, 8:52 a.m. UTC | #1
Hi Christian,

On 21/09/15 14:43, Christian Bruel wrote:
> Hi Kyrill,
>
> Thanks for your comments. Answers interleaved and the new patch attached.
>
> On 09/18/2015 11:04 AM, Kyrill Tkachov wrote:
>> On 15/09/15 11:47, Christian Bruel wrote:
>>> On 09/14/2015 04:30 PM, Christian Bruel wrote:
>>>> Finally, the final part of the patch set does the attribute target
>>>> parsing and checking, redefines the preprocessor macros and implements
>>>> the inlining rules.
>>>>
>>>> testcases and documentation included.
>>>>
>>> new version to remove a shadowed remnant piece of code.
>>>
>>>
>>>     > thanks
>>>     >
>>>     > Christian
>>>     >
>> +  /* OK to inline between different modes.
>> +     Function with mode specific instructions, e.g using asm,
>> +     must be explicitely protected with noinline.  */
>>
>> s/explicitely/explicitly/
>>
> thanks
>
>> +  const struct arm_fpu_desc *fpu_desc1
>> +    = &all_fpus[caller_opts->x_arm_fpu_index];
>> +  const struct arm_fpu_desc *fpu_desc2
>> +    = &all_fpus[callee_opts->x_arm_fpu_index];
>>
>> Please call these caller_fpu and callee_fpu, it's much easier to reason about the inlining rules that way
> ok
>
>> +
>> +  /* Can't inline NEON extension if the caller doesn't support it.  */
>> +  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_NEON)
>> +      && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_NEON))
>> +    return false;
>> +
>> +  /* Can't inline CRYPTO extension if the caller doesn't support it.  */
>> +  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_CRYPTO)
>> +      && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_CRYPTO))
>> +    return false;
>> +
>>
>> We also need to take into account FPU_FL_FP16...
>> In general what we want is for the callee FPU features to be
>> a subset of the callers features, similar to the way we handle
>> the x_aarch64_isa_flags handling in aarch64_can_inline_p from the
>> aarch64 port. I think that's the way to go here rather than explicitly
>> writing down a check for each feature.
> ok, with FL_FP16 now,
>
>> @@ -242,6 +239,8 @@
>>
>>           /* Update macros.  */
>>           gcc_assert (cur_opt->x_target_flags == target_flags);
>> +      /* This one can be redefined by the pragma without warning.  */
>> +      cpp_undef (parse_in, "__ARM_FP");
>>           arm_cpu_builtins (parse_in);
>>
>> Could you elaborate why the cpp_undef here?
>> If you want to undefine __ARM_FP so you can redefine it to a new value
>> in arm_cpu_builtins then I think you should just undefine it in that function.
> This is to avoid a warning: "__ARM_FP" redefined when creating a new
> pragma scope. (See the test attr-crypto.c).
>
> We cannot call the cpp_undef inside arm_cpu_builtins, because it is also
> used for the TARGET_CPU_CPP_BUILTINS hook and then would prevent real
> illegitimate redefinitions.
>
> Alternatively, I thought to reset the warn_builtin_macro_redefined flag,
> but that doesn't work as the macro is not NODE_BUILTIN (see the
> definition of warn_of_redefinition in libcpp).
> We might need to change this later : should target macros be marked as
> NOTE_BUILTIN ? We can discuss this separately (I can open a defect) as
> we have the cpp_undep solution for now, if you agree.
>
>>
>> diff -ruN gnu_trunk.p3/gcc/gcc/doc/invoke.texi gnu_trunk.p4/gcc/gcc/doc/invoke.texi
>> --- gnu_trunk.p3/gcc/gcc/doc/invoke.texi	2015-09-10 12:21:00.698911244 +0200
>> +++ gnu_trunk.p4/gcc/gcc/doc/invoke.texi	2015-09-14 10:27:20.281932581 +0200
>> @@ -13360,6 +13363,8 @@
>>     floating-point arithmetic (in particular denormal values are treated as
>>     zero), so the use of NEON instructions may lead to a loss of precision.
>>
>> +You can also set the fpu name at function level by using the @code{target("mfpu=")} function attributes (@pxref{ARM Function Attributes}) or pragmas (@pxref{Function Specific Option Pragmas}).
>> +
>>
>> s/"mfpu="/"fpu="
>>
> thanks
>
>> --- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	1970-01-01 01:00:00.000000000 +0100
>> +++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	2015-09-14 16:12:08.449698268 +0200
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_neon_ok } */
>> +/* { dg-options "-O3 -mfloat-abi=softfp -ftree-vectorize" } */
>> +
>> +void
>> +f3(int n, int x[], int y[]) {
>> +  int i;
>> +  for (i = 0; i < n; ++i)
>> +    y[i] = x[i] << 3;
>> +}
>> +
>>
>> What if GCC has been configured with --with-fpu=neon?
>> Then f3 will be compiled assuming NEON. You should add a -mfpu=vfp to the dg-options.
> Ah yes. I've added ((target("fpu=vfp")) instead, since we are testing
> the attribute.
>

2015-05-26  Christian Bruel<christian.bruel@st.com>

	PR target/65837
	* config/arm/arm-c.c (arm_cpu_builtins): Set or reset
	__ARM_FEATURE_CRYPTO, __VFP_FP__, __ARM_NEON__
	(arm_pragma_target_parse): Change check for arm_cpu_builtins.
	undefine __ARM_FP.
	* config/arm/arm.c (arm_can_inline_p): Check FPUs.
	(arm_valid_target_attribute_rec): Handle -mfpu attribute target.
	* doc/invoke.texi (-mfpu=): Mention attribute and pragma.
	* doc/extend.texi (-mfpu=): Describe attribute.

2015-09-14  Christian Bruel<christian.bruel@st.com>

	PR target/65837
	gcc.target/arm/lto/pr65837_0.c
	gcc.target/arm/attr-neon2.c
	gcc.target/arm/attr-neon.c
	gcc.target/arm/attr-neon-builtin-fail.c
	gcc.target/arm/attr-crypto.c

The parts in this patch look ok to me.
However, I think we need some more functionality
In aarch64 we support compiling a file with no simd, including arm_neon.h and using arm_neon.h intrinsics
within functions tagged with simd support.
We want to support such functionality on arm i.e. compile a file with -mfpu=vfp and use arm_neon.h intrinsics
in a function tagged with an fpu=neon attribute.
For that we'd need to wrap the intrinsics in arm_neon.h in appropriate pragmas, like in the aarch64 version of arm_neon.h

Thanks,
Kyrill
Kyrylo Tkachov Nov. 13, 2015, 11:49 a.m. UTC | #2
Hi Christian,

On 12/11/15 14:54, Christian Bruel wrote:
> Hi Kyril,

>

>> ...

>> The parts in this patch look ok to me.

>> However, I think we need some more functionality

>> In aarch64 we support compiling a file with no simd, including arm_neon.h and using arm_neon.h intrinsics

>> within functions tagged with simd support.

>> We want to support such functionality on arm i.e. compile a file with -mfpu=vfp and use arm_neon.h intrinsics

>> in a function tagged with an fpu=neon attribute.

>> For that we'd need to wrap the intrinsics in arm_neon.h in appropriate pragmas, like in the aarch64 version of arm_neon.h

>

> As discussed, here is arm_neon.h for aarch32/neon with the same programming model than aarch64/simd. As you said lets use one of the fpu=neon attributes even if the file is compiled with -mfpu=vfp.

>

> The drawback for this is that now we unconditionally makes available every neon intrinsics, introducing a small legacy change with regards to error checking (that you didn't have with aarch64). Then it's worth to stress that:

>

>  - One cannot check #include "arm_neon.h" to check if the compiler can use neon instruction. Instead use #ifndef __ARM_NEON__. (Found in target-supports.exp)


Checking the macro is the 'canonical' way to check for NEON support,
so I reckon we can live with that.

>

>

>  - Types cannot be checked. For instance:

>

> #include <arm_neon.h>

>

> poly128_t

> foo (poly128_t* ptr)

> {

>   return vldrq_p128 (ptr);

> }

>

> compiled with -mfpu=neon used to be rejected with

>

>    error: unknown type name 'poly128_t' ...

>

>  Now the error, as a side effect from the inlining rules between incompatible modes, becomes

>

>   error: inlining failed in call to always_inline 'vldrq_p128': target specific option mismatch ...


Well, the previous message is misleading anyway since the user error there is not a type issue
but failure to specify the correct -mfpu option.

>

> I found this more confusing, so I was a little bit reluctant to implement this, but the code is correctly rejected and the message makes sense, after all. Just a different check.

>

> This patch applies on top of the preceding attribute/pragma target fpu= series. Tested with arm-none-eabi configured with default and --with-cpu=cortex-a9 --with-fp --with-float=hard


Do you mean --with-fpu=<something>?

>

> Also fixes a few macro that depends on fpu=, that I forgot to redefine.


Can you please split those changes into a separate patch and ChangeLog and commit the separately?
That part is preapproved.


This patch is ok then with above comment about splitting the arm-c.c changes separately.
Thanks for doing this!
I believe all patches in this series are approved then
so you can go ahead and start committing.

Kyrill

>

> Christian

>
diff mbox

Patch

diff -ruN gnu_trunk.p3/gcc/gcc/doc/invoke.texi gnu_trunk.p4/gcc/gcc/doc/invoke.texi
--- gnu_trunk.p3/gcc/gcc/doc/invoke.texi	2015-09-10 12:21:00.698911244 +0200
+++ gnu_trunk.p4/gcc/gcc/doc/invoke.texi	2015-09-14 10:27:20.281932581 +0200
@@ -13360,6 +13363,8 @@ 
  floating-point arithmetic (in particular denormal values are treated as
  zero), so the use of NEON instructions may lead to a loss of precision.
  
+You can also set the fpu name at function level by using the @code{target("mfpu=")} function attributes (@pxref{ARM Function Attributes}) or pragmas (@pxref{Function Specific Option Pragmas}).
+

s/"mfpu="/"fpu="


--- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	1970-01-01 01:00:00.000000000 +0100
+++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	2015-09-14 16:12:08.449698268 +0200
@@ -0,0 +1,26 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3 -mfloat-abi=softfp -ftree-vectorize" } */
+
+void
+f3(int n, int x[], int y[]) {
+  int i;
+  for (i = 0; i < n; ++i)
+    y[i] = x[i] << 3;
+}
+

What if GCC has been configured with --with-fpu=neon?
Then f3 will be compiled assuming NEON. You should add a -mfpu=vfp to the dg-options.