From patchwork Tue Oct 20 15:47:31 2015
X-Patchwork-Submitter: Kyrylo Tkachov
X-Patchwork-Id: 55312
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Message-ID: <56266213.80705@arm.com>
Date: Tue, 20 Oct 2015 16:47:31 +0100
From: Kyrill Tkachov
To: Marcus Shawcroft
CC: GCC Patches, Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
References: <5624F6B3.1030407@arm.com>
X-Original-Sender:
 kyrylo.tkachov@arm.com

On 20/10/15 16:26, Marcus Shawcroft wrote:
> On 19 October 2015 at 14:57, Kyrill Tkachov wrote:
>
>> 2015-10-19  Kyrylo Tkachov
>>
>>     * config/aarch64/aarch64.md
>>     (*aarch64_fcvt2_mult): New pattern.
>>     * config/aarch64/aarch64-simd.md
>>     (*aarch64_fcvt2_mult): Likewise.
>>     * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
>>     (aarch64_fpconst_pow_of_2): New function.
>>     (aarch64_vec_fpconst_pow_of_2): Likewise.
>>     * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
>>     prototype.
>>     (aarch64_vec_fpconst_pow_of_2): Likewise.
>>     * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
>>     (aarch64_fp_vec_pow2): Likewise.
>>
>> 2015-10-19  Kyrylo Tkachov
>>
>>     * gcc.target/aarch64/fmul_fcvt_1.c: New test.
>>     * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
>
> +    char buf[64];
> +    sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits);
>
> Prefer snprintf please.
>
> +  }
> +  [(set_attr "type" "neon_fp_to_int_")]
> +)
> +
> +
>
> Superfluous blank line here ?
>
> +      *cost += rtx_cost (XEXP (x, 0), VOIDmode,
> +                         (enum rtx_code) code, 0, speed);
>
> My understanding is the unnecessary use of enum is now discouraged,
> (rtx_code) is sufficient in this case.
>
> +  int count = CONST_VECTOR_NUNITS (x);
> +  int i;
> +  for (i = 1; i < count; i++)
>
> Push the int into the for initializer.
> Push the rhs of the count assignment into the for condition and drop
> the definition of count.

Ok, done.

>
> +/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */
>
> I'd prefer scan-assembler-times or do you have a particular reason to
> avoid it in these tests?

No reason.
Here's the patch updated as per your feedback.

How's this?

Thanks,
Kyrill

2015-10-20  Kyrylo Tkachov

    * config/aarch64/aarch64.md
    (*aarch64_fcvt2_mult): New pattern.
    * config/aarch64/aarch64-simd.md
    (*aarch64_fcvt2_mult): Likewise.
    * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns.
    (aarch64_fpconst_pow_of_2): New function.
    (aarch64_vec_fpconst_pow_of_2): Likewise.
    * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare
    prototype.
    (aarch64_vec_fpconst_pow_of_2): Likewise.
    * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate.
    (aarch64_fp_vec_pow2): Likewise.

2015-10-20  Kyrylo Tkachov

    * gcc.target/aarch64/fmul_fcvt_1.c: New test.
    * gcc.target/aarch64/fmul_fcvt_2.c: Likewise.

commit dc55192dd5f54f69d659ea6cc703c5fc2dc9b88b
Author: Kyrylo Tkachov
Date:   Thu Oct 8 15:17:47 2015 +0100

    [AArch64] Add fmul+fcvt optimisation

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a8ac8d3..309dcfb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -294,12 +294,14 @@ enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
+int aarch64_fpconst_pow_of_2 (rtx);
 machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
                                                   machine_mode);
 int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
 int aarch64_hard_regno_nregs (unsigned, machine_mode);
 int aarch64_simd_attr_length_move (rtx_insn *);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
+int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
 const char *aarch64_output_move_struct (rtx *operands);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 167277e..cf1ff6d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1654,6 +1654,26 @@ (define_insn "l2"
   [(set_attr "type" "neon_fp_to_int_")]
 )
 
+(define_insn "*aarch64_fcvt2_mult"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (FIXUORS: (unspec:
+                    [(mult:VDQF
+                       (match_operand:VDQF 1 "register_operand" "w")
+                       (match_operand:VDQF 2 "aarch64_fp_vec_pow2" ""))]
+                    UNSPEC_FRINTZ)))]
+  "TARGET_SIMD
+   && IN_RANGE (aarch64_vec_fpconst_pow_of_2 (operands[2]), 1,
+                GET_MODE_BITSIZE (GET_MODE_INNER (mode)))"
+  {
+    int fbits = aarch64_vec_fpconst_pow_of_2 (operands[2]);
+    char buf[64];
+    snprintf (buf, 64, "fcvtz\\t%%0., %%1., #%d", fbits);
+    output_asm_insn (buf, operands);
+    return "";
+  }
+  [(set_attr "type" "neon_fp_to_int_")]
+)
+
 (define_expand "2"
   [(set (match_operand: 0 "register_operand")
         (FIXUORS: (unspec:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e304d40..17e59e1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6791,6 +6791,19 @@ cost_plus:
             else
               *cost += extra_cost->fp[GET_MODE (x) == DFmode].toint;
           }
+
+          /* We can combine fmul by a power of 2 followed by a fcvt into a single
+             fixed-point fcvt.  */
+          if (GET_CODE (x) == MULT
+              && ((VECTOR_MODE_P (mode)
+                   && aarch64_vec_fpconst_pow_of_2 (XEXP (x, 1)) > 0)
+                  || aarch64_fpconst_pow_of_2 (XEXP (x, 1)) > 0))
+            {
+              *cost += rtx_cost (XEXP (x, 0), VOIDmode, (rtx_code) code,
+                                 0, speed);
+              return true;
+            }
+
           *cost += rtx_cost (x, VOIDmode, (enum rtx_code) code, 0, speed);
           return true;
 
@@ -13357,6 +13370,52 @@ aarch64_reorg (void)
 #undef TARGET_MACHINE_DEPENDENT_REORG
 #define TARGET_MACHINE_DEPENDENT_REORG aarch64_reorg
 
+
+/* If X is a positive CONST_DOUBLE with a value that is a power of 2
+   return the log2 of that value.  Otherwise return -1.  */
+
+int
+aarch64_fpconst_pow_of_2 (rtx x)
+{
+  const REAL_VALUE_TYPE *r;
+
+  if (!CONST_DOUBLE_P (x))
+    return -1;
+
+  r = CONST_DOUBLE_REAL_VALUE (x);
+
+  if (REAL_VALUE_NEGATIVE (*r)
+      || REAL_VALUE_ISNAN (*r)
+      || REAL_VALUE_ISINF (*r)
+      || !real_isinteger (r, DFmode))
+    return -1;
+
+  return exact_log2 (real_to_integer (r));
+}
+
+/* If X is a vector of equal CONST_DOUBLE values and that value is
+   Y, return the aarch64_fpconst_pow_of_2 of Y.  Otherwise return -1.  */
+
+int
+aarch64_vec_fpconst_pow_of_2 (rtx x)
+{
+  if (GET_CODE (x) != CONST_VECTOR)
+    return -1;
+
+  if (GET_MODE_CLASS (GET_MODE (x)) != MODE_VECTOR_FLOAT)
+    return -1;
+
+  int firstval = aarch64_fpconst_pow_of_2 (CONST_VECTOR_ELT (x, 0));
+  if (firstval <= 0)
+    return -1;
+
+  for (int i = 1; i < CONST_VECTOR_NUNITS (x); i++)
+    if (aarch64_fpconst_pow_of_2 (CONST_VECTOR_ELT (x, i)) != firstval)
+      return -1;
+
+  return firstval;
+}
+
 /* Implement TARGET_PROMOTED_TYPE to promote __fp16 to float.  */
 static tree
 aarch64_promoted_type (const_tree t)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a3ec371..35f9877 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4287,6 +4287,25 @@ (define_insn "l2"
   [(set_attr "type" "f_cvtf2i")]
 )
 
+(define_insn "*aarch64_fcvt2_mult"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (FIXUORS:GPI
+          (mult:GPF
+            (match_operand:GPF 1 "register_operand" "w")
+            (match_operand:GPF 2 "aarch64_fp_pow2" "F"))))]
+  "TARGET_FLOAT
+   && IN_RANGE (aarch64_fpconst_pow_of_2 (operands[2]), 1,
+                GET_MODE_BITSIZE (mode))"
+  {
+    int fbits = aarch64_fpconst_pow_of_2 (operands[2]);
+    char buf[64];
+    snprintf (buf, 64, "fcvtz\\t%%0, %%1, #%d", fbits);
+    output_asm_insn (buf, operands);
+    return "";
+  }
+  [(set_attr "type" "f_cvtf2i")]
+)
+
 ;; fma - no throw
 
 (define_insn "fma4"
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 8af4b81..1bcbf62 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -87,6 +87,13 @@ (define_predicate "aarch64_fp_compare_operand"
        (and (match_code "const_double")
             (match_test "aarch64_float_const_zero_rtx_p (op)"))))
 
+(define_predicate "aarch64_fp_pow2"
+  (and (match_code "const_double")
+       (match_test "aarch64_fpconst_pow_of_2 (op) > 0")))
+
+(define_predicate "aarch64_fp_vec_pow2"
+  (match_test "aarch64_vec_fpconst_pow_of_2 (op) > 0"))
+
 (define_predicate "aarch64_plus_immediate"
   (and (match_code "const_int")
        (ior (match_test "aarch64_uimm12_shift (INTVAL (op))")
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
new file mode 100644
index 0000000..4e3ace7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_1.c
@@ -0,0 +1,129 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -fno-inline" } */
+
+#define FUNC_DEFS(__a)        \
+int                           \
+sffoo##__a (float x)          \
+{                             \
+  return x * __a##.0f;        \
+}                             \
+                              \
+unsigned int                  \
+usffoo##__a (float x)         \
+{                             \
+  return x * __a##.0f;        \
+}                             \
+                              \
+long                          \
+lsffoo##__a (float x)         \
+{                             \
+  return x * __a##.0f;        \
+}                             \
+                              \
+unsigned long                 \
+ulsffoo##__a (float x)        \
+{                             \
+  return x * __a##.0f;        \
+}
+
+#define FUNC_DEFD(__a)        \
+long                          \
+dffoo##__a (double x)         \
+{                             \
+  return x * __a##.0;         \
+}                             \
+                              \
+unsigned long                 \
+udffoo##__a (double x)        \
+{                             \
+  return x * __a##.0;         \
+}                             \
+int                           \
+sdffoo##__a (double x)        \
+{                             \
+  return x * __a##.0;         \
+}                             \
+                              \
+unsigned int                  \
+usdffoo##__a (double x)       \
+{                             \
+  return x * __a##.0;         \
+}
+
+FUNC_DEFS (4)
+FUNC_DEFD (4)
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], s\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], d\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], d\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], s\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], s\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], d\[0-9\]*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], d\[0-9\]*.*#2" 1 } } */
+
+FUNC_DEFS (8)
+FUNC_DEFD (8)
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], s\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], s\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], d\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], d\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], s\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], s\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], d\[0-9\]*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], d\[0-9\]*.*#3" 1 } } */
+
+FUNC_DEFS (16)
+FUNC_DEFD (16)
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], s\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], s\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tx\[0-9\], d\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tw\[0-9\], d\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], s\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], s\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tx\[0-9\], d\[0-9\]*.*#4" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzu\tw\[0-9\], d\[0-9\]*.*#4" 1 } } */
+
+
+#define FUNC_TESTS(__a, __b)                                  \
+do                                                            \
+  {                                                           \
+    if (sffoo##__a (__b) != (int)(__b * __a))                 \
+      __builtin_abort ();                                     \
+    if (usffoo##__a (__b) != (unsigned int)(__b * __a))       \
+      __builtin_abort ();                                     \
+    if (lsffoo##__a (__b) != (long)(__b * __a))               \
+      __builtin_abort ();                                     \
+    if (ulsffoo##__a (__b) != (unsigned long)(__b * __a))     \
+      __builtin_abort ();                                     \
+  } while (0)
+
+#define FUNC_TESTD(__a, __b)                                  \
+do                                                            \
+  {                                                           \
+    if (dffoo##__a (__b) != (long)(__b * __a))                \
+      __builtin_abort ();                                     \
+    if (udffoo##__a (__b) != (unsigned long)(__b * __a))      \
+      __builtin_abort ();                                     \
+    if (sdffoo##__a (__b) != (int)(__b * __a))                \
+      __builtin_abort ();                                     \
+    if (usdffoo##__a (__b) != (unsigned int)(__b * __a))      \
+      __builtin_abort ();                                     \
+  } while (0)
+
+int
+main (void)
+{
+  float i;
+
+  for (i = -0.001; i < 32.0; i += 1.0f)
+    {
+      FUNC_TESTS (4, i);
+      FUNC_TESTS (8, i);
+      FUNC_TESTS (16, i);
+
+      FUNC_TESTD (4, i);
+      FUNC_TESTD (8, i);
+      FUNC_TESTD (16, i);
+    }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c
new file mode 100644
index 0000000..d8a9335
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fmul_fcvt_2.c
@@ -0,0 +1,67 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -ftree-vectorize -fno-inline" } */
+
+#define N 1024
+
+#define FUNC_DEF(__a)               \
+void                                \
+foo##__a (float *a, int *b)         \
+{                                   \
+  int i;                            \
+  for (i = 0; i < N; i++)           \
+    b[i] = a[i] * __a##.0f;         \
+}
+
+FUNC_DEF (4)
+FUNC_DEF (8)
+FUNC_DEF (16)
+
+int ints[N];
+float floats[N];
+
+void
+reset_ints (int *arr)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    arr[i] = 0;
+}
+
+void
+check_result (int *is, int n)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    if (is[i] != i * n)
+      __builtin_abort ();
+}
+
+#define FUNC_CHECK(__a)             \
+do                                  \
+  {                                 \
+    reset_ints (ints);              \
+    foo##__a (floats, ints);        \
+    check_result (ints, __a);       \
+  } while (0)
+
+
+int
+main (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    floats[i] = (float) i;
+
+  FUNC_CHECK (4);
+  FUNC_CHECK (8);
+  FUNC_CHECK (16);
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "fmul\tv\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tv\[0-9\].4s, v\[0-9\].4s*.*#2" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tv\[0-9\].4s, v\[0-9\].4s*.*#3" 1 } } */
+/* { dg-final { scan-assembler-times "fcvtzs\tv\[0-9\].4s, v\[0-9\].4s*.*#4" 1 } } */