[PATCH/AARCH64] Add scheduler for Thunderx2t99

Message ID CO2PR07MB2694BF7E5AAC0C4263D7EFC383660@CO2PR07MB2694.namprd07.prod.outlook.com
State New

Commit Message

Hurugalawadi, Naveen Jan. 12, 2017, 3:43 a.m.
Hi James,

The scheduling patch for vulcan was posted at the following link:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01205.html

We have been working on the patch and have addressed the review comments for thunderx2t99.

>> I tried lowering the repeat expressions as so:
Done.

>> split off the AdvSIMD/FP model from the main pipeline
Done.

>> A change like wiring the vulcan_f0 and vulcan_f1 reservations
>> to be cpu_units of a new define_automaton "vulcan_advsimd"
Done.

>> simplifying some of the remaining large expressions
>> (vulcan_asimd_load*_mult, vulcan_asimd_load*_elts) can bring the size down
I did not fully understand this comment.
Could you please explain what simplification you have in mind?

Please find attached the patch, modified as per your suggestions and comments.
Please review it and let us know whether it is okay.

Thanks,
Naveen

Comments

James Greenhalgh Jan. 13, 2017, 5:26 p.m. | #1
On Thu, Jan 12, 2017 at 03:43:52AM +0000, Hurugalawadi, Naveen wrote:
> Hi James,
>
> The scheduling patch for vulcan was posted at the following link:-
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01205.html
>
> We are working on the patch and addressed the comments for thunderx2t99.

Great, thanks.

>
> >> I tried lowering the repeat expressions as so:
> Done.
>
> >>split off the AdvSIMD/FP model from the main pipeline
> Done.
>
> >> A change like wiring the vulcan_f0 and vulcan_f1 reservations
> >> to be cpu_units of a new define_automaton "vulcan_advsimd"
> Done.

Perfect, the automaton size is much more palatable now.

  Automaton `thunderx2t99'
      184 NDFA states,            838 NDFA arcs
      184 DFA states,             838 DFA arcs
      184 minimal DFA states,     838 minimal DFA arcs
      370 all insns          8 insn equivalence classes
    0 locked states
   1016 transition comb vector els,  1472 trans table els: use simple vect
   1472 min delay table els, compression factor 4

  Automaton `thunderx2t99_advsimd'
    12231 NDFA states,          85833 NDFA arcs
    12231 DFA states,           85833 DFA arcs
     9246 minimal DFA states,   66554 minimal DFA arcs
      370 all insns         11 insn equivalence classes
    0 locked states
  84074 transition comb vector els, 101706 trans table els: use simple vect
  101706 min delay table els, compression factor 2

  Automaton `thunderx2t99_ldst'
       49 NDFA states,            193 NDFA arcs
       49 DFA states,             193 DFA arcs
       16 minimal DFA states,      94 minimal DFA arcs
      370 all insns         13 insn equivalence classes
    0 locked states
     91 transition comb vector els,   208 trans table els: use simple vect
    208 min delay table els, compression factor 2

  Automaton `thunderx2t99_mult'
        2 NDFA states,              5 NDFA arcs
        2 DFA states,               5 DFA arcs
        2 minimal DFA states,       5 minimal DFA arcs
      370 all insns          3 insn equivalence classes
    0 locked states
    6 transition comb vector els,     6 trans table els: use simple vect
    6 min delay table els, compression factor 8

>
> >> simplifying some of the remaining large expressions
> >> (vulcan_asimd_load*_mult, vulcan_asimd_load*_elts) can bring the size down
> Did not understand much about this comment.
> Can you please let me know about the simplification?

I wonder whether the current modeling of:

  (define_insn_reservation "thunderx2t99_asimd_load4_elts" 6

as:

  "thunderx2t99_ls_both*2,(thunderx2t99_ls0d1+thunderx2t99_ls1d1),\
   (thunderx2t99_ls0d2+thunderx2t99_ls1d2),\
   (thunderx2t99_ls0d3+thunderx2t99_ls1d3),thunderx2t99_f01")

actually benefits the schedule in a meaningful way, or whether it just increases
the size of the automaton. My guess is that, given how many approximations
scheduling makes about the actual workings of the machine (e.g. always perfect
alignment, no cache misses, no other reason to restart loads), there is only
a very small benefit to accurately describing the flow of an instruction
through the pipelines in this way.
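
To make the size concern concrete, here is a toy Python sketch. It is not genautomata and not part of the patch; it only counts the reachable reservation states of a single load pipe, once with the load occupying ls0 for one cycle and once with three extra delay pseudo-units behind it. The function name count_states and the unit names ls0, d1, d2 and d3 are illustrative assumptions.

```python
from collections import deque


def count_states(insns):
    """Count reachable states in a toy reservation model.

    `insns` maps an instruction name to a list of per-cycle unit sets.
    A state is the tuple of unit sets already reserved for future cycles.
    """
    start = ()
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        successors = [state[1:]]  # advance the clock by one cycle
        for resv in insns.values():
            overlap = min(len(resv), len(state))
            # An insn may issue only if no unit would be double-booked.
            if all(not (resv[i] & state[i]) for i in range(overlap)):
                length = max(len(resv), len(state))
                merged = tuple(
                    (state[i] if i < len(state) else frozenset())
                    | (resv[i] if i < len(resv) else frozenset())
                    for i in range(length)
                )
                successors.append(merged)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


# Hypothetical models of one load pipe (names are assumptions).
SIMPLE = {"load": [frozenset({"ls0"})]}
DETAILED = {
    "load": [frozenset({"ls0"}), frozenset({"d1"}),
             frozenset({"d2"}), frozenset({"d3"})],
}

print(count_states(SIMPLE), count_states(DETAILED))  # prints: 2 16
```

In this toy model each extra pseudo-unit cycle doubles the number of states, because the automaton has to remember whether a load was issued in each of the preceding cycles. Real descriptions with alternatives and several pipes grow in a less regular way, so the numbers here are only indicative.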

The size of the automaton is more reasonable now, so I won't insist on
changing it, but even with your changes the thunderx2t99 model is still
the largest automaton we're building.

> Please find attached the modified patch as per your suggestions and comments.
> Please review the patch and let us know if its okay?

I'd like for the size to come down again, or at least for you to comment on
whether the modelling you do for thunderx2t99_asimd_load*_mult and
thunderx2t99_asimd_load*_elts can be justified by an improved schedule.

> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index a7a4b33..4d39673 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -75,7 +75,7 @@ AARCH64_CORE("xgene1",      xgene1,    xgene1,    8A,  AARCH64_FL_FOR_ARCH8, xge
>  
>  /* Broadcom ('B') cores. */
>  AARCH64_CORE("thunderx2t99",  thunderx2t99, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)

You'll want to update this to use your new scheduling model :-).

> -AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
> +AARCH64_CORE("vulcan",  vulcan, vulcan, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)

This line is incorrect; you should be changing vulcan to use the new
thunderx2t99 model. The tune attribute "vulcan" won't match your new model,
so with this change you'd get "generic" (i.e. no) scheduling for -mcpu=vulcan.

Otherwise, I don't have any concerns with this patch, but it must be
retested, as the new scheduling model can never be used with this patch
version.
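
The matching problem described above can be illustrated with a small Python sketch. It is a simplified stand-in for the (eq_attr "tune" ...) tests in the pipeline description, not GCC's generated code; the table RESERVATIONS and the function find_reservation are hypothetical names, and only two of the patch's reservations are copied in.

```python
# Toy model of how a pipeline description is selected by the tune value.
# Each entry: (reservation name, tune value tested, insn types, latency).
# The two entries are taken from the thunderx2t99.md patch in this thread.
RESERVATIONS = [
    ("thunderx2t99_alu_basic", "thunderx2t99", {"alu_imm", "alu_sreg"}, 1),
    ("thunderx2t99_mul", "thunderx2t99", {"mul", "smull", "umull"}, 4),
]


def find_reservation(tune, insn_type):
    """Return (name, latency) of the first matching reservation, or None."""
    for name, wanted_tune, types, latency in RESERVATIONS:
        if tune == wanted_tune and insn_type in types:
            return name, latency
    return None  # nothing matches: the insn gets no modelled reservation


print(find_reservation("thunderx2t99", "mul"))  # ('thunderx2t99_mul', 4)
print(find_reservation("vulcan", "mul"))        # None
```

Every reservation in the new file tests for the tune value "thunderx2t99", so a core entry whose scheduler field is "vulcan" never selects any of them.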

Thanks,
James

Hurugalawadi, Naveen Feb. 2, 2017, 5:21 a.m. | #2
Hi James,

Thanks for reviewing the patch and for your comments.

>> I wonder whether the current modeling of:
>> (define_insn_reservation "thunderx2t99_asimd_load4_elts" 6
>> Actually benefits the schedule in a meaningful way, or if it just increases

Done. Removed the scheduler modeling of thunderx2t99_asimd_load*_mult and
thunderx2t99_asimd_load*_elts for ld3/ld4 and st3/st4, which are rarely used.

The automaton size has come down drastically without it and should
hopefully be okay now.
============================================================
Automaton `thunderx2t99'
      184 NDFA states,            838 NDFA arcs
      184 DFA states,             838 DFA arcs
      184 minimal DFA states,     838 minimal DFA arcs
      360 all insns          8 insn equivalence classes
    0 locked states
 1016 transition comb vector els,  1472 trans table els: use simple vect
 1472 min delay table els, compression factor 4

Automaton `thunderx2t99_advsimd'
      453 NDFA states,           1966 NDFA arcs
      453 DFA states,            1966 DFA arcs
      351 minimal DFA states,    1562 minimal DFA arcs
      360 all insns          7 insn equivalence classes
    0 locked states
 1901 transition comb vector els,  2457 trans table els: use simple vect
 2457 min delay table els, compression factor 2

Automaton `thunderx2t99_ldst'
       41 NDFA states,            163 NDFA arcs
       41 DFA states,             163 DFA arcs
       14 minimal DFA states,      78 minimal DFA arcs
      360 all insns          8 insn equivalence classes
    0 locked states
   83 transition comb vector els,   112 trans table els: use simple vect
  112 min delay table els, compression factor 4

Automaton `thunderx2t99_mult'
        2 NDFA states,              5 NDFA arcs
        2 DFA states,               5 DFA arcs
        2 minimal DFA states,       5 minimal DFA arcs
      360 all insns          3 insn equivalence classes
    0 locked states
    6 transition comb vector els,     6 trans table els: use simple vect
    6 min delay table els, compression factor 8
============================================================
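
For a quick comparison with the previous patch version, the following Python sketch tabulates the minimal DFA state counts from the two sets of automaton statistics quoted in this thread. The numbers are copied from those listings; the names BEFORE, AFTER and summarize are illustrative only.

```python
# Minimal DFA state counts copied from the automaton statistics in this thread.
BEFORE = {
    "thunderx2t99": 184,
    "thunderx2t99_advsimd": 9246,
    "thunderx2t99_ldst": 16,
    "thunderx2t99_mult": 2,
}
AFTER = {
    "thunderx2t99": 184,
    "thunderx2t99_advsimd": 351,
    "thunderx2t99_ldst": 14,
    "thunderx2t99_mult": 2,
}


def summarize(before, after):
    """Return (name, old, new, old/new ratio) for every automaton."""
    return [(name, before[name], after[name], before[name] / after[name])
            for name in before]


for name, old, new, ratio in summarize(BEFORE, AFTER):
    print(f"{name:22s} {old:6d} -> {new:5d}  (ratio {ratio:.1f})")
print("total:", sum(BEFORE.values()), "->", sum(AFTER.values()))
```

Almost all of the reduction comes from the thunderx2t99_advsimd automaton, which shrinks by a factor of roughly 26 in minimal DFA states.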

>> You'll want to update this to use your new scheduling model :-).

Done. I had overlooked it :-).

>> you should be changing vulcan to use the new thunderx2t99 model.

Done. Using the new thunderx2t99 model.

Please review the modified patch and let us know your comments.

Thanks,
Naveen

James Greenhalgh Feb. 2, 2017, 3:58 p.m. | #3
On Thu, Feb 02, 2017 at 05:21:05AM +0000, Hurugalawadi, Naveen wrote:
> Hi James,
>
> Thanks for reviewing the patch and comments.
>
> >> I wonder whether the current modeling of:
> >> (define_insn_reservation "thunderx2t99_asimd_load4_elts" 6
> >> Actually benefits the schedule in a meaningful way, or if it just increases
>
> Done. Removed the scheduler modeling for thunderx2t99_asimd_load*_mult and
> thunderx2t99_asimd_load*_elts for ld3/ld4 and st3/st4 which are rarely used.
>
> The automaton size has come down drastically without that and hopefully
> should be okay.
> ============================================================
> Automaton `thunderx2t99'
>       184 NDFA states,            838 NDFA arcs
>       184 DFA states,             838 DFA arcs
>       184 minimal DFA states,     838 minimal DFA arcs
>       360 all insns          8 insn equivalence classes
>     0 locked states
>  1016 transition comb vector els,  1472 trans table els: use simple vect
>  1472 min delay table els, compression factor 4
>
> Automaton `thunderx2t99_advsimd'
>       453 NDFA states,           1966 NDFA arcs
>       453 DFA states,            1966 DFA arcs
>       351 minimal DFA states,    1562 minimal DFA arcs
>       360 all insns          7 insn equivalence classes
>     0 locked states
>  1901 transition comb vector els,  2457 trans table els: use simple vect
>  2457 min delay table els, compression factor 2
>
> Automaton `thunderx2t99_ldst'
>        41 NDFA states,            163 NDFA arcs
>        41 DFA states,             163 DFA arcs
>        14 minimal DFA states,      78 minimal DFA arcs
>       360 all insns          8 insn equivalence classes
>     0 locked states
>    83 transition comb vector els,   112 trans table els: use simple vect
>   112 min delay table els, compression factor 4
>
> Automaton `thunderx2t99_mult'
>         2 NDFA states,              5 NDFA arcs
>         2 DFA states,               5 DFA arcs
>         2 minimal DFA states,       5 minimal DFA arcs
>       360 all insns          3 insn equivalence classes
>     0 locked states
>     6 transition comb vector els,     6 trans table els: use simple vect
>     6 min delay table els, compression factor 8
> ============================================================
>
> >> You'll want to update this to use your new scheduling model :-).
>
> Done. I had overlooked it :-).
>
> >> you should be changing vulcan to use the new thunderx2t99 model. 
>
> Done. Using the new thunderx2t99 model.

That looks much better.

I'm assuming you've tested this as appropriate for the subtargets you're
modifying and are comfortable with the level of risk in taking the patch at
this stage. As it only changes behaviour for the thunderx2t99 and vulcan
targets, I'd be happy to take the patch now, though please give
Richard/Marcus 24 hours to object.

OK if no objections from others in the next 24 hours.

Thanks,
James

> Please review the modified patch and let us know your comments on the same.
>
> Thanks,
> Naveen

> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index a7a4b33..1b958e3 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -74,8 +74,8 @@ AARCH64_CORE("xgene1",      xgene1,    xgene1,    8A,  AARCH64_FL_FOR_ARCH8, xge
>  /* V8.1 Architecture Processors.  */
>  
>  /* Broadcom ('B') cores. */
> -AARCH64_CORE("thunderx2t99",  thunderx2t99, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
> -AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
> +AARCH64_CORE("thunderx2t99",  thunderx2t99, thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
> +AARCH64_CORE("vulcan",  vulcan, thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
>  
>  /* V8 big.LITTLE implementations.  */
>  
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index a693a3b..7550c3e 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -225,6 +225,7 @@
>  (include "../arm/exynos-m1.md")
>  (include "thunderx.md")
>  (include "../arm/xgene1.md")
> +(include "thunderx2t99.md")
>  
>  ;; -------------------------------------------------------------------
>  ;; Jumps and other miscellaneous insns
> diff --git a/gcc/config/aarch64/thunderx2t99.md b/gcc/config/aarch64/thunderx2t99.md
> new file mode 100644
> index 0000000..0dd7199
> --- /dev/null
> +++ b/gcc/config/aarch64/thunderx2t99.md
> @@ -0,0 +1,443 @@
> +;; Cavium ThunderX 2 CN99xx pipeline description
> +;; Copyright (C) 2016-2017 Free Software Foundation, Inc.
> +;;
> +;; Contributed by Cavium, Broadcom and Mentor Embedded.
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software; you can redistribute it and/or modify
> +;; it under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation; either version 3, or (at your option)
> +;; any later version.
> +
> +;; GCC is distributed in the hope that it will be useful,
> +;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +;; GNU General Public License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING3.  If not see
> +;; <http://www.gnu.org/licenses/>.
> +
> +(define_automaton "thunderx2t99, thunderx2t99_advsimd, thunderx2t99_ldst")
> +(define_automaton "thunderx2t99_mult")
> +
> +(define_cpu_unit "thunderx2t99_i0" "thunderx2t99")
> +(define_cpu_unit "thunderx2t99_i1" "thunderx2t99")
> +(define_cpu_unit "thunderx2t99_i2" "thunderx2t99")
> +
> +(define_cpu_unit "thunderx2t99_ls0" "thunderx2t99_ldst")
> +(define_cpu_unit "thunderx2t99_ls1" "thunderx2t99_ldst")
> +(define_cpu_unit "thunderx2t99_sd" "thunderx2t99_ldst")
> +
> +; Pseudo-units for multiply pipeline.
> +
> +(define_cpu_unit "thunderx2t99_i1m1" "thunderx2t99_mult")
> +(define_cpu_unit "thunderx2t99_i1m2" "thunderx2t99_mult")
> +(define_cpu_unit "thunderx2t99_i1m3" "thunderx2t99_mult")
> +
> +; Pseudo-units for load delay (assuming dcache hit).
> +
> +(define_cpu_unit "thunderx2t99_ls0d1" "thunderx2t99_ldst")
> +(define_cpu_unit "thunderx2t99_ls0d2" "thunderx2t99_ldst")
> +(define_cpu_unit "thunderx2t99_ls0d3" "thunderx2t99_ldst")
> +
> +(define_cpu_unit "thunderx2t99_ls1d1" "thunderx2t99_ldst")
> +(define_cpu_unit "thunderx2t99_ls1d2" "thunderx2t99_ldst")
> +(define_cpu_unit "thunderx2t99_ls1d3" "thunderx2t99_ldst")
> +
> +; Make some aliases for f0/f1.
> +(define_cpu_unit "thunderx2t99_f0" "thunderx2t99_advsimd")
> +(define_cpu_unit "thunderx2t99_f1" "thunderx2t99_advsimd")
> +
> +(define_reservation "thunderx2t99_i012" "thunderx2t99_i0|thunderx2t99_i1|thunderx2t99_i2")
> +(define_reservation "thunderx2t99_ls01" "thunderx2t99_ls0|thunderx2t99_ls1")
> +(define_reservation "thunderx2t99_f01" "thunderx2t99_f0|thunderx2t99_f1")
> +
> +(define_reservation "thunderx2t99_ls_both" "thunderx2t99_ls0+thunderx2t99_ls1")
> +
> +; A load with delay in the ls0/ls1 pipes.
> +(define_reservation "thunderx2t99_l0delay" "thunderx2t99_ls0,\
> +				      thunderx2t99_ls0d1,thunderx2t99_ls0d2,\
> +				      thunderx2t99_ls0d3")
> +(define_reservation "thunderx2t99_l1delay" "thunderx2t99_ls1,\
> +				      thunderx2t99_ls1d1,thunderx2t99_ls1d2,\
> +				      thunderx2t99_ls1d3")
> +(define_reservation "thunderx2t99_l01delay" "thunderx2t99_l0delay|thunderx2t99_l1delay")
> +
> +;; Branch and call instructions.
> +
> +(define_insn_reservation "thunderx2t99_branch" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "call,branch"))
> +  "thunderx2t99_i2")
> +
> +;; Integer arithmetic/logic instructions.
> +
> +; Plain register moves are handled by renaming, and don't create any uops.
> +
> +(define_insn_reservation "thunderx2t99_regmove" 0
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "mov_reg"))
> +  "nothing")
> +
> +(define_insn_reservation "thunderx2t99_alu_basic" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "alu_imm,alu_sreg,alus_imm,alus_sreg,\
> +			adc_reg,adc_imm,adcs_reg,adcs_imm,\
> +			logic_reg,logic_imm,logics_reg,logics_imm,\
> +			csel,adr,mov_imm,shift_reg,shift_imm,bfm,\
> +			rbit,rev,extend,rotate_imm"))
> +  "thunderx2t99_i012")
> +
> +(define_insn_reservation "thunderx2t99_alu_shift" 2
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "alu_shift_imm,alu_ext,alu_shift_reg,\
> +			alus_shift_imm,alus_ext,alus_shift_reg,\
> +			logic_shift_imm,logics_shift_reg"))
> +  "thunderx2t99_i012,thunderx2t99_i012")
> +
> +(define_insn_reservation "thunderx2t99_div" 13
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "sdiv,udiv"))
> +  "thunderx2t99_i1*3")
> +
> +(define_insn_reservation "thunderx2t99_madd" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "mla,smlal,umlal"))
> +  "thunderx2t99_i1,thunderx2t99_i1m1,thunderx2t99_i1m2,thunderx2t99_i1m3,\
> +   thunderx2t99_i012")
> +
> +; NOTE: smull, umull are used for "high part" multiplies too.
> +(define_insn_reservation "thunderx2t99_mul" 4
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "mul,smull,umull"))
> +  "thunderx2t99_i1,thunderx2t99_i1m1,thunderx2t99_i1m2,thunderx2t99_i1m3")
> +
> +(define_insn_reservation "thunderx2t99_countbits" 3
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "clz"))
> +  "thunderx2t99_i1")
> +
> +;; Integer loads and stores.
> +
> +(define_insn_reservation "thunderx2t99_load_basic" 4
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "load1"))
> +  "thunderx2t99_ls01")
> +
> +(define_insn_reservation "thunderx2t99_loadpair" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "load2"))
> +  "thunderx2t99_i012,thunderx2t99_ls01")
> +
> +(define_insn_reservation "thunderx2t99_store_basic" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "store1"))
> +  "thunderx2t99_ls01,thunderx2t99_sd")
> +
> +(define_insn_reservation "thunderx2t99_storepair_basic" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "store2"))
> +  "thunderx2t99_ls01,thunderx2t99_sd")
> +
> +;; FP data processing instructions.
> +
> +(define_insn_reservation "thunderx2t99_fp_simple" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "ffariths,ffarithd,f_minmaxs,f_minmaxd"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_fp_addsub" 6
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "fadds,faddd"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_fp_cmp" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "fcmps,fcmpd"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_fp_divsqrt_s" 16
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "fdivs,fsqrts"))
> +  "thunderx2t99_f0*3|thunderx2t99_f1*3")
> +
> +(define_insn_reservation "thunderx2t99_fp_divsqrt_d" 23
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "fdivd,fsqrtd"))
> +  "thunderx2t99_f0*5|thunderx2t99_f1*5")
> +
> +(define_insn_reservation "thunderx2t99_fp_mul_mac" 6
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "fmuls,fmuld,fmacs,fmacd"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_frint" 7
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "f_rints,f_rintd"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_fcsel" 4
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "fcsel"))
> +  "thunderx2t99_f01")
> +
> +;; FP miscellaneous instructions.
> +
> +(define_insn_reservation "thunderx2t99_fp_cvt" 7
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "f_cvtf2i,f_cvt,f_cvti2f"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_fp_mov" 4
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "fconsts,fconstd,fmov,f_mrc"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_fp_mov_to_gen" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "f_mcr"))
> +  "thunderx2t99_f01")
> +
> +;; FP loads and stores.
> +
> +(define_insn_reservation "thunderx2t99_fp_load_basic" 4
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "f_loads,f_loadd"))
> +  "thunderx2t99_ls01")
> +
> +(define_insn_reservation "thunderx2t99_fp_loadpair_basic" 4
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_load1_2reg"))
> +  "thunderx2t99_ls01*2")
> +
> +(define_insn_reservation "thunderx2t99_fp_store_basic" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "f_stores,f_stored"))
> +  "thunderx2t99_ls01,thunderx2t99_sd")
> +
> +(define_insn_reservation "thunderx2t99_fp_storepair_basic" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_store1_2reg"))
> +  "thunderx2t99_ls01,(thunderx2t99_ls01+thunderx2t99_sd),thunderx2t99_sd")
> +
> +;; ASIMD integer instructions.
> +
> +(define_insn_reservation "thunderx2t99_asimd_int" 7
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_abd,neon_abd_q,\
> +			neon_arith_acc,neon_arith_acc_q,\
> +			neon_abs,neon_abs_q,\
> +			neon_add,neon_add_q,\
> +			neon_neg,neon_neg_q,\
> +			neon_add_long,neon_add_widen,\
> +			neon_add_halve,neon_add_halve_q,\
> +			neon_sub_long,neon_sub_widen,\
> +			neon_sub_halve,neon_sub_halve_q,\
> +			neon_add_halve_narrow_q,neon_sub_halve_narrow_q,\
> +			neon_qabs,neon_qabs_q,\
> +			neon_qadd,neon_qadd_q,\
> +			neon_qneg,neon_qneg_q,\
> +			neon_qsub,neon_qsub_q,\
> +			neon_minmax,neon_minmax_q,\
> +			neon_reduc_minmax,neon_reduc_minmax_q,\
> +			neon_mul_b,neon_mul_h,neon_mul_s,\
> +			neon_mul_b_q,neon_mul_h_q,neon_mul_s_q,\
> +			neon_sat_mul_b,neon_sat_mul_h,neon_sat_mul_s,\
> +			neon_sat_mul_b_q,neon_sat_mul_h_q,neon_sat_mul_s_q,\
> +			neon_mla_b,neon_mla_h,neon_mla_s,\
> +			neon_mla_b_q,neon_mla_h_q,neon_mla_s_q,\
> +			neon_mul_b_long,neon_mul_h_long,\
> +			neon_mul_s_long,neon_mul_d_long,\
> +			neon_sat_mul_b_long,neon_sat_mul_h_long,\
> +			neon_sat_mul_s_long,\
> +			neon_mla_b_long,neon_mla_h_long,neon_mla_s_long,\
> +			neon_sat_mla_b_long,neon_sat_mla_h_long,\
> +			neon_sat_mla_s_long,\
> +			neon_shift_acc,neon_shift_acc_q,\
> +			neon_shift_imm,neon_shift_imm_q,\
> +			neon_shift_reg,neon_shift_reg_q,\
> +			neon_shift_imm_long,neon_shift_imm_narrow_q,\
> +			neon_sat_shift_imm,neon_sat_shift_imm_q,\
> +			neon_sat_shift_reg,neon_sat_shift_reg_q,\
> +			neon_sat_shift_imm_narrow_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_reduc_add" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_reduc_add,neon_reduc_add_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_cmp" 7
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_compare,neon_compare_q,neon_compare_zero,\
> +			neon_tst,neon_tst_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_logic" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_logic,neon_logic_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_polynomial" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_mul_d_long"))
> +  "thunderx2t99_f01")
> +
> +;; ASIMD floating-point instructions.
> +
> +(define_insn_reservation "thunderx2t99_asimd_fp_simple" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_fp_abs_s,neon_fp_abs_d,\
> +			neon_fp_abs_s_q,neon_fp_abs_d_q,\
> +			neon_fp_compare_s,neon_fp_compare_d,\
> +			neon_fp_compare_s_q,neon_fp_compare_d_q,\
> +			neon_fp_minmax_s,neon_fp_minmax_d,\
> +			neon_fp_minmax_s_q,neon_fp_minmax_d_q,\
> +			neon_fp_reduc_minmax_s,neon_fp_reduc_minmax_d,\
> +			neon_fp_reduc_minmax_s_q,neon_fp_reduc_minmax_d_q,\
> +			neon_fp_neg_s,neon_fp_neg_d,\
> +			neon_fp_neg_s_q,neon_fp_neg_d_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_fp_arith" 6
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_fp_abd_s,neon_fp_abd_d,\
> +			neon_fp_abd_s_q,neon_fp_abd_d_q,\
> +			neon_fp_addsub_s,neon_fp_addsub_d,\
> +			neon_fp_addsub_s_q,neon_fp_addsub_d_q,\
> +			neon_fp_reduc_add_s,neon_fp_reduc_add_d,\
> +			neon_fp_reduc_add_s_q,neon_fp_reduc_add_d_q,\
> +			neon_fp_mul_s,neon_fp_mul_d,\
> +			neon_fp_mul_s_q,neon_fp_mul_d_q,\
> +			neon_fp_mla_s,neon_fp_mla_d,\
> +			neon_fp_mla_s_q,neon_fp_mla_d_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_fp_conv" 7
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_fp_cvt_widen_s,neon_fp_cvt_narrow_d_q,\
> +			neon_fp_to_int_s,neon_fp_to_int_d,\
> +			neon_fp_to_int_s_q,neon_fp_to_int_d_q,\
> +			neon_fp_round_s,neon_fp_round_d,\
> +			neon_fp_round_s_q,neon_fp_round_d_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_fp_div_s" 16
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_fp_div_s,neon_fp_div_s_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_fp_div_d" 23
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_fp_div_d,neon_fp_div_d_q"))
> +  "thunderx2t99_f01")
> +
> +;; ASIMD miscellaneous instructions.
> +
> +(define_insn_reservation "thunderx2t99_asimd_misc" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_rbit,\
> +			neon_bsl,neon_bsl_q,\
> +			neon_cls,neon_cls_q,\
> +			neon_cnt,neon_cnt_q,\
> +			neon_from_gp,neon_from_gp_q,\
> +			neon_dup,neon_dup_q,\
> +			neon_ext,neon_ext_q,\
> +			neon_ins,neon_ins_q,\
> +			neon_move,neon_move_q,\
> +			neon_fp_recpe_s,neon_fp_recpe_d,\
> +			neon_fp_recpe_s_q,neon_fp_recpe_d_q,\
> +			neon_fp_recpx_s,neon_fp_recpx_d,\
> +			neon_fp_recpx_s_q,neon_fp_recpx_d_q,\
> +			neon_rev,neon_rev_q,\
> +			neon_dup,neon_dup_q,\
> +			neon_permute,neon_permute_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_recip_step" 6
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_fp_recps_s,neon_fp_recps_s_q,\
> +			neon_fp_recps_d,neon_fp_recps_d_q,\
> +			neon_fp_rsqrts_s, neon_fp_rsqrts_s_q,\
> +			neon_fp_rsqrts_d, neon_fp_rsqrts_d_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_lut" 8
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_tbl1,neon_tbl1_q,neon_tbl2_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_elt_to_gr" 6
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_to_gp,neon_to_gp_q"))
> +  "thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_ext" 7
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_shift_imm_narrow_q,neon_sat_shift_imm_narrow_q"))
> +  "thunderx2t99_f01")
> +
> +;; ASIMD load instructions.
> +
> +; NOTE: These reservations attempt to model latency and throughput correctly,
> +; but the cycle timing of unit allocation is not necessarily accurate (because
> +; insns are split into uops, and those may be issued out-of-order).
> +
> +(define_insn_reservation "thunderx2t99_asimd_load1_1_mult" 4
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_load1_1reg,neon_load1_1reg_q"))
> +  "thunderx2t99_ls01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_load1_2_mult" 4
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_load1_2reg,neon_load1_2reg_q"))
> +  "thunderx2t99_ls_both")
> +
> +(define_insn_reservation "thunderx2t99_asimd_load1_onelane" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_load1_one_lane,neon_load1_one_lane_q"))
> +  "thunderx2t99_l01delay,thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_load1_all" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_load1_all_lanes,neon_load1_all_lanes_q"))
> +  "thunderx2t99_l01delay,thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_load2" 5
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_load2_2reg,neon_load2_2reg_q,\
> +			neon_load2_one_lane,neon_load2_one_lane_q,\
> +			neon_load2_all_lanes,neon_load2_all_lanes_q"))
> +  "(thunderx2t99_l0delay,thunderx2t99_f01)|(thunderx2t99_l1delay,\
> +    thunderx2t99_f01)")
> +
> +;; ASIMD store instructions.
> +
> +; Same note applies as for ASIMD load instructions.
> +
> +(define_insn_reservation "thunderx2t99_asimd_store1_1_mult" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_store1_1reg,neon_store1_1reg_q"))
> +  "thunderx2t99_ls01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_store1_2_mult" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_store1_2reg,neon_store1_2reg_q"))
> +  "thunderx2t99_ls_both")
> +
> +(define_insn_reservation "thunderx2t99_asimd_store1_onelane" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_store1_one_lane,neon_store1_one_lane_q"))
> +  "thunderx2t99_ls01,thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_store2_mult" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_store2_2reg,neon_store2_2reg_q"))
> +  "thunderx2t99_ls_both,thunderx2t99_f01")
> +
> +(define_insn_reservation "thunderx2t99_asimd_store2_onelane" 1
> +  (and (eq_attr "tune" "thunderx2t99")
> +       (eq_attr "type" "neon_store2_one_lane,neon_store2_one_lane_q"))
> +  "thunderx2t99_ls01,thunderx2t99_f01")

Patch

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index a7a4b33..4d39673 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -75,7 +75,7 @@  AARCH64_CORE("xgene1",      xgene1,    xgene1,    8A,  AARCH64_FL_FOR_ARCH8, xge
 
 /* Broadcom ('B') cores. */
 AARCH64_CORE("thunderx2t99",  thunderx2t99, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
-AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
+AARCH64_CORE("vulcan",  vulcan, vulcan, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
 
 /* V8 big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index bde4231..063559c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -220,6 +220,7 @@ 
 (include "../arm/exynos-m1.md")
 (include "thunderx.md")
 (include "../arm/xgene1.md")
+(include "thunderx2t99.md")
 
 ;; -------------------------------------------------------------------
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/thunderx2t99.md b/gcc/config/aarch64/thunderx2t99.md
new file mode 100644
index 0000000..00d40f8
--- /dev/null
+++ b/gcc/config/aarch64/thunderx2t99.md
@@ -0,0 +1,513 @@ 
+;; Cavium ThunderX 2 CN99xx pipeline description
+;; Copyright (C) 2016-2017 Free Software Foundation, Inc.
+;;
+;; Contributed by Cavium, Broadcom and Mentor Embedded.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_automaton "thunderx2t99, thunderx2t99_advsimd, thunderx2t99_ldst")
+(define_automaton "thunderx2t99_mult")
+
+(define_cpu_unit "thunderx2t99_i0" "thunderx2t99")
+(define_cpu_unit "thunderx2t99_i1" "thunderx2t99")
+(define_cpu_unit "thunderx2t99_i2" "thunderx2t99")
+
+(define_cpu_unit "thunderx2t99_ls0" "thunderx2t99_ldst")
+(define_cpu_unit "thunderx2t99_ls1" "thunderx2t99_ldst")
+(define_cpu_unit "thunderx2t99_sd" "thunderx2t99_ldst")
+
+; Pseudo-units for multiply pipeline.
+
+(define_cpu_unit "thunderx2t99_i1m1" "thunderx2t99_mult")
+(define_cpu_unit "thunderx2t99_i1m2" "thunderx2t99_mult")
+(define_cpu_unit "thunderx2t99_i1m3" "thunderx2t99_mult")
+
+; Pseudo-units for load delay (assuming dcache hit).
+
+(define_cpu_unit "thunderx2t99_ls0d1" "thunderx2t99_ldst")
+(define_cpu_unit "thunderx2t99_ls0d2" "thunderx2t99_ldst")
+(define_cpu_unit "thunderx2t99_ls0d3" "thunderx2t99_ldst")
+
+(define_cpu_unit "thunderx2t99_ls1d1" "thunderx2t99_ldst")
+(define_cpu_unit "thunderx2t99_ls1d2" "thunderx2t99_ldst")
+(define_cpu_unit "thunderx2t99_ls1d3" "thunderx2t99_ldst")
+
+; FP/AdvSIMD units f0/f1, kept in a separate automaton.
+(define_cpu_unit "thunderx2t99_f0" "thunderx2t99_advsimd")
+(define_cpu_unit "thunderx2t99_f1" "thunderx2t99_advsimd")
+
+(define_reservation "thunderx2t99_i012" "thunderx2t99_i0|thunderx2t99_i1|thunderx2t99_i2")
+(define_reservation "thunderx2t99_ls01" "thunderx2t99_ls0|thunderx2t99_ls1")
+(define_reservation "thunderx2t99_f01" "thunderx2t99_f0|thunderx2t99_f1")
+
+(define_reservation "thunderx2t99_ls_both" "thunderx2t99_ls0+thunderx2t99_ls1")
+
+; A load with delay in the ls0/ls1 pipes.
+(define_reservation "thunderx2t99_l0delay" "thunderx2t99_ls0,\
+				      thunderx2t99_ls0d1,thunderx2t99_ls0d2,\
+				      thunderx2t99_ls0d3")
+(define_reservation "thunderx2t99_l1delay" "thunderx2t99_ls1,\
+				      thunderx2t99_ls1d1,thunderx2t99_ls1d2,\
+				      thunderx2t99_ls1d3")
+(define_reservation "thunderx2t99_l01delay" "thunderx2t99_l0delay|thunderx2t99_l1delay")
+
+;; Branch and call instructions.
+
+(define_insn_reservation "thunderx2t99_branch" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "call,branch"))
+  "thunderx2t99_i2")
+
+;; Integer arithmetic/logic instructions.
+
+; Plain register moves are handled by renaming, and don't create any uops.
+
+(define_insn_reservation "thunderx2t99_regmove" 0
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "mov_reg"))
+  "nothing")
+
+(define_insn_reservation "thunderx2t99_alu_basic" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "alu_imm,alu_sreg,alus_imm,alus_sreg,\
+			adc_reg,adc_imm,adcs_reg,adcs_imm,\
+			logic_reg,logic_imm,logics_reg,logics_imm,\
+			csel,adr,mov_imm,shift_reg,shift_imm,bfm,\
+			rbit,rev,extend,rotate_imm"))
+  "thunderx2t99_i012")
+
+(define_insn_reservation "thunderx2t99_alu_shift" 2
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "alu_shift_imm,alu_ext,alu_shift_reg,\
+			alus_shift_imm,alus_ext,alus_shift_reg,\
+			logic_shift_imm,logics_shift_reg"))
+  "thunderx2t99_i012,thunderx2t99_i012")
+
+(define_insn_reservation "thunderx2t99_div" 13
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "sdiv,udiv"))
+  "thunderx2t99_i1*3")
+
+(define_insn_reservation "thunderx2t99_madd" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "mla,smlal,umlal"))
+  "thunderx2t99_i1,thunderx2t99_i1m1,thunderx2t99_i1m2,thunderx2t99_i1m3,\
+   thunderx2t99_i012")
+
+; NOTE: smull, umull are used for "high part" multiplies too.
+(define_insn_reservation "thunderx2t99_mul" 4
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "mul,smull,umull"))
+  "thunderx2t99_i1,thunderx2t99_i1m1,thunderx2t99_i1m2,thunderx2t99_i1m3")
+
+(define_insn_reservation "thunderx2t99_countbits" 3
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "clz"))
+  "thunderx2t99_i1")
+
+;; Integer loads and stores.
+
+(define_insn_reservation "thunderx2t99_load_basic" 4
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "load1"))
+  "thunderx2t99_ls01")
+
+(define_insn_reservation "thunderx2t99_loadpair" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "load2"))
+  "thunderx2t99_i012,thunderx2t99_ls01")
+
+(define_insn_reservation "thunderx2t99_store_basic" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "store1"))
+  "thunderx2t99_ls01,thunderx2t99_sd")
+
+(define_insn_reservation "thunderx2t99_storepair_basic" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "store2"))
+  "thunderx2t99_ls01,thunderx2t99_sd")
+
+;; FP data processing instructions.
+
+(define_insn_reservation "thunderx2t99_fp_simple" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "ffariths,ffarithd,f_minmaxs,f_minmaxd"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_fp_addsub" 6
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "fadds,faddd"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_fp_cmp" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "fcmps,fcmpd"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_fp_divsqrt_s" 16
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "fdivs,fsqrts"))
+  "thunderx2t99_f0*3|thunderx2t99_f1*3")
+
+(define_insn_reservation "thunderx2t99_fp_divsqrt_d" 23
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "fdivd,fsqrtd"))
+  "thunderx2t99_f0*5|thunderx2t99_f1*5")
+
+(define_insn_reservation "thunderx2t99_fp_mul_mac" 6
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "fmuls,fmuld,fmacs,fmacd"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_frint" 7
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "f_rints,f_rintd"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_fcsel" 4
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "fcsel"))
+  "thunderx2t99_f01")
+
+;; FP miscellaneous instructions.
+
+(define_insn_reservation "thunderx2t99_fp_cvt" 7
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "f_cvtf2i,f_cvt,f_cvti2f"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_fp_mov" 4
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "fconsts,fconstd,fmov,f_mrc"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_fp_mov_to_gen" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "f_mcr"))
+  "thunderx2t99_f01")
+
+;; FP loads and stores.
+
+(define_insn_reservation "thunderx2t99_fp_load_basic" 4
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "f_loads,f_loadd"))
+  "thunderx2t99_ls01")
+
+(define_insn_reservation "thunderx2t99_fp_loadpair_basic" 4
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load1_2reg"))
+  "thunderx2t99_ls01*2")
+
+(define_insn_reservation "thunderx2t99_fp_store_basic" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "f_stores,f_stored"))
+  "thunderx2t99_ls01,thunderx2t99_sd")
+
+(define_insn_reservation "thunderx2t99_fp_storepair_basic" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store1_2reg"))
+  "thunderx2t99_ls01,(thunderx2t99_ls01+thunderx2t99_sd),thunderx2t99_sd")
+
+;; ASIMD integer instructions.
+
+(define_insn_reservation "thunderx2t99_asimd_int" 7
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_abd,neon_abd_q,\
+			neon_arith_acc,neon_arith_acc_q,\
+			neon_abs,neon_abs_q,\
+			neon_add,neon_add_q,\
+			neon_neg,neon_neg_q,\
+			neon_add_long,neon_add_widen,\
+			neon_add_halve,neon_add_halve_q,\
+			neon_sub_long,neon_sub_widen,\
+			neon_sub_halve,neon_sub_halve_q,\
+			neon_add_halve_narrow_q,neon_sub_halve_narrow_q,\
+			neon_qabs,neon_qabs_q,\
+			neon_qadd,neon_qadd_q,\
+			neon_qneg,neon_qneg_q,\
+			neon_qsub,neon_qsub_q,\
+			neon_minmax,neon_minmax_q,\
+			neon_reduc_minmax,neon_reduc_minmax_q,\
+			neon_mul_b,neon_mul_h,neon_mul_s,\
+			neon_mul_b_q,neon_mul_h_q,neon_mul_s_q,\
+			neon_sat_mul_b,neon_sat_mul_h,neon_sat_mul_s,\
+			neon_sat_mul_b_q,neon_sat_mul_h_q,neon_sat_mul_s_q,\
+			neon_mla_b,neon_mla_h,neon_mla_s,\
+			neon_mla_b_q,neon_mla_h_q,neon_mla_s_q,\
+			neon_mul_b_long,neon_mul_h_long,\
+			neon_mul_s_long,neon_mul_d_long,\
+			neon_sat_mul_b_long,neon_sat_mul_h_long,\
+			neon_sat_mul_s_long,\
+			neon_mla_b_long,neon_mla_h_long,neon_mla_s_long,\
+			neon_sat_mla_b_long,neon_sat_mla_h_long,\
+			neon_sat_mla_s_long,\
+			neon_shift_acc,neon_shift_acc_q,\
+			neon_shift_imm,neon_shift_imm_q,\
+			neon_shift_reg,neon_shift_reg_q,\
+			neon_shift_imm_long,neon_shift_imm_narrow_q,\
+			neon_sat_shift_imm,neon_sat_shift_imm_q,\
+			neon_sat_shift_reg,neon_sat_shift_reg_q,\
+			neon_sat_shift_imm_narrow_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_reduc_add" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_reduc_add,neon_reduc_add_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_cmp" 7
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_compare,neon_compare_q,neon_compare_zero,\
+			neon_tst,neon_tst_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_logic" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_logic,neon_logic_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_polynomial" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_mul_d_long"))
+  "thunderx2t99_f01")
+
+;; ASIMD floating-point instructions.
+
+(define_insn_reservation "thunderx2t99_asimd_fp_simple" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_fp_abs_s,neon_fp_abs_d,\
+			neon_fp_abs_s_q,neon_fp_abs_d_q,\
+			neon_fp_compare_s,neon_fp_compare_d,\
+			neon_fp_compare_s_q,neon_fp_compare_d_q,\
+			neon_fp_minmax_s,neon_fp_minmax_d,\
+			neon_fp_minmax_s_q,neon_fp_minmax_d_q,\
+			neon_fp_reduc_minmax_s,neon_fp_reduc_minmax_d,\
+			neon_fp_reduc_minmax_s_q,neon_fp_reduc_minmax_d_q,\
+			neon_fp_neg_s,neon_fp_neg_d,\
+			neon_fp_neg_s_q,neon_fp_neg_d_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_fp_arith" 6
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_fp_abd_s,neon_fp_abd_d,\
+			neon_fp_abd_s_q,neon_fp_abd_d_q,\
+			neon_fp_addsub_s,neon_fp_addsub_d,\
+			neon_fp_addsub_s_q,neon_fp_addsub_d_q,\
+			neon_fp_reduc_add_s,neon_fp_reduc_add_d,\
+			neon_fp_reduc_add_s_q,neon_fp_reduc_add_d_q,\
+			neon_fp_mul_s,neon_fp_mul_d,\
+			neon_fp_mul_s_q,neon_fp_mul_d_q,\
+			neon_fp_mla_s,neon_fp_mla_d,\
+			neon_fp_mla_s_q,neon_fp_mla_d_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_fp_conv" 7
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_fp_cvt_widen_s,neon_fp_cvt_narrow_d_q,\
+			neon_fp_to_int_s,neon_fp_to_int_d,\
+			neon_fp_to_int_s_q,neon_fp_to_int_d_q,\
+			neon_fp_round_s,neon_fp_round_d,\
+			neon_fp_round_s_q,neon_fp_round_d_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_fp_div_s" 16
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_fp_div_s,neon_fp_div_s_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_fp_div_d" 23
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_fp_div_d,neon_fp_div_d_q"))
+  "thunderx2t99_f01")
+
+;; ASIMD miscellaneous instructions.
+
+(define_insn_reservation "thunderx2t99_asimd_misc" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_rbit,\
+			neon_bsl,neon_bsl_q,\
+			neon_cls,neon_cls_q,\
+			neon_cnt,neon_cnt_q,\
+			neon_from_gp,neon_from_gp_q,\
+			neon_dup,neon_dup_q,\
+			neon_ext,neon_ext_q,\
+			neon_ins,neon_ins_q,\
+			neon_move,neon_move_q,\
+			neon_fp_recpe_s,neon_fp_recpe_d,\
+			neon_fp_recpe_s_q,neon_fp_recpe_d_q,\
+			neon_fp_recpx_s,neon_fp_recpx_d,\
+			neon_fp_recpx_s_q,neon_fp_recpx_d_q,\
+			neon_rev,neon_rev_q,\
+			neon_dup,neon_dup_q,\
+			neon_permute,neon_permute_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_recip_step" 6
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_fp_recps_s,neon_fp_recps_s_q,\
+			neon_fp_recps_d,neon_fp_recps_d_q,\
+			neon_fp_rsqrts_s,neon_fp_rsqrts_s_q,\
+			neon_fp_rsqrts_d,neon_fp_rsqrts_d_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_lut" 8
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_tbl1,neon_tbl1_q,neon_tbl2_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_elt_to_gr" 6
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_to_gp,neon_to_gp_q"))
+  "thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_ext" 7
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_shift_imm_narrow_q,neon_sat_shift_imm_narrow_q"))
+  "thunderx2t99_f01")
+
+;; ASIMD load instructions.
+
+; NOTE: These reservations attempt to model latency and throughput correctly,
+; but the cycle timing of unit allocation is not necessarily accurate (because
+; insns are split into uops, and those may be issued out-of-order).
+
+(define_insn_reservation "thunderx2t99_asimd_load1_1_mult" 4
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load1_1reg,neon_load1_1reg_q"))
+  "thunderx2t99_ls01")
+
+(define_insn_reservation "thunderx2t99_asimd_load1_2_mult" 4
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load1_2reg,neon_load1_2reg_q"))
+  "thunderx2t99_ls_both")
+
+(define_insn_reservation "thunderx2t99_asimd_load1_3_mult" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load1_3reg,neon_load1_3reg_q"))
+  "(thunderx2t99_ls_both,thunderx2t99_ls01)|(thunderx2t99_ls01,\
+    thunderx2t99_ls_both)")
+
+(define_insn_reservation "thunderx2t99_asimd_load1_4_mult" 6
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load1_4reg,neon_load1_4reg_q"))
+  "thunderx2t99_ls_both*2")
+
+(define_insn_reservation "thunderx2t99_asimd_load1_onelane" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load1_one_lane,neon_load1_one_lane_q"))
+  "thunderx2t99_l01delay,thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_load1_all" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load1_all_lanes,neon_load1_all_lanes_q"))
+  "thunderx2t99_l01delay,thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_load2" 5
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load2_2reg,neon_load2_2reg_q,\
+			neon_load2_one_lane,neon_load2_one_lane_q,\
+			neon_load2_all_lanes,neon_load2_all_lanes_q"))
+  "(thunderx2t99_l0delay,thunderx2t99_f01)|(thunderx2t99_l1delay,\
+    thunderx2t99_f01)")
+
+(define_insn_reservation "thunderx2t99_asimd_load3_mult" 8
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load3_3reg,neon_load3_3reg_q"))
+  "thunderx2t99_ls_both*3,(thunderx2t99_ls0d1+thunderx2t99_ls1d1),\
+   (thunderx2t99_ls0d2+thunderx2t99_ls1d2),\
+   (thunderx2t99_ls0d3+thunderx2t99_ls1d3),thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_load3_elts" 7
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load3_one_lane,neon_load3_one_lane_q,\
+			neon_load3_all_lanes,neon_load3_all_lanes_q"))
+  "thunderx2t99_ls_both,thunderx2t99_l01delay,thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_load4_mult" 8
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load4_4reg,neon_load4_4reg_q"))
+  "thunderx2t99_ls_both*4,(thunderx2t99_ls0d1+thunderx2t99_ls1d1),\
+   (thunderx2t99_ls0d2+thunderx2t99_ls1d2),\
+   (thunderx2t99_ls0d3+thunderx2t99_ls1d3),thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_load4_elts" 6
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_load4_one_lane,neon_load4_one_lane_q,\
+			neon_load4_all_lanes,neon_load4_all_lanes_q"))
+  "thunderx2t99_ls_both*2,(thunderx2t99_ls0d1+thunderx2t99_ls1d1),\
+   (thunderx2t99_ls0d2+thunderx2t99_ls1d2),\
+   (thunderx2t99_ls0d3+thunderx2t99_ls1d3),thunderx2t99_f01")
+
+;; ASIMD store instructions.
+
+; Same note applies as for ASIMD load instructions.
+
+(define_insn_reservation "thunderx2t99_asimd_store1_1_mult" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store1_1reg,neon_store1_1reg_q"))
+  "thunderx2t99_ls01")
+
+(define_insn_reservation "thunderx2t99_asimd_store1_2_mult" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store1_2reg,neon_store1_2reg_q"))
+  "thunderx2t99_ls_both")
+
+(define_insn_reservation "thunderx2t99_asimd_store1_3_mult" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store1_3reg,neon_store1_3reg_q"))
+  "(thunderx2t99_ls_both,thunderx2t99_ls01)|(thunderx2t99_ls01,\
+    thunderx2t99_ls_both)")
+
+(define_insn_reservation "thunderx2t99_asimd_store1_4_mult" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store1_4reg,neon_store1_4reg_q"))
+  "thunderx2t99_ls_both*2")
+
+(define_insn_reservation "thunderx2t99_asimd_store1_onelane" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store1_one_lane,neon_store1_one_lane_q"))
+  "thunderx2t99_ls01,thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_store2_mult" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store2_2reg,neon_store2_2reg_q"))
+  "thunderx2t99_ls_both,thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_store2_onelane" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store2_one_lane,neon_store2_one_lane_q"))
+  "thunderx2t99_ls01,thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_store3_mult" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store3_3reg,neon_store3_3reg_q"))
+  "thunderx2t99_ls_both*3,thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_store3_onelane" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store3_one_lane,neon_store3_one_lane_q"))
+  "thunderx2t99_ls_both,thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_store4_mult" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store4_4reg,neon_store4_4reg_q"))
+  "thunderx2t99_ls_both*4,thunderx2t99_f01")
+
+(define_insn_reservation "thunderx2t99_asimd_store4_onelane" 1
+  (and (eq_attr "tune" "thunderx2t99")
+       (eq_attr "type" "neon_store4_one_lane,neon_store4_one_lane_q"))
+  "thunderx2t99_ls_both,thunderx2t99_f01")