[RFC,0/5] Loop unrolling and memory load streams

Message ID CAELXzTMgQX4pMAz9NCDJN99VoCPUF8xZc4kitNnouLqBcoKjbQ@mail.gmail.com

Kugan Vivekanandarajah Sept. 15, 2017, 1:24 a.m. UTC
While loop unrolling helps keep the pipeline busy on modern
processors, it can also increase the number of memory streams in a
loop, causing collisions in the hardware prefetcher that can hurt
performance. This patch series tries to detect such cases and limit
loop unrolling accordingly.
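
As a rough illustration of the effect described above (not code from
this series), the sketch below shows a small inner loop before and
after complete unrolling. The names sum_rolled, sum_unrolled and ROWS
are hypothetical. In the rolled form the loop nest contains a single
load instruction; after unrolling, the outer loop carries ROWS
separate load streams, one per row, which a hardware prefetcher has to
track at the same time.

```c
#include <stddef.h>

#define ROWS 4  /* hypothetical inner trip count, small enough to unroll fully */

/* Rolled form: one load instruction, a[j * n + i], in the loop nest. */
long sum_rolled(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < ROWS; j++)
            sum += a[j * n + i];
    return sum;
}

/* Inner loop completely unrolled by hand: the outer loop now contains
   four loads, each walking its own row of the array. */
long sum_unrolled(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[0 * n + i];
        sum += a[1 * n + i];
        sum += a[2 * n + i];
        sum += a[3 * n + i];
    }
    return sum;
}
```

Both functions compute the same result; only the number of distinct
load streams in the outer loop differs.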

Patch 1: Add separate params for the RTL unroller.

Patch 2: Add the number of available hardware prefetchers to
cpu_prefetch_tune so that it can be used in loop unrolling decisions.

Patch 3: Prevent the tree unroller from completely unrolling inner loops
if that results in excessive strided loads in the outer loop.

Patch 4: Change iv_analyze_result to take const_rtx. This is only needed
to make the next patch compile. No functional changes.

Patch 5: Add aarch64_loop_unroll_adjust to limit partial unrolling in RTL
based on the strided loads in the loop.
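
The cover letter does not spell out the heuristic the patches use, so
the following is only a minimal sketch of one way such a limit could
work, under the assumption that the unrolled loop should not contain
more strided loads than there are hardware prefetchers. The function
limit_unroll_factor and its parameters are hypothetical and are not
taken from the patches.

```c
/* Hypothetical helper, not the code from this series.
   requested      - unroll factor the unroller would otherwise pick
   strided_loads  - strided loads counted in one iteration of the loop
   hw_prefetchers - number of hardware prefetcher streams on the target
   Returns an unroll factor such that strided_loads * factor does not
   exceed hw_prefetchers, but never less than 1. */
unsigned limit_unroll_factor(unsigned requested,
                             unsigned strided_loads,
                             unsigned hw_prefetchers)
{
    /* No strided loads, or no prefetcher information: leave as is. */
    if (strided_loads == 0 || hw_prefetchers == 0)
        return requested;

    unsigned max_factor = hw_prefetchers / strided_loads;
    if (max_factor < 1)
        max_factor = 1;  /* already over budget: do not unroll further */

    return requested < max_factor ? requested : max_factor;
}
```

For example, with 8 prefetcher streams and 2 strided loads per
iteration, a requested factor of 8 would be reduced to 4 under this
assumed rule.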

Bootstrapped and tested on aarch64-linux-gnu (with
-funroll-all-loops). Testing on x86_64-linux-gnu ongoing.

Thanks,
Kugan