[ARM] Improve 64-bit shifts (non-NEON)

Hi all,

This patch introduces a new, more efficient set of DImode shift 
sequences for values stored in core-registers (as opposed to VFP/NEON 
registers).

The new sequences take advantage of knowledge of what the ARM 
instructions do with out-of-range shift amounts.

The following are examples or a simple test case, like this one:

long long
f (long long *a, int b)
{
   return *a << b;
}

In ARM mode, old left-shift vs. the new one:

     stmfd   sp!, {r4, r5}        | ldrd    r2, [r0]
     rsb     r4, r1, #32          | mov     ip, r1
     ldr     r5, [r0, #4]         | stmfd   sp!, {r4, r5}
     subs    ip, r1, #32          | sub     r5, ip, #32
     ldr     r0, [r0, #0]         | rsb     r4, ip, #32
     mov     r3, r5, asl r1       | mov     r1, r3, asl ip
     orr     r3, r3, r0, lsr r4   | mov     r0, r2, asl ip
     mov     r2, r0, asl r1       | orr     r1, r1, r2, asl r5
     movpl   r3, r0, asl ip       | orr     r1, r1, r2, lsr r4
     mov     r0, r2               | ldmfd   sp!, {r4, r5}
     mov     r1, r3               | bx      lr
     ldmfd   sp!, {r4, r5}        |
     bx      lr                   |

In Thumb mode, old left-shift vs. new:

     ldr     r2, [r0, #0]         | ldrd    r2, [r0]
     ldr     r3, [r0, #4]         | push    {r4, r5, r6}
     push    {r4, r5, r6}         | sub     r6, r1, #32
     rsb     r6, r1, #32          | mov     r4, r1
     sub     r4, r1, #32          | rsb     r5, r1, #32
     lsls    r3, r3, r1           | lsls    r6, r2, r6
     lsrs    r6, r2, r6           | lsls    r1, r3, r1
     lsls    r5, r2, r4           | lsrs    r5, r2, r5
     orrs    r3, r3, r6           | lsls    r0, r2, r4
     lsls    r0, r2, r1           | orrs    r1, r1, r6
     bics    r1, r5, r4, asr #32  | orrs    r1, r1, r5
     it      cs                   | pop     {r4, r5, r6}
     movcs   r1, r3               | bx      lr
     pop     {r4, r5, r6}         |
     bx      lr                   |

Logical right shift is essentially the same sequence as the left shift 
above. However, arithmetic right shift requires something slightly 
different. Here it is in ARM mode, old vs. new:

     stmfd   sp!, {r4, r5}        | ldrd    r2, [r0]
     rsb     r4, r1, #32          | mov     ip, r1
     ldr     r5, [r0, #0]         | stmfd   sp!, {r4, r5}
     subs    ip, r1, #32          | rsb     r5, ip, #32
     ldr     r0, [r0, #4]         | subs    r4, ip, #32
     mov     r2, r5, lsr r1       | mov     r0, r2, lsr ip
     orr     r2, r2, r0, asl r4   | mov     r1, r3, asr ip
     mov     r3, r0, asr r1       | orr     r0, r0, r3, asl r5
     movpl   r2, r0, asr ip       | orrge   r0, r0, r3, asr r4
     mov     r1, r3               | ldmfd   sp!, {r4, r5}
     mov     r0, r2               | bx      lr
     ldmfd   sp!, {r4, r5}        |
     bx      lr                   |

I won't bore you with the Thumb mode comparison.

The shift-by-constant cases have also been reimplemented, although the 
resultant sequences are much the same as before. (Doing this isn't 
strictly necessary just yet, but when I post my next patch to do 64-bit 
shifts in NEON, this feature will be required by the fall-back 
alternatives.)

I've run a regression test on a cross-compiler, and I should have native 
test results next week sometime. Also some benchmark results.

Is this OK for stage 1?

Andrew

[ARM] Improve 64-bit shifts (non-NEON)

Commit Message

Comments

Patch