diff mbox series

x86: checksum: Fix unaligned checksums on < i686

Message ID 20230704083206.693155-2-davidgow@google.com
State Superseded
Headers show
Series x86: checksum: Fix unaligned checksums on < i686 | expand

Commit Message

David Gow July 4, 2023, 8:32 a.m. UTC
The checksum_32 code was originally written to only handle 2-byte
aligned buffers, but was later extended to support arbitrary alignment.
However, the non-PPro variant doesn't apply the carry before jumping to
the 2- or 4-byte aligned versions, which clear CF.

This causes the new checksum_kunit test to fail, as it runs with a large
number of different possible alignments and both with and without
carries.

For example:
./tools/testing/kunit/kunit.py run --arch i386 --kconfig_add CONFIG_M486=y checksum
Gives:
    KTAP version 1
    # Subtest: checksum
    1..3
    ok 1 test_csum_fixed_random_inputs
    # test_csum_all_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:267
    Expected result == expec, but
        result == 65281 (0xff01)
        expec == 65280 (0xff00)
    not ok 2 test_csum_all_carry_inputs
    # test_csum_no_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:314
    Expected result == expec, but
        result == 65535 (0xffff)
        expec == 65534 (0xfffe)
    not ok 3 test_csum_no_carry_inputs

With this patch, it passes.
    KTAP version 1
    # Subtest: checksum
    1..3
    ok 1 test_csum_fixed_random_inputs
    ok 2 test_csum_all_carry_inputs
    ok 3 test_csum_no_carry_inputs

I also tested it on a real 486DX2, with the same results.

Signed-off-by: David Gow <davidgow@google.com>
---

This is a follow-up to the UML patch to use the common 32-bit x86
checksum implementations:
https://lore.kernel.org/linux-um/20230704083022.692368-2-davidgow@google.com/

---
 arch/x86/lib/checksum_32.S | 1 +
 1 file changed, 1 insertion(+)

Comments

David Laight July 10, 2023, 3:01 p.m. UTC | #1
From: David Gow
> Sent: 04 July 2023 09:32
> 
> The checksum_32 code was originally written to only handle 2-byte
> aligned buffers, but was later extended to support arbitrary alignment.
> However, the non-PPro variant doesn't apply the carry before jumping to
> the 2- or 4-byte aligned versions, which clear CF.
....
> I also tested it on a real 486DX2, with the same results.

Which cpu does anyone really care about?

The unrolled 'adcl' loop is horrid on intel cpu between
(about) 'core' and 'haswell' because each u-op can only
have two inputs and adc needs 3 - so is 2 u-ops.
First fixed by summing to alternate registers.

On anything modern (well I've not checked some Atom based
servers) misaligned accesses are pretty near zero cost.
So it really isn't worth the tests that align data.

(I suspect it all got better a long time ago except
for transfers that cross cache-line boundaries, with
adc taking two cycles even that might be free.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 23318c338db0..128287cea42d 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -62,6 +62,7 @@  SYM_FUNC_START(csum_partial)
 	jl 8f
 	movzbl (%esi), %ebx
 	adcl %ebx, %eax
+	adcl $0, %eax
 	roll $8, %eax
 	inc %esi
 	testl $2, %esi