[RESEND,1/9] crypto: caam/jr - add fallback for XTS with more than 8B IV

Message ID: 20200806163551.14395-2-andrei.botila@oss.nxp.com
State: New
Series: crypto: caam - xts(aes) updates

Commit Message

Andrei Botila Aug. 6, 2020, 4:35 p.m. UTC
From: Andrei Botila <andrei.botila@nxp.com>

A hardware limitation exists for CAAM until Era 9 which restricts
the accelerator to IVs of only 8 bytes. When CAAM has a lower era,
a fallback is necessary to process 16-byte IVs.

Fixes: c6415a6016bf ("crypto: caam - add support for acipher xts(aes)")
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Andrei Botila <andrei.botila@nxp.com>
---
 drivers/crypto/caam/caamalg.c | 68 ++++++++++++++++++++++++++++++++---
 1 file changed, 64 insertions(+), 4 deletions(-)
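
The entire fallback decision reduces to one predicate on the XTS IV: the
request can stay on CAAM only if the upper 64 bits of the 16-byte IV are
zero. A minimal sketch of that predicate, mirroring xts_skcipher_ivsize()
from the diff at the bottom of this page but using the unconditional
get_unaligned() that the review discussion below converges on
(xts_iv_needs_fallback() is a hypothetical name):

#include <linux/types.h>
#include <asm/unaligned.h>

/* Sketch: true if the upper 64 bits of a 16-byte XTS IV are non-zero,
 * i.e. the IV exceeds the 8 bytes CAAM < Era 9 can handle and the
 * request must take the software fallback. */
static bool xts_iv_needs_fallback(const u8 *iv, unsigned int ivsize)
{
	return get_unaligned((u64 *)(iv + ivsize / 2)) != 0;
}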

Comments

Horia Geanta Sept. 14, 2020, 4:23 p.m. UTC | #1
On 9/9/2020 1:10 AM, Herbert Xu wrote:
> On Tue, Sep 08, 2020 at 01:35:04PM +0300, Horia Geantă wrote:
>>
>>> Just go with the get_unaligned unconditionally.
>>
>> Won't this lead to sub-optimal code for ARMv7
>> in case the IV is aligned?
>
> If this should be optimised in ARMv7 then that should be done
> in get_unaligned itself and not open-coded.
>

I am not sure what's wrong with avoiding using the unaligned accessors
in case data is aligned.

Documentation/core-api/unaligned-memory-access.rst clearly states:
These macros work for memory accesses of any length (not just 32 bits as
in the examples above). Be aware that when compared to standard access of
aligned memory, using these macros to access unaligned memory can be costly in
terms of performance.

So IMO it makes sense to use get_unaligned() only when needed.
There are several cases of users doing this, e.g. siphash.

Thanks,
Horia
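
The siphash pattern referenced above dispatches on alignment at run time
only when unaligned accesses are expensive. A minimal sketch of that shape
applied to the IV load discussed here (load_tweak_hi() is a hypothetical
helper; note that the plain dereference taken in the efficient-unaligned
case is exactly the practice questioned later in this thread):

#include <linux/kernel.h>
#include <asm/unaligned.h>

static u64 load_tweak_hi(const u8 *iv, unsigned int ivsize)
{
	const u8 *p = iv + ivsize / 2;

#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	/* strict-alignment CPU: use the byte-wise accessor only when
	 * the pointer really is misaligned */
	if (!IS_ALIGNED((unsigned long)p, __alignof__(u64)))
		return get_unaligned((u64 *)p);
#endif
	return *(u64 *)p;
}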
Ard Biesheuvel Sept. 14, 2020, 4:28 p.m. UTC | #2
On Mon, 14 Sep 2020 at 19:24, Horia Geantă <horia.geanta@nxp.com> wrote:
>
> On 9/9/2020 1:10 AM, Herbert Xu wrote:
> > On Tue, Sep 08, 2020 at 01:35:04PM +0300, Horia Geantă wrote:
> >>
> >>> Just go with the get_unaligned unconditionally.
> >>
> >> Won't this lead to sub-optimal code for ARMv7
> >> in case the IV is aligned?
> >
> > If this should be optimised in ARMv7 then that should be done
> > in get_unaligned itself and not open-coded.
> >
> I am not sure what's wrong with avoiding using the unaligned accessors
> in case data is aligned.
>
> Documentation/core-api/unaligned-memory-access.rst clearly states:
> These macros work for memory accesses of any length (not just 32 bits as
> in the examples above). Be aware that when compared to standard access of
> aligned memory, using these macros to access unaligned memory can be costly in
> terms of performance.
>
> So IMO it makes sense to use get_unaligned() only when needed.
> There are several cases of users doing this, e.g. siphash.
>


For ARMv7 code, using the unaligned accessors unconditionally is fine,
and it will not affect performance.

In general, when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is defined,
you can use the unaligned accessors. If it is not, it helps to have
different code paths.

This is a bit murky, and through the years, the interpretation of
unaligned-memory-access.rst has shifted a bit, but in this case, it
makes no sense to make the distinction.
Horia Geanta Sept. 14, 2020, 5:12 p.m. UTC | #3
On 9/14/2020 7:28 PM, Ard Biesheuvel wrote:
> On Mon, 14 Sep 2020 at 19:24, Horia Geantă <horia.geanta@nxp.com> wrote:
>>
>> On 9/9/2020 1:10 AM, Herbert Xu wrote:
>>> On Tue, Sep 08, 2020 at 01:35:04PM +0300, Horia Geantă wrote:
>>>>
>>>>> Just go with the get_unaligned unconditionally.
>>>>
>>>> Won't this lead to sub-optimal code for ARMv7
>>>> in case the IV is aligned?
>>>
>>> If this should be optimised in ARMv7 then that should be done
>>> in get_unaligned itself and not open-coded.
>>>
>> I am not sure what's wrong with avoiding using the unaligned accessors
>> in case data is aligned.
>>
>> Documentation/core-api/unaligned-memory-access.rst clearly states:
>> These macros work for memory accesses of any length (not just 32 bits as
>> in the examples above). Be aware that when compared to standard access of
>> aligned memory, using these macros to access unaligned memory can be costly in
>> terms of performance.
>>
>> So IMO it makes sense to use get_unaligned() only when needed.
>> There are several cases of users doing this, e.g. siphash.
>>
>
> For ARMv7 code, using the unaligned accessors unconditionally is fine,
> and it will not affect performance.
>
> In general, when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is defined,
> you can use the unaligned accessors. If it is not, it helps to have
> different code paths.
>

arch/arm/include/asm/unaligned.h doesn't make use of
linux/unaligned/access_ok.h, even if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
is set.

I understand the comment in the file; however, using get_unaligned()
unconditionally takes away the opportunity to generate optimized code
(using ldrd/ldm) when data is aligned.

> This is a bit murky, and through the years, the interpretation of
> unaligned-memory-access.rst has shifted a bit, but in this case, it
> makes no sense to make the distinction.
>


Thanks,
Horia
Ard Biesheuvel Sept. 14, 2020, 6:20 p.m. UTC | #4
On Mon, 14 Sep 2020 at 20:12, Horia Geantă <horia.geanta@nxp.com> wrote:
>
> On 9/14/2020 7:28 PM, Ard Biesheuvel wrote:
> > On Mon, 14 Sep 2020 at 19:24, Horia Geantă <horia.geanta@nxp.com> wrote:
> > [...]
> > In general, when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is defined,
> > you can use the unaligned accessors. If it is not, it helps to have
> > different code paths.
> >
> arch/arm/include/asm/unaligned.h doesn't make use of
> linux/unaligned/access_ok.h, even if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> is set.
>
> I understand the comment in the file; however, using get_unaligned()
> unconditionally takes away the opportunity to generate optimized code
> (using ldrd/ldm) when data is aligned.
>


But the minimal optimization that is possible here (one ldrd/ldm
instruction vs two ldr instructions) is defeated by the fact that you
are using a conditional branch to select between the two. And this is
not even a hot path to begin with.

> > This is a bit murky, and through the years, the interpretation of
> > unaligned-memory-access.rst has shifted a bit, but in this case, it
> > makes no sense to make the distinction.
> >
>
> Thanks,
> Horia
Horia Geanta Sept. 15, 2020, 10:02 a.m. UTC | #5
On 9/14/2020 9:20 PM, Ard Biesheuvel wrote:
> On Mon, 14 Sep 2020 at 20:12, Horia Geantă <horia.geanta@nxp.com> wrote:
>>
>> [...]
>> arch/arm/include/asm/unaligned.h doesn't make use of
>> linux/unaligned/access_ok.h, even if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> is set.
>>
>> I understand the comment in the file; however, using get_unaligned()
>> unconditionally takes away the opportunity to generate optimized code
>> (using ldrd/ldm) when data is aligned.
>>
>
> But the minimal optimization that is possible here (one ldrd/ldm
> instruction vs two ldr instructions) is defeated by the fact that you
> are using a conditional branch to select between the two. And this is
> not even a hot path to begin with.
>

This is actually on the hot path (encrypt/decrypt callbacks),
but you're probably right that the conditional branching is going to offset
any gain from the optimized code.

To avoid branching, the code could be rewritten as:

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	size = *(u64 *)(req->iv + (ivsize / 2));
#else
	size = get_unaligned((u64 *)(req->iv + (ivsize / 2)));
#endif

however in this case ARMv7 would suffer, since
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y there and
ldrd/ldm accesses that are not word-aligned are inefficient (they trap).

Would it be ok to use:
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && !defined(CONFIG_ARM)
to work around the ARMv7 inconsistency?

Thanks,
Horia
Ard Biesheuvel Sept. 15, 2020, 10:26 a.m. UTC | #6
On Tue, 15 Sep 2020 at 13:02, Horia Geantă <horia.geanta@nxp.com> wrote:
>
> On 9/14/2020 9:20 PM, Ard Biesheuvel wrote:
> > [...]
> > But the minimal optimization that is possible here (one ldrd/ldm
> > instruction vs two ldr instructions) is defeated by the fact that you
> > are using a conditional branch to select between the two. And this is
> > not even a hot path to begin with.
> >
> This is actually on the hot path (encrypt/decrypt callbacks),
> but you're probably right that the conditional branching is going to offset
> any gain from the optimized code.
>


This is called once per XTS request, right? And you are saying the
extra cycle makes a difference?

> To avoid branching, the code could be rewritten as:
>
> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>         size = *(u64 *)(req->iv + (ivsize / 2));
> #else
>         size = get_unaligned((u64 *)(req->iv + (ivsize / 2)));
> #endif
>
> however in this case ARMv7 would suffer, since
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y there and
> ldrd/ldm accesses that are not word-aligned are inefficient (they trap).
>


CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS means 'just use the unaligned
accessors as they are basically free'. Casting a potentially
misaligned u8* to a u64* is not permitted by the C standard.

> Would it be ok to use:
> #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && !defined(CONFIG_ARM)
> to work around the ARMv7 inconsistency?
>


No, please just use the get_unaligned() accessor.
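
Ard's two points, side by side as a sketch (caam_xts_iv_high() is a
hypothetical name): the cast-and-dereference is undefined behaviour for a
misaligned pointer even on CPUs whose load instructions would cope, while
get_unaligned() is well defined everywhere and still compiles to a single
load wherever unaligned accesses are cheap:

#include <asm/unaligned.h>

static u64 caam_xts_iv_high(const u8 *iv, unsigned int ivsize)
{
	/* return *(u64 *)(iv + ivsize / 2);  <-- UB if iv + ivsize / 2
	 * is not 8-byte aligned, whatever the CPU allows */
	return get_unaligned((u64 *)(iv + ivsize / 2));
}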
Horia Geanta Sept. 15, 2020, 12:44 p.m. UTC | #7
On 9/15/2020 1:26 PM, Ard Biesheuvel wrote:
> On Tue, 15 Sep 2020 at 13:02, Horia Geantă <horia.geanta@nxp.com> wrote:
>>
>> [...]
>> This is actually on the hot path (encrypt/decrypt callbacks),
>> but you're probably right that the conditional branching is going to offset
>> any gain from the optimized code.
>>
>
> This is called once per XTS request, right? And you are saying the
> extra cycle makes a difference?
>

Yes, once per request and no, not super-important.

>> To avoid branching, the code could be rewritten as:
>>
>> #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>>         size = *(u64 *)(req->iv + (ivsize / 2));
>> #else
>>         size = get_unaligned((u64 *)(req->iv + (ivsize / 2)));
>> #endif
>>
>> however in this case ARMv7 would suffer, since
>> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y there and
>> ldrd/ldm accesses that are not word-aligned are inefficient (they trap).
>>
>
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS means 'just use the unaligned
> accessors as they are basically free'. Casting a potentially
> misaligned u8* to a u64* is not permitted by the C standard.
>

Seems that I misunderstood CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.

Looking at its usage, e.g. ether_addr_equal() or __crypto_memneq_*(),
I see similar casts of possibly misaligned pointers.

>> Would it be ok to use:
>> #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && !defined(CONFIG_ARM)
>> to work around the ARMv7 inconsistency?
>>
>
> No, please just use the get_unaligned() accessor.
>

Ok.

Thanks,
Horia
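
The kind of cast Horia points at looks roughly like the following
(paraphrased from ether_addr_equal() in include/linux/etherdevice.h;
mac_equal() is a stand-in name and the details vary by kernel version):

#include <linux/types.h>

static inline bool mac_equal(const u8 *a, const u8 *b)
{
#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
	/* word-sized loads through casts of possibly misaligned
	 * pointers: the practice under discussion */
	u32 fold = (*(const u32 *)a ^ *(const u32 *)b) |
		   (u32)(*(const u16 *)(a + 4) ^ *(const u16 *)(b + 4));

	return fold == 0;
#else
	/* byte-wise comparison on strict-alignment architectures */
	return !((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2]) |
		 (a[3] ^ b[3]) | (a[4] ^ b[4]) | (a[5] ^ b[5]));
#endif
}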
Ard Biesheuvel Sept. 15, 2020, 12:51 p.m. UTC | #8
On Tue, 15 Sep 2020 at 15:45, Horia Geantă <horia.geanta@nxp.com> wrote:
>
> On 9/15/2020 1:26 PM, Ard Biesheuvel wrote:
> > [...]
> > CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS means 'just use the unaligned
> > accessors as they are basically free'. Casting a potentially
> > misaligned u8* to a u64* is not permitted by the C standard.
> >
> Seems that I misunderstood CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
>


You're not the only one :-) I have been intending to get the
discussion going with the networking folks, who rely heavily on this
as well.

> Looking at its usage, e.g. ether_addr_equal() or __crypto_memneq_*(),
> I see similar casts of possibly misaligned pointers.
>


Yes, that is the confusion. CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
should indicate whether using the unaligned accessors is fine in all
cases, or whether you should find other ways to load the data more
efficiently (compare NET_IP_ALIGN, which shifts the entire IP header
so the 32-bit address fields appear aligned in memory).

CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS does *not* mean that you can
simply cast any pointer to any type and dereference it, but the
meaning appears to have shifted this way over the years (and the
Documentation/ was even updated to this effect).

Pre-v6 ARM (and MIPS as well, IIRC) requires byte-sized accesses and
shift/or sequences to do unaligned accesses, whereas v6 and up simply
allows ldr from a misaligned address. So in the former case, you could
use cra_alignmask to align the data in memory, while the latter case
can ignore it. (arch/arm/crypto/aes-cipher-glue.c uses this as well.)

> >> Would it be ok to use:
> >> #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && !defined(CONFIG_ARM)
> >> to work around the ARMv7 inconsistency?
> >>
> >
> > No, please just use the get_unaligned() accessor.
> >
> Ok.
>
> Thanks,
> Horia
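
For reference, on an architecture without efficient unaligned access,
get_unaligned() of a u64 decays to the byte loads and shift/or sequence
Ard describes. Roughly, as an illustrative little-endian sketch (the
actual kernel implementation goes through a __packed struct or memcpy()
instead):

#include <linux/types.h>

static inline u64 load_le64_bytewise(const u8 *p)
{
	u64 v = 0;
	unsigned int i;

	/* assemble the value one byte at a time so no single load
	 * needs more than byte alignment */
	for (i = 0; i < 8; i++)
		v |= (u64)p[i] << (8 * i);
	return v;
}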
Patch

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 91feda5b63f6..ebf4dc87ca2e 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -57,6 +57,7 @@ 
 #include "key_gen.h"
 #include "caamalg_desc.h"
 #include <crypto/engine.h>
+#include <asm/unaligned.h>
 
 /*
  * crypto alg
@@ -114,10 +115,12 @@  struct caam_ctx {
 	struct alginfo adata;
 	struct alginfo cdata;
 	unsigned int authsize;
+	struct crypto_skcipher *fallback;
 };
 
 struct caam_skcipher_req_ctx {
 	struct skcipher_edesc *edesc;
+	struct skcipher_request fallback_req;
 };
 
 struct caam_aead_req_ctx {
@@ -830,12 +833,17 @@  static int xts_skcipher_setkey(struct crypto_skcipher *skcipher, const u8 *key,
 	struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
 	struct device *jrdev = ctx->jrdev;
 	u32 *desc;
+	int err;
 
 	if (keylen != 2 * AES_MIN_KEY_SIZE  && keylen != 2 * AES_MAX_KEY_SIZE) {
 		dev_dbg(jrdev, "key size mismatch\n");
 		return -EINVAL;
 	}
 
+	err = crypto_skcipher_setkey(ctx->fallback, key, keylen);
+	if (err)
+		return err;
+
 	ctx->cdata.keylen = keylen;
 	ctx->cdata.key_virt = key;
 	ctx->cdata.key_inline = true;
@@ -1755,6 +1763,20 @@  static int skcipher_do_one_req(struct crypto_engine *engine, void *areq)
 	return ret;
 }
 
+static bool xts_skcipher_ivsize(struct skcipher_request *req)
+{
+	struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
+	unsigned int ivsize = crypto_skcipher_ivsize(skcipher);
+	u64 size = 0;
+
+	if (IS_ALIGNED((unsigned long)req->iv, __alignof__(u64)))
+		size = *(u64 *)(req->iv + (ivsize / 2));
+	else
+		size = get_unaligned((u64 *)(req->iv + (ivsize / 2)));
+
+	return !!size;
+}
+
 static inline int skcipher_crypt(struct skcipher_request *req, bool encrypt)
 {
 	struct skcipher_edesc *edesc;
@@ -1768,6 +1790,21 @@  static inline int skcipher_crypt(struct skcipher_request *req, bool encrypt)
 	if (!req->cryptlen)
 		return 0;
 
+	if (ctx->fallback && xts_skcipher_ivsize(req)) {
+		struct caam_skcipher_req_ctx *rctx = skcipher_request_ctx(req);
+
+		skcipher_request_set_tfm(&rctx->fallback_req, ctx->fallback);
+		skcipher_request_set_callback(&rctx->fallback_req,
+					      req->base.flags,
+					      req->base.complete,
+					      req->base.data);
+		skcipher_request_set_crypt(&rctx->fallback_req, req->src,
+					   req->dst, req->cryptlen, req->iv);
+
+		return encrypt ? crypto_skcipher_encrypt(&rctx->fallback_req) :
+				  crypto_skcipher_decrypt(&rctx->fallback_req);
+	}
+
 	/* allocate extended descriptor */
 	edesc = skcipher_edesc_alloc(req, DESC_JOB_IO_LEN * CAAM_CMD_SZ);
 	if (IS_ERR(edesc))
@@ -1905,6 +1942,7 @@  static struct caam_skcipher_alg driver_algs[] = {
 			.base = {
 				.cra_name = "xts(aes)",
 				.cra_driver_name = "xts-aes-caam",
+				.cra_flags = CRYPTO_ALG_NEED_FALLBACK,
 				.cra_blocksize = AES_BLOCK_SIZE,
 			},
 			.setkey = xts_skcipher_setkey,
@@ -3344,12 +3382,30 @@  static int caam_cra_init(struct crypto_skcipher *tfm)
 	struct caam_skcipher_alg *caam_alg =
 		container_of(alg, typeof(*caam_alg), skcipher);
 	struct caam_ctx *ctx = crypto_skcipher_ctx(tfm);
+	u32 alg_aai = caam_alg->caam.class1_alg_type & OP_ALG_AAI_MASK;
 
 	crypto_skcipher_set_reqsize(tfm, sizeof(struct caam_skcipher_req_ctx));
 
 	ctx->enginectx.op.do_one_request = skcipher_do_one_req;
 
-	return caam_init_common(crypto_skcipher_ctx(tfm), &caam_alg->caam,
+	if (alg_aai == OP_ALG_AAI_XTS) {
+		const char *tfm_name = crypto_tfm_alg_name(&tfm->base);
+		struct crypto_skcipher *fallback;
+
+		fallback = crypto_alloc_skcipher(tfm_name, 0,
+						 CRYPTO_ALG_NEED_FALLBACK);
+		if (IS_ERR(fallback)) {
+			pr_err("Failed to allocate %s fallback: %ld\n",
+			       tfm_name, PTR_ERR(fallback));
+			return PTR_ERR(fallback);
+		}
+
+		ctx->fallback = fallback;
+		crypto_skcipher_set_reqsize(tfm, sizeof(struct caam_skcipher_req_ctx) +
+					    crypto_skcipher_reqsize(fallback));
+	}
+
+	return caam_init_common(ctx, &caam_alg->caam,
 				false);
 }
 
@@ -3378,7 +3434,11 @@  static void caam_exit_common(struct caam_ctx *ctx)
 
 static void caam_cra_exit(struct crypto_skcipher *tfm)
 {
-	caam_exit_common(crypto_skcipher_ctx(tfm));
+	struct caam_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+	if (ctx->fallback)
+		crypto_free_skcipher(ctx->fallback);
+	caam_exit_common(ctx);
 }
 
 static void caam_aead_exit(struct crypto_aead *tfm)
@@ -3412,8 +3472,8 @@  static void caam_skcipher_alg_init(struct caam_skcipher_alg *t_alg)
 	alg->base.cra_module = THIS_MODULE;
 	alg->base.cra_priority = CAAM_CRA_PRIORITY;
 	alg->base.cra_ctxsize = sizeof(struct caam_ctx);
-	alg->base.cra_flags = CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY |
-			      CRYPTO_ALG_KERN_DRIVER_ONLY;
+	alg->base.cra_flags |= (CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY |
+			      CRYPTO_ALG_KERN_DRIVER_ONLY);
 
 	alg->init = caam_cra_init;
 	alg->exit = caam_cra_exit;