[PATCHv3,00/10] 64-bit data integrity field support

Message ID 20220222163144.1782447-1-kbusch@kernel.org

Keith Busch Feb. 22, 2022, 4:31 p.m. UTC
The NVM Express protocol added enhancements to the data integrity field
formats beyond the T10 defined protection information. A detailed
description of the new formats can be found in the NVMe's NVM Command
Set Specification, section 5.2, available at:

  https://nvmexpress.org/wp-content/uploads/NVM-Command-Set-Specification-1.0b-2021.12.18-Ratified.pdf

This series implements one possible new format: the CRC64 guard with
48-bit reference tags. This does not add support for the variable
"storage tag" field, or any potential hardware acceleration.

Changes since v2:

  Introduced lower_48_bits() (Bart)

  Added "64" suffix to crc names to distinguish from other crc
  calculations (Martin)

  Added module support to crypto library, and have the integrity
  generate/verify use that instead of directly calling the generic table
  lookup implementation. This is modeled after the t10 implementation.

  Added an x86 pclmul accelerated crc64 calculation

  Added speed tests to the test module so that the generic and library
  function could be compared.

  Bug fix in nvme to select the correct integrity profile.

  Added reviews

Note: I am not particularly well versed in x86 assembly, so I'm reasonably
sure that the pclmul implementation could be improved. It is currently
about 20x faster than the generic implementation, but I believe it could
be over 30x faster if done more efficiently.

Keith Busch (10):
  block: support pi with extended metadata
  nvme: allow integrity on extended metadata formats
  asm-generic: introduce be48 unaligned accessors
  linux/kernel: introduce lower_48_bits macro
  lib: add rocksoft model crc64
  crypto: add rocksoft 64b crc guard tag framework
  lib: add crc64 tests
  block: add pi for nvme enhanced integrity
  nvme: add support for enhanced metadata
  x86/crypto: add pclmul acceleration for crc64

 arch/x86/crypto/Makefile                  |   3 +
 arch/x86/crypto/crc64-rocksoft-pcl-asm.S  | 215 ++++++++++++++++++++++
 arch/x86/crypto/crc64-rocksoft-pcl_glue.c | 117 ++++++++++++
 block/Kconfig                             |   1 +
 block/bio-integrity.c                     |   1 +
 block/t10-pi.c                            | 198 +++++++++++++++++++-
 crypto/Kconfig                            |  20 ++
 crypto/Makefile                           |   1 +
 crypto/crc64_rocksoft_generic.c           | 104 +++++++++++
 drivers/nvme/host/core.c                  | 175 ++++++++++++++----
 drivers/nvme/host/nvme.h                  |   4 +-
 include/asm-generic/unaligned.h           |  26 +++
 include/linux/blk-integrity.h             |   1 +
 include/linux/crc64.h                     |   7 +
 include/linux/kernel.h                    |   6 +
 include/linux/nvme.h                      |  53 +++++-
 include/linux/t10-pi.h                    |  20 ++
 lib/Kconfig                               |   9 +
 lib/Kconfig.debug                         |   4 +
 lib/Makefile                              |   2 +
 lib/crc64-rocksoft.c                      | 129 +++++++++++++
 lib/crc64.c                               |  26 +++
 lib/gen_crc64table.c                      |  51 +++--
 lib/test_crc64.c                          | 133 +++++++++++++
 24 files changed, 1252 insertions(+), 54 deletions(-)
 create mode 100644 arch/x86/crypto/crc64-rocksoft-pcl-asm.S
 create mode 100644 arch/x86/crypto/crc64-rocksoft-pcl_glue.c
 create mode 100644 crypto/crc64_rocksoft_generic.c
 create mode 100644 lib/crc64-rocksoft.c
 create mode 100644 lib/test_crc64.c

Comments

Christoph Hellwig Feb. 22, 2022, 4:50 p.m. UTC | #1
On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > Recent data integrity field enhancements allow 48-bit reference tags.
> > Introduce a helper macro since this will be a repeated operation.
> []
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> []
> > @@ -63,6 +63,12 @@
> >  }					\
> >  )
> >  
> > +/**
> > + * lower_48_bits - return bits 0-47 of a number
> > + * @n: the number we're accessing
> > + */
> > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> 
> why not make this a static inline function?

Agreed.

> And visually, it's difficult to quickly count a repeated character to 12.
> 
> Perhaps:
> 
> static inline u64 lower_48_bits(u64 val)
> {
> 	return val & GENMASK_ULL(47, 0);
> }

For anyone who has a minimum knowledge of C and hardware your version
is an obfuscated clusterfuck, while the version Keith wrote is trivial
to read.
Chaitanya Kulkarni Feb. 22, 2022, 4:52 p.m. UTC | #2
On 2/22/22 08:31, Keith Busch wrote:
> The NVMe protocol extended the data integrity fields with unaligned
> 48-bit reference tags. Provide some helper accessors in
> preparation for these.
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck
Keith Busch Feb. 22, 2022, 4:56 p.m. UTC | #3
On Tue, Feb 22, 2022 at 05:50:45PM +0100, Christoph Hellwig wrote:
> On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> > On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > > +/**
> > > + * lower_48_bits - return bits 0-47 of a number
> > > + * @n: the number we're accessing
> > > + */
> > > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> > 
> > why not make this a static inline function?
> 
> Agreed.

Sure, that sounds good to me. I only did it this way to match the
existing local convention, but I personally prefer the inline function
too.
Joe Perches Feb. 22, 2022, 6:43 p.m. UTC | #4
On Tue, 2022-02-22 at 08:56 -0800, Keith Busch wrote:
> On Tue, Feb 22, 2022 at 05:50:45PM +0100, Christoph Hellwig wrote:
> > On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> > > On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > > > +/**
> > > > + * lower_48_bits - return bits 0-47 of a number
> > > > + * @n: the number we're accessing
> > > > + */
> > > > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> > > 
> > > why not make this a static inline function?
> > 
> > Agreed.
> 
> Sure, that sounds good to me. I only did it this way to match the
> existing local convention, but I personally prefer the inline function
> too. 

The existing convention is used there to allow the compiler to
avoid warnings and unnecessary conversions of a u32 to a u64 when
shifting by 32 or more bits.

If it's possible to be used with an architecture dependent typedef
like dma_addr_t, then perhaps it's reasonable to do something like:

#define lower_48_bits(val)					\
({								\
	typeof(val) high = lower_16_bits(upper_32_bits(val));	\
	typeof(val) low = lower_32_bits(val);			\
								\
	(high << 16 << 16) | low;				\
})

and have the compiler have the return value be an appropriate type.
Eric Biggers Feb. 22, 2022, 7:50 p.m. UTC | #5
On Tue, Feb 22, 2022 at 08:31:40AM -0800, Keith Busch wrote:
> +config CRYPTO_CRC64_ROCKSOFT
> +	tristate "Rocksoft Model CRC64 algorithm"
> +	depends on CRC64
> +	select CRYPTO_HASH
> +	help
> +	  Rocksoft Model CRC64 computation is being cast as a crypto
> +	  transform. This allows for faster crc64 transforms to be used
> +	  if they are available.

The first sentence of this help text doesn't make sense.

> diff --git a/crypto/crc64_rocksoft_generic.c b/crypto/crc64_rocksoft_generic.c
> new file mode 100644
> index 000000000000..55bad1939614
> --- /dev/null
> +++ b/crypto/crc64_rocksoft_generic.c
> @@ -0,0 +1,104 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Cryptographic API.

The "Cryptographic API" line doesn't provide any helpful information.

> +static int chksum_final(struct shash_desc *desc, u8 *out)
> +{
> +	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
> +
> +	*(u64 *)out = ctx->crc;
> +	return 0;
> +}
> +
> +static int __chksum_finup(u64 crc, const u8 *data, unsigned int len, u8 *out)
> +{
> +	*(u64 *)out = crc64_rocksoft_generic(crc, data, len);
> +	return 0;
> +}

These 64-bit writes violate alignment rules and will give the wrong result on
big endian CPUs.  They need to use put_unaligned_le64().

> +static int __init crc64_rocksoft_x86_mod_init(void)
> +{
> +	return crypto_register_shash(&alg);
> +}
> +
> +static void __exit crc64_rocksoft_x86_mod_fini(void)
> +{
> +	crypto_unregister_shash(&alg);
> +}

This has nothing to do with x86.

> +config CRC64_ROCKSOFT
> +	tristate "CRC calculation for the Rocksoft^TM model CRC64"

I'm not sure what the rules for trademarks are, but kernel source code usually
doesn't have the trademark symbol/abbreviation scattered everywhere.

> +	select CRYPTO
> +	select CRYPTO_CRC64_ROCKSOFT
> +	help
> +	  This option is only needed if a module that's not in the
> +	  kernel tree needs to calculate CRC checks for use with the
> +	  rocksoft model parameters.

Out-of-tree modules can't be the reason to have a kconfig option.  What is the
real reason?

> +u64 crc64_rocksoft(const unsigned char *buffer, size_t len)
> +{
> +	return crc64_rocksoft_update(~0ULL, buffer, len);
> +}
> +EXPORT_SYMBOL(crc64_rocksoft);

Isn't this missing the bitwise inversion at the end?

> +MODULE_AUTHOR("Keith Busch <kbusch@kernel.org>");
> +MODULE_DESCRIPTION("Rocksoft model CRC64 calculation (library API)");
> +MODULE_LICENSE("GPL");
> +MODULE_SOFTDEP("pre: crc64");

Shouldn't the MODULE_SOFTDEP be on crc64-rocksoft?

- Eric
Eric Biggers Feb. 22, 2022, 7:56 p.m. UTC | #6
On Tue, Feb 22, 2022 at 08:31:40AM -0800, Keith Busch wrote:
> Hardware specific features may be able to calculate a crc64, so provide
> a framework for drivers to register their implementation. If nothing is
> registered, fallback to the generic table lookup implementation. The
> implementation is modeled after the crct10dif equivalent.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  crypto/Kconfig                  |   9 +++
>  crypto/Makefile                 |   1 +
>  crypto/crc64_rocksoft_generic.c | 104 +++++++++++++++++++++++++
>  include/linux/crc64.h           |   5 ++
>  lib/Kconfig                     |   9 +++
>  lib/Makefile                    |   1 +
>  lib/crc64-rocksoft.c            | 129 ++++++++++++++++++++++++++++++++
>  7 files changed, 258 insertions(+)
>  create mode 100644 crypto/crc64_rocksoft_generic.c
>  create mode 100644 lib/crc64-rocksoft.c

I tried testing this, but I can't because it is missing a self-test:

[    0.736340] alg: No test for crc64-rocksoft (crc64-rocksoft-generic)
[    5.440398] alg: No test for crc64-rocksoft (crc64-rocksoft-pclmul)

All algorithms registered with the crypto API need to have a self-test
(in crypto/testmgr.c).

- Eric
Keith Busch Feb. 22, 2022, 8:09 p.m. UTC | #7
On Tue, Feb 22, 2022 at 11:50:42AM -0800, Eric Biggers wrote:
> On Tue, Feb 22, 2022 at 08:31:40AM -0800, Keith Busch wrote:
> > +config CRYPTO_CRC64_ROCKSOFT
> > +	tristate "Rocksoft Model CRC64 algorithm"
> > +	depends on CRC64
> > +	select CRYPTO_HASH
> > +	help
> > +	  Rocksoft Model CRC64 computation is being cast as a crypto
> > +	  transform. This allows for faster crc64 transforms to be used
> > +	  if they are available.
> 
> The first sentence of this help text doesn't make sense.

Much of this patch is a copy from the equivalent T10DIF option,
which says the same as above. I meant to revisit these help texts because
they don't make sense to me either. I'll be sure to do that for the next
version.
Joe Perches Feb. 22, 2022, 8:31 p.m. UTC | #8
On Tue, 2022-02-22 at 20:09 +0000, David Laight wrote:
> From: Joe Perches
> > Sent: 22 February 2022 18:43
> > 
> > On Tue, 2022-02-22 at 08:56 -0800, Keith Busch wrote:
> > > On Tue, Feb 22, 2022 at 05:50:45PM +0100, Christoph Hellwig wrote:
> > > > On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> > > > > On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > > > > > +/**
> > > > > > + * lower_48_bits - return bits 0-47 of a number
> > > > > > + * @n: the number we're accessing
> > > > > > + */
> > > > > > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> > > > > 
> > > > > why not make this a static inline function?
> > > > 
> > > > Agreed.
> > > 
> > > Sure, that sounds good to me. I only did it this way to match the
> > > existing local convention, but I personally prefer the inline function
> > > too.
> > 
> > The existing convention is used there to allow the compiler to
> > avoid warnings and unnecessary conversions of a u32 to a u64 when
> > shifting by 32 or more bits.
> > 
> > If it's possible to be used with an architecture dependent typedef
> > like dma_addr_t, then perhaps it's reasonable to do something like:
> > 
> > #define lower_48_bits(val)					\
> > ({								\
> > 	typeof(val) high = lower_16_bits(upper_32_bits(val));	\
> > 	typeof(val) low = lower_32_bits(val);			\
> > 								\
> > 	(high << 16 << 16) | low;				\
> > })
> > 
> > and have the compiler have the return value be an appropriate type.
> 
> The compiler could make a real pig's breakfast of that.

Both gcc and clang optimize it just fine.

btw: to return the same type the last line should be:

	(typeof(val))((high << 16 << 16) | low);

otherwise the result is only sizeof(int) wide if typeof(val) is not u64

> Oh, did you look for GENMASK([^,]*,[ 0]*) ?

No, why?  I did look for 47, 0 though.

But it's pretty common really.

$ git grep -P 'GENMASK(?:_ULL)?\s*\(\s*\d+\s*,\s*0\s*\)' | wc -l
6233

> I'd only use something GENMASK() for bit ranges.
> Even then it is often easier to just write the value in hex.

Mostly it's the count of the repeated f that's difficult to
quickly verify visually.

> I think the only time I've written anything like that recently
> (last 30 years) was for some hardware registers when the documentation
> used 'bit 1' for the most significant bit.

Luckily the hardware I've had to deal with never did that.
Not even for the least significant bit.
Keith Busch Feb. 22, 2022, 9:12 p.m. UTC | #9
On Tue, Feb 22, 2022 at 12:31:30PM -0800, Joe Perches wrote:
> > I'd only use something GENMASK() for bit ranges.
> > Even then it is often easier to just write the value in hex.
> 
> Mostly it's the count of the repeated f that's difficult to
> quickly verify visually.

I admit I made this counting mistake in earlier versions of this series.

I think the earlier suggested "(1ull << 48) - 1" style in an inline
function seems good, and it doesn't appear to cause compiler concerns.
Christoph Hellwig Feb. 25, 2022, 4:11 p.m. UTC | #10
On Tue, Feb 22, 2022 at 12:09:07PM -0800, Keith Busch wrote:
> On Tue, Feb 22, 2022 at 11:50:42AM -0800, Eric Biggers wrote:
> > On Tue, Feb 22, 2022 at 08:31:40AM -0800, Keith Busch wrote:
> > > +config CRYPTO_CRC64_ROCKSOFT
> > > +	tristate "Rocksoft Model CRC64 algorithm"
> > > +	depends on CRC64
> > > +	select CRYPTO_HASH
> > > +	help
> > > +	  Rocksoft Model CRC64 computation is being cast as a crypto
> > > +	  transform. This allows for faster crc64 transforms to be used
> > > +	  if they are available.
> > 
> > The first sentence of this help text doesn't make sense.
> 
> Much of this patch is a copy from the equivalent T10DIF option,
> which says the same as above. I meant to revisit these help texts because
> they don't make sense to me either. I'll be sure to do that for the next
> version.

It might make sense to make CRC_T10DIF an invisible option as well
and drop the help text entirely.