mbox series

[v3,00/10] Add Copy offload support

Message ID 20220214080002.18381-1-nj.shetty@samsung.com
Headers show
Series Add Copy offload support | expand

Message

Nitesh Shetty Feb. 14, 2022, 7:59 a.m. UTC
The patch series covers the points discussed in November 2021 virtual call
[LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
We have covered the Initial agreed requirements in this patchset.
Patchset borrows Mikulas's token based approach for 2 bdev
implementation.

Overall series supports –

1. Driver
- NVMe Copy command (single NS), including support in nvme-target (for
	block and file backend)

2. Block layer
- Block-generic copy (REQ_COPY flag), with interface accommodating
	two block-devs, and multi-source/destination interface
- Emulation, when offload is natively absent
- dm-linear support (for cases not requiring split)

3. User-interface
- new ioctl

4. In-kernel user
- dm-kcopyd

[0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/

Changes in v3:
- fixed possible race condition reported by Damien Le Moal
- new sysfs controls as suggested by Damien Le Moal
- fixed possible memory leak reported by Dan Carpenter, lkp
- minor fixes


Arnav Dawn (1):
  nvmet: add copy command support for bdev and file ns

Nitesh Shetty (6):
  block: Introduce queue limits for copy-offload support
  block: Add copy offload support infrastructure
  block: Introduce a new ioctl for copy
  block: add emulation for copy
  dm: Add support for copy offload.
  dm: Enable copy offload for dm-linear target

SelvaKumar S (3):
  block: make bio_map_kern() non static
  nvme: add copy offload support
  dm kcopyd: use copy offload support

 block/blk-lib.c                   | 346 ++++++++++++++++++++++++++++++
 block/blk-map.c                   |   2 +-
 block/blk-settings.c              |  59 +++++
 block/blk-sysfs.c                 | 138 ++++++++++++
 block/blk.h                       |   2 +
 block/ioctl.c                     |  32 +++
 drivers/md/dm-kcopyd.c            |  55 ++++-
 drivers/md/dm-linear.c            |   1 +
 drivers/md/dm-table.c             |  45 ++++
 drivers/md/dm.c                   |   6 +
 drivers/nvme/host/core.c          | 119 +++++++++-
 drivers/nvme/host/fc.c            |   4 +
 drivers/nvme/host/nvme.h          |   7 +
 drivers/nvme/host/pci.c           |   9 +
 drivers/nvme/host/rdma.c          |   6 +
 drivers/nvme/host/tcp.c           |   8 +
 drivers/nvme/host/trace.c         |  19 ++
 drivers/nvme/target/admin-cmd.c   |   8 +-
 drivers/nvme/target/io-cmd-bdev.c |  65 ++++++
 drivers/nvme/target/io-cmd-file.c |  48 +++++
 include/linux/blk_types.h         |  21 ++
 include/linux/blkdev.h            |  17 ++
 include/linux/device-mapper.h     |   5 +
 include/linux/nvme.h              |  43 +++-
 include/uapi/linux/fs.h           |  23 ++
 25 files changed, 1074 insertions(+), 14 deletions(-)


base-commit: 23a3fe5e6bb58304e662c604b86bc1264453e888

Comments

Dave Chinner Feb. 14, 2022, 10:08 p.m. UTC | #1
On Mon, Feb 14, 2022 at 01:29:50PM +0530, Nitesh Shetty wrote:
> The patch series covers the points discussed in November 2021 virtual call
> [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> We have covered the Initial agreed requirements in this patchset.
> Patchset borrows Mikulas's token based approach for 2 bdev
> implementation.
> 
> Overall series supports –
> 
> 1. Driver
> - NVMe Copy command (single NS), including support in nvme-target (for
> 	block and file backend)
> 
> 2. Block layer
> - Block-generic copy (REQ_COPY flag), with interface accommodating
> 	two block-devs, and multi-source/destination interface
> - Emulation, when offload is natively absent
> - dm-linear support (for cases not requiring split)
> 
> 3. User-interface
> - new ioctl
> 
> 4. In-kernel user
> - dm-kcopyd

The biggest missing piece - and arguably the single most useful
piece of this functionality for users - is hooking this up to the
copy_file_range() syscall so that user file copies can be offloaded
to the hardware efficiently.

This seems like it would relatively easy to do with an fs/iomap iter
loop that maps src + dst file ranges and issues block copy offload
commands on the extents. We already do similar "read from source,
write to destination" operations in iomap, so it's not a huge
stretch to extent the iomap interfaces to provide an copy offload
mechanism using this infrastructure.

Also, hooking this up to copy-file-range() will also get you
immediate data integrity testing right down to the hardware via fsx
in fstests - it uses copy_file_range() as one of it's operations and
it will find all the off-by-one failures in both the linux IO stack
implementation and the hardware itself.

And, in reality, I wouldn't trust a block copy offload mechanism
until it is integrated with filesystems, the page cache and has
solid end-to-end data integrity testing available to shake out all
the bugs that will inevitably exist in this stack....

Cheers,

Dave.
Nitesh Shetty Feb. 17, 2022, 1:02 p.m. UTC | #2
Tue, Feb 15, 2022 at 09:08:12AM +1100, Dave Chinner wrote:
> On Mon, Feb 14, 2022 at 01:29:50PM +0530, Nitesh Shetty wrote:
> > The patch series covers the points discussed in November 2021 virtual call
> > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> > We have covered the Initial agreed requirements in this patchset.
> > Patchset borrows Mikulas's token based approach for 2 bdev
> > implementation.
> > 
> > Overall series supports –
> > 
> > 1. Driver
> > - NVMe Copy command (single NS), including support in nvme-target (for
> > 	block and file backend)
> > 
> > 2. Block layer
> > - Block-generic copy (REQ_COPY flag), with interface accommodating
> > 	two block-devs, and multi-source/destination interface
> > - Emulation, when offload is natively absent
> > - dm-linear support (for cases not requiring split)
> > 
> > 3. User-interface
> > - new ioctl
> > 
> > 4. In-kernel user
> > - dm-kcopyd
> 
> The biggest missing piece - and arguably the single most useful
> piece of this functionality for users - is hooking this up to the
> copy_file_range() syscall so that user file copies can be offloaded
> to the hardware efficiently.
> 
> This seems like it would relatively easy to do with an fs/iomap iter
> loop that maps src + dst file ranges and issues block copy offload
> commands on the extents. We already do similar "read from source,
> write to destination" operations in iomap, so it's not a huge
> stretch to extent the iomap interfaces to provide an copy offload
> mechanism using this infrastructure.
> 
> Also, hooking this up to copy-file-range() will also get you
> immediate data integrity testing right down to the hardware via fsx
> in fstests - it uses copy_file_range() as one of it's operations and
> it will find all the off-by-one failures in both the linux IO stack
> implementation and the hardware itself.
> 
> And, in reality, I wouldn't trust a block copy offload mechanism
> until it is integrated with filesystems, the page cache and has
> solid end-to-end data integrity testing available to shake out all
> the bugs that will inevitably exist in this stack....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

We had planned copy_file_range (CFR) in next phase of copy offload patch series.
Thinking that we will get to CFR when everything else is robust.
But if that is needed to make things robust, will start looking into that.

--
Nitesh Shetty
Dave Chinner Feb. 23, 2022, 1:43 a.m. UTC | #3
On Thu, Feb 17, 2022 at 06:32:15PM +0530, Nitesh Shetty wrote:
>  Tue, Feb 15, 2022 at 09:08:12AM +1100, Dave Chinner wrote:
> > On Mon, Feb 14, 2022 at 01:29:50PM +0530, Nitesh Shetty wrote:
> > > [LSF/MM/BFP TOPIC] Storage: Copy Offload[0].
> > The biggest missing piece - and arguably the single most useful
> > piece of this functionality for users - is hooking this up to the
> > copy_file_range() syscall so that user file copies can be offloaded
> > to the hardware efficiently.
> > 
> > This seems like it would relatively easy to do with an fs/iomap iter
> > loop that maps src + dst file ranges and issues block copy offload
> > commands on the extents. We already do similar "read from source,
> > write to destination" operations in iomap, so it's not a huge
> > stretch to extent the iomap interfaces to provide an copy offload
> > mechanism using this infrastructure.
> > 
> > Also, hooking this up to copy-file-range() will also get you
> > immediate data integrity testing right down to the hardware via fsx
> > in fstests - it uses copy_file_range() as one of it's operations and
> > it will find all the off-by-one failures in both the linux IO stack
> > implementation and the hardware itself.
> > 
> > And, in reality, I wouldn't trust a block copy offload mechanism
> > until it is integrated with filesystems, the page cache and has
> > solid end-to-end data integrity testing available to shake out all
> > the bugs that will inevitably exist in this stack....
> 
> We had planned copy_file_range (CFR) in next phase of copy offload patch series.
> Thinking that we will get to CFR when everything else is robust.
> But if that is needed to make things robust, will start looking into that.

How do you make it robust when there is no locking/serialisation to
prevent overlapping concurrent IO while the copy-offload is in
progress? Or that you don't have overlapping concurrent
copy-offloads running at the same time?

You've basically created a block dev ioctl interface that looks
impossible to use safely. It doesn't appear to be coherent with the
blockdev page cache nor does it appear to have any documented data
integrity semantics, either. e.g. how does this interact with the
guarantees that fsync_bdev() and/or sync_blockdev() are supposed to
provide?

IOWs, if you don't have either CFR or some other strictly bound
kernel user with well defined access, synchronisation and integrity
semantics, how can anyone actually robustly test these ioctls to be
working correctly in all situations they might be called?

Cheers,

Dave.