[08/26] block: Implement zone append emulation

Given that zone write plugging manages all writes to zones of a zoned
block device, we can track the write pointer position of all zones in
order to implement zone append emulation using regular write operations.
This is needed for devices that do not natively support the zone append
command, e.g. SMR hard-disks.

This commit adds zone write pointer tracking similarly to how the SCSI
disk driver (sd) does, that is, in the form of a 32-bit number of
sectors equal to the offset of the zone write pointer from the start of
the zone.
The wp_offset field is added to struct blk_zone_wplug for this. Write
pointer tracking is only enabled for zoned devices that requested
zone append emulation by setting the max_zone_append_sectors queue
limit of the disk to 0.
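For illustration only, the write pointer representation can be modeled in userspace C as a minimal sketch. This is not the kernel code: struct zwplug_sketch, needs_zone_append_emulation(), wp_offset_from_sector() and zone_wp_sector() are hypothetical names, and the struct keeps only the zone start and wp_offset. A 32-bit sector offset is sufficient as long as the zone size in sectors fits in 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;	/* 512-byte sector number, as in the kernel */

/* Hypothetical, simplified zone write plug: only what this sketch needs. */
struct zwplug_sketch {
	sector_t zone_start;	/* first sector of the zone */
	uint32_t wp_offset;	/* write pointer offset from zone_start, in sectors */
};

/* A max_zone_append_sectors limit of 0 requests zone append emulation. */
static bool needs_zone_append_emulation(unsigned int max_zone_append_sectors)
{
	return max_zone_append_sectors == 0;
}

/* Convert an absolute write pointer sector into a 32-bit zone offset. */
static uint32_t wp_offset_from_sector(sector_t zone_start, uint32_t zone_sectors,
				      sector_t wp)
{
	/* The write pointer lies in [zone_start, zone_start + zone_sectors]. */
	assert(wp >= zone_start && wp - zone_start <= zone_sectors);
	return (uint32_t)(wp - zone_start);
}

/* Absolute sector of the write pointer: where the next write must go. */
static sector_t zone_wp_sector(const struct zwplug_sketch *zwp)
{
	return zwp->zone_start + zwp->wp_offset;
}
```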

For zoned devices that requested zone append emulation, wp_offset is
managed as follows:
 - It is incremented when a write BIO is prepared for submission or
   merged into a new request. This is done in
   blk_zone_wplug_prepare_bio() when a BIO is unplugged, in
   blk_zone_write_plug_bio_merged() when a new unplugged BIO is merged
   before zone write plugging and in blk_zone_write_plug_attempt_merge()
   when plugged BIOs are merged into a new request.
 - The helper functions blk_zone_handle_reset() and
   blk_zone_handle_reset_all() are added to set the write pointer
   offset to 0 for the targeted zones of REQ_OP_ZONE_RESET and
   REQ_OP_ZONE_RESETALL operations.
 - The helper function blk_zone_handle_finish() is added to set the
   write pointer offset to the zone size for the target zone of a
   REQ_OP_ZONE_FINISH operation.
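The wp_offset update rules listed above can be sketched as follows, assuming a hypothetical struct zwplug_sketch that holds only wp_offset, with the zone size passed in by the caller. The function names are illustrative and do not correspond to the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified zone write plug. */
struct zwplug_sketch {
	uint32_t wp_offset;	/* sectors written since the zone start */
};

/* A write of nr_sectors was prepared or merged: advance the write pointer. */
static void zwplug_advance(struct zwplug_sketch *zwp, uint32_t zone_sectors,
			   uint32_t nr_sectors)
{
	/* The write must fit in the remaining space of the zone. */
	assert(nr_sectors <= zone_sectors - zwp->wp_offset);
	zwp->wp_offset += nr_sectors;
}

/* REQ_OP_ZONE_RESET: the zone becomes empty. */
static void zwplug_reset(struct zwplug_sketch *zwp)
{
	zwp->wp_offset = 0;
}

/* REQ_OP_ZONE_RESETALL: reset the write pointer of every zone. */
static void zwplug_reset_all(struct zwplug_sketch *zwps, unsigned int nr_zones)
{
	for (unsigned int i = 0; i < nr_zones; i++)
		zwplug_reset(&zwps[i]);
}

/* REQ_OP_ZONE_FINISH: the zone is full, write pointer at the zone end. */
static void zwplug_finish(struct zwplug_sketch *zwp, uint32_t zone_sectors)
{
	zwp->wp_offset = zone_sectors;
}
```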

The function blk_zone_wplug_prepare_bio() also checks and prepares a BIO
for submission. Preparation involves changing zone append BIOs into
non-mergeable regular write BIOs for devices that require zone append
emulation. Modified zone append BIOs are flagged with the new BIO flag
BIO_EMULATES_ZONE_APPEND. This flag is checked on completion of the
BIO in blk_zone_complete_request_bio() to restore the original
REQ_OP_ZONE_APPEND operation code of the BIO.
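A minimal sketch of the conversion, under stated assumptions: enum op_sketch, FLAG_EMULATES_ZONE_APPEND and FLAG_NOMERGE are hypothetical stand-ins for the REQ_OP_* codes, BIO_EMULATES_ZONE_APPEND and a no-merge request flag, and the caller passes the zone wp_offset directly. It only illustrates the idea that a zone append, which targets the zone start, is reissued as a regular write at zone start + wp_offset, and that the original operation is restored on completion.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-ins for the kernel's operation codes and BIO flags. */
enum op_sketch { OP_WRITE, OP_ZONE_APPEND };
#define FLAG_EMULATES_ZONE_APPEND	(1u << 0)	/* models BIO_EMULATES_ZONE_APPEND */
#define FLAG_NOMERGE			(1u << 1)	/* models a non-mergeable BIO */

struct bio_sketch {
	enum op_sketch op;
	unsigned int flags;
	sector_t sector;	/* target sector: the zone start for a zone append */
	uint32_t nr_sectors;
};

/* Turn a zone append into a non-mergeable write at the zone write pointer. */
static void prepare_emulated_append(struct bio_sketch *bio, uint32_t *wp_offset)
{
	assert(bio->op == OP_ZONE_APPEND);
	bio->op = OP_WRITE;
	bio->flags |= FLAG_EMULATES_ZONE_APPEND | FLAG_NOMERGE;
	bio->sector += *wp_offset;	/* zone start + write pointer offset */
	*wp_offset += bio->nr_sectors;	/* account for the data being written */
}

/* On completion, restore the zone append operation code. */
static void complete_emulated_append(struct bio_sketch *bio)
{
	if (bio->flags & FLAG_EMULATES_ZONE_APPEND) {
		bio->op = OP_ZONE_APPEND;
		bio->flags &= ~FLAG_EMULATES_ZONE_APPEND;
	}
}
```

In this sketch, after completion the BIO sector already holds the location where the data was written, which is the value the issuer of a zone append expects to read back.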

If a write error happens, the wp_offset value may become incorrect and
out of sync with the device-managed write pointer. This is handled using
the new zone write plug flag BLK_ZONE_WPLUG_ERROR. The function
blk_zone_wplug_handle_error() is called from the new disk zone write
plug work when this flag is set. This function executes a report zones
operation to update the zone write pointer offset to the current value
as indicated by the device. The disk zone write plug work is scheduled
whenever a BIO flagged with BIO_ZONE_WRITE_PLUGGING completes with an
error or when blk_zone_wplug_prepare_bio() detects an unaligned write.
Once scheduled, the disk zone write plug work keeps running until all
zone errors are handled.
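The error recovery described above can be sketched as a single pass over the zones, assuming a hypothetical ZWPLUG_ERROR flag standing in for BLK_ZONE_WPLUG_ERROR and an array reported_wp[] standing in for the result of a report zones operation. The kernel work runs asynchronously and may observe new errors while it runs; in this sketch a caller would simply repeat the pass until it returns 0.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

#define ZWPLUG_ERROR	(1u << 0)	/* models BLK_ZONE_WPLUG_ERROR */

/* Hypothetical, simplified zone write plug. */
struct zwplug_sketch {
	sector_t zone_start;
	uint32_t wp_offset;
	unsigned int flags;
};

/* A plugged write failed or was unaligned: wp_offset can no longer be trusted. */
static void zwplug_set_error(struct zwplug_sketch *zwp)
{
	zwp->flags |= ZWPLUG_ERROR;
}

/*
 * Resync every flagged zone from the write pointer reported by the device
 * (reported_wp[i] is the absolute write pointer sector of zone i).
 * Returns the number of zones handled.
 */
static unsigned int zwplug_handle_errors(struct zwplug_sketch *zwps,
					 const sector_t *reported_wp,
					 unsigned int nr_zones)
{
	unsigned int handled = 0;

	for (unsigned int i = 0; i < nr_zones; i++) {
		struct zwplug_sketch *zwp = &zwps[i];

		if (!(zwp->flags & ZWPLUG_ERROR))
			continue;
		assert(reported_wp[i] >= zwp->zone_start);
		zwp->wp_offset = (uint32_t)(reported_wp[i] - zwp->zone_start);
		zwp->flags &= ~ZWPLUG_ERROR;
		handled++;
	}
	return handled;
}
```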

The block layer internal inline helper function bio_is_zone_append() is
added to test if a BIO is either a native zone append operation
(REQ_OP_ZONE_APPEND operation code) or if it is flagged with
BIO_EMULATES_ZONE_APPEND. Given that both native and emulated zone
append BIO completion handling should be similar, the functions
blk_update_request() and blk_zone_complete_request_bio() are modified to
use bio_is_zone_append() to execute blk_zone_complete_request_bio() for
both native and emulated zone append operations.
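The predicate and its use on the completion path can be sketched as follows. bio_is_zone_append_sketch(), complete_bio_sketch() and the operation and flag stand-ins are hypothetical; the sketch only shows that native and emulated zone appends take the same completion branch, which reports back where the data was written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-ins for the kernel's operation codes and BIO flag. */
enum op_sketch { OP_READ, OP_WRITE, OP_ZONE_APPEND };
#define FLAG_EMULATES_ZONE_APPEND	(1u << 0)	/* models BIO_EMULATES_ZONE_APPEND */

struct bio_sketch {
	enum op_sketch op;
	unsigned int flags;
	sector_t sector;
};

/* True for a native zone append and for a regular write emulating one. */
static inline bool bio_is_zone_append_sketch(const struct bio_sketch *bio)
{
	return bio->op == OP_ZONE_APPEND ||
	       (bio->flags & FLAG_EMULATES_ZONE_APPEND);
}

/* Completion path shared by native and emulated zone appends. */
static void complete_bio_sketch(struct bio_sketch *bio, sector_t written_sector)
{
	/* The emulation flag is only ever set on a regular write. */
	assert(!(bio->flags & FLAG_EMULATES_ZONE_APPEND) || bio->op == OP_WRITE);

	if (bio_is_zone_append_sketch(bio))
		bio->sector = written_sector;
}
```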

This commit contains contributions from Christoph Hellwig <hch@lst.de>.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 block/blk-mq.c            |   2 +-
 block/blk-zoned.c         | 457 ++++++++++++++++++++++++++++++++++++--
 block/blk.h               |  14 +-
 include/linux/blk_types.h |   1 +
 include/linux/blkdev.h    |   3 +
 5 files changed, 452 insertions(+), 25 deletions(-)

Message ID	20240202073104.2418230-9-dlemoal@kernel.org
State	New
Headers	From: Damien Le Moal <dlemoal@kernel.org>
	To: linux-block@vger.kernel.org, Jens Axboe <axboe@kernel.dk>, linux-scsi@vger.kernel.org, "Martin K . Petersen" <martin.petersen@oracle.com>, dm-devel@lists.linux.dev, Mike Snitzer <snitzer@redhat.com>
	Cc: Christoph Hellwig <hch@lst.de>
	Subject: [PATCH 08/26] block: Implement zone append emulation
	Date: Fri, 2 Feb 2024 16:30:46 +0900
	Message-ID: <20240202073104.2418230-9-dlemoal@kernel.org>
	In-Reply-To: <20240202073104.2418230-1-dlemoal@kernel.org>
Series	Zone write plugging
	[00/26] Zone write plugging
	[01/26] block: Restore sector of flush requests
	[02/26] block: Remove req_bio_endio()
	[03/26] block: Introduce bio_straddle_zones() and bio_offset_from_zone_start()
	[04/26] block: Introduce blk_zone_complete_request_bio()
	[05/26] block: Allow using bio_attempt_back_merge() internally
	[06/26] block: Introduce zone write plugging
	[07/26] block: Allow zero value of max_zone_append_sectors queue limit
	[08/26] block: Implement zone append emulation
	[09/26] block: Allow BIO-based drivers to use blk_revalidate_disk_zones()
	[10/26] dm: Use the block layer zone append emulation
	[11/26] scsi: sd: Use the block layer zone append emulation
	[12/26] ublk_drv: Do not request ELEVATOR_F_ZBD_SEQ_WRITE elevator feature
	[13/26] null_blk: Do not request ELEVATOR_F_ZBD_SEQ_WRITE elevator feature
	[14/26] null_blk: Introduce zone_append_max_sectors attribute
	[15/26] null_blk: Introduce fua attribute
	[16/26] nvmet: zns: Do not reference the gendisk conv_zones_bitmap
	[17/26] block: Remove BLK_STS_ZONE_RESOURCE
	[18/26] block: Simplify blk_revalidate_disk_zones() interface
	[19/26] block: mq-deadline: Remove support for zone write locking
	[20/26] block: Remove elevator required features
	[21/26] block: Do not check zone type in blk_check_zone_append()
	[22/26] block: Move zone related debugfs attribute to blk-zoned.c
	[23/26] block: Remove zone write locking
	[24/26] block: Do not special-case plugging of zone write operations
	[25/26] block: Reduce zone write plugging memory usage
	[26/26] block: Add zone_active_wplugs debugfs entry
