Message ID | 20250421021509.2366003-2-yi.zhang@huaweicloud.com |
---|---|
State | New |
Series | fallocate: introduce FALLOC_FL_WRITE_ZEROES flag |
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
Hi Zhang!

> +		[RO] Devices that explicitly support the unmap write zeroes
> +		operation in which a single write zeroes request with the unmap
> +		bit set to zero out the range of contiguous blocks on storage
> +		by freeing blocks, rather than writing physical zeroes to the
> +		media. If the write_zeroes_unmap is set to 1, this indicates
> +		that the device explicitly supports the write zero command.
> +		However, this may be a best-effort optimization rather than a
> +		mandatory requirement, some devices may partially fall back to
> +		writing physical zeroes due to factors such as receiving
> +		unaligned commands. If the parameter is set to 0, the device
> +		either does not support this operation, or its support status is
> +		unknown.

I am not so keen on mixing Write Zeroes (which is NVMe-speak) and Unmap
(which is SCSI). Also, Deallocate and Unmap reflect block provisioning
state on the device but don't really convey what is semantically
important for your proposed change (zeroing speed and/or media wear
reduction).

That said, I'm having a hard time coming up with a better term.
WRITE_ZEROES_OPTIMIZED, maybe? Naming is hard...

For the description, perhaps something like the following which tries to
focus on the block layer semantics without using protocol-specific
terminology?

[RO] This parameter indicates whether a device supports zeroing data in
a specified block range without incurring the cost of physically writing
zeroes to media for each individual block. This operation is a
best-effort optimization, a device may fall back to physically writing
zeroes to media due to other factors such as misalignment or being asked
to clear a block range smaller than the device's internal allocation
unit. If write_zeroes_unmap is set to 1, the device implements a zeroing
operation which opportunistically avoids writing zeroes to media while
still guaranteeing that subsequent reads from the specified block range
will return zeroed data.

If write_zeroes_unmap is set to 0, the device may have to write each
logical block to media during a zeroing operation.
Hi, Martin!

On 2025/5/6 12:21, Martin K. Petersen wrote:
>
> Hi Zhang!
>
>> +		[RO] Devices that explicitly support the unmap write zeroes
>> +		operation in which a single write zeroes request with the unmap
>> +		bit set to zero out the range of contiguous blocks on storage
>> +		by freeing blocks, rather than writing physical zeroes to the
>> +		media. If the write_zeroes_unmap is set to 1, this indicates
>> +		that the device explicitly supports the write zero command.
>> +		However, this may be a best-effort optimization rather than a
>> +		mandatory requirement, some devices may partially fall back to
>> +		writing physical zeroes due to factors such as receiving
>> +		unaligned commands. If the parameter is set to 0, the device
>> +		either does not support this operation, or its support status is
>> +		unknown.
>
> I am not so keen on mixing Write Zeroes (which is NVMe-speak) and Unmap
> (which is SCSI). Also, Deallocate and Unmap reflect block provisioning
> state on the device but don't really convey what is semantically
> important for your proposed change (zeroing speed and/or media wear
> reduction).
>

This flag doesn't strictly guarantee zeroing speed or media wear
reduction optimizations; rather, it reflects typical optimization
behavior across most supported devices and cases. Therefore, I propose
using a name that accurately indicates the function of the block device.
However, I can't think of a better name either. Using the name
WRITE_ZEROES_UNMAP seems appropriate to convey that the block device
supports this type of Deallocate and Unmap state.

> That said, I'm having a hard time coming up with a better term.
> WRITE_ZEROES_OPTIMIZED, maybe? Naming is hard...

Using WRITE_ZEROES_OPTIMIZED feels somewhat too generic to me, and users
may not fully grasp the specific optimizations it entails based on the
name.

>
> For the description, perhaps something like the following which tries to
> focus on the block layer semantics without using protocol-specific
> terminology?
>
> [RO] This parameter indicates whether a device supports zeroing data in
> a specified block range without incurring the cost of physically writing
> zeroes to media for each individual block. This operation is a
> best-effort optimization, a device may fall back to physically writing
> zeroes to media due to other factors such as misalignment or being asked
> to clear a block range smaller than the device's internal allocation
> unit. If write_zeroes_unmap is set to 1, the device implements a zeroing
> operation which opportunistically avoids writing zeroes to media while
> still guaranteeing that subsequent reads from the specified block range
> will return zeroed data. If write_zeroes_unmap is set to 0, the device
> may have to write each logical block to media during a zeroing operation.
>

Thank you for optimizing the description, it looks good to me. I'd like
to use this one in my next iteration. :)

Thanks,
Yi.
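For readers who want to probe the attribute discussed above from userspace, here is a minimal C sketch. It assumes a kernel with this patch applied, so that /sys/block/<disk>/queue/write_zeroes_unmap exists; the helper names parse_queue_flag() and read_write_zeroes_unmap() are hypothetical and not part of the series. A return value of -1 means the attribute is missing or malformed and is best treated as "unknown" rather than as 0.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Interpret the text of a 0/1 sysfs attribute.
 * Returns 1 or 0 for a well-formed value, -1 otherwise.
 * (Hypothetical helper, not part of the patch.)
 */
static int parse_queue_flag(const char *text)
{
	assert(text != NULL);
	if (text[0] != '0' && text[0] != '1')
		return -1;
	/* Accept only a single digit, optionally followed by a newline. */
	if (text[1] != '\n' && text[1] != '\0')
		return -1;
	return text[0] - '0';
}

/*
 * Read /sys/block/<disk>/queue/write_zeroes_unmap.
 * Returns 1 or 0 as reported by the kernel, or -1 if the attribute
 * cannot be read (for example on a kernel without this patch).
 * (Hypothetical helper, not part of the patch.)
 */
static int read_write_zeroes_unmap(const char *disk)
{
	char path[256];
	char buf[8] = "";
	FILE *f;
	int n;

	assert(disk != NULL);
	n = snprintf(path, sizeof(path),
		     "/sys/block/%s/queue/write_zeroes_unmap", disk);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);

	return parse_queue_flag(buf);
}
```

A caller would pass a disk name such as "sda" and, in line with the best-effort wording proposed above, should treat a result of 1 as a hint that zeroing may avoid physical writes, not as a guarantee for every request.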
diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 3879963f0f01..6531cdfcaacf 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -763,6 +763,24 @@ Description:
 		0, write zeroes is not supported by the device.
 
 
+What:		/sys/block/<disk>/queue/write_zeroes_unmap
+Date:		January 2025
+Contact:	Zhang Yi <yi.zhang@huawei.com>
+Description:
+		[RO] Devices that explicitly support the unmap write zeroes
+		operation in which a single write zeroes request with the unmap
+		bit set to zero out the range of contiguous blocks on storage
+		by freeing blocks, rather than writing physical zeroes to the
+		media. If the write_zeroes_unmap is set to 1, this indicates
+		that the device explicitly supports the write zero command.
+		However, this may be a best-effort optimization rather than a
+		mandatory requirement, some devices may partially fall back to
+		writing physical zeroes due to factors such as receiving
+		unaligned commands. If the parameter is set to 0, the device
+		either does not support this operation, or its support status is
+		unknown.
+
+
 What:		/sys/block/<disk>/queue/zone_append_max_bytes
 Date:		May 2020
 Contact:	linux-block@vger.kernel.org
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 6b2dbe645d23..3331d07bd5d9 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -697,6 +697,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		t->features &= ~BLK_FEAT_NOWAIT;
 	if (!(b->features & BLK_FEAT_POLL))
 		t->features &= ~BLK_FEAT_POLL;
+	if (!(b->features & BLK_FEAT_WRITE_ZEROES_UNMAP))
+		t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;
 
 	t->flags |= (b->flags & BLK_FLAG_MISALIGNED);
 
@@ -819,6 +821,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 		t->zone_write_granularity = 0;
 		t->max_zone_append_sectors = 0;
 	}
+
+	if (!t->max_write_zeroes_sectors)
+		t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;
+
 	blk_stack_atomic_writes_limits(t, b, start);
 
 	return ret;
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index a2882751f0d2..7a9c20bd3779 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -261,6 +261,7 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
 
 QUEUE_SYSFS_FEATURE_SHOW(fua, BLK_FEAT_FUA);
 QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX);
+QUEUE_SYSFS_FEATURE_SHOW(write_zeroes_unmap, BLK_FEAT_WRITE_ZEROES_UNMAP);
 
 static ssize_t queue_poll_show(struct gendisk *disk, char *page)
 {
@@ -510,6 +511,7 @@ QUEUE_LIM_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min_bytes");
 
 QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
 QUEUE_LIM_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_write_zeroes_unmap, "write_zeroes_unmap");
 QUEUE_LIM_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
 QUEUE_LIM_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
 
@@ -656,6 +658,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_atomic_write_unit_min_entry.attr,
 	&queue_atomic_write_unit_max_entry.attr,
 	&queue_max_write_zeroes_sectors_entry.attr,
+	&queue_write_zeroes_unmap_entry.attr,
 	&queue_max_zone_append_sectors_entry.attr,
 	&queue_zone_write_granularity_entry.attr,
 	&queue_rotational_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e39c45bc0a97..7c8752578e36 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -342,6 +342,9 @@ typedef unsigned int __bitwise blk_features_t;
 #define BLK_FEAT_ATOMIC_WRITES \
 	((__force blk_features_t)(1u << 16))
 
+/* supports unmap write zeroes command */
+#define BLK_FEAT_WRITE_ZEROES_UNMAP	((__force blk_features_t)(1u << 17))
+
 /*
  * Flags automatically inherited when stacking limits.
  */
@@ -1341,6 +1344,11 @@ static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
 	return bdev_limits(bdev)->max_write_zeroes_sectors;
 }
 
+static inline bool bdev_write_zeroes_unmap(struct block_device *bdev)
+{
+	return bdev_limits(bdev)->features & BLK_FEAT_WRITE_ZEROES_UNMAP;
+}
+
 static inline bool bdev_nonrot(struct block_device *bdev)
 {
 	return blk_queue_nonrot(bdev_get_queue(bdev));
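
The stacking behaviour added to blk_stack_limits() above can be illustrated with a small userspace model. This is only a sketch: struct toy_limits, toy_stack() and toy_write_zeroes_unmap() are hypothetical stand-ins for the kernel's queue_limits, blk_stack_limits() and bdev_write_zeroes_unmap(), and the min() handling of max_write_zeroes_sectors is an assumption about the surrounding code, which is not shown in the hunks. The model shows that a stacked device keeps the feature only if every bottom device has it and the resulting write zeroes limit is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit position as BLK_FEAT_WRITE_ZEROES_UNMAP in the patch. */
#define TOY_FEAT_WRITE_ZEROES_UNMAP (1u << 17)

/* Hypothetical, heavily reduced stand-in for struct queue_limits. */
struct toy_limits {
	unsigned int features;
	unsigned int max_write_zeroes_sectors;
};

/* Model of the two checks the patch adds when stacking b under t. */
static void toy_stack(struct toy_limits *t, const struct toy_limits *b)
{
	assert(t && b);

	/* First hunk: the feature survives only if the bottom device has it. */
	if (!(b->features & TOY_FEAT_WRITE_ZEROES_UNMAP))
		t->features &= ~TOY_FEAT_WRITE_ZEROES_UNMAP;

	/* Assumption: the stacked limit is the minimum of both limits. */
	if (b->max_write_zeroes_sectors < t->max_write_zeroes_sectors)
		t->max_write_zeroes_sectors = b->max_write_zeroes_sectors;

	/* Second hunk: without write zeroes support the flag is meaningless. */
	if (!t->max_write_zeroes_sectors)
		t->features &= ~TOY_FEAT_WRITE_ZEROES_UNMAP;
}

/* Model of bdev_write_zeroes_unmap(): test the feature bit. */
static bool toy_write_zeroes_unmap(const struct toy_limits *l)
{
	assert(l);
	return l->features & TOY_FEAT_WRITE_ZEROES_UNMAP;
}
```

Stacking a top device over two members that both set the bit leaves it set; adding one member without the bit, or one whose max_write_zeroes_sectors is 0, clears it for the whole stack.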