diff mbox series

[v6,1/7] block: Introduce the flag QUEUE_FLAG_NO_ZONE_WRITE_LOCK

Message ID 20230804154821.3232094-2-bvanassche@acm.org
State New
Headers show
Series Improve performance for zoned UFS devices | expand

Commit Message

Bart Van Assche Aug. 4, 2023, 3:47 p.m. UTC
Writes in sequential write required zones must happen at the write
pointer. Even if the submitter of the write commands (e.g. a filesystem)
submits writes for sequential write required zones in order, the block
layer or the storage controller may reorder these write commands.

The zone locking mechanism in the mq-deadline I/O scheduler serializes
write commands for sequential zones. Some but not all storage controllers
require this serialization. Introduce a new request queue flag to allow
block drivers to indicate that they preserve the order of write commands
and thus do not require serialization of writes per zone.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 include/linux/blkdev.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Jens Axboe Aug. 8, 2023, 9:19 p.m. UTC | #1
On 8/4/23 9:47?AM, Bart Van Assche wrote:
> Writes in sequential write required zones must happen at the write
> pointer. Even if the submitter of the write commands (e.g. a filesystem)
> submits writes for sequential write required zones in order, the block
> layer or the storage controller may reorder these write commands.
> 
> The zone locking mechanism in the mq-deadline I/O scheduler serializes
> write commands for sequential zones. Some but not all storage controllers
> require this serialization. Introduce a new request queue flag to allow
> block drivers to indicate that they preserve the order of write commands
> and thus do not require serialization of writes per zone.

Looking at how this is used, why not call it QUEUE_FLAG_ZONE_WRITE_LOCK
instead? That'd make the code easier to immediately grok, rather than
deal with double negations.
Bart Van Assche Aug. 8, 2023, 9:46 p.m. UTC | #2
On 8/8/23 14:19, Jens Axboe wrote:
> On 8/4/23 9:47?AM, Bart Van Assche wrote:
>> Writes in sequential write required zones must happen at the write
>> pointer. Even if the submitter of the write commands (e.g. a filesystem)
>> submits writes for sequential write required zones in order, the block
>> layer or the storage controller may reorder these write commands.
>>
>> The zone locking mechanism in the mq-deadline I/O scheduler serializes
>> write commands for sequential zones. Some but not all storage controllers
>> require this serialization. Introduce a new request queue flag to allow
>> block drivers to indicate that they preserve the order of write commands
>> and thus do not require serialization of writes per zone.
> 
> Looking at how this is used, why not call it QUEUE_FLAG_ZONE_WRITE_LOCK
> instead? That'd make the code easier to immediately grok, rather than
> deal with double negations.

Hi Jens,

Do I understand correctly that you want me to set the
QUEUE_FLAG_ZONE_WRITE_LOCK flag for all request queues by adding it to
QUEUE_FLAG_MQ_DEFAULT and also that the UFS driver should clear the
QUEUE_FLAG_ZONE_WRITE_LOCK flag?

Thanks,

Bart.
Jens Axboe Aug. 8, 2023, 10:27 p.m. UTC | #3
On 8/8/23 3:46?PM, Bart Van Assche wrote:
> On 8/8/23 14:19, Jens Axboe wrote:
>> On 8/4/23 9:47?AM, Bart Van Assche wrote:
>>> Writes in sequential write required zones must happen at the write
>>> pointer. Even if the submitter of the write commands (e.g. a filesystem)
>>> submits writes for sequential write required zones in order, the block
>>> layer or the storage controller may reorder these write commands.
>>>
>>> The zone locking mechanism in the mq-deadline I/O scheduler serializes
>>> write commands for sequential zones. Some but not all storage controllers
>>> require this serialization. Introduce a new request queue flag to allow
>>> block drivers to indicate that they preserve the order of write commands
>>> and thus do not require serialization of writes per zone.
>>
>> Looking at how this is used, why not call it QUEUE_FLAG_ZONE_WRITE_LOCK
>> instead? That'd make the code easier to immediately grok, rather than
>> deal with double negations.
> 
> Hi Jens,
> 
> Do I understand correctly that you want me to set the
> QUEUE_FLAG_ZONE_WRITE_LOCK flag for all request queues by adding it to
> QUEUE_FLAG_MQ_DEFAULT and also that the UFS driver should clear the
> QUEUE_FLAG_ZONE_WRITE_LOCK flag?

I don't think setting that flag by default makes a lot of sense, if the
device in question isn't zoned. Maybe have variants of BLK_ZONED_* which
has a locked and unlocked variant for each where it applies? Perhaps
have the lock flag be common between them so you can check them in the
same way? That'd keep the fact that it's zoned and if it needs locking
in the same spot, rather than scatter them in two spots.
Bart Van Assche Aug. 9, 2023, 1:45 p.m. UTC | #4
On 8/8/23 15:27, Jens Axboe wrote:
> I don't think setting that flag by default makes a lot of sense, if the
> device in question isn't zoned. Maybe have variants of BLK_ZONED_* which
> has a locked and unlocked variant for each where it applies? Perhaps
> have the lock flag be common between them so you can check them in the
> same way? That'd keep the fact that it's zoned and if it needs locking
> in the same spot, rather than scatter them in two spots.

Hi Jens,

That's an interesting suggestion but there is a complication: the zone type
is set by different code than the code that decides whether or not locking
is required. For SCSI devices the zone type is set from inside
drivers/scsi/sd.c or drivers/scsi/sd_zbc.c while the locking requirements
come from the SCSI LLD (drivers/ufs/core/ufshcd.c). Additionally, the code
for converting between model type and model string in scsi_debug.c would
look weird if BLK_ZONED_* is split into locked and unlocked variants. See
also the zbc_model_strs_*[] arrays and the sdeb_zbc_model_str() function.

It should be possible to move the flag that indicates whether or not zone
locking is required into struct queue_limits next to "enum blk_zoned_model
zoned;". If nobody objects I will select this alternative.

Thanks,

Bart.
diff mbox series

Patch

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2f5371b8482c..b6eb988f81f3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -534,6 +534,12 @@  struct request_queue {
 #define QUEUE_FLAG_NONROT	6	/* non-rotational device (SSD) */
 #define QUEUE_FLAG_VIRT		QUEUE_FLAG_NONROT /* paravirt device */
 #define QUEUE_FLAG_IO_STAT	7	/* do disk/partitions IO accounting */
+/*
+ * The device supports not using the zone write locking mechanism to serialize
+ * write operations (REQ_OP_WRITE, REQ_OP_WRITE_ZEROES) issued to a sequential
+ * write required zone (BLK_ZONE_TYPE_SEQWRITE_REQ).
+ */
+#define QUEUE_FLAG_NO_ZONE_WRITE_LOCK 8
 #define QUEUE_FLAG_NOXMERGES	9	/* No extended merges */
 #define QUEUE_FLAG_ADD_RANDOM	10	/* Contributes to random pool */
 #define QUEUE_FLAG_SYNCHRONOUS	11	/* always completes in submit context */
@@ -597,6 +603,11 @@  bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_skip_tagset_quiesce(q) \
 	test_bit(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, &(q)->queue_flags)
 
+static inline bool blk_queue_no_zone_write_lock(struct request_queue *q)
+{
+	return test_bit(QUEUE_FLAG_NO_ZONE_WRITE_LOCK, &q->queue_flags);
+}
+
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);