diff mbox series

[04/13] block: Restore write hint support

Message ID 20230920191442.3701673-5-bvanassche@acm.org
State New
Headers show
Series Pass data temperature information to zoned UFS devices | expand

Commit Message

Bart Van Assche Sept. 20, 2023, 7:14 p.m. UTC
This patch partially reverts commit c75e707fe1aa ("block: remove the
per-bio/request write hint"). The following aspects of that commit have
been reverted:
- Pass the struct kiocb write hint information to struct bio.
- Pass the struct bio write hint information to struct request.
- Do not merge requests with different write hints.
- Passing write hint information from the VFS layer to the block layer.
- In F2FS, initialization of bio.bi_write_hint.

The following aspects of that commit have been dropped:
- Debugfs support for retrieving and modifying write hints.
- md-raid, BTRFS, ext4, gfs2 and zonefs write hint support.
- The write_hints[] array in struct request_queue.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/bio.c                 |  2 ++
 block/blk-crypto-fallback.c |  1 +
 block/blk-merge.c           | 14 ++++++++++++++
 block/blk-mq.c              |  2 ++
 block/bounce.c              |  1 +
 block/fops.c                |  3 +++
 fs/buffer.c                 | 13 ++++++++-----
 fs/direct-io.c              |  1 +
 fs/f2fs/data.c              |  2 ++
 fs/iomap/buffered-io.c      |  2 ++
 fs/iomap/direct-io.c        |  1 +
 fs/mpage.c                  |  1 +
 include/linux/blk-mq.h      |  1 +
 include/linux/blk_types.h   |  1 +
 14 files changed, 40 insertions(+), 5 deletions(-)

Comments

Avri Altman Oct. 2, 2023, 11:23 a.m. UTC | #1
> This patch partially reverts commit c75e707fe1aa ("block: remove the
> per-bio/request write hint"). The following aspects of that commit have
> been reverted:
> - Pass the struct kiocb write hint information to struct bio.
> - Pass the struct bio write hint information to struct request.
> - Do not merge requests with different write hints.
> - Passing write hint information from the VFS layer to the block layer.
> - In F2FS, initialization of bio.bi_write_hint.
> 
> The following aspects of that commit have been dropped:
> - Debugfs support for retrieving and modifying write hints.
Any particular reason to left those out?

> - md-raid, BTRFS, ext4, gfs2 and zonefs write hint support.
Native Linux with ext4 is being used in automotive, and even mobile platforms.
E.g. Qualcomm's RB5 is formally maintained with Debian - https://releases.linaro.org/96boards/rb5/linaro/debian/21.12/

Thanks,
Avri
> - The write_hints[] array in struct request_queue.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Bart Van Assche Oct. 2, 2023, 5:02 p.m. UTC | #2
On 10/2/23 04:23, Avri Altman wrote:
>> The following aspects of that commit have been dropped:
>> - Debugfs support for retrieving and modifying write hints.
> 
> Any particular reason to left those out?

The above comment is misleading: what has not been restored is the
struct request_queue write_hints[] array member nor the debugfs
interface for accessing that array. My understanding is that that array
was used to track stream statistics by the NVMe driver. From version
v5.17 of the NVMe driver:

	if (streamid < ARRAY_SIZE(req->q->write_hints))
		req->q->write_hints[streamid] += blk_rq_bytes(req) >> 9;

>> - md-raid, BTRFS, ext4, gfs2 and zonefs write hint support.
> 
> Native Linux with ext4 is being used in automotive, and even mobile
> platforms. E.g. Qualcomm's RB5 is formally maintained with Debian -
> https://releases.linaro.org/96boards/rb5/linaro/debian/21.12/

All ext4 did with write hint information is to copy the inode write
hint information into the bio. The inode write hint information is set
by the F_SET_RW_HINT fcntl. The only software packages that use the
F_SET_RW_HINT fcntl and that I'm aware of are RocksDB, Samba, stress_ng
and rr. Are any of these software packages used in automotive software
on top of ext4?

Thanks,

Bart.
Avri Altman Oct. 2, 2023, 6:08 p.m. UTC | #3
> This patch partially reverts commit c75e707fe1aa ("block: remove the per-
> bio/request write hint"). The following aspects of that commit have been
> reverted:
> - Pass the struct kiocb write hint information to struct bio.
> - Pass the struct bio write hint information to struct request.
> - Do not merge requests with different write hints.
> - Passing write hint information from the VFS layer to the block layer.
> - In F2FS, initialization of bio.bi_write_hint.
> 
> The following aspects of that commit have been dropped:
> - Debugfs support for retrieving and modifying write hints.
> - md-raid, BTRFS, ext4, gfs2 and zonefs write hint support.
> - The write_hints[] array in struct request_queue.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Bean Huo Oct. 3, 2023, 7:52 p.m. UTC | #4
On 20.09.23 9:14 PM, Bart Van Assche wrote:
> Cc: Christoph Hellwig<hch@lst.de>
> Signed-off-by: Bart Van Assche<bvanassche@acm.org>

Reviewed-by: Bean Huo <beanhuo@micron.com>
Daejun Park Oct. 5, 2023, 11:46 a.m. UTC | #5
> This patch partially reverts commit c75e707fe1aa ("block: remove the
> per-bio/request write hint"). The following aspects of that commit have
> been reverted:
> - Pass the struct kiocb write hint information to struct bio.
> - Pass the struct bio write hint information to struct request.
> - Do not merge requests with different write hints.
> - Passing write hint information from the VFS layer to the block layer.
> - In F2FS, initialization of bio.bi_write_hint.
> 
> The following aspects of that commit have been dropped:
> - Debugfs support for retrieving and modifying write hints.
> - md-raid, BTRFS, ext4, gfs2 and zonefs write hint support.
> - The write_hints[] array in struct request_queue.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Daejun Park <daejun7.park@samsung.com>
diff mbox series

Patch

diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..755fcde5cb66 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -251,6 +251,7 @@  void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	bio->bi_opf = opf;
 	bio->bi_flags = 0;
 	bio->bi_ioprio = 0;
+	bio->bi_write_hint = 0;
 	bio->bi_status = 0;
 	bio->bi_iter.bi_sector = 0;
 	bio->bi_iter.bi_size = 0;
@@ -813,6 +814,7 @@  static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
 	bio_set_flag(bio, BIO_CLONED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
+	bio->bi_write_hint = bio_src->bi_write_hint;
 	bio->bi_iter = bio_src->bi_iter;
 
 	if (bio->bi_bdev) {
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index e6468eab2681..b1e7415f8439 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -172,6 +172,7 @@  static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
+	bio->bi_write_hint	= bio_src->bi_write_hint;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 65e75efa9bd3..b1854d6bd081 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -817,6 +817,13 @@  static struct request *attempt_merge(struct request_queue *q,
 	if (req->ioprio != next->ioprio)
 		return NULL;
 
+	/*
+	 * Don't allow merge of different write hints, or for a hint with
+	 * non-hint IO.
+	 */
+	if (req->write_hint != next->write_hint)
+		return NULL;
+
 	/*
 	 * If we are allowed to merge, then append bio list
 	 * from next to rq and release next. merge_requests_fn
@@ -944,6 +951,13 @@  bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 	if (rq->ioprio != bio_prio(bio))
 		return false;
 
+	/*
+	 * Don't allow merge of different write hints, or for a hint with
+	 * non-hint IO.
+	 */
+	if (rq->write_hint != bio->bi_write_hint)
+		return false;
+
 	return true;
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ec922c6bccbe..1326d1661f0e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2563,6 +2563,7 @@  static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
 		rq->cmd_flags |= REQ_FAILFAST_MASK;
 
 	rq->__sector = bio->bi_iter.bi_sector;
+	rq->write_hint = bio->bi_write_hint;
 	blk_rq_bio_prep(rq, bio, nr_segs);
 
 	/* This can't fail, since GFP_NOIO includes __GFP_DIRECT_RECLAIM. */
@@ -3160,6 +3161,7 @@  int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
 	}
 	rq->nr_phys_segments = rq_src->nr_phys_segments;
 	rq->ioprio = rq_src->ioprio;
+	rq->write_hint = rq_src->write_hint;
 
 	if (rq->bio && blk_crypto_rq_bio_prep(rq, rq->bio, gfp_mask) < 0)
 		goto free_and_out;
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9a1..d6a5219f29dd 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -169,6 +169,7 @@  static struct bio *bounce_clone_bio(struct bio *bio_src)
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
+	bio->bi_write_hint	= bio_src->bi_write_hint;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/fops.c b/block/fops.c
index acff3d5d22d4..6923de13665f 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -74,6 +74,7 @@  static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 	}
 	bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
 	bio.bi_ioprio = iocb->ki_ioprio;
+	bio.bi_write_hint = iocb->ki_hint;
 
 	ret = bio_iov_iter_get_pages(&bio, iter);
 	if (unlikely(ret))
@@ -206,6 +207,7 @@  static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
+		bio->bi_write_hint = iocb->ki_hint;
 
 		ret = bio_iov_iter_get_pages(bio, iter);
 		if (unlikely(ret)) {
@@ -323,6 +325,7 @@  static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
 	bio->bi_end_io = blkdev_bio_end_io_async;
 	bio->bi_ioprio = iocb->ki_ioprio;
+	bio->bi_write_hint = iocb->ki_hint;
 
 	if (iov_iter_is_bvec(iter)) {
 		/*
diff --git a/fs/buffer.c b/fs/buffer.c
index 2379564e5aea..bf1d94f7a96a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -55,7 +55,7 @@ 
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
-			  struct writeback_control *wbc);
+			  enum rw_hint hint, struct writeback_control *wbc);
 
 #define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)
 
@@ -1904,7 +1904,8 @@  int __block_write_full_folio(struct inode *inode, struct folio *folio,
 	do {
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
-			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+					inode->i_write_hint, wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -1958,7 +1959,8 @@  int __block_write_full_folio(struct inode *inode, struct folio *folio,
 		struct buffer_head *next = bh->b_this_page;
 		if (buffer_async_write(bh)) {
 			clear_buffer_dirty(bh);
-			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+			submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+					inode->i_write_hint, wbc);
 			nr_underway++;
 		}
 		bh = next;
@@ -2770,7 +2772,7 @@  static void end_bio_bh_io_sync(struct bio *bio)
 }
 
 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
-			  struct writeback_control *wbc)
+			  enum rw_hint write_hint, struct writeback_control *wbc)
 {
 	const enum req_op op = opf & REQ_OP_MASK;
 	struct bio *bio;
@@ -2797,6 +2799,7 @@  static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+	bio->bi_write_hint = write_hint;
 
 	__bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
 
@@ -2816,7 +2819,7 @@  static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 
 void submit_bh(blk_opf_t opf, struct buffer_head *bh)
 {
-	submit_bh_wbc(opf, bh, NULL);
+	submit_bh_wbc(opf, bh, 0, NULL);
 }
 EXPORT_SYMBOL(submit_bh);
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 7bc494ee56b9..bfa32c6ed3dd 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -404,6 +404,7 @@  dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 	 */
 	bio = bio_alloc(bdev, nr_vecs, dio->opf, GFP_KERNEL);
 	bio->bi_iter.bi_sector = first_sector;
+	bio->bi_write_hint = dio->iocb->ki_hint;
 	if (dio->is_async)
 		bio->bi_end_io = dio_bio_end_aio;
 	else
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 916e317ac925..d759a7b8478f 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -478,6 +478,8 @@  static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
 	} else {
 		bio->bi_end_io = f2fs_write_end_io;
 		bio->bi_private = sbi;
+		bio->bi_write_hint =
+			f2fs_io_type_to_rw_hint(sbi, fio->type, fio->temp);
 	}
 	iostat_alloc_and_bind_ctx(sbi, bio, NULL);
 
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ae8673ce08b1..a344418a82ad 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1654,6 +1654,7 @@  iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
 			       REQ_OP_WRITE | wbc_to_write_flags(wbc),
 			       GFP_NOFS, &iomap_ioend_bioset);
 	bio->bi_iter.bi_sector = sector;
+	bio->bi_write_hint = inode->i_write_hint;
 	wbc_init_bio(wbc, bio);
 
 	ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
@@ -1684,6 +1685,7 @@  iomap_chain_bio(struct bio *prev)
 	new = bio_alloc(prev->bi_bdev, BIO_MAX_VECS, prev->bi_opf, GFP_NOFS);
 	bio_clone_blkg_association(new, prev);
 	new->bi_iter.bi_sector = bio_end_sector(prev);
+	new->bi_write_hint = prev->bi_write_hint;
 
 	bio_chain(prev, new);
 	bio_get(prev);		/* for iomap_finish_ioend */
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index bcd3f8cf5ea4..afb704f98a97 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -380,6 +380,7 @@  static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
 					  GFP_KERNEL);
 		bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
+		bio->bi_write_hint = dio->iocb->ki_hint;
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
 		bio->bi_end_io = iomap_dio_bio_end_io;
diff --git a/fs/mpage.c b/fs/mpage.c
index 242e213ee064..5d444d2c39f1 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -612,6 +612,7 @@  static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 				GFP_NOFS);
 		bio->bi_iter.bi_sector = blocks[0] << (blkbits - 9);
 		wbc_init_bio(wbc, bio);
+		bio->bi_write_hint = inode->i_write_hint;
 	}
 
 	/*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 958ed7e89b30..d2605fb5ee63 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -137,6 +137,7 @@  struct request {
 	struct blk_crypto_keyslot *crypt_keyslot;
 #endif
 
+	unsigned short write_hint;
 	unsigned short ioprio;
 
 	enum mq_rq_state state;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d5c5e59ddbd2..6d1617f2123b 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -269,6 +269,7 @@  struct bio {
 						 */
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
+	unsigned short		bi_write_hint;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;