
[7/8] block: refactor the bounce buffering code

Message ID 20210318063923.302738-8-hch@lst.de
State Superseded
Series [1/8] aha1542: use a local bounce buffer

Commit Message

Christoph Hellwig March 18, 2021, 6:39 a.m. UTC
Get rid of all the PFN arithmetic, use an enum for the two remaining
options, and use PageHighMem for the actual bounce decision.

Add a fast path to entirely avoid the call for the common case of a queue
not using the legacy bouncing code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c       |  6 ++----
 block/blk-settings.c   | 42 ++++++++----------------------------------
 block/blk.h            | 16 ++++++++++++----
 block/bounce.c         | 35 +++++------------------------------
 include/linux/blkdev.h | 29 +++++++++++------------------
 5 files changed, 38 insertions(+), 90 deletions(-)

Comments

Matthew Wilcox March 18, 2021, 11:29 a.m. UTC | #1
On Thu, Mar 18, 2021 at 07:39:22AM +0100, Christoph Hellwig wrote:
> @@ -536,7 +518,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  					b->max_write_zeroes_sectors);
>  	t->max_zone_append_sectors = min(t->max_zone_append_sectors,
>  					b->max_zone_append_sectors);
> -	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
> +	t->bounce = min_not_zero(t->bounce, b->bounce);

I see how min_not_zero() made sense when it was a pfn.  Does it still
make sense now it's an enum?  I would have thought it'd now be 'max()',
given the definitions later on.
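
A minimal userspace sketch for illustration (not taken from the patch),
assuming the enum values quoted below (BLK_BOUNCE_NONE == 0,
BLK_BOUNCE_HIGH == 1) and with min_not_zero() re-implemented in simplified
form rather than pulled from the kernel headers; for this two-value enum
both helpers happen to agree, but max() would state the "strictest limit
wins" intent directly:

#include <stdio.h>

enum blk_bounce { BLK_BOUNCE_NONE, BLK_BOUNCE_HIGH };

#define min(x, y)          ((x) < (y) ? (x) : (y))
#define max(x, y)          ((x) > (y) ? (x) : (y))
/* simplified stand-in for the kernel's min_not_zero() */
#define min_not_zero(x, y) ((x) == 0 ? (y) : ((y) == 0 ? (x) : min((x), (y))))

int main(void)
{
	/* stacking NONE (0) with HIGH (1): both helpers pick HIGH here */
	printf("min_not_zero: %d, max: %d\n",
	       min_not_zero(BLK_BOUNCE_NONE, BLK_BOUNCE_HIGH),
	       max(BLK_BOUNCE_NONE, BLK_BOUNCE_HIGH));
	return 0;
}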

> +/*
> + * BLK_BOUNCE_NONE:	never bounce (default)
> + * BLK_BOUNCE_HIGH:	bounce all highmem pages
> + */
> +enum blk_bounce {
> +	BLK_BOUNCE_NONE,
> +	BLK_BOUNCE_HIGH,
> +};
> +
>  struct queue_limits {
> -	unsigned long		bounce_pfn;
> +	enum blk_bounce		bounce;
>  	unsigned long		seg_boundary_mask;
>  	unsigned long		virt_boundary_mask;
>
Christoph Hellwig March 18, 2021, 12:53 p.m. UTC | #2
On Thu, Mar 18, 2021 at 11:29:50AM +0000, Matthew Wilcox wrote:
> On Thu, Mar 18, 2021 at 07:39:22AM +0100, Christoph Hellwig wrote:
> > @@ -536,7 +518,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >  					b->max_write_zeroes_sectors);
> >  	t->max_zone_append_sectors = min(t->max_zone_append_sectors,
> >  					b->max_zone_append_sectors);
> > -	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
> > +	t->bounce = min_not_zero(t->bounce, b->bounce);
> 
> I see how min_not_zero() made sense when it was a pfn.  Does it still
> make sense now it's an enum?  I would have thought it'd now be 'max()',
> given the definitions later on.

Actually, blk_stack_limits should not look at ->bounce_pfn / ->bounce
at all.  blk_queue_bounce is only called by blk_mq_submit_bio, and
the only stacked blk-mq driver (dm-mpath) does not need bouncing.

I'll add a patch to the front of the series to fix this up.
Benjamin Block March 24, 2021, 5:40 p.m. UTC | #3
On Thu, Mar 18, 2021 at 01:53:40PM +0100, Christoph Hellwig wrote:
> On Thu, Mar 18, 2021 at 07:39:22AM +0100, Christoph Hellwig wrote:
> > On Thu, Mar 18, 2021 at 07:39:22AM +0100, Christoph Hellwig wrote:
> > > @@ -536,7 +518,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> > >  					b->max_write_zeroes_sectors);
> > >  	t->max_zone_append_sectors = min(t->max_zone_append_sectors,
> > >  					b->max_zone_append_sectors);
> > > -	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
> > > +	t->bounce = min_not_zero(t->bounce, b->bounce);
> > 
> > I see how min_not_zero() made sense when it was a pfn.  Does it still
> > make sense now it's an enum?  I would have thought it'd now be 'max()',
> > given the definitions later on.
> 
> Actually, blk_stack_limits should not look at ->bounce_pfn / ->bounce
> at all.  blk_queue_bounce is only called by blk_mq_submit_bio, and
> the only stacked blk-mq driver (dm-mpath) does not need bouncing.
> 
> I'll add a patch to the front of the series to fix this up.

Is blk_queue_bounce() called again when mpath submits the request to the
lower device? When I looked at this code some time ago, I thought bouncing
would only be checked the first time a request is created (dm-mpath), and
then not again; so if we don't check whether bouncing is necessary in
mpath, we still might screw the LLD - hence why we might inherit this via
the limits.

-- 
Best Regards, Benjamin Block  / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH    /    https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen         /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294
Christoph Hellwig March 24, 2021, 5:44 p.m. UTC | #4
On Wed, Mar 24, 2021 at 06:40:32PM +0100, Benjamin Block wrote:
> Is blk_queue_bounce() called again when mpath submits the request to the
> lower device? When I looked at this code some time ago, I thought bouncing
> would only be checked the first time a request is created (dm-mpath), and
> then not again; so if we don't check whether bouncing is necessary in
> mpath, we still might screw the LLD - hence why we might inherit this via
> the limits.

Every call to blk_mq_submit_bio also calls blk_queue_bounce, and
blk_queue_bounce then checks if it needs to do anything based on the
bounce limit and max_pfn, and if needed proceeds to check every bvec.

So for the extremely unlikely case that someone is running multipath over
one of the few remaining drivers that need block layer bounce buffering,
this inheritance just leads to (harmless) extra work.
Benjamin Block March 24, 2021, 6:07 p.m. UTC | #5
On Wed, Mar 24, 2021 at 06:44:58PM +0100, Christoph Hellwig wrote:
> On Wed, Mar 24, 2021 at 06:40:32PM +0100, Benjamin Block wrote:
> > Is blk_queue_bounce() called again when mpath submits the request to the
> > lower device? When I looked at this code some time ago, I thought bouncing
> > would only be checked the first time a request is created (dm-mpath), and
> > then not again; so if we don't check whether bouncing is necessary in
> > mpath, we still might screw the LLD - hence why we might inherit this via
> > the limits.
> 
> Every call to blk_mq_submit_bio also calls blk_queue_bounce,

But map_request() -> multipath_clone_and_map() -> dm_dispatch_clone_request() 
doesn't call blk_mq_submit_bio() for requests that have been queued in a
request-based mpath device. The requests get cloned and then dispatched
on the lower queue. Or am I missing something?

> and
> blk_queue_bounce then checks if it needs to do anything based on the
> bounce limit and max_pfn, and if needed proceeds to check every bvec.
> 
> So for the extremely unlikely case that someone is running multipath over
> one of the few remaining drivers that need block layer bounce buffering,
> this inheritance just leads to (harmless) extra work.

Yeah, fair enough, I don't know whether anyone would care about those old
drivers; it just crossed my mind.

Christoph Hellwig March 24, 2021, 6:13 p.m. UTC | #6
On Wed, Mar 24, 2021 at 07:07:40PM +0100, Benjamin Block wrote:
> But map_request() -> multipath_clone_and_map() -> dm_dispatch_clone_request()
> doesn't call blk_mq_submit_bio() for requests that have been queued in a
> request-based mpath device. The requests get cloned and then dispatched
> on the lower queue. Or am I missing something?

Indeed.  I keep forgetting about the fact that dm-mpath bypasses
the normal submit_bio_noacct path.
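
For reference, a rough sketch of the two submission paths discussed above,
as they look in the ~v5.12 tree (an approximation for orientation, not
something taken from this patch):

/*
 * bio-based submission (this is where the bounce check runs):
 *   submit_bio_noacct()
 *     blk_mq_submit_bio()
 *       blk_queue_bounce()               - per-bvec highmem check
 *
 * request-based dm-mpath (bypasses blk_mq_submit_bio() entirely):
 *   map_request()                        - drivers/md/dm-rq.c
 *     multipath_clone_and_map()          - drivers/md/dm-mpath.c
 *     dm_dispatch_clone_request()
 *       blk_insert_cloned_request()      - no blk_queue_bounce() call
 */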

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index fc60ff20849738..9bcdae93f6d4f7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1161,10 +1161,8 @@  static blk_status_t blk_cloned_rq_check_limits(struct request_queue *q,
 	}
 
 	/*
-	 * queue's settings related to segment counting like q->bounce_pfn
-	 * may differ from that of other stacking queues.
-	 * Recalculate it to check the request correctly on this queue's
-	 * limitation.
+	 * The queue settings related to segment counting may differ from the
+	 * original queue.
 	 */
 	rq->nr_phys_segments = blk_recalc_rq_segments(rq);
 	if (rq->nr_phys_segments > queue_max_segments(q)) {
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f9937dd2810e25..c7e26d16c59c0e 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -7,7 +7,6 @@ 
 #include <linux/init.h>
 #include <linux/bio.h>
 #include <linux/blkdev.h>
-#include <linux/memblock.h>	/* for max_pfn/max_low_pfn */
 #include <linux/gcd.h>
 #include <linux/lcm.h>
 #include <linux/jiffies.h>
@@ -17,11 +16,6 @@ 
 #include "blk.h"
 #include "blk-wbt.h"
 
-unsigned long blk_max_low_pfn;
-EXPORT_SYMBOL(blk_max_low_pfn);
-
-unsigned long blk_max_pfn;
-
 void blk_queue_rq_timeout(struct request_queue *q, unsigned int timeout)
 {
 	q->rq_timeout = timeout;
@@ -55,7 +49,7 @@  void blk_set_default_limits(struct queue_limits *lim)
 	lim->discard_alignment = 0;
 	lim->discard_misaligned = 0;
 	lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
-	lim->bounce_pfn = (unsigned long)(BLK_BOUNCE_ANY >> PAGE_SHIFT);
+	lim->bounce = BLK_BOUNCE_NONE;
 	lim->alignment_offset = 0;
 	lim->io_opt = 0;
 	lim->misaligned = 0;
@@ -92,28 +86,16 @@  EXPORT_SYMBOL(blk_set_stacking_limits);
 /**
  * blk_queue_bounce_limit - set bounce buffer limit for queue
  * @q: the request queue for the device
- * @max_addr: the maximum address the device can handle
+ * @bounce: bounce limit to enforce
  *
  * Description:
- *    Different hardware can have different requirements as to what pages
- *    it can do I/O directly to. A low level driver can call
- *    blk_queue_bounce_limit to have lower memory pages allocated as bounce
- *    buffers for doing I/O to pages residing above @max_addr.
+ *    Force bouncing for ISA DMA ranges or highmem.
+ *
+ *    DEPRECATED, don't use in new code.
  **/
-void blk_queue_bounce_limit(struct request_queue *q, u64 max_addr)
+void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce bounce)
 {
-	unsigned long b_pfn = max_addr >> PAGE_SHIFT;
-
-#if BITS_PER_LONG == 64
-	/*
-	 * Assume anything <= 4GB can be handled by IOMMU.  Actually
-	 * some IOMMUs can handle everything, but I don't know of a
-	 * way to test this here.
-	 */
-	q->limits.bounce_pfn = max(max_low_pfn, b_pfn);
-#else
-	q->limits.bounce_pfn = b_pfn;
-#endif
+	q->limits.bounce = bounce;
 }
 EXPORT_SYMBOL(blk_queue_bounce_limit);
 
@@ -536,7 +518,7 @@  int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 					b->max_write_zeroes_sectors);
 	t->max_zone_append_sectors = min(t->max_zone_append_sectors,
 					b->max_zone_append_sectors);
-	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
+	t->bounce = min_not_zero(t->bounce, b->bounce);
 
 	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
 					    b->seg_boundary_mask);
@@ -916,11 +898,3 @@  void blk_queue_set_zoned(struct gendisk *disk, enum blk_zoned_model model)
 	}
 }
 EXPORT_SYMBOL_GPL(blk_queue_set_zoned);
-
-static int __init blk_settings_init(void)
-{
-	blk_max_low_pfn = max_low_pfn - 1;
-	blk_max_pfn = max_pfn - 1;
-	return 0;
-}
-subsys_initcall(blk_settings_init);
diff --git a/block/blk.h b/block/blk.h
index 895c9f4a5182a7..8f4337c5a9e66c 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -6,6 +6,7 @@ 
 #include <linux/blk-mq.h>
 #include <linux/part_stat.h>
 #include <linux/blk-crypto.h>
+#include <linux/memblock.h>	/* for max_pfn/max_low_pfn */
 #include <xen/xen.h>
 #include "blk-crypto-internal.h"
 #include "blk-mq.h"
@@ -311,13 +312,20 @@  static inline void blk_throtl_bio_endio(struct bio *bio) { }
 static inline void blk_throtl_stat_add(struct request *rq, u64 time) { }
 #endif
 
-#ifdef CONFIG_BOUNCE
-extern void blk_queue_bounce(struct request_queue *q, struct bio **bio);
-#else
+void __blk_queue_bounce(struct request_queue *q, struct bio **bio);
+
+static inline bool blk_queue_may_bounce(struct request_queue *q)
+{
+	return IS_ENABLED(CONFIG_BOUNCE) &&
+		q->limits.bounce == BLK_BOUNCE_HIGH &&
+		max_low_pfn >= max_pfn;
+}
+
 static inline void blk_queue_bounce(struct request_queue *q, struct bio **bio)
 {
+	if (unlikely(blk_queue_may_bounce(q) && bio_has_data(*bio)))
+		__blk_queue_bounce(q, bio);	
 }
-#endif /* CONFIG_BOUNCE */
 
 #ifdef CONFIG_BLK_CGROUP_IOLATENCY
 extern int blk_iolatency_init(struct request_queue *q);
diff --git a/block/bounce.c b/block/bounce.c
index debd5b0bd31890..6bafc0d1f867a1 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -18,7 +18,6 @@ 
 #include <linux/init.h>
 #include <linux/hash.h>
 #include <linux/highmem.h>
-#include <linux/memblock.h>
 #include <linux/printk.h>
 #include <asm/tlbflush.h>
 
@@ -49,11 +48,11 @@  static void init_bounce_bioset(void)
 	bounce_bs_setup = true;
 }
 
-#if defined(CONFIG_HIGHMEM)
 static __init int init_emergency_pool(void)
 {
 	int ret;
-#if defined(CONFIG_HIGHMEM) && !defined(CONFIG_MEMORY_HOTPLUG)
+
+#ifndef CONFIG_MEMORY_HOTPLUG
 	if (max_pfn <= max_low_pfn)
 		return 0;
 #endif
@@ -67,9 +66,7 @@  static __init int init_emergency_pool(void)
 }
 
 __initcall(init_emergency_pool);
-#endif
 
-#ifdef CONFIG_HIGHMEM
 /*
  * highmem version, map in to vec
  */
@@ -82,13 +79,6 @@  static void bounce_copy_vec(struct bio_vec *to, unsigned char *vfrom)
 	kunmap_atomic(vto);
 }
 
-#else /* CONFIG_HIGHMEM */
-
-#define bounce_copy_vec(to, vfrom)	\
-	memcpy(page_address((to)->bv_page) + (to)->bv_offset, vfrom, (to)->bv_len)
-
-#endif /* CONFIG_HIGHMEM */
-
 /*
  * Simple bounce buffer support for highmem pages. Depending on the
  * queue gfp mask set, *to may or may not be a highmem page. kmap it
@@ -236,8 +226,7 @@  static struct bio *bounce_clone_bio(struct bio *bio_src)
 	return NULL;
 }
 
-
-void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
+void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 {
 	struct bio *bio;
 	int rw = bio_data_dir(*bio_orig);
@@ -247,24 +236,10 @@  void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 	bool bounce = false;
 	int sectors = 0;
 
-	/*
-	 * Data-less bio, nothing to bounce
-	 */
-	if (!bio_has_data(*bio_orig))
-		return;
-
-	/*
-	 * Just check if the bounce pfn is equal to or bigger than the highest
-	 * pfn in the system -- in that case, don't waste time iterating over
-	 * bio segments
-	 */
-	if (q->limits.bounce_pfn >= blk_max_pfn)
-		return;
-
 	bio_for_each_segment(from, *bio_orig, iter) {
 		if (i++ < BIO_MAX_VECS)
 			sectors += from.bv_len >> 9;
-		if (page_to_pfn(from.bv_page) > q->limits.bounce_pfn)
+		if (PageHighMem(from.bv_page))
 			bounce = true;
 	}
 	if (!bounce)
@@ -287,7 +262,7 @@  void blk_queue_bounce(struct request_queue *q, struct bio **bio_orig)
 	for (i = 0, to = bio->bi_io_vec; i < bio->bi_vcnt; to++, i++) {
 		struct page *page = to->bv_page;
 
-		if (page_to_pfn(page) <= q->limits.bounce_pfn)
+		if (!PageHighMem(page))
 			continue;
 
 		to->bv_page = mempool_alloc(&page_pool, GFP_NOIO);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0dbb72ea373529..55cc8b96c84427 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -313,8 +313,17 @@  enum blk_zoned_model {
 	BLK_ZONED_HM,		/* Host-managed zoned block device */
 };
 
+/*
+ * BLK_BOUNCE_NONE:	never bounce (default)
+ * BLK_BOUNCE_HIGH:	bounce all highmem pages
+ */
+enum blk_bounce {
+	BLK_BOUNCE_NONE,
+	BLK_BOUNCE_HIGH,
+};
+
 struct queue_limits {
-	unsigned long		bounce_pfn;
+	enum blk_bounce		bounce;
 	unsigned long		seg_boundary_mask;
 	unsigned long		virt_boundary_mask;
 
@@ -835,22 +844,6 @@  static inline unsigned int blk_queue_depth(struct request_queue *q)
 	return q->nr_requests;
 }
 
-extern unsigned long blk_max_low_pfn, blk_max_pfn;
-
-/*
- * standard bounce addresses:
- *
- * BLK_BOUNCE_HIGH	: bounce all highmem pages
- * BLK_BOUNCE_ANY	: don't bounce anything
- */
-
-#if BITS_PER_LONG == 32
-#define BLK_BOUNCE_HIGH		((u64)blk_max_low_pfn << PAGE_SHIFT)
-#else
-#define BLK_BOUNCE_HIGH		-1ULL
-#endif
-#define BLK_BOUNCE_ANY		(-1ULL)
-
 /*
  * default timeout for SG_IO if none specified
  */
@@ -1134,7 +1127,7 @@  extern void blk_abort_request(struct request *);
  * Access functions for manipulating queue properties
  */
 extern void blk_cleanup_queue(struct request_queue *);
-extern void blk_queue_bounce_limit(struct request_queue *, u64);
+void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce limit);
 extern void blk_queue_max_hw_sectors(struct request_queue *, unsigned int);
 extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int);
 extern void blk_queue_max_segments(struct request_queue *, unsigned short);