
[RFC,v2,11/11] scsi: storvsc: Support PAGE_SIZE larger than 4K

Message ID 20200902030107.33380-12-boqun.feng@gmail.com
State New
Series Hyper-V: Support PAGE_SIZE larger than 4K

Commit Message

Boqun Feng Sept. 2, 2020, 3:01 a.m. UTC
Hyper-V always uses a 4K page size (HV_HYP_PAGE_SIZE), so when
communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE
as the unit for page-related data. For storvsc, that data is
vmbus_packet_mpb_array. And since scsi_cmnd uses an sglist of pages (in
units of PAGE_SIZE), we need to convert the pages in the sglist of
scsi_cmnd into Hyper-V pages in vmbus_packet_mpb_array.

This patch does the conversion by dividing the pages in the sglist into
Hyper-V pages; the offset and indexes in vmbus_packet_mpb_array are
recalculated accordingly.
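
For example, assuming a 64K PAGE_SIZE (with the 4K HV_HYP_PAGE_SIZE and
the HVPFN_UP()/HV_HYP_PAGE_MASK helpers used by this patch), an sg entry
whose data starts at byte offset 0x2200 of its page and is 0x5000 bytes
long works out to:

	unsigned long hvpg_offset = 0x2200 & ~HV_HYP_PAGE_MASK;     /* 0x200: offset within a 4K page */
	unsigned int  hvpg_count  = HVPFN_UP(hvpg_offset + 0x5000); /* 6: number of 4K pages spanned  */
	unsigned int  hvpg_idx    = 0x2200 >> HV_HYP_PAGE_SHIFT;    /* 2: first 4K sub-page to use    */

so payload->range.offset becomes 0x200 and pfn_array[] receives 6
consecutive Hyper-V PFNs starting at page_to_hvpfn(page) + 2.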

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 6 deletions(-)

Comments

Michael Kelley Sept. 5, 2020, 2:55 a.m. UTC | #1
From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM
> 
> Hyper-V always uses a 4K page size (HV_HYP_PAGE_SIZE), so when
> communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE
> as the unit for page-related data. For storvsc, that data is
> vmbus_packet_mpb_array. And since scsi_cmnd uses an sglist of pages (in
> units of PAGE_SIZE), we need to convert the pages in the sglist of
> scsi_cmnd into Hyper-V pages in vmbus_packet_mpb_array.
> 
> This patch does the conversion by dividing the pages in the sglist into
> Hyper-V pages; the offset and indexes in vmbus_packet_mpb_array are
> recalculated accordingly.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++----
>  1 file changed, 54 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 8f5f5dc863a4..3f6610717d4e 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1739,23 +1739,71 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
>  	payload_sz = sizeof(cmd_request->mpb);
> 
>  	if (sg_count) {
> -		if (sg_count > MAX_PAGE_BUFFER_COUNT) {
> +		unsigned int hvpg_idx = 0;
> +		unsigned int j = 0;
> +		unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK;
> +		unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length);
> 
> -			payload_sz = (sg_count * sizeof(u64) +
> +		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> +
> +			payload_sz = (hvpg_count * sizeof(u64) +
>  				      sizeof(struct vmbus_packet_mpb_array));
>  			payload = kzalloc(payload_sz, GFP_ATOMIC);
>  			if (!payload)
>  				return SCSI_MLQUEUE_DEVICE_BUSY;
>  		}
> 
> +		/*
> +		 * sgl is a list of PAGEs, and payload->range.pfn_array
> +		 * expects the page number in the unit of HV_HYP_PAGE_SIZE (the
> +		 * page size that Hyper-V uses, so here we need to divide PAGEs
> +		 * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE.
> +		 */
>  		payload->range.len = length;
> -		payload->range.offset = sgl[0].offset;
> +		payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK;
> +		hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
> 
>  		cur_sgl = sgl;
> -		for (i = 0; i < sg_count; i++) {
> -			payload->range.pfn_array[i] =
> -				page_to_pfn(sg_page((cur_sgl)));
> +		for (i = 0, j = 0; i < sg_count; i++) {
> +			/*
> +			 * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the #
> +			 * of HV_HYP_PAGEs in the current PAGE.
> +			 *
> +			 * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs.
> +			 *
> +			 * As shown in the following, the minimal of both is
> +			 * the # of HV_HYP_PAGEs, we need to handle in this
> +			 * PAGE.
> +			 *
> +			 * |------------------ PAGE ----------------------|
> +			 * |   PAGE_SIZE / HV_HYP_PAGE_SIZE in total      |
> +			 * |hvpg|hvpg| ...                 |hvpg|... |hvpg|
> +			 *           ^                     ^
> +			 *         hvpg_idx                |
> +			 *           ^                     |
> +			 *           +---(hvpg_count - j)--+
> +			 *
> +			 * or
> +			 *
> +			 * |------------------ PAGE ----------------------|
> +			 * |   PAGE_SIZE / HV_HYP_PAGE_SIZE in total      |
> +			 * |hvpg|hvpg| ...                 |hvpg|... |hvpg|
> +			 *           ^                                           ^
> +			 *         hvpg_idx                                      |
> +			 *           ^                                           |
> +			 *           +---(hvpg_count - j)------------------------+
> +			 */
> +			unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE) - hvpg_idx,
> +						   hvpg_count - j);
> +			unsigned int k;
> +
> +			for (k = 0; k < nr_hvpg; k++) {
> +				payload->range.pfn_array[j] =
> +					page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k;
> +				j++;
> +			}
>  			cur_sgl = sg_next(cur_sgl);
> +			hvpg_idx = 0;
>  		}

This code works; I don't see any errors.  But I think it can be made simpler
by doing two things:
1)  Rather than iterating over the sg_count, and having to calculate nr_hvpg on
each iteration, base the exit decision on having filled up the pfn_array[].  You've
already calculated the exact size of the array that is needed given the data
length, so it's easy to exit when the array is full.
2) In the inner loop, iterate from hvpg_idx to PAGE_SIZE/HV_HYP_PAGE_SIZE
rather than from 0 to a calculated value.

Also, as an optimization, pull page_to_hvpfn(sg_page(cur_sgl)) out of the
inner loop.

I think this code does it (though I haven't tested it):

                for (j = 0; ; sgl = sg_next(sgl)) {
                        unsigned int k;
                        unsigned long pfn;

                        pfn = page_to_hvpfn(sg_page(sgl));
                        for (k = hvpg_idx; k < (unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE); k++) {
                                payload->range.pfn_array[j] = pfn + k;
                                if (++j == hvpg_count)
                                        goto done;
                        }
                        hvpg_idx = 0;
                }
done:

This approach also makes the limit of the inner loop a constant, and that
constant will be 1 when page size is 4K.  So the compiler should be able to
optimize away the loop in that case.
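
For example (again untested, and assuming hvpg_idx starts at 0, which it
does when sgl[0].offset is less than 4K), with PAGE_SIZE == HV_HYP_PAGE_SIZE
the inner loop runs exactly once per sg entry, so the code above should
collapse to roughly:

                for (j = 0; ; sgl = sg_next(sgl)) {
                        payload->range.pfn_array[j] = page_to_hvpfn(sg_page(sgl));
                        if (++j == hvpg_count)
                                break;
                }

which is one pfn_array[] entry per sg entry, the same work the existing
4K-on-4K code does.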

Michael

>  	}
> 
> --
> 2.28.0
Michael Kelley Sept. 5, 2020, 3:37 p.m. UTC | #2
From: Boqun Feng <boqun.feng@gmail.com> Sent: Saturday, September 5, 2020 7:15 AM
> 
> On Sat, Sep 05, 2020 at 02:55:48AM +0000, Michael Kelley wrote:
> > From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM
> > >
> > > Hyper-V always uses a 4K page size (HV_HYP_PAGE_SIZE), so when
> > > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE
> > > as the unit for page-related data. For storvsc, that data is
> > > vmbus_packet_mpb_array. And since scsi_cmnd uses an sglist of pages (in
> > > units of PAGE_SIZE), we need to convert the pages in the sglist of
> > > scsi_cmnd into Hyper-V pages in vmbus_packet_mpb_array.
> > >
> > > This patch does the conversion by dividing the pages in the sglist into
> > > Hyper-V pages; the offset and indexes in vmbus_packet_mpb_array are
> > > recalculated accordingly.
> > >
> > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> > > ---
> > >  drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++----
> > >  1 file changed, 54 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > > index 8f5f5dc863a4..3f6610717d4e 100644
> > > --- a/drivers/scsi/storvsc_drv.c
> > > +++ b/drivers/scsi/storvsc_drv.c
> > > @@ -1739,23 +1739,71 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
> > >  	payload_sz = sizeof(cmd_request->mpb);
> > >
> > >  	if (sg_count) {
> > > -		if (sg_count > MAX_PAGE_BUFFER_COUNT) {
> > > +		unsigned int hvpg_idx = 0;
> > > +		unsigned int j = 0;
> > > +		unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK;
> > > +		unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length);
> > >
> > > -			payload_sz = (sg_count * sizeof(u64) +
> > > +		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
> > > +
> > > +			payload_sz = (hvpg_count * sizeof(u64) +
> > >  				      sizeof(struct vmbus_packet_mpb_array));
> > >  			payload = kzalloc(payload_sz, GFP_ATOMIC);
> > >  			if (!payload)
> > >  				return SCSI_MLQUEUE_DEVICE_BUSY;
> > >  		}
> > >
> > > +		/*
> > > +		 * sgl is a list of PAGEs, and payload->range.pfn_array
> > > +		 * expects the page number in the unit of HV_HYP_PAGE_SIZE (the
> > > +		 * page size that Hyper-V uses, so here we need to divide PAGEs
> > > +		 * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE.
> > > +		 */
> > >  		payload->range.len = length;
> > > -		payload->range.offset = sgl[0].offset;
> > > +		payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK;
> > > +		hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
> > >
> > >  		cur_sgl = sgl;
> > > -		for (i = 0; i < sg_count; i++) {
> > > -			payload->range.pfn_array[i] =
> > > -				page_to_pfn(sg_page((cur_sgl)));
> > > +		for (i = 0, j = 0; i < sg_count; i++) {
> > > +			/*
> > > +			 * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the #
> > > +			 * of HV_HYP_PAGEs in the current PAGE.
> > > +			 *
> > > +			 * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs.
> > > +			 *
> > > +			 * As shown in the following, the minimal of both is
> > > +			 * the # of HV_HYP_PAGEs, we need to handle in this
> > > +			 * PAGE.
> > > +			 *
> > > +			 * |------------------ PAGE ----------------------|
> > > +			 * |   PAGE_SIZE / HV_HYP_PAGE_SIZE in total      |
> > > +			 * |hvpg|hvpg| ...                 |hvpg|... |hvpg|
> > > +			 *           ^                     ^
> > > +			 *         hvpg_idx                |
> > > +			 *           ^                     |
> > > +			 *           +---(hvpg_count - j)--+
> > > +			 *
> > > +			 * or
> > > +			 *
> > > +			 * |------------------ PAGE ----------------------|
> > > +			 * |   PAGE_SIZE / HV_HYP_PAGE_SIZE in total      |
> > > +			 * |hvpg|hvpg| ...                 |hvpg|... |hvpg|
> > > +			 *           ^                                           ^
> > > +			 *         hvpg_idx                                      |
> > > +			 *           ^                                           |
> > > +			 *           +---(hvpg_count - j)------------------------+
> > > +			 */
> > > +			unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE) - hvpg_idx,
> > > +						   hvpg_count - j);
> > > +			unsigned int k;
> > > +
> > > +			for (k = 0; k < nr_hvpg; k++) {
> > > +				payload->range.pfn_array[j] =
> > > +					page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k;
> > > +				j++;
> > > +			}
> > >  			cur_sgl = sg_next(cur_sgl);
> > > +			hvpg_idx = 0;
> > >  		}
> >
> > This code works; I don't see any errors.  But I think it can be made simpler
> > by doing two things:
> > 1)  Rather than iterating over the sg_count, and having to calculate nr_hvpg on
> > each iteration, base the exit decision on having filled up the pfn_array[].  You've
> > already calculated the exact size of the array that is needed given the data
> > length, so it's easy to exit when the array is full.
> > 2) In the inner loop, iterate from hvpg_idx to PAGE_SIZE/HV_HYP_PAGE_SIZE
> > rather than from 0 to a calculated value.
> >
> > Also, as an optimization, pull page_to_hvpfn(sg_page(cur_sgl)) out of the
> > inner loop.
> >
> > I think this code does it (though I haven't tested it):
> >
> >                 for (j = 0; ; sgl = sg_next(sgl)) {
> >                         unsigned int k;
> >                         unsigned long pfn;
> >
> >                         pfn = page_to_hvpfn(sg_page(sgl));
> >                         for (k = hvpg_idx; k < (unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE); k++) {
> >                                 payload->range.pfn_array[j] = pfn + k;
> >                                 if (++j == hvpg_count)
> >                                         goto done;
> >                         }
> >                         hvpg_idx = 0;
> >                 }
> > done:
> >
> > This approach also makes the limit of the inner loop a constant, and that
> > constant will be 1 when page size is 4K.  So the compiler should be able to
> > optimize away the loop in that case.
> >
> 
> Good point! I like your suggestion, and after thinking a bit harder
> based on your approach, I came up with the following:
> 
> #define HV_HYP_PAGES_IN_PAGE ((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE))
> 
> 		unsigned long pfn;
> 
> 		for (j = 0; j < hvpg_count; j++) {
> 			unsigned int k = (j + hvpg_idx) % HV_HYP_PAGES_IN_PAGE;
> 
> 			/*
> 			 * Two cases that we need to fetch a page:
> 			 * a) j == 0: the first step or
> 			 * b) k == 0: when we reach the boundary of a
> 			 * page.
> 			 */
> 			if (k == 0 || j == 0) {
> 				pfn = page_to_hvpfn(sg_page(cur_sgl));
> 				cur_sgl = sg_next(cur_sgl);
> 			}
> 
> 			payload->range.pfn_array[j] = pfn + k;
> 		}
> 
> Given that HV_HYP_PAGES_IN_PAGE is always a power of 2, compilers should
> easily optimize the "%" into a bit-masking operation. And when
> HV_HYP_PAGES_IN_PAGE is 1, compilers can figure out that k is always
> zero, so the if-statement can be optimized as always taken. That gives
> us the same code as before ;-)
> 
> Thoughts? I will run a test to see if I'm missing something subtle.
> 
> Thanks for looking into this!
> 

Your newest version looks right to me -- very clever!  I like it even better 
than my version.

Michael

Patch

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 8f5f5dc863a4..3f6610717d4e 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1739,23 +1739,71 @@  static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	payload_sz = sizeof(cmd_request->mpb);
 
 	if (sg_count) {
-		if (sg_count > MAX_PAGE_BUFFER_COUNT) {
+		unsigned int hvpg_idx = 0;
+		unsigned int j = 0;
+		unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK;
+		unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length);
 
-			payload_sz = (sg_count * sizeof(u64) +
+		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
+
+			payload_sz = (hvpg_count * sizeof(u64) +
 				      sizeof(struct vmbus_packet_mpb_array));
 			payload = kzalloc(payload_sz, GFP_ATOMIC);
 			if (!payload)
 				return SCSI_MLQUEUE_DEVICE_BUSY;
 		}
 
+		/*
+		 * sgl is a list of PAGEs, and payload->range.pfn_array
+		 * expects the page number in the unit of HV_HYP_PAGE_SIZE (the
+		 * page size that Hyper-V uses, so here we need to divide PAGEs
+		 * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE.
+		 */
 		payload->range.len = length;
-		payload->range.offset = sgl[0].offset;
+		payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK;
+		hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
 
 		cur_sgl = sgl;
-		for (i = 0; i < sg_count; i++) {
-			payload->range.pfn_array[i] =
-				page_to_pfn(sg_page((cur_sgl)));
+		for (i = 0, j = 0; i < sg_count; i++) {
+			/*
+			 * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the #
+			 * of HV_HYP_PAGEs in the current PAGE.
+			 *
+			 * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs.
+			 *
+			 * As shown in the following, the minimal of both is
+			 * the # of HV_HYP_PAGEs, we need to handle in this
+			 * PAGE.
+			 *
+			 * |------------------ PAGE ----------------------|
+			 * |   PAGE_SIZE / HV_HYP_PAGE_SIZE in total      |
+			 * |hvpg|hvpg| ...                 |hvpg|... |hvpg|
+			 *           ^                     ^
+			 *         hvpg_idx                |
+			 *           ^                     |
+			 *           +---(hvpg_count - j)--+
+			 *
+			 * or
+			 *
+			 * |------------------ PAGE ----------------------|
+			 * |   PAGE_SIZE / HV_HYP_PAGE_SIZE in total      |
+			 * |hvpg|hvpg| ...                 |hvpg|... |hvpg|
+			 *           ^                                           ^
+			 *         hvpg_idx                                      |
+			 *           ^                                           |
+			 *           +---(hvpg_count - j)------------------------+
+			 */
+			unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE) - hvpg_idx,
+						   hvpg_count - j);
+			unsigned int k;
+
+			for (k = 0; k < nr_hvpg; k++) {
+				payload->range.pfn_array[j] =
+					page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k;
+				j++;
+			}
 			cur_sgl = sg_next(cur_sgl);
+			hvpg_idx = 0;
 		}
 	}