Message ID: 20200902030107.33380-1-boqun.feng@gmail.com
Series: Hyper-V: Support PAGE_SIZE larger than 4K
From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM
>
> Pure function movement, no functional changes. The move is made, because
> in a later change, __vmbus_open() will rely on some static functions
> afterwards, so we sperate the move and the modification of

s/sperate/separate/

> __vmbus_open() in two patches to make it easy to review.
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> Reviewed-by: Wei Liu <wei.liu@kernel.org>
> ---
>  drivers/hv/channel.c | 309 ++++++++++++++++++++++---------------------
>  1 file changed, 155 insertions(+), 154 deletions(-)
>
From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM > > Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE > as the unit for page related data. For storvsc, the data is > vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit > of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd > into Hyper-V pages in vmbus_packet_mpb_array. > > This patch does the conversion by dividing pages in sglist into Hyper-V > pages, offset and indexes in vmbus_packet_mpb_array are recalculated > accordingly. > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com> > --- > drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++---- > 1 file changed, 54 insertions(+), 6 deletions(-) > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > index 8f5f5dc863a4..3f6610717d4e 100644 > --- a/drivers/scsi/storvsc_drv.c > +++ b/drivers/scsi/storvsc_drv.c > @@ -1739,23 +1739,71 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct > scsi_cmnd *scmnd) > payload_sz = sizeof(cmd_request->mpb); > > if (sg_count) { > - if (sg_count > MAX_PAGE_BUFFER_COUNT) { > + unsigned int hvpg_idx = 0; > + unsigned int j = 0; > + unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK; > + unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length); > > - payload_sz = (sg_count * sizeof(u64) + > + if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { > + > + payload_sz = (hvpg_count * sizeof(u64) + > sizeof(struct vmbus_packet_mpb_array)); > payload = kzalloc(payload_sz, GFP_ATOMIC); > if (!payload) > return SCSI_MLQUEUE_DEVICE_BUSY; > } > > + /* > + * sgl is a list of PAGEs, and payload->range.pfn_array > + * expects the page number in the unit of HV_HYP_PAGE_SIZE (the > + * page size that Hyper-V uses, so here we need to divide PAGEs > + * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE. > + */ > payload->range.len = length; > - payload->range.offset = sgl[0].offset; > + payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK; > + hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT; > > cur_sgl = sgl; > - for (i = 0; i < sg_count; i++) { > - payload->range.pfn_array[i] = > - page_to_pfn(sg_page((cur_sgl))); > + for (i = 0, j = 0; i < sg_count; i++) { > + /* > + * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the # > + * of HV_HYP_PAGEs in the current PAGE. > + * > + * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs. > + * > + * As shown in the following, the minimal of both is > + * the # of HV_HYP_PAGEs, we need to handle in this > + * PAGE. > + * > + * |------------------ PAGE ----------------------| > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > + * |hvpg|hvpg| ... |hvpg|... |hvpg| > + * ^ ^ > + * hvpg_idx | > + * ^ | > + * +---(hvpg_count - j)--+ > + * > + * or > + * > + * |------------------ PAGE ----------------------| > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > + * |hvpg|hvpg| ... |hvpg|... |hvpg| > + * ^ ^ > + * hvpg_idx | > + * ^ | > + * +---(hvpg_count - j)------------------------+ > + */ > + unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE) - hvpg_idx, > + hvpg_count - j); > + unsigned int k; > + > + for (k = 0; k < nr_hvpg; k++) { > + payload->range.pfn_array[j] = > + page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k; > + j++; > + } > cur_sgl = sg_next(cur_sgl); > + hvpg_idx = 0; > } This code works; I don't see any errors. 
But I think it can be made simpler based on doing two things:

1) Rather than iterating over the sg_count, and having to calculate nr_hvpg on
   each iteration, base the exit decision on having filled up the pfn_array[].
   You've already calculated the exact size of the array that is needed given
   the data length, so it's easy to exit when the array is full.

2) In the inner loop, iterate from hvpg_idx to PAGE_SIZE/HV_HYP_PAGE_SIZE
   rather than from 0 to a calculated value.

Also, as an optimization, pull page_to_hvpfn(sg_page(cur_sgl)) out of the
inner loop.

I think this code does it (though I haven't tested it):

	for (j = 0; ; sgl = sg_next(sgl)) {
		unsigned int k;
		unsigned long pfn;

		pfn = page_to_hvpfn(sg_page(sgl));
		for (k = hvpg_idx; k < (unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE); k++) {
			payload->range.pfn_array[j] = pfn + k;
			if (++j == hvpg_count)
				goto done;
		}
		hvpg_idx = 0;
	}
done:

This approach also makes the limit of the inner loop a constant, and that
constant will be 1 when page size is 4K. So the compiler should be able to
optimize away the loop in that case.

Michael

> }
>
> --
> 2.28.0
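To see the shape of the suggested loop in isolation, here is a small user-space sketch of the same "fill pfn_array[] until full" traversal. struct sg_stub, the 16K PAGE_SIZE, and the sample pfn/offset/length values are made-up stand-ins, and page_to_hvpfn()/sg_next()/HVPFN_UP() from the series are open-coded, so this only illustrates the traversal logic, not the actual driver change.

	/*
	 * User-space sketch: walk guest PAGEs and emit one entry per 4K
	 * Hyper-V sub-page until the computed hvpg_count entries are filled.
	 */
	#include <stdio.h>

	#define PAGE_SIZE         16384UL	/* hypothetical 16K guest page */
	#define HV_HYP_PAGE_SIZE  4096UL
	#define HV_HYP_PAGE_SHIFT 12

	struct sg_stub {			/* stand-in for struct scatterlist */
		unsigned long pfn;		/* first Hyper-V pfn of this PAGE */
		unsigned int offset;		/* byte offset into the PAGE */
	};

	int main(void)
	{
		/* Two guest PAGEs; the data starts at byte 6000 of the first. */
		struct sg_stub sgl[] = { { .pfn = 0x100, .offset = 6000 },
					 { .pfn = 0x200, .offset = 0 } };
		unsigned long length = 20000;			/* bytes of data */
		unsigned int hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
		unsigned long hvpg_offset = sgl[0].offset & (HV_HYP_PAGE_SIZE - 1);
		unsigned int hvpg_count =			/* open-coded HVPFN_UP() */
			(hvpg_offset + length + HV_HYP_PAGE_SIZE - 1) / HV_HYP_PAGE_SIZE;
		unsigned long pfn_array[32];
		unsigned int i = 0, j;

		/* Outer loop walks PAGEs, inner loop walks 4K sub-pages. */
		for (j = 0; ; i++) {
			unsigned long pfn = sgl[i].pfn;	/* page_to_hvpfn(sg_page(sgl)) */
			unsigned int k;

			for (k = hvpg_idx; k < PAGE_SIZE / HV_HYP_PAGE_SIZE; k++) {
				pfn_array[j] = pfn + k;
				if (++j == hvpg_count)
					goto done;
			}
			hvpg_idx = 0;		/* later PAGEs start at sub-page 0 */
		}
	done:
		for (j = 0; j < hvpg_count; j++)
			printf("pfn_array[%u] = 0x%lx\n", j, pfn_array[j]);
		return 0;
	}

With those sample values the sketch prints six entries: 0x101-0x103 from the tail of the first 16K PAGE followed by 0x200-0x202 from the head of the second, i.e. exactly hvpg_count Hyper-V pages.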
On Sat, Sep 05, 2020 at 02:55:48AM +0000, Michael Kelley wrote: > From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM > > > > Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when > > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE > > as the unit for page related data. For storvsc, the data is > > vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit > > of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd > > into Hyper-V pages in vmbus_packet_mpb_array. > > > > This patch does the conversion by dividing pages in sglist into Hyper-V > > pages, offset and indexes in vmbus_packet_mpb_array are recalculated > > accordingly. > > > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com> > > --- > > drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++---- > > 1 file changed, 54 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > > index 8f5f5dc863a4..3f6610717d4e 100644 > > --- a/drivers/scsi/storvsc_drv.c > > +++ b/drivers/scsi/storvsc_drv.c > > @@ -1739,23 +1739,71 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct > > scsi_cmnd *scmnd) > > payload_sz = sizeof(cmd_request->mpb); > > > > if (sg_count) { > > - if (sg_count > MAX_PAGE_BUFFER_COUNT) { > > + unsigned int hvpg_idx = 0; > > + unsigned int j = 0; > > + unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK; > > + unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length); > > > > - payload_sz = (sg_count * sizeof(u64) + > > + if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { > > + > > + payload_sz = (hvpg_count * sizeof(u64) + > > sizeof(struct vmbus_packet_mpb_array)); > > payload = kzalloc(payload_sz, GFP_ATOMIC); > > if (!payload) > > return SCSI_MLQUEUE_DEVICE_BUSY; > > } > > > > + /* > > + * sgl is a list of PAGEs, and payload->range.pfn_array > > + * expects the page number in the unit of HV_HYP_PAGE_SIZE (the > > + * page size that Hyper-V uses, so here we need to divide PAGEs > > + * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE. > > + */ > > payload->range.len = length; > > - payload->range.offset = sgl[0].offset; > > + payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK; > > + hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT; > > > > cur_sgl = sgl; > > - for (i = 0; i < sg_count; i++) { > > - payload->range.pfn_array[i] = > > - page_to_pfn(sg_page((cur_sgl))); > > + for (i = 0, j = 0; i < sg_count; i++) { > > + /* > > + * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the # > > + * of HV_HYP_PAGEs in the current PAGE. > > + * > > + * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs. > > + * > > + * As shown in the following, the minimal of both is > > + * the # of HV_HYP_PAGEs, we need to handle in this > > + * PAGE. > > + * > > + * |------------------ PAGE ----------------------| > > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > > + * |hvpg|hvpg| ... |hvpg|... |hvpg| > > + * ^ ^ > > + * hvpg_idx | > > + * ^ | > > + * +---(hvpg_count - j)--+ > > + * > > + * or > > + * > > + * |------------------ PAGE ----------------------| > > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > > + * |hvpg|hvpg| ... |hvpg|... 
|hvpg| > > + * ^ ^ > > + * hvpg_idx | > > + * ^ | > > + * +---(hvpg_count - j)------------------------+ > > + */ > > + unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE) - hvpg_idx, > > + hvpg_count - j); > > + unsigned int k; > > + > > + for (k = 0; k < nr_hvpg; k++) { > > + payload->range.pfn_array[j] = > > + page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k; > > + j++; > > + } > > cur_sgl = sg_next(cur_sgl); > > + hvpg_idx = 0; > > } > > This code works; I don't see any errors. But I think it can be made simpler based > on doing two things: > 1) Rather than iterating over the sg_count, and having to calculate nr_hvpg on > each iteration, base the exit decision on having filled up the pfn_array[]. You've > already calculated the exact size of the array that is needed given the data > length, so it's easy to exit when the array is full. > 2) In the inner loop, iterate from hvpg_idx to PAGE_SIZE/HV_HYP_PAGE_SIZE > rather than from 0 to a calculated value. > > Also, as an optimization, pull page_to_hvpfn(sg_page((cur_sgl)) out of the > inner loop. > > I think this code does it (though I haven't tested it): > > for (j = 0; ; sgl = sg_next(sgl)) { > unsigned int k; > unsigned long pfn; > > pfn = page_to_hvpfn(sg_page(sgl)); > for (k = hvpg_idx; k < (unsigned int)(PAGE_SIZE /HV_HYP_PAGE_SIZE); k++) { > payload->range.pfn_array[j] = pfn + k; > if (++j == hvpg_count) > goto done; > } > hvpg_idx = 0; > } > done: > > This approach also makes the limit of the inner loop a constant, and that > constant will be 1 when page size is 4K. So the compiler should be able to > optimize away the loop in that case. > Good point! I like your suggestion, and after thinking a bit harder based on your approach, I come up with the following: #define HV_HYP_PAGES_IN_PAGE ((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE)) for (j = 0; j < hvpg_count; j++) { unsigned int k = (j + hvpg_idx) % HV_HYP_PAGES_IN_PAGE; /* * Two cases that we need to fetch a page: * a) j == 0: the first step or * b) k == 0: when we reach the boundary of a * page. * if (k == 0 || j == 0) { pfn = page_to_hvpfn(sg_page(cur_sgl)); cur_sgl = sg_next(cur_sgl); } payload->range.pfn_arrary[j] = pfn + k; } , given the HV_HYP_PAGES_IN_PAGE is always a power of 2, so I think compilers could easily optimize the "%" into bit masking operation. And when HV_HYP_PAGES_IN_PAGE is 1, I think compilers can easily figure out k is always zero, then the if-statement can be optimized as always taken. And that gives us the same code as before ;-) Thoughts? I will try with a test to see if I'm missing something subtle. Thanks for looking into this! Regards, Boqun > Michael > > > > > > > > } > > > > -- > > 2.28.0 >
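For comparison with the earlier sketch, here is the same user-space stand-in reworked around the single modulo-based loop proposed above. Again, sg_stub, the hypothetical 16K PAGE_SIZE behind HV_HYP_PAGES_IN_PAGE, and the sample values are assumptions; page_to_hvpfn() and sg_next() are replaced by the commented stand-ins.

	#include <stdio.h>

	#define PAGE_SIZE         16384UL
	#define HV_HYP_PAGE_SIZE  4096UL
	#define HV_HYP_PAGE_SHIFT 12
	#define HV_HYP_PAGES_IN_PAGE ((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE))

	struct sg_stub {			/* stand-in for struct scatterlist */
		unsigned long pfn;
		unsigned int offset;
	};

	int main(void)
	{
		struct sg_stub sgl[] = { { .pfn = 0x100, .offset = 6000 },
					 { .pfn = 0x200, .offset = 0 } };
		struct sg_stub *cur_sgl = sgl;
		unsigned long length = 20000;
		unsigned int hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
		unsigned long hvpg_offset = sgl[0].offset & (HV_HYP_PAGE_SIZE - 1);
		unsigned int hvpg_count =
			(hvpg_offset + length + HV_HYP_PAGE_SIZE - 1) / HV_HYP_PAGE_SIZE;
		unsigned long pfn_array[32];
		unsigned long pfn = 0;
		unsigned int j;

		for (j = 0; j < hvpg_count; j++) {
			unsigned int k = (j + hvpg_idx) % HV_HYP_PAGES_IN_PAGE;

			/*
			 * Fetch a new PAGE on the first iteration (j == 0) or
			 * whenever we wrap to sub-page 0 of the next PAGE (k == 0).
			 */
			if (k == 0 || j == 0) {
				pfn = cur_sgl->pfn;	/* page_to_hvpfn(sg_page(cur_sgl)) */
				cur_sgl++;		/* sg_next(cur_sgl) */
			}
			pfn_array[j] = pfn + k;
		}

		for (j = 0; j < hvpg_count; j++)
			printf("pfn_array[%u] = 0x%lx\n", j, pfn_array[j]);
		return 0;
	}

Under the same inputs it produces the identical pfn sequence as the nested-loop sketch, and because HV_HYP_PAGES_IN_PAGE is a compile-time power of two the % can become a mask (and the fetch test collapses when it is 1), as noted in the reply above.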
From: Boqun Feng <boqun.feng@gmail.com> Sent: Saturday, September 5, 2020 7:15 AM > > On Sat, Sep 05, 2020 at 02:55:48AM +0000, Michael Kelley wrote: > > From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM > > > > > > Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when > > > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE > > > as the unit for page related data. For storvsc, the data is > > > vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit > > > of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd > > > into Hyper-V pages in vmbus_packet_mpb_array. > > > > > > This patch does the conversion by dividing pages in sglist into Hyper-V > > > pages, offset and indexes in vmbus_packet_mpb_array are recalculated > > > accordingly. > > > > > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com> > > > --- > > > drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++---- > > > 1 file changed, 54 insertions(+), 6 deletions(-) > > > > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > > > index 8f5f5dc863a4..3f6610717d4e 100644 > > > --- a/drivers/scsi/storvsc_drv.c > > > +++ b/drivers/scsi/storvsc_drv.c > > > @@ -1739,23 +1739,71 @@ static int storvsc_queuecommand(struct Scsi_Host *host, > struct > > > scsi_cmnd *scmnd) > > > payload_sz = sizeof(cmd_request->mpb); > > > > > > if (sg_count) { > > > - if (sg_count > MAX_PAGE_BUFFER_COUNT) { > > > + unsigned int hvpg_idx = 0; > > > + unsigned int j = 0; > > > + unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK; > > > + unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length); > > > > > > - payload_sz = (sg_count * sizeof(u64) + > > > + if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { > > > + > > > + payload_sz = (hvpg_count * sizeof(u64) + > > > sizeof(struct vmbus_packet_mpb_array)); > > > payload = kzalloc(payload_sz, GFP_ATOMIC); > > > if (!payload) > > > return SCSI_MLQUEUE_DEVICE_BUSY; > > > } > > > > > > + /* > > > + * sgl is a list of PAGEs, and payload->range.pfn_array > > > + * expects the page number in the unit of HV_HYP_PAGE_SIZE (the > > > + * page size that Hyper-V uses, so here we need to divide PAGEs > > > + * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE. > > > + */ > > > payload->range.len = length; > > > - payload->range.offset = sgl[0].offset; > > > + payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK; > > > + hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT; > > > > > > cur_sgl = sgl; > > > - for (i = 0; i < sg_count; i++) { > > > - payload->range.pfn_array[i] = > > > - page_to_pfn(sg_page((cur_sgl))); > > > + for (i = 0, j = 0; i < sg_count; i++) { > > > + /* > > > + * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the # > > > + * of HV_HYP_PAGEs in the current PAGE. > > > + * > > > + * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs. > > > + * > > > + * As shown in the following, the minimal of both is > > > + * the # of HV_HYP_PAGEs, we need to handle in this > > > + * PAGE. > > > + * > > > + * |------------------ PAGE ----------------------| > > > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > > > + * |hvpg|hvpg| ... |hvpg|... |hvpg| > > > + * ^ ^ > > > + * hvpg_idx | > > > + * ^ | > > > + * +---(hvpg_count - j)--+ > > > + * > > > + * or > > > + * > > > + * |------------------ PAGE ----------------------| > > > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > > > + * |hvpg|hvpg| ... |hvpg|... 
|hvpg| > > > + * ^ ^ > > > + * hvpg_idx | > > > + * ^ | > > > + * +---(hvpg_count - j)------------------------+ > > > + */ > > > + unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / > HV_HYP_PAGE_SIZE) - hvpg_idx, > > > + hvpg_count - j); > > > + unsigned int k; > > > + > > > + for (k = 0; k < nr_hvpg; k++) { > > > + payload->range.pfn_array[j] = > > > + page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k; > > > + j++; > > > + } > > > cur_sgl = sg_next(cur_sgl); > > > + hvpg_idx = 0; > > > } > > > > This code works; I don't see any errors. But I think it can be made simpler based > > on doing two things: > > 1) Rather than iterating over the sg_count, and having to calculate nr_hvpg on > > each iteration, base the exit decision on having filled up the pfn_array[]. You've > > already calculated the exact size of the array that is needed given the data > > length, so it's easy to exit when the array is full. > > 2) In the inner loop, iterate from hvpg_idx to PAGE_SIZE/HV_HYP_PAGE_SIZE > > rather than from 0 to a calculated value. > > > > Also, as an optimization, pull page_to_hvpfn(sg_page((cur_sgl)) out of the > > inner loop. > > > > I think this code does it (though I haven't tested it): > > > > for (j = 0; ; sgl = sg_next(sgl)) { > > unsigned int k; > > unsigned long pfn; > > > > pfn = page_to_hvpfn(sg_page(sgl)); > > for (k = hvpg_idx; k < (unsigned int)(PAGE_SIZE /HV_HYP_PAGE_SIZE); k++) { > > payload->range.pfn_array[j] = pfn + k; > > if (++j == hvpg_count) > > goto done; > > } > > hvpg_idx = 0; > > } > > done: > > > > This approach also makes the limit of the inner loop a constant, and that > > constant will be 1 when page size is 4K. So the compiler should be able to > > optimize away the loop in that case. > > > > Good point! I like your suggestion, and after thinking a bit harder > based on your approach, I come up with the following: > > #define HV_HYP_PAGES_IN_PAGE ((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE)) > > for (j = 0; j < hvpg_count; j++) { > unsigned int k = (j + hvpg_idx) % HV_HYP_PAGES_IN_PAGE; > > /* > * Two cases that we need to fetch a page: > * a) j == 0: the first step or > * b) k == 0: when we reach the boundary of a > * page. > * > if (k == 0 || j == 0) { > pfn = page_to_hvpfn(sg_page(cur_sgl)); > cur_sgl = sg_next(cur_sgl); > } > > payload->range.pfn_arrary[j] = pfn + k; > } > > , given the HV_HYP_PAGES_IN_PAGE is always a power of 2, so I think > compilers could easily optimize the "%" into bit masking operation. And > when HV_HYP_PAGES_IN_PAGE is 1, I think compilers can easily figure out > k is always zero, then the if-statement can be optimized as always > taken. And that gives us the same code as before ;-) > > Thoughts? I will try with a test to see if I'm missing something subtle. > > Thanks for looking into this! > Your newest version looks right to me -- very clever! I like it even better than my version. Michael
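As a quick sanity check on the claim that the two formulations are equivalent, here is a throwaway user-space harness that fills one array with the nested-loop version and one with the modulo version across a range of starting sub-page indexes and counts, and asserts they match. The PAGES_IN_PAGE value and the synthetic 0x100-per-PAGE pfn numbering are assumptions made purely for this comparison.

	#include <assert.h>
	#include <stdio.h>

	#define PAGES_IN_PAGE 4U	/* models PAGE_SIZE / HV_HYP_PAGE_SIZE */

	static void fill_nested(unsigned long *out, unsigned int hvpg_idx,
				unsigned int hvpg_count)
	{
		unsigned long base = 0x100;		/* pfn of the first PAGE */
		unsigned int j = 0, k;

		for (;; base += 0x100) {		/* "sg_next" stand-in */
			for (k = hvpg_idx; k < PAGES_IN_PAGE; k++) {
				out[j] = base + k;
				if (++j == hvpg_count)
					return;
			}
			hvpg_idx = 0;
		}
	}

	static void fill_modulo(unsigned long *out, unsigned int hvpg_idx,
				unsigned int hvpg_count)
	{
		unsigned long base = 0x100, pfn = 0;
		unsigned int j;

		for (j = 0; j < hvpg_count; j++) {
			unsigned int k = (j + hvpg_idx) % PAGES_IN_PAGE;

			if (k == 0 || j == 0) {		/* fetch the next PAGE */
				pfn = base;
				base += 0x100;
			}
			out[j] = pfn + k;
		}
	}

	int main(void)
	{
		unsigned long a[64], b[64];
		unsigned int idx, count, j;

		for (idx = 0; idx < PAGES_IN_PAGE; idx++)
			for (count = 1; count <= 32; count++) {
				fill_nested(a, idx, count);
				fill_modulo(b, idx, count);
				for (j = 0; j < count; j++)
					assert(a[j] == b[j]);
			}
		printf("both formulations agree\n");
		return 0;
	}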
On Sat, Sep 05, 2020 at 12:30:48AM +0000, Michael Kelley wrote:
> From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM
[...]
> >  struct rndis_request {
> >  	struct list_head list_ent;
> >  	struct completion wait_event;
> > @@ -215,18 +215,18 @@ static int rndis_filter_send_request(struct rndis_device *dev,
> >  	packet->page_buf_cnt = 1;
> >
> >  	pb[0].pfn = virt_to_phys(&req->request_msg) >>
> > -		PAGE_SHIFT;
> > +		HV_HYP_PAGE_SHIFT;
> >  	pb[0].len = req->request_msg.msg_len;
> >  	pb[0].offset =
> > -		(unsigned long)&req->request_msg & (PAGE_SIZE - 1);
> > +		(unsigned long)&req->request_msg & (HV_HYP_PAGE_SIZE - 1);
>
> Use offset_in_hvpage() as defined in patch 6 of the series?
>

Fair enough, I will use offset_in_hvpage() in the next version.

Regards,
Boqun

> >
> >  	/* Add one page_buf when request_msg crossing page boundary */
> > -	if (pb[0].offset + pb[0].len > PAGE_SIZE) {
> > +	if (pb[0].offset + pb[0].len > HV_HYP_PAGE_SIZE) {
> >  		packet->page_buf_cnt++;
> > -		pb[0].len = PAGE_SIZE -
> > +		pb[0].len = HV_HYP_PAGE_SIZE -
> >  			pb[0].offset;
> >  		pb[1].pfn = virt_to_phys((void *)&req->request_msg
> > -			+ pb[0].len) >> PAGE_SHIFT;
> > +			+ pb[0].len) >> HV_HYP_PAGE_SHIFT;
> >  		pb[1].offset = 0;
> >  		pb[1].len = req->request_msg.msg_len -
> >  			pb[0].len;
> > --
> > 2.28.0
>
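To make the boundary-crossing case concrete, here is a user-space sketch of the split being discussed: a request_msg that starts near the end of one 4K Hyper-V page spills into a second page buffer. The fake physical address and message length are assumptions, virt_to_phys() is replaced by that constant, and offset_in_hvpage() is open-coded on the assumption that it mirrors offset_in_page() at HV_HYP_PAGE_SIZE granularity.

	#include <stdio.h>

	#define HV_HYP_PAGE_SIZE  4096UL
	#define HV_HYP_PAGE_SHIFT 12

	struct pb_stub {			/* stand-in for struct hv_page_buffer */
		unsigned long pfn;
		unsigned int offset;
		unsigned int len;
	};

	int main(void)
	{
		unsigned long msg_pa = 0x12345f80;	/* fake phys addr of request_msg */
		unsigned int msg_len = 0x200;		/* 512 bytes, crosses a 4K boundary */
		struct pb_stub pb[2];
		unsigned int page_buf_cnt = 1;
		unsigned int i;

		pb[0].pfn = msg_pa >> HV_HYP_PAGE_SHIFT;
		pb[0].offset = msg_pa & (HV_HYP_PAGE_SIZE - 1);	/* offset_in_hvpage() */
		pb[0].len = msg_len;

		/* Add one page_buf when the message crosses a Hyper-V page boundary. */
		if (pb[0].offset + pb[0].len > HV_HYP_PAGE_SIZE) {
			page_buf_cnt++;
			pb[0].len = HV_HYP_PAGE_SIZE - pb[0].offset;
			pb[1].pfn = (msg_pa + pb[0].len) >> HV_HYP_PAGE_SHIFT;
			pb[1].offset = 0;
			pb[1].len = msg_len - pb[0].len;
		}

		for (i = 0; i < page_buf_cnt; i++)
			printf("pb[%u]: pfn=0x%lx offset=%u len=%u\n",
			       i, pb[i].pfn, pb[i].offset, pb[i].len);
		return 0;
	}

With these sample values, pb[0] covers the last 128 bytes of pfn 0x12345 and pb[1] the first 384 bytes of pfn 0x12346.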