Message ID: 20200902030107.33380-1-boqun.feng@gmail.com
Series: Hyper-V: Support PAGE_SIZE larger than 4K
From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM
>
> Pure function movement, no functional changes. The move is made, because
> in a later change, __vmbus_open() will rely on some static functions
> afterwards, so we sperate the move and the modification of

s/sperate/separate/

> __vmbus_open() in two patches to make it easy to review.
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> Reviewed-by: Wei Liu <wei.liu@kernel.org>
> ---
>  drivers/hv/channel.c | 309 ++++++++++++++++++++++---------------------
>  1 file changed, 155 insertions(+), 154 deletions(-)
>
From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM > > Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE > as the unit for page related data. For storvsc, the data is > vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit > of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd > into Hyper-V pages in vmbus_packet_mpb_array. > > This patch does the conversion by dividing pages in sglist into Hyper-V > pages, offset and indexes in vmbus_packet_mpb_array are recalculated > accordingly. > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com> > --- > drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++---- > 1 file changed, 54 insertions(+), 6 deletions(-) > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > index 8f5f5dc863a4..3f6610717d4e 100644 > --- a/drivers/scsi/storvsc_drv.c > +++ b/drivers/scsi/storvsc_drv.c > @@ -1739,23 +1739,71 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct > scsi_cmnd *scmnd) > payload_sz = sizeof(cmd_request->mpb); > > if (sg_count) { > - if (sg_count > MAX_PAGE_BUFFER_COUNT) { > + unsigned int hvpg_idx = 0; > + unsigned int j = 0; > + unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK; > + unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length); > > - payload_sz = (sg_count * sizeof(u64) + > + if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { > + > + payload_sz = (hvpg_count * sizeof(u64) + > sizeof(struct vmbus_packet_mpb_array)); > payload = kzalloc(payload_sz, GFP_ATOMIC); > if (!payload) > return SCSI_MLQUEUE_DEVICE_BUSY; > } > > + /* > + * sgl is a list of PAGEs, and payload->range.pfn_array > + * expects the page number in the unit of HV_HYP_PAGE_SIZE (the > + * page size that Hyper-V uses, so here we need to divide PAGEs > + * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE. > + */ > payload->range.len = length; > - payload->range.offset = sgl[0].offset; > + payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK; > + hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT; > > cur_sgl = sgl; > - for (i = 0; i < sg_count; i++) { > - payload->range.pfn_array[i] = > - page_to_pfn(sg_page((cur_sgl))); > + for (i = 0, j = 0; i < sg_count; i++) { > + /* > + * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the # > + * of HV_HYP_PAGEs in the current PAGE. > + * > + * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs. > + * > + * As shown in the following, the minimal of both is > + * the # of HV_HYP_PAGEs, we need to handle in this > + * PAGE. > + * > + * |------------------ PAGE ----------------------| > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > + * |hvpg|hvpg| ... |hvpg|... |hvpg| > + * ^ ^ > + * hvpg_idx | > + * ^ | > + * +---(hvpg_count - j)--+ > + * > + * or > + * > + * |------------------ PAGE ----------------------| > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > + * |hvpg|hvpg| ... |hvpg|... |hvpg| > + * ^ ^ > + * hvpg_idx | > + * ^ | > + * +---(hvpg_count - j)------------------------+ > + */ > + unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE) - hvpg_idx, > + hvpg_count - j); > + unsigned int k; > + > + for (k = 0; k < nr_hvpg; k++) { > + payload->range.pfn_array[j] = > + page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k; > + j++; > + } > cur_sgl = sg_next(cur_sgl); > + hvpg_idx = 0; > } This code works; I don't see any errors. 
But I think it can be made simpler based on doing two things:

1) Rather than iterating over the sg_count, and having to calculate nr_hvpg on
   each iteration, base the exit decision on having filled up the pfn_array[].
   You've already calculated the exact size of the array that is needed given
   the data length, so it's easy to exit when the array is full.

2) In the inner loop, iterate from hvpg_idx to PAGE_SIZE/HV_HYP_PAGE_SIZE
   rather than from 0 to a calculated value.

Also, as an optimization, pull page_to_hvpfn(sg_page(cur_sgl)) out of the
inner loop.

I think this code does it (though I haven't tested it):

	for (j = 0; ; sgl = sg_next(sgl)) {
		unsigned int k;
		unsigned long pfn;

		pfn = page_to_hvpfn(sg_page(sgl));
		for (k = hvpg_idx; k < (unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE); k++) {
			payload->range.pfn_array[j] = pfn + k;
			if (++j == hvpg_count)
				goto done;
		}
		hvpg_idx = 0;
	}
done:

This approach also makes the limit of the inner loop a constant, and that
constant will be 1 when page size is 4K. So the compiler should be able to
optimize away the loop in that case.

Michael

> }
>
> --
> 2.28.0
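To see the shape of the suggested loop in isolation, here is a small user-space sketch of the same "fill pfn_array[] until full" traversal. struct sg_stub, the 16K PAGE_SIZE, and the sample pfn/offset/length values are made-up stand-ins, and page_to_hvpfn()/sg_next()/HVPFN_UP() from the series are open-coded, so this only illustrates the traversal logic, not the actual driver change.

	/*
	 * User-space sketch: walk guest PAGEs and emit one entry per 4K
	 * Hyper-V sub-page until the computed hvpg_count entries are filled.
	 */
	#include <stdio.h>

	#define PAGE_SIZE         16384UL	/* hypothetical 16K guest page */
	#define HV_HYP_PAGE_SIZE  4096UL
	#define HV_HYP_PAGE_SHIFT 12

	struct sg_stub {			/* stand-in for struct scatterlist */
		unsigned long pfn;		/* first Hyper-V pfn of this PAGE */
		unsigned int offset;		/* byte offset into the PAGE */
	};

	int main(void)
	{
		/* Two guest PAGEs; the data starts at byte 6000 of the first. */
		struct sg_stub sgl[] = { { .pfn = 0x100, .offset = 6000 },
					 { .pfn = 0x200, .offset = 0 } };
		unsigned long length = 20000;			/* bytes of data */
		unsigned int hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
		unsigned long hvpg_offset = sgl[0].offset & (HV_HYP_PAGE_SIZE - 1);
		unsigned int hvpg_count =			/* open-coded HVPFN_UP() */
			(hvpg_offset + length + HV_HYP_PAGE_SIZE - 1) / HV_HYP_PAGE_SIZE;
		unsigned long pfn_array[32];
		unsigned int i = 0, j;

		/* Outer loop walks PAGEs, inner loop walks 4K sub-pages. */
		for (j = 0; ; i++) {
			unsigned long pfn = sgl[i].pfn;	/* page_to_hvpfn(sg_page(sgl)) */
			unsigned int k;

			for (k = hvpg_idx; k < PAGE_SIZE / HV_HYP_PAGE_SIZE; k++) {
				pfn_array[j] = pfn + k;
				if (++j == hvpg_count)
					goto done;
			}
			hvpg_idx = 0;		/* later PAGEs start at sub-page 0 */
		}
	done:
		for (j = 0; j < hvpg_count; j++)
			printf("pfn_array[%u] = 0x%lx\n", j, pfn_array[j]);
		return 0;
	}

With those sample values the sketch prints six entries: 0x101-0x103 from the tail of the first 16K PAGE followed by 0x200-0x202 from the head of the second, i.e. exactly hvpg_count Hyper-V pages.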
On Sat, Sep 05, 2020 at 02:55:48AM +0000, Michael Kelley wrote: > From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM > > > > Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when > > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE > > as the unit for page related data. For storvsc, the data is > > vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit > > of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd > > into Hyper-V pages in vmbus_packet_mpb_array. > > > > This patch does the conversion by dividing pages in sglist into Hyper-V > > pages, offset and indexes in vmbus_packet_mpb_array are recalculated > > accordingly. > > > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com> > > --- > > drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++---- > > 1 file changed, 54 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > > index 8f5f5dc863a4..3f6610717d4e 100644 > > --- a/drivers/scsi/storvsc_drv.c > > +++ b/drivers/scsi/storvsc_drv.c > > @@ -1739,23 +1739,71 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct > > scsi_cmnd *scmnd) > > payload_sz = sizeof(cmd_request->mpb); > > > > if (sg_count) { > > - if (sg_count > MAX_PAGE_BUFFER_COUNT) { > > + unsigned int hvpg_idx = 0; > > + unsigned int j = 0; > > + unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK; > > + unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length); > > > > - payload_sz = (sg_count * sizeof(u64) + > > + if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { > > + > > + payload_sz = (hvpg_count * sizeof(u64) + > > sizeof(struct vmbus_packet_mpb_array)); > > payload = kzalloc(payload_sz, GFP_ATOMIC); > > if (!payload) > > return SCSI_MLQUEUE_DEVICE_BUSY; > > } > > > > + /* > > + * sgl is a list of PAGEs, and payload->range.pfn_array > > + * expects the page number in the unit of HV_HYP_PAGE_SIZE (the > > + * page size that Hyper-V uses, so here we need to divide PAGEs > > + * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE. > > + */ > > payload->range.len = length; > > - payload->range.offset = sgl[0].offset; > > + payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK; > > + hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT; > > > > cur_sgl = sgl; > > - for (i = 0; i < sg_count; i++) { > > - payload->range.pfn_array[i] = > > - page_to_pfn(sg_page((cur_sgl))); > > + for (i = 0, j = 0; i < sg_count; i++) { > > + /* > > + * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the # > > + * of HV_HYP_PAGEs in the current PAGE. > > + * > > + * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs. > > + * > > + * As shown in the following, the minimal of both is > > + * the # of HV_HYP_PAGEs, we need to handle in this > > + * PAGE. > > + * > > + * |------------------ PAGE ----------------------| > > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > > + * |hvpg|hvpg| ... |hvpg|... |hvpg| > > + * ^ ^ > > + * hvpg_idx | > > + * ^ | > > + * +---(hvpg_count - j)--+ > > + * > > + * or > > + * > > + * |------------------ PAGE ----------------------| > > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > > + * |hvpg|hvpg| ... |hvpg|... 
|hvpg| > > + * ^ ^ > > + * hvpg_idx | > > + * ^ | > > + * +---(hvpg_count - j)------------------------+ > > + */ > > + unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE) - hvpg_idx, > > + hvpg_count - j); > > + unsigned int k; > > + > > + for (k = 0; k < nr_hvpg; k++) { > > + payload->range.pfn_array[j] = > > + page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k; > > + j++; > > + } > > cur_sgl = sg_next(cur_sgl); > > + hvpg_idx = 0; > > } > > This code works; I don't see any errors. But I think it can be made simpler based > on doing two things: > 1) Rather than iterating over the sg_count, and having to calculate nr_hvpg on > each iteration, base the exit decision on having filled up the pfn_array[]. You've > already calculated the exact size of the array that is needed given the data > length, so it's easy to exit when the array is full. > 2) In the inner loop, iterate from hvpg_idx to PAGE_SIZE/HV_HYP_PAGE_SIZE > rather than from 0 to a calculated value. > > Also, as an optimization, pull page_to_hvpfn(sg_page((cur_sgl)) out of the > inner loop. > > I think this code does it (though I haven't tested it): > > for (j = 0; ; sgl = sg_next(sgl)) { > unsigned int k; > unsigned long pfn; > > pfn = page_to_hvpfn(sg_page(sgl)); > for (k = hvpg_idx; k < (unsigned int)(PAGE_SIZE /HV_HYP_PAGE_SIZE); k++) { > payload->range.pfn_array[j] = pfn + k; > if (++j == hvpg_count) > goto done; > } > hvpg_idx = 0; > } > done: > > This approach also makes the limit of the inner loop a constant, and that > constant will be 1 when page size is 4K. So the compiler should be able to > optimize away the loop in that case. > Good point! I like your suggestion, and after thinking a bit harder based on your approach, I come up with the following: #define HV_HYP_PAGES_IN_PAGE ((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE)) for (j = 0; j < hvpg_count; j++) { unsigned int k = (j + hvpg_idx) % HV_HYP_PAGES_IN_PAGE; /* * Two cases that we need to fetch a page: * a) j == 0: the first step or * b) k == 0: when we reach the boundary of a * page. * if (k == 0 || j == 0) { pfn = page_to_hvpfn(sg_page(cur_sgl)); cur_sgl = sg_next(cur_sgl); } payload->range.pfn_arrary[j] = pfn + k; } , given the HV_HYP_PAGES_IN_PAGE is always a power of 2, so I think compilers could easily optimize the "%" into bit masking operation. And when HV_HYP_PAGES_IN_PAGE is 1, I think compilers can easily figure out k is always zero, then the if-statement can be optimized as always taken. And that gives us the same code as before ;-) Thoughts? I will try with a test to see if I'm missing something subtle. Thanks for looking into this! Regards, Boqun > Michael > > > > > > > > } > > > > -- > > 2.28.0 >
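For comparison with the earlier sketch, here is the same user-space stand-in reworked around the single modulo-based loop proposed above. Again, sg_stub, the hypothetical 16K PAGE_SIZE behind HV_HYP_PAGES_IN_PAGE, and the sample values are assumptions; page_to_hvpfn() and sg_next() are replaced by the commented stand-ins.

	#include <stdio.h>

	#define PAGE_SIZE         16384UL
	#define HV_HYP_PAGE_SIZE  4096UL
	#define HV_HYP_PAGE_SHIFT 12
	#define HV_HYP_PAGES_IN_PAGE ((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE))

	struct sg_stub {			/* stand-in for struct scatterlist */
		unsigned long pfn;
		unsigned int offset;
	};

	int main(void)
	{
		struct sg_stub sgl[] = { { .pfn = 0x100, .offset = 6000 },
					 { .pfn = 0x200, .offset = 0 } };
		struct sg_stub *cur_sgl = sgl;
		unsigned long length = 20000;
		unsigned int hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT;
		unsigned long hvpg_offset = sgl[0].offset & (HV_HYP_PAGE_SIZE - 1);
		unsigned int hvpg_count =
			(hvpg_offset + length + HV_HYP_PAGE_SIZE - 1) / HV_HYP_PAGE_SIZE;
		unsigned long pfn_array[32];
		unsigned long pfn = 0;
		unsigned int j;

		for (j = 0; j < hvpg_count; j++) {
			unsigned int k = (j + hvpg_idx) % HV_HYP_PAGES_IN_PAGE;

			/*
			 * Fetch a new PAGE on the first iteration (j == 0) or
			 * whenever we wrap to sub-page 0 of the next PAGE (k == 0).
			 */
			if (k == 0 || j == 0) {
				pfn = cur_sgl->pfn;	/* page_to_hvpfn(sg_page(cur_sgl)) */
				cur_sgl++;		/* sg_next(cur_sgl) */
			}
			pfn_array[j] = pfn + k;
		}

		for (j = 0; j < hvpg_count; j++)
			printf("pfn_array[%u] = 0x%lx\n", j, pfn_array[j]);
		return 0;
	}

Under the same inputs it produces the identical pfn sequence as the nested-loop sketch, and because HV_HYP_PAGES_IN_PAGE is a compile-time power of two the % can become a mask (and the fetch test collapses when it is 1), as noted in the reply above.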
From: Boqun Feng <boqun.feng@gmail.com> Sent: Saturday, September 5, 2020 7:15 AM > > On Sat, Sep 05, 2020 at 02:55:48AM +0000, Michael Kelley wrote: > > From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM > > > > > > Hyper-V always use 4k page size (HV_HYP_PAGE_SIZE), so when > > > communicating with Hyper-V, a guest should always use HV_HYP_PAGE_SIZE > > > as the unit for page related data. For storvsc, the data is > > > vmbus_packet_mpb_array. And since in scsi_cmnd, sglist of pages (in unit > > > of PAGE_SIZE) is used, we need convert pages in the sglist of scsi_cmnd > > > into Hyper-V pages in vmbus_packet_mpb_array. > > > > > > This patch does the conversion by dividing pages in sglist into Hyper-V > > > pages, offset and indexes in vmbus_packet_mpb_array are recalculated > > > accordingly. > > > > > > Signed-off-by: Boqun Feng <boqun.feng@gmail.com> > > > --- > > > drivers/scsi/storvsc_drv.c | 60 ++++++++++++++++++++++++++++++++++---- > > > 1 file changed, 54 insertions(+), 6 deletions(-) > > > > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > > > index 8f5f5dc863a4..3f6610717d4e 100644 > > > --- a/drivers/scsi/storvsc_drv.c > > > +++ b/drivers/scsi/storvsc_drv.c > > > @@ -1739,23 +1739,71 @@ static int storvsc_queuecommand(struct Scsi_Host *host, > struct > > > scsi_cmnd *scmnd) > > > payload_sz = sizeof(cmd_request->mpb); > > > > > > if (sg_count) { > > > - if (sg_count > MAX_PAGE_BUFFER_COUNT) { > > > + unsigned int hvpg_idx = 0; > > > + unsigned int j = 0; > > > + unsigned long hvpg_offset = sgl->offset & ~HV_HYP_PAGE_MASK; > > > + unsigned int hvpg_count = HVPFN_UP(hvpg_offset + length); > > > > > > - payload_sz = (sg_count * sizeof(u64) + > > > + if (hvpg_count > MAX_PAGE_BUFFER_COUNT) { > > > + > > > + payload_sz = (hvpg_count * sizeof(u64) + > > > sizeof(struct vmbus_packet_mpb_array)); > > > payload = kzalloc(payload_sz, GFP_ATOMIC); > > > if (!payload) > > > return SCSI_MLQUEUE_DEVICE_BUSY; > > > } > > > > > > + /* > > > + * sgl is a list of PAGEs, and payload->range.pfn_array > > > + * expects the page number in the unit of HV_HYP_PAGE_SIZE (the > > > + * page size that Hyper-V uses, so here we need to divide PAGEs > > > + * into HV_HYP_PAGE in case that PAGE_SIZE > HV_HYP_PAGE_SIZE. > > > + */ > > > payload->range.len = length; > > > - payload->range.offset = sgl[0].offset; > > > + payload->range.offset = sgl[0].offset & ~HV_HYP_PAGE_MASK; > > > + hvpg_idx = sgl[0].offset >> HV_HYP_PAGE_SHIFT; > > > > > > cur_sgl = sgl; > > > - for (i = 0; i < sg_count; i++) { > > > - payload->range.pfn_array[i] = > > > - page_to_pfn(sg_page((cur_sgl))); > > > + for (i = 0, j = 0; i < sg_count; i++) { > > > + /* > > > + * "PAGE_SIZE / HV_HYP_PAGE_SIZE - hvpg_idx" is the # > > > + * of HV_HYP_PAGEs in the current PAGE. > > > + * > > > + * "hvpg_count - j" is the # of unhandled HV_HYP_PAGEs. > > > + * > > > + * As shown in the following, the minimal of both is > > > + * the # of HV_HYP_PAGEs, we need to handle in this > > > + * PAGE. > > > + * > > > + * |------------------ PAGE ----------------------| > > > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > > > + * |hvpg|hvpg| ... |hvpg|... |hvpg| > > > + * ^ ^ > > > + * hvpg_idx | > > > + * ^ | > > > + * +---(hvpg_count - j)--+ > > > + * > > > + * or > > > + * > > > + * |------------------ PAGE ----------------------| > > > + * | PAGE_SIZE / HV_HYP_PAGE_SIZE in total | > > > + * |hvpg|hvpg| ... |hvpg|... 
|hvpg| > > > + * ^ ^ > > > + * hvpg_idx | > > > + * ^ | > > > + * +---(hvpg_count - j)------------------------+ > > > + */ > > > + unsigned int nr_hvpg = min((unsigned int)(PAGE_SIZE / > HV_HYP_PAGE_SIZE) - hvpg_idx, > > > + hvpg_count - j); > > > + unsigned int k; > > > + > > > + for (k = 0; k < nr_hvpg; k++) { > > > + payload->range.pfn_array[j] = > > > + page_to_hvpfn(sg_page((cur_sgl))) + hvpg_idx + k; > > > + j++; > > > + } > > > cur_sgl = sg_next(cur_sgl); > > > + hvpg_idx = 0; > > > } > > > > This code works; I don't see any errors. But I think it can be made simpler based > > on doing two things: > > 1) Rather than iterating over the sg_count, and having to calculate nr_hvpg on > > each iteration, base the exit decision on having filled up the pfn_array[]. You've > > already calculated the exact size of the array that is needed given the data > > length, so it's easy to exit when the array is full. > > 2) In the inner loop, iterate from hvpg_idx to PAGE_SIZE/HV_HYP_PAGE_SIZE > > rather than from 0 to a calculated value. > > > > Also, as an optimization, pull page_to_hvpfn(sg_page((cur_sgl)) out of the > > inner loop. > > > > I think this code does it (though I haven't tested it): > > > > for (j = 0; ; sgl = sg_next(sgl)) { > > unsigned int k; > > unsigned long pfn; > > > > pfn = page_to_hvpfn(sg_page(sgl)); > > for (k = hvpg_idx; k < (unsigned int)(PAGE_SIZE /HV_HYP_PAGE_SIZE); k++) { > > payload->range.pfn_array[j] = pfn + k; > > if (++j == hvpg_count) > > goto done; > > } > > hvpg_idx = 0; > > } > > done: > > > > This approach also makes the limit of the inner loop a constant, and that > > constant will be 1 when page size is 4K. So the compiler should be able to > > optimize away the loop in that case. > > > > Good point! I like your suggestion, and after thinking a bit harder > based on your approach, I come up with the following: > > #define HV_HYP_PAGES_IN_PAGE ((unsigned int)(PAGE_SIZE / HV_HYP_PAGE_SIZE)) > > for (j = 0; j < hvpg_count; j++) { > unsigned int k = (j + hvpg_idx) % HV_HYP_PAGES_IN_PAGE; > > /* > * Two cases that we need to fetch a page: > * a) j == 0: the first step or > * b) k == 0: when we reach the boundary of a > * page. > * > if (k == 0 || j == 0) { > pfn = page_to_hvpfn(sg_page(cur_sgl)); > cur_sgl = sg_next(cur_sgl); > } > > payload->range.pfn_arrary[j] = pfn + k; > } > > , given the HV_HYP_PAGES_IN_PAGE is always a power of 2, so I think > compilers could easily optimize the "%" into bit masking operation. And > when HV_HYP_PAGES_IN_PAGE is 1, I think compilers can easily figure out > k is always zero, then the if-statement can be optimized as always > taken. And that gives us the same code as before ;-) > > Thoughts? I will try with a test to see if I'm missing something subtle. > > Thanks for looking into this! > Your newest version looks right to me -- very clever! I like it even better than my version. Michael
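As a quick sanity check on the claim that the two formulations are equivalent, here is a throwaway user-space harness that fills one array with the nested-loop version and one with the modulo version across a range of starting sub-page indexes and counts, and asserts they match. The PAGES_IN_PAGE value and the synthetic 0x100-per-PAGE pfn numbering are assumptions made purely for this comparison.

	#include <assert.h>
	#include <stdio.h>

	#define PAGES_IN_PAGE 4U	/* models PAGE_SIZE / HV_HYP_PAGE_SIZE */

	static void fill_nested(unsigned long *out, unsigned int hvpg_idx,
				unsigned int hvpg_count)
	{
		unsigned long base = 0x100;		/* pfn of the first PAGE */
		unsigned int j = 0, k;

		for (;; base += 0x100) {		/* "sg_next" stand-in */
			for (k = hvpg_idx; k < PAGES_IN_PAGE; k++) {
				out[j] = base + k;
				if (++j == hvpg_count)
					return;
			}
			hvpg_idx = 0;
		}
	}

	static void fill_modulo(unsigned long *out, unsigned int hvpg_idx,
				unsigned int hvpg_count)
	{
		unsigned long base = 0x100, pfn = 0;
		unsigned int j;

		for (j = 0; j < hvpg_count; j++) {
			unsigned int k = (j + hvpg_idx) % PAGES_IN_PAGE;

			if (k == 0 || j == 0) {		/* fetch the next PAGE */
				pfn = base;
				base += 0x100;
			}
			out[j] = pfn + k;
		}
	}

	int main(void)
	{
		unsigned long a[64], b[64];
		unsigned int idx, count, j;

		for (idx = 0; idx < PAGES_IN_PAGE; idx++)
			for (count = 1; count <= 32; count++) {
				fill_nested(a, idx, count);
				fill_modulo(b, idx, count);
				for (j = 0; j < count; j++)
					assert(a[j] == b[j]);
			}
		printf("both formulations agree\n");
		return 0;
	}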
On Sat, Sep 05, 2020 at 12:30:48AM +0000, Michael Kelley wrote:
> From: Boqun Feng <boqun.feng@gmail.com> Sent: Tuesday, September 1, 2020 8:01 PM
[...]
> >  struct rndis_request {
> >  	struct list_head list_ent;
> >  	struct completion wait_event;
> > @@ -215,18 +215,18 @@ static int rndis_filter_send_request(struct rndis_device *dev,
> >  	packet->page_buf_cnt = 1;
> >
> >  	pb[0].pfn = virt_to_phys(&req->request_msg) >>
> > -		PAGE_SHIFT;
> > +		HV_HYP_PAGE_SHIFT;
> >  	pb[0].len = req->request_msg.msg_len;
> >  	pb[0].offset =
> > -		(unsigned long)&req->request_msg & (PAGE_SIZE - 1);
> > +		(unsigned long)&req->request_msg & (HV_HYP_PAGE_SIZE - 1);
>
> Use offset_in_hvpage() as defined in patch 6 of the series?
>

Fair enough, I will use offset_in_hvpage() in the next version.

Regards,
Boqun

> >
> >  	/* Add one page_buf when request_msg crossing page boundary */
> > -	if (pb[0].offset + pb[0].len > PAGE_SIZE) {
> > +	if (pb[0].offset + pb[0].len > HV_HYP_PAGE_SIZE) {
> >  		packet->page_buf_cnt++;
> > -		pb[0].len = PAGE_SIZE -
> > +		pb[0].len = HV_HYP_PAGE_SIZE -
> >  			pb[0].offset;
> >  		pb[1].pfn = virt_to_phys((void *)&req->request_msg
> > -			+ pb[0].len) >> PAGE_SHIFT;
> > +			+ pb[0].len) >> HV_HYP_PAGE_SHIFT;
> >  		pb[1].offset = 0;
> >  		pb[1].len = req->request_msg.msg_len -
> >  			pb[0].len;
> > --
> > 2.28.0
>
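To make the boundary-crossing case concrete, here is a user-space sketch of the split being discussed: a request_msg that starts near the end of one 4K Hyper-V page spills into a second page buffer. The fake physical address and message length are assumptions, virt_to_phys() is replaced by that constant, and offset_in_hvpage() is open-coded on the assumption that it mirrors offset_in_page() at HV_HYP_PAGE_SIZE granularity.

	#include <stdio.h>

	#define HV_HYP_PAGE_SIZE  4096UL
	#define HV_HYP_PAGE_SHIFT 12

	struct pb_stub {			/* stand-in for struct hv_page_buffer */
		unsigned long pfn;
		unsigned int offset;
		unsigned int len;
	};

	int main(void)
	{
		unsigned long msg_pa = 0x12345f80;	/* fake phys addr of request_msg */
		unsigned int msg_len = 0x200;		/* 512 bytes, crosses a 4K boundary */
		struct pb_stub pb[2];
		unsigned int page_buf_cnt = 1;
		unsigned int i;

		pb[0].pfn = msg_pa >> HV_HYP_PAGE_SHIFT;
		pb[0].offset = msg_pa & (HV_HYP_PAGE_SIZE - 1);	/* offset_in_hvpage() */
		pb[0].len = msg_len;

		/* Add one page_buf when the message crosses a Hyper-V page boundary. */
		if (pb[0].offset + pb[0].len > HV_HYP_PAGE_SIZE) {
			page_buf_cnt++;
			pb[0].len = HV_HYP_PAGE_SIZE - pb[0].offset;
			pb[1].pfn = (msg_pa + pb[0].len) >> HV_HYP_PAGE_SHIFT;
			pb[1].offset = 0;
			pb[1].len = msg_len - pb[0].len;
		}

		for (i = 0; i < page_buf_cnt; i++)
			printf("pb[%u]: pfn=0x%lx offset=%u len=%u\n",
			       i, pb[i].pfn, pb[i].offset, pb[i].len);
		return 0;
	}

With these sample values, pb[0] covers the last 128 bytes of pfn 0x12345 and pb[1] the first 384 bytes of pfn 0x12346.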