Message ID | 20180220205924.2035765-1-arnd@arndb.de |
---|---|
State | New |
Series | RDMA/core: reduce IB_POLL_BATCH constant |
Hi Arnd Bergmann,

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Arnd Bergmann
> Sent: Tuesday, February 20, 2018 2:59 PM
> To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Arnd Bergmann <arnd@arndb.de>; Leon Romanovsky <leonro@mellanox.com>; Sagi Grimberg <sagi@grimberg.me>; Bart Van Assche <bart.vanassche@sandisk.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
>
> The ib_wc structure has grown so much that putting 16 of them on the stack
> hits the warning limit for dangerous kernel stack consumption:
>
> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is
> larger than 1024 bytes [-Werror=frame-larger-than=]
>
> Using half that number brings us comfortably below that limit again.
>
> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
> RDMA resources")

It is not clear to me how the above commit 02d8883f520e introduced this
stack issue. Bodong and I came across the ib_wc size increase in [1] and it
was fixed in [2]. Did you hit this error before or after applying patch [2]?

[1] https://www.spinics.net/lists/linux-rdma/msg50754.html
[2] https://patchwork.kernel.org/patch/10159623/
On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>  /* # of WCs to poll for with a single call to ib_poll_cq */
> -#define IB_POLL_BATCH			16
> +#define IB_POLL_BATCH			8

The purpose of batch polling is to minimize contention on the cq spinlock.
Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
the performance impact of this change been verified for all affected drivers
(ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
over RDMA, ...)?

Bart.
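A quick userspace model of the contention argument Bart makes, assuming one
CQ-lock round-trip per ib_poll_cq() call; the budget value and the program
itself are illustrative sketches, not code from the thread:

	/* Model: each ib_poll_cq() call takes the CQ lock once and returns
	 * up to 'batch' completions, so halving the batch size roughly
	 * doubles the lock traffic needed to drain the same budget. */
	#include <stdio.h>

	#define BUDGET 256	/* illustrative completion budget */

	static int lock_round_trips(int batch)
	{
		/* one ib_poll_cq() call -- one lock round-trip -- per batch */
		return (BUDGET + batch - 1) / batch;
	}

	int main(void)
	{
		printf("batch 16: %d lock round-trips per %d completions\n",
		       lock_round_trips(16), BUDGET);
		printf("batch  8: %d lock round-trips per %d completions\n",
		       lock_round_trips(8), BUDGET);
		return 0;
	}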
> On Feb 20, 2018, at 4:14 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>
> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>  /* # of WCs to poll for with a single call to ib_poll_cq */
>> -#define IB_POLL_BATCH			16
>> +#define IB_POLL_BATCH			8
>
> The purpose of batch polling is to minimize contention on the cq spinlock.
> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
> the performance impact of this change been verified for all affected drivers
> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
> over RDMA, ...)?

Only the users of the DIRECT polling method use an on-stack array of
ib_wc's. That is only the SRP drivers.

The other two modes use a dynamically allocated array of ib_wc's that hangs
off the ib_cq. Those shouldn't need any reduction in the size of this array,
and they are the common case.

IMO a better solution would be to change ib_process_cq_direct to use a
smaller on-stack array, and leave IB_POLL_BATCH alone.

--
Chuck Lever
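For context, the dynamically allocated array Chuck refers to is set up at CQ
creation time. A hedged sketch, approximating the 4.16-era ib_alloc_cq() in
drivers/infiniband/core/cq.c (the unwind label name is an assumption):

	/* Per-CQ completion array consumed by the IRQ and workqueue polling
	 * contexts; only IB_POLL_DIRECT ever placed ib_wc's on the stack. */
	cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL);
	if (!cq->wc)
		goto out_destroy_cq;	/* label name is an assumption */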
On Tue, Feb 20, 2018 at 10:14 PM, Parav Pandit <parav@mellanox.com> wrote:
> Hi Arnd Bergmann,
>
>> -----Original Message-----
>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Arnd Bergmann
>> Sent: Tuesday, February 20, 2018 2:59 PM
>> To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Arnd Bergmann <arnd@arndb.de>; Leon Romanovsky <leonro@mellanox.com>; Sagi Grimberg <sagi@grimberg.me>; Bart Van Assche <bart.vanassche@sandisk.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
>>
>> The ib_wc structure has grown so much that putting 16 of them on the stack
>> hits the warning limit for dangerous kernel stack consumption:
>>
>> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
>> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is
>> larger than 1024 bytes [-Werror=frame-larger-than=]
>>
>> Using half that number brings us comfortably below that limit again.
>>
>> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
>> RDMA resources")
>
> It is not clear to me how the above commit 02d8883f520e introduced this
> stack issue.

My mistake, I misread the git history. I did a proper bisection now and ended
up with the commit that added the IB_POLL_BATCH sized array on the stack,
i.e. commit 246d8b184c10 ("IB/cq: Don't force IB_POLL_DIRECT poll context
for ib_process_cq_direct").

> Bodong and I came across the ib_wc size increase in [1] and it was fixed
> in [2]. Did you hit this error before or after applying patch [2]?
>
> [1] https://www.spinics.net/lists/linux-rdma/msg50754.html
> [2] https://patchwork.kernel.org/patch/10159623/

I did the analysis a few weeks ago when I first hit the problem but didn't
send it out at the time. Today I saw that the problem still persists on
mainline (4.16-rc2), which does contain the patch from [2]. What I see is
that 'ib_wc' is now exactly 59 bytes on 32-bit ARM, plus 5 bytes of padding,
so 16 of them get us exactly to the warning limit, and then there are a few
more bytes for the function itself.

       Arnd
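Arnd's arithmetic can be checked with a trivial standalone program; the
59-byte and 5-byte figures below are taken from his message and assumed
accurate for 32-bit ARM:

	#include <stdio.h>

	#define IB_WC_PADDED_SIZE	64	/* 59 payload + 5 padding bytes, per Arnd */
	#define IB_POLL_BATCH		16

	int main(void)
	{
		/* 16 * 64 = 1024: the on-stack wcs[] alone reaches the
		 * -Wframe-larger-than=1024 limit; the function's few extra
		 * bytes of locals push the frame to the reported 1032. */
		printf("wcs[] footprint: %d bytes\n",
		       IB_POLL_BATCH * IB_WC_PADDED_SIZE);
		return 0;
	}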
On 2/20/2018 11:47 PM, Chuck Lever wrote:
>
>> On Feb 20, 2018, at 4:14 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>>
>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>  /* # of WCs to poll for with a single call to ib_poll_cq */
>>> -#define IB_POLL_BATCH			16
>>> +#define IB_POLL_BATCH			8
>>
>> The purpose of batch polling is to minimize contention on the cq spinlock.
>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>> the performance impact of this change been verified for all affected
>> drivers (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB
>> Direct, NFS over RDMA, ...)?
>
> Only the users of the DIRECT polling method use an on-stack array of
> ib_wc's. That is only the SRP drivers.
>
> The other two modes use a dynamically allocated array of ib_wc's that hangs
> off the ib_cq. Those shouldn't need any reduction in the size of this
> array, and they are the common case.
>
> IMO a better solution would be to change ib_process_cq_direct to use a
> smaller on-stack array, and leave IB_POLL_BATCH alone.

Yup, good idea. You can define IB_DIRECT_POLL_BATCH to be 8 and use it in
ib_process_cq_direct. *But* please make sure to use the right value in
ib_poll_cq, since the wcs array must be able to hold the requested amount
of wcs.

-Max.
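Max's warning is about a real overflow hazard. A hedged sketch of the
mistake to avoid (IB_DIRECT_POLL_BATCH is his proposed name from above; cq,
budget, n and wcs are assumed to be in scope):

	struct ib_wc wcs[IB_DIRECT_POLL_BATCH];	/* 8 slots */

	/* WRONG: still asks the provider for up to IB_POLL_BATCH (16)
	 * completions, which can overrun the 8-slot stack array. */
	n = ib_poll_cq(cq, min_t(u32, IB_POLL_BATCH, budget), wcs);

	/* RIGHT: never request more entries than the array holds. */
	n = ib_poll_cq(cq, min_t(u32, IB_DIRECT_POLL_BATCH, budget), wcs);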
>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>  /* # of WCs to poll for with a single call to ib_poll_cq */
>>> -#define IB_POLL_BATCH			16
>>> +#define IB_POLL_BATCH			8
>>
>> The purpose of batch polling is to minimize contention on the cq spinlock.
>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>> the performance impact of this change been verified for all affected
>> drivers (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB
>> Direct, NFS over RDMA, ...)?
>
> Only the users of the DIRECT polling method use an on-stack array of
> ib_wc's. That is only the SRP drivers.
>
> The other two modes use a dynamically allocated array of ib_wc's that hangs
> off the ib_cq. Those shouldn't need any reduction in the size of this
> array, and they are the common case.
>
> IMO a better solution would be to change ib_process_cq_direct to use a
> smaller on-stack array, and leave IB_POLL_BATCH alone.

The only reason why I added this array on-stack was to allow consumers that
did not use the ib_alloc_cq API to call it, but that seems like a wrong
decision when thinking it over again (as probably these users did not set
the wr_cqe correctly).

How about we make ib_process_cq_direct use the cq wc array and add a WARN_ON
statement (and fail gracefully) if the caller used this API without calling
ib_alloc_cq?

--
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8215d7..cd3e9e124834 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -25,10 +25,10 @@
 #define IB_POLL_FLAGS \
 	(IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
 
-static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
+static int __ib_process_cq(struct ib_cq *cq, int budget)
 {
 	int i, n, completed = 0;
-	struct ib_wc *wcs = poll_wc ? : cq->wc;
+	struct ib_wc *wcs = cq->wc;
 
 	/*
 	 * budget might be (-1) if the caller does not
@@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
  */
 int ib_process_cq_direct(struct ib_cq *cq, int budget)
 {
-	struct ib_wc wcs[IB_POLL_BATCH];
-
-	return __ib_process_cq(cq, budget, wcs);
+	if (unlikely(WARN_ON_ONCE(!cq->wc)))
+		return 0;
+	return __ib_process_cq(cq, budget);
 }
 EXPORT_SYMBOL(ib_process_cq_direct);
 
@@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
 	struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
 	int completed;
 
-	completed = __ib_process_cq(cq, budget, NULL);
+	completed = __ib_process_cq(cq, budget);
 	if (completed < budget) {
 		irq_poll_complete(&cq->iop);
 		if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
@@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
 	struct ib_cq *cq = container_of(work, struct ib_cq, work);
 	int completed;
 
-	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
+	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
 	if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
 	    ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
 		queue_work(ib_comp_wq, &cq->work);
--
On 2/21/2018 3:44 PM, Sagi Grimberg wrote:
>
>>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>>  /* # of WCs to poll for with a single call to ib_poll_cq */
>>>> -#define IB_POLL_BATCH			16
>>>> +#define IB_POLL_BATCH			8
>>>
>>> The purpose of batch polling is to minimize contention on the cq
>>> spinlock. Reducing the IB_POLL_BATCH constant may affect performance
>>> negatively. Has the performance impact of this change been verified for
>>> all affected drivers (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF
>>> target, SMB Direct, NFS over RDMA, ...)?
>>
>> Only the users of the DIRECT polling method use an on-stack array of
>> ib_wc's. That is only the SRP drivers.
>>
>> The other two modes use a dynamically allocated array of ib_wc's that
>> hangs off the ib_cq. Those shouldn't need any reduction in the size of
>> this array, and they are the common case.
>>
>> IMO a better solution would be to change ib_process_cq_direct to use a
>> smaller on-stack array, and leave IB_POLL_BATCH alone.
>
> The only reason why I added this array on-stack was to allow consumers that
> did not use the ib_alloc_cq API to call it, but that seems like a wrong
> decision when thinking it over again (as probably these users did not set
> the wr_cqe correctly).
>
> How about we make ib_process_cq_direct use the cq wc array and add a
> WARN_ON statement (and fail gracefully) if the caller used this API
> without calling ib_alloc_cq?

But we tried to avoid concurrent access to cq->wc. Why can't we use the
solution I wrote above?

-Max.
> On Feb 21, 2018, at 8:44 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
>
>>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>>  /* # of WCs to poll for with a single call to ib_poll_cq */
>>>> -#define IB_POLL_BATCH			16
>>>> +#define IB_POLL_BATCH			8
>>>
>>> The purpose of batch polling is to minimize contention on the cq
>>> spinlock. Reducing the IB_POLL_BATCH constant may affect performance
>>> negatively. Has the performance impact of this change been verified for
>>> all affected drivers (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF
>>> target, SMB Direct, NFS over RDMA, ...)?
>>
>> Only the users of the DIRECT polling method use an on-stack array of
>> ib_wc's. That is only the SRP drivers.
>>
>> The other two modes use a dynamically allocated array of ib_wc's that
>> hangs off the ib_cq. Those shouldn't need any reduction in the size of
>> this array, and they are the common case.
>>
>> IMO a better solution would be to change ib_process_cq_direct to use a
>> smaller on-stack array, and leave IB_POLL_BATCH alone.
>
> The only reason why I added this array on-stack was to allow consumers that
> did not use the ib_alloc_cq API to call it, but that seems like a wrong
> decision when thinking it over again (as probably these users did not set
> the wr_cqe correctly).
>
> How about we make ib_process_cq_direct use the cq wc array and add a
> WARN_ON statement (and fail gracefully) if the caller used this API
> without calling ib_alloc_cq?

Agreed, I prefer that all three modes use dynamically allocated memory for
that array.

--
Chuck Lever
>> The only reason why I added this array on-stack was to allow consumers
>> that did not use the ib_alloc_cq API to call it, but that seems like a
>> wrong decision when thinking it over again (as probably these users did
>> not set the wr_cqe correctly).
>>
>> How about we make ib_process_cq_direct use the cq wc array and add a
>> WARN_ON statement (and fail gracefully) if the caller used this API
>> without calling ib_alloc_cq?
>
> But we tried to avoid concurrent access to cq->wc.

Not sure it's a valid use case. But if there is a compelling reason to keep
it as is, then we can use a smaller on-stack array.
On Thu, Feb 22, 2018 at 05:39:09PM +0200, Sagi Grimberg wrote:
>
>>> The only reason why I added this array on-stack was to allow consumers
>>> that did not use the ib_alloc_cq API to call it, but that seems like a
>>> wrong decision when thinking it over again (as probably these users did
>>> not set the wr_cqe correctly).
>>>
>>> How about we make ib_process_cq_direct use the cq wc array and add a
>>> WARN_ON statement (and fail gracefully) if the caller used this API
>>> without calling ib_alloc_cq?
>>
>> But we tried to avoid concurrent access to cq->wc.
>
> Not sure it's a valid use case. But if there is a compelling reason to
> keep it as is, then we can use a smaller on-stack array.

Did we come to a conclusion on what to do here?

Jason
On 2/28/2018 12:09 AM, Jason Gunthorpe wrote:
> On Thu, Feb 22, 2018 at 05:39:09PM +0200, Sagi Grimberg wrote:
>>
>>>> The only reason why I added this array on-stack was to allow consumers
>>>> that did not use the ib_alloc_cq API to call it, but that seems like a
>>>> wrong decision when thinking it over again (as probably these users did
>>>> not set the wr_cqe correctly).
>>>>
>>>> How about we make ib_process_cq_direct use the cq wc array and add a
>>>> WARN_ON statement (and fail gracefully) if the caller used this API
>>>> without calling ib_alloc_cq?
>>>
>>> But we tried to avoid concurrent access to cq->wc.
>>
>> Not sure it's a valid use case. But if there is a compelling reason to
>> keep it as is, then we can use a smaller on-stack array.
>
> Did we come to a conclusion on what to do here?
>
> Jason

Guys, what do you think about the following solution (untested)?

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8..59d2835 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -17,6 +17,7 @@
 
 /* # of WCs to poll for with a single call to ib_poll_cq */
 #define IB_POLL_BATCH			16
+#define IB_POLL_BATCH_DIRECT		8
 
 /* # of WCs to iterate over before yielding */
 #define IB_POLL_BUDGET_IRQ		256
@@ -25,17 +26,25 @@
 #define IB_POLL_FLAGS \
 	(IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
 
-static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
+static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
+			   int batch)
 {
-	int i, n, completed = 0;
-	struct ib_wc *wcs = poll_wc ? : cq->wc;
-
+	int i, n, ib_poll_batch, completed = 0;
+	struct ib_wc *wcs;
+
+	if (poll_wc) {
+		wcs = poll_wc;
+		ib_poll_batch = batch;
+	} else {
+		wcs = cq->wc;
+		ib_poll_batch = IB_POLL_BATCH;
+	}
 	/*
 	 * budget might be (-1) if the caller does not
 	 * want to bound this call, thus we need unsigned
 	 * minimum here.
 	 */
-	while ((n = ib_poll_cq(cq, min_t(u32, IB_POLL_BATCH,
+	while ((n = ib_poll_cq(cq, min_t(u32, ib_poll_batch,
 			budget - completed), wcs)) > 0) {
 		for (i = 0; i < n; i++) {
 			struct ib_wc *wc = &wcs[i];
@@ -48,7 +57,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
 
 		completed += n;
 
-		if (n != IB_POLL_BATCH ||
+		if (n != ib_poll_batch ||
 		    (budget != -1 && completed >= budget))
 			break;
 	}
@@ -72,9 +81,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
  */
 int ib_process_cq_direct(struct ib_cq *cq, int budget)
 {
-	struct ib_wc wcs[IB_POLL_BATCH];
+	struct ib_wc wcs[IB_POLL_BATCH_DIRECT];
 
-	return __ib_process_cq(cq, budget, wcs);
+	return __ib_process_cq(cq, budget, wcs, IB_POLL_BATCH_DIRECT);
 }
 EXPORT_SYMBOL(ib_process_cq_direct);
 
@@ -88,7 +97,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
 	struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
 	int completed;
 
-	completed = __ib_process_cq(cq, budget, NULL);
+	completed = __ib_process_cq(cq, budget, NULL, 0);
 	if (completed < budget) {
 		irq_poll_complete(&cq->iop);
 		if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
@@ -108,7 +117,7 @@ static void ib_cq_poll_work(struct work_struct *work)
 	struct ib_cq *cq = container_of(work, struct ib_cq, work);
 	int completed;
 
-	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
+	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL, 0);
 	if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
 	    ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
 		queue_work(ib_comp_wq, &cq->work);
On 02/27/18 14:15, Max Gurtovoy wrote:
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
> +			   int batch)
>  {
> -	int i, n, completed = 0;
> -	struct ib_wc *wcs = poll_wc ? : cq->wc;
> +	int i, n, ib_poll_batch, completed = 0;
> +	struct ib_wc *wcs;
> +
> +	if (poll_wc) {
> +		wcs = poll_wc;
> +		ib_poll_batch = batch;
> +	} else {
> +		wcs = cq->wc;
> +		ib_poll_batch = IB_POLL_BATCH;
> +	}

Since this code has to be touched, I think we can use this opportunity to
get rid of the "poll_wc ? : cq->wc" conditional and instead use what the
caller passes. That will require updating all __ib_process_cq(..., ...,
NULL) calls. I also propose letting the caller pass ib_poll_batch instead of
figuring it out in this function. Otherwise the approach of this patch looks
fine to me.

Thanks,

Bart.
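For illustration, a sketch of where Bart's two suggestions lead: the helper
keeps no fallback of its own, and every caller passes both its wc storage
and that storage's size. The loop body is carried over from the code being
patched; treat this as an untested outline, not the final submission:

	/* Sketch: no poll_wc ?: cq->wc conditional left inside the helper;
	 * 'wcs' and 'batch' always come from the caller together. */
	static int __ib_process_cq(struct ib_cq *cq, int budget,
				   struct ib_wc *wcs, int batch)
	{
		int i, n, completed = 0;

		/*
		 * budget might be (-1) if the caller does not
		 * want to bound this call, thus we need unsigned
		 * minimum here.
		 */
		while ((n = ib_poll_cq(cq, min_t(u32, batch,
				budget - completed), wcs)) > 0) {
			for (i = 0; i < n; i++) {
				struct ib_wc *wc = &wcs[i];

				if (wc->wr_cqe)
					wc->wr_cqe->done(cq, wc);
				else
					WARN_ON_ONCE(wc->status == IB_WC_SUCCESS);
			}

			completed += n;

			if (n != batch ||
			    (budget != -1 && completed >= budget))
				break;
		}

		return completed;
	}

The IRQ and workqueue paths would then pass their per-CQ array explicitly,
e.g. __ib_process_cq(cq, budget, cq->wc, IB_POLL_BATCH), while the direct
path passes its smaller on-stack array with IB_POLL_BATCH_DIRECT.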
On 2/28/2018 2:21 AM, Bart Van Assche wrote:
> On 02/27/18 14:15, Max Gurtovoy wrote:
>> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
>> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
>> +			   int batch)
>>  {
>> -	int i, n, completed = 0;
>> -	struct ib_wc *wcs = poll_wc ? : cq->wc;
>> +	int i, n, ib_poll_batch, completed = 0;
>> +	struct ib_wc *wcs;
>> +
>> +	if (poll_wc) {
>> +		wcs = poll_wc;
>> +		ib_poll_batch = batch;
>> +	} else {
>> +		wcs = cq->wc;
>> +		ib_poll_batch = IB_POLL_BATCH;
>> +	}
>
> Since this code has to be touched, I think we can use this opportunity to
> get rid of the "poll_wc ? : cq->wc" conditional and instead use what the
> caller passes. That will require updating all __ib_process_cq(..., ...,
> NULL) calls. I also propose letting the caller pass ib_poll_batch instead
> of figuring it out in this function. Otherwise the approach of this patch
> looks fine to me.

Thanks, Bart.
I'll make these changes and submit.

-Max.
On Wed, 2018-02-28 at 11:50 +0200, Max Gurtovoy wrote:
> On 2/28/2018 2:21 AM, Bart Van Assche wrote:
> > Since this code has to be touched, I think we can use this opportunity
> > to get rid of the "poll_wc ? : cq->wc" conditional and instead use what
> > the caller passes. That will require updating all __ib_process_cq(...,
> > ..., NULL) calls. I also propose letting the caller pass ib_poll_batch
> > instead of figuring it out in this function. Otherwise the approach of
> > this patch looks fine to me.
>
> Thanks, Bart.
> I'll make these changes and submit.

That sounds reasonable to me too; thanks for reworking and resubmitting.

--
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
On 2/28/2018 8:55 PM, Doug Ledford wrote:
> On Wed, 2018-02-28 at 11:50 +0200, Max Gurtovoy wrote:
>> On 2/28/2018 2:21 AM, Bart Van Assche wrote:
>>> Since this code has to be touched, I think we can use this opportunity
>>> to get rid of the "poll_wc ? : cq->wc" conditional and instead use what
>>> the caller passes. That will require updating all __ib_process_cq(...,
>>> ..., NULL) calls. I also propose letting the caller pass ib_poll_batch
>>> instead of figuring it out in this function. Otherwise the approach of
>>> this patch looks fine to me.
>>
>> Thanks, Bart.
>> I'll make these changes and submit.
>
> That sounds reasonable to me too; thanks for reworking and resubmitting.

Sure, NP. We've run NVMe-oF and SRP with the new patch. I'll send it through
the Mellanox maintainers' pull request.

Thanks for reporting and reviewing.

-Max.
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8215d7..2626adbb978e 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -16,7 +16,7 @@
 #include <rdma/ib_verbs.h>
 
 /* # of WCs to poll for with a single call to ib_poll_cq */
-#define IB_POLL_BATCH			16
+#define IB_POLL_BATCH			8
 
 /* # of WCs to iterate over before yielding */
 #define IB_POLL_BUDGET_IRQ		256
The ib_wc structure has grown so much that putting 16 of them on the stack
hits the warning limit for dangerous kernel stack consumption:

drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Using half that number brings us comfortably below that limit again.

Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/infiniband/core/cq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.9.0