Message ID | 20180220205924.2035765-1-arnd@arndb.de |
---|---|
State | New |
Series | RDMA/core: reduce IB_POLL_BATCH constant |
Hi Arnd Bergmann,

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Arnd Bergmann
> Sent: Tuesday, February 20, 2018 2:59 PM
> To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Arnd Bergmann <arnd@arndb.de>; Leon Romanovsky <leonro@mellanox.com>; Sagi Grimberg <sagi@grimberg.me>; Bart Van Assche <bart.vanassche@sandisk.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
>
> The ib_wc structure has grown so much that putting 16 of them on the stack
> hits the warning limit for dangerous kernel stack consumption:
>
> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is
> larger than 1024 bytes [-Werror=frame-larger-than=]
>
> Using half that number brings us comfortably below that limit again.
>
> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
> RDMA resources")

It is not clear to me how the above commit 02d8883f520e introduced this
stack issue. Bodong and I came across the ib_wc size increase in [1] and it
was fixed in [2]. Did you hit this error before or after applying patch [2]?

[1] https://www.spinics.net/lists/linux-rdma/msg50754.html
[2] https://patchwork.kernel.org/patch/10159623/
On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>  /* # of WCs to poll for with a single call to ib_poll_cq */
> -#define IB_POLL_BATCH			16
> +#define IB_POLL_BATCH			8

The purpose of batch polling is to minimize contention on the cq spinlock.
Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
the performance impact of this change been verified for all affected drivers
(ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
over RDMA, ...)?

Bart.
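A quick userspace model of the contention argument Bart makes, assuming one
CQ-lock round-trip per ib_poll_cq() call; the budget value and the program
itself are illustrative sketches, not code from the thread:

	/* Model: each ib_poll_cq() call takes the CQ lock once and returns
	 * up to 'batch' completions, so halving the batch size roughly
	 * doubles the lock traffic needed to drain the same budget. */
	#include <stdio.h>

	#define BUDGET 256	/* illustrative completion budget */

	static int lock_round_trips(int batch)
	{
		/* one ib_poll_cq() call -- one lock round-trip -- per batch */
		return (BUDGET + batch - 1) / batch;
	}

	int main(void)
	{
		printf("batch 16: %d lock round-trips per %d completions\n",
		       lock_round_trips(16), BUDGET);
		printf("batch  8: %d lock round-trips per %d completions\n",
		       lock_round_trips(8), BUDGET);
		return 0;
	}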
> On Feb 20, 2018, at 4:14 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>
> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>  /* # of WCs to poll for with a single call to ib_poll_cq */
>> -#define IB_POLL_BATCH			16
>> +#define IB_POLL_BATCH			8
>
> The purpose of batch polling is to minimize contention on the cq spinlock.
> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
> the performance impact of this change been verified for all affected drivers
> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
> over RDMA, ...)?

Only the users of the DIRECT polling method use an on-stack array of
ib_wc's. That is only the SRP drivers.

The other two modes use a dynamically allocated array of ib_wc's that hangs
off the ib_cq. Those shouldn't need any reduction in the size of this array,
and they are the common case.

IMO a better solution would be to change ib_process_cq_direct to use a
smaller on-stack array, and leave IB_POLL_BATCH alone.

--
Chuck Lever
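For context, the dynamically allocated array Chuck refers to is set up at CQ
creation time. A hedged sketch, approximating the 4.16-era ib_alloc_cq() in
drivers/infiniband/core/cq.c (the unwind label name is an assumption):

	/* Per-CQ completion array consumed by the IRQ and workqueue polling
	 * contexts; only IB_POLL_DIRECT ever placed ib_wc's on the stack. */
	cq->wc = kmalloc_array(IB_POLL_BATCH, sizeof(*cq->wc), GFP_KERNEL);
	if (!cq->wc)
		goto out_destroy_cq;	/* label name is an assumption */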
On Tue, Feb 20, 2018 at 10:14 PM, Parav Pandit <parav@mellanox.com> wrote:
> Hi Arnd Bergmann,
>
>> -----Original Message-----
>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Arnd Bergmann
>> Sent: Tuesday, February 20, 2018 2:59 PM
>> To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Arnd Bergmann <arnd@arndb.de>; Leon Romanovsky <leonro@mellanox.com>; Sagi Grimberg <sagi@grimberg.me>; Bart Van Assche <bart.vanassche@sandisk.com>; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
>>
>> The ib_wc structure has grown so much that putting 16 of them on the stack
>> hits the warning limit for dangerous kernel stack consumption:
>>
>> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
>> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is
>> larger than 1024 bytes [-Werror=frame-larger-than=]
>>
>> Using half that number brings us comfortably below that limit again.
>>
>> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
>> RDMA resources")
>
> It is not clear to me how the above commit 02d8883f520e introduced this
> stack issue.

My mistake, I misread the git history. I did a proper bisection now and ended
up with the commit that added the IB_POLL_BATCH sized array on the stack,
i.e. commit 246d8b184c10 ("IB/cq: Don't force IB_POLL_DIRECT poll context
for ib_process_cq_direct").

> Bodong and I came across the ib_wc size increase in [1] and it was fixed
> in [2]. Did you hit this error before or after applying patch [2]?
>
> [1] https://www.spinics.net/lists/linux-rdma/msg50754.html
> [2] https://patchwork.kernel.org/patch/10159623/

I did the analysis a few weeks ago when I first hit the problem but didn't
send it out at the time. Today I saw that the problem still persists on
mainline (4.16-rc2), which does contain the patch from [2]. What I see is
that 'ib_wc' is now exactly 59 bytes on 32-bit ARM, plus 5 bytes of padding,
so 16 of them get us exactly to the warning limit, and then there are a few
more bytes for the function itself.

       Arnd
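Arnd's arithmetic can be checked with a trivial standalone program; the
59-byte and 5-byte figures below are taken from his message and assumed
accurate for 32-bit ARM:

	#include <stdio.h>

	#define IB_WC_PADDED_SIZE	64	/* 59 payload + 5 padding bytes, per Arnd */
	#define IB_POLL_BATCH		16

	int main(void)
	{
		/* 16 * 64 = 1024: the on-stack wcs[] alone reaches the
		 * -Wframe-larger-than=1024 limit; the function's few extra
		 * bytes of locals push the frame to the reported 1032. */
		printf("wcs[] footprint: %d bytes\n",
		       IB_POLL_BATCH * IB_WC_PADDED_SIZE);
		return 0;
	}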
On 2/20/2018 11:47 PM, Chuck Lever wrote:
>
>> On Feb 20, 2018, at 4:14 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>>
>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>  /* # of WCs to poll for with a single call to ib_poll_cq */
>>> -#define IB_POLL_BATCH			16
>>> +#define IB_POLL_BATCH			8
>>
>> The purpose of batch polling is to minimize contention on the cq spinlock.
>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>> the performance impact of this change been verified for all affected
>> drivers (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB
>> Direct, NFS over RDMA, ...)?
>
> Only the users of the DIRECT polling method use an on-stack array of
> ib_wc's. That is only the SRP drivers.
>
> The other two modes use a dynamically allocated array of ib_wc's that hangs
> off the ib_cq. Those shouldn't need any reduction in the size of this
> array, and they are the common case.
>
> IMO a better solution would be to change ib_process_cq_direct to use a
> smaller on-stack array, and leave IB_POLL_BATCH alone.

Yup, good idea. You can define IB_DIRECT_POLL_BATCH to be 8 and use it in
ib_process_cq_direct. *But* please make sure to use the right value in
ib_poll_cq, since the wcs array must be able to hold the requested amount
of wcs.

-Max.
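Max's warning is about a real overflow hazard. A hedged sketch of the
mistake to avoid (IB_DIRECT_POLL_BATCH is his proposed name from above; cq,
budget, n and wcs are assumed to be in scope):

	struct ib_wc wcs[IB_DIRECT_POLL_BATCH];	/* 8 slots */

	/* WRONG: still asks the provider for up to IB_POLL_BATCH (16)
	 * completions, which can overrun the 8-slot stack array. */
	n = ib_poll_cq(cq, min_t(u32, IB_POLL_BATCH, budget), wcs);

	/* RIGHT: never request more entries than the array holds. */
	n = ib_poll_cq(cq, min_t(u32, IB_DIRECT_POLL_BATCH, budget), wcs);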
>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>  /* # of WCs to poll for with a single call to ib_poll_cq */
>>> -#define IB_POLL_BATCH			16
>>> +#define IB_POLL_BATCH			8
>>
>> The purpose of batch polling is to minimize contention on the cq spinlock.
>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>> the performance impact of this change been verified for all affected
>> drivers (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB
>> Direct, NFS over RDMA, ...)?
>
> Only the users of the DIRECT polling method use an on-stack array of
> ib_wc's. That is only the SRP drivers.
>
> The other two modes use a dynamically allocated array of ib_wc's that hangs
> off the ib_cq. Those shouldn't need any reduction in the size of this
> array, and they are the common case.
>
> IMO a better solution would be to change ib_process_cq_direct to use a
> smaller on-stack array, and leave IB_POLL_BATCH alone.

The only reason why I added this array on-stack was to allow consumers that
did not use the ib_alloc_cq API to call it, but that seems like a wrong
decision when thinking it over again (as probably these users did not set
the wr_cqe correctly).

How about we make ib_process_cq_direct use the cq wc array and add a WARN_ON
statement (and fail gracefully) if the caller used this API without calling
ib_alloc_cq?

--
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8215d7..cd3e9e124834 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -25,10 +25,10 @@
 #define IB_POLL_FLAGS \
 	(IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
 
-static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
+static int __ib_process_cq(struct ib_cq *cq, int budget)
 {
 	int i, n, completed = 0;
-	struct ib_wc *wcs = poll_wc ? : cq->wc;
+	struct ib_wc *wcs = cq->wc;
 
 	/*
 	 * budget might be (-1) if the caller does not
@@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
  */
 int ib_process_cq_direct(struct ib_cq *cq, int budget)
 {
-	struct ib_wc wcs[IB_POLL_BATCH];
-
-	return __ib_process_cq(cq, budget, wcs);
+	if (unlikely(WARN_ON_ONCE(!cq->wc)))
+		return 0;
+	return __ib_process_cq(cq, budget);
 }
 EXPORT_SYMBOL(ib_process_cq_direct);
 
@@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
 	struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
 	int completed;
 
-	completed = __ib_process_cq(cq, budget, NULL);
+	completed = __ib_process_cq(cq, budget);
 	if (completed < budget) {
 		irq_poll_complete(&cq->iop);
 		if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
@@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
 	struct ib_cq *cq = container_of(work, struct ib_cq, work);
 	int completed;
 
-	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
+	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
 	if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
 	    ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
 		queue_work(ib_comp_wq, &cq->work);
--
On 2/21/2018 3:44 PM, Sagi Grimberg wrote:
>
>>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>>  /* # of WCs to poll for with a single call to ib_poll_cq */
>>>> -#define IB_POLL_BATCH			16
>>>> +#define IB_POLL_BATCH			8
>>>
>>> The purpose of batch polling is to minimize contention on the cq
>>> spinlock. Reducing the IB_POLL_BATCH constant may affect performance
>>> negatively. Has the performance impact of this change been verified for
>>> all affected drivers (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF
>>> target, SMB Direct, NFS over RDMA, ...)?
>>
>> Only the users of the DIRECT polling method use an on-stack array of
>> ib_wc's. That is only the SRP drivers.
>>
>> The other two modes use a dynamically allocated array of ib_wc's that
>> hangs off the ib_cq. Those shouldn't need any reduction in the size of
>> this array, and they are the common case.
>>
>> IMO a better solution would be to change ib_process_cq_direct to use a
>> smaller on-stack array, and leave IB_POLL_BATCH alone.
>
> The only reason why I added this array on-stack was to allow consumers that
> did not use the ib_alloc_cq API to call it, but that seems like a wrong
> decision when thinking it over again (as probably these users did not set
> the wr_cqe correctly).
>
> How about we make ib_process_cq_direct use the cq wc array and add a
> WARN_ON statement (and fail gracefully) if the caller used this API
> without calling ib_alloc_cq?

But we tried to avoid concurrent access to cq->wc. Why can't we use the
solution I wrote above?

-Max.
> On Feb 21, 2018, at 8:44 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
>
>>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>>  /* # of WCs to poll for with a single call to ib_poll_cq */
>>>> -#define IB_POLL_BATCH			16
>>>> +#define IB_POLL_BATCH			8
>>>
>>> The purpose of batch polling is to minimize contention on the cq
>>> spinlock. Reducing the IB_POLL_BATCH constant may affect performance
>>> negatively. Has the performance impact of this change been verified for
>>> all affected drivers (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF
>>> target, SMB Direct, NFS over RDMA, ...)?
>>
>> Only the users of the DIRECT polling method use an on-stack array of
>> ib_wc's. That is only the SRP drivers.
>>
>> The other two modes use a dynamically allocated array of ib_wc's that
>> hangs off the ib_cq. Those shouldn't need any reduction in the size of
>> this array, and they are the common case.
>>
>> IMO a better solution would be to change ib_process_cq_direct to use a
>> smaller on-stack array, and leave IB_POLL_BATCH alone.
>
> The only reason why I added this array on-stack was to allow consumers that
> did not use the ib_alloc_cq API to call it, but that seems like a wrong
> decision when thinking it over again (as probably these users did not set
> the wr_cqe correctly).
>
> How about we make ib_process_cq_direct use the cq wc array and add a
> WARN_ON statement (and fail gracefully) if the caller used this API
> without calling ib_alloc_cq?

Agreed, I prefer that all three modes use dynamically allocated memory for
that array.

--
Chuck Lever
>> The only reason why I added this array on-stack was to allow consumers
>> that did not use the ib_alloc_cq API to call it, but that seems like a
>> wrong decision when thinking it over again (as probably these users did
>> not set the wr_cqe correctly).
>>
>> How about we make ib_process_cq_direct use the cq wc array and add a
>> WARN_ON statement (and fail gracefully) if the caller used this API
>> without calling ib_alloc_cq?
>
> But we tried to avoid concurrent access to cq->wc.

Not sure it's a valid use case. But if there is a compelling reason to keep
it as is, then we can use a smaller on-stack array.
On Thu, Feb 22, 2018 at 05:39:09PM +0200, Sagi Grimberg wrote:
>
>>> The only reason why I added this array on-stack was to allow consumers
>>> that did not use the ib_alloc_cq API to call it, but that seems like a
>>> wrong decision when thinking it over again (as probably these users did
>>> not set the wr_cqe correctly).
>>>
>>> How about we make ib_process_cq_direct use the cq wc array and add a
>>> WARN_ON statement (and fail gracefully) if the caller used this API
>>> without calling ib_alloc_cq?
>>
>> But we tried to avoid concurrent access to cq->wc.
>
> Not sure it's a valid use case. But if there is a compelling reason to
> keep it as is, then we can use a smaller on-stack array.

Did we come to a conclusion on what to do here?

Jason
On 2/28/2018 12:09 AM, Jason Gunthorpe wrote:
> On Thu, Feb 22, 2018 at 05:39:09PM +0200, Sagi Grimberg wrote:
>>
>>>> The only reason why I added this array on-stack was to allow consumers
>>>> that did not use the ib_alloc_cq API to call it, but that seems like a
>>>> wrong decision when thinking it over again (as probably these users did
>>>> not set the wr_cqe correctly).
>>>>
>>>> How about we make ib_process_cq_direct use the cq wc array and add a
>>>> WARN_ON statement (and fail gracefully) if the caller used this API
>>>> without calling ib_alloc_cq?
>>>
>>> But we tried to avoid concurrent access to cq->wc.
>>
>> Not sure it's a valid use case. But if there is a compelling reason to
>> keep it as is, then we can use a smaller on-stack array.
>
> Did we come to a conclusion on what to do here?
>
> Jason

Guys, what do you think about the following solution (untested)?

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8..59d2835 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -17,6 +17,7 @@
 
 /* # of WCs to poll for with a single call to ib_poll_cq */
 #define IB_POLL_BATCH			16
+#define IB_POLL_BATCH_DIRECT		8
 
 /* # of WCs to iterate over before yielding */
 #define IB_POLL_BUDGET_IRQ		256
@@ -25,17 +26,25 @@
 #define IB_POLL_FLAGS \
 	(IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
 
-static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
+static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
+			   int batch)
 {
-	int i, n, completed = 0;
-	struct ib_wc *wcs = poll_wc ? : cq->wc;
-
+	int i, n, ib_poll_batch, completed = 0;
+	struct ib_wc *wcs;
+
+	if (poll_wc) {
+		wcs = poll_wc;
+		ib_poll_batch = batch;
+	} else {
+		wcs = cq->wc;
+		ib_poll_batch = IB_POLL_BATCH;
+	}
 	/*
 	 * budget might be (-1) if the caller does not
 	 * want to bound this call, thus we need unsigned
 	 * minimum here.
 	 */
-	while ((n = ib_poll_cq(cq, min_t(u32, IB_POLL_BATCH,
+	while ((n = ib_poll_cq(cq, min_t(u32, ib_poll_batch,
 			budget - completed), wcs)) > 0) {
 		for (i = 0; i < n; i++) {
 			struct ib_wc *wc = &wcs[i];
@@ -48,7 +57,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
 
 		completed += n;
 
-		if (n != IB_POLL_BATCH ||
+		if (n != ib_poll_batch ||
 		    (budget != -1 && completed >= budget))
 			break;
 	}
@@ -72,9 +81,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
  */
 int ib_process_cq_direct(struct ib_cq *cq, int budget)
 {
-	struct ib_wc wcs[IB_POLL_BATCH];
+	struct ib_wc wcs[IB_POLL_BATCH_DIRECT];
 
-	return __ib_process_cq(cq, budget, wcs);
+	return __ib_process_cq(cq, budget, wcs, IB_POLL_BATCH_DIRECT);
 }
 EXPORT_SYMBOL(ib_process_cq_direct);
 
@@ -88,7 +97,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
 	struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
 	int completed;
 
-	completed = __ib_process_cq(cq, budget, NULL);
+	completed = __ib_process_cq(cq, budget, NULL, 0);
 	if (completed < budget) {
 		irq_poll_complete(&cq->iop);
 		if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
@@ -108,7 +117,7 @@ static void ib_cq_poll_work(struct work_struct *work)
 	struct ib_cq *cq = container_of(work, struct ib_cq, work);
 	int completed;
 
-	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
+	completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL, 0);
 	if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
 	    ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
 		queue_work(ib_comp_wq, &cq->work);
On 02/27/18 14:15, Max Gurtovoy wrote:
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
> +			   int batch)
>  {
> -	int i, n, completed = 0;
> -	struct ib_wc *wcs = poll_wc ? : cq->wc;
> +	int i, n, ib_poll_batch, completed = 0;
> +	struct ib_wc *wcs;
> +
> +	if (poll_wc) {
> +		wcs = poll_wc;
> +		ib_poll_batch = batch;
> +	} else {
> +		wcs = cq->wc;
> +		ib_poll_batch = IB_POLL_BATCH;
> +	}

Since this code has to be touched, I think we can use this opportunity to
get rid of the "poll_wc ? : cq->wc" conditional and instead use what the
caller passes. That will require updating all __ib_process_cq(..., ...,
NULL) calls. I also propose letting the caller pass ib_poll_batch instead of
figuring it out in this function. Otherwise the approach of this patch looks
fine to me.

Thanks,

Bart.
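For illustration, a sketch of where Bart's two suggestions lead: the helper
keeps no fallback of its own, and every caller passes both its wc storage
and that storage's size. The loop body is carried over from the code being
patched; treat this as an untested outline, not the final submission:

	/* Sketch: no poll_wc ?: cq->wc conditional left inside the helper;
	 * 'wcs' and 'batch' always come from the caller together. */
	static int __ib_process_cq(struct ib_cq *cq, int budget,
				   struct ib_wc *wcs, int batch)
	{
		int i, n, completed = 0;

		/*
		 * budget might be (-1) if the caller does not
		 * want to bound this call, thus we need unsigned
		 * minimum here.
		 */
		while ((n = ib_poll_cq(cq, min_t(u32, batch,
				budget - completed), wcs)) > 0) {
			for (i = 0; i < n; i++) {
				struct ib_wc *wc = &wcs[i];

				if (wc->wr_cqe)
					wc->wr_cqe->done(cq, wc);
				else
					WARN_ON_ONCE(wc->status == IB_WC_SUCCESS);
			}

			completed += n;

			if (n != batch ||
			    (budget != -1 && completed >= budget))
				break;
		}

		return completed;
	}

The IRQ and workqueue paths would then pass their per-CQ array explicitly,
e.g. __ib_process_cq(cq, budget, cq->wc, IB_POLL_BATCH), while the direct
path passes its smaller on-stack array with IB_POLL_BATCH_DIRECT.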
On 2/28/2018 2:21 AM, Bart Van Assche wrote:
> On 02/27/18 14:15, Max Gurtovoy wrote:
>> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
>> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
>> +			   int batch)
>>  {
>> -	int i, n, completed = 0;
>> -	struct ib_wc *wcs = poll_wc ? : cq->wc;
>> +	int i, n, ib_poll_batch, completed = 0;
>> +	struct ib_wc *wcs;
>> +
>> +	if (poll_wc) {
>> +		wcs = poll_wc;
>> +		ib_poll_batch = batch;
>> +	} else {
>> +		wcs = cq->wc;
>> +		ib_poll_batch = IB_POLL_BATCH;
>> +	}
>
> Since this code has to be touched, I think we can use this opportunity to
> get rid of the "poll_wc ? : cq->wc" conditional and instead use what the
> caller passes. That will require updating all __ib_process_cq(..., ...,
> NULL) calls. I also propose letting the caller pass ib_poll_batch instead
> of figuring it out in this function. Otherwise the approach of this patch
> looks fine to me.

Thanks, Bart.
I'll make these changes and submit.

-Max.
On Wed, 2018-02-28 at 11:50 +0200, Max Gurtovoy wrote:
> On 2/28/2018 2:21 AM, Bart Van Assche wrote:
> > Since this code has to be touched, I think we can use this opportunity
> > to get rid of the "poll_wc ? : cq->wc" conditional and instead use what
> > the caller passes. That will require updating all __ib_process_cq(...,
> > ..., NULL) calls. I also propose letting the caller pass ib_poll_batch
> > instead of figuring it out in this function. Otherwise the approach of
> > this patch looks fine to me.
>
> Thanks, Bart.
> I'll make these changes and submit.

That sounds reasonable to me too; thanks for reworking and resubmitting.

--
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
On 2/28/2018 8:55 PM, Doug Ledford wrote:
> On Wed, 2018-02-28 at 11:50 +0200, Max Gurtovoy wrote:
>> On 2/28/2018 2:21 AM, Bart Van Assche wrote:
>>> Since this code has to be touched, I think we can use this opportunity
>>> to get rid of the "poll_wc ? : cq->wc" conditional and instead use what
>>> the caller passes. That will require updating all __ib_process_cq(...,
>>> ..., NULL) calls. I also propose letting the caller pass ib_poll_batch
>>> instead of figuring it out in this function. Otherwise the approach of
>>> this patch looks fine to me.
>>
>> Thanks, Bart.
>> I'll make these changes and submit.
>
> That sounds reasonable to me too; thanks for reworking and resubmitting.

Sure, NP. We've run NVMe-oF and SRP with the new patch. I'll send it through
the Mellanox maintainers' pull request.

Thanks for reporting and reviewing.

-Max.
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8215d7..2626adbb978e 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -16,7 +16,7 @@
 #include <rdma/ib_verbs.h>
 
 /* # of WCs to poll for with a single call to ib_poll_cq */
-#define IB_POLL_BATCH			16
+#define IB_POLL_BATCH			8
 
 /* # of WCs to iterate over before yielding */
 #define IB_POLL_BUDGET_IRQ		256
The ib_wc structure has grown so much that putting 16 of them on the stack
hits the warning limit for dangerous kernel stack consumption:

drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Using half that number brings us comfortably below that limit again.

Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/infiniband/core/cq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.9.0