diff mbox series

[net-next] net: use in_softirq() to indicate the NAPI context in napi_consume_skb()

Message ID 1600653893-206277-1-git-send-email-linyunsheng@huawei.com
State New
Headers show
Series [net-next] net: use in_softirq() to indicate the NAPI context in napi_consume_skb() | expand

Commit Message

Yunsheng Lin Sept. 21, 2020, 2:04 a.m. UTC
When napi_consume_skb() is called in the tx desc cleaning process,
it is usually in the softirq context(BH disabled, or are processing
softirqs), but it may also be in the task context, such as in the
netpoll or loopback selftest process.

Currently napi_consume_skb() uses non-zero budget to indicate the
NAPI context, the driver writer may provide the wrong budget when
tx desc cleaning function is reused for both NAPI and non-NAPI
context, see [1].

So this patch uses in_softirq() to indicate the NAPI context, which
doesn't necessarily mean in NAPI context, but it shouldn't care if
NAPI context or not as long as it runs in softirq context or with BH
disabled, then _kfree_skb_defer() will push the skb to the particular
cpu' napi_alloc_cache atomically.

[1] https://lkml.org/lkml/2020/9/15/38

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
note that budget parameter is not removed in this patch because it
involves many driver changes, we can remove it in separate patch if
this patch is accepted.
---
 net/core/skbuff.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Eric Dumazet Sept. 21, 2020, 7:19 a.m. UTC | #1
On Mon, Sep 21, 2020 at 4:08 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> When napi_consume_skb() is called in the tx desc cleaning process,
> it is usually in the softirq context(BH disabled, or are processing
> softirqs), but it may also be in the task context, such as in the
> netpoll or loopback selftest process.
>
> Currently napi_consume_skb() uses non-zero budget to indicate the
> NAPI context, the driver writer may provide the wrong budget when
> tx desc cleaning function is reused for both NAPI and non-NAPI
> context, see [1].
>
> So this patch uses in_softirq() to indicate the NAPI context, which
> doesn't necessarily mean in NAPI context, but it shouldn't care if
> NAPI context or not as long as it runs in softirq context or with BH
> disabled, then _kfree_skb_defer() will push the skb to the particular
> cpu' napi_alloc_cache atomically.
>
> [1] https://lkml.org/lkml/2020/9/15/38
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
> note that budget parameter is not removed in this patch because it
> involves many driver changes, we can remove it in separate patch if
> this patch is accepted.
> ---
>  net/core/skbuff.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index e077447..03d0d28 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -895,8 +895,10 @@ void __kfree_skb_defer(struct sk_buff *skb)
>
>  void napi_consume_skb(struct sk_buff *skb, int budget)
>  {
> -       /* Zero budget indicate non-NAPI context called us, like netpoll */
> -       if (unlikely(!budget)) {
> +       /* called by non-softirq context, which usually means non-NAPI
> +        * context, like netpoll.
> +        */
> +       if (unlikely(!in_softirq())) {
>                 dev_consume_skb_any(skb);
>                 return;
>         }
> --


I do not think we should add this kind of fuzzy logic, just because
_one_ driver author made a mistake.

Add a disable_bh() in the driver slow path, and accept the _existing_
semantic, the one that was understood by dozens.
Yunsheng Lin Sept. 21, 2020, 8:09 a.m. UTC | #2
On 2020/9/21 15:19, Eric Dumazet wrote:
> On Mon, Sep 21, 2020 at 4:08 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:

>>

>> When napi_consume_skb() is called in the tx desc cleaning process,

>> it is usually in the softirq context(BH disabled, or are processing

>> softirqs), but it may also be in the task context, such as in the

>> netpoll or loopback selftest process.

>>

>> Currently napi_consume_skb() uses non-zero budget to indicate the

>> NAPI context, the driver writer may provide the wrong budget when

>> tx desc cleaning function is reused for both NAPI and non-NAPI

>> context, see [1].

>>

>> So this patch uses in_softirq() to indicate the NAPI context, which

>> doesn't necessarily mean in NAPI context, but it shouldn't care if

>> NAPI context or not as long as it runs in softirq context or with BH

>> disabled, then _kfree_skb_defer() will push the skb to the particular

>> cpu' napi_alloc_cache atomically.

>>

>> [1] https://lkml.org/lkml/2020/9/15/38

>>

>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>

>> ---

>> note that budget parameter is not removed in this patch because it

>> involves many driver changes, we can remove it in separate patch if

>> this patch is accepted.

>> ---

>>  net/core/skbuff.c | 6 ++++--

>>  1 file changed, 4 insertions(+), 2 deletions(-)

>>

>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c

>> index e077447..03d0d28 100644

>> --- a/net/core/skbuff.c

>> +++ b/net/core/skbuff.c

>> @@ -895,8 +895,10 @@ void __kfree_skb_defer(struct sk_buff *skb)

>>

>>  void napi_consume_skb(struct sk_buff *skb, int budget)

>>  {

>> -       /* Zero budget indicate non-NAPI context called us, like netpoll */

>> -       if (unlikely(!budget)) {

>> +       /* called by non-softirq context, which usually means non-NAPI

>> +        * context, like netpoll.

>> +        */

>> +       if (unlikely(!in_softirq())) {

>>                 dev_consume_skb_any(skb);

>>                 return;

>>         }

>> --

> 

> 

> I do not think we should add this kind of fuzzy logic, just because

> _one_ driver author made a mistake.

> 

> Add a disable_bh() in the driver slow path, and accept the _existing_

> semantic, the one that was understood by dozens.


As my understanding, this patch did not change _existing_ semantic,
it still only call _kfree_skb_defer() in softirq context. This patch
just remove the requirement that a softirq context hint need to be
provided to decide whether calling _kfree_skb_defer().

Yes, we can add DEBUG_NET() clauses to catch this kind of error as
you suggested.

But why we need such a debug clauses, when we can decide if delaying
skb freeing is possible in napi_consume_skb(), why not just use
in_softirq() to make this API more easy to use? Just as __dev_kfree_skb_any()
API use "in_irq() || irqs_disabled()" checking to handle the irq context
and non-irq context.

> .

>
Eric Dumazet Sept. 21, 2020, 8:17 a.m. UTC | #3
On Mon, Sep 21, 2020 at 10:10 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2020/9/21 15:19, Eric Dumazet wrote:
> > On Mon, Sep 21, 2020 at 4:08 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >>
> >> When napi_consume_skb() is called in the tx desc cleaning process,
> >> it is usually in the softirq context(BH disabled, or are processing
> >> softirqs), but it may also be in the task context, such as in the
> >> netpoll or loopback selftest process.
> >>
> >> Currently napi_consume_skb() uses non-zero budget to indicate the
> >> NAPI context, the driver writer may provide the wrong budget when
> >> tx desc cleaning function is reused for both NAPI and non-NAPI
> >> context, see [1].
> >>
> >> So this patch uses in_softirq() to indicate the NAPI context, which
> >> doesn't necessarily mean in NAPI context, but it shouldn't care if
> >> NAPI context or not as long as it runs in softirq context or with BH
> >> disabled, then _kfree_skb_defer() will push the skb to the particular
> >> cpu' napi_alloc_cache atomically.
> >>
> >> [1] https://lkml.org/lkml/2020/9/15/38
> >>
> >> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> >> ---
> >> note that budget parameter is not removed in this patch because it
> >> involves many driver changes, we can remove it in separate patch if
> >> this patch is accepted.
> >> ---
> >>  net/core/skbuff.c | 6 ++++--
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >> index e077447..03d0d28 100644
> >> --- a/net/core/skbuff.c
> >> +++ b/net/core/skbuff.c
> >> @@ -895,8 +895,10 @@ void __kfree_skb_defer(struct sk_buff *skb)
> >>
> >>  void napi_consume_skb(struct sk_buff *skb, int budget)
> >>  {
> >> -       /* Zero budget indicate non-NAPI context called us, like netpoll */
> >> -       if (unlikely(!budget)) {
> >> +       /* called by non-softirq context, which usually means non-NAPI
> >> +        * context, like netpoll.
> >> +        */
> >> +       if (unlikely(!in_softirq())) {
> >>                 dev_consume_skb_any(skb);
> >>                 return;
> >>         }
> >> --
> >
> >
> > I do not think we should add this kind of fuzzy logic, just because
> > _one_ driver author made a mistake.
> >
> > Add a disable_bh() in the driver slow path, and accept the _existing_
> > semantic, the one that was understood by dozens.
>
> As my understanding, this patch did not change _existing_ semantic,
> it still only call _kfree_skb_defer() in softirq context. This patch
> just remove the requirement that a softirq context hint need to be
> provided to decide whether calling _kfree_skb_defer().

I do not want to remove the requirement.

>
> Yes, we can add DEBUG_NET() clauses to catch this kind of error as
> you suggested.
>
> But why we need such a debug clauses, when we can decide if delaying
> skb freeing is possible in napi_consume_skb(), why not just use
> in_softirq() to make this API more easy to use? Just as __dev_kfree_skb_any()
> API use "in_irq() || irqs_disabled()" checking to handle the irq context
> and non-irq context.


I just do not like your patch.

Copying another piece of fuzzy logic, inherited from legacy code is
not an excuse.

Add a local_bh_disable() in the driver slow path to meet _existing_
requirement, so that we can keep the hot path fast.
Yunsheng Lin Sept. 21, 2020, 8:40 a.m. UTC | #4
On 2020/9/21 16:17, Eric Dumazet wrote:
> On Mon, Sep 21, 2020 at 10:10 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2020/9/21 15:19, Eric Dumazet wrote:
>>> On Mon, Sep 21, 2020 at 4:08 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>>>
>>>> When napi_consume_skb() is called in the tx desc cleaning process,
>>>> it is usually in the softirq context(BH disabled, or are processing
>>>> softirqs), but it may also be in the task context, such as in the
>>>> netpoll or loopback selftest process.
>>>>
>>>> Currently napi_consume_skb() uses non-zero budget to indicate the
>>>> NAPI context, the driver writer may provide the wrong budget when
>>>> tx desc cleaning function is reused for both NAPI and non-NAPI
>>>> context, see [1].
>>>>
>>>> So this patch uses in_softirq() to indicate the NAPI context, which
>>>> doesn't necessarily mean in NAPI context, but it shouldn't care if
>>>> NAPI context or not as long as it runs in softirq context or with BH
>>>> disabled, then _kfree_skb_defer() will push the skb to the particular
>>>> cpu' napi_alloc_cache atomically.
>>>>
>>>> [1] https://lkml.org/lkml/2020/9/15/38
>>>>
>>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>>>> ---
>>>> note that budget parameter is not removed in this patch because it
>>>> involves many driver changes, we can remove it in separate patch if
>>>> this patch is accepted.
>>>> ---
>>>>  net/core/skbuff.c | 6 ++++--
>>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index e077447..03d0d28 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -895,8 +895,10 @@ void __kfree_skb_defer(struct sk_buff *skb)
>>>>
>>>>  void napi_consume_skb(struct sk_buff *skb, int budget)
>>>>  {
>>>> -       /* Zero budget indicate non-NAPI context called us, like netpoll */
>>>> -       if (unlikely(!budget)) {
>>>> +       /* called by non-softirq context, which usually means non-NAPI
>>>> +        * context, like netpoll.
>>>> +        */
>>>> +       if (unlikely(!in_softirq())) {
>>>>                 dev_consume_skb_any(skb);
>>>>                 return;
>>>>         }
>>>> --
>>>
>>>
>>> I do not think we should add this kind of fuzzy logic, just because
>>> _one_ driver author made a mistake.
>>>
>>> Add a disable_bh() in the driver slow path, and accept the _existing_
>>> semantic, the one that was understood by dozens.
>>
>> As my understanding, this patch did not change _existing_ semantic,
>> it still only call _kfree_skb_defer() in softirq context. This patch
>> just remove the requirement that a softirq context hint need to be
>> provided to decide whether calling _kfree_skb_defer().
> 
> I do not want to remove the requirement.
> 
>>
>> Yes, we can add DEBUG_NET() clauses to catch this kind of error as
>> you suggested.
>>
>> But why we need such a debug clauses, when we can decide if delaying
>> skb freeing is possible in napi_consume_skb(), why not just use
>> in_softirq() to make this API more easy to use? Just as __dev_kfree_skb_any()
>> API use "in_irq() || irqs_disabled()" checking to handle the irq context
>> and non-irq context.
> 
> 
> I just do not like your patch.
> 
> Copying another piece of fuzzy logic, inherited from legacy code is
> not an excuse.
> 
> Add a local_bh_disable() in the driver slow path to meet _existing_
> requirement, so that we can keep the hot path fast.

"!in_softirq()" checking make the napi_consume_skb() slower than
"!budget" checking? do I miss something?

As a matter of fact, the hns3 driver has fixed this problem by
passing zero-budget to napi_consume_skb() in non-NAPI context, this
patch is more about how to avoid or catch this kind of error.

So your opinion is still to catch this kind of error using something
like DEBUG_NET() clauses?

> .
>
Eric Dumazet Sept. 21, 2020, 9:09 a.m. UTC | #5
On 9/21/20 10:40 AM, Yunsheng Lin wrote:
> On 2020/9/21 16:17, Eric Dumazet wrote:

>> On Mon, Sep 21, 2020 at 10:10 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:

>>>

>>> On 2020/9/21 15:19, Eric Dumazet wrote:

>>>> On Mon, Sep 21, 2020 at 4:08 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:

>>>>>

>>>>> When napi_consume_skb() is called in the tx desc cleaning process,

>>>>> it is usually in the softirq context(BH disabled, or are processing

>>>>> softirqs), but it may also be in the task context, such as in the

>>>>> netpoll or loopback selftest process.

>>>>>

>>>>> Currently napi_consume_skb() uses non-zero budget to indicate the

>>>>> NAPI context, the driver writer may provide the wrong budget when

>>>>> tx desc cleaning function is reused for both NAPI and non-NAPI

>>>>> context, see [1].

>>>>>

>>>>> So this patch uses in_softirq() to indicate the NAPI context, which

>>>>> doesn't necessarily mean in NAPI context, but it shouldn't care if

>>>>> NAPI context or not as long as it runs in softirq context or with BH

>>>>> disabled, then _kfree_skb_defer() will push the skb to the particular

>>>>> cpu' napi_alloc_cache atomically.

>>>>>

>>>>> [1] https://lkml.org/lkml/2020/9/15/38

>>>>>

>>>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>

>>>>> ---

>>>>> note that budget parameter is not removed in this patch because it

>>>>> involves many driver changes, we can remove it in separate patch if

>>>>> this patch is accepted.

>>>>> ---

>>>>>  net/core/skbuff.c | 6 ++++--

>>>>>  1 file changed, 4 insertions(+), 2 deletions(-)

>>>>>

>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c

>>>>> index e077447..03d0d28 100644

>>>>> --- a/net/core/skbuff.c

>>>>> +++ b/net/core/skbuff.c

>>>>> @@ -895,8 +895,10 @@ void __kfree_skb_defer(struct sk_buff *skb)

>>>>>

>>>>>  void napi_consume_skb(struct sk_buff *skb, int budget)

>>>>>  {

>>>>> -       /* Zero budget indicate non-NAPI context called us, like netpoll */

>>>>> -       if (unlikely(!budget)) {

>>>>> +       /* called by non-softirq context, which usually means non-NAPI

>>>>> +        * context, like netpoll.

>>>>> +        */

>>>>> +       if (unlikely(!in_softirq())) {

>>>>>                 dev_consume_skb_any(skb);

>>>>>                 return;

>>>>>         }

>>>>> --

>>>>

>>>>

>>>> I do not think we should add this kind of fuzzy logic, just because

>>>> _one_ driver author made a mistake.

>>>>

>>>> Add a disable_bh() in the driver slow path, and accept the _existing_

>>>> semantic, the one that was understood by dozens.

>>>

>>> As my understanding, this patch did not change _existing_ semantic,

>>> it still only call _kfree_skb_defer() in softirq context. This patch

>>> just remove the requirement that a softirq context hint need to be

>>> provided to decide whether calling _kfree_skb_defer().

>>

>> I do not want to remove the requirement.

>>

>>>

>>> Yes, we can add DEBUG_NET() clauses to catch this kind of error as

>>> you suggested.

>>>

>>> But why we need such a debug clauses, when we can decide if delaying

>>> skb freeing is possible in napi_consume_skb(), why not just use

>>> in_softirq() to make this API more easy to use? Just as __dev_kfree_skb_any()

>>> API use "in_irq() || irqs_disabled()" checking to handle the irq context

>>> and non-irq context.

>>

>>

>> I just do not like your patch.

>>

>> Copying another piece of fuzzy logic, inherited from legacy code is

>> not an excuse.

>>

>> Add a local_bh_disable() in the driver slow path to meet _existing_

>> requirement, so that we can keep the hot path fast.

> 

> "!in_softirq()" checking make the napi_consume_skb() slower than

> "!budget" checking? do I miss something?


Yes, you missed that we can _remove_ this condition completely, if drivers
make sure to always have BH disabled.

> 

> As a matter of fact, the hns3 driver has fixed this problem by

> passing zero-budget to napi_consume_skb() in non-NAPI context, this

> patch is more about how to avoid or catch this kind of error.


I think I understood this. Having one error in hns3 does not mean we are going
to slow down the stack.

> 

> So your opinion is still to catch this kind of error using something

> like DEBUG_NET() clauses?

> 


Yes.

Again, we want to keep fast path fast, not something that is an swiss army knife.
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e077447..03d0d28 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -895,8 +895,10 @@  void __kfree_skb_defer(struct sk_buff *skb)
 
 void napi_consume_skb(struct sk_buff *skb, int budget)
 {
-	/* Zero budget indicate non-NAPI context called us, like netpoll */
-	if (unlikely(!budget)) {
+	/* called by non-softirq context, which usually means non-NAPI
+	 * context, like netpoll.
+	 */
+	if (unlikely(!in_softirq())) {
 		dev_consume_skb_any(skb);
 		return;
 	}