diff mbox series

[net-next,v1,1/2] net: core: count drops from GRO

Message ID 20210106215539.2103688-2-jesse.brandeburg@intel.com
State New
Headers show
Series [net-next,v1,1/2] net: core: count drops from GRO | expand

Commit Message

Jesse Brandeburg Jan. 6, 2021, 9:55 p.m. UTC
When drivers call the various receive upcalls to receive an skb
to the stack, sometimes that stack can drop the packet. The good
news is that the return code is given to all the drivers of
NET_RX_DROP or GRO_DROP. The bad news is that no drivers except
the one "ice" driver that I changed, check the stat and increment
the dropped count. This is currently leading to packets that
arrive at the edge interface and are fully handled by the driver
and then mysteriously disappear.

Rather than fix all drivers to increment the drop stat when
handling the return code, emulate the already existing statistic
update for NET_RX_DROP events for the two GRO_DROP locations, and
increment the dev->rx_dropped associated with the skb.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/core/dev.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Shannon Nelson Jan. 8, 2021, 12:50 a.m. UTC | #1
On 1/6/21 1:55 PM, Jesse Brandeburg wrote:
> When drivers call the various receive upcalls to receive an skb

> to the stack, sometimes that stack can drop the packet. The good

> news is that the return code is given to all the drivers of

> NET_RX_DROP or GRO_DROP. The bad news is that no drivers except

> the one "ice" driver that I changed, check the stat and increment


If the stack is dropping the packet, isn't it up to the stack to track 
that, perhaps with something that shows up in netstat -s?  We don't 
really want to make the driver responsible for any drops that happen 
above its head, do we?

sln

> the dropped count. This is currently leading to packets that

> arrive at the edge interface and are fully handled by the driver

> and then mysteriously disappear.

>

> Rather than fix all drivers to increment the drop stat when

> handling the return code, emulate the already existing statistic

> update for NET_RX_DROP events for the two GRO_DROP locations, and

> increment the dev->rx_dropped associated with the skb.

>

> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

> Cc: Eric Dumazet <edumazet@google.com>

> Cc: Jamal Hadi Salim <jhs@mojatatu.com>

> ---

>   net/core/dev.c | 2 ++

>   1 file changed, 2 insertions(+)

>

> diff --git a/net/core/dev.c b/net/core/dev.c

> index 8fa739259041..ef34043a9550 100644

> --- a/net/core/dev.c

> +++ b/net/core/dev.c

> @@ -6071,6 +6071,7 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi,

>   		break;

>   

>   	case GRO_DROP:

> +		atomic_long_inc(&skb->dev->rx_dropped);

>   		kfree_skb(skb);

>   		break;

>   

> @@ -6159,6 +6160,7 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi,

>   		break;

>   

>   	case GRO_DROP:

> +		atomic_long_inc(&skb->dev->rx_dropped);

>   		napi_reuse_skb(napi, skb);

>   		break;

>
Eric Dumazet Jan. 8, 2021, 9:25 a.m. UTC | #2
On Wed, Jan 6, 2021 at 10:56 PM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>

> When drivers call the various receive upcalls to receive an skb

> to the stack, sometimes that stack can drop the packet. The good

> news is that the return code is given to all the drivers of

> NET_RX_DROP or GRO_DROP. The bad news is that no drivers except

> the one "ice" driver that I changed, check the stat and increment

> the dropped count. This is currently leading to packets that

> arrive at the edge interface and are fully handled by the driver

> and then mysteriously disappear.

>

> Rather than fix all drivers to increment the drop stat when

> handling the return code, emulate the already existing statistic

> update for NET_RX_DROP events for the two GRO_DROP locations, and

> increment the dev->rx_dropped associated with the skb.

>

> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

> Cc: Eric Dumazet <edumazet@google.com>

> Cc: Jamal Hadi Salim <jhs@mojatatu.com>

> ---

>  net/core/dev.c | 2 ++

>  1 file changed, 2 insertions(+)

>

> diff --git a/net/core/dev.c b/net/core/dev.c

> index 8fa739259041..ef34043a9550 100644

> --- a/net/core/dev.c

> +++ b/net/core/dev.c

> @@ -6071,6 +6071,7 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi,

>                 break;

>

>         case GRO_DROP:

> +               atomic_long_inc(&skb->dev->rx_dropped);

>                 kfree_skb(skb);

>                 break;

>

> @@ -6159,6 +6160,7 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi,

>                 break;

>

>         case GRO_DROP:

> +               atomic_long_inc(&skb->dev->rx_dropped);

>                 napi_reuse_skb(napi, skb);

>                 break;

>



This is not needed. I think we should clean up ice instead.

Drivers are supposed to have allocated the skb (using
napi_get_frags()) before calling napi_gro_frags()

Only napi_gro_frags() would return GRO_DROP, but we supposedly could
crash at that point, since a driver is clearly buggy.

We probably can remove GRO_DROP completely, assuming lazy drivers are fixed.

diff --git a/net/core/dev.c b/net/core/dev.c
index 8fa739259041aaa03585b5a7b8ebce862f4b7d1d..c9460c9597f1de51957fdcfc7a64ca45bce5af7c
100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6223,9 +6223,6 @@ gro_result_t napi_gro_frags(struct napi_struct *napi)
        gro_result_t ret;
        struct sk_buff *skb = napi_frags_skb(napi);

-       if (!skb)
-               return GRO_DROP;
-
        trace_napi_gro_frags_entry(skb);

        ret = napi_frags_finish(napi, skb, dev_gro_receive(napi, skb));
Jesse Brandeburg Jan. 8, 2021, 6:26 p.m. UTC | #3
Shannon Nelson wrote:

> On 1/6/21 1:55 PM, Jesse Brandeburg wrote:

> > When drivers call the various receive upcalls to receive an skb

> > to the stack, sometimes that stack can drop the packet. The good

> > news is that the return code is given to all the drivers of

> > NET_RX_DROP or GRO_DROP. The bad news is that no drivers except

> > the one "ice" driver that I changed, check the stat and increment

> 

> If the stack is dropping the packet, isn't it up to the stack to track 

> that, perhaps with something that shows up in netstat -s?  We don't 

> really want to make the driver responsible for any drops that happen 

> above its head, do we?


I totally agree!

In patch 2/2 I revert the driver-specific changes I had made in an
earlier patch, and this patch *was* my effort to make the stack show the
drops.

Maybe I wasn't clear. I'm seeing packets disappear during TCP
workloads, and this GRO_DROP code was the source of the drops (I see it
returning infrequently but regularly)

The driver processes the packet but the stack never sees it, and there
were no drop counters anywhere tracking it.
Jesse Brandeburg Jan. 8, 2021, 6:35 p.m. UTC | #4
Eric Dumazet wrote:
> > --- a/net/core/dev.c

> > +++ b/net/core/dev.c

> > @@ -6071,6 +6071,7 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi,

> >                 break;

> >

> >         case GRO_DROP:

> > +               atomic_long_inc(&skb->dev->rx_dropped);

> >                 kfree_skb(skb);

> >                 break;

> >

> > @@ -6159,6 +6160,7 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi,

> >                 break;

> >

> >         case GRO_DROP:

> > +               atomic_long_inc(&skb->dev->rx_dropped);

> >                 napi_reuse_skb(napi, skb);

> >                 break;

> >

> 

> 

> This is not needed. I think we should clean up ice instead.


My patch 2 already did that. I was trying to address the fact that I'm
*actually seeing* GRO_DROP return codes coming back from stack.

I'll try to reproduce that issue again that I saw. Maybe modern kernels
don't have the problem as frequently or at all.

> Drivers are supposed to have allocated the skb (using

> napi_get_frags()) before calling napi_gro_frags()


ice doesn't use napi_get_frags/napi_gro_frags, so I'm not sure how this
is relevant. 

> Only napi_gro_frags() would return GRO_DROP, but we supposedly could

> crash at that point, since a driver is clearly buggy.


seems unlikely since we don't call those functions.
 
> We probably can remove GRO_DROP completely, assuming lazy drivers are fixed.


This might be ok, but doesn't explain why I was seeing this return
code (which was the whole reason I was trying to count them), however I
may have been running on a distro kernel from redhat/centos 8 when I
was seeing these events. I haven't fully completed spelunking all the
different sources, but might be able to follow down the rabbit hole
further.

 
> diff --git a/net/core/dev.c b/net/core/dev.c

> index 8fa739259041aaa03585b5a7b8ebce862f4b7d1d..c9460c9597f1de51957fdcfc7a64ca45bce5af7c

> 100644

> --- a/net/core/dev.c

> +++ b/net/core/dev.c

> @@ -6223,9 +6223,6 @@ gro_result_t napi_gro_frags(struct napi_struct *napi)

>         gro_result_t ret;

>         struct sk_buff *skb = napi_frags_skb(napi);

> 

> -       if (!skb)

> -               return GRO_DROP;

> -

>         trace_napi_gro_frags_entry(skb);

> 

>         ret = napi_frags_finish(napi, skb, dev_gro_receive(napi, skb));


This change (noted from your other patches is fine), and a likely
improvement, thanks for sending those!
Eric Dumazet Jan. 8, 2021, 6:45 p.m. UTC | #5
On Fri, Jan 8, 2021 at 7:35 PM Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
>

> Eric Dumazet wrote:

> > > --- a/net/core/dev.c

> > > +++ b/net/core/dev.c

> > > @@ -6071,6 +6071,7 @@ static gro_result_t napi_skb_finish(struct napi_struct *napi,

> > >                 break;

> > >

> > >         case GRO_DROP:

> > > +               atomic_long_inc(&skb->dev->rx_dropped);

> > >                 kfree_skb(skb);

> > >                 break;

> > >

> > > @@ -6159,6 +6160,7 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi,

> > >                 break;

> > >

> > >         case GRO_DROP:

> > > +               atomic_long_inc(&skb->dev->rx_dropped);

> > >                 napi_reuse_skb(napi, skb);

> > >                 break;

> > >

> >

> >

> > This is not needed. I think we should clean up ice instead.

>

> My patch 2 already did that. I was trying to address the fact that I'm

> *actually seeing* GRO_DROP return codes coming back from stack.

>

> I'll try to reproduce that issue again that I saw. Maybe modern kernels

> don't have the problem as frequently or at all.



Jesse, you are sending a patch for current kernels.

It is pretty clear that the issue you have can not happen with current
kernels, by reading the code source,
even without an actual ICE piece of hardware to test this :)

>

> > Drivers are supposed to have allocated the skb (using

> > napi_get_frags()) before calling napi_gro_frags()

>

> ice doesn't use napi_get_frags/napi_gro_frags, so I'm not sure how this

> is relevant.

>

> > Only napi_gro_frags() would return GRO_DROP, but we supposedly could

> > crash at that point, since a driver is clearly buggy.

>

> seems unlikely since we don't call those functions.

>

> > We probably can remove GRO_DROP completely, assuming lazy drivers are fixed.

>

> This might be ok, but doesn't explain why I was seeing this return

> code (which was the whole reason I was trying to count them), however I

> may have been running on a distro kernel from redhat/centos 8 when I

> was seeing these events. I haven't fully completed spelunking all the

> different sources, but might be able to follow down the rabbit hole

> further.


Yes please :)

>

>

> > diff --git a/net/core/dev.c b/net/core/dev.c

> > index 8fa739259041aaa03585b5a7b8ebce862f4b7d1d..c9460c9597f1de51957fdcfc7a64ca45bce5af7c

> > 100644

> > --- a/net/core/dev.c

> > +++ b/net/core/dev.c

> > @@ -6223,9 +6223,6 @@ gro_result_t napi_gro_frags(struct napi_struct *napi)

> >         gro_result_t ret;

> >         struct sk_buff *skb = napi_frags_skb(napi);

> >

> > -       if (!skb)

> > -               return GRO_DROP;

> > -

> >         trace_napi_gro_frags_entry(skb);

> >

> >         ret = napi_frags_finish(napi, skb, dev_gro_receive(napi, skb));

>

> This change (noted from your other patches is fine), and a likely

> improvement, thanks for sending those!


Sure !
Shannon Nelson Jan. 8, 2021, 7:21 p.m. UTC | #6
On 1/8/21 10:26 AM, Jesse Brandeburg wrote:
> Shannon Nelson wrote:

>

>> On 1/6/21 1:55 PM, Jesse Brandeburg wrote:

>>> When drivers call the various receive upcalls to receive an skb

>>> to the stack, sometimes that stack can drop the packet. The good

>>> news is that the return code is given to all the drivers of

>>> NET_RX_DROP or GRO_DROP. The bad news is that no drivers except

>>> the one "ice" driver that I changed, check the stat and increment

>> If the stack is dropping the packet, isn't it up to the stack to track

>> that, perhaps with something that shows up in netstat -s?  We don't

>> really want to make the driver responsible for any drops that happen

>> above its head, do we?

> I totally agree!

>

> In patch 2/2 I revert the driver-specific changes I had made in an

> earlier patch, and this patch *was* my effort to make the stack show the

> drops.

>

> Maybe I wasn't clear. I'm seeing packets disappear during TCP

> workloads, and this GRO_DROP code was the source of the drops (I see it

> returning infrequently but regularly)

>

> The driver processes the packet but the stack never sees it, and there

> were no drop counters anywhere tracking it.

>


My point is that the patch increments a netdev counter, which to my mind 
immediately implicates the driver and hardware, rather than the stack.  
As a driver maintainer, I don't want to be chasing driver packet drop 
reports that are a stack problem.  I'd rather see a new counter in 
netstat -s that reflects the stack decision and can better imply what 
went wrong.  I don't have a good suggestion for a counter name at the 
moment.

I guess part of the issue is that this is right on the boundary of 
driver-stack.  But if we follow Eric's suggestions, maybe the problem 
magically goes away :-) .

sln
Saeed Mahameed Jan. 8, 2021, 8:26 p.m. UTC | #7
On Fri, 2021-01-08 at 11:21 -0800, Shannon Nelson wrote:
> On 1/8/21 10:26 AM, Jesse Brandeburg wrote:

> > Shannon Nelson wrote:

> > 

> > > On 1/6/21 1:55 PM, Jesse Brandeburg wrote:

> > > > When drivers call the various receive upcalls to receive an skb

> > > > to the stack, sometimes that stack can drop the packet. The

> > > > good

> > > > news is that the return code is given to all the drivers of

> > > > NET_RX_DROP or GRO_DROP. The bad news is that no drivers except

> > > > the one "ice" driver that I changed, check the stat and

> > > > increment

> > > If the stack is dropping the packet, isn't it up to the stack to

> > > track

> > > that, perhaps with something that shows up in netstat -s?  We

> > > don't

> > > really want to make the driver responsible for any drops that

> > > happen

> > > above its head, do we?

> > I totally agree!

> > 

> > In patch 2/2 I revert the driver-specific changes I had made in an

> > earlier patch, and this patch *was* my effort to make the stack

> > show the

> > drops.

> > 

> > Maybe I wasn't clear. I'm seeing packets disappear during TCP

> > workloads, and this GRO_DROP code was the source of the drops (I

> > see it

> > returning infrequently but regularly)

> > 

> > The driver processes the packet but the stack never sees it, and

> > there

> > were no drop counters anywhere tracking it.

> > 

> 

> My point is that the patch increments a netdev counter, which to my

> mind 

> immediately implicates the driver and hardware, rather than the

> stack.  

> As a driver maintainer, I don't want to be chasing driver packet

> drop 

> reports that are a stack problem.  I'd rather see a new counter in 

> netstat -s that reflects the stack decision and can better imply

> what 

> went wrong.  I don't have a good suggestion for a counter name at

> the 

> moment.

> 

> I guess part of the issue is that this is right on the boundary of 

> driver-stack.  But if we follow Eric's suggestions, maybe the

> problem 

> magically goes away :-) .

> 

> sln

> 


I think there is still some merit in this patchset even with Eric's
removal of GRO_DROP from gro_receive(). As Eric explained, it is still
possible to silently drop for the same reason when drivers
call napi_get_frags or even alloc_skb() apis, many drivers do not
account for such packet drops, and maybe it is the right thing to do to
inline the packet drop accounting into the skb alloc APIs ? the
question is, is it the job of those APIs to update netdev->stats ?
Eric Dumazet Jan. 8, 2021, 10:17 p.m. UTC | #8
On Fri, Jan 8, 2021 at 9:27 PM Saeed Mahameed <saeed@kernel.org> wrote:
>


> I think there is still some merit in this patchset even with Eric's

> removal of GRO_DROP from gro_receive(). As Eric explained, it is still

> possible to silently drop for the same reason when drivers

> call napi_get_frags or even alloc_skb() apis, many drivers do not

> account for such packet drops, and maybe it is the right thing to do to

> inline the packet drop accounting into the skb alloc APIs ? the

> question is, is it the job of those APIs to update netdev->stats ?

>


You absolutely do not want to have a generic increment of
netdev->stats for multiqueue drivers.
This would add terrible cache line false sharing under DDOS and memory stress.

Each driver maintains (or should maintain) per rx queue counter for this case.

It seems  mlx4 does nothing special, I would suggest you fix it :)
Jamal Hadi Salim Jan. 14, 2021, 1:53 p.m. UTC | #9
On 2021-01-08 2:21 p.m., Shannon Nelson wrote:
> On 1/8/21 10:26 AM, Jesse Brandeburg wrote:

>> Shannon Nelson wrote:

>>

>>> On 1/6/21 1:55 PM, Jesse Brandeburg wrote:

>>>> When drivers call the various receive upcalls to receive an skb

>>>> to the stack, sometimes that stack can drop the packet. The good

>>>> news is that the return code is given to all the drivers of

>>>> NET_RX_DROP or GRO_DROP. The bad news is that no drivers except

>>>> the one "ice" driver that I changed, check the stat and increment

>>> If the stack is dropping the packet, isn't it up to the stack to track

>>> that, perhaps with something that shows up in netstat -s?  We don't

>>> really want to make the driver responsible for any drops that happen

>>> above its head, do we?

>> I totally agree!

>>

>> In patch 2/2 I revert the driver-specific changes I had made in an

>> earlier patch, and this patch *was* my effort to make the stack show the

>> drops.

>>

>> Maybe I wasn't clear. I'm seeing packets disappear during TCP

>> workloads, and this GRO_DROP code was the source of the drops (I see it

>> returning infrequently but regularly)

>>

>> The driver processes the packet but the stack never sees it, and there

>> were no drop counters anywhere tracking it.

>>

> 

> My point is that the patch increments a netdev counter, which to my mind 

> immediately implicates the driver and hardware, rather than the stack. 

> As a driver maintainer, I don't want to be chasing driver packet drop 

> reports that are a stack problem.  I'd rather see a new counter in 

> netstat -s that reflects the stack decision and can better imply what 

> went wrong.  I don't have a good suggestion for a counter name at the 

> moment.

> 

> I guess part of the issue is that this is right on the boundary of 

> driver-stack.  But if we follow Eric's suggestions, maybe the problem 

> magically goes away :-) .

> 


So: How does one know that the stack-upcall dropped a packet because
of GRO issues? Debugging with kprobe or traces doesnt count as an
answer.

cheers,
jamal
diff mbox series

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 8fa739259041..ef34043a9550 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6071,6 +6071,7 @@  static gro_result_t napi_skb_finish(struct napi_struct *napi,
 		break;
 
 	case GRO_DROP:
+		atomic_long_inc(&skb->dev->rx_dropped);
 		kfree_skb(skb);
 		break;
 
@@ -6159,6 +6160,7 @@  static gro_result_t napi_frags_finish(struct napi_struct *napi,
 		break;
 
 	case GRO_DROP:
+		atomic_long_inc(&skb->dev->rx_dropped);
 		napi_reuse_skb(napi, skb);
 		break;