diff mbox series

[bpf,v4] xsk: do not discard packet when NETDEV_TX_BUSY

Message ID 1599828221-19364-1-git-send-email-magnus.karlsson@gmail.com
State New
Headers show
Series [bpf,v4] xsk: do not discard packet when NETDEV_TX_BUSY | expand

Commit Message

Magnus Karlsson Sept. 11, 2020, 12:43 p.m. UTC
From: Magnus Karlsson <magnus.karlsson@intel.com>

In the skb Tx path, transmission of a packet is performed with
dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it
signifies that it was not possible to send the packet right now,
please try later. Unfortunately, the xsk transmit code discarded the
packet and returned EBUSY to the application. Fix this unnecessary
packet loss, by not discarding the packet in the Tx ring and return
EAGAIN. As EAGAIN is returned to the application, it can then retry
the send operation later and the packet will then likely be sent as
the driver will then likely have space/resources to send the packet.

In summary, EAGAIN tells the application that the packet was not
discarded from the Tx ring and that it needs to call send()
again. EBUSY, on the other hand, signifies that the packet was not
sent and discarded from the Tx ring. The application needs to put the
packet on the Tx ring again if it wants it to be sent.

Fixes: 35fcde7f8deb ("xsk: support for Tx")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reported-by: Arkadiusz Zema <A.Zema@falconvsystems.com>
Suggested-by: Arkadiusz Zema <A.Zema@falconvsystems.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
---
v3->v4:
* Free the skb without triggering the drop trace when NETDEV_TX_BUSY
* Call consume_skb instead of kfree_skb when the packet has been
  sent successfully for correct tracing
* Use sock_wfree as destructor when NETDEV_TX_BUSY
v1->v3:
* Hinder dev_direct_xmit() from freeing and completing the packet to
  user space by manipulating the skb->users count as suggested by
  Daniel Borkmann.
---
 net/xdp/xsk.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Comments

Jesse Brandeburg Sept. 14, 2020, 4:52 p.m. UTC | #1
Magnus Karlsson wrote:

> From: Magnus Karlsson <magnus.karlsson@intel.com>
> 
> In the skb Tx path, transmission of a packet is performed with
> dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it
> signifies that it was not possible to send the packet right now,
> please try later. Unfortunately, the xsk transmit code discarded the
> packet and returned EBUSY to the application. Fix this unnecessary
> packet loss, by not discarding the packet in the Tx ring and return
> EAGAIN. As EAGAIN is returned to the application, it can then retry
> the send operation later and the packet will then likely be sent as
> the driver will then likely have space/resources to send the packet.
> 
> In summary, EAGAIN tells the application that the packet was not
> discarded from the Tx ring and that it needs to call send()
> again. EBUSY, on the other hand, signifies that the packet was not
> sent and discarded from the Tx ring. The application needs to put the
> packet on the Tx ring again if it wants it to be sent.
> 
> Fixes: 35fcde7f8deb ("xsk: support for Tx")
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> Reported-by: Arkadiusz Zema <A.Zema@falconvsystems.com>
> Suggested-by: Arkadiusz Zema <A.Zema@falconvsystems.com>
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>

Patch seems to make sense to me, better matches the expectations/path
of the stack.

Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Daniel Borkmann Sept. 15, 2020, 3:49 p.m. UTC | #2
Hey Magnus,

On 9/11/20 2:43 PM, Magnus Karlsson wrote:
> From: Magnus Karlsson <magnus.karlsson@intel.com>

> 

> In the skb Tx path, transmission of a packet is performed with

> dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it

> signifies that it was not possible to send the packet right now,

> please try later. Unfortunately, the xsk transmit code discarded the

> packet and returned EBUSY to the application. Fix this unnecessary

> packet loss, by not discarding the packet in the Tx ring and return

> EAGAIN. As EAGAIN is returned to the application, it can then retry

> the send operation later and the packet will then likely be sent as

> the driver will then likely have space/resources to send the packet.

> 

> In summary, EAGAIN tells the application that the packet was not

> discarded from the Tx ring and that it needs to call send()

> again. EBUSY, on the other hand, signifies that the packet was not

> sent and discarded from the Tx ring. The application needs to put the

> packet on the Tx ring again if it wants it to be sent.

> 

> Fixes: 35fcde7f8deb ("xsk: support for Tx")

> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>

> Reported-by: Arkadiusz Zema <A.Zema@falconvsystems.com>

> Suggested-by: Arkadiusz Zema <A.Zema@falconvsystems.com>

> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>

> ---

> v3->v4:

> * Free the skb without triggering the drop trace when NETDEV_TX_BUSY

> * Call consume_skb instead of kfree_skb when the packet has been

>    sent successfully for correct tracing

> * Use sock_wfree as destructor when NETDEV_TX_BUSY

> v1->v3:

> * Hinder dev_direct_xmit() from freeing and completing the packet to

>    user space by manipulating the skb->users count as suggested by

>    Daniel Borkmann.

> ---

>   net/xdp/xsk.c | 17 ++++++++++++++++-

>   1 file changed, 16 insertions(+), 1 deletion(-)

> 

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c

> index c323162..d32e39d 100644

> --- a/net/xdp/xsk.c

> +++ b/net/xdp/xsk.c

> @@ -377,15 +377,30 @@ static int xsk_generic_xmit(struct sock *sk)

>   		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;

>   		skb->destructor = xsk_destruct_skb;

>   

> +		/* Hinder dev_direct_xmit from freeing the packet and

> +		 * therefore completing it in the destructor

> +		 */

> +		refcount_inc(&skb->users);

>   		err = dev_direct_xmit(skb, xs->queue_id);

> +		if  (err == NETDEV_TX_BUSY) {

> +			/* Tell user-space to retry the send */

> +			skb->destructor = sock_wfree;


I see, good catch, you need this one here as otherwise you leak wmem accounting
given it's also part of xsk_destruct_skb() and we do free the prior allocated skb
in this case.

> +			/* Free skb without triggering the perf drop trace */

> +			__kfree_skb(skb);


As a minor nit, I would just use consume_skb(skb) here given this doesn't blindly
ignore the skb_unref(). It's mostly about seeing where drops are happening so that
tracepoint is set to kfree_skb() which is the more interesting one. Other than that
looks good and ready to go. Thanks (& sorry for late reply)!

> +			err = -EAGAIN;

> +			goto out;

> +		}

> +

>   		xskq_cons_release(xs->tx);

>   		/* Ignore NET_XMIT_CN as packet might have been sent */

> -		if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) {

> +		if (err == NET_XMIT_DROP) {

>   			/* SKB completed but not sent */

> +			kfree_skb(skb);

>   			err = -EBUSY;

>   			goto out;

>   		}

>   

> +		consume_skb(skb);

>   		sent_frame = true;

>   	}

>   

>
Magnus Karlsson Sept. 15, 2020, 5:46 p.m. UTC | #3
On Tue, Sep 15, 2020 at 5:49 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>

> Hey Magnus,

>

> On 9/11/20 2:43 PM, Magnus Karlsson wrote:

> > From: Magnus Karlsson <magnus.karlsson@intel.com>

> >

> > In the skb Tx path, transmission of a packet is performed with

> > dev_direct_xmit(). When NETDEV_TX_BUSY is set in the drivers, it

> > signifies that it was not possible to send the packet right now,

> > please try later. Unfortunately, the xsk transmit code discarded the

> > packet and returned EBUSY to the application. Fix this unnecessary

> > packet loss, by not discarding the packet in the Tx ring and return

> > EAGAIN. As EAGAIN is returned to the application, it can then retry

> > the send operation later and the packet will then likely be sent as

> > the driver will then likely have space/resources to send the packet.

> >

> > In summary, EAGAIN tells the application that the packet was not

> > discarded from the Tx ring and that it needs to call send()

> > again. EBUSY, on the other hand, signifies that the packet was not

> > sent and discarded from the Tx ring. The application needs to put the

> > packet on the Tx ring again if it wants it to be sent.

> >

> > Fixes: 35fcde7f8deb ("xsk: support for Tx")

> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>

> > Reported-by: Arkadiusz Zema <A.Zema@falconvsystems.com>

> > Suggested-by: Arkadiusz Zema <A.Zema@falconvsystems.com>

> > Suggested-by: Daniel Borkmann <daniel@iogearbox.net>

> > ---

> > v3->v4:

> > * Free the skb without triggering the drop trace when NETDEV_TX_BUSY

> > * Call consume_skb instead of kfree_skb when the packet has been

> >    sent successfully for correct tracing

> > * Use sock_wfree as destructor when NETDEV_TX_BUSY

> > v1->v3:

> > * Hinder dev_direct_xmit() from freeing and completing the packet to

> >    user space by manipulating the skb->users count as suggested by

> >    Daniel Borkmann.

> > ---

> >   net/xdp/xsk.c | 17 ++++++++++++++++-

> >   1 file changed, 16 insertions(+), 1 deletion(-)

> >

> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c

> > index c323162..d32e39d 100644

> > --- a/net/xdp/xsk.c

> > +++ b/net/xdp/xsk.c

> > @@ -377,15 +377,30 @@ static int xsk_generic_xmit(struct sock *sk)

> >               skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;

> >               skb->destructor = xsk_destruct_skb;

> >

> > +             /* Hinder dev_direct_xmit from freeing the packet and

> > +              * therefore completing it in the destructor

> > +              */

> > +             refcount_inc(&skb->users);

> >               err = dev_direct_xmit(skb, xs->queue_id);

> > +             if  (err == NETDEV_TX_BUSY) {

> > +                     /* Tell user-space to retry the send */

> > +                     skb->destructor = sock_wfree;

>

> I see, good catch, you need this one here as otherwise you leak wmem accounting

> given it's also part of xsk_destruct_skb() and we do free the prior allocated skb

> in this case.

>

> > +                     /* Free skb without triggering the perf drop trace */

> > +                     __kfree_skb(skb);

>

> As a minor nit, I would just use consume_skb(skb) here given this doesn't blindly

> ignore the skb_unref(). It's mostly about seeing where drops are happening so that

> tracepoint is set to kfree_skb() which is the more interesting one. Other than that

> looks good and ready to go. Thanks (& sorry for late reply)!


Thank you for reviewing this. I will spin a v5.

> > +                     err = -EAGAIN;

> > +                     goto out;

> > +             }

> > +

> >               xskq_cons_release(xs->tx);

> >               /* Ignore NET_XMIT_CN as packet might have been sent */

> > -             if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) {

> > +             if (err == NET_XMIT_DROP) {

> >                       /* SKB completed but not sent */

> > +                     kfree_skb(skb);

> >                       err = -EBUSY;

> >                       goto out;

> >               }

> >

> > +             consume_skb(skb);

> >               sent_frame = true;

> >       }

> >

> >

>
diff mbox series

Patch

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index c323162..d32e39d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -377,15 +377,30 @@  static int xsk_generic_xmit(struct sock *sk)
 		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
 		skb->destructor = xsk_destruct_skb;
 
+		/* Hinder dev_direct_xmit from freeing the packet and
+		 * therefore completing it in the destructor
+		 */
+		refcount_inc(&skb->users);
 		err = dev_direct_xmit(skb, xs->queue_id);
+		if  (err == NETDEV_TX_BUSY) {
+			/* Tell user-space to retry the send */
+			skb->destructor = sock_wfree;
+			/* Free skb without triggering the perf drop trace */
+			__kfree_skb(skb);
+			err = -EAGAIN;
+			goto out;
+		}
+
 		xskq_cons_release(xs->tx);
 		/* Ignore NET_XMIT_CN as packet might have been sent */
-		if (err == NET_XMIT_DROP || err == NETDEV_TX_BUSY) {
+		if (err == NET_XMIT_DROP) {
 			/* SKB completed but not sent */
+			kfree_skb(skb);
 			err = -EBUSY;
 			goto out;
 		}
 
+		consume_skb(skb);
 		sent_frame = true;
 	}