
[1/1] RDMA/rxe: Fetch skb packets from ethernet layer

Message ID 1604574721-2505-1-git-send-email-yanjunz@nvidia.com
State New
Series [1/1] RDMA/rxe: Fetch skb packets from ethernet layer

Commit Message

Zhu Yanjun Nov. 5, 2020, 11:12 a.m. UTC
In the original design, on the rx path, an skb would pass through the
ethernet layer and the IP layer before eventually reaching the UDP
tunnel.

Now rxe fetches skb packets from the ethernet layer directly, bypassing
the IP and UDP layers. As such, the skb packets are delivered to the
upper protocols straight from the ethernet layer.

This increases bandwidth and decreases latency.

Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com>
---
 drivers/infiniband/sw/rxe/rxe_net.c |   45 ++++++++++++++++++++++++++++++++++-
 1 files changed, 44 insertions(+), 1 deletions(-)
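
For reference, the "original design" receive path described above is the kernel UDP tunnel hook that rxe already sets up: the IP and UDP layers validate and demultiplex the packet, then hand it to rxe through an encap_rcv callback. A condensed sketch of that setup, modelled on rxe_setup_udp_tunnel() in rxe_net.c (the function name below and the omitted error handling are simplifications, not the exact upstream code):

static struct socket *rxe_setup_udp_tunnel_sketch(struct net *net, __be16 port)
{
	struct udp_port_cfg udp_cfg = { .family = AF_INET, .local_udp_port = port };
	struct udp_tunnel_sock_cfg tnl_cfg = { };
	struct socket *sock;

	/* Create a kernel UDP socket bound to the RoCEv2 port (4791). */
	if (udp_sock_create(net, &udp_cfg, &sock))
		return NULL;

	/* The UDP layer invokes encap_rcv for every packet that reaches
	 * this socket, i.e. after IP and UDP processing have completed.
	 */
	tnl_cfg.encap_type = 1;
	tnl_cfg.encap_rcv = rxe_udp_encap_recv;
	setup_udp_tunnel_sock(net, sock, &tnl_cfg);

	return sock;
}

The patch below keeps this socket but sets encap_rcv to NULL, so matching packets are instead intercepted by the new rx_handler before they enter the IP stack.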

Comments

Jakub Kicinski Nov. 7, 2020, 8:26 p.m. UTC | #1
On Thu,  5 Nov 2020 19:12:01 +0800 Zhu Yanjun wrote:
> In the original design, on the rx path, an skb would pass through the
> ethernet layer and the IP layer before eventually reaching the UDP
> tunnel.
>
> Now rxe fetches skb packets from the ethernet layer directly, bypassing
> the IP and UDP layers. As such, the skb packets are delivered to the
> upper protocols straight from the ethernet layer.
>
> This increases bandwidth and decreases latency.
>
> Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com>

Nope, no stealing UDP packets with some random rx handlers.

The tunnel socket is a correct approach.
Zhu Yanjun Nov. 8, 2020, 5:27 a.m. UTC | #2
On Sun, Nov 8, 2020 at 1:24 PM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
>
> -------- Forwarded Message --------
> Subject: Re: [PATCH 1/1] RDMA/rxe: Fetch skb packets from ethernet layer
> Date: Sat, 7 Nov 2020 12:26:17 -0800
> From: Jakub Kicinski <kuba@kernel.org>
> To: Zhu Yanjun <yanjunz@nvidia.com>
> CC: dledford@redhat.com, jgg@ziepe.ca, linux-rdma@vger.kernel.org, netdev@vger.kernel.org
>
> On Thu, 5 Nov 2020 19:12:01 +0800 Zhu Yanjun wrote:
>
> In the original design, on the rx path, an skb would pass through the
> ethernet layer and the IP layer before eventually reaching the UDP
> tunnel.
>
> Now rxe fetches skb packets from the ethernet layer directly, bypassing
> the IP and UDP layers. As such, the skb packets are delivered to the
> upper protocols straight from the ethernet layer.
>
> This increases bandwidth and decreases latency.
>
> Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com>
>
> Nope, no stealing UDP packets with some random rx handlers.

Why? Are there any risks?

Zhu Yanjun
>
> The tunnel socket is a correct approach.
Jakub Kicinski Nov. 9, 2020, 6:25 p.m. UTC | #3
On Sun, 8 Nov 2020 13:27:32 +0800 Zhu Yanjun wrote:
> On Sun, Nov 8, 2020 at 1:24 PM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> > On Thu, 5 Nov 2020 19:12:01 +0800 Zhu Yanjun wrote:
> >
> > In the original design, on the rx path, an skb would pass through the
> > ethernet layer and the IP layer before eventually reaching the UDP
> > tunnel.
> >
> > Now rxe fetches skb packets from the ethernet layer directly, bypassing
> > the IP and UDP layers. As such, the skb packets are delivered to the
> > upper protocols straight from the ethernet layer.
> >
> > This increases bandwidth and decreases latency.
> >
> > Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com>
> >
> > Nope, no stealing UDP packets with some random rx handlers.
>
> Why? Are there any risks?

Are there risks in layering violations? Yes.

For example - you do absolutely no protocol parsing, checksum
validation, only support IPv4, etc.

Besides it also makes the code far less maintainable, rx_handler is a
singleton, etc. etc.

> > The tunnel socket is a correct approach.
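
To illustrate the "rx_handler is a singleton" point: each net_device has exactly one rx_handler slot, so a registration like the one in this patch fails with -EBUSY when the device is already claimed by a bridge, bond, team or openvswitch datapath. A minimal sketch, reusing the names from the patch (the wrapper function and the warning message are illustrative only):

static int rxe_register_rx_handler_sketch(struct rxe_dev *rxe,
					  struct net_device *ndev)
{
	int err;

	rtnl_lock();
	/* Fails with -EBUSY if another subsystem already installed its
	 * own rx_handler on this device.
	 */
	err = netdev_rx_handler_register(ndev, rxe_handle_frame, rxe);
	rtnl_unlock();

	if (err == -EBUSY)
		pr_warn("rxe: %s already has an rx_handler\n", ndev->name);

	return err;
}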
Zhu Yanjun Nov. 10, 2020, 1:58 a.m. UTC | #4
On Tue, Nov 10, 2020 at 2:25 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 8 Nov 2020 13:27:32 +0800 Zhu Yanjun wrote:
> > On Sun, Nov 8, 2020 at 1:24 PM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> > > On Thu, 5 Nov 2020 19:12:01 +0800 Zhu Yanjun wrote:
> > >
> > > In the original design, on the rx path, an skb would pass through the
> > > ethernet layer and the IP layer before eventually reaching the UDP
> > > tunnel.
> > >
> > > Now rxe fetches skb packets from the ethernet layer directly, bypassing
> > > the IP and UDP layers. As such, the skb packets are delivered to the
> > > upper protocols straight from the ethernet layer.
> > >
> > > This increases bandwidth and decreases latency.
> > >
> > > Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com>
> > >
> > > Nope, no stealing UDP packets with some random rx handlers.
> >
> > Why? Are there any risks?
>
> Are there risks in layering violations? Yes.
>
> For example - you do absolutely no protocol parsing,


Protocol parsing is done in the rxe driver.

> checksum validation, only support IPv4, etc.


Only IPv4 is supported in rxe for now; if IPv6 support is added to
rxe, I will add IPv6 handling here as well.

>
> Besides it also makes the code far less maintainable, rx_handler is a


This rx_handler mechanism is also used by openvswitch and the bridge driver.

Zhu Yanjun

> singleton, etc. etc.

>
> > > The tunnel socket is a correct approach.
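
For context on the openvswitch/bridge comparison above: those users do go through the same hook, but each of them claims the device's single rx_handler slot for itself when the device is enslaved. A rough sketch of the bridge side, modelled on br_add_if() in net/bridge/br_if.c (the wrapper function here is illustrative, not verbatim kernel code):

static int bridge_claims_port_sketch(struct net_bridge_port *p,
				     struct net_device *dev)
{
	/* The bridge installs br_handle_frame() as dev's rx_handler, so a
	 * later netdev_rx_handler_register() on the same device (such as
	 * the one this patch adds) returns -EBUSY, and vice versa.
	 */
	return netdev_rx_handler_register(dev, br_handle_frame, p);
}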
Zhu Yanjun Nov. 11, 2020, 11:15 a.m. UTC | #5
On Tue, Nov 10, 2020 at 9:58 AM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
>
> On Tue, Nov 10, 2020 at 2:25 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Sun, 8 Nov 2020 13:27:32 +0800 Zhu Yanjun wrote:
> > > On Sun, Nov 8, 2020 at 1:24 PM Zhu Yanjun <zyjzyj2000@gmail.com> wrote:
> > > > On Thu, 5 Nov 2020 19:12:01 +0800 Zhu Yanjun wrote:
> > > >
> > > > In the original design, on the rx path, an skb would pass through the
> > > > ethernet layer and the IP layer before eventually reaching the UDP
> > > > tunnel.
> > > >
> > > > Now rxe fetches skb packets from the ethernet layer directly, bypassing
> > > > the IP and UDP layers. As such, the skb packets are delivered to the
> > > > upper protocols straight from the ethernet layer.
> > > >
> > > > This increases bandwidth and decreases latency.
> > > >
> > > > Signed-off-by: Zhu Yanjun <yanjunz@nvidia.com>
> > > >
> > > > Nope, no stealing UDP packets with some random rx handlers.
> > >
> > > Why? Are there any risks?
> >
> > Are there risks in layering violations? Yes.
> >
> > For example - you do absolutely no protocol parsing,
>
> Protocol parsing is done in the rxe driver.
>
> > checksum validation, only support IPv4, etc.
>
> Only IPv4 is supported in rxe for now; if IPv6 support is added to
> rxe, I will add IPv6 handling here as well.
>
> > Besides it also makes the code far less maintainable, rx_handler is a
>
> This rx_handler mechanism is also used by openvswitch and the bridge driver.


I am on vacation; I will reply as soon as I come back.

Zhu Yanjun

>
> Zhu Yanjun
>
> > singleton, etc. etc.
> >
> > > > The tunnel socket is a correct approach.

Patch

diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 2e490e5..8ea68b6 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -18,6 +18,7 @@ 
 #include "rxe_loc.h"
 
 static struct rxe_recv_sockets recv_sockets;
+static struct net_device *g_ndev;
 
 struct device *rxe_dma_device(struct rxe_dev *rxe)
 {
@@ -113,7 +114,7 @@  static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	}
 
 	tnl_cfg.encap_type = 1;
-	tnl_cfg.encap_rcv = rxe_udp_encap_recv;
+	tnl_cfg.encap_rcv = NULL;
 
 	/* Setup UDP tunnel */
 	setup_udp_tunnel_sock(net, sock, &tnl_cfg);
@@ -357,6 +358,38 @@  struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
 	return rxe->ndev->name;
 }
 
+static rx_handler_result_t rxe_handle_frame(struct sk_buff **pskb)
+{
+	struct sk_buff *skb = *pskb;
+	struct iphdr *iph;
+	struct udphdr *udph;
+
+	if (unlikely(skb->pkt_type == PACKET_LOOPBACK))
+		return RX_HANDLER_PASS;
+
+	if (!is_valid_ether_addr(eth_hdr(skb)->h_source)) {
+		kfree(skb);
+		return RX_HANDLER_CONSUMED;
+	}
+
+	if (eth_hdr(skb)->h_proto != cpu_to_be16(ETH_P_IP))
+		return RX_HANDLER_PASS;
+
+	iph = ip_hdr(skb);
+
+	if (iph->protocol != IPPROTO_UDP)
+		return RX_HANDLER_PASS;
+
+	udph = udp_hdr(skb);
+
+	if (udph->dest != cpu_to_be16(ROCE_V2_UDP_DPORT))
+		return RX_HANDLER_PASS;
+
+	rxe_udp_encap_recv(NULL, skb);
+
+	return RX_HANDLER_CONSUMED;
+}
+
 int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
 {
 	int err;
@@ -367,6 +400,7 @@  int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
 		return -ENOMEM;
 
 	rxe->ndev = ndev;
+	g_ndev = ndev;
 
 	err = rxe_add(rxe, ndev->mtu, ibdev_name);
 	if (err) {
@@ -374,6 +408,12 @@  int rxe_net_add(const char *ibdev_name, struct net_device *ndev)
 		return err;
 	}
 
+	rtnl_lock();
+	err = netdev_rx_handler_register(ndev, rxe_handle_frame, rxe);
+	rtnl_unlock();
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -498,6 +538,9 @@  static int rxe_net_ipv6_init(void)
 
 void rxe_net_exit(void)
 {
+	rtnl_lock();
+	netdev_rx_handler_unregister(g_ndev);
+	rtnl_unlock();
 	rxe_release_udp_tunnel(recv_sockets.sk6);
 	rxe_release_udp_tunnel(recv_sockets.sk4);
 	unregister_netdevice_notifier(&rxe_net_notifier);