From patchwork Wed Mar 31 07:11:32 2021
X-Patchwork-Submitter: Xuan Zhuo
X-Patchwork-Id: 413414
From: Xuan Zhuo
To: netdev@vger.kernel.org
Cc: "Michael S. Tsirkin", Jason Wang, "David S. Miller", Jakub Kicinski,
 Björn Töpel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
 Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
 virtualization@lists.linux-foundation.org, bpf@vger.kernel.org, Dust Li
Subject: [PATCH net-next v3 1/8] xsk: XDP_SETUP_XSK_POOL support option check_dma
Date: Wed, 31 Mar 2021 15:11:32 +0800
Message-Id: <20210331071139.15473-2-xuanzhuo@linux.alibaba.com>
X-Mailer: git-send-email 2.31.0
In-Reply-To: <20210331071139.15473-1-xuanzhuo@linux.alibaba.com>
References: <20210331071139.15473-1-xuanzhuo@linux.alibaba.com>
X-Mailing-List: netdev@vger.kernel.org

Add a check_dma option to the XDP_SETUP_XSK_POOL bpf command.

virtio-net does not complete DMA initialization in advance; instead, the
vring performs the DMA mapping every time data is sent. In theory it would
be better to complete DMA initialization in advance, but modifying the
vring for that may be troublesome. So this patch adds an option that tells
xsk whether DMA initialization has been completed. xp_assign_dev() sets
check_dma to true before calling ndo_bpf and reports the "Driver did not
DMA map zero-copy buffers" error only if check_dma is still set afterwards,
so xsk no longer fails merely because DMA has not been initialized in
advance.

I still hope that virtio-net can eventually support completing the DMA
operations in advance.
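
A minimal user-space sketch of the check_dma handshake described above, not
kernel code. The names pool_model, xsk_cmd_model, assign_dev_model and the two
setup_* callbacks are hypothetical stand-ins for xsk_buff_pool, the xsk member
of netdev_bpf, xp_assign_dev() and a driver's ndo_bpf handler. It assumes that
a driver which maps DMA per packet clears check_dma in its handler; this patch
itself only adds the field and the check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xsk_buff_pool. */
struct pool_model {
	void *dma_pages;	/* NULL when the driver did not pre-map buffers */
};

/* Hypothetical stand-in for the xsk member of struct netdev_bpf. */
struct xsk_cmd_model {
	struct pool_model *pool;
	unsigned short queue_id;
	bool check_dma;
};

typedef int (*setup_fn)(struct xsk_cmd_model *cmd);

/* Assumed driver behaviour: DMA is mapped per packet, so opt out. */
static int setup_map_per_packet(struct xsk_cmd_model *cmd)
{
	cmd->check_dma = false;
	return 0;
}

/* A driver that neither maps the pool nor opts out of the check. */
static int setup_forgot_to_map(struct xsk_cmd_model *cmd)
{
	(void)cmd;
	return 0;
}

/* Mirrors the control flow this patch gives xp_assign_dev(). */
static int assign_dev_model(struct pool_model *pool, unsigned short queue_id,
			    setup_fn setup)
{
	struct xsk_cmd_model cmd = {
		.pool = pool,
		.queue_id = queue_id,
		.check_dma = true,	/* default: keep the old check */
	};
	int err;

	assert(pool != NULL && setup != NULL);

	err = setup(&cmd);
	if (err)
		return err;

	if (cmd.check_dma && !pool->dma_pages)
		return -EINVAL;

	return 0;
}
```

In this model, assign_dev_model() with setup_map_per_packet succeeds even
though dma_pages is NULL, while setup_forgot_to_map still fails with -EINVAL,
which is what the previous unconditional check did for every driver.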

Signed-off-by: Xuan Zhuo
Reviewed-by: Dust Li
---
 include/linux/netdevice.h | 1 +
 net/xdp/xsk_buff_pool.c   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f57b70fc251f..47666b5d2dff 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -979,6 +979,7 @@ struct netdev_bpf {
 		struct {
 			struct xsk_buff_pool *pool;
 			u16 queue_id;
+			bool check_dma;
 		} xsk;
 	};
 };
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 8de01aaac4a0..4d3aed73ee3e 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -166,12 +166,13 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
 	bpf.command = XDP_SETUP_XSK_POOL;
 	bpf.xsk.pool = pool;
 	bpf.xsk.queue_id = queue_id;
+	bpf.xsk.check_dma = true;
 
 	err = netdev->netdev_ops->ndo_bpf(netdev, &bpf);
 	if (err)
 		goto err_unreg_pool;
 
-	if (!pool->dma_pages) {
+	if (bpf.xsk.check_dma && !pool->dma_pages) {
 		WARN(1, "Driver did not DMA map zero-copy buffers");
 		err = -EINVAL;
 		goto err_unreg_xsk;

From patchwork Wed Mar 31 07:11:35 2021
X-Patchwork-Submitter: Xuan Zhuo
X-Patchwork-Id: 413412
From: Xuan Zhuo
To: netdev@vger.kernel.org
Cc: "Michael S. Tsirkin", Jason Wang, "David S. Miller", Jakub Kicinski,
 Björn Töpel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
 Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
 virtualization@lists.linux-foundation.org, bpf@vger.kernel.org, Dust Li
Subject: [PATCH net-next v3 4/8] virtio-net: xsk zero copy xmit implement wakeup and xmit
Date: Wed, 31 Mar 2021 15:11:35 +0800
Message-Id: <20210331071139.15473-5-xuanzhuo@linux.alibaba.com>
X-Mailer: git-send-email 2.31.0
In-Reply-To: <20210331071139.15473-1-xuanzhuo@linux.alibaba.com>
References: <20210331071139.15473-1-xuanzhuo@linux.alibaba.com>
X-Mailing-List: netdev@vger.kernel.org

When the user calls sendto() to consume the data in the xsk tx queue,
virtnet_xsk_wakeup() is called. The wakeup path tries to send part of the
data directly; the number of packets is limited by the module parameter
xsk_budget (default 32).

This implementation serves two purposes:

1. Send part of the data quickly to reduce the transmission delay of the
   first packet.
2. Trigger the tx interrupt, which starts napi to consume the remaining
   xsk tx data.

All sent xsk packets share the single virtio-net header xsk_hdr. If xsk
needs to support csum and other features later, consider allocating a
separate header for each sent packet.

Signed-off-by: Xuan Zhuo
Reviewed-by: Dust Li
---
 drivers/net/virtio_net.c | 183 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 183 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4e25408a2b37..c8a317a93ef7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -28,9 +28,11 @@ static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
 static bool csum = true, gso = true, napi_tx = true;
+static int xsk_budget = 32;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
+module_param(xsk_budget, int, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -47,6 +49,8 @@ module_param(napi_tx, bool, 0644);
 
 #define VIRTIO_XDP_FLAG	BIT(0)
 
+static struct virtio_net_hdr_mrg_rxbuf xsk_hdr;
+
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -138,6 +142,9 @@ struct send_queue {
 	struct {
 		/* xsk pool */
 		struct xsk_buff_pool __rcu *pool;
+
+		/* save the desc for next xmit, when xmit fail. */
+		struct xdp_desc last_desc;
 	} xsk;
 };
 
@@ -2532,6 +2539,179 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	return err;
 }
 
+static void virtnet_xsk_check_space(struct send_queue *sq)
+{
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct net_device *dev = vi->dev;
+	int qnum = sq - vi->sq;
+
+	/* If this sq is not the exclusive queue of the current cpu,
+	 * then it may be called by start_xmit, so check it running out
+	 * of space.
+	 */
+	if (is_xdp_raw_buffer_queue(vi, qnum))
+		return;
+
+	/* Stop the queue to avoid getting packets that we are
+	 * then unable to transmit. Then wait the tx interrupt.
+	 */
+	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS)
+		netif_stop_subqueue(dev, qnum);
+}
+
+static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
+			    struct xdp_desc *desc)
+{
+	struct virtnet_info *vi;
+	struct page *page;
+	void *data;
+	u32 offset;
+	u64 addr;
+	int err;
+
+	vi = sq->vq->vdev->priv;
+	addr = desc->addr;
+	data = xsk_buff_raw_get_data(pool, addr);
+	offset = offset_in_page(data);
+
+	sg_init_table(sq->sg, 2);
+	sg_set_buf(sq->sg, &xsk_hdr, vi->hdr_len);
+	page = xsk_buff_xdp_get_page(pool, addr);
+	sg_set_page(sq->sg + 1, page, desc->len, offset);
+
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, 2, NULL, GFP_ATOMIC);
+	if (unlikely(err))
+		sq->xsk.last_desc = *desc;
+
+	return err;
+}
+
+static int virtnet_xsk_xmit_batch(struct send_queue *sq,
+				  struct xsk_buff_pool *pool,
+				  unsigned int budget,
+				  bool in_napi, int *done)
+{
+	struct xdp_desc desc;
+	int err, packet = 0;
+	int ret = -EAGAIN;
+
+	if (sq->xsk.last_desc.addr) {
+		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
+		if (unlikely(err))
+			return -EBUSY;
+
+		++packet;
+		--budget;
+		sq->xsk.last_desc.addr = 0;
+	}
+
+	while (budget-- > 0) {
+		if (sq->vq->num_free < 2 + MAX_SKB_FRAGS) {
+			ret = -EBUSY;
+			break;
+		}
+
+		if (!xsk_tx_peek_desc(pool, &desc)) {
+			/* done */
+			ret = 0;
+			break;
+		}
+
+		err = virtnet_xsk_xmit(sq, pool, &desc);
+		if (unlikely(err)) {
+			ret = -EBUSY;
+			break;
+		}
+
+		++packet;
+	}
+
+	if (packet) {
+		if (virtqueue_kick_prepare(sq->vq) &&
+		    virtqueue_notify(sq->vq)) {
+			u64_stats_update_begin(&sq->stats.syncp);
+			sq->stats.kicks += 1;
+			u64_stats_update_end(&sq->stats.syncp);
+		}
+
+		*done = packet;
+
+		xsk_tx_release(pool);
+	}
+
+	return ret;
+}
+
+static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
+			   int budget, bool in_napi)
+{
+	int done = 0;
+	int err;
+
+	free_old_xmit_skbs(sq, in_napi);
+
+	err = virtnet_xsk_xmit_batch(sq, pool, budget, in_napi, &done);
+	/* -EAGAIN: done == budget
+	 * -EBUSY: done < budget
+	 * 0 : done < budget
+	 */
+	if (err == -EBUSY) {
+		free_old_xmit_skbs(sq, in_napi);
+
+		/* If the space is enough, let napi run again. */
+		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+			done = budget;
+	}
+
+	virtnet_xsk_check_space(sq);
+
+	return done;
+}
+
+static int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct xsk_buff_pool *pool;
+	struct netdev_queue *txq;
+	struct send_queue *sq;
+
+	if (!netif_running(dev))
+		return -ENETDOWN;
+
+	if (qid >= vi->curr_queue_pairs)
+		return -EINVAL;
+
+	sq = &vi->sq[qid];
+
+	rcu_read_lock();
+
+	pool = rcu_dereference(sq->xsk.pool);
+	if (!pool)
+		goto end;
+
+	if (napi_if_scheduled_mark_missed(&sq->napi))
+		goto end;
+
+	txq = netdev_get_tx_queue(dev, qid);
+
+	__netif_tx_lock_bh(txq);
+
+	/* Send part of the packet directly to reduce the delay in sending the
+	 * packet, and this can actively trigger the tx interrupts.
+	 *
+	 * If no packet is sent out, the ring of the device is full. In this
+	 * case, we will still get a tx interrupt response. Then we will deal
+	 * with the subsequent packet sending work.
+	 */
+	virtnet_xsk_run(sq, pool, xsk_budget, false);
+
+	__netif_tx_unlock_bh(txq);
+
+end:
+	rcu_read_unlock();
+	return 0;
+}
+
 static int virtnet_xsk_pool_enable(struct net_device *dev,
 				   struct xsk_buff_pool *pool,
 				   u16 qid)
@@ -2553,6 +2733,8 @@ static int virtnet_xsk_pool_enable(struct net_device *dev,
 	if (rcu_dereference(sq->xsk.pool))
 		goto end;
 
+	memset(&sq->xsk, 0, sizeof(sq->xsk));
+
 	/* Here is already protected by rtnl_lock, so rcu_assign_pointer is
 	 * safe.
 	 */
@@ -2656,6 +2838,7 @@ static const struct net_device_ops virtnet_netdev = {
 	.ndo_vlan_rx_kill_vid = virtnet_vlan_rx_kill_vid,
 	.ndo_bpf		= virtnet_xdp,
 	.ndo_xdp_xmit		= virtnet_xdp_xmit,
+	.ndo_xsk_wakeup		= virtnet_xsk_wakeup,
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_phys_port_name	= virtnet_get_phys_port_name,
 	.ndo_set_features	= virtnet_set_features,

From patchwork Wed Mar 31 07:11:36 2021
X-Patchwork-Submitter: Xuan Zhuo
X-Patchwork-Id: 413411
From: Xuan Zhuo
To: netdev@vger.kernel.org
Cc: "Michael S. Tsirkin", Jason Wang, "David S. Miller", Jakub Kicinski,
 Björn Töpel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
 Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
 virtualization@lists.linux-foundation.org, bpf@vger.kernel.org, Dust Li
Subject: [PATCH net-next v3 5/8] virtio-net: xsk zero copy xmit support xsk unaligned mode
Date: Wed, 31 Mar 2021 15:11:36 +0800
Message-Id: <20210331071139.15473-6-xuanzhuo@linux.alibaba.com>
X-Mailer: git-send-email 2.31.0
In-Reply-To: <20210331071139.15473-1-xuanzhuo@linux.alibaba.com>
References: <20210331071139.15473-1-xuanzhuo@linux.alibaba.com>
X-Mailing-List: netdev@vger.kernel.org

In xsk unaligned mode, the frame pointed to by desc may span two
consecutive pages, but never more than two pages.

Signed-off-by: Xuan Zhuo
Reviewed-by: Dust Li
---
 drivers/net/virtio_net.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c8a317a93ef7..259fafcf6028 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2562,24 +2562,42 @@ static void virtnet_xsk_check_space(struct send_queue *sq)
 static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 			    struct xdp_desc *desc)
 {
+	u32 offset, n, i, copy, copied;
 	struct virtnet_info *vi;
 	struct page *page;
 	void *data;
-	u32 offset;
+	int err, m;
 	u64 addr;
-	int err;
 
 	vi = sq->vq->vdev->priv;
 	addr = desc->addr;
+
 	data = xsk_buff_raw_get_data(pool, addr);
+
 	offset = offset_in_page(data);
+	m = desc->len - (PAGE_SIZE - offset);
+	/* xsk unaligned mode, desc will use two page */
+	if (m > 0)
+		n = 3;
+	else
+		n = 2;
 
-	sg_init_table(sq->sg, 2);
+	sg_init_table(sq->sg, n);
 	sg_set_buf(sq->sg, &xsk_hdr, vi->hdr_len);
-	page = xsk_buff_xdp_get_page(pool, addr);
-	sg_set_page(sq->sg + 1, page, desc->len, offset);
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, 2, NULL, GFP_ATOMIC);
+	copied = 0;
+	for (i = 1; i < n; ++i) {
+		copy = min_t(int, desc->len - copied, PAGE_SIZE - offset);
+
+		page = xsk_buff_xdp_get_page(pool, addr + copied);
+
+		sg_set_page(sq->sg + i, page, copy, offset);
+		copied += copy;
+		if (offset)
+			offset = 0;
+	}
+
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, NULL, GFP_ATOMIC);
 	if (unlikely(err))
 		sq->xsk.last_desc = *desc;
 

From patchwork Wed Mar 31 07:11:39 2021
X-Patchwork-Submitter: Xuan Zhuo
X-Patchwork-Id: 413410
From: Xuan Zhuo
To: netdev@vger.kernel.org
Cc: "Michael S. Tsirkin", Jason Wang, "David S. Miller", Jakub Kicinski,
 Björn Töpel, Magnus Karlsson, Jonathan Lemon, Alexei Starovoitov,
 Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
 virtualization@lists.linux-foundation.org, bpf@vger.kernel.org, Dust Li
Subject: [PATCH net-next v3 8/8] virtio-net: free old xmit handle xsk
Date: Wed, 31 Mar 2021 15:11:39 +0800
Message-Id: <20210331071139.15473-9-xuanzhuo@linux.alibaba.com>
X-Mailer: git-send-email 2.31.0
In-Reply-To: <20210331071139.15473-1-xuanzhuo@linux.alibaba.com>
References: <20210331071139.15473-1-xuanzhuo@linux.alibaba.com>
X-Mailing-List: netdev@vger.kernel.org

The type of a completed buffer is encoded in the low two bits of the ptr
returned by virtqueue_get_buf(): 01 is a packet sent by xdp, 10 is a packet
sent by xsk, and 00 is an skb (the default).

If the xsk xmit work has not been completed but the ring is full, napi must
exit first and wait for the ring to become available, so need_wakeup is
set. If __free_old_xmit() is then called first by start_xmit, napi can be
woken up quickly to continue the xsk xmit work.

When recycling, we need to count the number of bytes sent, so the xsk
desc->len is stored in ptr itself; for xsk, ptr does not point to a
meaningful object.
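
The tagging scheme described above can be sketched in user space as follows.
This is an illustration under stated assumptions, not the driver code: the
TAG_* and XSK_LEN_SHIFT macros mirror VIRTIO_XDP_FLAG, VIRTIO_XSK_FLAG and
VIRTIO_XSK_PTR_SHIFT from the diff below, while the function and enum names
are hypothetical. It uses uintptr_t and widens the length before shifting,
which is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_XDP		((uintptr_t)1 << 0)	/* mirrors VIRTIO_XDP_FLAG */
#define TAG_XSK		((uintptr_t)1 << 1)	/* mirrors VIRTIO_XSK_FLAG */
#define XSK_LEN_SHIFT	4			/* mirrors VIRTIO_XSK_PTR_SHIFT */

enum tok_kind { TOK_SKB, TOK_XDP, TOK_XSK };

/* Tag a (suitably aligned) xdp frame pointer with bit 0. */
static void *xdp_to_token(void *frame)
{
	return (void *)((uintptr_t)frame | TAG_XDP);
}

/* Recover the original frame pointer by clearing bit 0. */
static void *token_to_xdp(void *tok)
{
	return (void *)((uintptr_t)tok & ~TAG_XDP);
}

/* For xsk there is no object to point to, so the token carries the
 * descriptor length in the upper bits and the xsk tag in bit 1.
 */
static void *xsk_len_to_token(uint32_t len)
{
	uintptr_t p = (uintptr_t)len;

	/* the shifted length must still fit in a pointer-sized integer */
	assert(p <= (UINTPTR_MAX >> XSK_LEN_SHIFT));

	return (void *)((p << XSK_LEN_SHIFT) | TAG_XSK);
}

static uint32_t token_to_xsk_len(void *tok)
{
	return (uint32_t)((uintptr_t)tok >> XSK_LEN_SHIFT);
}

/* Same order of checks as the completion loop: untagged means skb,
 * bit 0 means xdp frame, anything else is an xsk token.
 */
static enum tok_kind classify_token(void *tok)
{
	uintptr_t bits = (uintptr_t)tok & (TAG_XDP | TAG_XSK);

	if (bits == 0)
		return TOK_SKB;
	if (bits & TAG_XDP)
		return TOK_XDP;
	return TOK_XSK;
}
```

A side effect of always setting the xsk tag bit is that the token is non-NULL
even for a zero-length descriptor, so a completion loop that stops on a NULL
token would not stop early on it.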

Signed-off-by: Xuan Zhuo
Reviewed-by: Dust Li
Reported-by: kernel test robot
---
 drivers/net/virtio_net.c | 171 ++++++++++++++++++++++++++-------------
 1 file changed, 113 insertions(+), 58 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fac7d0020013..8318b89b2971 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -50,6 +50,9 @@ module_param(xsk_kick_thr, int, 0644);
 #define VIRTIO_XDP_REDIR	BIT(1)
 
 #define VIRTIO_XDP_FLAG	BIT(0)
+#define VIRTIO_XSK_FLAG	BIT(1)
+
+#define VIRTIO_XSK_PTR_SHIFT 4
 
 static struct virtio_net_hdr_mrg_rxbuf xsk_hdr;
 
@@ -147,6 +150,9 @@ struct send_queue {
 
 		/* save the desc for next xmit, when xmit fail. */
 		struct xdp_desc last_desc;
+
+		/* xsk wait for tx inter or softirq */
+		bool need_wakeup;
 	} xsk;
 };
 
@@ -266,6 +272,12 @@ struct padded_vnet_hdr {
 
 static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
 			   int budget, bool in_napi);
+static void virtnet_xsk_complete(struct send_queue *sq, u32 num);
+
+static bool is_skb_ptr(void *ptr)
+{
+	return !((unsigned long)ptr & (VIRTIO_XDP_FLAG | VIRTIO_XSK_FLAG));
+}
 
 static bool is_xdp_frame(void *ptr)
 {
@@ -277,11 +289,58 @@ static void *xdp_to_ptr(struct xdp_frame *ptr)
 	return (void *)((unsigned long)ptr | VIRTIO_XDP_FLAG);
 }
 
+static void *xsk_to_ptr(struct xdp_desc *desc)
+{
+	/* save the desc len to ptr */
+	u64 p = desc->len << VIRTIO_XSK_PTR_SHIFT;
+
+	return (void *)(p | VIRTIO_XSK_FLAG);
+}
+
+static void ptr_to_xsk(void *ptr, struct xdp_desc *desc)
+{
+	desc->len = ((u64)ptr) >> VIRTIO_XSK_PTR_SHIFT;
+}
+
 static struct xdp_frame *ptr_to_xdp(void *ptr)
 {
 	return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
 }
 
+static void __free_old_xmit(struct send_queue *sq, bool in_napi,
+			    struct virtnet_sq_stats *stats)
+{
+	unsigned int xsknum = 0;
+	unsigned int len;
+	void *ptr;
+
+	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		if (is_skb_ptr(ptr)) {
+			struct sk_buff *skb = ptr;
+
+			pr_debug("Sent skb %p\n", skb);
+
+			stats->bytes += skb->len;
+			napi_consume_skb(skb, in_napi);
+		} else if (is_xdp_frame(ptr)) {
+			struct xdp_frame *frame = ptr_to_xdp(ptr);
+
+			stats->bytes += frame->len;
+			xdp_return_frame(frame);
+		} else {
+			struct xdp_desc desc;
+
+			ptr_to_xsk(ptr, &desc);
+			stats->bytes += desc.len;
+			++xsknum;
+		}
+		stats->packets++;
+	}
+
+	if (xsknum)
+		virtnet_xsk_complete(sq, xsknum);
+}
+
 /* Converting between virtqueue no. and kernel tx/rx queue no.
  * 0:rx0 1:tx0 2:rx1 3:tx1 ... 2N:rxN 2N+1:txN 2N+2:cvq
  */
@@ -543,15 +602,12 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 			    int n, struct xdp_frame **frames, u32 flags)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
+	struct virtnet_sq_stats stats = {};
 	struct receive_queue *rq = vi->rq;
 	struct bpf_prog *xdp_prog;
 	struct send_queue *sq;
-	unsigned int len;
-	int packets = 0;
-	int bytes = 0;
 	int nxmit = 0;
 	int kicks = 0;
-	void *ptr;
 	int ret;
 	int i;
 
@@ -570,20 +626,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	}
 
 	/* Free up any pending old buffers before queueing new ones. */
-	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		if (likely(is_xdp_frame(ptr))) {
-			struct xdp_frame *frame = ptr_to_xdp(ptr);
-
-			bytes += frame->len;
-			xdp_return_frame(frame);
-		} else {
-			struct sk_buff *skb = ptr;
-
-			bytes += skb->len;
-			napi_consume_skb(skb, false);
-		}
-		packets++;
-	}
+	__free_old_xmit(sq, false, &stats);
 
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdpf = frames[i];
@@ -600,8 +643,8 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	}
 out:
 	u64_stats_update_begin(&sq->stats.syncp);
-	sq->stats.bytes += bytes;
-	sq->stats.packets += packets;
+	sq->stats.bytes += stats.bytes;
+	sq->stats.packets += stats.packets;
 	sq->stats.xdp_tx += n;
 	sq->stats.xdp_tx_drops += n - nxmit;
 	sq->stats.kicks += kicks;
@@ -1426,37 +1469,19 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 
 static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 {
-	unsigned int len;
-	unsigned int packets = 0;
-	unsigned int bytes = 0;
-	void *ptr;
-
-	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		if (likely(!is_xdp_frame(ptr))) {
-			struct sk_buff *skb = ptr;
+	struct virtnet_sq_stats stats = {};
 
-			pr_debug("Sent skb %p\n", skb);
-
-			bytes += skb->len;
-			napi_consume_skb(skb, in_napi);
-		} else {
-			struct xdp_frame *frame = ptr_to_xdp(ptr);
-
-			bytes += frame->len;
-			xdp_return_frame(frame);
-		}
-		packets++;
-	}
+	__free_old_xmit(sq, in_napi, &stats);
 
 	/* Avoid overhead when no packets have been processed
 	 * happens when called speculatively from start_xmit.
 	 */
-	if (!packets)
+	if (!stats.packets)
 		return;
 
 	u64_stats_update_begin(&sq->stats.syncp);
-	sq->stats.bytes += bytes;
-	sq->stats.packets += packets;
+	sq->stats.bytes += stats.bytes;
+	sq->stats.packets += stats.packets;
 	u64_stats_update_end(&sq->stats.syncp);
 }
 
@@ -2575,6 +2600,28 @@ static void virtnet_xsk_check_space(struct send_queue *sq)
 		netif_stop_subqueue(dev, qnum);
 }
 
+static void virtnet_xsk_complete(struct send_queue *sq, u32 num)
+{
+	struct xsk_buff_pool *pool;
+
+	rcu_read_lock();
+
+	pool = rcu_dereference(sq->xsk.pool);
+	if (!pool) {
+		rcu_read_unlock();
+		return;
+	}
+
+	xsk_tx_completed(pool, num);
+
+	rcu_read_unlock();
+
+	if (sq->xsk.need_wakeup) {
+		sq->xsk.need_wakeup = false;
+		virtqueue_napi_schedule(&sq->napi, sq->vq);
+	}
+}
+
 static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 			    struct xdp_desc *desc)
 {
@@ -2613,7 +2660,8 @@ static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 			offset = 0;
 	}
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, NULL, GFP_ATOMIC);
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, n, xsk_to_ptr(desc),
+				   GFP_ATOMIC);
 	if (unlikely(err))
 		sq->xsk.last_desc = *desc;
 
@@ -2623,13 +2671,13 @@ static int virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
 static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 				  struct xsk_buff_pool *pool,
 				  unsigned int budget,
-				  bool in_napi, int *done)
+				  bool in_napi, int *done,
+				  struct virtnet_sq_stats *stats)
 {
 	struct xdp_desc desc;
 	int err, packet = 0;
 	int ret = -EAGAIN;
 	int need_kick = 0;
-	int kicks = 0;
 
 	if (sq->xsk.last_desc.addr) {
 		err = virtnet_xsk_xmit(sq, pool, &sq->xsk.last_desc);
@@ -2665,7 +2713,7 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 		if (need_kick > xsk_kick_thr) {
 			if (virtqueue_kick_prepare(sq->vq) &&
 			    virtqueue_notify(sq->vq))
-				++kicks;
+				++stats->kicks;
 
 			need_kick = 0;
 		}
@@ -2675,15 +2723,11 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 		if (need_kick) {
 			if (virtqueue_kick_prepare(sq->vq) &&
 			    virtqueue_notify(sq->vq))
-				++kicks;
-		}
-		if (kicks) {
-			u64_stats_update_begin(&sq->stats.syncp);
-			sq->stats.kicks += kicks;
-			u64_stats_update_end(&sq->stats.syncp);
+				++stats->kicks;
 		}
 
 		*done = packet;
+		stats->xdp_tx += packet;
 
 		xsk_tx_release(pool);
 	}
@@ -2694,26 +2738,37 @@ static int virtnet_xsk_xmit_batch(struct send_queue *sq,
 static int virtnet_xsk_run(struct send_queue *sq, struct xsk_buff_pool *pool,
 			   int budget, bool in_napi)
 {
+	struct virtnet_sq_stats stats = {};
 	int done = 0;
 	int err;
 
-	free_old_xmit_skbs(sq, in_napi);
+	sq->xsk.need_wakeup = false;
+	__free_old_xmit(sq, in_napi, &stats);
 
-	err = virtnet_xsk_xmit_batch(sq, pool, budget, in_napi, &done);
+	err = virtnet_xsk_xmit_batch(sq, pool, budget, in_napi, &done, &stats);
 	/* -EAGAIN: done == budget
 	 * -EBUSY: done < budget
 	 * 0 : done < budget
 	 */
 	if (err == -EBUSY) {
-		free_old_xmit_skbs(sq, in_napi);
+		__free_old_xmit(sq, in_napi, &stats);
 
 		/* If the space is enough, let napi run again. */
 		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
 			done = budget;
+		else
+			sq->xsk.need_wakeup = true;
 	}
 
 	virtnet_xsk_check_space(sq);
 
+	u64_stats_update_begin(&sq->stats.syncp);
+	sq->stats.packets += stats.packets;
+	sq->stats.bytes += stats.bytes;
+	sq->stats.kicks += stats.kicks;
+	sq->stats.xdp_tx += stats.xdp_tx;
+	u64_stats_update_end(&sq->stats.syncp);
+
 	return done;
 }
 
@@ -2991,9 +3046,9 @@ static void free_unused_bufs(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
 		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-			if (!is_xdp_frame(buf))
+			if (is_skb_ptr(buf))
 				dev_kfree_skb(buf);
-			else
+			else if (is_xdp_frame(buf))
 				xdp_return_frame(ptr_to_xdp(buf));
 		}
 	}