From patchwork Fri Aug 28 08:26:16 2020
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 261807
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com
Subject: [PATCH bpf-next v5 02/15] xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces
Date: Fri, 28 Aug 2020 10:26:16 +0200
Message-Id: <1598603189-32145-3-git-send-email-magnus.karlsson@intel.com>
In-Reply-To: <1598603189-32145-1-git-send-email-magnus.karlsson@intel.com>
References: <1598603189-32145-1-git-send-email-magnus.karlsson@intel.com>

Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. Mostly this amounts to replacing the umem name in the function names with xsk_buff and having the functions take a buffer pool pointer instead of a umem. The various ring functions have also been renamed in the process so that they follow the same naming convention as the internal functions in xsk_queue.h, which makes it clearer what they do and keeps the naming consistent.
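As an editorial sketch of what the rename means at a driver call site (not part of the patch): struct my_ring and my_clean_tx_done() below are hypothetical stand-ins for a driver ring and its Tx completion routine, while the xsk_* calls on both sides of the comments are the interfaces renamed by this patch.

	#include <net/xdp_sock_drv.h>

	struct my_ring {		/* hypothetical, stands in for e.g. struct i40e_ring */
		struct xsk_buff_pool *xsk_pool;
	};

	static void my_clean_tx_done(struct my_ring *ring, u32 xsk_frames)
	{
		/* was: xsk_umem_complete_tx(ring->xsk_pool->umem, xsk_frames); */
		if (xsk_frames)
			xsk_tx_completed(ring->xsk_pool, xsk_frames);

		/* was: xsk_umem_uses_need_wakeup() and xsk_set_tx_need_wakeup()
		 * on ring->xsk_pool->umem; both now take the pool directly.
		 */
		if (xsk_uses_need_wakeup(ring->xsk_pool))
			xsk_set_tx_need_wakeup(ring->xsk_pool);
	}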
Signed-off-by: Magnus Karlsson Acked-by: Björn Töpel --- drivers/net/ethernet/intel/i40e/i40e_main.c | 6 +- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 34 +++--- drivers/net/ethernet/intel/ice/ice_base.c | 6 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 28 ++--- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +- drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 32 +++--- drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 4 +- .../net/ethernet/mellanox/mlx5/core/en/xsk/pool.c | 12 +-- .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h | 8 +- .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 10 +- .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h | 6 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 +- include/net/xdp_sock.h | 1 + include/net/xdp_sock_drv.h | 114 +++++++++++---------- net/ethtool/channels.c | 2 +- net/ethtool/ioctl.c | 2 +- net/xdp/xdp_umem.c | 24 ++--- net/xdp/xsk.c | 45 ++++---- 19 files changed, 179 insertions(+), 167 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index cbf2a44..05c6d3e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3138,7 +3138,7 @@ static struct xsk_buff_pool *i40e_xsk_pool(struct i40e_ring *ring) if (!xdp_on || !test_bit(qid, ring->vsi->af_xdp_zc_qps)) return NULL; - return xdp_get_xsk_pool_from_qid(ring->vsi->netdev, qid); + return xsk_get_pool_from_qid(ring->vsi->netdev, qid); } /** @@ -3286,7 +3286,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) if (ret) return ret; ring->rx_buf_len = - xsk_umem_get_rx_frame_size(ring->xsk_pool->umem); + xsk_pool_get_rx_frame_size(ring->xsk_pool); /* For AF_XDP ZC, we disallow packets to span on * multiple buffers, thus letting us skip that * handling in the fast-path. 
@@ -3370,7 +3370,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) writel(0, ring->tail); if (ring->xsk_pool) { - xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq); + xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq); ok = i40e_alloc_rx_buffers_zc(ring, I40E_DESC_UNUSED(ring)); } else { ok = !i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring)); diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 00e9fe6..95b9a7e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -55,8 +55,7 @@ static int i40e_xsk_pool_enable(struct i40e_vsi *vsi, qid >= netdev->real_num_tx_queues) return -EINVAL; - err = xsk_buff_dma_map(pool->umem, &vsi->back->pdev->dev, - I40E_RX_DMA_ATTR); + err = xsk_pool_dma_map(pool, &vsi->back->pdev->dev, I40E_RX_DMA_ATTR); if (err) return err; @@ -97,7 +96,7 @@ static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid) bool if_running; int err; - pool = xdp_get_xsk_pool_from_qid(netdev, qid); + pool = xsk_get_pool_from_qid(netdev, qid); if (!pool) return -EINVAL; @@ -110,7 +109,7 @@ static int i40e_xsk_pool_disable(struct i40e_vsi *vsi, u16 qid) } clear_bit(qid, vsi->af_xdp_zc_qps); - xsk_buff_dma_unmap(pool->umem, I40E_RX_DMA_ATTR); + xsk_pool_dma_unmap(pool, I40E_RX_DMA_ATTR); if (if_running) { err = i40e_queue_pair_enable(vsi, qid); @@ -196,7 +195,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count) rx_desc = I40E_RX_DESC(rx_ring, ntu); bi = i40e_rx_bi(rx_ring, ntu); do { - xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem); + xdp = xsk_buff_alloc(rx_ring->xsk_pool); if (!xdp) { ok = false; goto no_buffers; @@ -363,11 +362,11 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) i40e_finalize_xdp_rx(rx_ring, xdp_xmit); i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets); - if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) { + if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) { if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_set_rx_need_wakeup(rx_ring->xsk_pool); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_clear_rx_need_wakeup(rx_ring->xsk_pool); return (int)total_rx_packets; } @@ -390,12 +389,11 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget) dma_addr_t dma; while (budget-- > 0) { - if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc)) + if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc)) break; - dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem, - desc.addr); - xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma, + dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr); + xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc.len); tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use]; @@ -422,7 +420,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget) I40E_TXD_QW1_CMD_SHIFT); i40e_xdp_ring_update_tail(xdp_ring); - xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem); + xsk_tx_release(xdp_ring->xsk_pool); i40e_update_tx_stats(xdp_ring, sent_frames, total_bytes); } @@ -494,13 +492,13 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring) tx_ring->next_to_clean -= tx_ring->count; if (xsk_frames) - xsk_umem_complete_tx(bp->umem, xsk_frames); + xsk_tx_completed(bp, xsk_frames); i40e_arm_wb(tx_ring, vsi, completed_frames); out_xmit: - if (xsk_umem_uses_need_wakeup(tx_ring->xsk_pool->umem)) - 
xsk_set_tx_need_wakeup(tx_ring->xsk_pool->umem); + if (xsk_uses_need_wakeup(tx_ring->xsk_pool)) + xsk_set_tx_need_wakeup(tx_ring->xsk_pool); return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring)); } @@ -591,7 +589,7 @@ void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring) } if (xsk_frames) - xsk_umem_complete_tx(bp->umem, xsk_frames); + xsk_tx_completed(bp, xsk_frames); } /** @@ -607,7 +605,7 @@ bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi) int i; for (i = 0; i < vsi->num_queue_pairs; i++) { - if (xdp_get_xsk_pool_from_qid(netdev, i)) + if (xsk_get_pool_from_qid(netdev, i)) return true; } diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 3c92448..fe4320e 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -313,7 +313,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); ring->rx_buf_len = - xsk_umem_get_rx_frame_size(ring->xsk_pool->umem); + xsk_pool_get_rx_frame_size(ring->xsk_pool); /* For AF_XDP ZC, we disallow packets to span on * multiple buffers, thus letting us skip that * handling in the fast-path. @@ -324,7 +324,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) NULL); if (err) return err; - xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq); + xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq); dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n", ring->q_index); @@ -418,7 +418,7 @@ int ice_setup_rx_ctx(struct ice_ring *ring) writel(0, ring->tail); if (ring->xsk_pool) { - if (!xsk_buff_can_alloc(ring->xsk_pool->umem, num_bufs)) { + if (!xsk_buff_can_alloc(ring->xsk_pool, num_bufs)) { dev_warn(dev, "XSK buffer pool does not provide enough addresses to fill %d buffers on Rx ring %d\n", num_bufs, ring->q_index); dev_warn(dev, "Change Rx ring/fill queue size to avoid performance issues\n"); diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 8b70e8c..dffef37 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -311,7 +311,7 @@ static int ice_xsk_pool_disable(struct ice_vsi *vsi, u16 qid) !vsi->xsk_pools[qid]) return -EINVAL; - xsk_buff_dma_unmap(vsi->xsk_pools[qid]->umem, ICE_RX_DMA_ATTR); + xsk_pool_dma_unmap(vsi->xsk_pools[qid], ICE_RX_DMA_ATTR); ice_xsk_remove_pool(vsi, qid); return 0; @@ -348,7 +348,7 @@ ice_xsk_pool_enable(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid) vsi->xsk_pools[qid] = pool; vsi->num_xsk_pools_used++; - err = xsk_buff_dma_map(vsi->xsk_pools[qid]->umem, ice_pf_to_dev(vsi->back), + err = xsk_pool_dma_map(vsi->xsk_pools[qid], ice_pf_to_dev(vsi->back), ICE_RX_DMA_ATTR); if (err) return err; @@ -425,7 +425,7 @@ bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count) rx_buf = &rx_ring->rx_buf[ntu]; do { - rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem); + rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool); if (!rx_buf->xdp) { ret = true; break; @@ -645,11 +645,11 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget) ice_finalize_xdp_rx(rx_ring, xdp_xmit); ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes); - if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) { + if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) { if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_set_rx_need_wakeup(rx_ring->xsk_pool); else - 
xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_clear_rx_need_wakeup(rx_ring->xsk_pool); return (int)total_rx_packets; } @@ -682,11 +682,11 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget) tx_buf = &xdp_ring->tx_buf[xdp_ring->next_to_use]; - if (!xsk_umem_consume_tx(xdp_ring->xsk_pool->umem, &desc)) + if (!xsk_tx_peek_desc(xdp_ring->xsk_pool, &desc)) break; - dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool->umem, desc.addr); - xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool->umem, dma, + dma = xsk_buff_raw_get_dma(xdp_ring->xsk_pool, desc.addr); + xsk_buff_raw_dma_sync_for_device(xdp_ring->xsk_pool, dma, desc.len); tx_buf->bytecount = desc.len; @@ -703,7 +703,7 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget) if (tx_desc) { ice_xdp_ring_update_tail(xdp_ring); - xsk_umem_consume_tx_done(xdp_ring->xsk_pool->umem); + xsk_tx_release(xdp_ring->xsk_pool); } return budget > 0 && work_done; @@ -777,10 +777,10 @@ bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget) xdp_ring->next_to_clean = ntc; if (xsk_frames) - xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames); + xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames); - if (xsk_umem_uses_need_wakeup(xdp_ring->xsk_pool->umem)) - xsk_set_tx_need_wakeup(xdp_ring->xsk_pool->umem); + if (xsk_uses_need_wakeup(xdp_ring->xsk_pool)) + xsk_set_tx_need_wakeup(xdp_ring->xsk_pool); ice_update_tx_ring_stats(xdp_ring, total_packets, total_bytes); xmit_done = ice_xmit_zc(xdp_ring, ICE_DFLT_IRQ_WORK); @@ -896,5 +896,5 @@ void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring) } if (xsk_frames) - xsk_umem_complete_tx(xdp_ring->xsk_pool->umem, xsk_frames); + xsk_tx_completed(xdp_ring->xsk_pool, xsk_frames); } diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index c498465..c4d41f8 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -3714,7 +3714,7 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter, /* configure the packet buffer length */ if (rx_ring->xsk_pool) { - u32 xsk_buf_len = xsk_umem_get_rx_frame_size(rx_ring->xsk_pool->umem); + u32 xsk_buf_len = xsk_pool_get_rx_frame_size(rx_ring->xsk_pool); /* If the MAC support setting RXDCTL.RLPML, the * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and @@ -4064,7 +4064,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter, WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_XSK_BUFF_POOL, NULL)); - xsk_buff_set_rxq_info(ring->xsk_pool->umem, &ring->xdp_rxq); + xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq); } else { WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_SHARED, NULL)); @@ -4120,7 +4120,7 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter, } if (ring->xsk_pool && hw->mac.type != ixgbe_mac_82599EB) { - u32 xsk_buf_len = xsk_umem_get_rx_frame_size(ring->xsk_pool->umem); + u32 xsk_buf_len = xsk_pool_get_rx_frame_size(ring->xsk_pool); rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK | IXGBE_RXDCTL_RLPML_EN); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index 8ad954f..6af34da 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -17,7 +17,7 @@ struct xsk_buff_pool *ixgbe_xsk_pool(struct ixgbe_adapter *adapter, if (!xdp_on || !test_bit(qid, adapter->af_xdp_zc_qps)) return NULL; - return xdp_get_xsk_pool_from_qid(adapter->netdev, qid); + return 
xsk_get_pool_from_qid(adapter->netdev, qid); } static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter, @@ -35,7 +35,7 @@ static int ixgbe_xsk_pool_enable(struct ixgbe_adapter *adapter, qid >= netdev->real_num_tx_queues) return -EINVAL; - err = xsk_buff_dma_map(pool->umem, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR); + err = xsk_pool_dma_map(pool, &adapter->pdev->dev, IXGBE_RX_DMA_ATTR); if (err) return err; @@ -64,7 +64,7 @@ static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid) struct xsk_buff_pool *pool; bool if_running; - pool = xdp_get_xsk_pool_from_qid(adapter->netdev, qid); + pool = xsk_get_pool_from_qid(adapter->netdev, qid); if (!pool) return -EINVAL; @@ -75,7 +75,7 @@ static int ixgbe_xsk_pool_disable(struct ixgbe_adapter *adapter, u16 qid) ixgbe_txrx_ring_disable(adapter, qid); clear_bit(qid, adapter->af_xdp_zc_qps); - xsk_buff_dma_unmap(pool->umem, IXGBE_RX_DMA_ATTR); + xsk_pool_dma_unmap(pool, IXGBE_RX_DMA_ATTR); if (if_running) ixgbe_txrx_ring_enable(adapter, qid); @@ -150,7 +150,7 @@ bool ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count) i -= rx_ring->count; do { - bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool->umem); + bi->xdp = xsk_buff_alloc(rx_ring->xsk_pool); if (!bi->xdp) { ok = false; break; @@ -345,11 +345,11 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, q_vector->rx.total_packets += total_rx_packets; q_vector->rx.total_bytes += total_rx_bytes; - if (xsk_umem_uses_need_wakeup(rx_ring->xsk_pool->umem)) { + if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) { if (failure || rx_ring->next_to_clean == rx_ring->next_to_use) - xsk_set_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_set_rx_need_wakeup(rx_ring->xsk_pool); else - xsk_clear_rx_need_wakeup(rx_ring->xsk_pool->umem); + xsk_clear_rx_need_wakeup(rx_ring->xsk_pool); return (int)total_rx_packets; } @@ -389,11 +389,11 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget) break; } - if (!xsk_umem_consume_tx(pool->umem, &desc)) + if (!xsk_tx_peek_desc(pool, &desc)) break; - dma = xsk_buff_raw_get_dma(pool->umem, desc.addr); - xsk_buff_raw_dma_sync_for_device(pool->umem, dma, desc.len); + dma = xsk_buff_raw_get_dma(pool, desc.addr); + xsk_buff_raw_dma_sync_for_device(pool, dma, desc.len); tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use]; tx_bi->bytecount = desc.len; @@ -419,7 +419,7 @@ static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget) if (tx_desc) { ixgbe_xdp_ring_update_tail(xdp_ring); - xsk_umem_consume_tx_done(pool->umem); + xsk_tx_release(pool); } return !!budget && work_done; @@ -485,10 +485,10 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector, q_vector->tx.total_packets += total_packets; if (xsk_frames) - xsk_umem_complete_tx(pool->umem, xsk_frames); + xsk_tx_completed(pool, xsk_frames); - if (xsk_umem_uses_need_wakeup(pool->umem)) - xsk_set_tx_need_wakeup(pool->umem); + if (xsk_uses_need_wakeup(pool)) + xsk_set_tx_need_wakeup(pool); return ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit); } @@ -547,5 +547,5 @@ void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring) } if (xsk_frames) - xsk_umem_complete_tx(pool->umem, xsk_frames); + xsk_tx_completed(pool, xsk_frames); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c index c9f0d2b..d521449 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c @@ -445,7 +445,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq) } while ((++i < 
MLX5E_TX_CQ_POLL_BUDGET) && (cqe = mlx5_cqwq_get_cqe(&cq->wq))); if (xsk_frames) - xsk_umem_complete_tx(sq->xsk_pool->umem, xsk_frames); + xsk_tx_completed(sq->xsk_pool, xsk_frames); sq->stats->cqes += i; @@ -475,7 +475,7 @@ void mlx5e_free_xdpsq_descs(struct mlx5e_xdpsq *sq) } if (xsk_frames) - xsk_umem_complete_tx(sq->xsk_pool->umem, xsk_frames); + xsk_tx_completed(sq->xsk_pool, xsk_frames); } int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c index 8ccd920..3503e77 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c @@ -11,13 +11,13 @@ static int mlx5e_xsk_map_pool(struct mlx5e_priv *priv, { struct device *dev = priv->mdev->device; - return xsk_buff_dma_map(pool->umem, dev, 0); + return xsk_pool_dma_map(pool, dev, 0); } static void mlx5e_xsk_unmap_pool(struct mlx5e_priv *priv, struct xsk_buff_pool *pool) { - return xsk_buff_dma_unmap(pool->umem, 0); + return xsk_pool_dma_unmap(pool, 0); } static int mlx5e_xsk_get_pools(struct mlx5e_xsk *xsk) @@ -64,14 +64,14 @@ static void mlx5e_xsk_remove_pool(struct mlx5e_xsk *xsk, u16 ix) static bool mlx5e_xsk_is_pool_sane(struct xsk_buff_pool *pool) { - return xsk_umem_get_headroom(pool->umem) <= 0xffff && - xsk_umem_get_chunk_size(pool->umem) <= 0xffff; + return xsk_pool_get_headroom(pool) <= 0xffff && + xsk_pool_get_chunk_size(pool) <= 0xffff; } void mlx5e_build_xsk_param(struct xsk_buff_pool *pool, struct mlx5e_xsk_param *xsk) { - xsk->headroom = xsk_umem_get_headroom(pool->umem); - xsk->chunk_size = xsk_umem_get_chunk_size(pool->umem); + xsk->headroom = xsk_pool_get_headroom(pool); + xsk->chunk_size = xsk_pool_get_chunk_size(pool); } static int mlx5e_xsk_enable_locked(struct mlx5e_priv *priv, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h index 3dd056a..7f88ccf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.h @@ -22,7 +22,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq, static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info) { - dma_info->xsk = xsk_buff_alloc(rq->xsk_pool->umem); + dma_info->xsk = xsk_buff_alloc(rq->xsk_pool); if (!dma_info->xsk) return -ENOMEM; @@ -38,13 +38,13 @@ static inline int mlx5e_xsk_page_alloc_pool(struct mlx5e_rq *rq, static inline bool mlx5e_xsk_update_rx_wakeup(struct mlx5e_rq *rq, bool alloc_err) { - if (!xsk_umem_uses_need_wakeup(rq->xsk_pool->umem)) + if (!xsk_uses_need_wakeup(rq->xsk_pool)) return alloc_err; if (unlikely(alloc_err)) - xsk_set_rx_need_wakeup(rq->xsk_pool->umem); + xsk_set_rx_need_wakeup(rq->xsk_pool); else - xsk_clear_rx_need_wakeup(rq->xsk_pool->umem); + xsk_clear_rx_need_wakeup(rq->xsk_pool); return false; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c index 5f94702..aa91cbd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.c @@ -87,7 +87,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) break; } - if (!xsk_umem_consume_tx(pool->umem, &desc)) { + if (!xsk_tx_peek_desc(pool, &desc)) { /* TX will get stuck until something wakes it up by * triggering NAPI. 
Currently it's expected that the * application calls sendto() if there are consumed, but @@ -96,11 +96,11 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) break; } - xdptxd.dma_addr = xsk_buff_raw_get_dma(pool->umem, desc.addr); - xdptxd.data = xsk_buff_raw_get_data(pool->umem, desc.addr); + xdptxd.dma_addr = xsk_buff_raw_get_dma(pool, desc.addr); + xdptxd.data = xsk_buff_raw_get_data(pool, desc.addr); xdptxd.len = desc.len; - xsk_buff_raw_dma_sync_for_device(pool->umem, xdptxd.dma_addr, xdptxd.len); + xsk_buff_raw_dma_sync_for_device(pool, xdptxd.dma_addr, xdptxd.len); ret = INDIRECT_CALL_2(sq->xmit_xdp_frame, mlx5e_xmit_xdp_frame_mpwqe, mlx5e_xmit_xdp_frame, sq, &xdptxd, &xdpi, check_result); @@ -119,7 +119,7 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget) mlx5e_xdp_mpwqe_complete(sq); mlx5e_xmit_xdp_doorbell(sq); - xsk_umem_consume_tx_done(pool->umem); + xsk_tx_release(pool); } return !(budget && work_done); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h index ddb61d5..a050850 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/tx.h @@ -15,13 +15,13 @@ bool mlx5e_xsk_tx(struct mlx5e_xdpsq *sq, unsigned int budget); static inline void mlx5e_xsk_update_tx_wakeup(struct mlx5e_xdpsq *sq) { - if (!xsk_umem_uses_need_wakeup(sq->xsk_pool->umem)) + if (!xsk_uses_need_wakeup(sq->xsk_pool)) return; if (sq->pc != sq->cc) - xsk_clear_tx_need_wakeup(sq->xsk_pool->umem); + xsk_clear_tx_need_wakeup(sq->xsk_pool); else - xsk_set_tx_need_wakeup(sq->xsk_pool->umem); + xsk_set_tx_need_wakeup(sq->xsk_pool); } #endif /* __MLX5_EN_XSK_TX_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index f598683..2683462 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -477,7 +477,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, if (xsk) { err = xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_XSK_BUFF_POOL, NULL); - xsk_buff_set_rxq_info(rq->xsk_pool->umem, &rq->xdp_rxq); + xsk_pool_set_rxq_info(rq->xsk_pool, &rq->xdp_rxq); } else { /* Create a page_pool and register it with rxq */ pp_params.order = 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index b33f9f2..57034c5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -407,7 +407,7 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, u8 wqe_bulk) * allocating one-by-one, failing and moving frames to the * Reuse Ring. */ - if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, pages_desired))) + if (unlikely(!xsk_buff_can_alloc(rq->xsk_pool, pages_desired))) return -ENOMEM; } @@ -506,7 +506,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix) * one-by-one, failing and moving frames to the Reuse Ring. 
*/ if (rq->xsk_pool && - unlikely(!xsk_buff_can_alloc(rq->xsk_pool->umem, MLX5_MPWRQ_PAGES_PER_WQE))) { + unlikely(!xsk_buff_can_alloc(rq->xsk_pool, MLX5_MPWRQ_PAGES_PER_WQE))) { err = -ENOMEM; goto err; } diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index c9d87cc..ccf6cb5 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -52,6 +52,7 @@ struct xdp_sock { struct net_device *dev; struct xdp_umem *umem; struct list_head flush_node; + struct xsk_buff_pool *pool; u16 queue_id; bool zc; enum { diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 5dc8d3c..a7c7d2e 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -11,48 +11,50 @@ #ifdef CONFIG_XDP_SOCKETS -void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries); -bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc); -void xsk_umem_consume_tx_done(struct xdp_umem *umem); -struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev, - u16 queue_id); -void xsk_set_rx_need_wakeup(struct xdp_umem *umem); -void xsk_set_tx_need_wakeup(struct xdp_umem *umem); -void xsk_clear_rx_need_wakeup(struct xdp_umem *umem); -void xsk_clear_tx_need_wakeup(struct xdp_umem *umem); -bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem); +void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries); +bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc); +void xsk_tx_release(struct xsk_buff_pool *pool); +struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev, + u16 queue_id); +void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool); +void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool); +void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool); +void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool); +bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool); -static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem) +static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool) { - return XDP_PACKET_HEADROOM + umem->headroom; + return XDP_PACKET_HEADROOM + pool->headroom; } -static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem) +static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool) { - return umem->chunk_size; + return pool->chunk_size; } -static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem) +static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool) { - return xsk_umem_get_chunk_size(umem) - xsk_umem_get_headroom(umem); + return xsk_pool_get_chunk_size(pool) - xsk_pool_get_headroom(pool); } -static inline void xsk_buff_set_rxq_info(struct xdp_umem *umem, +static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) { - xp_set_rxq_info(umem->pool, rxq); + xp_set_rxq_info(pool, rxq); } -static inline void xsk_buff_dma_unmap(struct xdp_umem *umem, +static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs) { - xp_dma_unmap(umem->pool, attrs); + xp_dma_unmap(pool, attrs); } -static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev, - unsigned long attrs) +static inline int xsk_pool_dma_map(struct xsk_buff_pool *pool, + struct device *dev, unsigned long attrs) { - return xp_dma_map(umem->pool, dev, attrs, umem->pgs, umem->npgs); + struct xdp_umem *umem = pool->umem; + + return xp_dma_map(pool, dev, attrs, umem->pgs, umem->npgs); } static inline dma_addr_t xsk_buff_xdp_get_dma(struct xdp_buff *xdp) @@ -69,14 +71,14 @@ static inline dma_addr_t 
xsk_buff_xdp_get_frame_dma(struct xdp_buff *xdp) return xp_get_frame_dma(xskb); } -static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem) +static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) { - return xp_alloc(umem->pool); + return xp_alloc(pool); } -static inline bool xsk_buff_can_alloc(struct xdp_umem *umem, u32 count) +static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count) { - return xp_can_alloc(umem->pool, count); + return xp_can_alloc(pool, count); } static inline void xsk_buff_free(struct xdp_buff *xdp) @@ -86,14 +88,15 @@ static inline void xsk_buff_free(struct xdp_buff *xdp) xp_free(xskb); } -static inline dma_addr_t xsk_buff_raw_get_dma(struct xdp_umem *umem, u64 addr) +static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool, + u64 addr) { - return xp_raw_get_dma(umem->pool, addr); + return xp_raw_get_dma(pool, addr); } -static inline void *xsk_buff_raw_get_data(struct xdp_umem *umem, u64 addr) +static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr) { - return xp_raw_get_data(umem->pool, addr); + return xp_raw_get_data(pool, addr); } static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) @@ -103,83 +106,83 @@ static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) xp_dma_sync_for_cpu(xskb); } -static inline void xsk_buff_raw_dma_sync_for_device(struct xdp_umem *umem, +static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool, dma_addr_t dma, size_t size) { - xp_dma_sync_for_device(umem->pool, dma, size); + xp_dma_sync_for_device(pool, dma, size); } #else -static inline void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries) +static inline void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries) { } -static inline bool xsk_umem_consume_tx(struct xdp_umem *umem, - struct xdp_desc *desc) +static inline bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, + struct xdp_desc *desc) { return false; } -static inline void xsk_umem_consume_tx_done(struct xdp_umem *umem) +static inline void xsk_tx_release(struct xsk_buff_pool *pool) { } static inline struct xsk_buff_pool * -xdp_get_xsk_pool_from_qid(struct net_device *dev, u16 queue_id) +xsk_get_pool_from_qid(struct net_device *dev, u16 queue_id) { return NULL; } -static inline void xsk_set_rx_need_wakeup(struct xdp_umem *umem) +static inline void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) { } -static inline void xsk_set_tx_need_wakeup(struct xdp_umem *umem) +static inline void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) { } -static inline void xsk_clear_rx_need_wakeup(struct xdp_umem *umem) +static inline void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool) { } -static inline void xsk_clear_tx_need_wakeup(struct xdp_umem *umem) +static inline void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool) { } -static inline bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem) +static inline bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool) { return false; } -static inline u32 xsk_umem_get_headroom(struct xdp_umem *umem) +static inline u32 xsk_pool_get_headroom(struct xsk_buff_pool *pool) { return 0; } -static inline u32 xsk_umem_get_chunk_size(struct xdp_umem *umem) +static inline u32 xsk_pool_get_chunk_size(struct xsk_buff_pool *pool) { return 0; } -static inline u32 xsk_umem_get_rx_frame_size(struct xdp_umem *umem) +static inline u32 xsk_pool_get_rx_frame_size(struct xsk_buff_pool *pool) { return 0; } -static inline void xsk_buff_set_rxq_info(struct xdp_umem 
*umem, +static inline void xsk_pool_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) { } -static inline void xsk_buff_dma_unmap(struct xdp_umem *umem, +static inline void xsk_pool_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs) { } -static inline int xsk_buff_dma_map(struct xdp_umem *umem, struct device *dev, - unsigned long attrs) +static inline int xsk_pool_dma_map(struct xsk_buff_pool *pool, + struct device *dev, unsigned long attrs) { return 0; } @@ -194,12 +197,12 @@ static inline dma_addr_t xsk_buff_xdp_get_frame_dma(struct xdp_buff *xdp) return 0; } -static inline struct xdp_buff *xsk_buff_alloc(struct xdp_umem *umem) +static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool) { return NULL; } -static inline bool xsk_buff_can_alloc(struct xdp_umem *umem, u32 count) +static inline bool xsk_buff_can_alloc(struct xsk_buff_pool *pool, u32 count) { return false; } @@ -208,12 +211,13 @@ static inline void xsk_buff_free(struct xdp_buff *xdp) { } -static inline dma_addr_t xsk_buff_raw_get_dma(struct xdp_umem *umem, u64 addr) +static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool, + u64 addr) { return 0; } -static inline void *xsk_buff_raw_get_data(struct xdp_umem *umem, u64 addr) +static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr) { return NULL; } @@ -222,7 +226,7 @@ static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) { } -static inline void xsk_buff_raw_dma_sync_for_device(struct xdp_umem *umem, +static inline void xsk_buff_raw_dma_sync_for_device(struct xsk_buff_pool *pool, dma_addr_t dma, size_t size) { diff --git a/net/ethtool/channels.c b/net/ethtool/channels.c index 78d990b..9ecda09 100644 --- a/net/ethtool/channels.c +++ b/net/ethtool/channels.c @@ -223,7 +223,7 @@ int ethnl_set_channels(struct sk_buff *skb, struct genl_info *info) from_channel = channels.combined_count + min(channels.rx_count, channels.tx_count); for (i = from_channel; i < old_total; i++) - if (xdp_get_xsk_pool_from_qid(dev, i)) { + if (xsk_get_pool_from_qid(dev, i)) { GENL_SET_ERR_MSG(info, "requested channel counts are too low for existing zerocopy AF_XDP sockets"); return -EINVAL; } diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c index 2e6a678..925b573 100644 --- a/net/ethtool/ioctl.c +++ b/net/ethtool/ioctl.c @@ -1706,7 +1706,7 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev, min(channels.rx_count, channels.tx_count); to_channel = curr.combined_count + max(curr.rx_count, curr.tx_count); for (i = from_channel; i < to_channel; i++) - if (xdp_get_xsk_pool_from_qid(dev, i)) + if (xsk_get_pool_from_qid(dev, i)) return -EINVAL; ret = dev->ethtool_ops->set_channels(dev, &channels); diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 0b5f3b0..adde4d5 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -51,9 +51,9 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) * not know if the device has more tx queues than rx, or the opposite. * This might also change during run time. 
*/ -static int xdp_reg_xsk_pool_at_qid(struct net_device *dev, - struct xsk_buff_pool *pool, - u16 queue_id) +static int xsk_reg_pool_at_qid(struct net_device *dev, + struct xsk_buff_pool *pool, + u16 queue_id) { if (queue_id >= max_t(unsigned int, dev->real_num_rx_queues, @@ -68,8 +68,8 @@ static int xdp_reg_xsk_pool_at_qid(struct net_device *dev, return 0; } -struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev, - u16 queue_id) +struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev, + u16 queue_id) { if (queue_id < dev->real_num_rx_queues) return dev->_rx[queue_id].pool; @@ -78,9 +78,9 @@ struct xsk_buff_pool *xdp_get_xsk_pool_from_qid(struct net_device *dev, return NULL; } -EXPORT_SYMBOL(xdp_get_xsk_pool_from_qid); +EXPORT_SYMBOL(xsk_get_pool_from_qid); -static void xdp_clear_xsk_pool_at_qid(struct net_device *dev, u16 queue_id) +static void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id) { if (queue_id < dev->real_num_rx_queues) dev->_rx[queue_id].pool = NULL; @@ -103,10 +103,10 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, if (force_zc && force_copy) return -EINVAL; - if (xdp_get_xsk_pool_from_qid(dev, queue_id)) + if (xsk_get_pool_from_qid(dev, queue_id)) return -EBUSY; - err = xdp_reg_xsk_pool_at_qid(dev, umem->pool, queue_id); + err = xsk_reg_pool_at_qid(dev, umem->pool, queue_id); if (err) return err; @@ -119,7 +119,7 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, * Also for supporting drivers that do not implement this * feature. They will always have to call sendto(). */ - xsk_set_tx_need_wakeup(umem); + xsk_set_tx_need_wakeup(umem->pool); } dev_hold(dev); @@ -148,7 +148,7 @@ int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, if (!force_zc) err = 0; /* fallback to copy mode */ if (err) - xdp_clear_xsk_pool_at_qid(dev, queue_id); + xsk_clear_pool_at_qid(dev, queue_id); return err; } @@ -173,7 +173,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem) WARN(1, "failed to disable umem!\n"); } - xdp_clear_xsk_pool_at_qid(umem->dev, umem->queue_id); + xsk_clear_pool_at_qid(umem->dev, umem->queue_id); dev_put(umem->dev); umem->dev = NULL; diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index c323162..befc9a4 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -39,8 +39,10 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs) READ_ONCE(xs->umem->fq); } -void xsk_set_rx_need_wakeup(struct xdp_umem *umem) +void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) { + struct xdp_umem *umem = pool->umem; + if (umem->need_wakeup & XDP_WAKEUP_RX) return; @@ -49,8 +51,9 @@ void xsk_set_rx_need_wakeup(struct xdp_umem *umem) } EXPORT_SYMBOL(xsk_set_rx_need_wakeup); -void xsk_set_tx_need_wakeup(struct xdp_umem *umem) +void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) { + struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; if (umem->need_wakeup & XDP_WAKEUP_TX) @@ -66,8 +69,10 @@ void xsk_set_tx_need_wakeup(struct xdp_umem *umem) } EXPORT_SYMBOL(xsk_set_tx_need_wakeup); -void xsk_clear_rx_need_wakeup(struct xdp_umem *umem) +void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool) { + struct xdp_umem *umem = pool->umem; + if (!(umem->need_wakeup & XDP_WAKEUP_RX)) return; @@ -76,8 +81,9 @@ void xsk_clear_rx_need_wakeup(struct xdp_umem *umem) } EXPORT_SYMBOL(xsk_clear_rx_need_wakeup); -void xsk_clear_tx_need_wakeup(struct xdp_umem *umem) +void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool) { + struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; if (!(umem->need_wakeup & 
XDP_WAKEUP_TX)) @@ -93,11 +99,11 @@ void xsk_clear_tx_need_wakeup(struct xdp_umem *umem) } EXPORT_SYMBOL(xsk_clear_tx_need_wakeup); -bool xsk_umem_uses_need_wakeup(struct xdp_umem *umem) +bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool) { - return umem->flags & XDP_UMEM_USES_NEED_WAKEUP; + return pool->umem->flags & XDP_UMEM_USES_NEED_WAKEUP; } -EXPORT_SYMBOL(xsk_umem_uses_need_wakeup); +EXPORT_SYMBOL(xsk_uses_need_wakeup); void xp_release(struct xdp_buff_xsk *xskb) { @@ -155,12 +161,12 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len, struct xdp_buff *xsk_xdp; int err; - if (len > xsk_umem_get_rx_frame_size(xs->umem)) { + if (len > xsk_pool_get_rx_frame_size(xs->pool)) { xs->rx_dropped++; return -ENOSPC; } - xsk_xdp = xsk_buff_alloc(xs->umem); + xsk_xdp = xsk_buff_alloc(xs->pool); if (!xsk_xdp) { xs->rx_dropped++; return -ENOSPC; @@ -249,27 +255,28 @@ void __xsk_map_flush(void) } } -void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries) +void xsk_tx_completed(struct xsk_buff_pool *pool, u32 nb_entries) { - xskq_prod_submit_n(umem->cq, nb_entries); + xskq_prod_submit_n(pool->umem->cq, nb_entries); } -EXPORT_SYMBOL(xsk_umem_complete_tx); +EXPORT_SYMBOL(xsk_tx_completed); -void xsk_umem_consume_tx_done(struct xdp_umem *umem) +void xsk_tx_release(struct xsk_buff_pool *pool) { struct xdp_sock *xs; rcu_read_lock(); - list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) { + list_for_each_entry_rcu(xs, &pool->umem->xsk_tx_list, list) { __xskq_cons_release(xs->tx); xs->sk.sk_write_space(&xs->sk); } rcu_read_unlock(); } -EXPORT_SYMBOL(xsk_umem_consume_tx_done); +EXPORT_SYMBOL(xsk_tx_release); -bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc) +bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) { + struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; rcu_read_lock(); @@ -296,7 +303,7 @@ bool xsk_umem_consume_tx(struct xdp_umem *umem, struct xdp_desc *desc) rcu_read_unlock(); return false; } -EXPORT_SYMBOL(xsk_umem_consume_tx); +EXPORT_SYMBOL(xsk_tx_peek_desc); static int xsk_wakeup(struct xdp_sock *xs, u8 flags) { @@ -359,7 +366,7 @@ static int xsk_generic_xmit(struct sock *sk) skb_put(skb, len); addr = desc.addr; - buffer = xsk_buff_raw_get_data(xs->umem, addr); + buffer = xsk_buff_raw_get_data(xs->pool, addr); err = skb_store_bits(skb, 0, buffer, len); /* This is the backpressure mechanism for the Tx path. 
* Reserve space in the completion queue and only proceed @@ -762,6 +769,8 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, return PTR_ERR(umem); } + xs->pool = umem->pool; + /* Make sure umem is ready before it can be seen by others */ smp_wmb(); WRITE_ONCE(xs->umem, umem);

From patchwork Fri Aug 28 08:26:17 2020
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 261808
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com
Subject: [PATCH bpf-next v5 03/15] xsk: create and free buffer pool independently from umem
Date: Fri, 28 Aug 2020 10:26:17 +0200
Message-Id: <1598603189-32145-4-git-send-email-magnus.karlsson@intel.com>
In-Reply-To: <1598603189-32145-1-git-send-email-magnus.karlsson@intel.com>
References: <1598603189-32145-1-git-send-email-magnus.karlsson@intel.com>

Create and free the buffer pool independently from the umem. Move these operations, which are performed on the buffer pool, out of the umem create and destroy functions and into new create and destroy functions just for the buffer pool. This is done so that later commits can instantiate multiple buffer pools per umem, for when a umem is shared between HW queues and/or devices. We also eradicate the back pointer from the umem to the buffer pool, as it cannot work once a umem can have multiple buffer pools.
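To see how the new calls compose, here is an editorial condensation of the xsk_bind() changes in the diff below. bind_own_umem() itself is a hypothetical wrapper, as if it lived in net/xdp/xsk.c next to xsk_bind(); the functions it calls are the ones introduced or reworked by this patch, with only the surrounding locking and checks trimmed.

	static int bind_own_umem(struct xdp_sock *xs, struct net_device *dev,
				 u16 qid, u16 flags)
	{
		int err;

		/* xdp_umem_assign_dev() now only records dev and queue_id ... */
		xdp_umem_assign_dev(xs->umem, dev, qid);

		/* ... the pool is created separately from the umem ... */
		xs->pool = xp_create_and_assign_umem(xs, xs->umem);
		if (!xs->pool) {
			xdp_umem_clear_dev(xs->umem);
			return -ENOMEM;
		}

		/* ... and it is the pool, not the umem, that is bound to the device. */
		err = xp_assign_dev(xs->pool, dev, qid, flags);
		if (err) {
			xp_destroy(xs->pool);
			xs->pool = NULL;
			xdp_umem_clear_dev(xs->umem);
		}
		return err;
	}

On the shared-umem path in the same hunk, a second socket correspondingly takes its own pool reference with xp_get_pool(), and the socket destructor drops it with xp_put_pool() instead of xdp_put_umem().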
Signed-off-by: Magnus Karlsson Acked-by: Björn Töpel --- include/net/xdp_sock.h | 3 +- include/net/xsk_buff_pool.h | 13 +++- net/xdp/xdp_umem.c | 164 ++++---------------------------------- net/xdp/xdp_umem.h | 4 +- net/xdp/xsk.c | 74 +++++++++++++++++--- net/xdp/xsk.h | 3 + net/xdp/xsk_buff_pool.c | 150 ++++++++++++++++++++++++++++++++++++---- net/xdp/xsk_queue.h | 12 ++-- 8 files changed, 236 insertions(+), 187 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index ccf6cb5..ea2b020 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -20,13 +20,12 @@ struct xdp_buff; struct xdp_umem { struct xsk_queue *fq; struct xsk_queue *cq; - struct xsk_buff_pool *pool; u64 size; u32 headroom; u32 chunk_size; + u32 chunks; struct user_struct *user; refcount_t users; - struct work_struct work; struct page **pgs; u32 npgs; u16 queue_id; diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index f851b0a..4025486 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -14,6 +14,7 @@ struct xdp_rxq_info; struct xsk_queue; struct xdp_desc; struct xdp_umem; +struct xdp_sock; struct device; struct page; @@ -46,16 +47,22 @@ struct xsk_buff_pool { struct xdp_umem *umem; void *addrs; struct device *dev; + refcount_t users; + struct work_struct work; struct xdp_buff_xsk *free_heads[]; }; /* AF_XDP core. */ -struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks, - u32 chunk_size, u32 headroom, u64 size, - bool unaligned); +struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs, + struct xdp_umem *umem); +int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev, + u16 queue_id, u16 flags); void xp_set_fq(struct xsk_buff_pool *pool, struct xsk_queue *fq); void xp_destroy(struct xsk_buff_pool *pool); void xp_release(struct xdp_buff_xsk *xskb); +void xp_get_pool(struct xsk_buff_pool *pool); +void xp_put_pool(struct xsk_buff_pool *pool); +void xp_clear_dev(struct xsk_buff_pool *pool); /* AF_XDP, and XDP core. */ void xp_free(struct xdp_buff_xsk *xskb); diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index adde4d5..f290345 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -47,160 +47,41 @@ void xdp_del_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs) spin_unlock_irqrestore(&umem->xsk_tx_list_lock, flags); } -/* The umem is stored both in the _rx struct and the _tx struct as we do - * not know if the device has more tx queues than rx, or the opposite. - * This might also change during run time.
- */ -static int xsk_reg_pool_at_qid(struct net_device *dev, - struct xsk_buff_pool *pool, - u16 queue_id) -{ - if (queue_id >= max_t(unsigned int, - dev->real_num_rx_queues, - dev->real_num_tx_queues)) - return -EINVAL; - - if (queue_id < dev->real_num_rx_queues) - dev->_rx[queue_id].pool = pool; - if (queue_id < dev->real_num_tx_queues) - dev->_tx[queue_id].pool = pool; - - return 0; -} - -struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev, - u16 queue_id) +static void xdp_umem_unpin_pages(struct xdp_umem *umem) { - if (queue_id < dev->real_num_rx_queues) - return dev->_rx[queue_id].pool; - if (queue_id < dev->real_num_tx_queues) - return dev->_tx[queue_id].pool; + unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true); - return NULL; + kfree(umem->pgs); + umem->pgs = NULL; } -EXPORT_SYMBOL(xsk_get_pool_from_qid); -static void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id) +static void xdp_umem_unaccount_pages(struct xdp_umem *umem) { - if (queue_id < dev->real_num_rx_queues) - dev->_rx[queue_id].pool = NULL; - if (queue_id < dev->real_num_tx_queues) - dev->_tx[queue_id].pool = NULL; + if (umem->user) { + atomic_long_sub(umem->npgs, &umem->user->locked_vm); + free_uid(umem->user); + } } -int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u16 queue_id, u16 flags) +void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, + u16 queue_id) { - bool force_zc, force_copy; - struct netdev_bpf bpf; - int err = 0; - - ASSERT_RTNL(); - - force_zc = flags & XDP_ZEROCOPY; - force_copy = flags & XDP_COPY; - - if (force_zc && force_copy) - return -EINVAL; - - if (xsk_get_pool_from_qid(dev, queue_id)) - return -EBUSY; - - err = xsk_reg_pool_at_qid(dev, umem->pool, queue_id); - if (err) - return err; - umem->dev = dev; umem->queue_id = queue_id; - if (flags & XDP_USE_NEED_WAKEUP) { - umem->flags |= XDP_UMEM_USES_NEED_WAKEUP; - /* Tx needs to be explicitly woken up the first time. - * Also for supporting drivers that do not implement this - * feature. They will always have to call sendto(). - */ - xsk_set_tx_need_wakeup(umem->pool); - } - dev_hold(dev); - - if (force_copy) - /* For copy-mode, we are done. 
*/ - return 0; - - if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) { - err = -EOPNOTSUPP; - goto err_unreg_umem; - } - - bpf.command = XDP_SETUP_XSK_POOL; - bpf.xsk.pool = umem->pool; - bpf.xsk.queue_id = queue_id; - - err = dev->netdev_ops->ndo_bpf(dev, &bpf); - if (err) - goto err_unreg_umem; - - umem->zc = true; - return 0; - -err_unreg_umem: - if (!force_zc) - err = 0; /* fallback to copy mode */ - if (err) - xsk_clear_pool_at_qid(dev, queue_id); - return err; } void xdp_umem_clear_dev(struct xdp_umem *umem) { - struct netdev_bpf bpf; - int err; - - ASSERT_RTNL(); - - if (!umem->dev) - return; - - if (umem->zc) { - bpf.command = XDP_SETUP_XSK_POOL; - bpf.xsk.pool = NULL; - bpf.xsk.queue_id = umem->queue_id; - - err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf); - - if (err) - WARN(1, "failed to disable umem!\n"); - } - - xsk_clear_pool_at_qid(umem->dev, umem->queue_id); - dev_put(umem->dev); umem->dev = NULL; umem->zc = false; } -static void xdp_umem_unpin_pages(struct xdp_umem *umem) -{ - unpin_user_pages_dirty_lock(umem->pgs, umem->npgs, true); - - kfree(umem->pgs); - umem->pgs = NULL; -} - -static void xdp_umem_unaccount_pages(struct xdp_umem *umem) -{ - if (umem->user) { - atomic_long_sub(umem->npgs, &umem->user->locked_vm); - free_uid(umem->user); - } -} - static void xdp_umem_release(struct xdp_umem *umem) { - rtnl_lock(); xdp_umem_clear_dev(umem); - rtnl_unlock(); ida_simple_remove(&umem_ida, umem->id); @@ -214,20 +95,12 @@ static void xdp_umem_release(struct xdp_umem *umem) umem->cq = NULL; } - xp_destroy(umem->pool); xdp_umem_unpin_pages(umem); xdp_umem_unaccount_pages(umem); kfree(umem); } -static void xdp_umem_release_deferred(struct work_struct *work) -{ - struct xdp_umem *umem = container_of(work, struct xdp_umem, work); - - xdp_umem_release(umem); -} - void xdp_get_umem(struct xdp_umem *umem) { refcount_inc(&umem->users); @@ -238,10 +111,8 @@ void xdp_put_umem(struct xdp_umem *umem) if (!umem) return; - if (refcount_dec_and_test(&umem->users)) { - INIT_WORK(&umem->work, xdp_umem_release_deferred); - schedule_work(&umem->work); - } + if (refcount_dec_and_test(&umem->users)) + xdp_umem_release(umem); } static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address) @@ -357,6 +228,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) umem->size = size; umem->headroom = headroom; umem->chunk_size = chunk_size; + umem->chunks = chunks; umem->npgs = (u32)npgs; umem->pgs = NULL; umem->user = NULL; @@ -374,16 +246,8 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) if (err) goto out_account; - umem->pool = xp_create(umem, chunks, chunk_size, headroom, size, - unaligned_chunks); - if (!umem->pool) { - err = -ENOMEM; - goto out_pin; - } return 0; -out_pin: - xdp_umem_unpin_pages(umem); out_account: xdp_umem_unaccount_pages(umem); return err; diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h index 32067fe..93e96be 100644 --- a/net/xdp/xdp_umem.h +++ b/net/xdp/xdp_umem.h @@ -8,8 +8,8 @@ #include -int xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u16 queue_id, u16 flags); +void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, + u16 queue_id); void xdp_umem_clear_dev(struct xdp_umem *umem); bool xdp_umem_validate_queues(struct xdp_umem *umem); void xdp_get_umem(struct xdp_umem *umem); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index befc9a4..5739f19 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -105,6 +105,46 @@ bool xsk_uses_need_wakeup(struct 
xsk_buff_pool *pool) } EXPORT_SYMBOL(xsk_uses_need_wakeup); +struct xsk_buff_pool *xsk_get_pool_from_qid(struct net_device *dev, + u16 queue_id) +{ + if (queue_id < dev->real_num_rx_queues) + return dev->_rx[queue_id].pool; + if (queue_id < dev->real_num_tx_queues) + return dev->_tx[queue_id].pool; + + return NULL; +} +EXPORT_SYMBOL(xsk_get_pool_from_qid); + +void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id) +{ + if (queue_id < dev->real_num_rx_queues) + dev->_rx[queue_id].pool = NULL; + if (queue_id < dev->real_num_tx_queues) + dev->_tx[queue_id].pool = NULL; +} + +/* The buffer pool is stored both in the _rx struct and the _tx struct as we do + * not know if the device has more tx queues than rx, or the opposite. + * This might also change during run time. + */ +int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool, + u16 queue_id) +{ + if (queue_id >= max_t(unsigned int, + dev->real_num_rx_queues, + dev->real_num_tx_queues)) + return -EINVAL; + + if (queue_id < dev->real_num_rx_queues) + dev->_rx[queue_id].pool = pool; + if (queue_id < dev->real_num_tx_queues) + dev->_tx[queue_id].pool = pool; + + return 0; +} + void xp_release(struct xdp_buff_xsk *xskb) { xskb->pool->free_heads[xskb->pool->free_heads_cnt++] = xskb; @@ -281,7 +321,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc) rcu_read_lock(); list_for_each_entry_rcu(xs, &umem->xsk_tx_list, list) { - if (!xskq_cons_peek_desc(xs->tx, desc, umem)) { + if (!xskq_cons_peek_desc(xs->tx, desc, pool)) { xs->tx->queue_empty_descs++; continue; } @@ -349,7 +389,7 @@ static int xsk_generic_xmit(struct sock *sk) if (xs->queue_id >= xs->dev->real_num_tx_queues) goto out; - while (xskq_cons_peek_desc(xs->tx, &desc, xs->umem)) { + while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) { char *buffer; u64 addr; u32 len; @@ -667,6 +707,9 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) goto out_unlock; } + /* Share the buffer pool with the other socket. */ + xp_get_pool(umem_xs->pool); + xs->pool = umem_xs->pool; xdp_get_umem(umem_xs->umem); WRITE_ONCE(xs->umem, umem_xs->umem); sockfd_put(sock); @@ -675,9 +718,21 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) goto out_unlock; } else { /* This xsk has its own umem. */ - err = xdp_umem_assign_dev(xs->umem, dev, qid, flags); - if (err) + xdp_umem_assign_dev(xs->umem, dev, qid); + xs->pool = xp_create_and_assign_umem(xs, xs->umem); + if (!xs->pool) { + err = -ENOMEM; + xdp_umem_clear_dev(xs->umem); goto out_unlock; + } + + err = xp_assign_dev(xs->pool, dev, qid, flags); + if (err) { + xp_destroy(xs->pool); + xs->pool = NULL; + xdp_umem_clear_dev(xs->umem); + goto out_unlock; + } } xs->dev = dev; @@ -769,8 +824,6 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, return PTR_ERR(umem); } - xs->pool = umem->pool; - /* Make sure umem is ready before it can be seen by others */ smp_wmb(); WRITE_ONCE(xs->umem, umem); @@ -800,7 +853,7 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, &xs->umem->cq; err = xsk_init_queue(entries, q, true); if (optname == XDP_UMEM_FILL_RING) - xp_set_fq(xs->umem->pool, *q); + xp_set_fq(xs->pool, *q); mutex_unlock(&xs->mutex); return err; } @@ -1028,7 +1081,8 @@ static int xsk_notifier(struct notifier_block *this, xsk_unbind_dev(xs); - /* Clear device references in umem. */ + /* Clear device references. 
*/ + xp_clear_dev(xs->pool); xdp_umem_clear_dev(xs->umem); } mutex_unlock(&xs->mutex); @@ -1073,7 +1127,7 @@ static void xsk_destruct(struct sock *sk) if (!sock_flag(sk, SOCK_DEAD)) return; - xdp_put_umem(xs->umem); + xp_put_pool(xs->pool); sk_refcnt_debug_dec(sk); } @@ -1081,8 +1135,8 @@ static void xsk_destruct(struct sock *sk) static int xsk_create(struct net *net, struct socket *sock, int protocol, int kern) { - struct sock *sk; struct xdp_sock *xs; + struct sock *sk; if (!ns_capable(net->user_ns, CAP_NET_RAW)) return -EPERM; diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h index 455ddd4..a00e3e2 100644 --- a/net/xdp/xsk.h +++ b/net/xdp/xsk.h @@ -51,5 +51,8 @@ void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs, struct xdp_sock **map_entry); int xsk_map_inc(struct xsk_map *map); void xsk_map_put(struct xsk_map *map); +void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id); +int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool, + u16 queue_id); #endif /* XSK_H_ */ diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index f3df3cb..e58c54d 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -2,8 +2,14 @@ #include #include +#include +#include +#include +#include #include "xsk_queue.h" +#include "xdp_umem.h" +#include "xsk.h" static void xp_addr_unmap(struct xsk_buff_pool *pool) { @@ -29,38 +35,40 @@ void xp_destroy(struct xsk_buff_pool *pool) kvfree(pool); } -struct xsk_buff_pool *xp_create(struct xdp_umem *umem, u32 chunks, - u32 chunk_size, u32 headroom, u64 size, - bool unaligned) +struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs, + struct xdp_umem *umem) { struct xsk_buff_pool *pool; struct xdp_buff_xsk *xskb; int err; u32 i; - pool = kvzalloc(struct_size(pool, free_heads, chunks), GFP_KERNEL); + pool = kvzalloc(struct_size(pool, free_heads, umem->chunks), + GFP_KERNEL); if (!pool) goto out; - pool->heads = kvcalloc(chunks, sizeof(*pool->heads), GFP_KERNEL); + pool->heads = kvcalloc(umem->chunks, sizeof(*pool->heads), GFP_KERNEL); if (!pool->heads) goto out; - pool->chunk_mask = ~((u64)chunk_size - 1); - pool->addrs_cnt = size; - pool->heads_cnt = chunks; - pool->free_heads_cnt = chunks; - pool->headroom = headroom; - pool->chunk_size = chunk_size; - pool->unaligned = unaligned; - pool->frame_len = chunk_size - headroom - XDP_PACKET_HEADROOM; + pool->chunk_mask = ~((u64)umem->chunk_size - 1); + pool->addrs_cnt = umem->size; + pool->heads_cnt = umem->chunks; + pool->free_heads_cnt = umem->chunks; + pool->headroom = umem->headroom; + pool->chunk_size = umem->chunk_size; + pool->unaligned = umem->flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG; + pool->frame_len = umem->chunk_size - umem->headroom - + XDP_PACKET_HEADROOM; pool->umem = umem; INIT_LIST_HEAD(&pool->free_list); + refcount_set(&pool->users, 1); for (i = 0; i < pool->free_heads_cnt; i++) { xskb = &pool->heads[i]; xskb->pool = pool; - xskb->xdp.frame_sz = chunk_size - headroom; + xskb->xdp.frame_sz = umem->chunk_size - umem->headroom; pool->free_heads[i] = xskb; } @@ -87,6 +95,120 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) } EXPORT_SYMBOL(xp_set_rxq_info); +int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev, + u16 queue_id, u16 flags) +{ + struct xdp_umem *umem = pool->umem; + bool force_zc, force_copy; + struct netdev_bpf bpf; + int err = 0; + + ASSERT_RTNL(); + + force_zc = flags & XDP_ZEROCOPY; + force_copy = flags & XDP_COPY; + + if (force_zc && force_copy) + return -EINVAL; + + if 
(xsk_get_pool_from_qid(dev, queue_id)) + return -EBUSY; + + err = xsk_reg_pool_at_qid(dev, pool, queue_id); + if (err) + return err; + + if (flags & XDP_USE_NEED_WAKEUP) { + umem->flags |= XDP_UMEM_USES_NEED_WAKEUP; + /* Tx needs to be explicitly woken up the first time. + * Also for supporting drivers that do not implement this + * feature. They will always have to call sendto(). + */ + umem->need_wakeup = XDP_WAKEUP_TX; + } + + if (force_copy) + /* For copy-mode, we are done. */ + return 0; + + if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) { + err = -EOPNOTSUPP; + goto err_unreg_pool; + } + + bpf.command = XDP_SETUP_XSK_POOL; + bpf.xsk.pool = pool; + bpf.xsk.queue_id = queue_id; + + err = dev->netdev_ops->ndo_bpf(dev, &bpf); + if (err) + goto err_unreg_pool; + + umem->zc = true; + return 0; + +err_unreg_pool: + if (!force_zc) + err = 0; /* fallback to copy mode */ + if (err) + xsk_clear_pool_at_qid(dev, queue_id); + return err; +} + +void xp_clear_dev(struct xsk_buff_pool *pool) +{ + struct xdp_umem *umem = pool->umem; + struct netdev_bpf bpf; + int err; + + ASSERT_RTNL(); + + if (!umem->dev) + return; + + if (umem->zc) { + bpf.command = XDP_SETUP_XSK_POOL; + bpf.xsk.pool = NULL; + bpf.xsk.queue_id = umem->queue_id; + + err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf); + + if (err) + WARN(1, "failed to disable umem!\n"); + } + + xsk_clear_pool_at_qid(umem->dev, umem->queue_id); +} + +static void xp_release_deferred(struct work_struct *work) +{ + struct xsk_buff_pool *pool = container_of(work, struct xsk_buff_pool, + work); + + rtnl_lock(); + xp_clear_dev(pool); + rtnl_unlock(); + + xdp_put_umem(pool->umem); + xp_destroy(pool); +} + +void xp_get_pool(struct xsk_buff_pool *pool) +{ + refcount_inc(&pool->users); +} + +void xp_put_pool(struct xsk_buff_pool *pool) +{ + if (!pool) + return; + + if (refcount_dec_and_test(&pool->users)) { + INIT_WORK(&pool->work, xp_release_deferred); + schedule_work(&pool->work); + } +} + void xp_dma_unmap(struct xsk_buff_pool *pool, unsigned long attrs) { dma_addr_t *dma; diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index bf42cfd..2d883f6 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -166,9 +166,9 @@ static inline bool xp_validate_desc(struct xsk_buff_pool *pool, static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d, - struct xdp_umem *umem) + struct xsk_buff_pool *pool) { - if (!xp_validate_desc(umem->pool, d)) { + if (!xp_validate_desc(pool, d)) { q->invalid_descs++; return false; } @@ -177,14 +177,14 @@ static inline bool xskq_cons_is_valid_desc(struct xsk_queue *q, static inline bool xskq_cons_read_desc(struct xsk_queue *q, struct xdp_desc *desc, - struct xdp_umem *umem) + struct xsk_buff_pool *pool) { while (q->cached_cons != q->cached_prod) { struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring; u32 idx = q->cached_cons & q->ring_mask; *desc = ring->desc[idx]; - if (xskq_cons_is_valid_desc(q, desc, umem)) + if (xskq_cons_is_valid_desc(q, desc, pool)) return true; q->cached_cons++; @@ -236,11 +236,11 @@ static inline bool xskq_cons_peek_addr_unchecked(struct xsk_queue *q, u64 *addr) static inline bool xskq_cons_peek_desc(struct xsk_queue *q, struct xdp_desc *desc, - struct xdp_umem *umem) + struct xsk_buff_pool *pool) { if (q->cached_prod == q->cached_cons) xskq_cons_get_entries(q); - return xskq_cons_read_desc(q, desc, umem); + return xskq_cons_read_desc(q, desc, pool); } static inline void xskq_cons_release(struct xsk_queue *q) From patchwork Fri Aug 28 08:26:19 
2020
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 261806
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com
Subject: [PATCH bpf-next v5 05/15] xsk: move queue_id, dev and need_wakeup to buffer pool
Date: Fri, 28 Aug 2020 10:26:19 +0200
Message-Id: <1598603189-32145-6-git-send-email-magnus.karlsson@intel.com>

Move queue_id, dev, and need_wakeup from the umem to the buffer pool. This is so that a later commit can share the umem between multiple HW queues. There is one buffer pool per dev and queue id, so these variables belong in the buffer pool, not the umem. need_wakeup is also set at the NAPI level, so there is usually one per device and queue id; move it to the buffer pool too.
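To illustrate (not part of the patch; a hypothetical driver snippet built on the interfaces this series already renamed to take a pool), everything the data path needs is now reachable from the pool pointer alone:

/* Sketch: flag that user space must wake us up via the fill ring.
 * xsk_uses_need_wakeup() and xsk_set_rx_need_wakeup() take the
 * buffer pool directly, so the umem is never dereferenced here.
 */
static void example_rx_starved(struct xsk_buff_pool *pool)
{
	if (xsk_uses_need_wakeup(pool))
		xsk_set_rx_need_wakeup(pool);
}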
Signed-off-by: Magnus Karlsson Acked-by: Björn Töpel --- include/net/xdp_sock.h | 3 --- include/net/xsk_buff_pool.h | 4 ++++ net/xdp/xdp_umem.c | 22 ++-------------------- net/xdp/xdp_umem.h | 4 ---- net/xdp/xsk.c | 34 +++++++++++++--------------------- net/xdp/xsk.h | 7 ------- net/xdp/xsk_buff_pool.c | 39 ++++++++++++++++++++++----------------- net/xdp/xsk_diag.c | 4 ++-- 8 files changed, 43 insertions(+), 74 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 2a284e1..b052f1c 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -26,11 +26,8 @@ struct xdp_umem { refcount_t users; struct page **pgs; u32 npgs; - u16 queue_id; - u8 need_wakeup; u8 flags; int id; - struct net_device *dev; bool zc; spinlock_t xsk_tx_list_lock; struct list_head xsk_tx_list; diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 380d9ae..2d94890 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -43,11 +43,15 @@ struct xsk_buff_pool { u32 headroom; u32 chunk_size; u32 frame_len; + u16 queue_id; + u8 cached_need_wakeup; + bool uses_need_wakeup; bool dma_need_sync; bool unaligned; struct xdp_umem *umem; void *addrs; struct device *dev; + struct net_device *netdev; refcount_t users; struct work_struct work; struct xdp_buff_xsk *free_heads[]; diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 7d86a63..3e612fc 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -63,26 +63,9 @@ static void xdp_umem_unaccount_pages(struct xdp_umem *umem) } } -void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u16 queue_id) -{ - umem->dev = dev; - umem->queue_id = queue_id; - - dev_hold(dev); -} - -void xdp_umem_clear_dev(struct xdp_umem *umem) -{ - dev_put(umem->dev); - umem->dev = NULL; - umem->zc = false; -} - static void xdp_umem_release(struct xdp_umem *umem) { - xdp_umem_clear_dev(umem); - + umem->zc = false; ida_simple_remove(&umem_ida, umem->id); xdp_umem_unpin_pages(umem); @@ -181,8 +164,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) return -EINVAL; } - if (mr->flags & ~(XDP_UMEM_UNALIGNED_CHUNK_FLAG | - XDP_UMEM_USES_NEED_WAKEUP)) + if (mr->flags & ~XDP_UMEM_UNALIGNED_CHUNK_FLAG) return -EINVAL; if (!unaligned_chunks && !is_power_of_2(chunk_size)) diff --git a/net/xdp/xdp_umem.h b/net/xdp/xdp_umem.h index 93e96be..67bf3f3 100644 --- a/net/xdp/xdp_umem.h +++ b/net/xdp/xdp_umem.h @@ -8,10 +8,6 @@ #include -void xdp_umem_assign_dev(struct xdp_umem *umem, struct net_device *dev, - u16 queue_id); -void xdp_umem_clear_dev(struct xdp_umem *umem); -bool xdp_umem_validate_queues(struct xdp_umem *umem); void xdp_get_umem(struct xdp_umem *umem); void xdp_put_umem(struct xdp_umem *umem); void xdp_add_sk_umem(struct xdp_umem *umem, struct xdp_sock *xs); diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index dacd340..9f1b906e 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -41,13 +41,11 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs) void xsk_set_rx_need_wakeup(struct xsk_buff_pool *pool) { - struct xdp_umem *umem = pool->umem; - - if (umem->need_wakeup & XDP_WAKEUP_RX) + if (pool->cached_need_wakeup & XDP_WAKEUP_RX) return; pool->fq->ring->flags |= XDP_RING_NEED_WAKEUP; - umem->need_wakeup |= XDP_WAKEUP_RX; + pool->cached_need_wakeup |= XDP_WAKEUP_RX; } EXPORT_SYMBOL(xsk_set_rx_need_wakeup); @@ -56,7 +54,7 @@ void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; - if (umem->need_wakeup & XDP_WAKEUP_TX) + if 
(pool->cached_need_wakeup & XDP_WAKEUP_TX) return; rcu_read_lock(); @@ -65,19 +63,17 @@ void xsk_set_tx_need_wakeup(struct xsk_buff_pool *pool) } rcu_read_unlock(); - umem->need_wakeup |= XDP_WAKEUP_TX; + pool->cached_need_wakeup |= XDP_WAKEUP_TX; } EXPORT_SYMBOL(xsk_set_tx_need_wakeup); void xsk_clear_rx_need_wakeup(struct xsk_buff_pool *pool) { - struct xdp_umem *umem = pool->umem; - - if (!(umem->need_wakeup & XDP_WAKEUP_RX)) + if (!(pool->cached_need_wakeup & XDP_WAKEUP_RX)) return; pool->fq->ring->flags &= ~XDP_RING_NEED_WAKEUP; - umem->need_wakeup &= ~XDP_WAKEUP_RX; + pool->cached_need_wakeup &= ~XDP_WAKEUP_RX; } EXPORT_SYMBOL(xsk_clear_rx_need_wakeup); @@ -86,7 +82,7 @@ void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool) struct xdp_umem *umem = pool->umem; struct xdp_sock *xs; - if (!(umem->need_wakeup & XDP_WAKEUP_TX)) + if (!(pool->cached_need_wakeup & XDP_WAKEUP_TX)) return; rcu_read_lock(); @@ -95,13 +91,13 @@ void xsk_clear_tx_need_wakeup(struct xsk_buff_pool *pool) } rcu_read_unlock(); - umem->need_wakeup &= ~XDP_WAKEUP_TX; + pool->cached_need_wakeup &= ~XDP_WAKEUP_TX; } EXPORT_SYMBOL(xsk_clear_tx_need_wakeup); bool xsk_uses_need_wakeup(struct xsk_buff_pool *pool) { - return pool->umem->flags & XDP_UMEM_USES_NEED_WAKEUP; + return pool->uses_need_wakeup; } EXPORT_SYMBOL(xsk_uses_need_wakeup); @@ -478,16 +474,16 @@ static __poll_t xsk_poll(struct file *file, struct socket *sock, __poll_t mask = datagram_poll(file, sock, wait); struct sock *sk = sock->sk; struct xdp_sock *xs = xdp_sk(sk); - struct xdp_umem *umem; + struct xsk_buff_pool *pool; if (unlikely(!xsk_is_bound(xs))) return mask; - umem = xs->umem; + pool = xs->pool; - if (umem->need_wakeup) { + if (pool->cached_need_wakeup) { if (xs->zc) - xsk_wakeup(xs, umem->need_wakeup); + xsk_wakeup(xs, pool->cached_need_wakeup); else /* Poll needs to drive Tx also in copy mode */ __xsk_sendmsg(sk); @@ -731,11 +727,9 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) goto out_unlock; } else { /* This xsk has its own umem. */ - xdp_umem_assign_dev(xs->umem, dev, qid); xs->pool = xp_create_and_assign_umem(xs, xs->umem); if (!xs->pool) { err = -ENOMEM; - xdp_umem_clear_dev(xs->umem); goto out_unlock; } @@ -743,7 +737,6 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) if (err) { xp_destroy(xs->pool); xs->pool = NULL; - xdp_umem_clear_dev(xs->umem); goto out_unlock; } } @@ -1089,7 +1082,6 @@ static int xsk_notifier(struct notifier_block *this, /* Clear device references. */ xp_clear_dev(xs->pool); - xdp_umem_clear_dev(xs->umem); } mutex_unlock(&xs->mutex); } diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h index a00e3e2..da1f73e 100644 --- a/net/xdp/xsk.h +++ b/net/xdp/xsk.h @@ -11,13 +11,6 @@ #define XSK_NEXT_PG_CONTIG_SHIFT 0 #define XSK_NEXT_PG_CONTIG_MASK BIT_ULL(XSK_NEXT_PG_CONTIG_SHIFT) -/* Flags for the umem flags field. - * - * The NEED_WAKEUP flag is 1 due to the reuse of the flags field for public - * flags. See inlude/uapi/include/linux/if_xdp.h. 
- */ -#define XDP_UMEM_USES_NEED_WAKEUP BIT(1) - struct xdp_ring_offset_v1 { __u64 producer; __u64 consumer; diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 36287d2..436648a 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -95,10 +95,9 @@ void xp_set_rxq_info(struct xsk_buff_pool *pool, struct xdp_rxq_info *rxq) } EXPORT_SYMBOL(xp_set_rxq_info); -int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev, +int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *netdev, u16 queue_id, u16 flags) { - struct xdp_umem *umem = pool->umem; bool force_zc, force_copy; struct netdev_bpf bpf; int err = 0; @@ -111,27 +110,30 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev, if (force_zc && force_copy) return -EINVAL; - if (xsk_get_pool_from_qid(dev, queue_id)) + if (xsk_get_pool_from_qid(netdev, queue_id)) return -EBUSY; - err = xsk_reg_pool_at_qid(dev, pool, queue_id); + err = xsk_reg_pool_at_qid(netdev, pool, queue_id); if (err) return err; if (flags & XDP_USE_NEED_WAKEUP) { - umem->flags |= XDP_UMEM_USES_NEED_WAKEUP; + pool->uses_need_wakeup = true; /* Tx needs to be explicitly woken up the first time. * Also for supporting drivers that do not implement this * feature. They will always have to call sendto(). */ - umem->need_wakeup = XDP_WAKEUP_TX; + pool->cached_need_wakeup = XDP_WAKEUP_TX; } + dev_hold(netdev); + if (force_copy) /* For copy-mode, we are done. */ return 0; - if (!dev->netdev_ops->ndo_bpf || !dev->netdev_ops->ndo_xsk_wakeup) { + if (!netdev->netdev_ops->ndo_bpf || + !netdev->netdev_ops->ndo_xsk_wakeup) { err = -EOPNOTSUPP; goto err_unreg_pool; } @@ -140,44 +142,47 @@ int xp_assign_dev(struct xsk_buff_pool *pool, struct net_device *dev, bpf.xsk.pool = pool; bpf.xsk.queue_id = queue_id; - err = dev->netdev_ops->ndo_bpf(dev, &bpf); + err = netdev->netdev_ops->ndo_bpf(netdev, &bpf); if (err) goto err_unreg_pool; - umem->zc = true; + pool->netdev = netdev; + pool->queue_id = queue_id; + pool->umem->zc = true; return 0; err_unreg_pool: if (!force_zc) err = 0; /* fallback to copy mode */ if (err) - xsk_clear_pool_at_qid(dev, queue_id); + xsk_clear_pool_at_qid(netdev, queue_id); return err; } void xp_clear_dev(struct xsk_buff_pool *pool) { - struct xdp_umem *umem = pool->umem; struct netdev_bpf bpf; int err; ASSERT_RTNL(); - if (!umem->dev) + if (!pool->netdev) return; - if (umem->zc) { + if (pool->umem->zc) { bpf.command = XDP_SETUP_XSK_POOL; bpf.xsk.pool = NULL; - bpf.xsk.queue_id = umem->queue_id; + bpf.xsk.queue_id = pool->queue_id; - err = umem->dev->netdev_ops->ndo_bpf(umem->dev, &bpf); + err = pool->netdev->netdev_ops->ndo_bpf(pool->netdev, &bpf); if (err) - WARN(1, "failed to disable umem!\n"); + WARN(1, "Failed to disable zero-copy!\n"); } - xsk_clear_pool_at_qid(umem->dev, umem->queue_id); + xsk_clear_pool_at_qid(pool->netdev, pool->queue_id); + dev_put(pool->netdev); + pool->netdev = NULL; } static void xp_release_deferred(struct work_struct *work) diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c index 52675ea..5bd8ea9 100644 --- a/net/xdp/xsk_diag.c +++ b/net/xdp/xsk_diag.c @@ -59,8 +59,8 @@ static int xsk_diag_put_umem(const struct xdp_sock *xs, struct sk_buff *nlskb) du.num_pages = umem->npgs; du.chunk_size = umem->chunk_size; du.headroom = umem->headroom; - du.ifindex = umem->dev ? umem->dev->ifindex : 0; - du.queue_id = umem->queue_id; + du.ifindex = pool->netdev ? 
pool->netdev->ifindex : 0; + du.queue_id = pool->queue_id; du.flags = 0; if (umem->zc) du.flags |= XDP_DU_F_ZEROCOPY;

From patchwork Fri Aug 28 08:26:21 2020
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 261801
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com
Subject: [PATCH bpf-next v5 07/15] xsk: move addrs from buffer pool to umem
Date: Fri, 28 Aug 2020 10:26:21 +0200
Message-Id: <1598603189-32145-8-git-send-email-magnus.karlsson@intel.com>

Move the addrs mapping to the umem and replicate the pointer in each buffer pool. The mapping is the same for all buffer pools sharing the same umem. The pointer is kept in the buffer pool as well for performance reasons.
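As a rough sketch of the resulting ownership (using the names from the diff below): the umem creates the kernel mapping of its pages once, and every buffer pool built on top of it simply aliases that pointer.

/* Once per umem, at registration time (xdp_umem_addr_map()): */
umem->addrs = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);

/* Once per pool (xp_create_and_assign_umem()): */
pool->addrs = umem->addrs;

The data path translates addresses through pool->addrs, so replicating the pointer saves the extra pool->umem->addrs dereference on every access.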
Signed-off-by: Magnus Karlsson Acked-by: Björn Töpel --- include/net/xdp_sock.h | 1 + net/xdp/xdp_umem.c | 22 ++++++++++++++++++++++ net/xdp/xsk_buff_pool.c | 21 ++------------------- 3 files changed, 25 insertions(+), 19 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 9a61d05..126d243 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -18,6 +18,7 @@ struct xsk_queue; struct xdp_buff; struct xdp_umem { + void *addrs; u64 size; u32 headroom; u32 chunk_size; diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c index 7751592..77604c3 100644 --- a/net/xdp/xdp_umem.c +++ b/net/xdp/xdp_umem.c @@ -39,11 +39,27 @@ static void xdp_umem_unaccount_pages(struct xdp_umem *umem) } } +static void xdp_umem_addr_unmap(struct xdp_umem *umem) +{ + vunmap(umem->addrs); + umem->addrs = NULL; +} + +static int xdp_umem_addr_map(struct xdp_umem *umem, struct page **pages, + u32 nr_pages) +{ + umem->addrs = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL); + if (!umem->addrs) + return -ENOMEM; + return 0; +} + static void xdp_umem_release(struct xdp_umem *umem) { umem->zc = false; ida_simple_remove(&umem_ida, umem->id); + xdp_umem_addr_unmap(umem); xdp_umem_unpin_pages(umem); xdp_umem_unaccount_pages(umem); @@ -192,8 +208,14 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr) if (err) goto out_account; + err = xdp_umem_addr_map(umem, umem->pgs, umem->npgs); + if (err) + goto out_unpin; + return 0; +out_unpin: + xdp_umem_unpin_pages(umem); out_account: xdp_umem_unaccount_pages(umem); return err; diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index dbd913e..c563874 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -35,26 +35,11 @@ void xp_del_xsk(struct xsk_buff_pool *pool, struct xdp_sock *xs) spin_unlock_irqrestore(&pool->xsk_tx_list_lock, flags); } -static void xp_addr_unmap(struct xsk_buff_pool *pool) -{ - vunmap(pool->addrs); -} - -static int xp_addr_map(struct xsk_buff_pool *pool, - struct page **pages, u32 nr_pages) -{ - pool->addrs = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL); - if (!pool->addrs) - return -ENOMEM; - return 0; -} - void xp_destroy(struct xsk_buff_pool *pool) { if (!pool) return; - xp_addr_unmap(pool); kvfree(pool->heads); kvfree(pool); } @@ -64,7 +49,6 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs, { struct xsk_buff_pool *pool; struct xdp_buff_xsk *xskb; - int err; u32 i; pool = kvzalloc(struct_size(pool, free_heads, umem->chunks), @@ -86,6 +70,7 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs, pool->frame_len = umem->chunk_size - umem->headroom - XDP_PACKET_HEADROOM; pool->umem = umem; + pool->addrs = umem->addrs; INIT_LIST_HEAD(&pool->free_list); INIT_LIST_HEAD(&pool->xsk_tx_list); spin_lock_init(&pool->xsk_tx_list_lock); @@ -103,9 +88,7 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs, pool->free_heads[i] = xskb; } - err = xp_addr_map(pool, umem->pgs, umem->npgs); - if (!err) - return pool; + return pool; out: xp_destroy(pool); From patchwork Fri Aug 28 08:26:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 261805 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, 
URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com
Subject: [PATCH bpf-next v5 09/15] xsk: rearrange internal structs for better performance
Date: Fri, 28 Aug 2020 10:26:23 +0200
Message-Id: <1598603189-32145-10-git-send-email-magnus.karlsson@intel.com>

Rearrange the xdp_sock, xdp_umem and xsk_buff_pool structures so that they get smaller and align better with the cache lines. In the previous commits of this patch set, these structs have been reordered with the focus on functionality and simplicity, not performance. This patch improves throughput by around 3%.
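The principle, shown on a toy struct (illustrative only, not code from the patch): keep members that are only touched in the control path together, and start the per-packet members on their own cache line so they do not false-share with the cold ones.

struct toy_pool {
	/* Control path: written at setup, read rarely. */
	struct list_head list;
	refcount_t users;

	/* Data path: touched per packet; starts a new cache line. */
	u32 cached_prod ____cacheline_aligned_in_smp;
	u32 cached_cons;
};

The diff below applies the same pattern with ____cacheline_aligned_in_smp on the rx and tx queue pointers of struct xdp_sock and on the fq pointer of struct xsk_buff_pool.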
Signed-off-by: Magnus Karlsson Acked-by: Björn Töpel --- include/net/xdp_sock.h | 13 +++++++------ include/net/xsk_buff_pool.h | 27 +++++++++++++++------------ 2 files changed, 22 insertions(+), 18 deletions(-) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index 282aeba..1a9559c 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -23,13 +23,13 @@ struct xdp_umem { u32 headroom; u32 chunk_size; u32 chunks; + u32 npgs; struct user_struct *user; refcount_t users; - struct page **pgs; - u32 npgs; u8 flags; - int id; bool zc; + struct page **pgs; + int id; struct list_head xsk_dma_list; }; @@ -42,7 +42,7 @@ struct xsk_map { struct xdp_sock { /* struct sock must be the first member of struct xdp_sock */ struct sock sk; - struct xsk_queue *rx; + struct xsk_queue *rx ____cacheline_aligned_in_smp; struct net_device *dev; struct xdp_umem *umem; struct list_head flush_node; @@ -54,8 +54,7 @@ struct xdp_sock { XSK_BOUND, XSK_UNBOUND, } state; - /* Protects multiple processes in the control path */ - struct mutex mutex; + struct xsk_queue *tx ____cacheline_aligned_in_smp; struct list_head tx_list; /* Mutual exclusion of NAPI TX thread and sendmsg error paths @@ -72,6 +71,8 @@ struct xdp_sock { struct list_head map_list; /* Protects map_list */ spinlock_t map_list_lock; + /* Protects multiple processes in the control path */ + struct mutex mutex; struct xsk_queue *fq_tmp; /* Only as tmp storage before bind */ struct xsk_queue *cq_tmp; /* Only as tmp storage before bind */ }; diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 356d0ac..38d03a6 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -39,9 +39,22 @@ struct xsk_dma_map { }; struct xsk_buff_pool { - struct xsk_queue *fq; - struct xsk_queue *cq; + /* Members only used in the control path first. */ + struct device *dev; + struct net_device *netdev; + struct list_head xsk_tx_list; + /* Protects modifications to the xsk_tx_list */ + spinlock_t xsk_tx_list_lock; + refcount_t users; + struct xdp_umem *umem; + struct work_struct work; struct list_head free_list; + u32 heads_cnt; + u16 queue_id; + + /* Data path members as close to free_heads at the end as possible. */ + struct xsk_queue *fq ____cacheline_aligned_in_smp; + struct xsk_queue *cq; /* For performance reasons, each buff pool has its own array of dma_pages * even when they are identical. 
*/ @@ -51,25 +64,15 @@ struct xsk_buff_pool { u64 addrs_cnt; u32 free_list_cnt; u32 dma_pages_cnt; - u32 heads_cnt; u32 free_heads_cnt; u32 headroom; u32 chunk_size; u32 frame_len; - u16 queue_id; u8 cached_need_wakeup; bool uses_need_wakeup; bool dma_need_sync; bool unaligned; - struct xdp_umem *umem; void *addrs; - struct device *dev; - struct net_device *netdev; - struct list_head xsk_tx_list; - /* Protects modifications to the xsk_tx_list */ - spinlock_t xsk_tx_list_lock; - refcount_t users; - struct work_struct work; struct xdp_buff_xsk *free_heads[]; }; From patchwork Fri Aug 28 08:26:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 261804 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4B9C1C433E7 for ; Fri, 28 Aug 2020 08:27:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 250EE20776 for ; Fri, 28 Aug 2020 08:27:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728758AbgH1I1x (ORCPT ); Fri, 28 Aug 2020 04:27:53 -0400 Received: from mga03.intel.com ([134.134.136.65]:23551 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728722AbgH1I1c (ORCPT ); Fri, 28 Aug 2020 04:27:32 -0400 IronPort-SDR: vNgBJ1jyocbEx6igenEQRb9vyHWWo0SjyOrml6LXtvGXc61GSoBFMWZdTeWPgYIEqfTy7mOU+u 1oo9F13Wk7sA== X-IronPort-AV: E=McAfee;i="6000,8403,9726"; a="156634026" X-IronPort-AV: E=Sophos;i="5.76,363,1592895600"; d="scan'208";a="156634026" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Aug 2020 01:27:18 -0700 IronPort-SDR: n8Og1iFhqtb5IrTVPS3vX/W5mpx88D+UVyV6aLFCh6wtN6KtSpxvutRJyd29ARdNhLc8PJrhDe /y7rX3MELQLw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,363,1592895600"; d="scan'208";a="444762828" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.249.36.33]) by orsmga004.jf.intel.com with ESMTP; 28 Aug 2020 01:27:15 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com Subject: [PATCH bpf-next v5 10/15] xsk: i40e: ice: ixgbe: mlx5: test for dma_need_sync earlier for better performance Date: Fri, 28 Aug 2020 10:26:24 +0200 Message-Id: <1598603189-32145-11-git-send-email-magnus.karlsson@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1598603189-32145-1-git-send-email-magnus.karlsson@intel.com> References: <1598603189-32145-1-git-send-email-magnus.karlsson@intel.com> MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Test for dma_need_sync earlier to increase 
performance. xsk_buff_dma_sync_for_cpu() takes an xdp_buff as parameter and from that the xsk_buff_pool reference is dug out. Perf shows that this dereference causes a lot of cache misses. But as the buffer pool is now sent down to the driver at zero-copy initialization time, we might as well use this pointer directly, instead of going via the xsk_buff and we can do so already in xsk_buff_dma_sync_for_cpu() instead of in xp_dma_sync_for_cpu. This gets rid of these cache misses. Throughput increases with 3% for the xdpsock l2fwd sample application on my machine. Signed-off-by: Magnus Karlsson Acked-by: Björn Töpel --- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 2 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 2 +- drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c | 4 ++-- include/net/xdp_sock_drv.h | 7 +++++-- include/net/xsk_buff_pool.h | 3 --- 6 files changed, 10 insertions(+), 10 deletions(-) -- 2.7.4 diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 95b9a7e..2a1153d 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -314,7 +314,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) bi = i40e_rx_bi(rx_ring, rx_ring->next_to_clean); (*bi)->data_end = (*bi)->data + size; - xsk_buff_dma_sync_for_cpu(*bi); + xsk_buff_dma_sync_for_cpu(*bi, rx_ring->xsk_pool); xdp_res = i40e_run_xdp_zc(rx_ring, *bi); if (xdp_res) { diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index dffef37..7978865 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -595,7 +595,7 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget) rx_buf = &rx_ring->rx_buf[rx_ring->next_to_clean]; rx_buf->xdp->data_end = rx_buf->xdp->data + size; - xsk_buff_dma_sync_for_cpu(rx_buf->xdp); + xsk_buff_dma_sync_for_cpu(rx_buf->xdp, rx_ring->xsk_pool); xdp_res = ice_run_xdp_zc(rx_ring, rx_buf->xdp); if (xdp_res) { diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c index 6af34da..3771857 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c @@ -287,7 +287,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector, } bi->xdp->data_end = bi->xdp->data + size; - xsk_buff_dma_sync_for_cpu(bi->xdp); + xsk_buff_dma_sync_for_cpu(bi->xdp, rx_ring->xsk_pool); xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, bi->xdp); if (xdp_res) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c index a33a1f7..902ce77 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c @@ -48,7 +48,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, xdp->data_end = xdp->data + cqe_bcnt32; xdp_set_data_meta_invalid(xdp); - xsk_buff_dma_sync_for_cpu(xdp); + xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool); prefetch(xdp->data); rcu_read_lock(); @@ -99,7 +99,7 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct mlx5e_rq *rq, xdp->data_end = xdp->data + cqe_bcnt; xdp_set_data_meta_invalid(xdp); - xsk_buff_dma_sync_for_cpu(xdp); + xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool); prefetch(xdp->data); if (unlikely(get_cqe_opcode(cqe) != MLX5_CQE_RESP_SEND)) { diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 
a7c7d2e..5b1ee8a 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -99,10 +99,13 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr) return xp_raw_get_data(pool, addr); } -static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) +static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool) { struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp); + if (!pool->dma_need_sync) + return; + xp_dma_sync_for_cpu(xskb); } @@ -222,7 +225,7 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr) return NULL; } -static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp) +static inline void xsk_buff_dma_sync_for_cpu(struct xdp_buff *xdp, struct xsk_buff_pool *pool) { } diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h index 38d03a6..907537d 100644 --- a/include/net/xsk_buff_pool.h +++ b/include/net/xsk_buff_pool.h @@ -114,9 +114,6 @@ static inline dma_addr_t xp_get_frame_dma(struct xdp_buff_xsk *xskb) void xp_dma_sync_for_cpu_slow(struct xdp_buff_xsk *xskb); static inline void xp_dma_sync_for_cpu(struct xdp_buff_xsk *xskb) { - if (!xskb->pool->dma_need_sync) - return; - xp_dma_sync_for_cpu_slow(xskb); } From patchwork Fri Aug 28 08:26:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Magnus Karlsson X-Patchwork-Id: 261803 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0CF9EC433E2 for ; Fri, 28 Aug 2020 08:28:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E3F0320C56 for ; Fri, 28 Aug 2020 08:28:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728773AbgH1I2P (ORCPT ); Fri, 28 Aug 2020 04:28:15 -0400 Received: from mga03.intel.com ([134.134.136.65]:23538 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728573AbgH1I1p (ORCPT ); Fri, 28 Aug 2020 04:27:45 -0400 IronPort-SDR: FVh2b4vEj7NEqVYWVOyObiBR9IHl/hrnX5LA0DqS1hMpR4WBRHb42xtkigKMPQS/S3vfn54jZa mK4v7jUtnKQQ== X-IronPort-AV: E=McAfee;i="6000,8403,9726"; a="156634043" X-IronPort-AV: E=Sophos;i="5.76,363,1592895600"; d="scan'208";a="156634043" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Aug 2020 01:27:26 -0700 IronPort-SDR: v18x8kLv9AVZmhff2FhpqbqpW3nZ2NHj0qVoCCpkvC1I5dHo1g0GjHTJastiiJzj1ULre+NSJI PcRipm4OZfpw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,363,1592895600"; d="scan'208";a="444762889" Received: from mkarlsso-mobl.ger.corp.intel.com (HELO localhost.localdomain) ([10.249.36.33]) by orsmga004.jf.intel.com with ESMTP; 28 Aug 2020 01:27:23 -0700 From: Magnus Karlsson To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com Cc: bpf@vger.kernel.org, 
jeffrey.t.kirsher@intel.com, anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com
Subject: [PATCH bpf-next v5 12/15] xsk: add shared umem support between devices
Date: Fri, 28 Aug 2020 10:26:26 +0200
Message-Id: <1598603189-32145-13-git-send-email-magnus.karlsson@intel.com>

Add support to share a umem between different devices. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same device. Note that when sharing a umem between devices, just as when sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket (with two setsockopts, one for each ring) before you bind with the XDP_SHARED_UMEM flag. This is so that the single-producer single-consumer semantics of the rings can be upheld.

Signed-off-by: Magnus Karlsson Acked-by: Björn Töpel --- net/xdp/xsk.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index ea8d2ec..5eb6662 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -701,14 +701,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) sockfd_put(sock); goto out_unlock; } - if (umem_xs->dev != dev) { - err = -EINVAL; - sockfd_put(sock); - goto out_unlock; - } - if (umem_xs->queue_id != qid) { - /* Share the umem with another socket on another qid */ + if (umem_xs->queue_id != qid || umem_xs->dev != dev) { + /* Share the umem with another socket on another qid + * and/or device.
+ */ xs->pool = xp_create_and_assign_umem(xs, umem_xs->umem); if (!xs->pool) {

From patchwork Fri Aug 28 08:26:27 2020
X-Patchwork-Submitter: Magnus Karlsson
X-Patchwork-Id: 261802
From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jonathan.lemon@gmail.com, maximmi@mellanox.com
Cc: bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com, anthony.l.nguyen@intel.com, maciej.fijalkowski@intel.com, maciejromanfijalkowski@gmail.com, cristian.dumitrescu@intel.com
Subject: [PATCH bpf-next v5 13/15] libbpf: support shared umems between queues and devices
Date: Fri, 28 Aug 2020 10:26:27 +0200
Message-Id: <1598603189-32145-14-git-send-email-magnus.karlsson@intel.com>

Add support for shared umems between hardware queues and devices to the AF_XDP part of libbpf. This is so that zero-copy can be achieved in applications that want to send and receive packets between HW queues on one device or between different devices/netdevs. In order to create sockets that share a umem between hardware queues and devices, a new function has been added called xsk_socket__create_shared(). It takes the same arguments as xsk_socket__create() plus references to a fill ring and a completion ring.
So for every socket that share a umem, you need to have one more set of fill and completion rings. This in order to maintain the single-producer single-consumer semantics of the rings. You can create all the sockets via the new xsk_socket__create_shared() call, or create the first one with xsk_socket__create() and the rest with xsk_socket__create_shared(). Both methods work. Signed-off-by: Magnus Karlsson Acked-by: Björn Töpel --- tools/lib/bpf/libbpf.map | 1 + tools/lib/bpf/xsk.c | 376 ++++++++++++++++++++++++++++++----------------- tools/lib/bpf/xsk.h | 9 ++ 3 files changed, 254 insertions(+), 132 deletions(-) diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index 66a6286..3fedcdc 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -306,4 +306,5 @@ LIBBPF_0.2.0 { perf_buffer__buffer_fd; perf_buffer__epoll_fd; perf_buffer__consume_buffer; + xsk_socket__create_shared; } LIBBPF_0.1.0; diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c index a9b0210..49c3245 100644 --- a/tools/lib/bpf/xsk.c +++ b/tools/lib/bpf/xsk.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -45,26 +46,35 @@ #endif struct xsk_umem { - struct xsk_ring_prod *fill; - struct xsk_ring_cons *comp; + struct xsk_ring_prod *fill_save; + struct xsk_ring_cons *comp_save; char *umem_area; struct xsk_umem_config config; int fd; int refcount; + struct list_head ctx_list; +}; + +struct xsk_ctx { + struct xsk_ring_prod *fill; + struct xsk_ring_cons *comp; + __u32 queue_id; + struct xsk_umem *umem; + int refcount; + int ifindex; + struct list_head list; + int prog_fd; + int xsks_map_fd; + char ifname[IFNAMSIZ]; }; struct xsk_socket { struct xsk_ring_cons *rx; struct xsk_ring_prod *tx; __u64 outstanding_tx; - struct xsk_umem *umem; + struct xsk_ctx *ctx; struct xsk_socket_config config; int fd; - int ifindex; - int prog_fd; - int xsks_map_fd; - __u32 queue_id; - char ifname[IFNAMSIZ]; }; struct xsk_nl_info { @@ -200,15 +210,73 @@ static int xsk_get_mmap_offsets(int fd, struct xdp_mmap_offsets *off) return -EINVAL; } +static int xsk_create_umem_rings(struct xsk_umem *umem, int fd, + struct xsk_ring_prod *fill, + struct xsk_ring_cons *comp) +{ + struct xdp_mmap_offsets off; + void *map; + int err; + + err = setsockopt(fd, SOL_XDP, XDP_UMEM_FILL_RING, + &umem->config.fill_size, + sizeof(umem->config.fill_size)); + if (err) + return -errno; + + err = setsockopt(fd, SOL_XDP, XDP_UMEM_COMPLETION_RING, + &umem->config.comp_size, + sizeof(umem->config.comp_size)); + if (err) + return -errno; + + err = xsk_get_mmap_offsets(fd, &off); + if (err) + return -errno; + + map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64), + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, + XDP_UMEM_PGOFF_FILL_RING); + if (map == MAP_FAILED) + return -errno; + + fill->mask = umem->config.fill_size - 1; + fill->size = umem->config.fill_size; + fill->producer = map + off.fr.producer; + fill->consumer = map + off.fr.consumer; + fill->flags = map + off.fr.flags; + fill->ring = map + off.fr.desc; + fill->cached_cons = umem->config.fill_size; + + map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64), + PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, fd, + XDP_UMEM_PGOFF_COMPLETION_RING); + if (map == MAP_FAILED) { + err = -errno; + goto out_mmap; + } + + comp->mask = umem->config.comp_size - 1; + comp->size = umem->config.comp_size; + comp->producer = map + off.cr.producer; + comp->consumer = map + off.cr.consumer; + comp->flags = map + off.cr.flags; + 
comp->ring = map + off.cr.desc; + + return 0; + +out_mmap: + munmap(map, off.fr.desc + umem->config.fill_size * sizeof(__u64)); + return err; +} + int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area, __u64 size, struct xsk_ring_prod *fill, struct xsk_ring_cons *comp, const struct xsk_umem_config *usr_config) { - struct xdp_mmap_offsets off; struct xdp_umem_reg mr; struct xsk_umem *umem; - void *map; int err; if (!umem_area || !umem_ptr || !fill || !comp) @@ -227,6 +295,7 @@ int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area, } umem->umem_area = umem_area; + INIT_LIST_HEAD(&umem->ctx_list); xsk_set_umem_config(&umem->config, usr_config); memset(&mr, 0, sizeof(mr)); @@ -241,71 +310,16 @@ int xsk_umem__create_v0_0_4(struct xsk_umem **umem_ptr, void *umem_area, err = -errno; goto out_socket; } - err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_FILL_RING, - &umem->config.fill_size, - sizeof(umem->config.fill_size)); - if (err) { - err = -errno; - goto out_socket; - } - err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_COMPLETION_RING, - &umem->config.comp_size, - sizeof(umem->config.comp_size)); - if (err) { - err = -errno; - goto out_socket; - } - err = xsk_get_mmap_offsets(umem->fd, &off); - if (err) { - err = -errno; - goto out_socket; - } - - map = mmap(NULL, off.fr.desc + umem->config.fill_size * sizeof(__u64), - PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd, - XDP_UMEM_PGOFF_FILL_RING); - if (map == MAP_FAILED) { - err = -errno; + err = xsk_create_umem_rings(umem, umem->fd, fill, comp); + if (err) goto out_socket; - } - - umem->fill = fill; - fill->mask = umem->config.fill_size - 1; - fill->size = umem->config.fill_size; - fill->producer = map + off.fr.producer; - fill->consumer = map + off.fr.consumer; - fill->flags = map + off.fr.flags; - fill->ring = map + off.fr.desc; - fill->cached_prod = *fill->producer; - /* cached_cons is "size" bigger than the real consumer pointer - * See xsk_prod_nb_free - */ - fill->cached_cons = *fill->consumer + umem->config.fill_size; - - map = mmap(NULL, off.cr.desc + umem->config.comp_size * sizeof(__u64), - PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE, umem->fd, - XDP_UMEM_PGOFF_COMPLETION_RING); - if (map == MAP_FAILED) { - err = -errno; - goto out_mmap; - } - - umem->comp = comp; - comp->mask = umem->config.comp_size - 1; - comp->size = umem->config.comp_size; - comp->producer = map + off.cr.producer; - comp->consumer = map + off.cr.consumer; - comp->flags = map + off.cr.flags; - comp->ring = map + off.cr.desc; - comp->cached_prod = *comp->producer; - comp->cached_cons = *comp->consumer; + umem->fill_save = fill; + umem->comp_save = comp; *umem_ptr = umem; return 0; -out_mmap: - munmap(map, off.fr.desc + umem->config.fill_size * sizeof(__u64)); out_socket: close(umem->fd); out_umem_alloc: @@ -339,6 +353,7 @@ DEFAULT_VERSION(xsk_umem__create_v0_0_4, xsk_umem__create, LIBBPF_0.0.4) static int xsk_load_xdp_prog(struct xsk_socket *xsk) { static const int log_buf_size = 16 * 1024; + struct xsk_ctx *ctx = xsk->ctx; char log_buf[log_buf_size]; int err, prog_fd; @@ -366,7 +381,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) /* *(u32 *)(r10 - 4) = r2 */ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -4), /* r1 = xskmap[] */ - BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd), + BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd), /* r3 = XDP_PASS */ BPF_MOV64_IMM(BPF_REG_3, 2), /* call bpf_redirect_map */ @@ -378,7 +393,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk) /* r2 += -4 */ BPF_ALU64_IMM(BPF_ADD, 
BPF_REG_2, -4),
 	/* r1 = xskmap[] */
-	BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
+	BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd),
 	/* call bpf_map_lookup_elem */
 	BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
 	/* r1 = r0 */
@@ -390,7 +405,7 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 	/* r2 = *(u32 *)(r10 - 4) */
 	BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
 	/* r1 = xskmap[] */
-	BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
+	BPF_LD_MAP_FD(BPF_REG_1, ctx->xsks_map_fd),
 	/* r3 = 0 */
 	BPF_MOV64_IMM(BPF_REG_3, 0),
 	/* call bpf_redirect_map */
@@ -408,19 +423,21 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 		return prog_fd;
 	}
 
-	err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags);
+	err = bpf_set_link_xdp_fd(xsk->ctx->ifindex, prog_fd,
+				  xsk->config.xdp_flags);
 	if (err) {
 		close(prog_fd);
 		return err;
 	}
 
-	xsk->prog_fd = prog_fd;
+	ctx->prog_fd = prog_fd;
 	return 0;
 }
 
 static int xsk_get_max_queues(struct xsk_socket *xsk)
 {
 	struct ethtool_channels channels = { .cmd = ETHTOOL_GCHANNELS };
+	struct xsk_ctx *ctx = xsk->ctx;
 	struct ifreq ifr = {};
 	int fd, err, ret;
 
@@ -429,7 +446,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
 		return -errno;
 
 	ifr.ifr_data = (void *)&channels;
-	memcpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
+	memcpy(ifr.ifr_name, ctx->ifname, IFNAMSIZ - 1);
 	ifr.ifr_name[IFNAMSIZ - 1] = '\0';
 	err = ioctl(fd, SIOCETHTOOL, &ifr);
 	if (err && errno != EOPNOTSUPP) {
@@ -457,6 +474,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
 
 static int xsk_create_bpf_maps(struct xsk_socket *xsk)
 {
+	struct xsk_ctx *ctx = xsk->ctx;
 	int max_queues;
 	int fd;
 
@@ -469,15 +487,17 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk)
 	if (fd < 0)
 		return fd;
 
-	xsk->xsks_map_fd = fd;
+	ctx->xsks_map_fd = fd;
 
 	return 0;
 }
 
 static void xsk_delete_bpf_maps(struct xsk_socket *xsk)
 {
-	bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id);
-	close(xsk->xsks_map_fd);
+	struct xsk_ctx *ctx = xsk->ctx;
+
+	bpf_map_delete_elem(ctx->xsks_map_fd, &ctx->queue_id);
+	close(ctx->xsks_map_fd);
 }
 
 static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
@@ -485,10 +505,11 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 	__u32 i, *map_ids, num_maps, prog_len = sizeof(struct bpf_prog_info);
 	__u32 map_len = sizeof(struct bpf_map_info);
 	struct bpf_prog_info prog_info = {};
+	struct xsk_ctx *ctx = xsk->ctx;
 	struct bpf_map_info map_info;
 	int fd, err;
 
-	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
+	err = bpf_obj_get_info_by_fd(ctx->prog_fd, &prog_info, &prog_len);
 	if (err)
 		return err;
 
@@ -502,11 +523,11 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 	prog_info.nr_map_ids = num_maps;
 	prog_info.map_ids = (__u64)(unsigned long)map_ids;
 
-	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
+	err = bpf_obj_get_info_by_fd(ctx->prog_fd, &prog_info, &prog_len);
 	if (err)
 		goto out_map_ids;
 
-	xsk->xsks_map_fd = -1;
+	ctx->xsks_map_fd = -1;
 
 	for (i = 0; i < prog_info.nr_map_ids; i++) {
 		fd = bpf_map_get_fd_by_id(map_ids[i]);
@@ -520,7 +541,7 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 		}
 
 		if (!strcmp(map_info.name, "xsks_map")) {
-			xsk->xsks_map_fd = fd;
+			ctx->xsks_map_fd = fd;
 			continue;
 		}
 
@@ -528,7 +549,7 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 	}
 
 	err = 0;
-	if (xsk->xsks_map_fd == -1)
+	if (ctx->xsks_map_fd == -1)
 		err = -ENOENT;
 
 out_map_ids:
@@ -538,16 +559,19 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 
 static int xsk_set_bpf_maps(struct xsk_socket *xsk)
 {
-	return bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id,
+	struct xsk_ctx *ctx = xsk->ctx;
+
+	return bpf_map_update_elem(ctx->xsks_map_fd, &ctx->queue_id,
 				   &xsk->fd, 0);
 }
 
 static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
 {
+	struct xsk_ctx *ctx = xsk->ctx;
 	__u32 prog_id = 0;
 	int err;
 
-	err = bpf_get_link_xdp_id(xsk->ifindex, &prog_id,
+	err = bpf_get_link_xdp_id(ctx->ifindex, &prog_id,
 				  xsk->config.xdp_flags);
 	if (err)
 		return err;
@@ -563,12 +587,12 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
 			return err;
 		}
 	} else {
-		xsk->prog_fd = bpf_prog_get_fd_by_id(prog_id);
-		if (xsk->prog_fd < 0)
+		ctx->prog_fd = bpf_prog_get_fd_by_id(prog_id);
+		if (ctx->prog_fd < 0)
 			return -errno;
 		err = xsk_lookup_bpf_maps(xsk);
 		if (err) {
-			close(xsk->prog_fd);
+			close(ctx->prog_fd);
 			return err;
 		}
 	}
@@ -577,25 +601,110 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
 	err = xsk_set_bpf_maps(xsk);
 	if (err) {
 		xsk_delete_bpf_maps(xsk);
-		close(xsk->prog_fd);
+		close(ctx->prog_fd);
 		return err;
 	}
 
 	return 0;
 }
 
-int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
-		       __u32 queue_id, struct xsk_umem *umem,
-		       struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
-		       const struct xsk_socket_config *usr_config)
+static struct xsk_ctx *xsk_get_ctx(struct xsk_umem *umem, int ifindex,
+				   __u32 queue_id)
+{
+	struct xsk_ctx *ctx;
+
+	if (list_empty(&umem->ctx_list))
+		return NULL;
+
+	list_for_each_entry(ctx, &umem->ctx_list, list) {
+		if (ctx->ifindex == ifindex && ctx->queue_id == queue_id) {
+			ctx->refcount++;
+			return ctx;
+		}
+	}
+
+	return NULL;
+}
+
+static void xsk_put_ctx(struct xsk_ctx *ctx)
+{
+	struct xsk_umem *umem = ctx->umem;
+	struct xdp_mmap_offsets off;
+	int err;
+
+	if (--ctx->refcount == 0) {
+		err = xsk_get_mmap_offsets(umem->fd, &off);
+		if (!err) {
+			munmap(ctx->fill->ring - off.fr.desc,
+			       off.fr.desc + umem->config.fill_size *
+			       sizeof(__u64));
+			munmap(ctx->comp->ring - off.cr.desc,
+			       off.cr.desc + umem->config.comp_size *
+			       sizeof(__u64));
+		}
+
+		list_del(&ctx->list);
+		free(ctx);
+	}
+}
+
+static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk,
+				      struct xsk_umem *umem, int ifindex,
+				      const char *ifname, __u32 queue_id,
+				      struct xsk_ring_prod *fill,
+				      struct xsk_ring_cons *comp)
+{
+	struct xsk_ctx *ctx;
+	int err;
+
+	ctx = calloc(1, sizeof(*ctx));
+	if (!ctx)
+		return NULL;
+
+	if (!umem->fill_save) {
+		err = xsk_create_umem_rings(umem, xsk->fd, fill, comp);
+		if (err) {
+			free(ctx);
+			return NULL;
+		}
+	} else if (umem->fill_save != fill || umem->comp_save != comp) {
+		/* Copy over rings to new structs. */
+		memcpy(fill, umem->fill_save, sizeof(*fill));
+		memcpy(comp, umem->comp_save, sizeof(*comp));
+	}
+
+	ctx->ifindex = ifindex;
+	ctx->refcount = 1;
+	ctx->umem = umem;
+	ctx->queue_id = queue_id;
+	memcpy(ctx->ifname, ifname, IFNAMSIZ - 1);
+	ctx->ifname[IFNAMSIZ - 1] = '\0';
+
+	umem->fill_save = NULL;
+	umem->comp_save = NULL;
+	ctx->fill = fill;
+	ctx->comp = comp;
+	list_add(&ctx->list, &umem->ctx_list);
+	return ctx;
+}
+
+int xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
+			      const char *ifname,
+			      __u32 queue_id, struct xsk_umem *umem,
+			      struct xsk_ring_cons *rx,
+			      struct xsk_ring_prod *tx,
+			      struct xsk_ring_prod *fill,
+			      struct xsk_ring_cons *comp,
+			      const struct xsk_socket_config *usr_config)
 {
 	void *rx_map = NULL, *tx_map = NULL;
 	struct sockaddr_xdp sxdp = {};
 	struct xdp_mmap_offsets off;
 	struct xsk_socket *xsk;
-	int err;
+	struct xsk_ctx *ctx;
+	int err, ifindex;
 
-	if (!umem || !xsk_ptr || !(rx || tx))
+	if (!umem || !xsk_ptr || !(rx || tx) || !fill || !comp)
 		return -EFAULT;
 
 	xsk = calloc(1, sizeof(*xsk));
@@ -606,10 +715,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	if (err)
 		goto out_xsk_alloc;
 
-	if (umem->refcount &&
-	    !(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
-		pr_warn("Error: shared umems not supported by libbpf supplied XDP program.\n");
-		err = -EBUSY;
+	xsk->outstanding_tx = 0;
+	ifindex = if_nametoindex(ifname);
+	if (!ifindex) {
+		err = -errno;
 		goto out_xsk_alloc;
 	}
 
@@ -623,16 +732,16 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 		xsk->fd = umem->fd;
 	}
 
-	xsk->outstanding_tx = 0;
-	xsk->queue_id = queue_id;
-	xsk->umem = umem;
-	xsk->ifindex = if_nametoindex(ifname);
-	if (!xsk->ifindex) {
-		err = -errno;
-		goto out_socket;
+	ctx = xsk_get_ctx(umem, ifindex, queue_id);
+	if (!ctx) {
+		ctx = xsk_create_ctx(xsk, umem, ifindex, ifname, queue_id,
+				     fill, comp);
+		if (!ctx) {
+			err = -ENOMEM;
+			goto out_socket;
+		}
 	}
-	memcpy(xsk->ifname, ifname, IFNAMSIZ - 1);
-	xsk->ifname[IFNAMSIZ - 1] = '\0';
+	xsk->ctx = ctx;
 
 	if (rx) {
 		err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING,
@@ -640,7 +749,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 				 sizeof(xsk->config.rx_size));
 		if (err) {
 			err = -errno;
-			goto out_socket;
+			goto out_put_ctx;
 		}
 	}
 	if (tx) {
@@ -649,14 +758,14 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 				 sizeof(xsk->config.tx_size));
 		if (err) {
 			err = -errno;
-			goto out_socket;
+			goto out_put_ctx;
 		}
 	}
 
 	err = xsk_get_mmap_offsets(xsk->fd, &off);
 	if (err) {
 		err = -errno;
-		goto out_socket;
+		goto out_put_ctx;
 	}
 
 	if (rx) {
@@ -666,7 +775,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 			      xsk->fd, XDP_PGOFF_RX_RING);
 		if (rx_map == MAP_FAILED) {
 			err = -errno;
-			goto out_socket;
+			goto out_put_ctx;
 		}
 
 		rx->mask = xsk->config.rx_size - 1;
@@ -705,10 +814,10 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	xsk->tx = tx;
 
 	sxdp.sxdp_family = PF_XDP;
-	sxdp.sxdp_ifindex = xsk->ifindex;
-	sxdp.sxdp_queue_id = xsk->queue_id;
+	sxdp.sxdp_ifindex = ctx->ifindex;
+	sxdp.sxdp_queue_id = ctx->queue_id;
 	if (umem->refcount > 1) {
-		sxdp.sxdp_flags = XDP_SHARED_UMEM;
+		sxdp.sxdp_flags |= XDP_SHARED_UMEM;
 		sxdp.sxdp_shared_umem_fd = umem->fd;
 	} else {
 		sxdp.sxdp_flags = xsk->config.bind_flags;
@@ -720,7 +829,7 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 		goto out_mmap_tx;
 	}
 
-	xsk->prog_fd = -1;
+	ctx->prog_fd = -1;
 
 	if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
 		err = xsk_setup_xdp_prog(xsk);
@@ -739,6 +848,8 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	if (rx)
 		munmap(rx_map, off.rx.desc +
 		       xsk->config.rx_size * sizeof(struct xdp_desc));
+out_put_ctx:
+	xsk_put_ctx(ctx);
 out_socket:
 	if (--umem->refcount)
 		close(xsk->fd);
@@ -747,25 +858,24 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	return err;
 }
 
-int xsk_umem__delete(struct xsk_umem *umem)
+int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
+		       __u32 queue_id, struct xsk_umem *umem,
+		       struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
+		       const struct xsk_socket_config *usr_config)
 {
-	struct xdp_mmap_offsets off;
-	int err;
+	return xsk_socket__create_shared(xsk_ptr, ifname, queue_id, umem,
+					 rx, tx, umem->fill_save,
+					 umem->comp_save, usr_config);
+}
 
+int xsk_umem__delete(struct xsk_umem *umem)
+{
 	if (!umem)
 		return 0;
 
 	if (umem->refcount)
 		return -EBUSY;
 
-	err = xsk_get_mmap_offsets(umem->fd, &off);
-	if (!err) {
-		munmap(umem->fill->ring - off.fr.desc,
-		       off.fr.desc + umem->config.fill_size * sizeof(__u64));
-		munmap(umem->comp->ring - off.cr.desc,
-		       off.cr.desc + umem->config.comp_size * sizeof(__u64));
-	}
-
 	close(umem->fd);
 	free(umem);
 
@@ -775,15 +885,16 @@ int xsk_umem__delete(struct xsk_umem *umem)
 void xsk_socket__delete(struct xsk_socket *xsk)
 {
 	size_t desc_sz = sizeof(struct xdp_desc);
+	struct xsk_ctx *ctx = xsk->ctx;
 	struct xdp_mmap_offsets off;
 	int err;
 
 	if (!xsk)
 		return;
 
-	if (xsk->prog_fd != -1) {
+	if (ctx->prog_fd != -1) {
 		xsk_delete_bpf_maps(xsk);
-		close(xsk->prog_fd);
+		close(ctx->prog_fd);
 	}
 
 	err = xsk_get_mmap_offsets(xsk->fd, &off);
@@ -796,14 +907,15 @@ void xsk_socket__delete(struct xsk_socket *xsk)
 			munmap(xsk->tx->ring - off.tx.desc,
 			       off.tx.desc + xsk->config.tx_size * desc_sz);
 		}
 	}
 
-	xsk->umem->refcount--;
+	xsk_put_ctx(ctx);
+
+	ctx->umem->refcount--;
 	/* Do not close an fd that also has an associated umem connected
 	 * to it.
 	 */
-	if (xsk->fd != xsk->umem->fd)
+	if (xsk->fd != ctx->umem->fd)
 		close(xsk->fd);
 	free(xsk);
 }
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 584f682..1069c46 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -234,6 +234,15 @@ LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk,
 				  struct xsk_ring_cons *rx,
 				  struct xsk_ring_prod *tx,
 				  const struct xsk_socket_config *config);
+LIBBPF_API int
+xsk_socket__create_shared(struct xsk_socket **xsk_ptr,
+			  const char *ifname,
+			  __u32 queue_id, struct xsk_umem *umem,
+			  struct xsk_ring_cons *rx,
+			  struct xsk_ring_prod *tx,
+			  struct xsk_ring_prod *fill,
+			  struct xsk_ring_cons *comp,
+			  const struct xsk_socket_config *config);
 
 /* Returns 0 for success and -EBUSY if the umem is still in use. */
 LIBBPF_API int xsk_umem__delete(struct xsk_umem *umem);
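
For reviewers who want to see the new interface from the application side, here is a minimal usage sketch. It is not part of the patch: the device name "eth0", the queue ids, NUM_FRAMES and the helper name setup_two_sockets() are illustrative assumptions, and error unwinding is abbreviated. It binds two sockets to queues 0 and 1 of the same netdev with one shared umem, which is the case this change enables.

#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <bpf/xsk.h>

#define NUM_FRAMES 4096	/* illustrative buffer count */

static struct xsk_ring_prod fq0, fq1;	/* one fill ring per (ifindex, queue) ctx */
static struct xsk_ring_cons cq0, cq1;	/* one completion ring per ctx */
static struct xsk_ring_cons rx0, rx1;
static struct xsk_ring_prod tx0, tx1;

static int setup_two_sockets(struct xsk_umem **umem, struct xsk_socket **xsk0,
			     struct xsk_socket **xsk1)
{
	size_t size = NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
	void *bufs;
	int err;

	if (posix_memalign(&bufs, getpagesize(), size))
		return -ENOMEM;

	/* Creates the fill/completion rings for the first socket and
	 * remembers them in fill_save/comp_save until a socket takes
	 * them over.
	 */
	err = xsk_umem__create(umem, bufs, size, &fq0, &cq0, NULL);
	if (err)
		return err;

	/* Queue 0: reuses the rings handed to xsk_umem__create(). */
	err = xsk_socket__create_shared(xsk0, "eth0", 0, *umem,
					&rx0, &tx0, &fq0, &cq0, NULL);
	if (err)
		return err;

	/* Queue 1: same umem, but a fresh fill/completion ring pair,
	 * which xsk_create_ctx() registers for the new
	 * (ifindex, queue_id) context.
	 */
	return xsk_socket__create_shared(xsk1, "eth0", 1, *umem,
					 &rx1, &tx1, &fq1, &cq1, NULL);
}

Since umem->refcount is already greater than one when the second socket binds, libbpf ors XDP_SHARED_UMEM into sxdp_flags by itself, so the caller does not need to pass any extra bind flags for the shared case.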