Message ID | 20210409223801.104657-1-mcroce@linux.microsoft.com |
---|---|
Series | page_pool: recycle buffers |
On 2021/4/10 6:37, Matteo Croce wrote:
> From: Matteo Croce <mcroce@microsoft.com>
>
> This is a respin of [1]
>
> This patchset shows the plans for allowing page_pool to handle and
> maintain DMA map/unmap of the pages it serves to the driver. For this
> to work a return hook in the network core is introduced.
>
> The overall purpose is to simplify drivers, by providing a page
> allocation API that does recycling, such that each driver doesn't have
> to reinvent its own recycling scheme. Using page_pool in a driver
> does not require implementing XDP support, but it makes it trivially
> easy to do so. Instead of allocating buffers specifically for SKBs
> we now allocate a generic buffer and either wrap it on an SKB
> (via build_skb) or create an XDP frame.
> The recycling code leverages the XDP recycle APIs.
>
> The Marvell mvpp2 and mvneta drivers are used in this patchset to
> demonstrate how to use the API, and tested on a MacchiatoBIN
> and EspressoBIN boards respectively.
>

Hi, Matteo

I added skb frag page recycling to hns3 based on this patchset, and it
gives a 10%~20% performance improvement for a single-thread iperf TCP
flow (with the IOMMU off; there may be more improvement with the IOMMU
on, since the DMA map/unmap is avoided as well). Thanks for the work.

The skb frag page recycling support in the hns3 driver is not as simple
as in the mvpp2 and mvneta drivers, because:

1. the hns3 driver does not have XDP support yet, so "struct xdp_rxq_info"
   is added to assist the binding between the "struct page" and the
   "struct page_pool".

2. the hns3 driver already does page reuse based on page splitting and
   the page reference count, but that may not work if the upper stack
   cannot process the skb and release the corresponding page fast enough.

3. the hns3 driver supports batching of page reference count updates, see:
   aeda9bf87a45 ("net: hns3: batch the page reference count updates")

So it would be better if:

1. skb frag page recycling did not need "struct xdp_rxq_info" or
   "struct xdp_mem_info" to bind the relation between "struct page" and
   "struct page_pool", which seems unnecessary at this point if binding
   a "struct page_pool" pointer directly in "struct page" does not
   increase its size.

2. the page reference count update batching were done in the page pool
   instead of in each driver.

page_pool_atomic_sub_if_positive() is added to decide who may call
page_pool_put_full_page(): the driver and the stack may hold references
to the same page, and only the last holder of the complete reference to
a page can call page_pool_put_full_page() to decide whether recycling is
possible; if it is not, the page is released. So I am wondering whether a
similar page_pool_atomic_sub_if_positive() could be added to the relevant
user space address unmapping path to allow skb recycling for RX zerocopy
too?
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index c21dd11..8b01a7d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2566,7 +2566,10 @@ static int hns3_alloc_buffer(struct hns3_enet_ring *ring,
 	unsigned int order = hns3_page_order(ring);
 	struct page *p;
 
-	p = dev_alloc_pages(order);
+	if (ring->page_pool)
+		p = page_pool_dev_alloc_pages(ring->page_pool);
+	else
+		p = dev_alloc_pages(order);
 	if (!p)
 		return -ENOMEM;
 
@@ -2582,13 +2585,32 @@ static int hns3_alloc_buffer(struct hns3_enet_ring *ring,
 	return 0;
 }
 
+static void hns3_page_frag_cache_drain(struct hns3_enet_ring *ring,
+				       struct hns3_desc_cb *cb)
+{
+	if (ring->page_pool) {
+		struct page *p = cb->priv;
+
+		if (page_pool_atomic_sub_if_positive(p, cb->pagecnt_bias))
+			return;
+
+		if (cb->pagecnt_bias > 1)
+			page_ref_sub(p, cb->pagecnt_bias - 1);
+
+		page_pool_put_full_page(ring->page_pool, p, false);
+		return;
+	}
+
+	__page_frag_cache_drain(cb->priv, cb->pagecnt_bias);
+}
+
 static void hns3_free_buffer(struct hns3_enet_ring *ring,
 			     struct hns3_desc_cb *cb, int budget)
 {
 	if (cb->type == DESC_TYPE_SKB)
 		napi_consume_skb(cb->priv, budget);
 	else if (!HNAE3_IS_TX_RING(ring) && cb->pagecnt_bias)
-		__page_frag_cache_drain(cb->priv, cb->pagecnt_bias);
+		hns3_page_frag_cache_drain(ring, cb);
 	memset(cb, 0, sizeof(*cb));
 }
 
@@ -2892,13 +2914,15 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i,
 	skb_add_rx_frag(skb, i, desc_cb->priv,
 			desc_cb->page_offset + pull_len,
 			size - pull_len, truesize);
 
+	skb_mark_for_recycle(skb, desc_cb->priv, &ring->rxq_info.mem);
+
 	/* Avoid re-using remote and pfmemalloc pages, or the stack is still
 	 * using the page when page_offset rollback to zero, flag default
 	 * unreuse
 	 */
 	if (!dev_page_is_reusable(desc_cb->priv) ||
 	    (!desc_cb->page_offset && !hns3_can_reuse_page(desc_cb))) {
-		__page_frag_cache_drain(desc_cb->priv, desc_cb->pagecnt_bias);
+		hns3_page_frag_cache_drain(ring, desc_cb);
 		return;
 	}
 
@@ -2911,7 +2935,7 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i,
 		desc_cb->reuse_flag = 1;
 		desc_cb->page_offset = 0;
 	} else if (desc_cb->pagecnt_bias) {
-		__page_frag_cache_drain(desc_cb->priv, desc_cb->pagecnt_bias);
+		hns3_page_frag_cache_drain(ring, desc_cb);
 		return;
 	}
 
@@ -3156,8 +3180,7 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, unsigned int length,
 	if (dev_page_is_reusable(desc_cb->priv))
 		desc_cb->reuse_flag = 1;
 	else /* This page cannot be reused so discard it */
-		__page_frag_cache_drain(desc_cb->priv,
-					desc_cb->pagecnt_bias);
+		hns3_page_frag_cache_drain(ring, desc_cb);
 
 	hns3_rx_ring_move_fw(ring);
 	return 0;
@@ -4028,6 +4051,33 @@ static int hns3_alloc_ring_memory(struct hns3_enet_ring *ring)
 		goto out_with_desc_cb;
 
 	if (!HNAE3_IS_TX_RING(ring)) {
+		struct page_pool_params pp_params = {
+			/* internal DMA mapping in page_pool */
+			.flags = 0,
+			.order = 0,
+			.pool_size = 1024,
+			.nid = dev_to_node(ring_to_dev(ring)),
+			.dev = ring_to_dev(ring),
+			.dma_dir = DMA_FROM_DEVICE,
+			.offset = 0,
+			.max_len = 0,
+		};
+
+		ring->page_pool = page_pool_create(&pp_params);
+		if (IS_ERR(ring->page_pool)) {
+			dev_err(ring_to_dev(ring), "page pool creation failed\n");
+			ring->page_pool = NULL;
+		}
+
+		ret = xdp_rxq_info_reg(&ring->rxq_info, ring_to_netdev(ring),
+				       ring->queue_index, 0);
+		if (ret)
+			dev_err(ring_to_dev(ring), "xdp_rxq_info_reg failed\n");
+
+		ret = xdp_rxq_info_reg_mem_model(&ring->rxq_info, MEM_TYPE_PAGE_POOL,
+						 ring->page_pool);
+		if (ret)
+			dev_err(ring_to_dev(ring), "xdp_rxq_info_reg_mem_model failed\n");
+
 		ret = hns3_alloc_ring_buffers(ring);
 		if (ret)
 			goto out_with_desc;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index daa04ae..fd53fcc 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -6,6 +6,9 @@
 
 #include <linux/if_vlan.h>
 
+#include <net/page_pool.h>
+#include <net/xdp.h>
+
 #include "hnae3.h"
 
 enum hns3_nic_state {
@@ -408,6 +411,8 @@ struct hns3_enet_ring {
 	struct hnae3_queue *tqp;
 	int queue_index;
 	struct device *dev; /* will be used for DMA mapping of descriptors */
+	struct page_pool *page_pool;
+	struct xdp_rxq_info rxq_info;
 
 	/* statistic */
 	struct ring_stats stats;
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 75fffc1..70c310e 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -195,6 +195,8 @@ static inline void page_pool_put_full_page(struct page_pool *pool,
 #endif
 }
 
+bool page_pool_atomic_sub_if_positive(struct page *page, int i);
+
 /* Same as above but the caller must guarantee safe context. e.g NAPI */
 static inline void page_pool_recycle_direct(struct page_pool *pool,
 					    struct page *page)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 43bfd2e..8bc8b7e 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -596,6 +596,26 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 }
 EXPORT_SYMBOL(page_pool_update_nid);
 
+bool page_pool_atomic_sub_if_positive(struct page *page, int i)
+{
+	atomic_t *v = &page->_refcount;
+	int dec, c;
+
+	do {
+		c = atomic_read(v);
+
+		dec = c - i;
+		if (unlikely(dec == 0))
+			return false;
+		else if (unlikely(dec < 0)) {
+			pr_err("c: %d, dec: %d, i: %d\n", c, dec, i);
+			return false;
+		}
+	} while (!atomic_try_cmpxchg(v, &c, dec));
+
+	return true;
+}
+
 bool page_pool_return_skb_page(void *data)
 {
 	struct xdp_mem_info mem_info;
@@ -606,6 +626,9 @@ bool page_pool_return_skb_page(void *data)
 	if (unlikely(page->signature != PP_SIGNATURE))
 		return false;
 
+	if (page_pool_atomic_sub_if_positive(page, 1))
+		return true;
+
 	info.raw = page_private(page);
 	mem_info = info.mem_info;
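As background for point 3 in the mail above: the batching amounts to taking a
large block of page references up front and spending them one at a time, so
the atomic page_ref_*() operations are amortized over many buffers. A minimal,
hypothetical sketch follows; the helper names and batch size are illustrative
and not the hns3 code, only cb->priv and cb->pagecnt_bias come from the diff.

/* Illustrative sketch of reference-count batching; not the hns3 code. */
#define RING_REF_BATCH	(USHRT_MAX - 1)

static void ring_init_page(struct hns3_desc_cb *cb, struct page *p)
{
	cb->priv = p;
	cb->pagecnt_bias = RING_REF_BATCH;
	/* one atomic update covers RING_REF_BATCH future uses of the page */
	page_ref_add(p, RING_REF_BATCH - 1);
}

static void ring_give_page_ref(struct hns3_desc_cb *cb)
{
	/* handing one reference to the stack costs no atomic operation */
	if (likely(--cb->pagecnt_bias))
		return;

	/* refill the batch only when it runs dry */
	page_ref_add(cb->priv, RING_REF_BATCH);
	cb->pagecnt_bias = RING_REF_BATCH;
}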
Hi Yunsheng, On Thu, Apr 29, 2021 at 04:27:21PM +0800, Yunsheng Lin wrote: > On 2021/4/10 6:37, Matteo Croce wrote: > > From: Matteo Croce <mcroce@microsoft.com> > > > > This is a respin of [1] > > > > This patchset shows the plans for allowing page_pool to handle and > > maintain DMA map/unmap of the pages it serves to the driver. For this > > to work a return hook in the network core is introduced. > > > > The overall purpose is to simplify drivers, by providing a page > > allocation API that does recycling, such that each driver doesn't have > > to reinvent its own recycling scheme. Using page_pool in a driver > > does not require implementing XDP support, but it makes it trivially > > easy to do so. Instead of allocating buffers specifically for SKBs > > we now allocate a generic buffer and either wrap it on an SKB > > (via build_skb) or create an XDP frame. > > The recycling code leverages the XDP recycle APIs. > > > > The Marvell mvpp2 and mvneta drivers are used in this patchset to > > demonstrate how to use the API, and tested on a MacchiatoBIN > > and EspressoBIN boards respectively. > > > > Hi, Matteo > I added the skb frag page recycling in hns3 based on this patchset, > and it has above 10%~20% performance improvement for one thread iperf > TCP flow(IOMMU is off, there may be more performance improvement if > considering the DMA map/unmap avoiding for IOMMU), thanks for the job. > > The skb frag page recycling support in hns3 driver is not so simple > as the mvpp2 and mvneta driver, because: > > 1. the hns3 driver do not have XDP support yet, so "struct xdp_rxq_info" > is added to assist relation binding between the "struct page" and > "struct page_pool". > > 2. the hns3 driver has already a page reusing based on page spliting and > page reference count, but it may not work if the upper stack can not > handle skb and release the corresponding page fast enough. > > 3. the hns3 driver support page reference count updating batching, see: > aeda9bf87a45 ("net: hns3: batch the page reference count updates") > > So it would be better if: > > 1. skb frag page recycling do not need "struct xdp_rxq_info" or > "struct xdp_mem_info" to bond the relation between "struct page" and > "struct page_pool", which seems uncessary at this point if bonding > a "struct page_pool" pointer directly in "struct page" does not cause > space increasing. We can't do that. The reason we need those structs is that we rely on the existing XDP code, which already recycles it's buffers, to enable recycling. Since we allocate a page per packet when using page_pool for a driver , the same ideas apply to an SKB and XDP frame. We just recycle the payload and we don't really care what's in that. We could rename the functions to something more generic in the future though ? > > 2. it would be good to do the page reference count updating batching > in page pool instead of specific driver. > > > page_pool_atomic_sub_if_positive() is added to decide who can call > page_pool_put_full_page(), because the driver and stack may hold > reference to the same page, only if last one which hold complete > reference to a page can call page_pool_put_full_page() to decide if > recycling is possible, if not, the page is released, so I am wondering > if a similar page_pool_atomic_sub_if_positive() can added to specific > user space address unmapping path to allow skb recycling for RX zerocopy > too? > I would prefer a different page pool type if we wanted to support the split page model. 
The changes as-is are quite intrusive, since they change the entire skb
return path, so I would prefer introducing them one at a time.

The fundamental difference between having the recycling in the driver vs
having it in a generic API is pretty straightforward. When a driver holds
the extra page references it is free to decide what to reuse when it is
about to refill its Rx descriptors, so TCP zerocopy might work even if
the userspace applications hold the buffers for an arbitrary amount of
time. With this proposal, though, we *need* to decide what to do with
the buffer at the point where we are about to free the skb.

[...]

Cheers
/Ilias
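The build_skb() requirement mentioned earlier in this reply boils down to the
RX pattern sketched below. This is not code from the series: "struct my_ring"
is hypothetical, headroom/truesize handling is elided, and skb_mark_for_recycle()
is the helper this patchset introduces (taking the ring's xdp_mem_info, as in
the hns3 diff above).

/* Minimal sketch of the RX pattern this series targets; illustrative only. */
struct my_ring {
	struct page_pool *page_pool;
	struct xdp_rxq_info rxq_info;
};

static struct sk_buff *my_rx_build_skb(struct my_ring *ring, unsigned int len)
{
	struct page *page;
	struct sk_buff *skb;

	/* payload comes from page_pool, one page per packet */
	page = page_pool_dev_alloc_pages(ring->page_pool);
	if (!page)
		return NULL;

	/* wrap the same buffer as an skb (an XDP frame would also work) */
	skb = build_skb(page_address(page), PAGE_SIZE);
	if (!skb) {
		page_pool_put_full_page(ring->page_pool, page, false);
		return NULL;
	}
	skb_put(skb, len);

	/* ties the skb's pages back to ring->page_pool for recycling */
	skb_mark_for_recycle(skb, page, &ring->rxq_info.mem);

	return skb;
}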
On 2021/4/30 2:51, Ilias Apalodimas wrote: > Hi Yunsheng, > > On Thu, Apr 29, 2021 at 04:27:21PM +0800, Yunsheng Lin wrote: >> On 2021/4/10 6:37, Matteo Croce wrote: >>> From: Matteo Croce <mcroce@microsoft.com> [...] >> >> 1. skb frag page recycling do not need "struct xdp_rxq_info" or >> "struct xdp_mem_info" to bond the relation between "struct page" and >> "struct page_pool", which seems uncessary at this point if bonding >> a "struct page_pool" pointer directly in "struct page" does not cause >> space increasing. > > We can't do that. The reason we need those structs is that we rely on the > existing XDP code, which already recycles it's buffers, to enable > recycling. Since we allocate a page per packet when using page_pool for a > driver , the same ideas apply to an SKB and XDP frame. We just recycle the I am not really familar with XDP here, but a packet from hw is either a "struct xdp_frame/xdp_buff" for XDP or a "struct sk_buff" for TCP/IP stack, a packet can not be both "struct xdp_frame/xdp_buff" and "struct sk_buff" at the same time, right? What does not really make sense to me is that the page has to be from page pool when a skb's frag page can be recycled, right? If it is ture, the switch case in __xdp_return() does not really make sense for skb recycling, why go all the trouble of checking the mem->type and mem->id to find the page_pool pointer when recyclable page for skb can only be from page pool? > payload and we don't really care what's in that. We could rename the functions > to something more generic in the future though ? > >> >> 2. it would be good to do the page reference count updating batching >> in page pool instead of specific driver. >> >> >> page_pool_atomic_sub_if_positive() is added to decide who can call >> page_pool_put_full_page(), because the driver and stack may hold >> reference to the same page, only if last one which hold complete >> reference to a page can call page_pool_put_full_page() to decide if >> recycling is possible, if not, the page is released, so I am wondering >> if a similar page_pool_atomic_sub_if_positive() can added to specific >> user space address unmapping path to allow skb recycling for RX zerocopy >> too? >> > > I would prefer a different page pool type if we wanted to support the split > page model. The changes as is are quite intrusive, since they change the > entire skb return path. So I would prefer introducing the changes one at a > time. I understand there may be fundamental semantic change when split page model is supported by page pool, but the split page support change mainly affect the skb recycling path and the driver that uses page pool(XDP too) if we are careful enough, not the entire skb return path as my understanding. Anyway, one changes at a time is always prefered if supporting split page is proved to be non-trivel and intrusive. > > The fundamental difference between having the recycling in the driver vs > having it in a generic API is pretty straightforward. When a driver holds > the extra page references he is free to decide what to reuse, when he is about > to refill his Rx descriptors. So TCP zerocopy might work even if the > userspace applications hold the buffers for an X amount of time. > On this proposal though we *need* to decide what to do with the buffer when we > are about to free the skb. I am not sure I understand what you meant by "free the skb", does it mean that kfree_skb() is called to free the skb. 
My understanding is: if the skb completely owns the page (meaning
page_count() == 1) when kfree_skb() is called, __page_pool_put_page() is
called; otherwise page_ref_dec() is called. That is exactly what
page_pool_atomic_sub_if_positive() tries to handle atomically.

> [...]
>
> Cheers
> /Ilias
[...] > >> > >> 1. skb frag page recycling do not need "struct xdp_rxq_info" or > >> "struct xdp_mem_info" to bond the relation between "struct page" and > >> "struct page_pool", which seems uncessary at this point if bonding > >> a "struct page_pool" pointer directly in "struct page" does not cause > >> space increasing. > > > > We can't do that. The reason we need those structs is that we rely on the > > existing XDP code, which already recycles it's buffers, to enable > > recycling. Since we allocate a page per packet when using page_pool for a > > driver , the same ideas apply to an SKB and XDP frame. We just recycle the > > I am not really familar with XDP here, but a packet from hw is either a > "struct xdp_frame/xdp_buff" for XDP or a "struct sk_buff" for TCP/IP stack, > a packet can not be both "struct xdp_frame/xdp_buff" and "struct sk_buff" at > the same time, right? > Yes, but the payload is irrelevant in both cases and that's what we use page_pool for. You can't use this patchset unless your driver usues build_skb(). So in both cases you just allocate memory for the payload and decide what the wrap the buffer with (XDP or SKB) later. > What does not really make sense to me is that the page has to be from page > pool when a skb's frag page can be recycled, right? If it is ture, the switch > case in __xdp_return() does not really make sense for skb recycling, why go > all the trouble of checking the mem->type and mem->id to find the page_pool > pointer when recyclable page for skb can only be from page pool? In any case you need to find in which pool the buffer you try to recycle belongs. In order to make the whole idea generic and be able to recycle skb fragments instead of just the skb head you need to store some information on struct page. That's the fundamental difference of this patchset compared to the RFC we sent a few years back [1] which was just storing information on the skb. The way this is done on the current patchset is that we store the struct xdp_mem_info in page->private and then look it up on xdp_return(). Now that being said Matthew recently reworked struct page, so we could see if we can store the page pool pointer directly instead of the struct xdp_mem_info. That would allow us to call into page pool functions directly. But we'll have to agree if that makes sense to go into struct page to begin with and make sure the pointer is still valid when we take the recycling path. > > payload and we don't really care what's in that. We could rename the functions > > to something more generic in the future though ? > > > >> > >> 2. it would be good to do the page reference count updating batching > >> in page pool instead of specific driver. > >> > >> > >> page_pool_atomic_sub_if_positive() is added to decide who can call > >> page_pool_put_full_page(), because the driver and stack may hold > >> reference to the same page, only if last one which hold complete > >> reference to a page can call page_pool_put_full_page() to decide if > >> recycling is possible, if not, the page is released, so I am wondering > >> if a similar page_pool_atomic_sub_if_positive() can added to specific > >> user space address unmapping path to allow skb recycling for RX zerocopy > >> too? > >> > > > > I would prefer a different page pool type if we wanted to support the split > > page model. The changes as is are quite intrusive, since they change the > > entire skb return path. So I would prefer introducing the changes one at a > > time. 
> > I understand there may be fundamental semantic change when split page model > is supported by page pool, but the split page support change mainly affect the > skb recycling path and the driver that uses page pool(XDP too) if we are careful > enough, not the entire skb return path as my understanding. It affects those drivers only, but in order to do so is intercepts the packet in skb_free_head(), which pretty much affects the entire network path. > > Anyway, one changes at a time is always prefered if supporting split page is > proved to be non-trivel and intrusive. > > > > > The fundamental difference between having the recycling in the driver vs > > having it in a generic API is pretty straightforward. When a driver holds > > the extra page references he is free to decide what to reuse, when he is about > > to refill his Rx descriptors. So TCP zerocopy might work even if the > > userspace applications hold the buffers for an X amount of time. > > On this proposal though we *need* to decide what to do with the buffer when we > > are about to free the skb. > > I am not sure I understand what you meant by "free the skb", does it mean > that kfree_skb() is called to free the skb. Yes > > As my understanding, if the skb completely own the page(which means page_count() > == 1) when kfree_skb() is called, __page_pool_put_page() is called, otherwise > page_ref_dec() is called, which is exactly what page_pool_atomic_sub_if_positive() > try to handle it atomically. > Not really, the opposite is happening here. If the pp_recycle bit is set we will always call page_pool_return_skb_page(). If the page signature matches the 'magic' set by page pool we will always call xdp_return_skb_frame() will end up calling __page_pool_put_page(). If the refcnt is 1 we'll try to recycle the page. If it's not we'll release it from page_pool (releasing some internal references we keep) unmap the buffer and decrement the refcnt. [1] https://lore.kernel.org/netdev/154413868810.21735.572808840657728172.stgit@firesoul/ Cheers /Ilias
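The return path described in the last paragraph can be condensed into the
sketch below. It is simplified pseudocode of the patchset's flow, not the
exact kernel source: the union layout for page_private() is assumed from the
page_pool.c hunk earlier in the thread, and napi_direct handling is omitted.

/* Simplified sketch; pp_recycle is the new skb bit, PP_SIGNATURE the
 * magic value page_pool stores in struct page.
 */
static void skb_free_head_sketch(struct sk_buff *skb)
{
	void *head = skb->head;

	if (skb->pp_recycle && page_pool_return_skb_page(head))
		return;			/* buffer went back to its pool */

	skb_free_frag(head);		/* normal release path */
}

static bool page_pool_return_skb_page_sketch(void *data)
{
	union { struct xdp_mem_info mem_info; unsigned long raw; } info;
	struct page *page = virt_to_head_page(data);

	if (page->signature != PP_SIGNATURE)
		return false;		/* not allocated by page_pool */

	/* xdp_mem_info was stashed in page_private() at alloc time; from it
	 * __xdp_return()/__page_pool_put_page() find the owning pool:
	 * refcount == 1 -> recycle, otherwise unmap the buffer, drop the
	 * pool's internal reference and free the page normally.
	 */
	info.raw = page_private(page);
	__xdp_return(data, &info.mem_info, false, NULL);
	return true;
}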
(-cc invalid emails) Replying to my self here but.... [...] > > > > > > We can't do that. The reason we need those structs is that we rely on the > > > existing XDP code, which already recycles it's buffers, to enable > > > recycling. Since we allocate a page per packet when using page_pool for a > > > driver , the same ideas apply to an SKB and XDP frame. We just recycle the > > > > I am not really familar with XDP here, but a packet from hw is either a > > "struct xdp_frame/xdp_buff" for XDP or a "struct sk_buff" for TCP/IP stack, > > a packet can not be both "struct xdp_frame/xdp_buff" and "struct sk_buff" at > > the same time, right? > > > > Yes, but the payload is irrelevant in both cases and that's what we use > page_pool for. You can't use this patchset unless your driver usues > build_skb(). So in both cases you just allocate memory for the payload and > decide what the wrap the buffer with (XDP or SKB) later. > > > What does not really make sense to me is that the page has to be from page > > pool when a skb's frag page can be recycled, right? If it is ture, the switch > > case in __xdp_return() does not really make sense for skb recycling, why go > > all the trouble of checking the mem->type and mem->id to find the page_pool > > pointer when recyclable page for skb can only be from page pool? > > In any case you need to find in which pool the buffer you try to recycle > belongs. In order to make the whole idea generic and be able to recycle skb > fragments instead of just the skb head you need to store some information on > struct page. That's the fundamental difference of this patchset compared to > the RFC we sent a few years back [1] which was just storing information on the > skb. The way this is done on the current patchset is that we store the > struct xdp_mem_info in page->private and then look it up on xdp_return(). > > Now that being said Matthew recently reworked struct page, so we could see if > we can store the page pool pointer directly instead of the struct > xdp_mem_info. That would allow us to call into page pool functions directly. > But we'll have to agree if that makes sense to go into struct page to begin > with and make sure the pointer is still valid when we take the recycling path. > Thinking more about it the reason that prevented us from storing a page pool pointer directly is not there anymore. Jesper fixed that already a while back. So we might as well store the page_pool ptr in page->private and call into the functions directly. I'll have a look before v4. [...] Thanks /Ilias
On Fri, 30 Apr 2021 20:32:07 +0300 Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote: > (-cc invalid emails) > Replying to my self here but.... > > [...] > > > > > > > > We can't do that. The reason we need those structs is that we rely on the > > > > existing XDP code, which already recycles it's buffers, to enable > > > > recycling. Since we allocate a page per packet when using page_pool for a > > > > driver , the same ideas apply to an SKB and XDP frame. We just recycle the > > > > > > I am not really familar with XDP here, but a packet from hw is either a > > > "struct xdp_frame/xdp_buff" for XDP or a "struct sk_buff" for TCP/IP stack, > > > a packet can not be both "struct xdp_frame/xdp_buff" and "struct sk_buff" at > > > the same time, right? > > > > > > > Yes, but the payload is irrelevant in both cases and that's what we use > > page_pool for. You can't use this patchset unless your driver usues > > build_skb(). So in both cases you just allocate memory for the payload and > > decide what the wrap the buffer with (XDP or SKB) later. > > > > > What does not really make sense to me is that the page has to be from page > > > pool when a skb's frag page can be recycled, right? If it is ture, the switch > > > case in __xdp_return() does not really make sense for skb recycling, why go > > > all the trouble of checking the mem->type and mem->id to find the page_pool > > > pointer when recyclable page for skb can only be from page pool? > > > > In any case you need to find in which pool the buffer you try to recycle > > belongs. In order to make the whole idea generic and be able to recycle skb > > fragments instead of just the skb head you need to store some information on > > struct page. That's the fundamental difference of this patchset compared to > > the RFC we sent a few years back [1] which was just storing information on the > > skb. The way this is done on the current patchset is that we store the > > struct xdp_mem_info in page->private and then look it up on xdp_return(). > > > > Now that being said Matthew recently reworked struct page, so we could see if > > we can store the page pool pointer directly instead of the struct > > xdp_mem_info. That would allow us to call into page pool functions directly. > > But we'll have to agree if that makes sense to go into struct page to begin > > with and make sure the pointer is still valid when we take the recycling path. > > > > Thinking more about it the reason that prevented us from storing a > page pool pointer directly is not there anymore. Jesper fixed that > already a while back. So we might as well store the page_pool ptr in > page->private and call into the functions directly. I'll have a look > before v4. I want to give credit to Jonathan Lemon whom came up with the idea of storing the page_pool object that "owns" the page directly in struct page. I see this as an optimization that we can add later, so it doesn't block this patchset. As Ilias mention, it required some work/changes[1]+[2] to guarantee that the page_pool object life-time were longer than all the outstanding in-flight page-objects, but that have been stable for some/many kernel releases now. This is already need/used for making sure the DMA-mappings can be safely released[1], but I on-purpose enabled the same in-flight tracking for page_pool users that doesn't use the DMA-mapping feature (making sure the code is exercised). 
[1] 99c07c43c4ea ("xdp: tracking page_pool resources and safe removal")
[2] c3f812cea0d7 ("page_pool: do not release pool until inflight == 0.")

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
On 2021/5/1 0:24, Ilias Apalodimas wrote: > [...] >>>> >>>> 1. skb frag page recycling do not need "struct xdp_rxq_info" or >>>> "struct xdp_mem_info" to bond the relation between "struct page" and >>>> "struct page_pool", which seems uncessary at this point if bonding >>>> a "struct page_pool" pointer directly in "struct page" does not cause >>>> space increasing. >>> >>> We can't do that. The reason we need those structs is that we rely on the >>> existing XDP code, which already recycles it's buffers, to enable >>> recycling. Since we allocate a page per packet when using page_pool for a >>> driver , the same ideas apply to an SKB and XDP frame. We just recycle the >> >> I am not really familar with XDP here, but a packet from hw is either a >> "struct xdp_frame/xdp_buff" for XDP or a "struct sk_buff" for TCP/IP stack, >> a packet can not be both "struct xdp_frame/xdp_buff" and "struct sk_buff" at >> the same time, right? >> > > Yes, but the payload is irrelevant in both cases and that's what we use > page_pool for. You can't use this patchset unless your driver usues > build_skb(). So in both cases you just allocate memory for the payload and I am not sure I understood why build_skb() matters here. If the head data of a skb is a page frag and is from page pool, then it's page->signature should be PP_SIGNATURE, otherwise it's page->signature is zero, so a recyclable skb does not require it's head data being from a page pool, right? > decide what the wrap the buffer with (XDP or SKB) later. [...] >> >> I am not sure I understand what you meant by "free the skb", does it mean >> that kfree_skb() is called to free the skb. > > Yes > >> >> As my understanding, if the skb completely own the page(which means page_count() >> == 1) when kfree_skb() is called, __page_pool_put_page() is called, otherwise >> page_ref_dec() is called, which is exactly what page_pool_atomic_sub_if_positive() >> try to handle it atomically. >> > > Not really, the opposite is happening here. If the pp_recycle bit is set we > will always call page_pool_return_skb_page(). If the page signature matches > the 'magic' set by page pool we will always call xdp_return_skb_frame() will > end up calling __page_pool_put_page(). If the refcnt is 1 we'll try > to recycle the page. If it's not we'll release it from page_pool (releasing > some internal references we keep) unmap the buffer and decrement the refcnt. Yes, I understood the above is what the page pool do now. But the question is who is still holding an extral reference to the page when kfree_skb()? Perhaps a cloned and pskb_expand_head()'ed skb is holding an extral reference to the same page? So why not just do a page_ref_dec() if the orginal skb is freed first, and call __page_pool_put_page() when the cloned skb is freed later? So that we can always reuse the recyclable page from a recyclable skb. This may make the page_pool_destroy() process delays longer than before, I am supposed the page_pool_destroy() delaying for cloned skb case does not really matters here. If the above works, I think the samiliar handling can be added to RX zerocopy if the RX zerocopy also hold extral references to the recyclable page from a recyclable skb too? > > [1] https://lore.kernel.org/netdev/154413868810.21735.572808840657728172.stgit@firesoul/ > > Cheers > /Ilias > > . >
On Thu, May 06, 2021 at 08:34:48PM +0800, Yunsheng Lin wrote: > On 2021/5/1 0:24, Ilias Apalodimas wrote: > > [...] > >>>> > >>>> 1. skb frag page recycling do not need "struct xdp_rxq_info" or > >>>> "struct xdp_mem_info" to bond the relation between "struct page" and > >>>> "struct page_pool", which seems uncessary at this point if bonding > >>>> a "struct page_pool" pointer directly in "struct page" does not cause > >>>> space increasing. > >>> > >>> We can't do that. The reason we need those structs is that we rely on the > >>> existing XDP code, which already recycles it's buffers, to enable > >>> recycling. Since we allocate a page per packet when using page_pool for a > >>> driver , the same ideas apply to an SKB and XDP frame. We just recycle the > >> > >> I am not really familar with XDP here, but a packet from hw is either a > >> "struct xdp_frame/xdp_buff" for XDP or a "struct sk_buff" for TCP/IP stack, > >> a packet can not be both "struct xdp_frame/xdp_buff" and "struct sk_buff" at > >> the same time, right? > >> > > > > Yes, but the payload is irrelevant in both cases and that's what we use > > page_pool for. You can't use this patchset unless your driver usues > > build_skb(). So in both cases you just allocate memory for the payload and > > I am not sure I understood why build_skb() matters here. If the head data of > a skb is a page frag and is from page pool, then it's page->signature should be > PP_SIGNATURE, otherwise it's page->signature is zero, so a recyclable skb does > not require it's head data being from a page pool, right? > Correct, and that's the big improvement compared to the original RFC. The wording was a bit off in my initial response. I was trying to point out you can recycle *any* buffer coming from page_pool and one of the ways you can do that in your driver, is use build_skb() while the payload is allocated by page_pool. > > decide what the wrap the buffer with (XDP or SKB) later. > > [...] > > >> > >> I am not sure I understand what you meant by "free the skb", does it mean > >> that kfree_skb() is called to free the skb. > > > > Yes > > > >> > >> As my understanding, if the skb completely own the page(which means page_count() > >> == 1) when kfree_skb() is called, __page_pool_put_page() is called, otherwise > >> page_ref_dec() is called, which is exactly what page_pool_atomic_sub_if_positive() > >> try to handle it atomically. > >> > > > > Not really, the opposite is happening here. If the pp_recycle bit is set we > > will always call page_pool_return_skb_page(). If the page signature matches > > the 'magic' set by page pool we will always call xdp_return_skb_frame() will > > end up calling __page_pool_put_page(). If the refcnt is 1 we'll try > > to recycle the page. If it's not we'll release it from page_pool (releasing > > some internal references we keep) unmap the buffer and decrement the refcnt. > > Yes, I understood the above is what the page pool do now. > > But the question is who is still holding an extral reference to the page when > kfree_skb()? Perhaps a cloned and pskb_expand_head()'ed skb is holding an extral > reference to the same page? So why not just do a page_ref_dec() if the orginal skb > is freed first, and call __page_pool_put_page() when the cloned skb is freed later? > So that we can always reuse the recyclable page from a recyclable skb. This may > make the page_pool_destroy() process delays longer than before, I am supposed the > page_pool_destroy() delaying for cloned skb case does not really matters here. 
> > If the above works, I think the samiliar handling can be added to RX zerocopy if > the RX zerocopy also hold extral references to the recyclable page from a recyclable > skb too? > Right, this sounds doable, but I'll have to go back code it and see if it really makes sense. However I'd still prefer the support to go in as-is (including the struct xdp_mem_info in struct page, instead of a page_pool pointer). There's a couple of reasons for that. If we keep the struct xdp_mem_info we can in the future recycle different kind of buffers using __xdp_return(). And this is a non intrusive change if we choose to store the page pool address directly in the future. It just affects the internal contract between the page_pool code and struct page. So it won't affect any drivers that already use the feature. Regarding the page_ref_dec(), which as I said sounds doable, I'd prefer playing it safe for now and getting rid of the buffers that somehow ended up holding an extra reference. Once this gets approved we can go back and try to save the extra space. I hope I am not wrong but the changes required to support a few extra refcounts should not change the current patches much. Thanks for taking the time on this! /Ilias > > > > [1] https://lore.kernel.org/netdev/154413868810.21735.572808840657728172.stgit@firesoul/ > > > > Cheers > > /Ilias > > > > . > > >
On 2021/5/6 20:58, Ilias Apalodimas wrote: >>>> >>> >>> Not really, the opposite is happening here. If the pp_recycle bit is set we >>> will always call page_pool_return_skb_page(). If the page signature matches >>> the 'magic' set by page pool we will always call xdp_return_skb_frame() will >>> end up calling __page_pool_put_page(). If the refcnt is 1 we'll try >>> to recycle the page. If it's not we'll release it from page_pool (releasing >>> some internal references we keep) unmap the buffer and decrement the refcnt. >> >> Yes, I understood the above is what the page pool do now. >> >> But the question is who is still holding an extral reference to the page when >> kfree_skb()? Perhaps a cloned and pskb_expand_head()'ed skb is holding an extral >> reference to the same page? So why not just do a page_ref_dec() if the orginal skb >> is freed first, and call __page_pool_put_page() when the cloned skb is freed later? >> So that we can always reuse the recyclable page from a recyclable skb. This may >> make the page_pool_destroy() process delays longer than before, I am supposed the >> page_pool_destroy() delaying for cloned skb case does not really matters here. >> >> If the above works, I think the samiliar handling can be added to RX zerocopy if >> the RX zerocopy also hold extral references to the recyclable page from a recyclable >> skb too? >> > > Right, this sounds doable, but I'll have to go back code it and see if it > really makes sense. However I'd still prefer the support to go in as-is > (including the struct xdp_mem_info in struct page, instead of a page_pool > pointer). > > There's a couple of reasons for that. If we keep the struct xdp_mem_info we > can in the future recycle different kind of buffers using __xdp_return(). > And this is a non intrusive change if we choose to store the page pool address > directly in the future. It just affects the internal contract between the > page_pool code and struct page. So it won't affect any drivers that already > use the feature. This patchset has embeded a signature field in "struct page", and xdp_mem_info is stored in page_private(), which seems not considering the case for associating the page pool with "struct page" directly yet? Is the page pool also stored in page_private() and a different signature is used to indicate that? I am not saying we have to do it in this patchset, but we have to consider it while we are adding new signature field to "struct page", right? > Regarding the page_ref_dec(), which as I said sounds doable, I'd prefer > playing it safe for now and getting rid of the buffers that somehow ended up > holding an extra reference. Once this gets approved we can go back and try to > save the extra space. I hope I am not wrong but the changes required to > support a few extra refcounts should not change the current patches much. > > Thanks for taking the time on this! Thanks all invovled in the effort improving page pool too:) > /Ilias > >>> >>> [1] https://lore.kernel.org/netdev/154413868810.21735.572808840657728172.stgit@firesoul/ >>> >>> Cheers >>> /Ilias >>> >>> . >>> >> > > . >
On Fri, May 07, 2021 at 11:23:28AM +0800, Yunsheng Lin wrote: > On 2021/5/6 20:58, Ilias Apalodimas wrote: > >>>> > >>> > >>> Not really, the opposite is happening here. If the pp_recycle bit is set we > >>> will always call page_pool_return_skb_page(). If the page signature matches > >>> the 'magic' set by page pool we will always call xdp_return_skb_frame() will > >>> end up calling __page_pool_put_page(). If the refcnt is 1 we'll try > >>> to recycle the page. If it's not we'll release it from page_pool (releasing > >>> some internal references we keep) unmap the buffer and decrement the refcnt. > >> > >> Yes, I understood the above is what the page pool do now. > >> > >> But the question is who is still holding an extral reference to the page when > >> kfree_skb()? Perhaps a cloned and pskb_expand_head()'ed skb is holding an extral > >> reference to the same page? So why not just do a page_ref_dec() if the orginal skb > >> is freed first, and call __page_pool_put_page() when the cloned skb is freed later? > >> So that we can always reuse the recyclable page from a recyclable skb. This may > >> make the page_pool_destroy() process delays longer than before, I am supposed the > >> page_pool_destroy() delaying for cloned skb case does not really matters here. > >> > >> If the above works, I think the samiliar handling can be added to RX zerocopy if > >> the RX zerocopy also hold extral references to the recyclable page from a recyclable > >> skb too? > >> > > > > Right, this sounds doable, but I'll have to go back code it and see if it > > really makes sense. However I'd still prefer the support to go in as-is > > (including the struct xdp_mem_info in struct page, instead of a page_pool > > pointer). > > > > There's a couple of reasons for that. If we keep the struct xdp_mem_info we > > can in the future recycle different kind of buffers using __xdp_return(). > > And this is a non intrusive change if we choose to store the page pool address > > directly in the future. It just affects the internal contract between the > > page_pool code and struct page. So it won't affect any drivers that already > > use the feature. > > This patchset has embeded a signature field in "struct page", and xdp_mem_info > is stored in page_private(), which seems not considering the case for associating > the page pool with "struct page" directly yet? Correct > Is the page pool also stored in > page_private() and a different signature is used to indicate that? No only struct xdp_mem_info as you mentioned before > > I am not saying we have to do it in this patchset, but we have to consider it > while we are adding new signature field to "struct page", right? We won't need a new signature. The signature in both cases is there to guarantee the page you are trying to recycle was indeed allocated by page_pool. Basically we got two design choices here: - We store the page_pool ptr address directly in page->private and then, we call into page_pool APIs directly to do the recycling. That would eliminate the lookup through xdp_mem_info and the XDP helpers to locate page pool pointer (through __xdp_return). - You store the xdp_mem_info on page_private. In that case you need to go through __xdp_return() to locate the page_pool pointer. Although we might loose some performance that would allow us to recycle additional memory types and not only MEM_TYPE_PAGE_POOL (in case we ever need it). I think both choices are sane. 
What I am trying to explain here, is regardless of what we choose now, we can change it in the future without affecting the API consumers at all. What will change internally is the way we lookup the page pool pointer we are trying to recycle. [...] Cheers /Ilias
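The two design choices can be put side by side in a short sketch. This is
illustrative only: mem_id_ht and mem_id_rht_params are the XDP mem-model
registry internals in net/core/xdp.c, and the two page_private() encodings
are the options being discussed, not merged code.

/* Option A (this patchset): page_private() holds struct xdp_mem_info,
 * so recycling resolves the pool through the XDP mem-id registry.
 */
static struct page_pool *pp_from_mem_info(struct page *page)
{
	union { struct xdp_mem_info mem_info; unsigned long raw; } info;
	struct xdp_mem_allocator *xa;

	info.raw = page_private(page);
	xa = rhashtable_lookup(mem_id_ht, &info.mem_info.id, mem_id_rht_params);
	return xa ? xa->page_pool : NULL;
}

/* Option B (possible future optimization): page_private() holds the
 * page_pool pointer itself, so the fast path needs no hash lookup.
 */
static struct page_pool *pp_from_page(struct page *page)
{
	return (struct page_pool *)page_private(page);
}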
On 2021/5/7 15:06, Ilias Apalodimas wrote: > On Fri, May 07, 2021 at 11:23:28AM +0800, Yunsheng Lin wrote: >> On 2021/5/6 20:58, Ilias Apalodimas wrote: >>>>>> >>>>> >>>>> Not really, the opposite is happening here. If the pp_recycle bit is set we >>>>> will always call page_pool_return_skb_page(). If the page signature matches >>>>> the 'magic' set by page pool we will always call xdp_return_skb_frame() will >>>>> end up calling __page_pool_put_page(). If the refcnt is 1 we'll try >>>>> to recycle the page. If it's not we'll release it from page_pool (releasing >>>>> some internal references we keep) unmap the buffer and decrement the refcnt. >>>> >>>> Yes, I understood the above is what the page pool do now. >>>> >>>> But the question is who is still holding an extral reference to the page when >>>> kfree_skb()? Perhaps a cloned and pskb_expand_head()'ed skb is holding an extral >>>> reference to the same page? So why not just do a page_ref_dec() if the orginal skb >>>> is freed first, and call __page_pool_put_page() when the cloned skb is freed later? >>>> So that we can always reuse the recyclable page from a recyclable skb. This may >>>> make the page_pool_destroy() process delays longer than before, I am supposed the >>>> page_pool_destroy() delaying for cloned skb case does not really matters here. >>>> >>>> If the above works, I think the samiliar handling can be added to RX zerocopy if >>>> the RX zerocopy also hold extral references to the recyclable page from a recyclable >>>> skb too? >>>> >>> >>> Right, this sounds doable, but I'll have to go back code it and see if it >>> really makes sense. However I'd still prefer the support to go in as-is >>> (including the struct xdp_mem_info in struct page, instead of a page_pool >>> pointer). >>> >>> There's a couple of reasons for that. If we keep the struct xdp_mem_info we >>> can in the future recycle different kind of buffers using __xdp_return(). >>> And this is a non intrusive change if we choose to store the page pool address >>> directly in the future. It just affects the internal contract between the >>> page_pool code and struct page. So it won't affect any drivers that already >>> use the feature. >> >> This patchset has embeded a signature field in "struct page", and xdp_mem_info >> is stored in page_private(), which seems not considering the case for associating >> the page pool with "struct page" directly yet? > > Correct > >> Is the page pool also stored in >> page_private() and a different signature is used to indicate that? > > No only struct xdp_mem_info as you mentioned before > >> >> I am not saying we have to do it in this patchset, but we have to consider it >> while we are adding new signature field to "struct page", right? > > We won't need a new signature. The signature in both cases is there to > guarantee the page you are trying to recycle was indeed allocated by page_pool. > > Basically we got two design choices here: > - We store the page_pool ptr address directly in page->private and then, > we call into page_pool APIs directly to do the recycling. > That would eliminate the lookup through xdp_mem_info and the > XDP helpers to locate page pool pointer (through __xdp_return). > - You store the xdp_mem_info on page_private. In that case you need to go > through __xdp_return() to locate the page_pool pointer. Although we might > loose some performance that would allow us to recycle additional memory types > and not only MEM_TYPE_PAGE_POOL (in case we ever need it). 
So the signature field in "struct page" is used to only indicate a page is from a page pool, then how do we tell the content of page_private() if both of the above choices are needed, we might still need an extra indicator to tell page_private() is page_pool ptr or xdp_mem_info. It seems storing the page pool ptr in page_private() is clear for recyclable page from a recyclable skb use case, and the use case for storing xdp_mem_info in page_private() is unclear yet? As XDP seems to have the xdp_mem_info in the "struct xdp_frame", so it does not need the xdp_mem_info from page_private(). If the above is true, what does not really makes sense to me here is that: why do we first implement a unclear use case for storing xdp_mem_info in page_private(), why not implement the clear use case for storing page pool ptr in page_private() first? > > > I think both choices are sane. What I am trying to explain here, is > regardless of what we choose now, we can change it in the future without > affecting the API consumers at all. What will change internally is the way we > lookup the page pool pointer we are trying to recycle. It seems the below API need changing? +static inline void skb_mark_for_recycle(struct sk_buff *skb, struct page *page, + struct xdp_mem_info *mem) > > [...] > > > Cheers > /Ilias > > . >
On Fri, 7 May 2021 16:28:30 +0800 Yunsheng Lin <linyunsheng@huawei.com> wrote: > On 2021/5/7 15:06, Ilias Apalodimas wrote: > > On Fri, May 07, 2021 at 11:23:28AM +0800, Yunsheng Lin wrote: > >> On 2021/5/6 20:58, Ilias Apalodimas wrote: > >>>>>> > >>>>> > >>>>> Not really, the opposite is happening here. If the pp_recycle bit is set we > >>>>> will always call page_pool_return_skb_page(). If the page signature matches > >>>>> the 'magic' set by page pool we will always call xdp_return_skb_frame() will > >>>>> end up calling __page_pool_put_page(). If the refcnt is 1 we'll try > >>>>> to recycle the page. If it's not we'll release it from page_pool (releasing > >>>>> some internal references we keep) unmap the buffer and decrement the refcnt. > >>>> > >>>> Yes, I understood the above is what the page pool do now. > >>>> > >>>> But the question is who is still holding an extral reference to the page when > >>>> kfree_skb()? Perhaps a cloned and pskb_expand_head()'ed skb is holding an extral > >>>> reference to the same page? So why not just do a page_ref_dec() if the orginal skb > >>>> is freed first, and call __page_pool_put_page() when the cloned skb is freed later? > >>>> So that we can always reuse the recyclable page from a recyclable skb. This may > >>>> make the page_pool_destroy() process delays longer than before, I am supposed the > >>>> page_pool_destroy() delaying for cloned skb case does not really matters here. > >>>> > >>>> If the above works, I think the samiliar handling can be added to RX zerocopy if > >>>> the RX zerocopy also hold extral references to the recyclable page from a recyclable > >>>> skb too? > >>>> > >>> > >>> Right, this sounds doable, but I'll have to go back code it and see if it > >>> really makes sense. However I'd still prefer the support to go in as-is > >>> (including the struct xdp_mem_info in struct page, instead of a page_pool > >>> pointer). > >>> > >>> There's a couple of reasons for that. If we keep the struct xdp_mem_info we > >>> can in the future recycle different kind of buffers using __xdp_return(). > >>> And this is a non intrusive change if we choose to store the page pool address > >>> directly in the future. It just affects the internal contract between the > >>> page_pool code and struct page. So it won't affect any drivers that already > >>> use the feature. > >> > >> This patchset has embeded a signature field in "struct page", and xdp_mem_info > >> is stored in page_private(), which seems not considering the case for associating > >> the page pool with "struct page" directly yet? > > > > Correct > > > >> Is the page pool also stored in > >> page_private() and a different signature is used to indicate that? > > > > No only struct xdp_mem_info as you mentioned before > > > >> > >> I am not saying we have to do it in this patchset, but we have to consider it > >> while we are adding new signature field to "struct page", right? > > > > We won't need a new signature. The signature in both cases is there to > > guarantee the page you are trying to recycle was indeed allocated by page_pool. > > > > Basically we got two design choices here: > > - We store the page_pool ptr address directly in page->private and then, > > we call into page_pool APIs directly to do the recycling. > > That would eliminate the lookup through xdp_mem_info and the > > XDP helpers to locate page pool pointer (through __xdp_return). > > - You store the xdp_mem_info on page_private. 
In that case you need to go > > through __xdp_return() to locate the page_pool pointer. Although we might > > loose some performance that would allow us to recycle additional memory types > > and not only MEM_TYPE_PAGE_POOL (in case we ever need it). > > So the signature field in "struct page" is used to only indicate a page is > from a page pool, then how do we tell the content of page_private() if both of > the above choices are needed, we might still need an extra indicator to tell > page_private() is page_pool ptr or xdp_mem_info. The signature field in "struct page" and "xdp_mem_info" is a double construct that was introduced in this patchset. AFAIK Matteo took the idea from Jonathan's patchset. I'm not convinced we need both, maybe later we do (when someone introduce a new mem model ala NetGPU). I think Jonathan's use-case was NetGPU[1] (which got shutdown due to Nvidia[2] being involved which I think was unfair). The general idea behind NetGPU makes sense to me, to allow packet headers to live in first page, and second page belongs to hardware. This implies that an SKB can can point to two different pages with different memory types, which need to be handled correctly when freeing the SKB and the pages it points to. Thus, placing (xdp_)mem_info in SKB is wrong as it implies all pages belong the same mem_info.type. The point is, when designing this I want us to think about how our design can handle other memory models than just page_pool. In this patchset design, we use a single bit in SKB to indicate that the pages pointed comes from another memory model, in this case page_pool is the only user of this bit. The remaining info about the memory model (page_pool) is stored in struct-page, which we look at when freeing the pages that the SKB points to (that is at the layer above the MM-calls that would free the page for real). [1] https://linuxplumbersconf.org/event/7/contributions/670/ [2] https://lwn.net/Articles/827596/ > It seems storing the page pool ptr in page_private() is clear for recyclable > page from a recyclable skb use case, and the use case for storing xdp_mem_info > in page_private() is unclear yet? As XDP seems to have the xdp_mem_info in the > "struct xdp_frame", so it does not need the xdp_mem_info from page_private(). > > If the above is true, what does not really makes sense to me here is that: > why do we first implement a unclear use case for storing xdp_mem_info in > page_private(), why not implement the clear use case for storing page pool ptr > in page_private() first? I'm actually not against storing page_pool object ptr directly in struct-page. It is a nice optimization. Perhaps we should implement this optimization outside this patchset first, and let __xdp_return() for XDP-redirected packets also take advantage to this optimization? Then it would feel more natural to also used this optimization in this patchset, right? > > > > > > I think both choices are sane. What I am trying to explain here, is > > regardless of what we choose now, we can change it in the future without > > affecting the API consumers at all. What will change internally is the way we > > lookup the page pool pointer we are trying to recycle. > > It seems the below API need changing? > +static inline void skb_mark_for_recycle(struct sk_buff *skb, struct page *page, > + struct xdp_mem_info *mem) I don't think we need to change this API, to support future memory models. Notice that xdp_mem_info have a 'type' member. Naming in Computer Science is a hard problem ;-). 
Something that seems to confuse a lot of people is the naming of the
struct "xdp_mem_info". Maybe we should have named it "mem_info" or
"net_mem_info" instead, as it doesn't indicate that the device is running
XDP. I see XDP as the RX layer before the network stack: it helps drivers
support different memory models, also for handling normal packets that
don't get processed by XDP, and a driver doesn't even need to support XDP
to use the "xdp_mem_info" type.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

struct xdp_mem_info {
	u32 type; /* enum xdp_mem_type, but known size type */
	u32 id;
};

enum xdp_mem_type {
	MEM_TYPE_PAGE_SHARED = 0, /* Split-page refcnt based model */
	MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
	MEM_TYPE_PAGE_POOL,
	MEM_TYPE_XSK_BUFF_POOL,
	MEM_TYPE_MAX,
};
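For reference, the mem->type dispatch the thread keeps pointing at lives in
__xdp_return(); a condensed sketch follows. It is an approximation, not the
exact source, and the XSK/zero-copy arm is omitted.

/* Rough sketch of __xdp_return()'s dispatch on xdp_mem_info.type. */
static void __xdp_return_sketch(void *data, struct xdp_mem_info *mem,
				bool napi_direct)
{
	struct xdp_mem_allocator *xa;
	struct page *page;

	switch (mem->type) {
	case MEM_TYPE_PAGE_POOL:
		/* the only arm relevant to skb recycling: id -> pool lookup */
		rcu_read_lock();
		xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
		page = virt_to_head_page(data);
		if (xa)
			page_pool_put_full_page(xa->page_pool, page, napi_direct);
		rcu_read_unlock();
		break;
	case MEM_TYPE_PAGE_SHARED:
		page_frag_free(data);
		break;
	case MEM_TYPE_PAGE_ORDER0:
		page = virt_to_page(data);
		put_page(page);
		break;
	default:
		/* MEM_TYPE_XSK_BUFF_POOL is handled separately */
		break;
	}
}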
On Fri, May 07, 2021 at 12:19:53PM +0200, Jesper Dangaard Brouer wrote:
> Nvidia[2] being involved which I think was unfair). The general idea
> behind NetGPU makes sense to me,

Sorry, but that is utter bullshit. It was rejected because it did nothing
but inject hooks for an out of tree module while ignoring the existing
kernel infrastructure for much of what it tries to achieve.
Jesper Dangaard Brouer <brouer@redhat.com> writes: > On Fri, 7 May 2021 16:28:30 +0800 > Yunsheng Lin <linyunsheng@huawei.com> wrote: > >> On 2021/5/7 15:06, Ilias Apalodimas wrote: >> > On Fri, May 07, 2021 at 11:23:28AM +0800, Yunsheng Lin wrote: >> >> On 2021/5/6 20:58, Ilias Apalodimas wrote: >> >>>>>> >> >>>>> > ... >> > >> > >> > I think both choices are sane. What I am trying to explain >> > here, is >> > regardless of what we choose now, we can change it in the >> > future without >> > affecting the API consumers at all. What will change >> > internally is the way we >> > lookup the page pool pointer we are trying to recycle. >> >> It seems the below API need changing? >> +static inline void skb_mark_for_recycle(struct sk_buff *skb, >> struct page *page, >> + struct xdp_mem_info *mem) > > I don't think we need to change this API, to support future > memory > models. Notice that xdp_mem_info have a 'type' member. Hi, Providing that we will (possibly as a future optimization) store the pointer to the page pool in struct page instead of strcut xdp_mem_info, passing xdp_mem_info * instead of struct page_pool * would mean that for every packet we'll need to call xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params); xa->page_pool; which might pressure the Dcache to fetch a pointer that might be present already in cache as part of driver's data-structures. I tend to agree with Yunsheng that it makes more sense to adjust the API for the clear use-case now rather than using xdp_mem_info indirection. It seems to me like the page signature provides the same information anyway and allows to support different memory types. Shay > > Naming in Computer Science is a hard problem ;-). Something that > seems > to confuse a lot of people is the naming of the struct > "xdp_mem_info". > Maybe we should have named it "mem_info" instead or > "net_mem_info", as > it doesn't indicate that the device is running XDP. > > I see XDP as the RX-layer before the network stack, that helps > drivers > to support different memory models, also for handling normal > packets > that doesn't get process by XDP, and the drivers doesn't even > need to > support XDP to use the "xdp_mem_info" type.
On 2021/5/7 18:19, Jesper Dangaard Brouer wrote:
> On Fri, 7 May 2021 16:28:30 +0800
> Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> [...]
>
> I'm actually not against storing the page_pool object ptr directly in
> struct page. It is a nice optimization. Perhaps we should implement this
> optimization outside this patchset first, and let __xdp_return() for
> XDP-redirected packets also take advantage of this optimization?
>
> Then it would feel more natural to also use this optimization in this
> patchset, right?

Yes, it would be good if XDP could take advantage of this optimization
too. Then it seems we could remove the "mem_id_ht"? If we want to support
different types of page pool in the future, the type field is available in
the page pool too when the page_pool object ptr is stored directly in
struct page.
Hi Shay,

On Sun, May 09, 2021 at 08:11:35AM +0300, Shay Agroskin wrote:
>
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>
> > On Fri, 7 May 2021 16:28:30 +0800
> > Yunsheng Lin <linyunsheng@huawei.com> wrote:
> >
> > > On 2021/5/7 15:06, Ilias Apalodimas wrote:
> > > > On Fri, May 07, 2021 at 11:23:28AM +0800, Yunsheng Lin wrote:
> > > > > On 2021/5/6 20:58, Ilias Apalodimas wrote:
> > >
> > > ...
> > >
> > > > I think both choices are sane. What I am trying to explain here is
> > > > that regardless of what we choose now, we can change it in the
> > > > future without affecting the API consumers at all. What will
> > > > change internally is the way we look up the page pool pointer we
> > > > are trying to recycle.
> > >
> > > It seems the below API needs changing?
> > > +static inline void skb_mark_for_recycle(struct sk_buff *skb, struct page *page,
> > > +					  struct xdp_mem_info *mem)
> >
> > I don't think we need to change this API to support future memory
> > models. Notice that xdp_mem_info has a 'type' member.
>
> Hi,
> Providing that we will (possibly as a future optimization) store the
> pointer to the page pool in struct page instead of struct xdp_mem_info,
> passing xdp_mem_info * instead of struct page_pool * would mean that
> for every packet we'll need to call
>
>         xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
>         xa->page_pool;
>
> which might pressure the Dcache to fetch a pointer that might already
> be present in cache as part of the driver's data structures.
>
> I tend to agree with Yunsheng that it makes more sense to adjust the
> API for the clear use-case now rather than going through the
> xdp_mem_info indirection. It seems to me that the page signature
> provides the same information anyway and allows supporting different
> memory types.

We've switched the patches already. We didn't notice any performance boost
by doing so (tested on a MacchiatoBIN), but I agree as well. As I
explained, the only thing that will change if we ever need the struct
xdp_mem_info in struct page is the internal contract between struct page
and the recycling function, so let's start clean and see if we ever need
that.

Cheers
/Ilias

> Shay
>
> > Naming in Computer Science is a hard problem ;-). Something that
> > seems to confuse a lot of people is the naming of the struct
> > "xdp_mem_info". Maybe we should have named it "mem_info" or
> > "net_mem_info" instead, as it doesn't indicate that the device is
> > running XDP.
> >
> > I see XDP as the RX-layer before the network stack, which helps
> > drivers to support different memory models, also for handling normal
> > packets that don't get processed by XDP, and drivers don't even need
> > to support XDP to use the "xdp_mem_info" type.
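Purely as an illustration of the "switched" contract Ilias describes, here
is a hedged sketch pieced together from the thread: the marking helper
takes the page_pool directly, a single recycle bit in the SKB gates the
path, and the per-page signature (rather than an xdp_mem_info) identifies
page_pool pages when the SKB is freed. The skb->pp_recycle bit, the
page->pp_magic field and the PP_SIGNATURE constant are assumptions based
on the discussion, not settled kernel API.

/* Hedged sketch; field and constant names are illustrative only. */
static inline void skb_mark_for_recycle(struct sk_buff *skb, struct page *page,
                                        struct page_pool *pp)
{
        skb->pp_recycle = 1;                       /* single bit in the SKB */
        set_page_private(page, (unsigned long)pp); /* pool ptr in struct page */
}

/* Called from the SKB head/frag free path, one layer above the real
 * page free; returns true if the page was handed back to its pool.
 */
static inline bool skb_pp_recycle(struct sk_buff *skb, struct page *page)
{
        if (!skb->pp_recycle)
                return false;

        /* The per-page signature confirms this really is a page_pool
         * page and not some other memory model.
         */
        if (page->pp_magic != PP_SIGNATURE)
                return false;

        page_pool_put_full_page((struct page_pool *)page_private(page),
                                page, false);
        return true;
}

Whether the pool pointer lives in page_private() or a dedicated field is
exactly the internal contract Ilias refers to; API consumers would not
notice either way.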
From: Matteo Croce <mcroce@microsoft.com>

This is a respin of [1]

This patchset shows the plans for allowing page_pool to handle and
maintain DMA map/unmap of the pages it serves to the driver. For this
to work a return hook in the network core is introduced.

The overall purpose is to simplify drivers, by providing a page
allocation API that does recycling, such that each driver doesn't have
to reinvent its own recycling scheme. Using page_pool in a driver
does not require implementing XDP support, but it makes it trivially
easy to do so. Instead of allocating buffers specifically for SKBs
we now allocate a generic buffer and either wrap it in an SKB
(via build_skb) or create an XDP frame.
The recycling code leverages the XDP recycle APIs.

The Marvell mvpp2 and mvneta drivers are used in this patchset to
demonstrate how to use the API, and tested on MacchiatoBIN and
EspressoBIN boards, respectively.

Please let this go in on a future -rc1 so as to allow enough time for
wider testing.

[1] https://lore.kernel.org/netdev/154413868810.21735.572808840657728172.stgit@firesoul/

v2 -> v3:
- added missing SOBs
- CCed the MM people

v1 -> v2:
- fix a commit message
- avoid setting pp_recycle multiple times on mvneta
- squash two patches to avoid breaking bisect

Ilias Apalodimas (1):
  page_pool: Allow drivers to hint on SKB recycling

Jesper Dangaard Brouer (1):
  xdp: reduce size of struct xdp_mem_info

Matteo Croce (3):
  mm: add a signature in struct page
  mvpp2: recycle buffers
  mvneta: recycle buffers

 .../chelsio/inline_crypto/ch_ktls/chcr_ktls.c |  2 +-
 drivers/net/ethernet/marvell/mvneta.c         |  7 ++-
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   | 17 +++----
 drivers/net/ethernet/marvell/sky2.c           |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  2 +-
 include/linux/mm_types.h                      |  1 +
 include/linux/skbuff.h                        | 35 ++++++++++++--
 include/net/page_pool.h                       | 15 ++++++
 include/net/xdp.h                             |  5 +-
 net/core/page_pool.c                          | 47 +++++++++++++++++++
 net/core/skbuff.c                             | 20 +++++++-
 net/core/xdp.c                                | 14 ++++--
 net/tls/tls_device.c                          |  2 +-
 13 files changed, 142 insertions(+), 27 deletions(-)