Message ID | 20240710001749.1388631-1-almasrymina@google.com |
---|---|
Headers | show |
Series | Device Memory TCP | expand |
Mina Almasry <almasrymina@google.com> writes: > Add documentation outlining the usage and details of devmem TCP. > > Signed-off-by: Mina Almasry <almasrymina@google.com> > Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
On Wed, 10 Jul 2024 00:17:34 +0000 Mina Almasry wrote:
> + DEBUG_NET_WARN_ON_ONCE(!rtnl_is_locked());
ASSERT_RTNL() ?
On Wed, Jul 10, 2024 at 9:49 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 10 Jul 2024 00:17:38 +0000 Mina Almasry wrote: > > @@ -68,17 +107,103 @@ static inline netmem_ref page_to_netmem(struct page *page) > > > > static inline int netmem_ref_count(netmem_ref netmem) > > { > > + /* The non-pp refcount of net_iov is always 1. On net_iov, we only > > + * support pp refcounting which uses the pp_ref_count field. > > + */ > > + if (netmem_is_net_iov(netmem)) > > + return 1; > > + > > return page_ref_count(netmem_to_page(netmem)); > > } > > How can this work if we had to revert the patch which made all of > the networking stack take pp-aware refs? Maybe we should add the > refcount, and let it be bumped, but WARN() if the net_iov is released > with refcount other than 1? Or we need a very solid explanation why > the conversion had to be reverted and this is fine. > Right, as you are aware, page refcounting is based on 2 refcounts: pp refs and full page refs. To be honest I find the 2-ref flow confusing and I made an effort to avoid porting this bit to net_iovs. net_iovs just supports 1 refcount, which is the pp-ref. My intention is that when a reference is needed on a net_iov, we obtain the pp-ref, and when we drop a reference on a net_iov, we drop the pp_ref. This is able to work for net_iov but not pages, because (as you explained to me) pages can be inserted into the net stack with full page refs. So when it comes to refcounting pages we need to be careful which ref to obtain or drop depending on is_pp_netmem() and skb->pp_recycle (as pp_recycle serves as a concurrency check, like you explained). AFAICT, since net_iovs always originate from the net stack, we can make the simplification that they're always seeded with 1 pp-ref, and never support non-pp-refs. This simplifies the refcounting such that: 1. net_iov are always is_pp_netmem (they are never disconnected from the pp as they never have elevated non-pp refcount), and 2. net_iov refcounting doesn't need to check skb->pp_recycle for refcounting, because we can be sure that the caller always has a non-pp ref (since it's the only one supported). Currently, as written, I just realized I did not add net_iov support to __skb_frag_ref(). But net_iov does support skb_pp_frag_ref(). So there is no way to increment a non-pp ref for net_iov. If we want to add __skb_frag_ref() support for net_iov I suggest something like: diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h index 0f3c58007488a..02f7f4c7d4821 100644 --- a/include/linux/skbuff_ref.h +++ b/include/linux/skbuff_ref.h @@ -17,7 +17,13 @@ */ static inline void __skb_frag_ref(skb_frag_t *frag) { - get_page(skb_frag_page(frag)); + netmem_ref netmem = skb_frag_netmem(frag); + + /* netmem always uses pp-refs for refcounting. Never non-pp refs. */ + if (!netmem_is_net_iov(netmem)) + get_page(netmem_to_page(netmem)); + else + page_pool_ref_netmem(netmem); } If you don't like the 1 ref simplification, I can definitely add a second refcount as you suggest, but AFAICT the simplification is safe due to how net_iov are originated, and maybe also because devmem usage in the net stack is limited due to all the skb_is_readable() checks, and it's possible that the edge cases don't reproduce. I was looking to find a concrete bug report with devmem before taking a hammer and adding a secondary refcount, rather than do it preemptively, but I'm happy to look into it if you insist. > > static inline unsigned long netmem_to_pfn(netmem_ref netmem) > > { > > + if (netmem_is_net_iov(netmem)) > > + return 0; > > + > > return page_to_pfn(netmem_to_page(netmem)); > > } > > Can we move this out and rename it to netmem_pfn_trace() ? > Silently returning 0 is not generally okay, but since it's only > for tracing we don't care. > Yes, I will do. > > +static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem) > > +{ > > + return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV); > > +} > > + > > +static inline unsigned long netmem_get_pp_magic(netmem_ref netmem) > > +{ > > + return __netmem_clear_lsb(netmem)->pp_magic; > > +} > > + > > +static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic) > > +{ > > + __netmem_clear_lsb(netmem)->pp_magic |= pp_magic; > > +} > > + > > +static inline void netmem_clear_pp_magic(netmem_ref netmem) > > +{ > > + __netmem_clear_lsb(netmem)->pp_magic = 0; > > +} > > + > > +static inline struct page_pool *netmem_get_pp(netmem_ref netmem) > > +{ > > + return __netmem_clear_lsb(netmem)->pp; > > +} > > + > > +static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool) > > +{ > > + __netmem_clear_lsb(netmem)->pp = pool; > > +} > > Why is all this stuff in the main header? It's really low level. > Please put helpers which are only used by the core in a header > under net/core/, like net/core/dev.h Sorry, will do. -- Thanks, Mina
On Wed, Jul 10, 2024 at 9:49 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 10 Jul 2024 00:17:38 +0000 Mina Almasry wrote: > > +static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem) > > +{ > > + return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV); > > +} > > + > > +static inline unsigned long netmem_get_pp_magic(netmem_ref netmem) > > +{ > > + return __netmem_clear_lsb(netmem)->pp_magic; > > +} > > + > > +static inline void netmem_or_pp_magic(netmem_ref netmem, unsigned long pp_magic) > > +{ > > + __netmem_clear_lsb(netmem)->pp_magic |= pp_magic; > > +} > > + > > +static inline void netmem_clear_pp_magic(netmem_ref netmem) > > +{ > > + __netmem_clear_lsb(netmem)->pp_magic = 0; > > +} > > + > > +static inline struct page_pool *netmem_get_pp(netmem_ref netmem) > > +{ > > + return __netmem_clear_lsb(netmem)->pp; > > +} > > + > > +static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool) > > +{ > > + __netmem_clear_lsb(netmem)->pp = pool; > > +} > > Why is all this stuff in the main header? It's really low level. > Please put helpers which are only used by the core in a header > under net/core/, like net/core/dev.h Sorry none of those are only used by net/core/*. Pretty much all of these are used by include/net/page_pool/helpers.h, and some have callers in net/core/devmem.c or net/core/skbuff.c Would you like me to move these pp specific looking ones to include/net/page_pool/netmem.h or something similar?
On Wed, 10 Jul 2024 13:29:03 -0700 Mina Almasry wrote: > If we want to add __skb_frag_ref() support for net_iov I suggest something like: > > diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h > index 0f3c58007488a..02f7f4c7d4821 100644 > --- a/include/linux/skbuff_ref.h > +++ b/include/linux/skbuff_ref.h > @@ -17,7 +17,13 @@ > */ > static inline void __skb_frag_ref(skb_frag_t *frag) > { > - get_page(skb_frag_page(frag)); > + netmem_ref netmem = skb_frag_netmem(frag); > + > + /* netmem always uses pp-refs for refcounting. Never non-pp refs. */ > + if (!netmem_is_net_iov(netmem)) > + get_page(netmem_to_page(netmem)); > + else > + page_pool_ref_netmem(netmem); > } Probably not much better since freeing still looks at the recycle bit. Eric and Willem acked patch 8, maybe it works, and if it doesn't we know who to call :) I can't point out any case that won't work so if y'all think this is fine, let's leave it.
On Wed, 10 Jul 2024 16:42:04 -0700 Mina Almasry wrote: > > > +static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool) > > > +{ > > > + __netmem_clear_lsb(netmem)->pp = pool; > > > +} > > > > Why is all this stuff in the main header? It's really low level. > > Please put helpers which are only used by the core in a header > > under net/core/, like net/core/dev.h > > Sorry none of those are only used by net/core/*. Pretty much all of > these are used by include/net/page_pool/helpers.h, and some have > callers in net/core/devmem.c or net/core/skbuff.c > > Would you like me to move these pp specific looking ones to > include/net/page_pool/netmem.h or something similar? That's because some things already in helpers have no real business being there either. Why is page_pool_set_pp_info() in helpers.h?
On Wed, 10 Jul 2024 00:17:46 +0000 Mina Almasry wrote: > Add dmabuf information to page_pool stats: > > $ ./cli.py --spec ../netlink/specs/netdev.yaml --dump page-pool-get Aha, thanks! Reviewed-by: Jakub Kicinski <kuba@kernel.org>
On Wed, Jul 10, 2024 at 6:23 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 10 Jul 2024 16:42:04 -0700 Mina Almasry wrote: > > > > +static inline void netmem_set_pp(netmem_ref netmem, struct page_pool *pool) > > > > +{ > > > > + __netmem_clear_lsb(netmem)->pp = pool; > > > > +} > > > > > > Why is all this stuff in the main header? It's really low level. > > > Please put helpers which are only used by the core in a header > > > under net/core/, like net/core/dev.h > > > > Sorry none of those are only used by net/core/*. Pretty much all of > > these are used by include/net/page_pool/helpers.h, and some have > > callers in net/core/devmem.c or net/core/skbuff.c > > > > Would you like me to move these pp specific looking ones to > > include/net/page_pool/netmem.h or something similar? > > That's because some things already in helpers have no real business > being there either. Why is page_pool_set_pp_info() in helpers.h? OK, I looked into this a bit. It looks like I can trivially move page_pool_set/clear_pp_info() to page_pool_priv.h, and that lets me move out a few of these netmem helpers to a header under net/core. However, to move more of these netmem helpers to a private header, I think I need to move all the page pool dma helpers and reffing helpers to a private header or the .c file, which I think will uninline them as they're eventually called from drivers. I had guessed the previous authors put those dma and ref helpers in the .h file to inline them as they're used in fast paths. Do you think the refactor and the uninling is desirable? Or should I just do with the trivial moving of the page_pool_set/clear_pp_info() to the private file?
On Thu, 11 Jul 2024 13:57:01 -0700 Mina Almasry wrote: > > > Sorry none of those are only used by net/core/*. Pretty much all of > > > these are used by include/net/page_pool/helpers.h, and some have > > > callers in net/core/devmem.c or net/core/skbuff.c > > > > > > Would you like me to move these pp specific looking ones to > > > include/net/page_pool/netmem.h or something similar? > > > > That's because some things already in helpers have no real business > > being there either. Why is page_pool_set_pp_info() in helpers.h? > > OK, I looked into this a bit. It looks like I can trivially move > page_pool_set/clear_pp_info() to page_pool_priv.h, and that lets me > move out a few of these netmem helpers to a header under net/core. > > However, to move more of these netmem helpers to a private header, I > think I need to move all the page pool dma helpers and reffing helpers > to a private header or the .c file, which I think will uninline them > as they're eventually called from drivers. > > I had guessed the previous authors put those dma and ref helpers in > the .h file to inline them as they're used in fast paths. Do you think > the refactor and the uninling is desirable? Or should I just do with > the trivial moving of the page_pool_set/clear_pp_info() to the private > file? The helpers which modify pp_magic and dma_addr should go. I don't see anything else on a quick look, but in general the public header shouldn't contain helpers which are meant for setup / init of a buffer.