[RFC] net: page_pool: Don't use page->private to store dma_addr_t

Message ID 1549550196-25581-1-git-send-email-ilias.apalodimas@linaro.org
State New
Headers show
Series
  • [RFC] net: page_pool: Don't use page->private to store dma_addr_t
Related show

Commit Message

Ilias Apalodimas Feb. 7, 2019, 2:36 p.m.
As pointed out in
https://www.mail-archive.com/netdev@vger.kernel.org/msg257926.html
the current page_pool implementation stores dma_addr_t in page->private.
This won't work on 32-bit platforms with 64-bit DMA addresses since the
page->private is an unsigned long and the dma_addr_t a u64.

Since no driver is yet using the DMA mapping capabilities of the API let's
try and fix this by shadowing struct_page and use 'struct list_head lru'
to store and retrieve DMA addresses from network drivers.
As long as the addresses returned from dma_map_page() are aligned the
first bit, used by the compound pages code should not be set.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
 include/net/page_pool.h | 55 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/core/page_pool.c    | 18 ++++++++++++----
 2 files changed, 69 insertions(+), 4 deletions(-)

-- 
2.7.4

Comments

Matthew Wilcox Feb. 7, 2019, 3:07 p.m. | #1
On Thu, Feb 07, 2019 at 04:36:36PM +0200, Ilias Apalodimas wrote:
> +/* Until we can update struct-page, have a shadow struct-page, that

> + * include our use-case

> + * Used to store retrieve dma addresses from network drivers.

> + * Never access this directly, use helper functions provided

> + * page_pool_get_dma_addr()

> + */


Huh?  Why not simply:

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2c471a2c43fa..2495a93ad90c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -28,6 +28,10 @@ struct address_space;
 struct mem_cgroup;
 struct hmm;
 
+struct page_pool {
+       dma_addr_t dma_addr;
+};
+
 /*
  * Each physical page in the system has a struct page associated with
  * it to keep track of whatever it is we are using the page for at the
@@ -77,6 +81,7 @@ struct page {
         * avoid collision and false-positive PageTail().
         */
        union {
+               struct page_pool pool;
                struct {        /* Page cache and anonymous pages */
                        /**
                         * @lru: Pageout list, eg. active_list protected by
Ilias Apalodimas Feb. 7, 2019, 3:20 p.m. | #2
Hi Matthew,

On Thu, Feb 07, 2019 at 07:07:45AM -0800, Matthew Wilcox wrote:
> On Thu, Feb 07, 2019 at 04:36:36PM +0200, Ilias Apalodimas wrote:

> > +/* Until we can update struct-page, have a shadow struct-page, that

> > + * include our use-case

> > + * Used to store retrieve dma addresses from network drivers.

> > + * Never access this directly, use helper functions provided

> > + * page_pool_get_dma_addr()

> > + */

> 

> Huh?  Why not simply:

> 

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h

> index 2c471a2c43fa..2495a93ad90c 100644

> --- a/include/linux/mm_types.h

> +++ b/include/linux/mm_types.h

> @@ -28,6 +28,10 @@ struct address_space;

>  struct mem_cgroup;

>  struct hmm;

>  

> +struct page_pool {

> +       dma_addr_t dma_addr;

> +};

> +

>  /*

>   * Each physical page in the system has a struct page associated with

>   * it to keep track of whatever it is we are using the page for at the

> @@ -77,6 +81,7 @@ struct page {

>          * avoid collision and false-positive PageTail().

>          */

>         union {

> +               struct page_pool pool;

>                 struct {        /* Page cache and anonymous pages */

>                         /**

>                          * @lru: Pageout list, eg. active_list protected by

> 


Well updating struct page is the final goal, hence the comment. I am mostly
looking for opinions here since we are trying to store dma addresses which are
irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a
bit controversial isn't it ? If we can add that, then yes the code would look
better

Thanks
/Ilias
David Miller Feb. 7, 2019, 9:25 p.m. | #3
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Date: Thu, 7 Feb 2019 17:20:34 +0200

> Well updating struct page is the final goal, hence the comment. I am mostly

> looking for opinions here since we are trying to store dma addresses which are

> irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a

> bit controversial isn't it ? If we can add that, then yes the code would look

> better


I fundamentally disagree.

One of the core operations performed on a page is mapping it so that a device
and use it.

Why have ancillary data structure support for this all over the place, rather
than in the common spot which is the page.

A page really is not just a 'mm' structure, it is a system structure.
Matthew Wilcox Feb. 7, 2019, 9:34 p.m. | #4
On Thu, Feb 07, 2019 at 01:25:19PM -0800, David Miller wrote:
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

> Date: Thu, 7 Feb 2019 17:20:34 +0200

> 

> > Well updating struct page is the final goal, hence the comment. I am mostly

> > looking for opinions here since we are trying to store dma addresses which are

> > irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a

> > bit controversial isn't it ? If we can add that, then yes the code would look

> > better

> 

> I fundamentally disagree.

> 

> One of the core operations performed on a page is mapping it so that a device

> and use it.

> 

> Why have ancillary data structure support for this all over the place, rather

> than in the common spot which is the page.

> 

> A page really is not just a 'mm' structure, it is a system structure.


+1

The fundamental point of computing is to do I/O.
Ilias Apalodimas Feb. 7, 2019, 9:37 p.m. | #5
Hi David,

On Thu, Feb 07, 2019 at 01:25I:19PM -0800, David Miller wrote:
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

> Date: Thu, 7 Feb 2019 17:20:34 +0200

> 

> > Well updating struct page is the final goal, hence the comment. I am mostly

> > looking for opinions here since we are trying to store dma addresses which are

> > irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a

> > bit controversial isn't it ? If we can add that, then yes the code would look

> > better

> 

> I fundamentally disagree.

> 

> One of the core operations performed on a page is mapping it so that a device

> and use it.

> 

> Why have ancillary data structure support for this all over the place, rather

> than in the common spot which is the page.


You are right on that. Moreover the intention of this change is to facilitate
the page recycling patches we proposed with Jesper. In that context we do need
the dma mapping information in a common spot since we'll need to access it from
drivers, networking code etc. The struct page *is* the best place for that.

> 

> A page really is not just a 'mm' structure, it is a system structure.


Well if you put it that way i completely agree (also it makes our life a *lot*
easier :))


Thanks
/Ilias
Ilias Apalodimas Feb. 7, 2019, 9:42 p.m. | #6
Hi Matthew,

On Thu, Feb 07, 2019 at 01:34:00PM -0800, Matthew Wilcox wrote:
> On Thu, Feb 07, 2019 at 01:25:19PM -0800, David Miller wrote:

> > From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

> > Date: Thu, 7 Feb 2019 17:20:34 +0200

> > 

> > > Well updating struct page is the final goal, hence the comment. I am mostly

> > > looking for opinions here since we are trying to store dma addresses which are

> > > irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a

> > > bit controversial isn't it ? If we can add that, then yes the code would look

> > > better

> > 

> > I fundamentally disagree.

> > 

> > One of the core operations performed on a page is mapping it so that a device

> > and use it.

> > 

> > Why have ancillary data structure support for this all over the place, rather

> > than in the common spot which is the page.

> > 

> > A page really is not just a 'mm' structure, it is a system structure.

> 

> +1

> 

> The fundamental point of computing is to do I/O.

Ok, great that should sort it out then.
I'll use your proposal and base the patch on that. 

Thanks for taking the time with this

/Ilias
Tariq Toukan Feb. 11, 2019, 8:53 a.m. | #7
On 2/7/2019 11:42 PM, Ilias Apalodimas wrote:
> Hi Matthew,

> 

> On Thu, Feb 07, 2019 at 01:34:00PM -0800, Matthew Wilcox wrote:

>> On Thu, Feb 07, 2019 at 01:25:19PM -0800, David Miller wrote:

>>> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

>>> Date: Thu, 7 Feb 2019 17:20:34 +0200

>>>

>>>> Well updating struct page is the final goal, hence the comment. I am mostly

>>>> looking for opinions here since we are trying to store dma addresses which are

>>>> irrelevant to pages. Having dma_addr_t definitions in mm-related headers is a

>>>> bit controversial isn't it ? If we can add that, then yes the code would look

>>>> better

>>>

>>> I fundamentally disagree.

>>>

>>> One of the core operations performed on a page is mapping it so that a device

>>> and use it.

>>>

>>> Why have ancillary data structure support for this all over the place, rather

>>> than in the common spot which is the page.

>>>

>>> A page really is not just a 'mm' structure, it is a system structure.

>>

>> +1

>>

>> The fundamental point of computing is to do I/O.

> Ok, great that should sort it out then.

> I'll use your proposal and base the patch on that.

> 

> Thanks for taking the time with this

> 

> /Ilias

> 


Hi,

It's great to use the struct page to store its dma mapping, but I am 
worried about extensibility.
page_pool is evolving, and it would need several more per-page fields. 
One of them would be pageref_bias, a planned optimization to reduce the 
number of the costly atomic pageref operations (and replace existing 
code in several drivers).

I would replace this dma field with a pointer to an extensible struct, 
that would contain the dma mapping (and other stuff in the near future).
This pointer fits perfectly with the existing unsigned long private; 
they can share the memory, for both 32- and 64-bits systems.

The only downside is one more pointer de-reference. This should be perf 
tested.
However, when introducing the page refcnt bias optimization into 
page_pool, I believe the perf gain would be guaranteed.

Regards,
Tariq
Matthew Wilcox Feb. 11, 2019, 12:12 p.m. | #8
On Mon, Feb 11, 2019 at 08:53:19AM +0000, Tariq Toukan wrote:
> It's great to use the struct page to store its dma mapping, but I am 

> worried about extensibility.

> page_pool is evolving, and it would need several more per-page fields. 

> One of them would be pageref_bias, a planned optimization to reduce the 

> number of the costly atomic pageref operations (and replace existing 

> code in several drivers).


There's space for five words (20 or 40 bytes on 32/64 bit).
Tariq Toukan Feb. 11, 2019, 3:38 p.m. | #9
On 2/11/2019 2:12 PM, Matthew Wilcox wrote:
> On Mon, Feb 11, 2019 at 08:53:19AM +0000, Tariq Toukan wrote:

>> It's great to use the struct page to store its dma mapping, but I am

>> worried about extensibility.

>> page_pool is evolving, and it would need several more per-page fields.

>> One of them would be pageref_bias, a planned optimization to reduce the

>> number of the costly atomic pageref operations (and replace existing

>> code in several drivers).

> 

> There's space for five words (20 or 40 bytes on 32/64 bit).

> 


OK so this is good for now, and for the near future.
Fine by me.

Regards,
Tariq
Eric Dumazet Feb. 11, 2019, 5:14 p.m. | #10
On 02/11/2019 12:53 AM, Tariq Toukan wrote:
> 


> Hi,

> 

> It's great to use the struct page to store its dma mapping, but I am 

> worried about extensibility.

> page_pool is evolving, and it would need several more per-page fields. 

> One of them would be pageref_bias, a planned optimization to reduce the 

> number of the costly atomic pageref operations (and replace existing 

> code in several drivers).

> 


But the point about pageref_bias is to place it in a different cache line than "struct page"

The major cost is having a cache line bouncing between producer and consumer.

pageref_bias means the producer only have to read the "struct page" and not dirty it
in the case the page can be recycled.



> I would replace this dma field with a pointer to an extensible struct, 

> that would contain the dma mapping (and other stuff in the near future).

> This pointer fits perfectly with the existing unsigned long private; 

> they can share the memory, for both 32- and 64-bits systems.

> 

> The only downside is one more pointer de-reference. This should be perf 

> tested.

> However, when introducing the page refcnt bias optimization into 

> page_pool, I believe the perf gain would be guaranteed.


Only in some cases perhaps (when the cache line can be dirtied without performance hit)
Tariq Toukan Feb. 12, 2019, 12:39 p.m. | #11
On 2/11/2019 7:14 PM, Eric Dumazet wrote:
> 

> 

> On 02/11/2019 12:53 AM, Tariq Toukan wrote:

>>

> 

>> Hi,

>>

>> It's great to use the struct page to store its dma mapping, but I am

>> worried about extensibility.

>> page_pool is evolving, and it would need several more per-page fields.

>> One of them would be pageref_bias, a planned optimization to reduce the

>> number of the costly atomic pageref operations (and replace existing

>> code in several drivers).

>>

> 

> But the point about pageref_bias is to place it in a different cache line than "struct page"

> 

> The major cost is having a cache line bouncing between producer and consumer.

> 


pageref_bias is meant to be dirtied only by the page requester, i.e. the 
NIC driver / page_pool.
All other components (basically, SKB release flow / put_page) should 
continue working with the atomic page_refcnt, and not dirty the 
pageref_bias.

However, what bothers me more is another issue.
The optimization doesn't cleanly combine with the new page_pool 
direction for maintaining a queue for "available" pages, as the put_page 
flow would need to read pageref_bias, asynchronously, and act accordingly.

The suggested hook in put_page (to catch the 2 -> 1 "biased refcnt" 
transition) causes a problem to the traditional pageref_bias idea, as it 
implies a new point in which the pageref_bias field is read 
*asynchronously*. This would risk missing the this critical 2 -> 1 
transition! Unless pageref_bias is atomic...


> pageref_bias means the producer only have to read the "struct page" and not dirty it

> in the case the page can be recycled.

> 

> 

> 

>> I would replace this dma field with a pointer to an extensible struct,

>> that would contain the dma mapping (and other stuff in the near future).

>> This pointer fits perfectly with the existing unsigned long private;

>> they can share the memory, for both 32- and 64-bits systems.

>>

>> The only downside is one more pointer de-reference. This should be perf

>> tested.

>> However, when introducing the page refcnt bias optimization into

>> page_pool, I believe the perf gain would be guaranteed.

> 

> Only in some cases perhaps (when the cache line can be dirtied without performance hit)

>
Jesper Dangaard Brouer Feb. 12, 2019, 1:49 p.m. | #12
On Tue, 12 Feb 2019 12:39:59 +0000
Tariq Toukan <tariqt@mellanox.com> wrote:

> On 2/11/2019 7:14 PM, Eric Dumazet wrote:

> > 

> > On 02/11/2019 12:53 AM, Tariq Toukan wrote:  

> >>  

> >   

> >> Hi,

> >>

> >> It's great to use the struct page to store its dma mapping, but I am

> >> worried about extensibility.

> >> page_pool is evolving, and it would need several more per-page fields.

> >> One of them would be pageref_bias, a planned optimization to reduce the

> >> number of the costly atomic pageref operations (and replace existing

> >> code in several drivers).

> >>  

> > 

> > But the point about pageref_bias is to place it in a different

> > cache line than "struct page"


Yes, exactly.


> > The major cost is having a cache line bouncing between producer and

> > consumer. 

> 

> pageref_bias is meant to be dirtied only by the page requester, i.e. the 

> NIC driver / page_pool.

> All other components (basically, SKB release flow / put_page) should 

> continue working with the atomic page_refcnt, and not dirty the 

> pageref_bias.

> 

> However, what bothers me more is another issue.

> The optimization doesn't cleanly combine with the new page_pool 

> direction for maintaining a queue for "available" pages, as the put_page 

> flow would need to read pageref_bias, asynchronously, and act accordingly.

> 

> The suggested hook in put_page (to catch the 2 -> 1 "biased refcnt" 

> transition) causes a problem to the traditional pageref_bias idea, as it 

> implies a new point in which the pageref_bias field is read 

> *asynchronously*. This would risk missing the this critical 2 -> 1 

> transition! Unless pageref_bias is atomic...


I want to stop you here...

It seems to me that you are trying to shoehorn in a refcount
optimization into page_pool.  The page_pool is optimized for the XDP
case of one-frame-per-page, where we can avoid changing the refcount,
and tradeoff memory usage for speed.  It is compatible with the elevated
refcount usage, but that is not the optimization target.

If the case you are optimizing for is "packing" more frames in a page,
then the page_pool might be the wrong choice.  To me it would make more
sense to create another enum xdp_mem_type, that generalize the
pageref_bias tricks also used by some drivers.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
Tariq Toukan Feb. 12, 2019, 2:58 p.m. | #13
On 2/12/2019 3:49 PM, Jesper Dangaard Brouer wrote:
> On Tue, 12 Feb 2019 12:39:59 +0000

> Tariq Toukan <tariqt@mellanox.com> wrote:

> 

>> On 2/11/2019 7:14 PM, Eric Dumazet wrote:

>>>

>>> On 02/11/2019 12:53 AM, Tariq Toukan wrote:

>>>>   

>>>    

>>>> Hi,

>>>>

>>>> It's great to use the struct page to store its dma mapping, but I am

>>>> worried about extensibility.

>>>> page_pool is evolving, and it would need several more per-page fields.

>>>> One of them would be pageref_bias, a planned optimization to reduce the

>>>> number of the costly atomic pageref operations (and replace existing

>>>> code in several drivers).

>>>>   

>>>

>>> But the point about pageref_bias is to place it in a different

>>> cache line than "struct page"

> 

> Yes, exactly.

> 

> 

>>> The major cost is having a cache line bouncing between producer and

>>> consumer.

>>

>> pageref_bias is meant to be dirtied only by the page requester, i.e. the

>> NIC driver / page_pool.

>> All other components (basically, SKB release flow / put_page) should

>> continue working with the atomic page_refcnt, and not dirty the

>> pageref_bias.

>>

>> However, what bothers me more is another issue.

>> The optimization doesn't cleanly combine with the new page_pool

>> direction for maintaining a queue for "available" pages, as the put_page

>> flow would need to read pageref_bias, asynchronously, and act accordingly.

>>

>> The suggested hook in put_page (to catch the 2 -> 1 "biased refcnt"

>> transition) causes a problem to the traditional pageref_bias idea, as it

>> implies a new point in which the pageref_bias field is read

>> *asynchronously*. This would risk missing the this critical 2 -> 1

>> transition! Unless pageref_bias is atomic...

> 

> I want to stop you here...

> 

> It seems to me that you are trying to shoehorn in a refcount

> optimization into page_pool.  The page_pool is optimized for the XDP

> case of one-frame-per-page, where we can avoid changing the refcount,

> and tradeoff memory usage for speed.  It is compatible with the elevated

> refcount usage, but that is not the optimization target.

> 

> If the case you are optimizing for is "packing" more frames in a page,

> then the page_pool might be the wrong choice.  To me it would make more

> sense to create another enum xdp_mem_type, that generalize the

> pageref_bias tricks also used by some drivers.

> 


Hi Jesper,

We share the same interest, I tried to combine the pageref_bias 
optimization on top of the put_page hook, but turns out it doesn't fit. 
That's all.

Of course, I am aware of the fact that page_pool is optimized for XDP 
use cases. But, as drivers prefer a single flow for their 
page-allocation management, rather than having several allocation/free 
methods depending on whether XDP program is loaded or not, the 
performance for non-XDP flow also matters.
I know you're not ignoring this, the fact that you're adding 
compatibility for the elevated refcount usage is a key step in this 
direction.

Another key benefit of page_pool is providing a netdev-optimized API 
that can replace the page allocation / dma mapping logic of the 
different drivers, and take it into one common shared unit.
This helps remove many LOCs from drivers, significantly improves 
modularity, and eases the support of new optimizations.
By improving the general non-XDP flow (packing several packets in a 
page) you encourage more and more drivers to do the transition.

We all look to further improve the page-pool performance. The 
pageref_bias idea does not fit, that's fine.
We can still introduce an API for bulk page-allocation, it will improve 
both XDP and non-XDP flows.

Regards,
Tariq
Eric Dumazet Feb. 12, 2019, 3:15 p.m. | #14
On 02/12/2019 04:39 AM, Tariq Toukan wrote:
> 

> 

> On 2/11/2019 7:14 PM, Eric Dumazet wrote:

>>

>>

>> On 02/11/2019 12:53 AM, Tariq Toukan wrote:

>>>

>>

>>> Hi,

>>>

>>> It's great to use the struct page to store its dma mapping, but I am

>>> worried about extensibility.

>>> page_pool is evolving, and it would need several more per-page fields.

>>> One of them would be pageref_bias, a planned optimization to reduce the

>>> number of the costly atomic pageref operations (and replace existing

>>> code in several drivers).

>>>

>>

>> But the point about pageref_bias is to place it in a different cache line than "struct page"

>>

>> The major cost is having a cache line bouncing between producer and consumer.

>>

> 

> pageref_bias is meant to be dirtied only by the page requester, i.e. the 

> NIC driver / page_pool.

> All other components (basically, SKB release flow / put_page) should 

> continue working with the atomic page_refcnt, and not dirty the 

> pageref_bias.


This is exactly my point.

You suggested to put pageref_bias in struct page, which breaks this completely.

pageref_bias is better kept in a driver structure, with appropriate prefetching
since most NIC use a ring buffer for their queues.

The dma address _can_ be put in the struct page, since the driver does not dirty it
and does not even read it when page can be recycled.
Alexander Duyck Feb. 12, 2019, 6:13 p.m. | #15
On Tue, Feb 12, 2019 at 7:16 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>

>

>

> On 02/12/2019 04:39 AM, Tariq Toukan wrote:

> >

> >

> > On 2/11/2019 7:14 PM, Eric Dumazet wrote:

> >>

> >>

> >> On 02/11/2019 12:53 AM, Tariq Toukan wrote:

> >>>

> >>

> >>> Hi,

> >>>

> >>> It's great to use the struct page to store its dma mapping, but I am

> >>> worried about extensibility.

> >>> page_pool is evolving, and it would need several more per-page fields.

> >>> One of them would be pageref_bias, a planned optimization to reduce the

> >>> number of the costly atomic pageref operations (and replace existing

> >>> code in several drivers).

> >>>

> >>

> >> But the point about pageref_bias is to place it in a different cache line than "struct page"

> >>

> >> The major cost is having a cache line bouncing between producer and consumer.

> >>

> >

> > pageref_bias is meant to be dirtied only by the page requester, i.e. the

> > NIC driver / page_pool.

> > All other components (basically, SKB release flow / put_page) should

> > continue working with the atomic page_refcnt, and not dirty the

> > pageref_bias.

>

> This is exactly my point.

>

> You suggested to put pageref_bias in struct page, which breaks this completely.

>

> pageref_bias is better kept in a driver structure, with appropriate prefetching

> since most NIC use a ring buffer for their queues.

>

> The dma address _can_ be put in the struct page, since the driver does not dirty it

> and does not even read it when page can be recycled.


Instead of maintaining the pageref_bias in the page itself it could be
maintained in some sort of separate structure. You could just maintain
a pointer to a slot in an array somewhere. Then you can still access
it if needed, the pointer would be static for as long as it is in the
page pool, and you could invalidate the pointer prior to removing the
bias from the page.
Ilias Apalodimas Feb. 12, 2019, 6:20 p.m. | #16
Hi Alexander, 

On Tue, Feb 12, 2019 at 10:13:30AM -0800, Alexander Duyck wrote:
> On Tue, Feb 12, 2019 at 7:16 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:

> >

> >

> >

> > On 02/12/2019 04:39 AM, Tariq Toukan wrote:

> > >

> > >

> > > On 2/11/2019 7:14 PM, Eric Dumazet wrote:

> > >>

> > >>

> > >> On 02/11/2019 12:53 AM, Tariq Toukan wrote:

> > >>>

> > >>

> > >>> Hi,

> > >>>

> > >>> It's great to use the struct page to store its dma mapping, but I am

> > >>> worried about extensibility.

> > >>> page_pool is evolving, and it would need several more per-page fields.

> > >>> One of them would be pageref_bias, a planned optimization to reduce the

> > >>> number of the costly atomic pageref operations (and replace existing

> > >>> code in several drivers).

> > >>>

> > >>

> > >> But the point about pageref_bias is to place it in a different cache line than "struct page"

> > >>

> > >> The major cost is having a cache line bouncing between producer and consumer.

> > >>

> > >

> > > pageref_bias is meant to be dirtied only by the page requester, i.e. the

> > > NIC driver / page_pool.

> > > All other components (basically, SKB release flow / put_page) should

> > > continue working with the atomic page_refcnt, and not dirty the

> > > pageref_bias.

> >

> > This is exactly my point.

> >

> > You suggested to put pageref_bias in struct page, which breaks this completely.

> >

> > pageref_bias is better kept in a driver structure, with appropriate prefetching

> > since most NIC use a ring buffer for their queues.

> >

> > The dma address _can_ be put in the struct page, since the driver does not dirty it

> > and does not even read it when page can be recycled.

> 

> Instead of maintaining the pageref_bias in the page itself it could be

> maintained in some sort of separate structure. You could just maintain

> a pointer to a slot in an array somewhere. Then you can still access

> it if needed, the pointer would be static for as long as it is in the

> page pool, and you could invalidate the pointer prior to removing the

> bias from the page.


I think that's what Tariq was suggesting in the first place.

/Ilias
Tariq Toukan Feb. 13, 2019, 8:46 a.m. | #17
On 2/12/2019 8:13 PM, Alexander Duyck wrote:
> On Tue, Feb 12, 2019 at 7:16 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:

>>

>>

>>

>> On 02/12/2019 04:39 AM, Tariq Toukan wrote:

>>>

>>>

>>> On 2/11/2019 7:14 PM, Eric Dumazet wrote:

>>>>

>>>>

>>>> On 02/11/2019 12:53 AM, Tariq Toukan wrote:

>>>>>

>>>>

>>>>> Hi,

>>>>>

>>>>> It's great to use the struct page to store its dma mapping, but I am

>>>>> worried about extensibility.

>>>>> page_pool is evolving, and it would need several more per-page fields.

>>>>> One of them would be pageref_bias, a planned optimization to reduce the

>>>>> number of the costly atomic pageref operations (and replace existing

>>>>> code in several drivers).

>>>>>

>>>>

>>>> But the point about pageref_bias is to place it in a different cache line than "struct page"

>>>>

>>>> The major cost is having a cache line bouncing between producer and consumer.

>>>>

>>>

>>> pageref_bias is meant to be dirtied only by the page requester, i.e. the

>>> NIC driver / page_pool.

>>> All other components (basically, SKB release flow / put_page) should

>>> continue working with the atomic page_refcnt, and not dirty the

>>> pageref_bias.

>>

>> This is exactly my point.

>>

>> You suggested to put pageref_bias in struct page, which breaks this completely.

>>

>> pageref_bias is better kept in a driver structure, with appropriate prefetching

>> since most NIC use a ring buffer for their queues.

>>

>> The dma address _can_ be put in the struct page, since the driver does not dirty it

>> and does not even read it when page can be recycled.

> 

> Instead of maintaining the pageref_bias in the page itself it could be

> maintained in some sort of separate structure. You could just maintain

> a pointer to a slot in an array somewhere. Then you can still access

> it if needed, the pointer would be static for as long as it is in the

> page pool, and you could invalidate the pointer prior to removing the

> bias from the page.

> 


Hi Alex,

That's right.
But as I describe in my other reply yesterday, there is a more serious 
issue with combining the pageref_bias feature with the new page_pool 
capability for elevated refcount pages.
It won't work on top, and that's fine, as the idea of the new elevated 
refcount queue capability is more promising and important, and it wins here.

Regards,
Tariq
Tariq Toukan Feb. 13, 2019, 8:50 a.m. | #18
On 2/12/2019 8:20 PM, Ilias Apalodimas wrote:
> Hi Alexander,

> 

> On Tue, Feb 12, 2019 at 10:13:30AM -0800, Alexander Duyck wrote:

>> On Tue, Feb 12, 2019 at 7:16 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:

>>>

>>>

>>>

>>> On 02/12/2019 04:39 AM, Tariq Toukan wrote:

>>>>

>>>>

>>>> On 2/11/2019 7:14 PM, Eric Dumazet wrote:

>>>>>

>>>>>

>>>>> On 02/11/2019 12:53 AM, Tariq Toukan wrote:

>>>>>>

>>>>>

>>>>>> Hi,

>>>>>>

>>>>>> It's great to use the struct page to store its dma mapping, but I am

>>>>>> worried about extensibility.

>>>>>> page_pool is evolving, and it would need several more per-page fields.

>>>>>> One of them would be pageref_bias, a planned optimization to reduce the

>>>>>> number of the costly atomic pageref operations (and replace existing

>>>>>> code in several drivers).

>>>>>>

>>>>>

>>>>> But the point about pageref_bias is to place it in a different cache line than "struct page"

>>>>>

>>>>> The major cost is having a cache line bouncing between producer and consumer.

>>>>>

>>>>

>>>> pageref_bias is meant to be dirtied only by the page requester, i.e. the

>>>> NIC driver / page_pool.

>>>> All other components (basically, SKB release flow / put_page) should

>>>> continue working with the atomic page_refcnt, and not dirty the

>>>> pageref_bias.

>>>

>>> This is exactly my point.

>>>

>>> You suggested to put pageref_bias in struct page, which breaks this completely.

>>>

>>> pageref_bias is better kept in a driver structure, with appropriate prefetching

>>> since most NIC use a ring buffer for their queues.

>>>

>>> The dma address _can_ be put in the struct page, since the driver does not dirty it

>>> and does not even read it when page can be recycled.

>>

>> Instead of maintaining the pageref_bias in the page itself it could be

>> maintained in some sort of separate structure. You could just maintain

>> a pointer to a slot in an array somewhere. Then you can still access

>> it if needed, the pointer would be static for as long as it is in the

>> page pool, and you could invalidate the pointer prior to removing the

>> bias from the page.

> 

> I think that's what Tariq was suggesting in the first place.

> 

> /Ilias

> 


Correct.
But not relevant anymore, as it won't work for other reasons.

Tariq

Patch

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 694d055..618f2e5 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -98,6 +98,52 @@  struct page_pool {
 	struct ptr_ring ring;
 };
 
+/* Until we can update struct-page, have a shadow struct-page, that
+ * include our use-case
+ * Used to store retrieve dma addresses from network drivers.
+ * Never access this directly, use helper functions provided
+ * page_pool_get_dma_addr()
+ */
+struct page_shadow {
+	unsigned long flags;		/* Atomic flags, some possibly
+					 * updated asynchronously
+					 */
+	/*
+	 * Five words (20/40 bytes) are available in this union.
+	 * WARNING: bit 0 of the first word is used for PageTail(). That
+	 * means the other users of this union MUST NOT use the bit to
+	 * avoid collision and false-positive PageTail().
+	 */
+	union {
+		struct {	/* Page cache and anonymous pages */
+			/**
+			 * @lru: Pageout list, eg. active_list protected by
+			 * zone_lru_lock.  Sometimes used as a generic list
+			 * by the page owner.
+			 */
+			struct list_head lru;
+			/* See page-flags.h for PAGE_MAPPING_FLAGS */
+			struct address_space *mapping;
+			pgoff_t index;		/* Our offset within mapping. */
+			/**
+			 * @private: Mapping-private opaque data.
+			 * Usually used for buffer_heads if PagePrivate.
+			 * Used for swp_entry_t if PageSwapCache.
+			 * Indicates order in the buddy system if PageBuddy.
+			 */
+			unsigned long private;
+		};
+		struct {	/* page_pool used by netstack */
+			/**
+			 * @dma_addr: Page_pool need to store DMA-addr, and
+			 * cannot use @private, as DMA-mappings can be 64bit
+			 * even on 32-bit Architectures.
+			 */
+			dma_addr_t dma_addr; /* Shares area with @lru */
+		};
+	};
+};
+
 struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
 
 static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
@@ -141,4 +187,13 @@  static inline bool is_page_pool_compiled_in(void)
 #endif
 }
 
+static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
+{
+	struct page_shadow *_page;
+
+	_page = (struct page_shadow *)page;
+
+	return _page->dma_addr;
+}
+
 #endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 43a932c..1a956a6 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -111,6 +111,7 @@  noinline
 static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 						 gfp_t _gfp)
 {
+	struct page_shadow *_page;
 	struct page *page;
 	gfp_t gfp = _gfp;
 	dma_addr_t dma;
@@ -136,7 +137,7 @@  static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	if (!(pool->p.flags & PP_FLAG_DMA_MAP))
 		goto skip_dma_map;
 
-	/* Setup DMA mapping: use page->private for DMA-addr
+	/* Setup DMA mapping: use struct-page area for storing DMA-addr
 	 * This mapping is kept for lifetime of page, until leaving pool.
 	 */
 	dma = dma_map_page(pool->p.dev, page, 0,
@@ -146,7 +147,8 @@  static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 		put_page(page);
 		return NULL;
 	}
-	set_page_private(page, dma); /* page->private = dma; */
+	_page = (struct page_shadow *)page;
+	_page->dma_addr = dma;
 
 skip_dma_map:
 	/* When page just alloc'ed is should/must have refcnt 1. */
@@ -175,13 +177,21 @@  EXPORT_SYMBOL(page_pool_alloc_pages);
 static void __page_pool_clean_page(struct page_pool *pool,
 				   struct page *page)
 {
+	struct page_shadow *_page = (struct page_shadow *)page;
+	dma_addr_t dma;
+
 	if (!(pool->p.flags & PP_FLAG_DMA_MAP))
 		return;
 
+	dma = _page->dma_addr;
+
 	/* DMA unmap */
-	dma_unmap_page(pool->p.dev, page_private(page),
+	dma_unmap_page(pool->p.dev, dma,
 		       PAGE_SIZE << pool->p.order, pool->p.dma_dir);
-	set_page_private(page, 0);
+	_page->dma_addr = 0;
+	/* 1. Make sure we don't need to list-init page->lru.
+	 * 2. What does it mean: bit 0 of LRU first word is used for PageTail()
+	 */
 }
 
 /* Return a page to the page allocator, cleaning up our state */