diff mbox series

[1/1,v2] skbuff: Fix a potential race while recycling page_pool packets

Message ID 20210709062943.101532-1-ilias.apalodimas@linaro.org
State New
Headers show
Series [1/1,v2] skbuff: Fix a potential race while recycling page_pool packets | expand

Commit Message

Ilias Apalodimas July 9, 2021, 6:29 a.m. UTC
As Alexander points out, when we are trying to recycle a cloned/expanded
SKB we might trigger a race.  The recycling code relies on the
pp_recycle bit to trigger,  which we carry over to cloned SKBs.
If that cloned SKB gets expanded or if we get references to the frags,
call skbb_release_data() and overwrite skb->head, we are creating separate
instances accessing the same page frags.  Since the skb_release_data()
will first try to recycle the frags,  there's a potential race between
the original and cloned SKB, since both will have the pp_recycle bit set.

Fix this by explicitly those SKBs not recyclable.
The atomic_sub_return effectively limits us to a single release case,
and when we are calling skb_release_data we are also releasing the
option to perform the recycling, or releasing the pages from the page pool.

Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")
Reported-by: Alexander Duyck <alexanderduyck@fb.com>
Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

---
Changes since v1:
- Set the recycle bit to 0 during skb_release_data instead of the 
  individual fucntions triggering the issue, in order to catch all 
  cases
 net/core/skbuff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.32.0.rc0

Comments

Alexander Duyck July 9, 2021, 2:34 p.m. UTC | #1
On Thu, Jul 8, 2021 at 11:30 PM Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>

> As Alexander points out, when we are trying to recycle a cloned/expanded

> SKB we might trigger a race.  The recycling code relies on the

> pp_recycle bit to trigger,  which we carry over to cloned SKBs.

> If that cloned SKB gets expanded or if we get references to the frags,

> call skbb_release_data() and overwrite skb->head, we are creating separate

> instances accessing the same page frags.  Since the skb_release_data()

> will first try to recycle the frags,  there's a potential race between

> the original and cloned SKB, since both will have the pp_recycle bit set.

>

> Fix this by explicitly those SKBs not recyclable.

> The atomic_sub_return effectively limits us to a single release case,

> and when we are calling skb_release_data we are also releasing the

> option to perform the recycling, or releasing the pages from the page pool.

>

> Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")

> Reported-by: Alexander Duyck <alexanderduyck@fb.com>

> Suggested-by: Alexander Duyck <alexanderduyck@fb.com>

> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

> ---

> Changes since v1:

> - Set the recycle bit to 0 during skb_release_data instead of the

>   individual fucntions triggering the issue, in order to catch all

>   cases

>  net/core/skbuff.c | 4 +++-

>  1 file changed, 3 insertions(+), 1 deletion(-)

>

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c

> index 12aabcda6db2..f91f09a824be 100644

> --- a/net/core/skbuff.c

> +++ b/net/core/skbuff.c

> @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb)

>         if (skb->cloned &&

>             atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

>                               &shinfo->dataref))

> -               return;

> +               goto exit;

>

>         skb_zcopy_clear(skb, true);

>

> @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb)

>                 kfree_skb_list(shinfo->frag_list);

>

>         skb_free_head(skb);

> +exit:

> +       skb->pp_recycle = 0;

>  }

>

>  /*

> --

> 2.32.0.rc0

>


This is probably the cleanest approach with the least amount of
change, but one thing I am concerned with in this approach is that we
end up having to dirty a cacheline that I am not sure is otherwise
touched during skb cleanup. I am not sure if that will be an issue or
not. If it is then an alternative or follow-on patch could move the
pp_recycle flag into the skb_shared_info flags itself and then make
certain that we clear it around the same time we are setting
shinfo->dataref to 1.

Otherwise this looks good to me.

Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Ilias Apalodimas July 9, 2021, 5:29 p.m. UTC | #2
On Fri, Jul 09, 2021 at 07:34:38AM -0700, Alexander Duyck wrote:
> On Thu, Jul 8, 2021 at 11:30 PM Ilias Apalodimas

> <ilias.apalodimas@linaro.org> wrote:

> >

> > As Alexander points out, when we are trying to recycle a cloned/expanded

> > SKB we might trigger a race.  The recycling code relies on the

> > pp_recycle bit to trigger,  which we carry over to cloned SKBs.

> > If that cloned SKB gets expanded or if we get references to the frags,

> > call skbb_release_data() and overwrite skb->head, we are creating separate

> > instances accessing the same page frags.  Since the skb_release_data()

> > will first try to recycle the frags,  there's a potential race between

> > the original and cloned SKB, since both will have the pp_recycle bit set.

> >

> > Fix this by explicitly those SKBs not recyclable.

> > The atomic_sub_return effectively limits us to a single release case,

> > and when we are calling skb_release_data we are also releasing the

> > option to perform the recycling, or releasing the pages from the page pool.

> >

> > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")

> > Reported-by: Alexander Duyck <alexanderduyck@fb.com>

> > Suggested-by: Alexander Duyck <alexanderduyck@fb.com>

> > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

> > ---

> > Changes since v1:

> > - Set the recycle bit to 0 during skb_release_data instead of the

> >   individual fucntions triggering the issue, in order to catch all

> >   cases

> >  net/core/skbuff.c | 4 +++-

> >  1 file changed, 3 insertions(+), 1 deletion(-)

> >

> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c

> > index 12aabcda6db2..f91f09a824be 100644

> > --- a/net/core/skbuff.c

> > +++ b/net/core/skbuff.c

> > @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb)

> >         if (skb->cloned &&

> >             atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

> >                               &shinfo->dataref))

> > -               return;

> > +               goto exit;

> >

> >         skb_zcopy_clear(skb, true);

> >

> > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb)

> >                 kfree_skb_list(shinfo->frag_list);

> >

> >         skb_free_head(skb);

> > +exit:

> > +       skb->pp_recycle = 0;

> >  }

> >

> >  /*

> > --

> > 2.32.0.rc0

> >

> 

> This is probably the cleanest approach with the least amount of

> change, but one thing I am concerned with in this approach is that we

> end up having to dirty a cacheline that I am not sure is otherwise

> touched during skb cleanup. I am not sure if that will be an issue or

> not. If it is then an alternative or follow-on patch could move the

> pp_recycle flag into the skb_shared_info flags itself and then make

> certain that we clear it around the same time we are setting

> shinfo->dataref to 1.

> 


Yep that's a viable alternative. Let's see if there's any measurable
impact.

> Otherwise this looks good to me.

> 

> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>


Thanks Alexander!
Jesper Dangaard Brouer July 12, 2021, 11:52 a.m. UTC | #3
On 09/07/2021 16.34, Alexander Duyck wrote:
> On Thu, Jul 8, 2021 at 11:30 PM Ilias Apalodimas

> <ilias.apalodimas@linaro.org> wrote:

>>

>> As Alexander points out, when we are trying to recycle a cloned/expanded

>> SKB we might trigger a race.  The recycling code relies on the

>> pp_recycle bit to trigger,  which we carry over to cloned SKBs.

>> If that cloned SKB gets expanded or if we get references to the frags,

>> call skbb_release_data() and overwrite skb->head, we are creating separate

>> instances accessing the same page frags.  Since the skb_release_data()

>> will first try to recycle the frags,  there's a potential race between

>> the original and cloned SKB, since both will have the pp_recycle bit set.

>>

>> Fix this by explicitly those SKBs not recyclable.

>> The atomic_sub_return effectively limits us to a single release case,

>> and when we are calling skb_release_data we are also releasing the

>> option to perform the recycling, or releasing the pages from the page pool.

>>

>> Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")

>> Reported-by: Alexander Duyck <alexanderduyck@fb.com>

>> Suggested-by: Alexander Duyck <alexanderduyck@fb.com>

>> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

>> ---

>> Changes since v1:

>> - Set the recycle bit to 0 during skb_release_data instead of the

>>    individual fucntions triggering the issue, in order to catch all

>>    cases

>>   net/core/skbuff.c | 4 +++-

>>   1 file changed, 3 insertions(+), 1 deletion(-)

>>

>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c

>> index 12aabcda6db2..f91f09a824be 100644

>> --- a/net/core/skbuff.c

>> +++ b/net/core/skbuff.c

>> @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb)

>>          if (skb->cloned &&

>>              atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

>>                                &shinfo->dataref))

>> -               return;

>> +               goto exit;

>>

>>          skb_zcopy_clear(skb, true);

>>

>> @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb)

>>                  kfree_skb_list(shinfo->frag_list);

>>

>>          skb_free_head(skb);

>> +exit:

>> +       skb->pp_recycle = 0;

>>   }

>>

>>   /*

>> --

>> 2.32.0.rc0

>>

> 

> This is probably the cleanest approach with the least amount of

> change, but one thing I am concerned with in this approach is that we

> end up having to dirty a cacheline that I am not sure is otherwise

> touched during skb cleanup. I am not sure if that will be an issue or

> not. If it is then an alternative or follow-on patch could move the

> pp_recycle flag into the skb_shared_info flags itself and then make

> certain that we clear it around the same time we are setting

> shinfo->dataref to 1.

> 


The skb->cloned and skb->pp_recycle (bitfields) are on the same 
cache-line (incl. nohdr, destructor, active_extensions).  Thus, we know 
this must be in CPUs cache, regardless of this change.  I do acknowledge 
that it might be in cache coherency "Shared" state, and writing 
skb->pp_recycle=0 the CPU *might* have to change the cache coherency 
state, but I don't expect this to be a performance problem.

> Otherwise this looks good to me.

> 

> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>

  Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>


I've gone over the code-path, with Ilias on IRC and I've convinced 
myself that this fix is correct, thus ACK.
Yunsheng Lin July 15, 2021, 4 a.m. UTC | #4
On 2021/7/9 14:29, Ilias Apalodimas wrote:
> As Alexander points out, when we are trying to recycle a cloned/expanded

> SKB we might trigger a race.  The recycling code relies on the

> pp_recycle bit to trigger,  which we carry over to cloned SKBs.

> If that cloned SKB gets expanded or if we get references to the frags,

> call skbb_release_data() and overwrite skb->head, we are creating separate

> instances accessing the same page frags.  Since the skb_release_data()

> will first try to recycle the frags,  there's a potential race between

> the original and cloned SKB, since both will have the pp_recycle bit set.

> 

> Fix this by explicitly those SKBs not recyclable.

> The atomic_sub_return effectively limits us to a single release case,

> and when we are calling skb_release_data we are also releasing the

> option to perform the recycling, or releasing the pages from the page pool.

> 

> Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")

> Reported-by: Alexander Duyck <alexanderduyck@fb.com>

> Suggested-by: Alexander Duyck <alexanderduyck@fb.com>

> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

> ---

> Changes since v1:

> - Set the recycle bit to 0 during skb_release_data instead of the 

>   individual fucntions triggering the issue, in order to catch all 

>   cases

>  net/core/skbuff.c | 4 +++-

>  1 file changed, 3 insertions(+), 1 deletion(-)

> 

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c

> index 12aabcda6db2..f91f09a824be 100644

> --- a/net/core/skbuff.c

> +++ b/net/core/skbuff.c

> @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb)

>  	if (skb->cloned &&

>  	    atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

>  			      &shinfo->dataref))

> -		return;

> +		goto exit;


Is it possible this patch may break the head frag page for the original skb,
supposing it's head frag page is from the page pool and below change clears
the pp_recycle for original skb, causing a page leaking for the page pool?

>  

>  	skb_zcopy_clear(skb, true);

>  

> @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb)

>  		kfree_skb_list(shinfo->frag_list);

>  

>  	skb_free_head(skb);

> +exit:

> +	skb->pp_recycle = 0;

>  }

>  

>  /*

>
Ilias Apalodimas July 15, 2021, 10 a.m. UTC | #5
On Thu, 15 Jul 2021 at 07:01, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>

> On 2021/7/9 14:29, Ilias Apalodimas wrote:

> > As Alexander points out, when we are trying to recycle a cloned/expanded

> > SKB we might trigger a race.  The recycling code relies on the

> > pp_recycle bit to trigger,  which we carry over to cloned SKBs.

> > If that cloned SKB gets expanded or if we get references to the frags,

> > call skbb_release_data() and overwrite skb->head, we are creating separate

> > instances accessing the same page frags.  Since the skb_release_data()

> > will first try to recycle the frags,  there's a potential race between

> > the original and cloned SKB, since both will have the pp_recycle bit set.

> >

> > Fix this by explicitly those SKBs not recyclable.

> > The atomic_sub_return effectively limits us to a single release case,

> > and when we are calling skb_release_data we are also releasing the

> > option to perform the recycling, or releasing the pages from the page pool.

> >

> > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")

> > Reported-by: Alexander Duyck <alexanderduyck@fb.com>

> > Suggested-by: Alexander Duyck <alexanderduyck@fb.com>

> > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

> > ---

> > Changes since v1:

> > - Set the recycle bit to 0 during skb_release_data instead of the

> >   individual fucntions triggering the issue, in order to catch all

> >   cases

> >  net/core/skbuff.c | 4 +++-

> >  1 file changed, 3 insertions(+), 1 deletion(-)

> >

> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c

> > index 12aabcda6db2..f91f09a824be 100644

> > --- a/net/core/skbuff.c

> > +++ b/net/core/skbuff.c

> > @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb)

> >       if (skb->cloned &&

> >           atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

> >                             &shinfo->dataref))

> > -             return;

> > +             goto exit;

>

> Is it possible this patch may break the head frag page for the original skb,

> supposing it's head frag page is from the page pool and below change clears

> the pp_recycle for original skb, causing a page leaking for the page pool?

>


So this would leak eventually dma mapping if the skb is cloned (and
the dataref is now +1) and we are freeing the original skb first?

> >

> >       skb_zcopy_clear(skb, true);

> >

> > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb)

> >               kfree_skb_list(shinfo->frag_list);

> >

> >       skb_free_head(skb);

> > +exit:

> > +     skb->pp_recycle = 0;

> >  }

> >

> >  /*

> >
Ilias Apalodimas July 15, 2021, 10:38 a.m. UTC | #6
On Thu, 15 Jul 2021 at 13:00, Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>

> On Thu, 15 Jul 2021 at 07:01, Yunsheng Lin <linyunsheng@huawei.com> wrote:

> >

> > On 2021/7/9 14:29, Ilias Apalodimas wrote:

> > > As Alexander points out, when we are trying to recycle a cloned/expanded

> > > SKB we might trigger a race.  The recycling code relies on the

> > > pp_recycle bit to trigger,  which we carry over to cloned SKBs.

> > > If that cloned SKB gets expanded or if we get references to the frags,

> > > call skbb_release_data() and overwrite skb->head, we are creating separate

> > > instances accessing the same page frags.  Since the skb_release_data()

> > > will first try to recycle the frags,  there's a potential race between

> > > the original and cloned SKB, since both will have the pp_recycle bit set.

> > >

> > > Fix this by explicitly those SKBs not recyclable.

> > > The atomic_sub_return effectively limits us to a single release case,

> > > and when we are calling skb_release_data we are also releasing the

> > > option to perform the recycling, or releasing the pages from the page pool.

> > >

> > > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")

> > > Reported-by: Alexander Duyck <alexanderduyck@fb.com>

> > > Suggested-by: Alexander Duyck <alexanderduyck@fb.com>

> > > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

> > > ---

> > > Changes since v1:

> > > - Set the recycle bit to 0 during skb_release_data instead of the

> > >   individual fucntions triggering the issue, in order to catch all

> > >   cases

> > >  net/core/skbuff.c | 4 +++-

> > >  1 file changed, 3 insertions(+), 1 deletion(-)

> > >

> > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c

> > > index 12aabcda6db2..f91f09a824be 100644

> > > --- a/net/core/skbuff.c

> > > +++ b/net/core/skbuff.c

> > > @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb)

> > >       if (skb->cloned &&

> > >           atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

> > >                             &shinfo->dataref))

> > > -             return;

> > > +             goto exit;

> >

> > Is it possible this patch may break the head frag page for the original skb,

> > supposing it's head frag page is from the page pool and below change clears

> > the pp_recycle for original skb, causing a page leaking for the page pool?

> >

>

> So this would leak eventually dma mapping if the skb is cloned (and

> the dataref is now +1) and we are freeing the original skb first?

>


Apologies for the noise, my description was not complete.
The case you are thinking is clone an SKB and then expand the original?

thanks
/Ilias


> > >

> > >       skb_zcopy_clear(skb, true);

> > >

> > > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb)

> > >               kfree_skb_list(shinfo->frag_list);

> > >

> > >       skb_free_head(skb);

> > > +exit:

> > > +     skb->pp_recycle = 0;

> > >  }

> > >

> > >  /*

> > >
Yunsheng Lin July 15, 2021, 10:47 a.m. UTC | #7
On 2021/7/15 18:38, Ilias Apalodimas wrote:
> On Thu, 15 Jul 2021 at 13:00, Ilias Apalodimas

> <ilias.apalodimas@linaro.org> wrote:

>>

>> On Thu, 15 Jul 2021 at 07:01, Yunsheng Lin <linyunsheng@huawei.com> wrote:

>>>

>>> On 2021/7/9 14:29, Ilias Apalodimas wrote:

>>>> As Alexander points out, when we are trying to recycle a cloned/expanded

>>>> SKB we might trigger a race.  The recycling code relies on the

>>>> pp_recycle bit to trigger,  which we carry over to cloned SKBs.

>>>> If that cloned SKB gets expanded or if we get references to the frags,

>>>> call skbb_release_data() and overwrite skb->head, we are creating separate

>>>> instances accessing the same page frags.  Since the skb_release_data()

>>>> will first try to recycle the frags,  there's a potential race between

>>>> the original and cloned SKB, since both will have the pp_recycle bit set.

>>>>

>>>> Fix this by explicitly those SKBs not recyclable.

>>>> The atomic_sub_return effectively limits us to a single release case,

>>>> and when we are calling skb_release_data we are also releasing the

>>>> option to perform the recycling, or releasing the pages from the page pool.

>>>>

>>>> Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")

>>>> Reported-by: Alexander Duyck <alexanderduyck@fb.com>

>>>> Suggested-by: Alexander Duyck <alexanderduyck@fb.com>

>>>> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

>>>> ---

>>>> Changes since v1:

>>>> - Set the recycle bit to 0 during skb_release_data instead of the

>>>>   individual fucntions triggering the issue, in order to catch all

>>>>   cases

>>>>  net/core/skbuff.c | 4 +++-

>>>>  1 file changed, 3 insertions(+), 1 deletion(-)

>>>>

>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c

>>>> index 12aabcda6db2..f91f09a824be 100644

>>>> --- a/net/core/skbuff.c

>>>> +++ b/net/core/skbuff.c

>>>> @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb)

>>>>       if (skb->cloned &&

>>>>           atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

>>>>                             &shinfo->dataref))

>>>> -             return;

>>>> +             goto exit;

>>>

>>> Is it possible this patch may break the head frag page for the original skb,

>>> supposing it's head frag page is from the page pool and below change clears

>>> the pp_recycle for original skb, causing a page leaking for the page pool?

>>>

>>

>> So this would leak eventually dma mapping if the skb is cloned (and

>> the dataref is now +1) and we are freeing the original skb first?

>>

> 

> Apologies for the noise, my description was not complete.

> The case you are thinking is clone an SKB and then expand the original?


Yes.
It seems we might need different pp_recycle bit for head frag and data frag.

> 

> thanks

> /Ilias

> 

> 

>>>>

>>>>       skb_zcopy_clear(skb, true);

>>>>

>>>> @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb)

>>>>               kfree_skb_list(shinfo->frag_list);

>>>>

>>>>       skb_free_head(skb);

>>>> +exit:

>>>> +     skb->pp_recycle = 0;

>>>>  }

>>>>

>>>>  /*

>>>>

> .

>
Ilias Apalodimas July 15, 2021, 12:37 p.m. UTC | #8
On Thu, 15 Jul 2021 at 13:48, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>

> On 2021/7/15 18:38, Ilias Apalodimas wrote:

> > On Thu, 15 Jul 2021 at 13:00, Ilias Apalodimas

> > <ilias.apalodimas@linaro.org> wrote:

> >>

> >> On Thu, 15 Jul 2021 at 07:01, Yunsheng Lin <linyunsheng@huawei.com> wrote:

> >>>

> >>> On 2021/7/9 14:29, Ilias Apalodimas wrote:

> >>>> As Alexander points out, when we are trying to recycle a cloned/expanded

> >>>> SKB we might trigger a race.  The recycling code relies on the

> >>>> pp_recycle bit to trigger,  which we carry over to cloned SKBs.

> >>>> If that cloned SKB gets expanded or if we get references to the frags,

> >>>> call skbb_release_data() and overwrite skb->head, we are creating separate

> >>>> instances accessing the same page frags.  Since the skb_release_data()

> >>>> will first try to recycle the frags,  there's a potential race between

> >>>> the original and cloned SKB, since both will have the pp_recycle bit set.

> >>>>

> >>>> Fix this by explicitly those SKBs not recyclable.

> >>>> The atomic_sub_return effectively limits us to a single release case,

> >>>> and when we are calling skb_release_data we are also releasing the

> >>>> option to perform the recycling, or releasing the pages from the page pool.

> >>>>

> >>>> Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")

> >>>> Reported-by: Alexander Duyck <alexanderduyck@fb.com>

> >>>> Suggested-by: Alexander Duyck <alexanderduyck@fb.com>

> >>>> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

> >>>> ---

> >>>> Changes since v1:

> >>>> - Set the recycle bit to 0 during skb_release_data instead of the

> >>>>   individual fucntions triggering the issue, in order to catch all

> >>>>   cases

> >>>>  net/core/skbuff.c | 4 +++-

> >>>>  1 file changed, 3 insertions(+), 1 deletion(-)

> >>>>

> >>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c

> >>>> index 12aabcda6db2..f91f09a824be 100644

> >>>> --- a/net/core/skbuff.c

> >>>> +++ b/net/core/skbuff.c

> >>>> @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb)

> >>>>       if (skb->cloned &&

> >>>>           atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

> >>>>                             &shinfo->dataref))

> >>>> -             return;

> >>>> +             goto exit;

> >>>

> >>> Is it possible this patch may break the head frag page for the original skb,

> >>> supposing it's head frag page is from the page pool and below change clears

> >>> the pp_recycle for original skb, causing a page leaking for the page pool?

> >>>

> >>

> >> So this would leak eventually dma mapping if the skb is cloned (and

> >> the dataref is now +1) and we are freeing the original skb first?

> >>

> >

> > Apologies for the noise, my description was not complete.

> > The case you are thinking is clone an SKB and then expand the original?

>

> Yes.

> It seems we might need different pp_recycle bit for head frag and data frag.


We could just reset the pp_recycle flag on pskb_carve_inside_header,
pskb_expand_header and pskb_carve_inside_nonlinear which were the
three functions that might trigger the race to begin with.  The point
on adding it on skb_release_data was to have a catch all for all
future cases ...
Let me stare at itt a bit more in case I can come up with something better

Thanks
/Ilias
>

> >

> > thanks

> > /Ilias

> >

> >

> >>>>

> >>>>       skb_zcopy_clear(skb, true);

> >>>>

> >>>> @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb)

> >>>>               kfree_skb_list(shinfo->frag_list);

> >>>>

> >>>>       skb_free_head(skb);

> >>>> +exit:

> >>>> +     skb->pp_recycle = 0;

> >>>>  }

> >>>>

> >>>>  /*

> >>>>

> > .

> >
Alexander Duyck July 15, 2021, 2:25 p.m. UTC | #9
On Wed, Jul 14, 2021 at 9:02 PM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>

> On 2021/7/9 14:29, Ilias Apalodimas wrote:

> > As Alexander points out, when we are trying to recycle a cloned/expanded

> > SKB we might trigger a race.  The recycling code relies on the

> > pp_recycle bit to trigger,  which we carry over to cloned SKBs.

> > If that cloned SKB gets expanded or if we get references to the frags,

> > call skbb_release_data() and overwrite skb->head, we are creating separate

> > instances accessing the same page frags.  Since the skb_release_data()

> > will first try to recycle the frags,  there's a potential race between

> > the original and cloned SKB, since both will have the pp_recycle bit set.

> >

> > Fix this by explicitly those SKBs not recyclable.

> > The atomic_sub_return effectively limits us to a single release case,

> > and when we are calling skb_release_data we are also releasing the

> > option to perform the recycling, or releasing the pages from the page pool.

> >

> > Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling")

> > Reported-by: Alexander Duyck <alexanderduyck@fb.com>

> > Suggested-by: Alexander Duyck <alexanderduyck@fb.com>

> > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

> > ---

> > Changes since v1:

> > - Set the recycle bit to 0 during skb_release_data instead of the

> >   individual fucntions triggering the issue, in order to catch all

> >   cases

> >  net/core/skbuff.c | 4 +++-

> >  1 file changed, 3 insertions(+), 1 deletion(-)

> >

> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c

> > index 12aabcda6db2..f91f09a824be 100644

> > --- a/net/core/skbuff.c

> > +++ b/net/core/skbuff.c

> > @@ -663,7 +663,7 @@ static void skb_release_data(struct sk_buff *skb)

> >       if (skb->cloned &&

> >           atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

> >                             &shinfo->dataref))

> > -             return;

> > +             goto exit;

>

> Is it possible this patch may break the head frag page for the original skb,

> supposing it's head frag page is from the page pool and below change clears

> the pp_recycle for original skb, causing a page leaking for the page pool?


I don't see how. The assumption here is that when atomic_sub_return
gets down to 0 we will still have an skb with skb->pp_recycle set and
it will flow down and encounter skb_free_head below. All we are doing
is skipping those steps and clearing skb->pp_recycle for all but the
last buffer and the last one to free it will trigger the recycling.

> >

> >       skb_zcopy_clear(skb, true);

> >

> > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb)

> >               kfree_skb_list(shinfo->frag_list);

> >

> >       skb_free_head(skb);

> > +exit:

> > +     skb->pp_recycle = 0;


Note the path here. We don't clear skb->pp_recycle for the last buffer
where "dataref == 0" until *AFTER* the head has been freed, and all
clones will have skb->pp_recycle = 1 as long as they are a clone of
the original skb that had it set.
Ilias Apalodimas July 15, 2021, 2:45 p.m. UTC | #10
> > >           atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,


[...]

> > >                             &shinfo->dataref))

> > > -             return;

> > > +             goto exit;

> >

> > Is it possible this patch may break the head frag page for the original skb,

> > supposing it's head frag page is from the page pool and below change clears

> > the pp_recycle for original skb, causing a page leaking for the page pool?

> 

> I don't see how. The assumption here is that when atomic_sub_return

> gets down to 0 we will still have an skb with skb->pp_recycle set and

> it will flow down and encounter skb_free_head below. All we are doing

> is skipping those steps and clearing skb->pp_recycle for all but the

> last buffer and the last one to free it will trigger the recycling.


I think the assumption here is that 
1. We clone an skb
2. The original skb goes into pskb_expand_head()
3. skb_release_data() will be called for the original skb

But with the dataref bumped, we'll skip the recycling for it but we'll also
skip recycling or unmapping the current head (which is a page_pool mapped
buffer)

> 

> > >

> > >       skb_zcopy_clear(skb, true);

> > >

> > > @@ -674,6 +674,8 @@ static void skb_release_data(struct sk_buff *skb)

> > >               kfree_skb_list(shinfo->frag_list);

> > >

> > >       skb_free_head(skb);

> > > +exit:

> > > +     skb->pp_recycle = 0;

> 

> Note the path here. We don't clear skb->pp_recycle for the last buffer

> where "dataref == 0" until *AFTER* the head has been freed, and all

> clones will have skb->pp_recycle = 1 as long as they are a clone of

> the original skb that had it set.
Alexander Duyck July 15, 2021, 2:57 p.m. UTC | #11
On Thu, Jul 15, 2021 at 7:45 AM Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>

> > > >           atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

>

> [...]

>

> > > >                             &shinfo->dataref))

> > > > -             return;

> > > > +             goto exit;

> > >

> > > Is it possible this patch may break the head frag page for the original skb,

> > > supposing it's head frag page is from the page pool and below change clears

> > > the pp_recycle for original skb, causing a page leaking for the page pool?

> >

> > I don't see how. The assumption here is that when atomic_sub_return

> > gets down to 0 we will still have an skb with skb->pp_recycle set and

> > it will flow down and encounter skb_free_head below. All we are doing

> > is skipping those steps and clearing skb->pp_recycle for all but the

> > last buffer and the last one to free it will trigger the recycling.

>

> I think the assumption here is that

> 1. We clone an skb

> 2. The original skb goes into pskb_expand_head()

> 3. skb_release_data() will be called for the original skb

>

> But with the dataref bumped, we'll skip the recycling for it but we'll also

> skip recycling or unmapping the current head (which is a page_pool mapped

> buffer)


Right, but in that case it is the clone that is left holding the
original head and the skb->pp_recycle flag is set on the clone as it
was copied from the original when we cloned it. What we have
essentially done is transferred the responsibility for freeing it from
the original to the clone.

If you think about it the result is the same as if step 2 was to go
into kfree_skb. We would still be calling skb_release_data and the
dataref would be decremented without the original freeing the page. We
have to wait until all the clones are freed and dataref reaches 0
before the head can be recycled.
Ilias Apalodimas July 15, 2021, 3:02 p.m. UTC | #12
On Thu, Jul 15, 2021 at 07:57:57AM -0700, Alexander Duyck wrote:
> On Thu, Jul 15, 2021 at 7:45 AM Ilias Apalodimas

> <ilias.apalodimas@linaro.org> wrote:

> >

> > > > >           atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

> >

> > [...]

> >

> > > > >                             &shinfo->dataref))

> > > > > -             return;

> > > > > +             goto exit;

> > > >

> > > > Is it possible this patch may break the head frag page for the original skb,

> > > > supposing it's head frag page is from the page pool and below change clears

> > > > the pp_recycle for original skb, causing a page leaking for the page pool?

> > >

> > > I don't see how. The assumption here is that when atomic_sub_return

> > > gets down to 0 we will still have an skb with skb->pp_recycle set and

> > > it will flow down and encounter skb_free_head below. All we are doing

> > > is skipping those steps and clearing skb->pp_recycle for all but the

> > > last buffer and the last one to free it will trigger the recycling.

> >

> > I think the assumption here is that

> > 1. We clone an skb

> > 2. The original skb goes into pskb_expand_head()

> > 3. skb_release_data() will be called for the original skb

> >

> > But with the dataref bumped, we'll skip the recycling for it but we'll also

> > skip recycling or unmapping the current head (which is a page_pool mapped

> > buffer)

> 

> Right, but in that case it is the clone that is left holding the

> original head and the skb->pp_recycle flag is set on the clone as it

> was copied from the original when we cloned it. 


Ah yes, that's what I missed

> What we have

> essentially done is transferred the responsibility for freeing it from

> the original to the clone.

> 

> If you think about it the result is the same as if step 2 was to go

> into kfree_skb. We would still be calling skb_release_data and the

> dataref would be decremented without the original freeing the page. We

> have to wait until all the clones are freed and dataref reaches 0

> before the head can be recycled.


Yep sounds correct

Thanks
/Ilias
Yunsheng Lin July 16, 2021, 2:30 a.m. UTC | #13
On 2021/7/15 23:02, Ilias Apalodimas wrote:
> On Thu, Jul 15, 2021 at 07:57:57AM -0700, Alexander Duyck wrote:

>> On Thu, Jul 15, 2021 at 7:45 AM Ilias Apalodimas

>> <ilias.apalodimas@linaro.org> wrote:

>>>

>>>>>>           atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

>>>

>>> [...]

>>>

>>>>>>                             &shinfo->dataref))

>>>>>> -             return;

>>>>>> +             goto exit;

>>>>>

>>>>> Is it possible this patch may break the head frag page for the original skb,

>>>>> supposing it's head frag page is from the page pool and below change clears

>>>>> the pp_recycle for original skb, causing a page leaking for the page pool?

>>>>

>>>> I don't see how. The assumption here is that when atomic_sub_return

>>>> gets down to 0 we will still have an skb with skb->pp_recycle set and

>>>> it will flow down and encounter skb_free_head below. All we are doing

>>>> is skipping those steps and clearing skb->pp_recycle for all but the

>>>> last buffer and the last one to free it will trigger the recycling.

>>>

>>> I think the assumption here is that

>>> 1. We clone an skb

>>> 2. The original skb goes into pskb_expand_head()

>>> 3. skb_release_data() will be called for the original skb

>>>

>>> But with the dataref bumped, we'll skip the recycling for it but we'll also

>>> skip recycling or unmapping the current head (which is a page_pool mapped

>>> buffer)

>>

>> Right, but in that case it is the clone that is left holding the

>> original head and the skb->pp_recycle flag is set on the clone as it

>> was copied from the original when we cloned it. 

> 

> Ah yes, that's what I missed

> 

>> What we have

>> essentially done is transferred the responsibility for freeing it from

>> the original to the clone.

>>

>> If you think about it the result is the same as if step 2 was to go

>> into kfree_skb. We would still be calling skb_release_data and the

>> dataref would be decremented without the original freeing the page. We

>> have to wait until all the clones are freed and dataref reaches 0

>> before the head can be recycled.

> 

> Yep sounds correct


Ok, I suppose the fraglist skb is handled similar as the regular skb, right?

Also, this patch might need respinning as the state of this patch is "Changes
Requested" in patchwork.

> 

> Thanks

> /Ilias

> .

>
Ilias Apalodimas July 16, 2021, 6:43 a.m. UTC | #14
On Fri, 16 Jul 2021 at 05:30, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>

> On 2021/7/15 23:02, Ilias Apalodimas wrote:

> > On Thu, Jul 15, 2021 at 07:57:57AM -0700, Alexander Duyck wrote:

> >> On Thu, Jul 15, 2021 at 7:45 AM Ilias Apalodimas

> >> <ilias.apalodimas@linaro.org> wrote:

> >>>

> >>>>>>           atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,

> >>>

> >>> [...]

> >>>

> >>>>>>                             &shinfo->dataref))

> >>>>>> -             return;

> >>>>>> +             goto exit;

> >>>>>

> >>>>> Is it possible this patch may break the head frag page for the original skb,

> >>>>> supposing it's head frag page is from the page pool and below change clears

> >>>>> the pp_recycle for original skb, causing a page leaking for the page pool?

> >>>>

> >>>> I don't see how. The assumption here is that when atomic_sub_return

> >>>> gets down to 0 we will still have an skb with skb->pp_recycle set and

> >>>> it will flow down and encounter skb_free_head below. All we are doing

> >>>> is skipping those steps and clearing skb->pp_recycle for all but the

> >>>> last buffer and the last one to free it will trigger the recycling.

> >>>

> >>> I think the assumption here is that

> >>> 1. We clone an skb

> >>> 2. The original skb goes into pskb_expand_head()

> >>> 3. skb_release_data() will be called for the original skb

> >>>

> >>> But with the dataref bumped, we'll skip the recycling for it but we'll also

> >>> skip recycling or unmapping the current head (which is a page_pool mapped

> >>> buffer)

> >>

> >> Right, but in that case it is the clone that is left holding the

> >> original head and the skb->pp_recycle flag is set on the clone as it

> >> was copied from the original when we cloned it.

> >

> > Ah yes, that's what I missed

> >

> >> What we have

> >> essentially done is transferred the responsibility for freeing it from

> >> the original to the clone.

> >>

> >> If you think about it the result is the same as if step 2 was to go

> >> into kfree_skb. We would still be calling skb_release_data and the

> >> dataref would be decremented without the original freeing the page. We

> >> have to wait until all the clones are freed and dataref reaches 0

> >> before the head can be recycled.

> >

> > Yep sounds correct

>

> Ok, I suppose the fraglist skb is handled similar as the regular skb, right?

>


Yes, even in the fragments case your cloned/expanded SBK will still
have the recycle bit set, so it will try to recycle them or unmap them

> Also, this patch might need respinning as the state of this patch is "Changes

> Requested" in patchwork.


Thanks, I'll respin it and add a comment explaining why

>

> >

> > Thanks

> > /Ilias

> > .

> >
diff mbox series

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 12aabcda6db2..f91f09a824be 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -663,7 +663,7 @@  static void skb_release_data(struct sk_buff *skb)
 	if (skb->cloned &&
 	    atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
 			      &shinfo->dataref))
-		return;
+		goto exit;
 
 	skb_zcopy_clear(skb, true);
 
@@ -674,6 +674,8 @@  static void skb_release_data(struct sk_buff *skb)
 		kfree_skb_list(shinfo->frag_list);
 
 	skb_free_head(skb);
+exit:
+	skb->pp_recycle = 0;
 }
 
 /*