diff mbox series

[net-next] tun: fix ubuf refcount incorrectly on error path

Message ID 1606982459-41752-1-git-send-email-wangyunjian@huawei.com
State New
Headers show
Series [net-next] tun: fix ubuf refcount incorrectly on error path | expand

Commit Message

wangyunjian Dec. 3, 2020, 8 a.m. UTC
From: Yunjian Wang <wangyunjian@huawei.com>

After setting callback for ubuf_info of skb, the callback
(vhost_net_zerocopy_callback) will be called to decrease
the refcount when freeing skb. But when an exception occurs
afterwards, the error handling in vhost handle_tx() will
try to decrease the same refcount again. This is wrong and
fix this by clearing ubuf_info when meeting errors.

Fixes: 4477138fa0ae ("tun: properly test for IFF_UP")
Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
 drivers/net/tun.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Jason Wang Dec. 4, 2020, 6:10 a.m. UTC | #1
On 2020/12/3 下午4:00, wangyunjian wrote:
> From: Yunjian Wang <wangyunjian@huawei.com>

>

> After setting callback for ubuf_info of skb, the callback

> (vhost_net_zerocopy_callback) will be called to decrease

> the refcount when freeing skb. But when an exception occurs

> afterwards, the error handling in vhost handle_tx() will

> try to decrease the same refcount again. This is wrong and

> fix this by clearing ubuf_info when meeting errors.

>

> Fixes: 4477138fa0ae ("tun: properly test for IFF_UP")

> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")

>

> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

> ---

>   drivers/net/tun.c | 11 +++++++++++

>   1 file changed, 11 insertions(+)

>

> diff --git a/drivers/net/tun.c b/drivers/net/tun.c

> index 2dc1988a8973..3614bb1b6d35 100644

> --- a/drivers/net/tun.c

> +++ b/drivers/net/tun.c

> @@ -1861,6 +1861,12 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>   	if (unlikely(!(tun->dev->flags & IFF_UP))) {

>   		err = -EIO;

>   		rcu_read_unlock();

> +		if (zerocopy) {

> +			skb_shinfo(skb)->destructor_arg = NULL;

> +			skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;

> +			skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG;

> +		}

> +

>   		goto drop;

>   	}

>   

> @@ -1874,6 +1880,11 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>   

>   		if (unlikely(headlen > skb_headlen(skb))) {

>   			atomic_long_inc(&tun->dev->rx_dropped);

> +			if (zerocopy) {

> +				skb_shinfo(skb)->destructor_arg = NULL;

> +				skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;

> +				skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG;

> +			}

>   			napi_free_frags(&tfile->napi);

>   			rcu_read_unlock();

>   			mutex_unlock(&tfile->napi_mutex);



It looks to me then we miss the failure feedback.

The issues comes from the inconsistent error handling in tun.

I wonder whether we can simply do uarg->callback(uarg, false) if 
necessary on every failture path on tun_get_user().

Note that, zerocopy has a lot of issues which makes it not good for 
production environment.

Thanks
wangyunjian Dec. 4, 2020, 10:22 a.m. UTC | #2
> -----Original Message-----

> From: Jason Wang [mailto:jasowang@redhat.com]

> Sent: Friday, December 4, 2020 2:11 PM

> To: wangyunjian <wangyunjian@huawei.com>; mst@redhat.com

> Cc: virtualization@lists.linux-foundation.org; netdev@vger.kernel.org; Lilijun

> (Jerry) <jerry.lilijun@huawei.com>; xudingke <xudingke@huawei.com>

> Subject: Re: [PATCH net-next] tun: fix ubuf refcount incorrectly on error path

> 

> 

> On 2020/12/3 下午4:00, wangyunjian wrote:

> > From: Yunjian Wang <wangyunjian@huawei.com>

> >

> > After setting callback for ubuf_info of skb, the callback

> > (vhost_net_zerocopy_callback) will be called to decrease the refcount

> > when freeing skb. But when an exception occurs afterwards, the error

> > handling in vhost handle_tx() will try to decrease the same refcount

> > again. This is wrong and fix this by clearing ubuf_info when meeting

> > errors.

> >

> > Fixes: 4477138fa0ae ("tun: properly test for IFF_UP")

> > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP

> > driver")

> >

> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

> > ---

> >   drivers/net/tun.c | 11 +++++++++++

> >   1 file changed, 11 insertions(+)

> >

> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c index

> > 2dc1988a8973..3614bb1b6d35 100644

> > --- a/drivers/net/tun.c

> > +++ b/drivers/net/tun.c

> > @@ -1861,6 +1861,12 @@ static ssize_t tun_get_user(struct tun_struct

> *tun, struct tun_file *tfile,

> >   	if (unlikely(!(tun->dev->flags & IFF_UP))) {

> >   		err = -EIO;

> >   		rcu_read_unlock();

> > +		if (zerocopy) {

> > +			skb_shinfo(skb)->destructor_arg = NULL;

> > +			skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;

> > +			skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG;

> > +		}

> > +

> >   		goto drop;

> >   	}

> >

> > @@ -1874,6 +1880,11 @@ static ssize_t tun_get_user(struct tun_struct

> > *tun, struct tun_file *tfile,

> >

> >   		if (unlikely(headlen > skb_headlen(skb))) {

> >   			atomic_long_inc(&tun->dev->rx_dropped);

> > +			if (zerocopy) {

> > +				skb_shinfo(skb)->destructor_arg = NULL;

> > +				skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;

> > +				skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG;

> > +			}

> >   			napi_free_frags(&tfile->napi);

> >   			rcu_read_unlock();

> >   			mutex_unlock(&tfile->napi_mutex);

> 

> 

> It looks to me then we miss the failure feedback.

> 

> The issues comes from the inconsistent error handling in tun.

> 

> I wonder whether we can simply do uarg->callback(uarg, false) if necessary on

> every failture path on tun_get_user().


How about this?

---
 drivers/net/tun.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2dc1988a8973..36a8d8eacd7b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1637,6 +1637,19 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 	return NULL;
 }
 
+/* copy ubuf_info for callback when skb has no error */
+inline static tun_copy_ubuf_info(struct sk_buff *skb, bool zerocopy, void *msg_control)
+{
+	if (zerocopy) {
+		skb_shinfo(skb)->destructor_arg = msg_control;
+		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+	} else if (msg_control) {
+		struct ubuf_info *uarg = msg_control;
+		uarg->callback(uarg, false);
+	}
+}
+
 /* Get packet from user space buffer */
 static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 			    void *msg_control, struct iov_iter *from,
@@ -1812,16 +1825,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		break;
 	}
 
-	/* copy skb_ubuf_info for callback when skb has no error */
-	if (zerocopy) {
-		skb_shinfo(skb)->destructor_arg = msg_control;
-		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
-		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
-	} else if (msg_control) {
-		struct ubuf_info *uarg = msg_control;
-		uarg->callback(uarg, false);
-	}
-
 	skb_reset_network_header(skb);
 	skb_probe_transport_header(skb);
 	skb_record_rx_queue(skb, tfile->queue_index);
@@ -1830,6 +1833,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		struct bpf_prog *xdp_prog;
 		int ret;
 
+		tun_copy_ubuf_info(skb, zerocopy, msg_control);
 		local_bh_disable();
 		rcu_read_lock();
 		xdp_prog = rcu_dereference(tun->xdp_prog);
@@ -1880,7 +1884,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 			WARN_ON(1);
 			return -ENOMEM;
 		}
-
+		tun_copy_ubuf_info(skb, zerocopy, msg_control);
 		local_bh_disable();
 		napi_gro_frags(&tfile->napi);
 		local_bh_enable();
@@ -1889,6 +1893,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
 		int queue_len;
 
+		tun_copy_ubuf_info(skb, zerocopy, msg_control);
 		spin_lock_bh(&queue->lock);
 		__skb_queue_tail(queue, skb);
 		queue_len = skb_queue_len(queue);
@@ -1899,8 +1904,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 		local_bh_enable();
 	} else if (!IS_ENABLED(CONFIG_4KSTACKS)) {
+		tun_copy_ubuf_info(skb, zerocopy, msg_control);
 		tun_rx_batched(tun, tfile, skb, more);
 	} else {
+		tun_copy_ubuf_info(skb, zerocopy, msg_control);
 		netif_rx_ni(skb);
 	}
 	rcu_read_unlock();
-- 

> 

> Note that, zerocopy has a lot of issues which makes it not good for production

> environment.


OK, thanks. I found it when reviewing the code and think it need to be fixed.


> 

> Thanks
Jason Wang Dec. 7, 2020, 3:54 a.m. UTC | #3
On 2020/12/4 下午6:22, wangyunjian wrote:
>> -----Original Message-----

>> From: Jason Wang [mailto:jasowang@redhat.com]

>> Sent: Friday, December 4, 2020 2:11 PM

>> To: wangyunjian <wangyunjian@huawei.com>; mst@redhat.com

>> Cc: virtualization@lists.linux-foundation.org; netdev@vger.kernel.org; Lilijun

>> (Jerry) <jerry.lilijun@huawei.com>; xudingke <xudingke@huawei.com>

>> Subject: Re: [PATCH net-next] tun: fix ubuf refcount incorrectly on error path

>>

>>

>> On 2020/12/3 下午4:00, wangyunjian wrote:

>>> From: Yunjian Wang <wangyunjian@huawei.com>

>>>

>>> After setting callback for ubuf_info of skb, the callback

>>> (vhost_net_zerocopy_callback) will be called to decrease the refcount

>>> when freeing skb. But when an exception occurs afterwards, the error

>>> handling in vhost handle_tx() will try to decrease the same refcount

>>> again. This is wrong and fix this by clearing ubuf_info when meeting

>>> errors.

>>>

>>> Fixes: 4477138fa0ae ("tun: properly test for IFF_UP")

>>> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP

>>> driver")

>>>

>>> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

>>> ---

>>>    drivers/net/tun.c | 11 +++++++++++

>>>    1 file changed, 11 insertions(+)

>>>

>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c index

>>> 2dc1988a8973..3614bb1b6d35 100644

>>> --- a/drivers/net/tun.c

>>> +++ b/drivers/net/tun.c

>>> @@ -1861,6 +1861,12 @@ static ssize_t tun_get_user(struct tun_struct

>> *tun, struct tun_file *tfile,

>>>    	if (unlikely(!(tun->dev->flags & IFF_UP))) {

>>>    		err = -EIO;

>>>    		rcu_read_unlock();

>>> +		if (zerocopy) {

>>> +			skb_shinfo(skb)->destructor_arg = NULL;

>>> +			skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;

>>> +			skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG;

>>> +		}

>>> +

>>>    		goto drop;

>>>    	}

>>>

>>> @@ -1874,6 +1880,11 @@ static ssize_t tun_get_user(struct tun_struct

>>> *tun, struct tun_file *tfile,

>>>

>>>    		if (unlikely(headlen > skb_headlen(skb))) {

>>>    			atomic_long_inc(&tun->dev->rx_dropped);

>>> +			if (zerocopy) {

>>> +				skb_shinfo(skb)->destructor_arg = NULL;

>>> +				skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;

>>> +				skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG;

>>> +			}

>>>    			napi_free_frags(&tfile->napi);

>>>    			rcu_read_unlock();

>>>    			mutex_unlock(&tfile->napi_mutex);

>>

>> It looks to me then we miss the failure feedback.

>>

>> The issues comes from the inconsistent error handling in tun.

>>

>> I wonder whether we can simply do uarg->callback(uarg, false) if necessary on

>> every failture path on tun_get_user().

> How about this?

>

> ---

>   drivers/net/tun.c | 29 ++++++++++++++++++-----------

>   1 file changed, 18 insertions(+), 11 deletions(-)

>

> diff --git a/drivers/net/tun.c b/drivers/net/tun.c

> index 2dc1988a8973..36a8d8eacd7b 100644

> --- a/drivers/net/tun.c

> +++ b/drivers/net/tun.c

> @@ -1637,6 +1637,19 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,

>   	return NULL;

>   }

>   

> +/* copy ubuf_info for callback when skb has no error */

> +inline static tun_copy_ubuf_info(struct sk_buff *skb, bool zerocopy, void *msg_control)

> +{

> +	if (zerocopy) {

> +		skb_shinfo(skb)->destructor_arg = msg_control;

> +		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

> +		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

> +	} else if (msg_control) {

> +		struct ubuf_info *uarg = msg_control;

> +		uarg->callback(uarg, false);

> +	}

> +}

> +

>   /* Get packet from user space buffer */

>   static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>   			    void *msg_control, struct iov_iter *from,

> @@ -1812,16 +1825,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>   		break;

>   	}

>   

> -	/* copy skb_ubuf_info for callback when skb has no error */

> -	if (zerocopy) {

> -		skb_shinfo(skb)->destructor_arg = msg_control;

> -		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

> -		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

> -	} else if (msg_control) {

> -		struct ubuf_info *uarg = msg_control;

> -		uarg->callback(uarg, false);

> -	}

> -

>   	skb_reset_network_header(skb);

>   	skb_probe_transport_header(skb);

>   	skb_record_rx_queue(skb, tfile->queue_index);

> @@ -1830,6 +1833,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>   		struct bpf_prog *xdp_prog;

>   		int ret;

>   

> +		tun_copy_ubuf_info(skb, zerocopy, msg_control);



If you think disabling zerocopy for XDP (which I think it makes sense). 
It's better to do this in another patch.


>   		local_bh_disable();

>   		rcu_read_lock();

>   		xdp_prog = rcu_dereference(tun->xdp_prog);

> @@ -1880,7 +1884,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>   			WARN_ON(1);

>   			return -ENOMEM;

>   		}

> -

> +		tun_copy_ubuf_info(skb, zerocopy, msg_control);



And for NAPI frags.


>   		local_bh_disable();

>   		napi_gro_frags(&tfile->napi);

>   		local_bh_enable();

> @@ -1889,6 +1893,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>   		struct sk_buff_head *queue = &tfile->sk.sk_write_queue;

>   		int queue_len;

>   

> +		tun_copy_ubuf_info(skb, zerocopy, msg_control);

>   		spin_lock_bh(&queue->lock);

>   		__skb_queue_tail(queue, skb);

>   		queue_len = skb_queue_len(queue);

> @@ -1899,8 +1904,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>   

>   		local_bh_enable();

>   	} else if (!IS_ENABLED(CONFIG_4KSTACKS)) {

> +		tun_copy_ubuf_info(skb, zerocopy, msg_control);

>   		tun_rx_batched(tun, tfile, skb, more);

>   	} else {

> +		tun_copy_ubuf_info(skb, zerocopy, msg_control);

>   		netif_rx_ni(skb);

>   	}

>   	rcu_read_unlock();



So it looks to me you want to disable zerocopy in all of the possible 
datapath?

Thanks
wangyunjian Dec. 7, 2020, 1:38 p.m. UTC | #4
> -----Original Message-----

> From: Jason Wang [mailto:jasowang@redhat.com]

> Sent: Monday, December 7, 2020 11:54 AM

> To: wangyunjian <wangyunjian@huawei.com>; mst@redhat.com

> Cc: virtualization@lists.linux-foundation.org; netdev@vger.kernel.org; Lilijun

> (Jerry) <jerry.lilijun@huawei.com>; xudingke <xudingke@huawei.com>

> Subject: Re: [PATCH net-next] tun: fix ubuf refcount incorrectly on error path

> 

> 

> On 2020/12/4 下午6:22, wangyunjian wrote:

> >> -----Original Message-----

> >> From: Jason Wang [mailto:jasowang@redhat.com]

> >> Sent: Friday, December 4, 2020 2:11 PM

> >> To: wangyunjian <wangyunjian@huawei.com>; mst@redhat.com

> >> Cc: virtualization@lists.linux-foundation.org; netdev@vger.kernel.org;

> Lilijun

> >> (Jerry) <jerry.lilijun@huawei.com>; xudingke <xudingke@huawei.com>

> >> Subject: Re: [PATCH net-next] tun: fix ubuf refcount incorrectly on error path

> >>

> >>

> >> On 2020/12/3 下午4:00, wangyunjian wrote:

> >>> From: Yunjian Wang <wangyunjian@huawei.com>

> >>>

> >>> After setting callback for ubuf_info of skb, the callback

> >>> (vhost_net_zerocopy_callback) will be called to decrease the refcount

> >>> when freeing skb. But when an exception occurs afterwards, the error

> >>> handling in vhost handle_tx() will try to decrease the same refcount

> >>> again. This is wrong and fix this by clearing ubuf_info when meeting

> >>> errors.

> >>>

> >>> Fixes: 4477138fa0ae ("tun: properly test for IFF_UP")

> >>> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP

> >>> driver")

> >>>

> >>> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

> >>> ---

> >>>    drivers/net/tun.c | 11 +++++++++++

> >>>    1 file changed, 11 insertions(+)

> >>>

> >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c index

> >>> 2dc1988a8973..3614bb1b6d35 100644

> >>> --- a/drivers/net/tun.c

> >>> +++ b/drivers/net/tun.c

> >>> @@ -1861,6 +1861,12 @@ static ssize_t tun_get_user(struct tun_struct

> >> *tun, struct tun_file *tfile,

> >>>    	if (unlikely(!(tun->dev->flags & IFF_UP))) {

> >>>    		err = -EIO;

> >>>    		rcu_read_unlock();

> >>> +		if (zerocopy) {

> >>> +			skb_shinfo(skb)->destructor_arg = NULL;

> >>> +			skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;

> >>> +			skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG;

> >>> +		}

> >>> +

> >>>    		goto drop;

> >>>    	}

> >>>

> >>> @@ -1874,6 +1880,11 @@ static ssize_t tun_get_user(struct tun_struct

> >>> *tun, struct tun_file *tfile,

> >>>

> >>>    		if (unlikely(headlen > skb_headlen(skb))) {

> >>>    			atomic_long_inc(&tun->dev->rx_dropped);

> >>> +			if (zerocopy) {

> >>> +				skb_shinfo(skb)->destructor_arg = NULL;

> >>> +				skb_shinfo(skb)->tx_flags &=

> ~SKBTX_DEV_ZEROCOPY;

> >>> +				skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG;

> >>> +			}

> >>>    			napi_free_frags(&tfile->napi);

> >>>    			rcu_read_unlock();

> >>>    			mutex_unlock(&tfile->napi_mutex);

> >>

> >> It looks to me then we miss the failure feedback.

> >>

> >> The issues comes from the inconsistent error handling in tun.

> >>

> >> I wonder whether we can simply do uarg->callback(uarg, false) if necessary

> on

> >> every failture path on tun_get_user().

> > How about this?

> >

> > ---

> >   drivers/net/tun.c | 29 ++++++++++++++++++-----------

> >   1 file changed, 18 insertions(+), 11 deletions(-)

> >

> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c

> > index 2dc1988a8973..36a8d8eacd7b 100644

> > --- a/drivers/net/tun.c

> > +++ b/drivers/net/tun.c

> > @@ -1637,6 +1637,19 @@ static struct sk_buff *tun_build_skb(struct

> tun_struct *tun,

> >   	return NULL;

> >   }

> >

> > +/* copy ubuf_info for callback when skb has no error */

> > +inline static tun_copy_ubuf_info(struct sk_buff *skb, bool zerocopy, void

> *msg_control)

> > +{

> > +	if (zerocopy) {

> > +		skb_shinfo(skb)->destructor_arg = msg_control;

> > +		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

> > +		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

> > +	} else if (msg_control) {

> > +		struct ubuf_info *uarg = msg_control;

> > +		uarg->callback(uarg, false);

> > +	}

> > +}

> > +

> >   /* Get packet from user space buffer */

> >   static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

> >   			    void *msg_control, struct iov_iter *from,

> > @@ -1812,16 +1825,6 @@ static ssize_t tun_get_user(struct tun_struct

> *tun, struct tun_file *tfile,

> >   		break;

> >   	}

> >

> > -	/* copy skb_ubuf_info for callback when skb has no error */

> > -	if (zerocopy) {

> > -		skb_shinfo(skb)->destructor_arg = msg_control;

> > -		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

> > -		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

> > -	} else if (msg_control) {

> > -		struct ubuf_info *uarg = msg_control;

> > -		uarg->callback(uarg, false);

> > -	}

> > -

> >   	skb_reset_network_header(skb);

> >   	skb_probe_transport_header(skb);

> >   	skb_record_rx_queue(skb, tfile->queue_index);

> > @@ -1830,6 +1833,7 @@ static ssize_t tun_get_user(struct tun_struct *tun,

> struct tun_file *tfile,

> >   		struct bpf_prog *xdp_prog;

> >   		int ret;

> >

> > +		tun_copy_ubuf_info(skb, zerocopy, msg_control);

> 

> 

> If you think disabling zerocopy for XDP (which I think it makes sense).

> It's better to do this in another patch.

> 

> 

> >   		local_bh_disable();

> >   		rcu_read_lock();

> >   		xdp_prog = rcu_dereference(tun->xdp_prog);

> > @@ -1880,7 +1884,7 @@ static ssize_t tun_get_user(struct tun_struct *tun,

> struct tun_file *tfile,

> >   			WARN_ON(1);

> >   			return -ENOMEM;

> >   		}

> > -

> > +		tun_copy_ubuf_info(skb, zerocopy, msg_control);

> 

> 

> And for NAPI frags.

> 

> 

> >   		local_bh_disable();

> >   		napi_gro_frags(&tfile->napi);

> >   		local_bh_enable();

> > @@ -1889,6 +1893,7 @@ static ssize_t tun_get_user(struct tun_struct *tun,

> struct tun_file *tfile,

> >   		struct sk_buff_head *queue = &tfile->sk.sk_write_queue;

> >   		int queue_len;

> >

> > +		tun_copy_ubuf_info(skb, zerocopy, msg_control);

> >   		spin_lock_bh(&queue->lock);

> >   		__skb_queue_tail(queue, skb);

> >   		queue_len = skb_queue_len(queue);

> > @@ -1899,8 +1904,10 @@ static ssize_t tun_get_user(struct tun_struct

> *tun, struct tun_file *tfile,

> >

> >   		local_bh_enable();

> >   	} else if (!IS_ENABLED(CONFIG_4KSTACKS)) {

> > +		tun_copy_ubuf_info(skb, zerocopy, msg_control);

> >   		tun_rx_batched(tun, tfile, skb, more);

> >   	} else {

> > +		tun_copy_ubuf_info(skb, zerocopy, msg_control);

> >   		netif_rx_ni(skb);

> >   	}

> >   	rcu_read_unlock();

> 

> 

> So it looks to me you want to disable zerocopy in all of the possible

> datapath?


I think the newly added code is easy to miss this problem, so I want to
copy ubuf_info until we're sure there's no errors.

Thanks,
Yunjian
> 

> Thanks
Jason Wang Dec. 8, 2020, 2:32 a.m. UTC | #5
On 2020/12/7 下午9:38, wangyunjian wrote:
> I think the newly added code is easy to miss this problem, so I want to

> copy ubuf_info until we're sure there's no errors.

>

> Thanks,

> Yunjian



But isn't this actually a disabling of zerocopy?

Thanks
Jason Wang Dec. 9, 2020, 9:30 a.m. UTC | #6
On 2020/12/8 上午10:32, Jason Wang wrote:
>

> On 2020/12/7 下午9:38, wangyunjian wrote:

>> I think the newly added code is easy to miss this problem, so I want to

>> copy ubuf_info until we're sure there's no errors.

>>

>> Thanks,

>> Yunjian

>

>

> But isn't this actually a disabling of zerocopy?

>

> Thanks

>

>


Sorry, I misread the patch.

Please send a formal version, and let's move the discussion there.

Thanks
Willem de Bruijn Dec. 9, 2020, 2:43 p.m. UTC | #7
On Wed, Dec 9, 2020 at 8:03 AM wangyunjian <wangyunjian@huawei.com> wrote:
>

> From: Yunjian Wang <wangyunjian@huawei.com>

>

> After setting callback for ubuf_info of skb, the callback

> (vhost_net_zerocopy_callback) will be called to decrease

> the refcount when freeing skb. But when an exception occurs


With exception, you mean if tun_get_user returns an error that
propagates to the sendmsg call in vhost handle_tx, correct?

> afterwards, the error handling in vhost handle_tx() will

> try to decrease the same refcount again. This is wrong and

> fix this by delay copying ubuf_info until we're sure

> there's no errors.


I think the right approach is to address this in the error paths,
rather than complicate the normal datapath.

Is it sufficient to suppress the call to vhost_net_ubuf_put in the
handle_tx sendmsg error path, given that vhost_zerocopy_callback
will be called on kfree_skb?

Or alternatively clear the destructor in drop:

>

> Fixes: 4477138fa0ae ("tun: properly test for IFF_UP")

> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")

>

> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

> ---

> v2:

>    Updated code, fix by delay copying ubuf_info

> ---

>  drivers/net/tun.c | 29 +++++++++++++++++++----------

>  1 file changed, 19 insertions(+), 10 deletions(-)

>

> diff --git a/drivers/net/tun.c b/drivers/net/tun.c

> index 2dc1988a8973..2ea822328e73 100644

> --- a/drivers/net/tun.c

> +++ b/drivers/net/tun.c

> @@ -1637,6 +1637,20 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,

>         return NULL;

>  }

>

> +/* copy ubuf_info for callback when skb has no error */

> +static inline void tun_copy_ubuf_info(struct sk_buff *skb, bool zerocopy, void *msg_control)

> +{

> +       if (zerocopy) {

> +               skb_shinfo(skb)->destructor_arg = msg_control;

> +               skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

> +               skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

> +       } else if (msg_control) {

> +               struct ubuf_info *uarg = msg_control;

> +

> +               uarg->callback(uarg, false);

> +       }

> +}

> +

>  /* Get packet from user space buffer */

>  static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>                             void *msg_control, struct iov_iter *from,

> @@ -1812,16 +1826,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>                 break;

>         }

>

> -       /* copy skb_ubuf_info for callback when skb has no error */

> -       if (zerocopy) {

> -               skb_shinfo(skb)->destructor_arg = msg_control;

> -               skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

> -               skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

> -       } else if (msg_control) {

> -               struct ubuf_info *uarg = msg_control;

> -               uarg->callback(uarg, false);

> -       }

> -

>         skb_reset_network_header(skb);

>         skb_probe_transport_header(skb);

>         skb_record_rx_queue(skb, tfile->queue_index);

> @@ -1830,6 +1834,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>                 struct bpf_prog *xdp_prog;

>                 int ret;

>

> +               tun_copy_ubuf_info(skb, zerocopy, msg_control);

>                 local_bh_disable();

>                 rcu_read_lock();

>                 xdp_prog = rcu_dereference(tun->xdp_prog);

> @@ -1881,6 +1886,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>                         return -ENOMEM;

>                 }

>

> +               tun_copy_ubuf_info(skb, zerocopy, msg_control);

>                 local_bh_disable();

>                 napi_gro_frags(&tfile->napi);

>                 local_bh_enable();

> @@ -1889,6 +1895,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>                 struct sk_buff_head *queue = &tfile->sk.sk_write_queue;

>                 int queue_len;

>

> +               tun_copy_ubuf_info(skb, zerocopy, msg_control);

>                 spin_lock_bh(&queue->lock);

>                 __skb_queue_tail(queue, skb);

>                 queue_len = skb_queue_len(queue);

> @@ -1899,8 +1906,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

>

>                 local_bh_enable();

>         } else if (!IS_ENABLED(CONFIG_4KSTACKS)) {

> +               tun_copy_ubuf_info(skb, zerocopy, msg_control);

>                 tun_rx_batched(tun, tfile, skb, more);

>         } else {

> +               tun_copy_ubuf_info(skb, zerocopy, msg_control);

>                 netif_rx_ni(skb);

>         }

>         rcu_read_unlock();

> --

> 2.23.0

>
wangyunjian Dec. 12, 2020, 6:43 a.m. UTC | #8
> -----Original Message-----

> From: Willem de Bruijn [mailto:willemdebruijn.kernel@gmail.com]

> Sent: Wednesday, December 9, 2020 10:43 PM

> To: wangyunjian <wangyunjian@huawei.com>

> Cc: Michael S. Tsirkin <mst@redhat.com>; Jason Wang

> <jasowang@redhat.com>; virtualization@lists.linux-foundation.org; Network

> Development <netdev@vger.kernel.org>; Lilijun (Jerry)

> <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>;

> xudingke <xudingke@huawei.com>

> Subject: Re: [PATCH net v2] tun: fix ubuf refcount incorrectly on error path

> 

> On Wed, Dec 9, 2020 at 8:03 AM wangyunjian <wangyunjian@huawei.com>

> wrote:

> >

> > From: Yunjian Wang <wangyunjian@huawei.com>

> >

> > After setting callback for ubuf_info of skb, the callback

> > (vhost_net_zerocopy_callback) will be called to decrease the refcount

> > when freeing skb. But when an exception occurs

> 

> With exception, you mean if tun_get_user returns an error that propagates to

> the sendmsg call in vhost handle_tx, correct?


Yes

> 

> > afterwards, the error handling in vhost handle_tx() will try to

> > decrease the same refcount again. This is wrong and fix this by delay

> > copying ubuf_info until we're sure there's no errors.

> 

> I think the right approach is to address this in the error paths, rather than

> complicate the normal datapath.

> 

> Is it sufficient to suppress the call to vhost_net_ubuf_put in the handle_tx

> sendmsg error path, given that vhost_zerocopy_callback will be called on

> kfree_skb?


We can not call kfree_skb() until the skb was created.

> 

> Or alternatively clear the destructor in drop:


The uarg->callback() is called immediately after we decide do datacopy
even if caller want to do zerocopy. If another error occurs later, the vhost
handle_tx() will try to decrease it again.

Thanks
> 

> >

> > Fixes: 4477138fa0ae ("tun: properly test for IFF_UP")

> > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP

> > driver")

> >

> > Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>

> > ---

> > v2:

> >    Updated code, fix by delay copying ubuf_info

> > ---

> >  drivers/net/tun.c | 29 +++++++++++++++++++----------

> >  1 file changed, 19 insertions(+), 10 deletions(-)

> >

> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c index

> > 2dc1988a8973..2ea822328e73 100644

> > --- a/drivers/net/tun.c

> > +++ b/drivers/net/tun.c

> > @@ -1637,6 +1637,20 @@ static struct sk_buff *tun_build_skb(struct

> tun_struct *tun,

> >         return NULL;

> >  }

> >

> > +/* copy ubuf_info for callback when skb has no error */ static inline

> > +void tun_copy_ubuf_info(struct sk_buff *skb, bool zerocopy, void

> > +*msg_control) {

> > +       if (zerocopy) {

> > +               skb_shinfo(skb)->destructor_arg = msg_control;

> > +               skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

> > +               skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

> > +       } else if (msg_control) {

> > +               struct ubuf_info *uarg = msg_control;

> > +

> > +               uarg->callback(uarg, false);

> > +       }

> > +}

> > +

> >  /* Get packet from user space buffer */  static ssize_t

> > tun_get_user(struct tun_struct *tun, struct tun_file *tfile,

> >                             void *msg_control, struct iov_iter *from,

> > @@ -1812,16 +1826,6 @@ static ssize_t tun_get_user(struct tun_struct

> *tun, struct tun_file *tfile,

> >                 break;

> >         }

> >

> > -       /* copy skb_ubuf_info for callback when skb has no error */

> > -       if (zerocopy) {

> > -               skb_shinfo(skb)->destructor_arg = msg_control;

> > -               skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

> > -               skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

> > -       } else if (msg_control) {

> > -               struct ubuf_info *uarg = msg_control;

> > -               uarg->callback(uarg, false);

> > -       }

> > -

> >         skb_reset_network_header(skb);

> >         skb_probe_transport_header(skb);

> >         skb_record_rx_queue(skb, tfile->queue_index); @@ -1830,6

> > +1834,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct

> tun_file *tfile,

> >                 struct bpf_prog *xdp_prog;

> >                 int ret;

> >

> > +               tun_copy_ubuf_info(skb, zerocopy, msg_control);

> >                 local_bh_disable();

> >                 rcu_read_lock();

> >                 xdp_prog = rcu_dereference(tun->xdp_prog); @@

> -1881,6

> > +1886,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct

> tun_file *tfile,

> >                         return -ENOMEM;

> >                 }

> >

> > +               tun_copy_ubuf_info(skb, zerocopy, msg_control);

> >                 local_bh_disable();

> >                 napi_gro_frags(&tfile->napi);

> >                 local_bh_enable();

> > @@ -1889,6 +1895,7 @@ static ssize_t tun_get_user(struct tun_struct *tun,

> struct tun_file *tfile,

> >                 struct sk_buff_head *queue =

> &tfile->sk.sk_write_queue;

> >                 int queue_len;

> >

> > +               tun_copy_ubuf_info(skb, zerocopy, msg_control);

> >                 spin_lock_bh(&queue->lock);

> >                 __skb_queue_tail(queue, skb);

> >                 queue_len = skb_queue_len(queue); @@ -1899,8

> +1906,10

> > @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file

> > *tfile,

> >

> >                 local_bh_enable();

> >         } else if (!IS_ENABLED(CONFIG_4KSTACKS)) {

> > +               tun_copy_ubuf_info(skb, zerocopy, msg_control);

> >                 tun_rx_batched(tun, tfile, skb, more);

> >         } else {

> > +               tun_copy_ubuf_info(skb, zerocopy, msg_control);

> >                 netif_rx_ni(skb);

> >         }

> >         rcu_read_unlock();

> > --

> > 2.23.0

> >
Willem de Bruijn Dec. 13, 2020, 12:17 a.m. UTC | #9
> > > afterwards, the error handling in vhost handle_tx() will try to

> > > decrease the same refcount again. This is wrong and fix this by delay

> > > copying ubuf_info until we're sure there's no errors.

> >

> > I think the right approach is to address this in the error paths, rather than

> > complicate the normal datapath.

> >

> > Is it sufficient to suppress the call to vhost_net_ubuf_put in the handle_tx

> > sendmsg error path, given that vhost_zerocopy_callback will be called on

> > kfree_skb?

>

> We can not call kfree_skb() until the skb was created.

>

> >

> > Or alternatively clear the destructor in drop:

>

> The uarg->callback() is called immediately after we decide do datacopy

> even if caller want to do zerocopy. If another error occurs later, the vhost

> handle_tx() will try to decrease it again.


Oh right, I missed the else branch in this path:

        /* copy skb_ubuf_info for callback when skb has no error */
        if (zerocopy) {
                skb_shinfo(skb)->destructor_arg = msg_control;
                skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
                skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
        } else if (msg_control) {
                struct ubuf_info *uarg = msg_control;
                uarg->callback(uarg, false);
        }

So if handle_tx_zerocopy calls tun_sendmsg with ubuf_info (and thus a
reference to release), there are these five options:

1. tun_sendmsg succeeds, ubuf_info is associated with skb.
     reference released from kfree_skb calling vhost_zerocopy_callback later

2. tun_sendmsg succeeds, ubuf_info is released immediately, as skb is
not zerocopy.

3. tun_sendmsg fails before creating skb, handle_tx_zerocopy correctly
cleans up on receiving error from tun_sendmsg.

4. tun_sendmsg fails after creating skb, but with copying: decremented
at branch shown above + again in handle_tx_zerocopy

5. tun_sendmsg fails after creating skb, with zerocopy: decremented at
kfree_skb in drop: + again in handle_tx_zerocopy

Since handle_tx_zerocopy has no idea whether on error 3, 4 or 5
occurred, either all decrement-on-error cases must be handled by
handle_tx_zerocopy or none.

Your patch chooses the latter. Makes sense.

But can this still go wrong if the xdp path is taken, but no program
exists or the program returns XDP_PASS. And then the packet hits an
error path, such as ! IFF_UP?
Willem de Bruijn Dec. 14, 2020, 1:32 a.m. UTC | #10
On Sat, Dec 12, 2020 at 7:18 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

> > > > afterwards, the error handling in vhost handle_tx() will try to

> > > > decrease the same refcount again. This is wrong and fix this by delay

> > > > copying ubuf_info until we're sure there's no errors.

> > >

> > > I think the right approach is to address this in the error paths, rather than

> > > complicate the normal datapath.

> > >

> > > Is it sufficient to suppress the call to vhost_net_ubuf_put in the handle_tx

> > > sendmsg error path, given that vhost_zerocopy_callback will be called on

> > > kfree_skb?

> >

> > We can not call kfree_skb() until the skb was created.

> >

> > >

> > > Or alternatively clear the destructor in drop:

> >

> > The uarg->callback() is called immediately after we decide do datacopy

> > even if caller want to do zerocopy. If another error occurs later, the vhost

> > handle_tx() will try to decrease it again.

>

> Oh right, I missed the else branch in this path:

>

>         /* copy skb_ubuf_info for callback when skb has no error */

>         if (zerocopy) {

>                 skb_shinfo(skb)->destructor_arg = msg_control;

>                 skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

>                 skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

>         } else if (msg_control) {

>                 struct ubuf_info *uarg = msg_control;

>                 uarg->callback(uarg, false);

>         }

>

> So if handle_tx_zerocopy calls tun_sendmsg with ubuf_info (and thus a

> reference to release), there are these five options:

>

> 1. tun_sendmsg succeeds, ubuf_info is associated with skb.

>      reference released from kfree_skb calling vhost_zerocopy_callback later

>

> 2. tun_sendmsg succeeds, ubuf_info is released immediately, as skb is

> not zerocopy.

>

> 3. tun_sendmsg fails before creating skb, handle_tx_zerocopy correctly

> cleans up on receiving error from tun_sendmsg.

>

> 4. tun_sendmsg fails after creating skb, but with copying: decremented

> at branch shown above + again in handle_tx_zerocopy

>

> 5. tun_sendmsg fails after creating skb, with zerocopy: decremented at

> kfree_skb in drop: + again in handle_tx_zerocopy

>

> Since handle_tx_zerocopy has no idea whether on error 3, 4 or 5

> occurred,


Actually, it does. If sendmsg returns an error, it can test whether
vq->heads[nvq->upend_idx].len != VHOST_DMA_IN_PROGRESS.

> either all decrement-on-error cases must be handled by

> handle_tx_zerocopy or none.

>

> Your patch chooses the latter. Makes sense.

>

> But can this still go wrong if the xdp path is taken, but no program

> exists or the program returns XDP_PASS. And then the packet hits an

> error path, such as ! IFF_UP?
Jason Wang Dec. 14, 2020, 3:30 a.m. UTC | #11
On 2020/12/14 上午9:32, Willem de Bruijn wrote:
> On Sat, Dec 12, 2020 at 7:18 PM Willem de Bruijn

> <willemdebruijn.kernel@gmail.com> wrote:

>>>>> afterwards, the error handling in vhost handle_tx() will try to

>>>>> decrease the same refcount again. This is wrong and fix this by delay

>>>>> copying ubuf_info until we're sure there's no errors.

>>>> I think the right approach is to address this in the error paths, rather than

>>>> complicate the normal datapath.

>>>>

>>>> Is it sufficient to suppress the call to vhost_net_ubuf_put in the handle_tx

>>>> sendmsg error path, given that vhost_zerocopy_callback will be called on

>>>> kfree_skb?

>>> We can not call kfree_skb() until the skb was created.

>>>

>>>> Or alternatively clear the destructor in drop:

>>> The uarg->callback() is called immediately after we decide do datacopy

>>> even if caller want to do zerocopy. If another error occurs later, the vhost

>>> handle_tx() will try to decrease it again.

>> Oh right, I missed the else branch in this path:

>>

>>          /* copy skb_ubuf_info for callback when skb has no error */

>>          if (zerocopy) {

>>                  skb_shinfo(skb)->destructor_arg = msg_control;

>>                  skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

>>                  skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

>>          } else if (msg_control) {

>>                  struct ubuf_info *uarg = msg_control;

>>                  uarg->callback(uarg, false);

>>          }

>>

>> So if handle_tx_zerocopy calls tun_sendmsg with ubuf_info (and thus a

>> reference to release), there are these five options:

>>

>> 1. tun_sendmsg succeeds, ubuf_info is associated with skb.

>>       reference released from kfree_skb calling vhost_zerocopy_callback later

>>

>> 2. tun_sendmsg succeeds, ubuf_info is released immediately, as skb is

>> not zerocopy.

>>

>> 3. tun_sendmsg fails before creating skb, handle_tx_zerocopy correctly

>> cleans up on receiving error from tun_sendmsg.

>>

>> 4. tun_sendmsg fails after creating skb, but with copying: decremented

>> at branch shown above + again in handle_tx_zerocopy

>>

>> 5. tun_sendmsg fails after creating skb, with zerocopy: decremented at

>> kfree_skb in drop: + again in handle_tx_zerocopy

>>

>> Since handle_tx_zerocopy has no idea whether on error 3, 4 or 5

>> occurred,

> Actually, it does. If sendmsg returns an error, it can test whether

> vq->heads[nvq->upend_idx].len != VHOST_DMA_IN_PROGRESS.



Just to make sure I understand this. Any reason for it can't be 
VHOST_DMA_IN_PROGRESS here?

Thanks


>

>> either all decrement-on-error cases must be handled by

>> handle_tx_zerocopy or none.

>>

>> Your patch chooses the latter. Makes sense.

>>

>> But can this still go wrong if the xdp path is taken, but no program

>> exists or the program returns XDP_PASS. And then the packet hits an

>> error path, such as ! IFF_UP?
Willem de Bruijn Dec. 14, 2020, 3:54 a.m. UTC | #12
On Sun, Dec 13, 2020 at 10:30 PM Jason Wang <jasowang@redhat.com> wrote:
>

>

> On 2020/12/14 上午9:32, Willem de Bruijn wrote:

> > On Sat, Dec 12, 2020 at 7:18 PM Willem de Bruijn

> > <willemdebruijn.kernel@gmail.com> wrote:

> >>>>> afterwards, the error handling in vhost handle_tx() will try to

> >>>>> decrease the same refcount again. This is wrong and fix this by delay

> >>>>> copying ubuf_info until we're sure there's no errors.

> >>>> I think the right approach is to address this in the error paths, rather than

> >>>> complicate the normal datapath.

> >>>>

> >>>> Is it sufficient to suppress the call to vhost_net_ubuf_put in the handle_tx

> >>>> sendmsg error path, given that vhost_zerocopy_callback will be called on

> >>>> kfree_skb?

> >>> We can not call kfree_skb() until the skb was created.

> >>>

> >>>> Or alternatively clear the destructor in drop:

> >>> The uarg->callback() is called immediately after we decide do datacopy

> >>> even if caller want to do zerocopy. If another error occurs later, the vhost

> >>> handle_tx() will try to decrease it again.

> >> Oh right, I missed the else branch in this path:

> >>

> >>          /* copy skb_ubuf_info for callback when skb has no error */

> >>          if (zerocopy) {

> >>                  skb_shinfo(skb)->destructor_arg = msg_control;

> >>                  skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

> >>                  skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

> >>          } else if (msg_control) {

> >>                  struct ubuf_info *uarg = msg_control;

> >>                  uarg->callback(uarg, false);

> >>          }

> >>

> >> So if handle_tx_zerocopy calls tun_sendmsg with ubuf_info (and thus a

> >> reference to release), there are these five options:

> >>

> >> 1. tun_sendmsg succeeds, ubuf_info is associated with skb.

> >>       reference released from kfree_skb calling vhost_zerocopy_callback later

> >>

> >> 2. tun_sendmsg succeeds, ubuf_info is released immediately, as skb is

> >> not zerocopy.

> >>

> >> 3. tun_sendmsg fails before creating skb, handle_tx_zerocopy correctly

> >> cleans up on receiving error from tun_sendmsg.

> >>

> >> 4. tun_sendmsg fails after creating skb, but with copying: decremented

> >> at branch shown above + again in handle_tx_zerocopy

> >>

> >> 5. tun_sendmsg fails after creating skb, with zerocopy: decremented at

> >> kfree_skb in drop: + again in handle_tx_zerocopy

> >>

> >> Since handle_tx_zerocopy has no idea whether on error 3, 4 or 5

> >> occurred,

> > Actually, it does. If sendmsg returns an error, it can test whether

> > vq->heads[nvq->upend_idx].len != VHOST_DMA_IN_PROGRESS.

>

>

> Just to make sure I understand this. Any reason for it can't be

> VHOST_DMA_IN_PROGRESS here?


It can be, and it will be if tun_sendmsg returns EINVAL before
assigning the skb destructor.

Only if tun_sendmsg released the zerocopy state through
kfree_skb->vhost_zerocopy_callback will it have been updated to
VHOST_DMA_DONE_LEN. And only then must the caller not try to release
the state again.
Willem de Bruijn Dec. 14, 2020, 3:56 a.m. UTC | #13
On Sun, Dec 13, 2020 at 10:54 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>

> On Sun, Dec 13, 2020 at 10:30 PM Jason Wang <jasowang@redhat.com> wrote:

> >

> >

> > On 2020/12/14 上午9:32, Willem de Bruijn wrote:

> > > On Sat, Dec 12, 2020 at 7:18 PM Willem de Bruijn

> > > <willemdebruijn.kernel@gmail.com> wrote:

> > >>>>> afterwards, the error handling in vhost handle_tx() will try to

> > >>>>> decrease the same refcount again. This is wrong and fix this by delay

> > >>>>> copying ubuf_info until we're sure there's no errors.

> > >>>> I think the right approach is to address this in the error paths, rather than

> > >>>> complicate the normal datapath.

> > >>>>

> > >>>> Is it sufficient to suppress the call to vhost_net_ubuf_put in the handle_tx

> > >>>> sendmsg error path, given that vhost_zerocopy_callback will be called on

> > >>>> kfree_skb?

> > >>> We can not call kfree_skb() until the skb was created.

> > >>>

> > >>>> Or alternatively clear the destructor in drop:

> > >>> The uarg->callback() is called immediately after we decide do datacopy

> > >>> even if caller want to do zerocopy. If another error occurs later, the vhost

> > >>> handle_tx() will try to decrease it again.

> > >> Oh right, I missed the else branch in this path:

> > >>

> > >>          /* copy skb_ubuf_info for callback when skb has no error */

> > >>          if (zerocopy) {

> > >>                  skb_shinfo(skb)->destructor_arg = msg_control;

> > >>                  skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

> > >>                  skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

> > >>          } else if (msg_control) {

> > >>                  struct ubuf_info *uarg = msg_control;

> > >>                  uarg->callback(uarg, false);

> > >>          }

> > >>

> > >> So if handle_tx_zerocopy calls tun_sendmsg with ubuf_info (and thus a

> > >> reference to release), there are these five options:

> > >>

> > >> 1. tun_sendmsg succeeds, ubuf_info is associated with skb.

> > >>       reference released from kfree_skb calling vhost_zerocopy_callback later

> > >>

> > >> 2. tun_sendmsg succeeds, ubuf_info is released immediately, as skb is

> > >> not zerocopy.

> > >>

> > >> 3. tun_sendmsg fails before creating skb, handle_tx_zerocopy correctly

> > >> cleans up on receiving error from tun_sendmsg.

> > >>

> > >> 4. tun_sendmsg fails after creating skb, but with copying: decremented

> > >> at branch shown above + again in handle_tx_zerocopy

> > >>

> > >> 5. tun_sendmsg fails after creating skb, with zerocopy: decremented at

> > >> kfree_skb in drop: + again in handle_tx_zerocopy

> > >>

> > >> Since handle_tx_zerocopy has no idea whether on error 3, 4 or 5

> > >> occurred,

> > > Actually, it does. If sendmsg returns an error, it can test whether

> > > vq->heads[nvq->upend_idx].len != VHOST_DMA_IN_PROGRESS.

> >

> >

> > Just to make sure I understand this. Any reason for it can't be

> > VHOST_DMA_IN_PROGRESS here?

>

> It can be, and it will be if tun_sendmsg returns EINVAL before

> assigning the skb destructor.


I meant returns an error, not necessarily only EINVAL.

> Only if tun_sendmsg released the zerocopy state through

> kfree_skb->vhost_zerocopy_callback will it have been updated to

> VHOST_DMA_DONE_LEN. And only then must the caller not try to release

> the state again.
Jason Wang Dec. 14, 2020, 4:06 a.m. UTC | #14
On 2020/12/14 上午11:56, Willem de Bruijn wrote:
> On Sun, Dec 13, 2020 at 10:54 PM Willem de Bruijn

> <willemdebruijn.kernel@gmail.com> wrote:

>> On Sun, Dec 13, 2020 at 10:30 PM Jason Wang <jasowang@redhat.com> wrote:

>>>

>>> On 2020/12/14 上午9:32, Willem de Bruijn wrote:

>>>> On Sat, Dec 12, 2020 at 7:18 PM Willem de Bruijn

>>>> <willemdebruijn.kernel@gmail.com> wrote:

>>>>>>>> afterwards, the error handling in vhost handle_tx() will try to

>>>>>>>> decrease the same refcount again. This is wrong and fix this by delay

>>>>>>>> copying ubuf_info until we're sure there's no errors.

>>>>>>> I think the right approach is to address this in the error paths, rather than

>>>>>>> complicate the normal datapath.

>>>>>>>

>>>>>>> Is it sufficient to suppress the call to vhost_net_ubuf_put in the handle_tx

>>>>>>> sendmsg error path, given that vhost_zerocopy_callback will be called on

>>>>>>> kfree_skb?

>>>>>> We can not call kfree_skb() until the skb was created.

>>>>>>

>>>>>>> Or alternatively clear the destructor in drop:

>>>>>> The uarg->callback() is called immediately after we decide do datacopy

>>>>>> even if caller want to do zerocopy. If another error occurs later, the vhost

>>>>>> handle_tx() will try to decrease it again.

>>>>> Oh right, I missed the else branch in this path:

>>>>>

>>>>>           /* copy skb_ubuf_info for callback when skb has no error */

>>>>>           if (zerocopy) {

>>>>>                   skb_shinfo(skb)->destructor_arg = msg_control;

>>>>>                   skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

>>>>>                   skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;

>>>>>           } else if (msg_control) {

>>>>>                   struct ubuf_info *uarg = msg_control;

>>>>>                   uarg->callback(uarg, false);

>>>>>           }

>>>>>

>>>>> So if handle_tx_zerocopy calls tun_sendmsg with ubuf_info (and thus a

>>>>> reference to release), there are these five options:

>>>>>

>>>>> 1. tun_sendmsg succeeds, ubuf_info is associated with skb.

>>>>>        reference released from kfree_skb calling vhost_zerocopy_callback later

>>>>>

>>>>> 2. tun_sendmsg succeeds, ubuf_info is released immediately, as skb is

>>>>> not zerocopy.

>>>>>

>>>>> 3. tun_sendmsg fails before creating skb, handle_tx_zerocopy correctly

>>>>> cleans up on receiving error from tun_sendmsg.

>>>>>

>>>>> 4. tun_sendmsg fails after creating skb, but with copying: decremented

>>>>> at branch shown above + again in handle_tx_zerocopy

>>>>>

>>>>> 5. tun_sendmsg fails after creating skb, with zerocopy: decremented at

>>>>> kfree_skb in drop: + again in handle_tx_zerocopy

>>>>>

>>>>> Since handle_tx_zerocopy has no idea whether on error 3, 4 or 5

>>>>> occurred,

>>>> Actually, it does. If sendmsg returns an error, it can test whether

>>>> vq->heads[nvq->upend_idx].len != VHOST_DMA_IN_PROGRESS.

>>>

>>> Just to make sure I understand this. Any reason for it can't be

>>> VHOST_DMA_IN_PROGRESS here?

>> It can be, and it will be if tun_sendmsg returns EINVAL before

>> assigning the skb destructor.

> I meant returns an error, not necessarily only EINVAL.

>

>> Only if tun_sendmsg released the zerocopy state through

>> kfree_skb->vhost_zerocopy_callback will it have been updated to

>> VHOST_DMA_DONE_LEN. And only then must the caller not try to release

>> the state again.

> 	



I see. So I tend to fix this in vhost instead of tun to be consistent 
with the current error handling in handle_tx_zerocopy().

Thanks
wangyunjian Dec. 14, 2020, 6:56 a.m. UTC | #15
> -----Original Message-----

> From: Jason Wang [mailto:jasowang@redhat.com]

> Sent: Monday, December 14, 2020 12:07 PM

> To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>

> Cc: wangyunjian <wangyunjian@huawei.com>; Michael S. Tsirkin

> <mst@redhat.com>; virtualization@lists.linux-foundation.org; Network

> Development <netdev@vger.kernel.org>; Lilijun (Jerry)

> <jerry.lilijun@huawei.com>; chenchanghu <chenchanghu@huawei.com>;

> xudingke <xudingke@huawei.com>; huangbin (J)

> <brian.huangbin@huawei.com>; Willem de Bruijn <willemb@google.com>

> Subject: Re: [PATCH net v2] tun: fix ubuf refcount incorrectly on error path

> 

> 

> On 2020/12/14 上午11:56, Willem de Bruijn wrote:

> > On Sun, Dec 13, 2020 at 10:54 PM Willem de Bruijn

> > <willemdebruijn.kernel@gmail.com> wrote:

> >> On Sun, Dec 13, 2020 at 10:30 PM Jason Wang <jasowang@redhat.com>

> wrote:

> >>>

> >>> On 2020/12/14 上午9:32, Willem de Bruijn wrote:

> >>>> On Sat, Dec 12, 2020 at 7:18 PM Willem de Bruijn

> >>>> <willemdebruijn.kernel@gmail.com> wrote:

> >>>>>>>> afterwards, the error handling in vhost handle_tx() will try to

> >>>>>>>> decrease the same refcount again. This is wrong and fix this by

> >>>>>>>> delay copying ubuf_info until we're sure there's no errors.

> >>>>>>> I think the right approach is to address this in the error

> >>>>>>> paths, rather than complicate the normal datapath.

> >>>>>>>

> >>>>>>> Is it sufficient to suppress the call to vhost_net_ubuf_put in

> >>>>>>> the handle_tx sendmsg error path, given that

> >>>>>>> vhost_zerocopy_callback will be called on kfree_skb?

> >>>>>> We can not call kfree_skb() until the skb was created.

> >>>>>>

> >>>>>>> Or alternatively clear the destructor in drop:

> >>>>>> The uarg->callback() is called immediately after we decide do

> >>>>>> datacopy even if caller want to do zerocopy. If another error

> >>>>>> occurs later, the vhost

> >>>>>> handle_tx() will try to decrease it again.

> >>>>> Oh right, I missed the else branch in this path:

> >>>>>

> >>>>>           /* copy skb_ubuf_info for callback when skb has no error */

> >>>>>           if (zerocopy) {

> >>>>>                   skb_shinfo(skb)->destructor_arg = msg_control;

> >>>>>                   skb_shinfo(skb)->tx_flags |=

> SKBTX_DEV_ZEROCOPY;

> >>>>>                   skb_shinfo(skb)->tx_flags |=

> SKBTX_SHARED_FRAG;

> >>>>>           } else if (msg_control) {

> >>>>>                   struct ubuf_info *uarg = msg_control;

> >>>>>                   uarg->callback(uarg, false);

> >>>>>           }

> >>>>>

> >>>>> So if handle_tx_zerocopy calls tun_sendmsg with ubuf_info (and

> >>>>> thus a reference to release), there are these five options:

> >>>>>

> >>>>> 1. tun_sendmsg succeeds, ubuf_info is associated with skb.

> >>>>>        reference released from kfree_skb calling

> >>>>> vhost_zerocopy_callback later

> >>>>>

> >>>>> 2. tun_sendmsg succeeds, ubuf_info is released immediately, as skb

> >>>>> is not zerocopy.

> >>>>>

> >>>>> 3. tun_sendmsg fails before creating skb, handle_tx_zerocopy

> >>>>> correctly cleans up on receiving error from tun_sendmsg.

> >>>>>

> >>>>> 4. tun_sendmsg fails after creating skb, but with copying:

> >>>>> decremented at branch shown above + again in handle_tx_zerocopy

> >>>>>

> >>>>> 5. tun_sendmsg fails after creating skb, with zerocopy:

> >>>>> decremented at kfree_skb in drop: + again in handle_tx_zerocopy

> >>>>>

> >>>>> Since handle_tx_zerocopy has no idea whether on error 3, 4 or 5

> >>>>> occurred,

> >>>> Actually, it does. If sendmsg returns an error, it can test whether

> >>>> vq->heads[nvq->upend_idx].len != VHOST_DMA_IN_PROGRESS.

> >>>

> >>> Just to make sure I understand this. Any reason for it can't be

> >>> VHOST_DMA_IN_PROGRESS here?

> >> It can be, and it will be if tun_sendmsg returns EINVAL before

> >> assigning the skb destructor.

> > I meant returns an error, not necessarily only EINVAL.

> >

> >> Only if tun_sendmsg released the zerocopy state through

> >> kfree_skb->vhost_zerocopy_callback will it have been updated to

> >> VHOST_DMA_DONE_LEN. And only then must the caller not try to release

> >> the state again.

> >

> 

> 

> I see. So I tend to fix this in vhost instead of tun to be consistent with the

> current error handling in handle_tx_zerocopy().


Agree, thanks for the suggestion. 
I'll send v3 patch according to your comments.

> 

> Thanks
diff mbox series

Patch

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2dc1988a8973..3614bb1b6d35 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1861,6 +1861,12 @@  static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	if (unlikely(!(tun->dev->flags & IFF_UP))) {
 		err = -EIO;
 		rcu_read_unlock();
+		if (zerocopy) {
+			skb_shinfo(skb)->destructor_arg = NULL;
+			skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
+			skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG;
+		}
+
 		goto drop;
 	}
 
@@ -1874,6 +1880,11 @@  static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 
 		if (unlikely(headlen > skb_headlen(skb))) {
 			atomic_long_inc(&tun->dev->rx_dropped);
+			if (zerocopy) {
+				skb_shinfo(skb)->destructor_arg = NULL;
+				skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
+				skb_shinfo(skb)->tx_flags &= ~SKBTX_SHARED_FRAG;
+			}
 			napi_free_frags(&tfile->napi);
 			rcu_read_unlock();
 			mutex_unlock(&tfile->napi_mutex);