diff mbox series

[RFC,net] net: stream: don't purge sk_error_queue without holding its lock

Message ID 20210913223850.660578-1-kuba@kernel.org
State New
Headers show
Series [RFC,net] net: stream: don't purge sk_error_queue without holding its lock | expand

Commit Message

Jakub Kicinski Sept. 13, 2021, 10:38 p.m. UTC
sk_stream_kill_queues() can be called when there are still
outstanding skbs to transmit. Those skbs may try to queue
notifications to the error queue (e.g. timestamps).
If sk_stream_kill_queues() purges the queue without taking
its lock the queue may get corrupted.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Sending as an RFC for review, compile-tested only.

Seems far more likely that I'm missing something than that
this has been broken forever and nobody noticed :S
---
 net/core/stream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Dumazet Sept. 14, 2021, 4:32 p.m. UTC | #1
On 9/14/21 7:18 AM, Jakub Kicinski wrote:
> On Mon, 13 Sep 2021 22:14:00 -0700 Eric Dumazet wrote:
>> On 9/13/21 3:38 PM, Jakub Kicinski wrote:
>>> sk_stream_kill_queues() can be called when there are still
>>> outstanding skbs to transmit. Those skbs may try to queue
>>> notifications to the error queue (e.g. timestamps).
>>> If sk_stream_kill_queues() purges the queue without taking
>>> its lock the queue may get corrupted.
>>>
>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>>> ---
>>> Sending as an RFC for review, compile-tested only.
>>>
>>> Seems far more likely that I'm missing something than that
>>> this has been broken forever and nobody noticed :S
>>> ---
>>>  net/core/stream.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/stream.c b/net/core/stream.c
>>> index 4f1d4aa5fb38..7c585088f394 100644
>>> --- a/net/core/stream.c
>>> +++ b/net/core/stream.c
>>> @@ -196,7 +196,7 @@ void sk_stream_kill_queues(struct sock *sk)
>>>  	__skb_queue_purge(&sk->sk_receive_queue);
>>>  
>>>  	/* Next, the error queue. */
>>> -	__skb_queue_purge(&sk->sk_error_queue);
>>> +	skb_queue_purge(&sk->sk_error_queue);
>>>  
>>>  	/* Next, the write queue. */
>>>  	WARN_ON(!skb_queue_empty(&sk->sk_write_queue));
>>
>> This should not be needed.
>>
>> By definition, sk_stream_kill_queues() is only called when there is no
>> more references on the sockets.
>>
>> So all outstanding packets must have been orphaned or freed.
> 
> I don't see the wait anywhere, would you mind spelling it out?
> My (likely flawed) understanding is that inet_sock_destruct() gets
> called when refs are gone (via sk->sk_destruct).


> 
> But tcp_disconnect() + tcp_close() seem to happily call
> inet_csk_destroy_sock() -> sk_stream_kill_queues() with outstanding
> sk_wmem_alloc refs.

tcp_disconnect() should probably leave the error queue as is.

For some reason I thought your report was about inet_sock_destruct()

tcp_disconnect() has always been full of bugs, it is surprising real applications
(not fuzzers) are still trying to use it.

> 
>> Anyway, Linux-2.6.12-rc2 had no timestamps yet.
> 
> I see, thanks, if some form of the patch stands perhaps:
> 
> Fixes: cb9eff097831 ("net: new user space API for time stamping of incoming and outgoing packets")
> 

Except that this patch wont prevent a packet being added to sk_error_queue
right after skb_queue_purge(&sk->sk_error_queue).

If you think there is a bug, it must be fixed in another way.

IMO, preventing err packets from a prior session being queued after a tcp_disconnect()
is rather hard. We should not even try (packets could be stuck for hours in a qdisc)
Jakub Kicinski Sept. 14, 2021, 4:56 p.m. UTC | #2
On Tue, 14 Sep 2021 09:32:09 -0700 Eric Dumazet wrote:
> On 9/14/21 7:18 AM, Jakub Kicinski wrote:
> > On Mon, 13 Sep 2021 22:14:00 -0700 Eric Dumazet wrote:  

> >> This should not be needed.
> >>
> >> By definition, sk_stream_kill_queues() is only called when there is no
> >> more references on the sockets.
> >>
> >> So all outstanding packets must have been orphaned or freed.  
> > 
> > I don't see the wait anywhere, would you mind spelling it out?
> > My (likely flawed) understanding is that inet_sock_destruct() gets
> > called when refs are gone (via sk->sk_destruct).  
> >
> > But tcp_disconnect() + tcp_close() seem to happily call
> > inet_csk_destroy_sock() -> sk_stream_kill_queues() with outstanding
> > sk_wmem_alloc refs.  
> 
> tcp_disconnect() should probably leave the error queue as is.
> 
> For some reason I thought your report was about inet_sock_destruct()
> 
> tcp_disconnect() has always been full of bugs, it is surprising real applications
> (not fuzzers) are still trying to use it.

I think I hit it because app sets SOCK_LINGER && !sk->sk_lingertime.
I don't think the app disconnects explicitly, but "same difference".

> >> Anyway, Linux-2.6.12-rc2 had no timestamps yet.  
> > 
> > I see, thanks, if some form of the patch stands perhaps:
> > 
> > Fixes: cb9eff097831 ("net: new user space API for time stamping of incoming and outgoing packets")
> >   
> 
> Except that this patch wont prevent a packet being added to sk_error_queue
> right after skb_queue_purge(&sk->sk_error_queue).

Right, but then inet_sock_destruct() also purges the err queue, again.
I was afraid of regressions but we could just remove the purging 
from sk_stream_kill_queues(), and target net-next?

> If you think there is a bug, it must be fixed in another way.
> 
> IMO, preventing err packets from a prior session being queued after a tcp_disconnect()
> is rather hard. We should not even try (packets could be stuck for hours in a qdisc)

Indeed, we could rearrange the SOCK_DEAD check in sock_queue_err_skb()
to skip queuing and put it under the err queue lock (provided we make
sk_stream_kill_queues() take that lock as well). But seems like an
overkill. I'd lean towards the existing patch or removing the purge from
sk_stream_kill_queues(). LMK what you prefer, this is not urgent.
Jakub Kicinski Sept. 14, 2021, 6:03 p.m. UTC | #3
On Tue, 14 Sep 2021 10:55:49 -0700 Eric Dumazet wrote:
> On 9/14/21 9:56 AM, Jakub Kicinski wrote:
> > Right, but then inet_sock_destruct() also purges the err queue, again.
> > I was afraid of regressions but we could just remove the purging 
> > from sk_stream_kill_queues(), and target net-next?
> >   
> 
> Yes, this would be the safest thing.

Alright, let me do more testing and post a removal to net-next.

> >> If you think there is a bug, it must be fixed in another way.
> >>
> >> IMO, preventing err packets from a prior session being queued after a tcp_disconnect()
> >> is rather hard. We should not even try (packets could be stuck for hours in a qdisc)  
> > 
> > Indeed, we could rearrange the SOCK_DEAD check in sock_queue_err_skb()
> > to skip queuing and put it under the err queue lock (provided we make
> > sk_stream_kill_queues() take that lock as well). But seems like an
> > overkill. I'd lean towards the existing patch or removing the purge from
> > sk_stream_kill_queues(). LMK what you prefer, this is not urgent.
> 
> The issue would really about the tcp_disconnect() case, 
> followed by a reuse of the socket to establish another session.
> 
> In order to prevent polluting sk_error_queue with notifications
> triggered by old packets (from prior flow), this would require
> to record the socket cookie in skb, or something like that :/

Ah, I see what you meant! I'll make a note of the disconnect case 
in the commit message. I just care that we don't corrupt the queue
on close.
diff mbox series

Patch

diff --git a/net/core/stream.c b/net/core/stream.c
index 4f1d4aa5fb38..7c585088f394 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -196,7 +196,7 @@  void sk_stream_kill_queues(struct sock *sk)
 	__skb_queue_purge(&sk->sk_receive_queue);
 
 	/* Next, the error queue. */
-	__skb_queue_purge(&sk->sk_error_queue);
+	skb_queue_purge(&sk->sk_error_queue);
 
 	/* Next, the write queue. */
 	WARN_ON(!skb_queue_empty(&sk->sk_write_queue));