Message ID | 5370c0ae03449239e3d1674ddcfb090cf6f20abe.1606253206.git.pabeni@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | [net-next,v2] mptcp: be careful on MPTCP-level ack. | expand |
On Tue, 24 Nov 2020 22:51:24 +0100 Paolo Abeni wrote: > We can enter the main mptcp_recvmsg() loop even when > no subflows are connected. As note by Eric, that would > result in a divide by zero oops on ack generation. > > Address the issue by checking the subflow status before > sending the ack. > > Additionally protect mptcp_recvmsg() against invocation > with weird socket states. > > v1 -> v2: > - removed unneeded inline keyword - Jakub > > Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> > Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> Applied, thanks!
On 11/24/20 10:51 PM, Paolo Abeni wrote: > We can enter the main mptcp_recvmsg() loop even when > no subflows are connected. As note by Eric, that would > result in a divide by zero oops on ack generation. > > Address the issue by checking the subflow status before > sending the ack. > > Additionally protect mptcp_recvmsg() against invocation > with weird socket states. > > v1 -> v2: > - removed unneeded inline keyword - Jakub > > Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> > Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > net/mptcp/protocol.c | 67 ++++++++++++++++++++++++++++++++------------ > 1 file changed, 49 insertions(+), 18 deletions(-) > Looking at mptcp recvmsg(), it seems that a read(fd, ..., 0) will trigger an infinite loop if there is available data in receive queue ? It seems the following is needed, commit ea4ca586b16f removed a needed check to catch this condition. Untested patch, I can submit it formally if you agree. diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 221f7cdd416bdb681968bf1b3ff2ed1b03cea3ce..57213ff60f784fae14c2a96f391ccdec6249c168 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1921,7 +1921,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, len = min_t(size_t, len, INT_MAX); target = sock_rcvlowat(sk, flags & MSG_WAITALL, len); - for (;;) { + while (copied < len) { int bytes_read, old_space; bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied);
On Wed, 2020-12-02 at 14:18 +0100, Eric Dumazet wrote: > > On 11/24/20 10:51 PM, Paolo Abeni wrote: > > We can enter the main mptcp_recvmsg() loop even when > > no subflows are connected. As note by Eric, that would > > result in a divide by zero oops on ack generation. > > > > Address the issue by checking the subflow status before > > sending the ack. > > > > Additionally protect mptcp_recvmsg() against invocation > > with weird socket states. > > > > v1 -> v2: > > - removed unneeded inline keyword - Jakub > > > > Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> > > Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > net/mptcp/protocol.c | 67 ++++++++++++++++++++++++++++++++------------ > > 1 file changed, 49 insertions(+), 18 deletions(-) > > > > Looking at mptcp recvmsg(), it seems that a read(fd, ..., 0) will > trigger an infinite loop if there is available data in receive queue ? Thank you for looking into this! I can't reproduce the issue with the following packetdrill ?!? +0.0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) +0.1 > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 fflags[flag_h] nokey> +0.1 < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscaale 8,mpcapable v1 flags[flag_h] key[skey=2] > +0.1 > . 1:1(0) ack 1 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h]] key[ckey,skey]> +0.1 fcntl(3, F_SETFL, O_RDWR) = 0 +0.1 < . 1:201(200) ack 1 win 225 <dss dack8=1 dsn8=1 ssn=1 dll=200 nocs, nop, nop> +0.1 > . 1:1(0) ack 201 <nop, nop, TS val 100 ecr 700, dss dack8=201 dll=00 nocs> +0.1 read(3, ..., 0) = 0 The main recvmsg() loop is interrupted by the following check: if (copied >= target) break; I guess we could loop while the msk has available rcv space and some subflow is feeding new data. If so, I think moving: if (skb_queue_empty(&msk->receive_queue) && __mptcp_move_skbs(msk, len - copied)) continue; after the above check should address the issue, and will make the common case faster. Let me test the above - unless I underlooked something relevant! Thanks, Paolo
On 12/2/20 4:37 PM, Paolo Abeni wrote: > On Wed, 2020-12-02 at 14:18 +0100, Eric Dumazet wrote: >> >> On 11/24/20 10:51 PM, Paolo Abeni wrote: >>> We can enter the main mptcp_recvmsg() loop even when >>> no subflows are connected. As note by Eric, that would >>> result in a divide by zero oops on ack generation. >>> >>> Address the issue by checking the subflow status before >>> sending the ack. >>> >>> Additionally protect mptcp_recvmsg() against invocation >>> with weird socket states. >>> >>> v1 -> v2: >>> - removed unneeded inline keyword - Jakub >>> >>> Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> >>> Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling") >>> Signed-off-by: Paolo Abeni <pabeni@redhat.com> >>> --- >>> net/mptcp/protocol.c | 67 ++++++++++++++++++++++++++++++++------------ >>> 1 file changed, 49 insertions(+), 18 deletions(-) >>> >> >> Looking at mptcp recvmsg(), it seems that a read(fd, ..., 0) will >> trigger an infinite loop if there is available data in receive queue ? > > Thank you for looking into this! > > I can't reproduce the issue with the following packetdrill ?!? > > +0.0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) > +0.1 > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 fflags[flag_h] nokey> > +0.1 < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscaale 8,mpcapable v1 flags[flag_h] key[skey=2] > > +0.1 > . 1:1(0) ack 1 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h]] key[ckey,skey]> > +0.1 fcntl(3, F_SETFL, O_RDWR) = 0 > +0.1 < . 1:201(200) ack 1 win 225 <dss dack8=1 dsn8=1 ssn=1 dll=200 nocs, nop, nop> > +0.1 > . 1:1(0) ack 201 <nop, nop, TS val 100 ecr 700, dss dack8=201 dll=00 nocs> > +0.1 read(3, ..., 0) = 0 > > The main recvmsg() loop is interrupted by the following check: > > if (copied >= target) > break; @copied should be 0, and @target should be 1 Are you sure the above condition is triggering ? Maybe read(fd, ..., 0) does not reach recvmsg() at all. You could try recvmsg() or recvmmsg(), > > I guess we could loop while the msk has available rcv space and some > subflow is feeding new data. If so, I think moving: > > if (skb_queue_empty(&msk->receive_queue) && > __mptcp_move_skbs(msk, len - copied)) > continue; > > after the above check should address the issue, and will make the > common case faster. Let me test the above - unless I underlooked > something relevant! > > Thanks, > > Paolo >
On 12/2/20 5:10 PM, Eric Dumazet wrote: > > > On 12/2/20 4:37 PM, Paolo Abeni wrote: >> On Wed, 2020-12-02 at 14:18 +0100, Eric Dumazet wrote: >>> >>> On 11/24/20 10:51 PM, Paolo Abeni wrote: >>>> We can enter the main mptcp_recvmsg() loop even when >>>> no subflows are connected. As note by Eric, that would >>>> result in a divide by zero oops on ack generation. >>>> >>>> Address the issue by checking the subflow status before >>>> sending the ack. >>>> >>>> Additionally protect mptcp_recvmsg() against invocation >>>> with weird socket states. >>>> >>>> v1 -> v2: >>>> - removed unneeded inline keyword - Jakub >>>> >>>> Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> >>>> Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling") >>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com> >>>> --- >>>> net/mptcp/protocol.c | 67 ++++++++++++++++++++++++++++++++------------ >>>> 1 file changed, 49 insertions(+), 18 deletions(-) >>>> >>> >>> Looking at mptcp recvmsg(), it seems that a read(fd, ..., 0) will >>> trigger an infinite loop if there is available data in receive queue ? >> >> Thank you for looking into this! >> >> I can't reproduce the issue with the following packetdrill ?!? >> >> +0.0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) >> +0.1 > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 fflags[flag_h] nokey> >> +0.1 < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscaale 8,mpcapable v1 flags[flag_h] key[skey=2] > >> +0.1 > . 1:1(0) ack 1 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h]] key[ckey,skey]> >> +0.1 fcntl(3, F_SETFL, O_RDWR) = 0 >> +0.1 < . 1:201(200) ack 1 win 225 <dss dack8=1 dsn8=1 ssn=1 dll=200 nocs, nop, nop> >> +0.1 > . 1:1(0) ack 201 <nop, nop, TS val 100 ecr 700, dss dack8=201 dll=00 nocs> >> +0.1 read(3, ..., 0) = 0 >> >> The main recvmsg() loop is interrupted by the following check: >> >> if (copied >= target) >> break; > > @copied should be 0, and @target should be 1 > > Are you sure the above condition is triggering ? > > Maybe read(fd, ..., 0) does not reach recvmsg() at all. Yes, sock_read_iter() has a shortcut : if (!iov_iter_count(to)) /* Match SYS5 behaviour */ res = sock_recvmsg(sock, &msg, msg.msg_flags); but recvmsg() does not have such check, or maybe I have not looked at the right place. > > You could try recvmsg() or recvmmsg(), > >> >> I guess we could loop while the msk has available rcv space and some >> subflow is feeding new data. If so, I think moving: >> >> if (skb_queue_empty(&msk->receive_queue) && >> __mptcp_move_skbs(msk, len - copied)) >> continue; >> >> after the above check should address the issue, and will make the >> common case faster. Let me test the above - unless I underlooked >> something relevant! >> >> Thanks, >> >> Paolo >>
On 12/2/20 5:30 PM, Eric Dumazet wrote: > > > On 12/2/20 5:10 PM, Eric Dumazet wrote: >> >> >> On 12/2/20 4:37 PM, Paolo Abeni wrote: >>> On Wed, 2020-12-02 at 14:18 +0100, Eric Dumazet wrote: >>>> >>>> On 11/24/20 10:51 PM, Paolo Abeni wrote: >>>>> We can enter the main mptcp_recvmsg() loop even when >>>>> no subflows are connected. As note by Eric, that would >>>>> result in a divide by zero oops on ack generation. >>>>> >>>>> Address the issue by checking the subflow status before >>>>> sending the ack. >>>>> >>>>> Additionally protect mptcp_recvmsg() against invocation >>>>> with weird socket states. >>>>> >>>>> v1 -> v2: >>>>> - removed unneeded inline keyword - Jakub >>>>> >>>>> Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> >>>>> Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling") >>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com> >>>>> --- >>>>> net/mptcp/protocol.c | 67 ++++++++++++++++++++++++++++++++------------ >>>>> 1 file changed, 49 insertions(+), 18 deletions(-) >>>>> >>>> >>>> Looking at mptcp recvmsg(), it seems that a read(fd, ..., 0) will >>>> trigger an infinite loop if there is available data in receive queue ? >>> >>> Thank you for looking into this! >>> >>> I can't reproduce the issue with the following packetdrill ?!? >>> >>> +0.0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) >>> +0.1 > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 fflags[flag_h] nokey> >>> +0.1 < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscaale 8,mpcapable v1 flags[flag_h] key[skey=2] > >>> +0.1 > . 1:1(0) ack 1 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h]] key[ckey,skey]> >>> +0.1 fcntl(3, F_SETFL, O_RDWR) = 0 >>> +0.1 < . 1:201(200) ack 1 win 225 <dss dack8=1 dsn8=1 ssn=1 dll=200 nocs, nop, nop> >>> +0.1 > . 1:1(0) ack 201 <nop, nop, TS val 100 ecr 700, dss dack8=201 dll=00 nocs> >>> +0.1 read(3, ..., 0) = 0 >>> >>> The main recvmsg() loop is interrupted by the following check: >>> >>> if (copied >= target) >>> break; >> >> @copied should be 0, and @target should be 1 >> >> Are you sure the above condition is triggering ? >> >> Maybe read(fd, ..., 0) does not reach recvmsg() at all. > > Yes, sock_read_iter() has a shortcut : > > if (!iov_iter_count(to)) /* Match SYS5 behaviour */ > res = sock_recvmsg(sock, &msg, msg.msg_flags); No idea what went wrong with my copy/paste. The real code is more like : if (!iov_iter_count(to)) /* Match SYS5 behaviour */ return 0; > > but recvmsg() does not have such check, or maybe I have not looked at the right place. > >> >> You could try recvmsg() or recvmmsg(), >> >>> >>> I guess we could loop while the msk has available rcv space and some >>> subflow is feeding new data. If so, I think moving: >>> >>> if (skb_queue_empty(&msk->receive_queue) && >>> __mptcp_move_skbs(msk, len - copied)) >>> continue; >>> >>> after the above check should address the issue, and will make the >>> common case faster. Let me test the above - unless I underlooked >>> something relevant! >>> >>> Thanks, >>> >>> Paolo >>>
On 12/2/20 5:32 PM, Eric Dumazet wrote: > > > On 12/2/20 5:30 PM, Eric Dumazet wrote: >> >> >> On 12/2/20 5:10 PM, Eric Dumazet wrote: >>> >>> >>> On 12/2/20 4:37 PM, Paolo Abeni wrote: >>>> On Wed, 2020-12-02 at 14:18 +0100, Eric Dumazet wrote: >>>>> >>>>> On 11/24/20 10:51 PM, Paolo Abeni wrote: >>>>>> We can enter the main mptcp_recvmsg() loop even when >>>>>> no subflows are connected. As note by Eric, that would >>>>>> result in a divide by zero oops on ack generation. >>>>>> >>>>>> Address the issue by checking the subflow status before >>>>>> sending the ack. >>>>>> >>>>>> Additionally protect mptcp_recvmsg() against invocation >>>>>> with weird socket states. >>>>>> >>>>>> v1 -> v2: >>>>>> - removed unneeded inline keyword - Jakub >>>>>> >>>>>> Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> >>>>>> Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling") >>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com> >>>>>> --- >>>>>> net/mptcp/protocol.c | 67 ++++++++++++++++++++++++++++++++------------ >>>>>> 1 file changed, 49 insertions(+), 18 deletions(-) >>>>>> >>>>> >>>>> Looking at mptcp recvmsg(), it seems that a read(fd, ..., 0) will >>>>> trigger an infinite loop if there is available data in receive queue ? >>>> >>>> Thank you for looking into this! >>>> >>>> I can't reproduce the issue with the following packetdrill ?!? >>>> >>>> +0.0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) >>>> +0.1 > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 fflags[flag_h] nokey> >>>> +0.1 < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscaale 8,mpcapable v1 flags[flag_h] key[skey=2] > >>>> +0.1 > . 1:1(0) ack 1 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h]] key[ckey,skey]> >>>> +0.1 fcntl(3, F_SETFL, O_RDWR) = 0 >>>> +0.1 < . 1:201(200) ack 1 win 225 <dss dack8=1 dsn8=1 ssn=1 dll=200 nocs, nop, nop> >>>> +0.1 > . 1:1(0) ack 201 <nop, nop, TS val 100 ecr 700, dss dack8=201 dll=00 nocs> >>>> +0.1 read(3, ..., 0) = 0 >>>> >>>> The main recvmsg() loop is interrupted by the following check: >>>> >>>> if (copied >= target) >>>> break; >>> >>> @copied should be 0, and @target should be 1 >>> >>> Are you sure the above condition is triggering ? >>> >>> Maybe read(fd, ..., 0) does not reach recvmsg() at all. >> >> Yes, sock_read_iter() has a shortcut : >> >> if (!iov_iter_count(to)) /* Match SYS5 behaviour */ >> res = sock_recvmsg(sock, &msg, msg.msg_flags); > > No idea what went wrong with my copy/paste. > > The real code is more like : > > if (!iov_iter_count(to)) /* Match SYS5 behaviour */ > return 0; > Packetdrill recvmsg syntax would be something like +0 recvmsg(3, {msg_name(...)=..., msg_iov(1)=[{..., 0}], msg_flags=0 }, 0) = 0
On Wed, 2020-12-02 at 17:45 +0100, Eric Dumazet wrote: > > On 12/2/20 5:32 PM, Eric Dumazet wrote: > > > > On 12/2/20 5:30 PM, Eric Dumazet wrote: > > > > > > On 12/2/20 5:10 PM, Eric Dumazet wrote: > > > > > > > > On 12/2/20 4:37 PM, Paolo Abeni wrote: > > > > > On Wed, 2020-12-02 at 14:18 +0100, Eric Dumazet wrote: > > > > > > On 11/24/20 10:51 PM, Paolo Abeni wrote: > > > > > > > We can enter the main mptcp_recvmsg() loop even when > > > > > > > no subflows are connected. As note by Eric, that would > > > > > > > result in a divide by zero oops on ack generation. > > > > > > > > > > > > > > Address the issue by checking the subflow status before > > > > > > > sending the ack. > > > > > > > > > > > > > > Additionally protect mptcp_recvmsg() against invocation > > > > > > > with weird socket states. > > > > > > > > > > > > > > v1 -> v2: > > > > > > > - removed unneeded inline keyword - Jakub > > > > > > > > > > > > > > Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> > > > > > > > Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling") > > > > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > > > > > > --- > > > > > > > net/mptcp/protocol.c | 67 ++++++++++++++++++++++++++++++++------------ > > > > > > > 1 file changed, 49 insertions(+), 18 deletions(-) > > > > > > > > > > > > > > > > > > > Looking at mptcp recvmsg(), it seems that a read(fd, ..., 0) will > > > > > > trigger an infinite loop if there is available data in receive queue ? > > > > > > > > > > Thank you for looking into this! > > > > > > > > > > I can't reproduce the issue with the following packetdrill ?!? > > > > > > > > > > +0.0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress) > > > > > +0.1 > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 fflags[flag_h] nokey> > > > > > +0.1 < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscaale 8,mpcapable v1 flags[flag_h] key[skey=2] > > > > > > +0.1 > . 1:1(0) ack 1 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h]] key[ckey,skey]> > > > > > +0.1 fcntl(3, F_SETFL, O_RDWR) = 0 > > > > > +0.1 < . 1:201(200) ack 1 win 225 <dss dack8=1 dsn8=1 ssn=1 dll=200 nocs, nop, nop> > > > > > +0.1 > . 1:1(0) ack 201 <nop, nop, TS val 100 ecr 700, dss dack8=201 dll=00 nocs> > > > > > +0.1 read(3, ..., 0) = 0 > > > > > > > > > > The main recvmsg() loop is interrupted by the following check: > > > > > > > > > > if (copied >= target) > > > > > break; > > > > > > > > @copied should be 0, and @target should be 1 > > > > > > > > Are you sure the above condition is triggering ? > > > > > > > > Maybe read(fd, ..., 0) does not reach recvmsg() at all. > > > > > > Yes, sock_read_iter() has a shortcut : > > > > > > if (!iov_iter_count(to)) /* Match SYS5 behaviour */ > > > res = sock_recvmsg(sock, &msg, msg.msg_flags); > > > > No idea what went wrong with my copy/paste. > > > > The real code is more like : > > > > if (!iov_iter_count(to)) /* Match SYS5 behaviour */ > > return 0; > > > > Packetdrill recvmsg syntax would be something like > > +0 recvmsg(3, {msg_name(...)=..., > msg_iov(1)=[{..., 0}], > msg_flags=0 > }, 0) = 0 Thank you very much for all the effort! Yes, with recvmsg() the packet drill hangs. I agree your proposed fix is correct. I can test it explicitly later today. (and sorry for the initial confusing/confused reply) Paolo
On Wed, 2020-12-02 at 17:51 +0100, Paolo Abeni wrote: > On Wed, 2020-12-02 at 17:45 +0100, Eric Dumazet wrote: > > Packetdrill recvmsg syntax would be something like > > > > +0 recvmsg(3, {msg_name(...)=..., > > msg_iov(1)=[{..., 0}], > > msg_flags=0 > > }, 0) = 0 > > Thank you very much for all the effort! > > Yes, with recvmsg() the packet drill hangs. I agree your proposed fix > is correct. > > I can test it explicitly later today. The proposed fix passes successfully the pktdrill test and there are no regressions in the other self-tests. Feel free to add: Tested-by: Paolo Abeni <pabeni@redhat.com> thanks! Paolo
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 4b7794835fea..371a5e691a9a 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -419,31 +419,57 @@ static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow) return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)); } -static void mptcp_send_ack(struct mptcp_sock *msk, bool force) +static bool tcp_can_send_ack(const struct sock *ssk) +{ + return !((1 << inet_sk_state_load(ssk)) & + (TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_TIME_WAIT | TCPF_CLOSE)); +} + +static void mptcp_send_ack(struct mptcp_sock *msk) { struct mptcp_subflow_context *subflow; - struct sock *pick = NULL; mptcp_for_each_subflow(msk, subflow) { struct sock *ssk = mptcp_subflow_tcp_sock(subflow); - if (force) { - lock_sock(ssk); + lock_sock(ssk); + if (tcp_can_send_ack(ssk)) tcp_send_ack(ssk); - release_sock(ssk); - continue; - } - - /* if the hintes ssk is still active, use it */ - pick = ssk; - if (ssk == msk->ack_hint) - break; + release_sock(ssk); } - if (!force && pick) { - lock_sock(pick); - tcp_cleanup_rbuf(pick, 1); - release_sock(pick); +} + +static bool mptcp_subflow_cleanup_rbuf(struct sock *ssk) +{ + int ret; + + lock_sock(ssk); + ret = tcp_can_send_ack(ssk); + if (ret) + tcp_cleanup_rbuf(ssk, 1); + release_sock(ssk); + return ret; +} + +static void mptcp_cleanup_rbuf(struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + + /* if the hinted ssk is still active, try to use it */ + if (likely(msk->ack_hint)) { + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + if (msk->ack_hint == ssk && + mptcp_subflow_cleanup_rbuf(ssk)) + return; + } } + + /* otherwise pick the first active subflow */ + mptcp_for_each_subflow(msk, subflow) + if (mptcp_subflow_cleanup_rbuf(mptcp_subflow_tcp_sock(subflow))) + return; } static bool mptcp_check_data_fin(struct sock *sk) @@ -494,7 +520,7 @@ static bool mptcp_check_data_fin(struct sock *sk) ret = true; mptcp_set_timeout(sk, NULL); - mptcp_send_ack(msk, true); + mptcp_send_ack(msk); mptcp_close_wake_up(sk); } return ret; @@ -1579,6 +1605,11 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, return -EOPNOTSUPP; lock_sock(sk); + if (unlikely(sk->sk_state == TCP_LISTEN)) { + copied = -ENOTCONN; + goto out_err; + } + timeo = sock_rcvtimeo(sk, nonblock); len = min_t(size_t, len, INT_MAX); @@ -1604,7 +1635,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, /* be sure to advertise window change */ old_space = READ_ONCE(msk->old_wspace); if ((tcp_space(sk) - old_space) >= old_space) - mptcp_send_ack(msk, false); + mptcp_cleanup_rbuf(msk); /* only the master socket status is relevant here. The exit * conditions mirror closely tcp_recvmsg()
We can enter the main mptcp_recvmsg() loop even when no subflows are connected. As note by Eric, that would result in a divide by zero oops on ack generation. Address the issue by checking the subflow status before sending the ack. Additionally protect mptcp_recvmsg() against invocation with weird socket states. v1 -> v2: - removed unneeded inline keyword - Jakub Reported-and-suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: ea4ca586b16f ("mptcp: refine MPTCP-level ack scheduling") Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/mptcp/protocol.c | 67 ++++++++++++++++++++++++++++++++------------ 1 file changed, 49 insertions(+), 18 deletions(-)