mbox series

[net,0/9] Netfilter fixes for net

Message ID 20210306121223.28711-1-pablo@netfilter.org
Headers show
Series Netfilter fixes for net | expand

Message

Pablo Neira Ayuso March 6, 2021, 12:12 p.m. UTC
Hi,

The following patchset contains Netfilter fixes for net:

1) Fix incorrect enum type definition in nfnetlink_cthelper UAPI,
   from Dmitry V. Levin.

2) Remove extra space in deprecated automatic helper assignment
   notice, from Klemen Košir.

3) Drop early socket demux socket after NAT mangling, from
   Florian Westphal. Add a test to exercise this bug.

4) Fix bogus invalid packet report in the conntrack TCP tracker,
   also from Florian.

5) Fix access to xt[NFPROTO_UNSPEC] list with no mutex
   in target/match_revfn(), from Vasily Averin.

6) Disallow updates on the table ownership flag.

7) Fix double hook unregistration of tables with owner.

8) Remove bogus check on the table owner in __nft_release_tables().

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit eee7ede695cfbb19fefdeb14992535b605448f35:

  Merge branch 'bnxt_en-error-recovery-bug-fixes' (2021-02-26 15:50:25 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to bd1777b3a88f98e223392221b330668458aac7f1:

  netfilter: nftables: bogus check for netlink portID with table owner (2021-03-04 04:02:54 +0100)

----------------------------------------------------------------
Dmitry V. Levin (1):
      uapi: nfnetlink_cthelper.h: fix userspace compilation error

Florian Westphal (3):
      netfilter: nf_nat: undo erroneous tcp edemux lookup
      netfilter: conntrack: avoid misleading 'invalid' in log message
      selftests: netfilter: test nat port clash resolution interaction with tcp early demux

Klemen Košir (1):
      netfilter: conntrack: Remove a double space in a log message

Pablo Neira Ayuso (3):
      netfilter: nftables: disallow updates on table ownership
      netfilter: nftables: fix possible double hook unregistration with table owner
      netfilter: nftables: bogus check for netlink portID with table owner

Vasily Averin (1):
      netfilter: x_tables: gpf inside xt_find_revision()

 include/uapi/linux/netfilter/nfnetlink_cthelper.h  |  2 +-
 net/netfilter/nf_conntrack_helper.c                |  3 +-
 net/netfilter/nf_conntrack_proto_tcp.c             |  6 +-
 net/netfilter/nf_nat_proto.c                       | 25 +++++-
 net/netfilter/nf_tables_api.c                      | 19 +++--
 net/netfilter/x_tables.c                           |  6 +-
 tools/testing/selftests/netfilter/Makefile         |  2 +-
 tools/testing/selftests/netfilter/nf_nat_edemux.sh | 99 ++++++++++++++++++++++
 8 files changed, 145 insertions(+), 17 deletions(-)
 create mode 100755 tools/testing/selftests/netfilter/nf_nat_edemux.sh

Comments

Mika Penttilä March 6, 2021, 2:49 p.m. UTC | #1
On 6.3.2021 14.12, Pablo Neira Ayuso wrote:
> From: Florian Westphal <fw@strlen.de>
>
> Under extremely rare conditions TCP early demux will retrieve the wrong
> socket.
>
> 1. local machine establishes a connection to a remote server, S, on port
>     p.
>
>     This gives:
>     laddr:lport -> S:p
>     ... both in tcp and conntrack.
>
> 2. local machine establishes a connection to host H, on port p2.
>     2a. TCP stack choses same laddr:lport, so we have
>     laddr:lport -> H:p2 from TCP point of view.
>     2b). There is a destination NAT rewrite in place, translating
>          H:p2 to S:p.  This results in following conntrack entries:
>
>     I)  laddr:lport -> S:p  (origin)  S:p -> laddr:lport (reply)
>     II) laddr:lport -> H:p2 (origin)  S:p -> laddr:lport2 (reply)
>
>     NAT engine has rewritten laddr:lport to laddr:lport2 to map
>     the reply packet to the correct origin.
Could you eloborate where and how linux nat engine is doing the

laddr:lport to laddr:lport2

rewrite? There's only DST nat and there should be conflict (for reply) 
in tuple establishment afaik....


>
>     When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
>     will undo-the SNAT transformation, rewriting IP header to
>     S:p -> laddr:lport
>
>     This causes TCP early demux to associate the skb with the TCP socket
>     of the first connection.
>
>     The INPUT hook will then reverse the DNAT transformation, rewriting
>     the IP header to H:p2 -> laddr:lport.
>
> Because packet ends up with the wrong socket, the new connection
> never completes: originator stays in SYN_SENT and conntrack entry
> remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
> until it gives up.
>
> To resolve this, orphan the skb after the input rewrite:
> Because the source IP address changed, the socket must be incorrect.
> We can't move the DNAT undo to prerouting due to backwards
> compatibility, doing so will make iptables/nftables rules to no longer
> match the way they did.
>
> After orphan, the packet will be handed to the next protocol layer
> (tcp, udp, ...) and that will repeat the socket lookup just like as if
> early demux was disabled.
>
> Fixes: 41063e9dd1195 ("ipv4: Early TCP socket demux.")
> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
>   net/netfilter/nf_nat_proto.c | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/nf_nat_proto.c b/net/netfilter/nf_nat_proto.c
> index e87b6bd6b3cd..4731d21fc3ad 100644
> --- a/net/netfilter/nf_nat_proto.c
> +++ b/net/netfilter/nf_nat_proto.c
> @@ -646,8 +646,8 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
>   }
>   
>   static unsigned int
> -nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
> -	       const struct nf_hook_state *state)
> +nf_nat_ipv4_pre_routing(void *priv, struct sk_buff *skb,
> +			const struct nf_hook_state *state)
>   {
>   	unsigned int ret;
>   	__be32 daddr = ip_hdr(skb)->daddr;
> @@ -659,6 +659,23 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
>   	return ret;
>   }
>   
> +static unsigned int
> +nf_nat_ipv4_local_in(void *priv, struct sk_buff *skb,
> +		     const struct nf_hook_state *state)
> +{
> +	__be32 saddr = ip_hdr(skb)->saddr;
> +	struct sock *sk = skb->sk;
> +	unsigned int ret;
> +
> +	ret = nf_nat_ipv4_fn(priv, skb, state);
> +
> +	if (ret == NF_ACCEPT && sk && saddr != ip_hdr(skb)->saddr &&
> +	    !inet_sk_transparent(sk))
> +		skb_orphan(skb); /* TCP edemux obtained wrong socket */
> +
> +	return ret;
> +}
> +
>   static unsigned int
>   nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
>   		const struct nf_hook_state *state)
> @@ -736,7 +753,7 @@ nf_nat_ipv4_local_fn(void *priv, struct sk_buff *skb,
>   static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
>   	/* Before packet filtering, change destination */
>   	{
> -		.hook		= nf_nat_ipv4_in,
> +		.hook		= nf_nat_ipv4_pre_routing,
>   		.pf		= NFPROTO_IPV4,
>   		.hooknum	= NF_INET_PRE_ROUTING,
>   		.priority	= NF_IP_PRI_NAT_DST,
> @@ -757,7 +774,7 @@ static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
>   	},
>   	/* After packet filtering, change source */
>   	{
> -		.hook		= nf_nat_ipv4_fn,
> +		.hook		= nf_nat_ipv4_local_in,
>   		.pf		= NFPROTO_IPV4,
>   		.hooknum	= NF_INET_LOCAL_IN,
>   		.priority	= NF_IP_PRI_NAT_SRC,
Mika Penttilä March 6, 2021, 4:10 p.m. UTC | #2
On 6.3.2021 16.49, Mika Penttilä wrote:
>
>
> On 6.3.2021 14.12, Pablo Neira Ayuso wrote:
>> From: Florian Westphal <fw@strlen.de>
>>
>> Under extremely rare conditions TCP early demux will retrieve the wrong
>> socket.
>>
>> 1. local machine establishes a connection to a remote server, S, on port
>>     p.
>>
>>     This gives:
>>     laddr:lport -> S:p
>>     ... both in tcp and conntrack.
>>
>> 2. local machine establishes a connection to host H, on port p2.
>>     2a. TCP stack choses same laddr:lport, so we have
>>     laddr:lport -> H:p2 from TCP point of view.
>>     2b). There is a destination NAT rewrite in place, translating
>>          H:p2 to S:p.  This results in following conntrack entries:
>>
>>     I)  laddr:lport -> S:p  (origin)  S:p -> laddr:lport (reply)
>>     II) laddr:lport -> H:p2 (origin)  S:p -> laddr:lport2 (reply)
>>
>>     NAT engine has rewritten laddr:lport to laddr:lport2 to map
>>     the reply packet to the correct origin.
> Could you eloborate where and how linux nat engine is doing the
>
> laddr:lport to laddr:lport2
>
> rewrite? There's only DST nat and there should be conflict (for reply) 
> in tuple establishment afaik....

Ah I see it is the nat null binding for src to make it unique

>
>
>>
>>     When server sends SYN/ACK to laddr:lport2, the PREROUTING hook
>>     will undo-the SNAT transformation, rewriting IP header to
>>     S:p -> laddr:lport
>>
>>     This causes TCP early demux to associate the skb with the TCP socket
>>     of the first connection.
>>
>>     The INPUT hook will then reverse the DNAT transformation, rewriting
>>     the IP header to H:p2 -> laddr:lport.
>>
>> Because packet ends up with the wrong socket, the new connection
>> never completes: originator stays in SYN_SENT and conntrack entry
>> remains in SYN_RECV until timeout, and responder retransmits SYN/ACK
>> until it gives up.
>>
>> To resolve this, orphan the skb after the input rewrite:
>> Because the source IP address changed, the socket must be incorrect.
>> We can't move the DNAT undo to prerouting due to backwards
>> compatibility, doing so will make iptables/nftables rules to no longer
>> match the way they did.
>>
>> After orphan, the packet will be handed to the next protocol layer
>> (tcp, udp, ...) and that will repeat the socket lookup just like as if
>> early demux was disabled.
>>
>> Fixes: 41063e9dd1195 ("ipv4: Early TCP socket demux.")
>> Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1427
>> Signed-off-by: Florian Westphal <fw@strlen.de>
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>> ---
>>   net/netfilter/nf_nat_proto.c | 25 +++++++++++++++++++++----
>>   1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/netfilter/nf_nat_proto.c b/net/netfilter/nf_nat_proto.c
>> index e87b6bd6b3cd..4731d21fc3ad 100644
>> --- a/net/netfilter/nf_nat_proto.c
>> +++ b/net/netfilter/nf_nat_proto.c
>> @@ -646,8 +646,8 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
>>   }
>>     static unsigned int
>> -nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
>> -           const struct nf_hook_state *state)
>> +nf_nat_ipv4_pre_routing(void *priv, struct sk_buff *skb,
>> +            const struct nf_hook_state *state)
>>   {
>>       unsigned int ret;
>>       __be32 daddr = ip_hdr(skb)->daddr;
>> @@ -659,6 +659,23 @@ nf_nat_ipv4_in(void *priv, struct sk_buff *skb,
>>       return ret;
>>   }
>>   +static unsigned int
>> +nf_nat_ipv4_local_in(void *priv, struct sk_buff *skb,
>> +             const struct nf_hook_state *state)
>> +{
>> +    __be32 saddr = ip_hdr(skb)->saddr;
>> +    struct sock *sk = skb->sk;
>> +    unsigned int ret;
>> +
>> +    ret = nf_nat_ipv4_fn(priv, skb, state);
>> +
>> +    if (ret == NF_ACCEPT && sk && saddr != ip_hdr(skb)->saddr &&
>> +        !inet_sk_transparent(sk))
>> +        skb_orphan(skb); /* TCP edemux obtained wrong socket */
>> +
>> +    return ret;
>> +}
>> +
>>   static unsigned int
>>   nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
>>           const struct nf_hook_state *state)
>> @@ -736,7 +753,7 @@ nf_nat_ipv4_local_fn(void *priv, struct sk_buff 
>> *skb,
>>   static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
>>       /* Before packet filtering, change destination */
>>       {
>> -        .hook        = nf_nat_ipv4_in,
>> +        .hook        = nf_nat_ipv4_pre_routing,
>>           .pf        = NFPROTO_IPV4,
>>           .hooknum    = NF_INET_PRE_ROUTING,
>>           .priority    = NF_IP_PRI_NAT_DST,
>> @@ -757,7 +774,7 @@ static const struct nf_hook_ops nf_nat_ipv4_ops[] 
>> = {
>>       },
>>       /* After packet filtering, change source */
>>       {
>> -        .hook        = nf_nat_ipv4_fn,
>> +        .hook        = nf_nat_ipv4_local_in,
>>           .pf        = NFPROTO_IPV4,
>>           .hooknum    = NF_INET_LOCAL_IN,
>>           .priority    = NF_IP_PRI_NAT_SRC,
>