
[net-next] ipv6: use prandom_u32() for ID generation

Message ID 20210529110746.6796-1-w@1wt.eu
State New
Series [net-next] ipv6: use prandom_u32() for ID generation

Commit Message

Willy Tarreau May 29, 2021, 11:07 a.m. UTC
This is a complement to commit aa6dd211e4b1 ("inet: use bigger hash
table for IP ID generation"), but focusing on some specific aspects
of IPv6.

Contrary to IPv4, IPv6 only uses packet IDs with fragments, and with a
minimum MTU of 1280, it's much less easy to force a remote peer to
produce many fragments to explore its ID sequence. In addition packet
IDs are 32-bit in IPv6, which further complicates their analysis. On
the other hand, it is often easier to choose among plenty of possible
source addresses and partially work around the bigger hash table the
commit above permits, which leaves IPv6 partially exposed to some
possibilities of remote analysis at the risk of weakening some
protocols like DNS if some IDs can be predicted with a good enough
probability.

Given the wide range of permitted IDs, the risk of collision is extremely
low so there's no need to rely on the positive increment algorithm that
is shared with the IPv4 code via ip_idents_reserve(). We have a fast
PRNG, so let's simply call prandom_u32() and be done with it.

Performance measurements at 10 Gbps couldn't show any difference with
the previous code, even when using a single core, because, due to the
large fragments, we're limited to only ~930 kpps at 10 Gbps and the cost
of the random generation is completely offset by other operations and by
the network transfer time. In addition, this change removes the need to
update a shared entry in the idents table, so it may even end up being
slightly faster on large-scale systems where this matters.

The risk of at least one collision here is about 1/80 million among
10 IDs, 1/850k among 100 IDs, and still only 1/8.5k among 1000 IDs,
which remains very low compared to IPv4 where all IDs are reused
every 4 to 80ms on a 10 Gbps flow depending on packet sizes.

Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv6/output_core.c | 28 +++++-----------------------
 1 file changed, 5 insertions(+), 23 deletions(-)

Comments

David Laight May 31, 2021, 10:41 a.m. UTC | #1
From: Willy Tarreau

> Sent: 29 May 2021 12:08
>
> This is a complement to commit aa6dd211e4b1 ("inet: use bigger hash
> table for IP ID generation"), but focusing on some specific aspects
> of IPv6.
>
> Contrary to IPv4, IPv6 only uses packet IDs with fragments, and with a
> minimum MTU of 1280, it's much less easy to force a remote peer to
> produce many fragments to explore its ID sequence. In addition packet
> IDs are 32-bit in IPv6, which further complicates their analysis. On
> the other hand, it is often easier to choose among plenty of possible
> source addresses and partially work around the bigger hash table the
> commit above permits, which leaves IPv6 partially exposed to some
> possibilities of remote analysis at the risk of weakening some
> protocols like DNS if some IDs can be predicted with a good enough
> probability.
>
> Given the wide range of permitted IDs, the risk of collision is extremely
> low so there's no need to rely on the positive increment algorithm that
> is shared with the IPv4 code via ip_idents_reserve(). We have a fast
> PRNG, so let's simply call prandom_u32() and be done with it.
>
> Performance measurements at 10 Gbps couldn't show any difference with
> the previous code, even when using a single core, because due to the
> large fragments, we're limited to only ~930 kpps at 10 Gbps and the cost
> of the random generation is completely offset by other operations and by
> the network transfer time. In addition, this change removes the need to
> update a shared entry in the idents table so it may even end up being
> slightly faster on large scale systems where this matters.
>
> The risk of at least one collision here is about 1/80 million among
> 10 IDs, 1/850k among 100 IDs, and still only 1/8.5k among 1000 IDs,
> which remains very low compared to IPv4 where all IDs are reused
> every 4 to 80ms on a 10 Gbps flow depending on packet sizes.

The problem is that, on average, 1 in 2^32 packets will use
the same id as the previous one.
If a fragment of such a pair gets lost horrid things are
likely to happen.
Note that this is different from an ID being reused after a
count of packets or after a time delay.

So you still need something to ensure IDs aren't reused immediately.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Willy Tarreau May 31, 2021, 11:19 a.m. UTC | #2
On Mon, May 31, 2021 at 10:41:18AM +0000, David Laight wrote:
> The problem is that, on average, 1 in 2^32 packets will use
> the same id as the previous one.
> If a fragment of such a pair gets lost horrid things are
> likely to happen.
> Note that this is different from an ID being reused after a
> count of packets or after a time delay.

I'm well aware of this, as this is something we discussed already
for IPv4 and which I objected to for the same reason (except that
it's 1/2^16 there).

With that said, the differences with IPv4 are significant here,
because you won't fragment below 1280 bytes per packet, which
means the issue could happen every 5 terabytes of fragmented
losses (or reorders). I'd say that in the worst case you're
using load-balanced links with some funny LB algorithm that
ensures that every second fragment is sent on the same link
as the previous packet's first fragment. This is the case where
you could provoke a failure every 5 TB. But then you're still
subject to UDP's 16-bit checksum, so in practice you're seeing
a failure every 320 PB. Finally it's the same probability as
getting both TCP csum + Ethernet CRC correct on a failure,
except that here it applies only to large fragments while with
TCP/eth it applies to any packet.

> So you still need something to ensure IDs aren't reused immediately.

That's what I initially did for IPv4 but Amit could exploit this
specific property. For example it makes it easier to count flows
behind NAT when there is a guaranteed distance :-/  We even tried
with a smooth, non-linear distribution, but that made no difference;
it remained observable.

Another idea we had in mind was to keep small increments for local
networks and use full randoms only over routers (since fragments
are rare and terribly unreliable on the net), but that would involve
quite significant changes for very little benefit compared to the
current option in the end.

Regards,
Willy
Eric Dumazet May 31, 2021, 7:27 p.m. UTC | #3
On Sat, May 29, 2021 at 1:08 PM Willy Tarreau <w@1wt.eu> wrote:
>
> This is a complement to commit aa6dd211e4b1 ("inet: use bigger hash
> table for IP ID generation"), but focusing on some specific aspects
> of IPv6.
>
> Contrary to IPv4, IPv6 only uses packet IDs with fragments, and with a
> minimum MTU of 1280, it's much less easy to force a remote peer to
> produce many fragments to explore its ID sequence. In addition packet
> IDs are 32-bit in IPv6, which further complicates their analysis. On
> the other hand, it is often easier to choose among plenty of possible
> source addresses and partially work around the bigger hash table the
> commit above permits, which leaves IPv6 partially exposed to some
> possibilities of remote analysis at the risk of weakening some
> protocols like DNS if some IDs can be predicted with a good enough
> probability.
>
> Given the wide range of permitted IDs, the risk of collision is extremely
> low so there's no need to rely on the positive increment algorithm that
> is shared with the IPv4 code via ip_idents_reserve(). We have a fast
> PRNG, so let's simply call prandom_u32() and be done with it.
>
> Performance measurements at 10 Gbps couldn't show any difference with
> the previous code, even when using a single core, because due to the
> large fragments, we're limited to only ~930 kpps at 10 Gbps and the cost
> of the random generation is completely offset by other operations and by
> the network transfer time. In addition, this change removes the need to
> update a shared entry in the idents table so it may even end up being
> slightly faster on large scale systems where this matters.
>
> The risk of at least one collision here is about 1/80 million among
> 10 IDs, 1/850k among 100 IDs, and still only 1/8.5k among 1000 IDs,
> which remains very low compared to IPv4 where all IDs are reused
> every 4 to 80ms on a 10 Gbps flow depending on packet sizes.
>
> Reported-by: Amit Klein <aksecurity@gmail.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Willy Tarreau <w@1wt.eu>

Reviewed-by: Eric Dumazet <edumazet@google.com>

> ---
>  net/ipv6/output_core.c | 28 +++++-----------------------
>  1 file changed, 5 insertions(+), 23 deletions(-)
>
> diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
> index af36acc1a644..2880dc7d9a49 100644
> --- a/net/ipv6/output_core.c
> +++ b/net/ipv6/output_core.c
> @@ -15,29 +15,11 @@ static u32 __ipv6_select_ident(struct net *net,
>                                const struct in6_addr *dst,
>                                const struct in6_addr *src)
>  {
> -       const struct {
> -               struct in6_addr dst;
> -               struct in6_addr src;
> -       } __aligned(SIPHASH_ALIGNMENT) combined = {
> -               .dst = *dst,
> -               .src = *src,
> -       };
> -       u32 hash, id;
> -
> -       /* Note the following code is not safe, but this is okay. */
> -       if (unlikely(siphash_key_is_zero(&net->ipv4.ip_id_key)))
> -               get_random_bytes(&net->ipv4.ip_id_key,
> -                                sizeof(net->ipv4.ip_id_key));
> -
> -       hash = siphash(&combined, sizeof(combined), &net->ipv4.ip_id_key);
> -
> -       /* Treat id of 0 as unset and if we get 0 back from ip_idents_reserve,
> -        * set the hight order instead thus minimizing possible future
> -        * collisions.
> -        */
> -       id = ip_idents_reserve(hash, 1);
> -       if (unlikely(!id))
> -               id = 1 << 31;
> +       u32 id;
> +
> +       do {
> +               id = prandom_u32();
> +       } while (!id);
>
>         return id;
>  }
> --
> 2.17.5
>
patchwork-bot+netdevbpf@kernel.org June 1, 2021, 5:20 a.m. UTC | #4
Hello:

This patch was applied to netdev/net-next.git (refs/heads/master):

On Sat, 29 May 2021 13:07:46 +0200 you wrote:
> This is a complement to commit aa6dd211e4b1 ("inet: use bigger hash
> table for IP ID generation"), but focusing on some specific aspects
> of IPv6.
>
> Contrary to IPv4, IPv6 only uses packet IDs with fragments, and with a
> minimum MTU of 1280, it's much less easy to force a remote peer to
> produce many fragments to explore its ID sequence. In addition packet
> IDs are 32-bit in IPv6, which further complicates their analysis. On
> the other hand, it is often easier to choose among plenty of possible
> source addresses and partially work around the bigger hash table the
> commit above permits, which leaves IPv6 partially exposed to some
> possibilities of remote analysis at the risk of weakening some
> protocols like DNS if some IDs can be predicted with a good enough
> probability.
>
> [...]

Here is the summary with links:
  - [net-next] ipv6: use prandom_u32() for ID generation
    https://git.kernel.org/netdev/net-next/c/62f20e068ccc

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Patch

diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index af36acc1a644..2880dc7d9a49 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -15,29 +15,11 @@  static u32 __ipv6_select_ident(struct net *net,
 			       const struct in6_addr *dst,
 			       const struct in6_addr *src)
 {
-	const struct {
-		struct in6_addr dst;
-		struct in6_addr src;
-	} __aligned(SIPHASH_ALIGNMENT) combined = {
-		.dst = *dst,
-		.src = *src,
-	};
-	u32 hash, id;
-
-	/* Note the following code is not safe, but this is okay. */
-	if (unlikely(siphash_key_is_zero(&net->ipv4.ip_id_key)))
-		get_random_bytes(&net->ipv4.ip_id_key,
-				 sizeof(net->ipv4.ip_id_key));
-
-	hash = siphash(&combined, sizeof(combined), &net->ipv4.ip_id_key);
-
-	/* Treat id of 0 as unset and if we get 0 back from ip_idents_reserve,
-	 * set the hight order instead thus minimizing possible future
-	 * collisions.
-	 */
-	id = ip_idents_reserve(hash, 1);
-	if (unlikely(!id))
-		id = 1 << 31;
+	u32 id;
+
+	do {
+		id = prandom_u32();
+	} while (!id);
 
 	return id;
 }