
[2/7] mac80211: force calculation of software hash for tx fair queueing

Message ID 20201216204316.44498-2-nbd@nbd.name
State New
Series [1/7] net/fq_impl: bulk-free packets from a flow on overmemory | expand

Commit Message

Felix Fietkau Dec. 16, 2020, 8:43 p.m. UTC
Depending on the source, a hardware calculated hash may not provide the
same level of collision resistance.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 net/mac80211/tx.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Toke Høiland-Jørgensen Dec. 17, 2020, 11:54 a.m. UTC | #1
Felix Fietkau <nbd@nbd.name> writes:

> Depending on the source, a hardware calculated hash may not provide the

> same level of collision resistance.


This seems like it would have performance implications?

Also, this can potentially discard information from tunnels that
preserve the hash before encapsulation (we added support for this to
Wireguard which had some nice effects on queueing of encapsulated
traffic).

Could you elaborate a bit on the kind of bad hardware hashes you were
seeing?

-Toke
Felix Fietkau Dec. 17, 2020, 12:20 p.m. UTC | #2
On 2020-12-17 12:54, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
> 
>> Depending on the source, a hardware calculated hash may not provide the
>> same level of collision resistance.
> 
> This seems like it would have performance implications?
> 
> Also, this can potentially discard information from tunnels that
> preserve the hash before encapsulation (we added support for this to
> Wireguard which had some nice effects on queueing of encapsulated
> traffic).

If the hash was calculated in software using the flow dissector, it will
be preserved, even if it went through a few virtual interfaces.
The only hashes discarded are hardware generated ones.
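
For reference, the check added by the patch boils down to this minimal sketch (the helper name is made up for illustration; the logic mirrors the diff at the end of this page):

#include <linux/skbuff.h>

/* Illustrative only: recompute the flow hash unless it was produced by
 * the software flow dissector. __skb_get_hash() sets skb->sw_hash, so a
 * hash merely reported by the NIC (sw_hash still 0) gets replaced by the
 * software hash, which then survives encapsulation and forwarding. */
static void ensure_sw_flow_hash(struct sk_buff *skb)
{
        if (!skb->sw_hash)
                __skb_get_hash(skb);
}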

- Felix
Toke Høiland-Jørgensen Dec. 17, 2020, 1:01 p.m. UTC | #3
Felix Fietkau <nbd@nbd.name> writes:

> On 2020-12-17 12:54, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>> 
>>> Depending on the source, a hardware calculated hash may not provide the
>>> same level of collision resistance.
>> 
>> This seems like it would have performance implications?
>> 
>> Also, this can potentially discard information from tunnels that
>> preserve the hash before encapsulation (we added support for this to
>> Wireguard which had some nice effects on queueing of encapsulated
>> traffic).
> If the hash was calculated in software using the flow dissector, it will
> be preserved, even if it went through a few virtual interfaces.
> The only hashes discarded are hardware generated ones.

Yeah, but I was thinking something like:

Packet comes in with HW hash -> gets encapsulated (preserving the hash)
-> gets to mac80211 which discards the HW hash. So now you're replacing
a (possibly bad-quality) HW hash with a software hash of the *outer*
encapsulation header...

-Toke
Felix Fietkau Dec. 17, 2020, 3:48 p.m. UTC | #4
On 2020-12-17 14:01, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
> 
>> On 2020-12-17 12:54, Toke Høiland-Jørgensen wrote:
>>> Felix Fietkau <nbd@nbd.name> writes:
>>> 
>>>> Depending on the source, a hardware calculated hash may not provide the
>>>> same level of collision resistance.
>>> 
>>> This seems like it would have performance implications?
>>> 
>>> Also, this can potentially discard information from tunnels that
>>> preserve the hash before encapsulation (we added support for this to
>>> Wireguard which had some nice effects on queueing of encapsulated
>>> traffic).
>> If the hash was calculated in software using the flow dissector, it will
>> be preserved, even if it went through a few virtual interfaces.
>> The only hashes discarded are hardware generated ones.
> 
> Yeah, but I was thinking something like:
> 
> Packet comes in with HW hash -> gets encapsulated (preserving the hash)
> -> gets to mac80211 which discards the HW hash. So now you're replacing
> a (possibly bad-quality) HW hash with a software hash of the *outer*
> encapsulation header...

If this becomes a problem, I think we should add a similar patch to
wireguard, which already calls skb_get_hash before encapsulating.
Other regular tunnels should already get a proper hash, since the flow
dissector will take care of it.

The reason I did this patch is because I have a patch to set the hw flow
hash in the skb on mtk_eth_soc, which does help GRO, but leads to
collisions on mac80211 fq.

- Felix
Toke Høiland-Jørgensen Dec. 17, 2020, 5:26 p.m. UTC | #5
Felix Fietkau <nbd@nbd.name> writes:

> On 2020-12-17 14:01, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>> 
>>> On 2020-12-17 12:54, Toke Høiland-Jørgensen wrote:
>>>> Felix Fietkau <nbd@nbd.name> writes:
>>>> 
>>>>> Depending on the source, a hardware calculated hash may not provide the
>>>>> same level of collision resistance.
>>>> 
>>>> This seems like it would have performance implications?
>>>> 
>>>> Also, this can potentially discard information from tunnels that
>>>> preserve the hash before encapsulation (we added support for this to
>>>> Wireguard which had some nice effects on queueing of encapsulated
>>>> traffic).
>>> If the hash was calculated in software using the flow dissector, it will
>>> be preserved, even if it went through a few virtual interfaces.
>>> The only hashes discarded are hardware generated ones.
>> 
>> Yeah, but I was thinking something like:
>> 
>> Packet comes in with HW hash -> gets encapsulated (preserving the hash)
>> -> gets to mac80211 which discards the HW hash. So now you're replacing
>> a (possibly bad-quality) HW hash with a software hash of the *outer*
>> encapsulation header...
> If this becomes a problem, I think we should add a similar patch to
> wireguard, which already calls skb_get_hash before encapsulating.
> Other regular tunnels should already get a proper hash, since the flow
> dissector will take care of it.

But then we'd need to go around adding this to all the places that use
the hash just to work around a particular piece of broken(ish) hardware.
And we're hard-coding a behaviour in mac80211 that means we'll *always*
recompute the hash, even for hardware that's not similarly broken.

> The reason I did this patch is because I have a patch to set the hw flow
> hash in the skb on mtk_eth_soc, which does help GRO, but leads to
> collisions on mac80211 fq.

So wouldn't the right thing to do here be to put a flag into the RX
device that makes the stack clear the hash after using it for GRO?

-Toke
Felix Fietkau Dec. 17, 2020, 7:07 p.m. UTC | #6
On 2020-12-17 18:26, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
>> If this becomes a problem, I think we should add a similar patch to
>> wireguard, which already calls skb_get_hash before encapsulating.
>> Other regular tunnels should already get a proper hash, since the flow
>> dissector will take care of it.
> 
> But then we'd need to go around adding this to all the places that uses
> the hash just to work around a particular piece of broken(ish) hardware.
> And we're hard-coding a behaviour in mac80211 that means we'll *always*
> recompute the hash, even for hardware that's not similarly broken.
> 
>> The reason I did this patch is because I have a patch to set the hw flow
>> hash in the skb on mtk_eth_soc, which does help GRO, but leads to
>> collisions on mac80211 fq.
> 
> So wouldn't the right thing to do here be to put a flag into the RX
> device that makes the stack clear the hash after using it for GRO?

I don't think the hardware is broken, I think fq is simply making
assumptions about the hash that aren't met by the hw.

The documentation in include/linux/skbuff.h mentions these requirements
for the skb hash:
 * 1) Two packets in different flows have different hash values
 * 2) Two packets in the same flow should have the same hash value

FWIW, I think the 'should' from 2) probably belongs to 1), otherwise it
makes no sense. Two packets of the same flow must return the same hash,
otherwise the hash is broken. I'm assuming this is a typo.

In addition to those properties, fq needs the hash to be
cryptographically secure, so that it can use reciprocal_scale to sort
flows into buckets without allowing an attacker to craft collisions.
That's also the reason why it used to use skb_get_hash_perturb with a
random perturbation until we got software hashes based on siphash.
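
For context, the bucket selection being discussed looks roughly like this (a simplified sketch of net/fq_impl.h, not a verbatim quote; reciprocal_scale() comes from linux/kernel.h):

#include <linux/kernel.h>
#include <linux/skbuff.h>

/* Sketch: map a packet to one of flows_cnt buckets the way fq does.
 * reciprocal_scale(hash, n) computes ((u64)hash * n) >> 32, so it keeps
 * the high bits of the hash; a hash that is not spread over the whole
 * 32-bit range, or that an attacker can choose, defeats the flow
 * isolation fq is trying to provide. */
static u32 fq_bucket_for_skb(struct sk_buff *skb, u32 flows_cnt)
{
        u32 hash = skb_get_hash(skb);

        return reciprocal_scale(hash, flows_cnt);
}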

I think it's safe to assume that most hardware out there will not
provide collision resistant hashes, so in my opinion fq cannot rely on a
hardware hash. We don't need to go around and change all places that use
the hash, just those that assume a collision resistant one.

- Felix
Toke Høiland-Jørgensen Dec. 18, 2020, 12:41 p.m. UTC | #7
Felix Fietkau <nbd@nbd.name> writes:

> On 2020-12-17 18:26, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>>> If this becomes a problem, I think we should add a similar patch to
>>> wireguard, which already calls skb_get_hash before encapsulating.
>>> Other regular tunnels should already get a proper hash, since the flow
>>> dissector will take care of it.
>> 
>> But then we'd need to go around adding this to all the places that uses
>> the hash just to work around a particular piece of broken(ish) hardware.
>> And we're hard-coding a behaviour in mac80211 that means we'll *always*
>> recompute the hash, even for hardware that's not similarly broken.
>> 
>>> The reason I did this patch is because I have a patch to set the hw flow
>>> hash in the skb on mtk_eth_soc, which does help GRO, but leads to
>>> collisions on mac80211 fq.
>> 
>> So wouldn't the right thing to do here be to put a flag into the RX
>> device that makes the stack clear the hash after using it for GRO?
> I don't think the hardware is broken, I think fq is simply making
> assumptions about the hash that aren't met by the hw.
>
> The documentation in include/linux/skbuff.h mentions these requirements
> for the skb hash:
>  * 1) Two packets in different flows have different hash values
>  * 2) Two packets in the same flow should have the same hash value
>
> FWIW, I think the 'should' from 2) probably belongs to 1), otherwise it
> makes no sense. Two packets of the flow must return the same hash,
> otherwise the hash is broken. I'm assuming this is a typo.

There's some text further down indicating this is deliberate:

 * A driver may indicate a hash level which is less specific than the
 * actual layer the hash was computed on. For instance, a hash computed
 * at L4 may be considered an L3 hash. This should only be done if the
 * driver can't unambiguously determine that the HW computed the hash at
 * the higher layer. Note that the "should" in the second property above
 * permits this.

So the way I'm reading that whole section, either the intent is that
both properties should be fulfilled, or that the first one (being
collision-free) is more important...

> In addition to those properties, fq needs the hash to be
> cryptographically secure, so that it can use reciprocal_scale to sort
> flows into buckets without allowing an attacker to craft collisions.
> That's also the reason why it used to use skb_get_hash_perturb with a
> random perturbation until we got software hashes based on siphash.
>
> I think it's safe to assume that most hardware out there will not
> provide collision resistant hashes, so in my opinion fq cannot rely on a
> hardware hash. We don't need to go around and change all places that use
> the hash, just those that assume a collision resistant one.

I did a quick grep-based survey of uses of skb_get_hash() outside
drivers - this is what I found (with my interpretations of what they're
used for):

net/core/dev.c           : skb_tx_hash() - selecting TX queue w/reciprocal scale
net/core/dev.c           : RX flow steering, flow limiting
net/core/dev.c           : GRO
net/core/filter.c        : BPF helper
include/net/ip_tunnels.h : flowi4_multipath_hash - so multipath selection?
net/ipv{4,6}/route.c     : multipath hashing (if l4)
net/ipv6/seg6_iptunnel   : building flow labels
net/mac80211/tx.c        : FQ
net/mptcp/syncookies     : storing cookies (XOR w/net_hash_mix())
net/netfilter/nft_hash.c : symhash input (seems to be load balancing)
net/openvswitch          : flow hashing and actions
net/packet/af_packet.c   : PACKET_FANOUT_HASH
net/sched/sch_*.c        : flow hashing for queueing

Apart from GRO it's not obvious to me that a trivially
attacker-controlled hash is safe in any of those uses?

-Toke
Felix Fietkau Dec. 18, 2020, 1:40 p.m. UTC | #8
On 2020-12-18 13:41, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
> 
>> On 2020-12-17 18:26, Toke Høiland-Jørgensen wrote:
>>> Felix Fietkau <nbd@nbd.name> writes:
>>>> If this becomes a problem, I think we should add a similar patch to
>>>> wireguard, which already calls skb_get_hash before encapsulating.
>>>> Other regular tunnels should already get a proper hash, since the flow
>>>> dissector will take care of it.
>>> 
>>> But then we'd need to go around adding this to all the places that uses
>>> the hash just to work around a particular piece of broken(ish) hardware.
>>> And we're hard-coding a behaviour in mac80211 that means we'll *always*
>>> recompute the hash, even for hardware that's not similarly broken.
>>> 
>>>> The reason I did this patch is because I have a patch to set the hw flow
>>>> hash in the skb on mtk_eth_soc, which does help GRO, but leads to
>>>> collisions on mac80211 fq.
>>> 
>>> So wouldn't the right thing to do here be to put a flag into the RX
>>> device that makes the stack clear the hash after using it for GRO?
>> I don't think the hardware is broken, I think fq is simply making
>> assumptions about the hash that aren't met by the hw.
>>
>> The documentation in include/linux/skbuff.h mentions these requirements
>> for the skb hash:
>>  * 1) Two packets in different flows have different hash values
>>  * 2) Two packets in the same flow should have the same hash value
>>
>> FWIW, I think the 'should' from 2) probably belongs to 1), otherwise it
>> makes no sense. Two packets of the flow must return the same hash,
>> otherwise the hash is broken. I'm assuming this is a typo.
> 
> There's some text further down indicating this is deliberate:
> 
>  * A driver may indicate a hash level which is less specific than the
>  * actual layer the hash was computed on. For instance, a hash computed
>  * at L4 may be considered an L3 hash. This should only be done if the
>  * driver can't unambiguously determine that the HW computed the hash at
>  * the higher layer. Note that the "should" in the second property above
>  * permits this.
> 
> So the way I'm reading that whole section, either the intent is that
> both properties should be fulfilled, or that the first one (being
> collision-free) is more important...

A hash - by definition - cannot be collision free.
But that's beside the point. On my hw, the hash itself seems collision
free for the flows that I'm pushing, but the result of the
reciprocal_scale isn't.
I took another look and figured out the reason for that:
The hw delivers a 14 bit hash. reciprocal_scale assumes that the values
are distributed across the full 32 bit range. So in this case, the lower
bits are pretty much ignored and the result of the reciprocal_scale is 0
or close to 0, which is what's causing the collisions in fq.
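
To make the failure mode concrete, here is the arithmetic (reciprocal_scale() as defined in include/linux/kernel.h; 4096 buckets is used only as a representative fq size):

/* From include/linux/kernel.h: */
static inline u32 reciprocal_scale(u32 val, u32 ep_ro)
{
        return (u32)(((u64) val * ep_ro) >> 32);
}

/* With a 14-bit hardware hash, val is at most 0x3fff, so with e.g.
 * ep_ro = 4096 buckets:
 *
 *   0x3fff * 4096 = 0x03fff000  (about 2^26, far below 2^32)
 *
 * and the >> 32 always yields 0: every packet lands in bucket 0, which
 * matches the fq collisions described above. */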

Maybe the assumption that the hash should be distributed across the full
32 bit range should be documented somewhere :)

>> In addition to those properties, fq needs the hash to be
>> cryptographically secure, so that it can use reciprocal_scale to sort
>> flows into buckets without allowing an attacker to craft collisions.
>> That's also the reason why it used to use skb_get_hash_perturb with a
>> random perturbation until we got software hashes based on siphash.
>>
>> I think it's safe to assume that most hardware out there will not
>> provide collision resistant hashes, so in my opinion fq cannot rely on a
>> hardware hash. We don't need to go around and change all places that use
>> the hash, just those that assume a collision resistant one.
> 
> I did a quick grep-based survey of uses of skb_get_hash() outside
> drivers - this is what I found (with my interpretations of what they're
> used for):
> 
> net/core/dev.c           : skb_tx_hash() - selecting TX queue w/reciprocal scale
> net/core/dev.c           : RX flow steering, flow limiting
> net/core/dev.c           : GRO
> net/core/filter.c        : BPF helper
> include/net/ip_tunnels.h : flowi4_multipath_hash - so multipath selection?
> net/ipv{4,6}/route.c     : multipath hashing (if l4)
> net/ipv6/seg6_iptunnel   : building flow labels
> net/mac80211/tx.c        : FQ
> net/mptcp/syncookies     : storing cookies (XOR w/net_hash_mix())
> net/netfilter/nft_hash.c : symhash input (seems to be load balancing)
> net/openvswitch          : flow hashing and actions
> net/packet/af_packet.c   : PACKET_FANOUT_HASH
> net/sched/sch_*.c        : flow hashing for queueing
> 
> Apart from GRO it's not obvious to me that a trivially
> attacker-controlled hash is safe in any of those uses?

I looked at some of those uses you mentioned here.
Most of them fit into 2 categories:
1. Sort into power-of-2 buckets and use hash & (size-1), effectively
using the lower bits only.
2. Use reciprocal_scale - effectively using the higher bits only.
For the hash that my hw is reporting, type 1 is working and type 2 is
broken.

So it seems to me that the solution would involve running a simple hash
on the 14 bit values to get the bits distributed to the full 32 bit
range without adding too much bias.
I will do this in the driver and drop this patch.
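
A minimal sketch of what such a driver-side fix could look like (the function is hypothetical, not the actual mtk_eth_soc change; jhash_1word() and skb_set_hash() are existing kernel helpers):

#include <linux/jhash.h>
#include <linux/skbuff.h>

/* Hypothetical: spread a 14-bit hardware hash over the full 32-bit
 * range before storing it in the skb, so that both "hash & (n - 1)"
 * and reciprocal_scale(hash, n) see a well-distributed value. */
static void example_set_rx_hash(struct sk_buff *skb, u32 hw_hash)
{
        /* jhash_1word() mixes the 14 significant bits into all 32 bits;
         * the seed is arbitrary but must stay constant so packets of
         * the same flow keep getting the same hash. */
        u32 hash = jhash_1word(hw_hash, 0);

        /* The hash level should reflect what the hardware actually
         * hashed over; L4 is assumed here purely for illustration. */
        skb_set_hash(skb, hash, PKT_HASH_TYPE_L4);
}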

Thanks for looking into this,

- Felix
Toke Høiland-Jørgensen Dec. 18, 2020, 3:49 p.m. UTC | #9
Felix Fietkau <nbd@nbd.name> writes:

> On 2020-12-18 13:41, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>> 
>>> On 2020-12-17 18:26, Toke Høiland-Jørgensen wrote:
>>>> Felix Fietkau <nbd@nbd.name> writes:
>>>>> If this becomes a problem, I think we should add a similar patch to
>>>>> wireguard, which already calls skb_get_hash before encapsulating.
>>>>> Other regular tunnels should already get a proper hash, since the flow
>>>>> dissector will take care of it.
>>>> 
>>>> But then we'd need to go around adding this to all the places that uses
>>>> the hash just to work around a particular piece of broken(ish) hardware.
>>>> And we're hard-coding a behaviour in mac80211 that means we'll *always*
>>>> recompute the hash, even for hardware that's not similarly broken.
>>>> 
>>>>> The reason I did this patch is because I have a patch to set the hw flow
>>>>> hash in the skb on mtk_eth_soc, which does help GRO, but leads to
>>>>> collisions on mac80211 fq.
>>>> 
>>>> So wouldn't the right thing to do here be to put a flag into the RX
>>>> device that makes the stack clear the hash after using it for GRO?
>>> I don't think the hardware is broken, I think fq is simply making
>>> assumptions about the hash that aren't met by the hw.
>>>
>>> The documentation in include/linux/skbuff.h mentions these requirements
>>> for the skb hash:
>>>  * 1) Two packets in different flows have different hash values
>>>  * 2) Two packets in the same flow should have the same hash value
>>>
>>> FWIW, I think the 'should' from 2) probably belongs to 1), otherwise it
>>> makes no sense. Two packets of the flow must return the same hash,
>>> otherwise the hash is broken. I'm assuming this is a typo.
>> 
>> There's some text further down indicating this is deliberate:
>> 
>>  * A driver may indicate a hash level which is less specific than the
>>  * actual layer the hash was computed on. For instance, a hash computed
>>  * at L4 may be considered an L3 hash. This should only be done if the
>>  * driver can't unambiguously determine that the HW computed the hash at
>>  * the higher layer. Note that the "should" in the second property above
>>  * permits this.
>> 
>> So the way I'm reading that whole section, either the intent is that
>> both properties should be fulfilled, or that the first one (being
>> collision-free) is more important...
> A hash - by definition - cannot be collision free.
> But that's beside the point. On my hw, the hash itself seems collision
> free for the flows that I'm pushing, but the result of the
> reciprocal_scale isn't.
> I took another look and figured out the reason for that:
> The hw delivers a 14 bit hash. reciprocal_scale assumes that the values
> are distributed across the full 32 bit range. So in this case, the lower
> bits are pretty much ignored and the result of the reciprocal_scale is 0
> or close to 0, which is what's causing the collisions in fq.

Ah, right, that makes sense!

> Maybe the assumption that the hash should be distributed across the full
> 32 bit range should be documented somewhere :)

Yeah, I agree. Maybe just updating that comment in skbuff.h? Do you want
to fold such an update into your series? Otherwise I can send a patch
once net-next opens...

>>> In addition to those properties, fq needs the hash to be
>>> cryptographically secure, so that it can use reciprocal_scale to sort
>>> flows into buckets without allowing an attacker to craft collisions.
>>> That's also the reason why it used to use skb_get_hash_perturb with a
>>> random perturbation until we got software hashes based on siphash.
>>>
>>> I think it's safe to assume that most hardware out there will not
>>> provide collision resistant hashes, so in my opinion fq cannot rely on a
>>> hardware hash. We don't need to go around and change all places that use
>>> the hash, just those that assume a collision resistant one.
>> 
>> I did a quick grep-based survey of uses of skb_get_hash() outside
>> drivers - this is what I found (with my interpretations of what they're
>> used for):
>> 
>> net/core/dev.c           : skb_tx_hash() - selecting TX queue w/reciprocal scale
>> net/core/dev.c           : RX flow steering, flow limiting
>> net/core/dev.c           : GRO
>> net/core/filter.c        : BPF helper
>> include/net/ip_tunnels.h : flowi4_multipath_hash - so multipath selection?
>> net/ipv{4,6}/route.c     : multipath hashing (if l4)
>> net/ipv6/seg6_iptunnel   : building flow labels
>> net/mac80211/tx.c        : FQ
>> net/mptcp/syncookies     : storing cookies (XOR w/net_hash_mix())
>> net/netfilter/nft_hash.c : symhash input (seems to be load balancing)
>> net/openvswitch          : flow hashing and actions
>> net/packet/af_packet.c   : PACKET_FANOUT_HASH
>> net/sched/sch_*.c        : flow hashing for queueing
>> 
>> Apart from GRO it's not obvious to me that a trivially
>> attacker-controlled hash is safe in any of those uses?
> I looked at some of those uses you mentioned here.
> Most of them fit into 2 categories:
> 1. Sort into power-of-2 buckets and use hash & (size-1), effectively
> using the lower bits only.
> 2. Use reciprocal_scale - effectively using the higher bits only.
> For the hash that my hw is reporting, type 1 is working and type 2 is
> broken.
>
> So it seems to me that the solution would involve running a simple hash
> on the 14 bit values to get the bits distributed to the full 32 bit
> range without adding too much bias.
> I will do this in the driver and drop this patch.

Yes, this seems like a reasonable solution; great!

> Thanks for looking into this,

You're welcome :)

-Toke

Patch

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 6422da6690f7..f1c934f21d7e 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3937,7 +3937,8 @@  void __ieee80211_subif_start_xmit(struct sk_buff *skb,
 	if (local->ops->wake_tx_queue) {
 		u16 queue = __ieee80211_select_queue(sdata, sta, skb);
 		skb_set_queue_mapping(skb, queue);
-		skb_get_hash(skb);
+		if (!skb->sw_hash)
+			__skb_get_hash(skb);
 	}
 
 	if (sta) {
@@ -4191,7 +4192,8 @@  static void ieee80211_8023_xmit(struct ieee80211_sub_if_data *sdata,
 	if (local->ops->wake_tx_queue) {
 		u16 queue = __ieee80211_select_queue(sdata, sta, skb);
 		skb_set_queue_mapping(skb, queue);
-		skb_get_hash(skb);
+		if (!skb->sw_hash)
+			__skb_get_hash(skb);
 	}
 
 	if (unlikely(test_bit(SCAN_SW_SCANNING, &local->scanning)) &&
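
For comparison, the helper the patch stops relying on, skb_get_hash() in include/linux/skbuff.h, reads roughly as follows: it accepts any pre-existing hash, including a hardware-provided L4 hash, which is exactly what the change above avoids for the fq path.

static inline __u32 skb_get_hash(struct sk_buff *skb)
{
        /* Only recompute if neither a hardware L4 hash nor a software
         * hash is already present. */
        if (!skb->l4_hash && !skb->sw_hash)
                __skb_get_hash(skb);

        return skb->hash;
}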