diff mbox series

[1/7] net/fq_impl: bulk-free packets from a flow on overmemory

Message ID 20201216204316.44498-1-nbd@nbd.name
State Superseded
Headers show
Series [1/7] net/fq_impl: bulk-free packets from a flow on overmemory | expand

Commit Message

Felix Fietkau Dec. 16, 2020, 8:43 p.m. UTC
This is similar to what sch_fq_codel does. It also amortizes the worst
case cost of a follow-up patch that changes the selection of the biggest
flow for dropping packets

Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 include/net/fq_impl.h | 55 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 44 insertions(+), 11 deletions(-)

Comments

Johannes Berg Dec. 16, 2020, 8:59 p.m. UTC | #1
On Wed, 2020-12-16 at 21:43 +0100, Felix Fietkau wrote:
> 
> +	u32 *flows_bitmap;

> +	fq->flows_bitmap[idx / 32] &= ~BIT(idx % 32);

> +	for (base = 0; base < fq->flows_cnt; base += 32) {
> +		u32 mask = fq->flows_bitmap[base / 32];

This all seems a little awkward, why not use unsigned long * and
<linux/bitops.h>?

The &=~ BIT() thing above is basically __clear_bit() then, the loops
later are basically for_each_set_bit() if I'm reading things right.

> +	if (!flow->backlog) {
> +		if (flow != &tin->default_flow)
> +			fq->flows_bitmap[idx / 32] |= BIT(idx % 32);

That could be __set_bit()

> +	fq->flows_bitmap = kcalloc(DIV_ROUND_UP(fq->flows_cnt, 32), sizeof(u32),
> +				   GFP_KERNEL);

And that would just use BITS_TO_BYTES()?

johannes
Johannes Berg Dec. 16, 2020, 9:04 p.m. UTC | #2
Wait, another thing:

> +++ b/net/mac80211/iface.c
> @@ -798,10 +798,21 @@ static bool ieee80211_set_sdata_offload_flags(struct ieee80211_sub_if_data *sdat
>  		flags &= ~IEEE80211_OFFLOAD_ENCAP_ENABLED;
>  	}
>  
> +	if (ieee80211_hw_check(&local->hw, SUPPORTS_RX_DECAP_OFFLOAD) &&
> +	    ieee80211_iftype_supports_encap_offload(sdata->vif.type)) {
> +		flags |= IEEE80211_OFFLOAD_DECAP_ENABLED;

Why does decap depend on encap here?

johannes
Felix Fietkau Dec. 16, 2020, 9:19 p.m. UTC | #3
On 2020-12-16 22:03, Johannes Berg wrote:
> On Wed, 2020-12-16 at 21:43 +0100, Felix Fietkau wrote:
>> 
>> +	offload = assign &&
>> +		  (sdata->vif.offload_flags & IEEE80211_OFFLOAD_DECAP_ENABLED);
>> +
>> +	if (offload)
>> +		set_offload = !test_and_set_sta_flag(sta, WLAN_STA_DECAP_OFFLOAD);
>> +	else
>> +		set_offload = test_and_clear_sta_flag(sta, WLAN_STA_DECAP_OFFLOAD);
>> +
>> +	if (set_offload)
>> +		drv_sta_set_decap_offload(local, sdata, &sta->sta, assign);
> 
> Some of these lines look a bit long?
Just a tiny bit over 80 characters. Wasn't the 80 characters line limit
removed a while back? I don't think line wrapping would make things more
readable here.

>> -	skb = ieee80211_rx_monitor(local, skb, rate);
>> +	if (!(status->flag & RX_FLAG_8023))
>> +		skb = ieee80211_rx_monitor(local, skb, rate);
> 
> Is that worth the check? You basically disable it anyway if monitor
> interfaces are there.
There could be a race. The driver or hw might have queued up some 802.3
frames after offload was disabled.

> Not sure that's really the right thing to do ... we often want monitor
> interfaces (with no flags set) for debug?
> 
> Or maybe we should add some tracing then (instead).
Tracing probably makes more sense. I'm not sure pcap or radiotap can
deal with a mix of 802.3 and 802.11 packets. Leaving offload enabled and
silently dropping 802.3 packets seems like a bad idea to me as well.

- Felix
Johannes Berg Dec. 17, 2020, 8:08 a.m. UTC | #4
On Wed, 2020-12-16 at 22:19 +0100, Felix Fietkau wrote:
> On 2020-12-16 22:03, Johannes Berg wrote:

> > On Wed, 2020-12-16 at 21:43 +0100, Felix Fietkau wrote:

> > > +	offload = assign &&

> > > +		  (sdata->vif.offload_flags & IEEE80211_OFFLOAD_DECAP_ENABLED);

> > > +

> > > +	if (offload)

> > > +		set_offload = !test_and_set_sta_flag(sta, WLAN_STA_DECAP_OFFLOAD);

> > > +	else

> > > +		set_offload = test_and_clear_sta_flag(sta, WLAN_STA_DECAP_OFFLOAD);

> > > +

> > > +	if (set_offload)

> > > +		drv_sta_set_decap_offload(local, sdata, &sta->sta, assign);

> > 

> > Some of these lines look a bit long?

> Just a tiny bit over 80 characters. Wasn't the 80 characters line limit

> removed a while back? I don't think line wrapping would make things more

> readable here.


Ok, fair point, I didn't count :-)

> > Not sure that's really the right thing to do ... we often want monitor

> > interfaces (with no flags set) for debug?

> > 

> > Or maybe we should add some tracing then (instead).

> Tracing probably makes more sense. I'm not sure pcap or radiotap can

> deal with a mix of 802.3 and 802.11 packets. Leaving offload enabled and

> silently dropping 802.3 packets seems like a bad idea to me as well.


Right. I've long felt that perhaps we should have tracing for this,
rather than relying on the extra monitor interface, but the monitor
interface is oh so convenient since you can directly use the result for
wireshark etc. :)

I guess I don't really care that much either way. Going with your
approach seems reasonable, but people will have to be aware that "just
add a monitor interface" is now going to affect things more than it used
to.

johannes
Toke Høiland-Jørgensen Dec. 17, 2020, 11:51 a.m. UTC | #5
Felix Fietkau <nbd@nbd.name> writes:

> This is similar to what sch_fq_codel does. It also amortizes the worst

> case cost of a follow-up patch that changes the selection of the biggest

> flow for dropping packets

>

> Signed-off-by: Felix Fietkau <nbd@nbd.name>


Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Toke Høiland-Jørgensen Dec. 17, 2020, 11:55 a.m. UTC | #6
Felix Fietkau <nbd@nbd.name> writes:

> Simplifies the code and prepares for a rework of scanning for flows on

> overmemory drop.

>

> Signed-off-by: Felix Fietkau <nbd@nbd.name>


This seems reasonable.

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Toke Høiland-Jørgensen Dec. 17, 2020, 12:09 p.m. UTC | #7
Felix Fietkau <nbd@nbd.name> writes:

> On 2020-12-16 21:54, Johannes Berg wrote:

>> On Wed, 2020-12-16 at 21:43 +0100, Felix Fietkau wrote:

>>> 

>>> +static int fq_flow_drop(struct fq *fq, struct fq_flow *flow,

>>> +			fq_skb_free_t free_func)

>>> +{

>>> +	unsigned int packets = 0, bytes = 0, truesize = 0;

>>> +	struct fq_tin *tin = flow->tin;

>>> +	struct sk_buff *skb;

>>> +	int pending;

>>> +

>>> +	lockdep_assert_held(&fq->lock);

>>> +

>>> +	pending = min_t(int, 32, skb_queue_len(&flow->queue) / 2);

>>> 

>> 

>> Why 32?

> I guess I forgot got make it configurable. sch_fq_codel uses 64, but

> that seemed a bit excessive to me.


I'm not sure it's worth a configuration knob. It's basically an
arbitrary choice anyway, and only kicks in when an unresponsive flows
keeps flooding a queue to the point of overflow (if it's many smaller
flows the "never drop more than half a flow's backlog" should keep it
from being excessive).

This (hopefully) only happens relatively rarely and hitting it with a
really big hammer is the right thing to do in such a case to keep the
box from falling over. Not sure if 32 or 64 makes much difference; guess
it depends on the CPU-to-bandwidth ratio of the particular machine.

-Toke
Felix Fietkau Dec. 17, 2020, 12:40 p.m. UTC | #8
On 2020-12-16 21:59, Johannes Berg wrote:
> On Wed, 2020-12-16 at 21:43 +0100, Felix Fietkau wrote:

>> 

>> +	u32 *flows_bitmap;

> 

>> +	fq->flows_bitmap[idx / 32] &= ~BIT(idx % 32);

> 

>> +	for (base = 0; base < fq->flows_cnt; base += 32) {

>> +		u32 mask = fq->flows_bitmap[base / 32];

> 

> This all seems a little awkward, why not use unsigned long * and

> <linux/bitops.h>?

> 

> The &=~ BIT() thing above is basically __clear_bit() then, the loops

> later are basically for_each_set_bit() if I'm reading things right.

I guess I can simplify this some more. I think I avoided
for_each_set_bit for performance reasons to keep things inline in the
loop, but I'm not sure how much that matters in practice.

- Felix
diff mbox series

Patch

diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index e73d74d2fabf..06d2a79233c9 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -11,17 +11,25 @@ 
 
 /* functions that are embedded into includer */
 
+
+static void
+__fq_adjust_removal(struct fq *fq, struct fq_flow *flow, unsigned int packets,
+		    unsigned int bytes, unsigned int truesize)
+{
+	struct fq_tin *tin = flow->tin;
+
+	tin->backlog_bytes -= bytes;
+	tin->backlog_packets -= packets;
+	flow->backlog -= bytes;
+	fq->backlog -= packets;
+	fq->memory_usage -= truesize;
+}
+
 static void fq_adjust_removal(struct fq *fq,
 			      struct fq_flow *flow,
 			      struct sk_buff *skb)
 {
-	struct fq_tin *tin = flow->tin;
-
-	tin->backlog_bytes -= skb->len;
-	tin->backlog_packets--;
-	flow->backlog -= skb->len;
-	fq->backlog--;
-	fq->memory_usage -= skb->truesize;
+	__fq_adjust_removal(fq, flow, 1, skb->len, skb->truesize);
 }
 
 static void fq_rejigger_backlog(struct fq *fq, struct fq_flow *flow)
@@ -59,6 +67,34 @@  static struct sk_buff *fq_flow_dequeue(struct fq *fq,
 	return skb;
 }
 
+static int fq_flow_drop(struct fq *fq, struct fq_flow *flow,
+			fq_skb_free_t free_func)
+{
+	unsigned int packets = 0, bytes = 0, truesize = 0;
+	struct fq_tin *tin = flow->tin;
+	struct sk_buff *skb;
+	int pending;
+
+	lockdep_assert_held(&fq->lock);
+
+	pending = min_t(int, 32, skb_queue_len(&flow->queue) / 2);
+	do {
+		skb = __skb_dequeue(&flow->queue);
+		if (!skb)
+			break;
+
+		packets++;
+		bytes += skb->len;
+		truesize += skb->truesize;
+		free_func(fq, tin, flow, skb);
+	} while (packets < pending);
+
+	__fq_adjust_removal(fq, flow, packets, bytes, truesize);
+	fq_rejigger_backlog(fq, flow);
+
+	return packets;
+}
+
 static struct sk_buff *fq_tin_dequeue(struct fq *fq,
 				      struct fq_tin *tin,
 				      fq_tin_dequeue_t dequeue_func)
@@ -190,12 +226,9 @@  static void fq_tin_enqueue(struct fq *fq,
 		if (!flow)
 			return;
 
-		skb = fq_flow_dequeue(fq, flow);
-		if (!skb)
+		if (!fq_flow_drop(fq, flow, free_func))
 			return;
 
-		free_func(fq, flow->tin, flow, skb);
-
 		flow->tin->overlimit++;
 		fq->overlimit++;
 		if (oom) {