diff mbox series

[net-next,v3,2/8] taprio: Add support for frame preemption offload

Message ID 20210122224453.4161729-3-vinicius.gomes@intel.com
State New
Headers show
Series [net-next,v3,1/8] ethtool: Add support for configuring frame preemption | expand

Commit Message

Vinicius Costa Gomes Jan. 22, 2021, 10:44 p.m. UTC
Adds a way to configure which traffic classes are marked as
preemptible and which are marked as express.

Even if frame preemption is not a "real" offload, because it can't be
executed purely in software, having this information near where the
mapping of traffic classes to queues is specified, makes it,
hopefully, easier to use.

taprio will receive the information of which traffic classes are
marked as express/preemptible, and when offloading frame preemption to
the driver will convert the information, so the driver receives which
queues are marked as express/preemptible.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/linux/netdevice.h      |  1 +
 include/net/pkt_sched.h        |  4 ++++
 include/uapi/linux/pkt_sched.h |  1 +
 net/sched/sch_taprio.c         | 43 ++++++++++++++++++++++++++++++----
 4 files changed, 44 insertions(+), 5 deletions(-)

Comments

Vladimir Oltean Jan. 26, 2021, 12:09 a.m. UTC | #1
On Fri, Jan 22, 2021 at 02:44:47PM -0800, Vinicius Costa Gomes wrote:
> +	/* It's valid to enable frame preemption without any kind of

> +	 * offloading being enabled, so keep it separated.

> +	 */

> +	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {

> +		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);

> +		struct tc_preempt_qopt_offload qopt = { };

> +

> +		if (preempt == U32_MAX) {

> +			NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible");

> +			err = -EINVAL;

> +			goto free_sched;

> +		}

> +

> +		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);

> +

> +		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,

> +						    &qopt);

> +		if (err)

> +			goto free_sched;

> +

> +		q->preemptible_tcs = preempt;

> +	}

> +


First I'm interested in the means: why check for preempt == U32_MAX when
you determine that all traffic classes are preemptible? What if less
than 32 traffic classes are used by the netdev? The check will be
bypassed, won't it?

Secondly, why should at least one queue be preemptible? What's wrong
with frame preemption being triggered by a tc-taprio window smaller than
the packet size? This can happen regardless of traffic class.
Vinicius Costa Gomes Jan. 29, 2021, 9:13 p.m. UTC | #2
Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Fri, Jan 22, 2021 at 02:44:47PM -0800, Vinicius Costa Gomes wrote:

>> +	/* It's valid to enable frame preemption without any kind of

>> +	 * offloading being enabled, so keep it separated.

>> +	 */

>> +	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {

>> +		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);

>> +		struct tc_preempt_qopt_offload qopt = { };

>> +

>> +		if (preempt == U32_MAX) {

>> +			NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible");

>> +			err = -EINVAL;

>> +			goto free_sched;

>> +		}

>> +

>> +		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);

>> +

>> +		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,

>> +						    &qopt);

>> +		if (err)

>> +			goto free_sched;

>> +

>> +		q->preemptible_tcs = preempt;

>> +	}

>> +

>

> First I'm interested in the means: why check for preempt == U32_MAX when

> you determine that all traffic classes are preemptible? What if less

> than 32 traffic classes are used by the netdev? The check will be

> bypassed, won't it?


Good catch :-)

I wanted to have this (at least one express queue) handled in a
centralized way, but perhaps this should be handled best per driver.

>

> Secondly, why should at least one queue be preemptible? What's wrong

> with frame preemption being triggered by a tc-taprio window smaller than

> the packet size? This can happen regardless of traffic class.


It's the opposite, at least one queue needs to be marked
express/non-preemptible. But as I said above, perhaps this should be
handled in a per-driver way. I will remove this from taprio.

I think removing this check/limitation from taprio should solve the
second part of your question, right?


Cheers,
-- 
Vinicius
Jakub Kicinski Jan. 29, 2021, 9:57 p.m. UTC | #3
On Fri, 29 Jan 2021 13:13:24 -0800 Vinicius Costa Gomes wrote:
> Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> 

> > On Fri, Jan 22, 2021 at 02:44:47PM -0800, Vinicius Costa Gomes wrote:  

> >> +	/* It's valid to enable frame preemption without any kind of

> >> +	 * offloading being enabled, so keep it separated.

> >> +	 */

> >> +	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {

> >> +		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);

> >> +		struct tc_preempt_qopt_offload qopt = { };

> >> +

> >> +		if (preempt == U32_MAX) {

> >> +			NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible");

> >> +			err = -EINVAL;

> >> +			goto free_sched;

> >> +		}

> >> +

> >> +		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);

> >> +

> >> +		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,

> >> +						    &qopt);

> >> +		if (err)

> >> +			goto free_sched;

> >> +

> >> +		q->preemptible_tcs = preempt;

> >> +	}

> >> +  

> >

> > First I'm interested in the means: why check for preempt == U32_MAX when

> > you determine that all traffic classes are preemptible? What if less

> > than 32 traffic classes are used by the netdev? The check will be

> > bypassed, won't it?  

> 

> Good catch :-)

> 

> I wanted to have this (at least one express queue) handled in a

> centralized way, but perhaps this should be handled best per driver.


Centralized is good. Much easier than reviewing N drivers to make sure
they all behave the same, and right.
Vinicius Costa Gomes Jan. 29, 2021, 11:12 p.m. UTC | #4
Jakub Kicinski <kuba@kernel.org> writes:
>> > First I'm interested in the means: why check for preempt == U32_MAX when

>> > you determine that all traffic classes are preemptible? What if less

>> > than 32 traffic classes are used by the netdev? The check will be

>> > bypassed, won't it?  

>> 

>> Good catch :-)

>> 

>> I wanted to have this (at least one express queue) handled in a

>> centralized way, but perhaps this should be handled best per driver.

>

> Centralized is good. Much easier than reviewing N drivers to make sure

> they all behave the same, and right.


The issue is that it seems that not all drivers/hw have the same
limitation: that at least one queue needs to be configured as
express/not preemptible.

That's the point I was trying to make when I suggested for the check to
be done per-driver, different limitations.


Cheers,
-- 
Vinicius
Vladimir Oltean Jan. 29, 2021, 11:20 p.m. UTC | #5
On Fri, Jan 29, 2021 at 01:13:24PM -0800, Vinicius Costa Gomes wrote:
> > Secondly, why should at least one queue be preemptible? What's wrong

> > with frame preemption being triggered by a tc-taprio window smaller than

> > the packet size? This can happen regardless of traffic class.

>

> It's the opposite, at least one queue needs to be marked

> express/non-preemptible.


I meant to ask why should at least one queue be express. The second part
of the question remains valid.

> But as I said above, perhaps this should be handled in a per-driver

> way. I will remove this from taprio.

>

> I think removing this check/limitation from taprio should solve the

> second part of your question, right?


Nope. Can you point me to either 802.1Q or 802.3 saying that at least
one priority should go to the express MAC?
Vinicius Costa Gomes Jan. 29, 2021, 11:42 p.m. UTC | #6
Vladimir Oltean <vladimir.oltean@nxp.com> writes:

> On Fri, Jan 29, 2021 at 01:13:24PM -0800, Vinicius Costa Gomes wrote:

>> > Secondly, why should at least one queue be preemptible? What's wrong

>> > with frame preemption being triggered by a tc-taprio window smaller than

>> > the packet size? This can happen regardless of traffic class.

>>

>> It's the opposite, at least one queue needs to be marked

>> express/non-preemptible.

>

> I meant to ask why should at least one queue be express. The second part

> of the question remains valid.

>

>> But as I said above, perhaps this should be handled in a per-driver

>> way. I will remove this from taprio.

>>

>> I think removing this check/limitation from taprio should solve the

>> second part of your question, right?

>

> Nope. Can you point me to either 802.1Q or 802.3 saying that at least

> one priority should go to the express MAC?


After re-reading Anex Q, I know it's informative, and
thinking/remembering things a bit better, it seems that the standard
only defines preemption of express queues/priorities over preemptible
traffic. The standard doesn't talk about preemptible pririoties
preempting other preemptible priorities.

So, if there's no express queue, no preemption is going to happen, so it
shouldn't be enabled, to avoid like an invalid/useless state.

So I am going to take back my previous email: this seems like it's
better to be kept in a centralized place.


Cheers,
-- 
Vinicius
Vladimir Oltean Jan. 30, 2021, 12:25 a.m. UTC | #7
On Fri, Jan 29, 2021 at 03:42:05PM -0800, Vinicius Costa Gomes wrote:
> >> But as I said above, perhaps this should be handled in a per-driver

> >> way. I will remove this from taprio.

> >>

> >> I think removing this check/limitation from taprio should solve the

> >> second part of your question, right?

> >

> > Nope. Can you point me to either 802.1Q or 802.3 saying that at least

> > one priority should go to the express MAC?

> 

> After re-reading Anex Q, I know it's informative, and

> thinking/remembering things a bit better, it seems that the standard

> only defines preemption of express queues/priorities over preemptible

> traffic. The standard doesn't talk about preemptible pririoties

> preempting other preemptible priorities.

> 

> So, if there's no express queue, no preemption is going to happen, so it

> shouldn't be enabled, to avoid like an invalid/useless state.

> 

> So I am going to take back my previous email: this seems like it's

> better to be kept in a centralized place.


Sorry, what?
If you go to the Figure 99-5 Transmit Processing state diagram in IEEE
802.3 (which is what a hardware designer will actually implement), the
pMAC's transition from the PREEMPTABLE_TX state to TX_MCRC is gated by
the "preempt" variable.

Whose definition is:

Boolean that is TRUE when a preemptable packet is to be preempted.
The value of preempt is:
pAllow * (eTx + hold) * preemptableFragSize * MIN_REMAIN

(where * is logical AND, + is logical OR - editor's note).

The important part is (eTx + hold). It means that either the eMAC wants
to transmit, or a HOLD request has been emitted. Otherwise said, HOLD
requests should always trigger a preemption regardless of whether the
eMAC has packets to send or not.

I believe that Michael Teener's peristaltic shaper which is described in
Annex T "Cyclic queuing and forwarding" (right below what you linked me)
makes use of frame preemption triggered by HOLD requests, in order to
reduce the interference time as much as possible.
That's the use case I was actually thinking of when I asked. See slide
11 here:
https://www.ieee802.org/1/files/public/docs2012/new-avb-mjt-back-to-the-future-1112-v01.pdf
Jakub Kicinski Jan. 30, 2021, 12:27 a.m. UTC | #8
On Fri, 29 Jan 2021 15:12:58 -0800 Vinicius Costa Gomes wrote:
> Jakub Kicinski <kuba@kernel.org> writes:

> >> Good catch :-)

> >> 

> >> I wanted to have this (at least one express queue) handled in a

> >> centralized way, but perhaps this should be handled best per driver.  

> >

> > Centralized is good. Much easier than reviewing N drivers to make sure

> > they all behave the same, and right.  

> 

> The issue is that it seems that not all drivers/hw have the same

> limitation: that at least one queue needs to be configured as

> express/not preemptible.


Oh, I thought that was something driven by the standard.
For HW specific checks driver doing it is obviously fine.

> That's the point I was trying to make when I suggested for the check to

> be done per-driver, different limitations.
diff mbox series

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ef517254367d..b9ec0a2007b3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -858,6 +858,7 @@  enum tc_setup_type {
 	TC_SETUP_QDISC_ETS,
 	TC_SETUP_QDISC_TBF,
 	TC_SETUP_QDISC_FIFO,
+	TC_SETUP_PREEMPT,
 };
 
 /* These structures hold the attributes of bpf state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 15b1b30f454e..be5ff1535332 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -183,6 +183,10 @@  struct tc_taprio_qopt_offload {
 	struct tc_taprio_sched_entry entries[];
 };
 
+struct tc_preempt_qopt_offload {
+	u32 preemptible_queues;
+};
+
 /* Reference counting */
 struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload
 						  *offload);
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 9e7c2c607845..9ca9d2e55557 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1240,6 +1240,7 @@  enum {
 	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
 	TCA_TAPRIO_ATTR_FLAGS, /* u32 */
 	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */
+	TCA_TAPRIO_ATTR_PREEMPT_TCS, /* u32 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
 
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 8287894541e3..41503aff6572 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -64,6 +64,7 @@  struct taprio_sched {
 	struct Qdisc **qdiscs;
 	struct Qdisc *root;
 	u32 flags;
+	u32 preemptible_tcs;
 	enum tk_offsets tk_offset;
 	int clockid;
 	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
@@ -776,6 +777,7 @@  static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
 	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },
 	[TCA_TAPRIO_ATTR_FLAGS]                      = { .type = NLA_U32 },
 	[TCA_TAPRIO_ATTR_TXTIME_DELAY]		     = { .type = NLA_U32 },
+	[TCA_TAPRIO_ATTR_PREEMPT_TCS]                = { .type = NLA_U32 },
 };
 
 static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,
@@ -1268,6 +1270,7 @@  static int taprio_disable_offload(struct net_device *dev,
 				  struct netlink_ext_ack *extack)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
+	struct tc_preempt_qopt_offload preempt = { };
 	struct tc_taprio_qopt_offload *offload;
 	int err;
 
@@ -1286,13 +1289,15 @@  static int taprio_disable_offload(struct net_device *dev,
 	offload->enable = 0;
 
 	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload);
-	if (err < 0) {
+	if (err < 0)
 		NL_SET_ERR_MSG(extack,
-			       "Device failed to disable offload");
-		goto out;
-	}
+			       "Device failed to disable taprio offload");
+
+	err = ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, &preempt);
+	if (err < 0)
+		NL_SET_ERR_MSG(extack,
+			       "Device failed to disable frame preemption offload");
 
-out:
 	taprio_offload_free(offload);
 
 	return err;
@@ -1509,6 +1514,29 @@  static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 					       mqprio->prio_tc_map[i]);
 	}
 
+	/* It's valid to enable frame preemption without any kind of
+	 * offloading being enabled, so keep it separated.
+	 */
+	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {
+		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);
+		struct tc_preempt_qopt_offload qopt = { };
+
+		if (preempt == U32_MAX) {
+			NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible");
+			err = -EINVAL;
+			goto free_sched;
+		}
+
+		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);
+
+		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,
+						    &qopt);
+		if (err)
+			goto free_sched;
+
+		q->preemptible_tcs = preempt;
+	}
+
 	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
 		err = taprio_enable_offload(dev, q, new_admin, extack);
 	else
@@ -1665,6 +1693,7 @@  static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
 	 */
 	q->clockid = -1;
 	q->flags = TAPRIO_FLAGS_INVALID;
+	q->preemptible_tcs = U32_MAX;
 
 	spin_lock(&taprio_list_lock);
 	list_add(&q->taprio_list, &taprio_list);
@@ -1848,6 +1877,10 @@  static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 	if (q->flags && nla_put_u32(skb, TCA_TAPRIO_ATTR_FLAGS, q->flags))
 		goto options_error;
 
+	if (q->preemptible_tcs != U32_MAX &&
+	    nla_put_u32(skb, TCA_TAPRIO_ATTR_PREEMPT_TCS, q->preemptible_tcs))
+		goto options_error;
+
 	if (q->txtime_delay &&
 	    nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
 		goto options_error;