
[net-next,1/7] net: sched: Add a trap-and-forward action

Message ID 20210408133829.2135103-2-petrm@nvidia.com
State New
Series tc: Introduce a trap-and-forward action

Commit Message

Petr Machata April 8, 2021, 1:38 p.m. UTC
The TC action "trap" is used to instruct the HW datapath to drop the
matched packet and transfer it for processing in the SW pipeline. If
it is instead desirable to forward the packet and transfer a _copy_ to
the SW pipeline, there is no practical way to achieve that.

To that end, add a new generic action, trap_fwd. In the SW pipeline,
it is equivalent to OK. When offloaded, it should send a copy of the
packet to the host, but unlike trap, it should not drop the packet in
the HW datapath.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 include/uapi/linux/pkt_cls.h       |  6 +++++-
 net/core/dev.c                     |  2 ++
 net/sched/act_bpf.c                | 13 +++++++++++--
 net/sched/cls_bpf.c                |  1 +
 net/sched/sch_dsmark.c             |  1 +
 tools/include/uapi/linux/pkt_cls.h |  6 +++++-
 6 files changed, 25 insertions(+), 4 deletions(-)

Comments

Petr Machata April 9, 2021, 11:03 a.m. UTC | #1
Jamal Hadi Salim <jhs@mojatatu.com> writes:

> I am concerned about adding new opcodes which only make sense if you
> offload (or make sense only if you are running in s/w).
>
> Those opcodes are intended to be generic abstractions so the dispatcher
> can decide what to do next. Adding things that are specific only
> to scenarios of hardware offload removes that opaqueness.
> I must have missed the discussion on ACT_TRAP, because the same
> issue applies there, i.e. it shouldn't be an opcode. For details see:
> https://people.netfilter.org/pablo/netdev0.1/papers/Linux-Traffic-Control-Classifier-Action-Subsystem-Architecture.pdf

Trap has been in since 4.13, so 2017ish. It's done and dusted at this
point.

> IMO:
> It seems to me there are two actions here encapsulated in one.
> The first is to "trap" and the second is to "drop".
>
> This is semantically no different from, say, a "mirror and drop"
> offload being expressed with "skip_sw".
>
> Does Spectrum not support multiple actions?
> e.g. with a policy like:
>  match blah action trap action drop skip_sw

Trap drops implicitly. We need a "trap, but don't drop". Expressed in
terms of existing actions, it would be "mirred egress mirror dev
$cpu_port". But I don't know how to express $cpu_port other than,
again, by a HW-specific magic token.

Patch

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 025c40fef93d..a1bbccb88e67 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -72,7 +72,11 @@  enum {
 				   * the skb and act like everything
 				   * is alright.
 				   */
-#define TC_ACT_VALUE_MAX	TC_ACT_TRAP
+#define TC_ACT_TRAP_FWD		9 /* For hw path, this means "send a copy
+				   * of the packet to the cpu". For sw
+				   * datapath, this is like TC_ACT_OK.
+				   */
+#define TC_ACT_VALUE_MAX	TC_ACT_TRAP_FWD
 
 /* There is a special kind of actions called "extended actions",
  * which need a value parameter. These have a local opcode located in
diff --git a/net/core/dev.c b/net/core/dev.c
index 9d1a8fac793f..f0b8c16dbf12 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3975,6 +3975,7 @@  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	switch (tcf_classify(skb, miniq->filter_list, &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
+	case TC_ACT_TRAP_FWD:
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
@@ -5083,6 +5084,7 @@  sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 				     &cl_res, false)) {
 	case TC_ACT_OK:
 	case TC_ACT_RECLASSIFY:
+	case TC_ACT_TRAP_FWD:
 		skb->tc_index = TC_H_MIN(cl_res.classid);
 		break;
 	case TC_ACT_SHOT:
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index e48e980c3b93..be2a51c6f84e 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -54,8 +54,16 @@  static int tcf_bpf_act(struct sk_buff *skb, const struct tc_action *act,
 		bpf_compute_data_pointers(skb);
 		filter_res = BPF_PROG_RUN(filter, skb);
 	}
-	if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK)
-		skb_orphan(skb);
+	if (skb_sk_is_prefetched(skb)) {
+		switch (filter_res) {
+		case TC_ACT_OK:
+		case TC_ACT_TRAP_FWD:
+			break;
+		default:
+			skb_orphan(skb);
+			break;
+		}
+	}
 	rcu_read_unlock();
 
 	/* A BPF program may overwrite the default action opcode.
@@ -72,6 +80,7 @@  static int tcf_bpf_act(struct sk_buff *skb, const struct tc_action *act,
 	case TC_ACT_PIPE:
 	case TC_ACT_RECLASSIFY:
 	case TC_ACT_OK:
+	case TC_ACT_TRAP_FWD:
 	case TC_ACT_REDIRECT:
 		action = filter_res;
 		break;
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 6e3e63db0e01..5fd96cf2dca7 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -69,6 +69,7 @@  static int cls_bpf_exec_opcode(int code)
 	case TC_ACT_SHOT:
 	case TC_ACT_STOLEN:
 	case TC_ACT_TRAP:
+	case TC_ACT_TRAP_FWD:
 	case TC_ACT_REDIRECT:
 	case TC_ACT_UNSPEC:
 		return code;
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index cd2748e2d4a2..054a06bd9dc8 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -258,6 +258,7 @@  static int dsmark_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			goto drop;
 #endif
 		case TC_ACT_OK:
+		case TC_ACT_TRAP_FWD:
 			skb->tc_index = TC_H_MIN(res.classid);
 			break;
 
diff --git a/tools/include/uapi/linux/pkt_cls.h b/tools/include/uapi/linux/pkt_cls.h
index 12153771396a..ccfa424dfeaf 100644
--- a/tools/include/uapi/linux/pkt_cls.h
+++ b/tools/include/uapi/linux/pkt_cls.h
@@ -45,7 +45,11 @@  enum {
 				   * the skb and act like everything
 				   * is alright.
 				   */
-#define TC_ACT_VALUE_MAX	TC_ACT_TRAP
+#define TC_ACT_TRAP_FWD		9 /* For hw path, this means "send a copy
+				   * of the packet to the cpu". For sw
+				   * datapath, this is like TC_ACT_OK.
+				   */
+#define TC_ACT_VALUE_MAX	TC_ACT_TRAP_FWD
 
 /* There is a special kind of actions called "extended actions",
  * which need a value parameter. These have a local opcode located in