diff mbox series

[net-next] net: dsa: mv88e6xxx: Allow dynamic reconfiguration of tag protocol

Message ID 20210323102326.3677940-1-tobias@waldekranz.com
State Superseded
Headers show
Series [net-next] net: dsa: mv88e6xxx: Allow dynamic reconfiguration of tag protocol | expand

Commit Message

Tobias Waldekranz March 23, 2021, 10:23 a.m. UTC
All devices are capable of using regular DSA tags. Support for
Ethertyped DSA tags sort into three categories:

1. No support. Older chips fall into this category.

2. Full support. Datasheet explicitly supports configuring the CPU
   port to receive FORWARDs with a DSA tag.

3. Undocumented support. Datasheet lists the configuration from
   category 2 as "reserved for future use", but does empirically
   behave like a category 2 device.

Because there are ethernet controllers that do not handle regular DSA
tags in all cases, it is sometimes preferable to rely on the
undocumented behavior, as the alternative is a very crippled
system. But, in those cases, make sure to log the fact that an
undocumented feature has been enabled.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---

In a system using an NXP T1023 SoC connected to a 6390X switch, we
noticed that TO_CPU frames where not reaching the CPU. This only
happened on hardware port 8. Looking at the DSA master interface
(dpaa-ethernet) we could see that an Rx error counter was bumped at
the same rate. The logs indicated a parser error.

It just so happens that a TO_CPU coming in on device 0, port 8, will
result in the first two bytes of the DSA tag being one of:

00 40
00 44
00 46

My guess is that since these values look like 802.3 length fields, the
controller's parser will signal an error if the frame length does not
match what is in the header.

As a workaround, switching to EDSA (thereby always having a proper
EtherType in the frame) solves the issue.

 drivers/net/dsa/mv88e6xxx/chip.c | 41 +++++++++++++++++++++++++++++---
 drivers/net/dsa/mv88e6xxx/chip.h |  3 +++
 2 files changed, 41 insertions(+), 3 deletions(-)

Comments

Andrew Lunn March 23, 2021, 12:41 p.m. UTC | #1
On Tue, Mar 23, 2021 at 11:23:26AM +0100, Tobias Waldekranz wrote:
> All devices are capable of using regular DSA tags. Support for
> Ethertyped DSA tags sort into three categories:
> 
> 1. No support. Older chips fall into this category.
> 
> 2. Full support. Datasheet explicitly supports configuring the CPU
>    port to receive FORWARDs with a DSA tag.
> 
> 3. Undocumented support. Datasheet lists the configuration from
>    category 2 as "reserved for future use", but does empirically
>    behave like a category 2 device.
> 
> Because there are ethernet controllers that do not handle regular DSA
> tags in all cases, it is sometimes preferable to rely on the
> undocumented behavior, as the alternative is a very crippled
> system. But, in those cases, make sure to log the fact that an
> undocumented feature has been enabled.

Hi Tobias

I wonder if dynamic reconfiguration is the correct solution here. By
default it will be wrong for this board, and you need user space to
flip it.

Maybe a DT property would be better. Extend dsa_switch_parse_of() to
look for the optional property dsa,tag-protocol, a string containing
the name of the tag ops to be used.

    Andrew
Tobias Waldekranz March 23, 2021, 2:48 p.m. UTC | #2
On Tue, Mar 23, 2021 at 13:35, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Tue, Mar 23, 2021 at 11:23:26AM +0100, Tobias Waldekranz wrote:
>> All devices are capable of using regular DSA tags. Support for
>> Ethertyped DSA tags sort into three categories:
>> 
>> 1. No support. Older chips fall into this category.
>> 
>> 2. Full support. Datasheet explicitly supports configuring the CPU
>>    port to receive FORWARDs with a DSA tag.
>> 
>> 3. Undocumented support. Datasheet lists the configuration from
>>    category 2 as "reserved for future use", but does empirically
>>    behave like a category 2 device.
>> 
>> Because there are ethernet controllers that do not handle regular DSA
>> tags in all cases, it is sometimes preferable to rely on the
>> undocumented behavior, as the alternative is a very crippled
>> system. But, in those cases, make sure to log the fact that an
>> undocumented feature has been enabled.
>> 
>> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
>> ---
>> 
>> In a system using an NXP T1023 SoC connected to a 6390X switch, we
>> noticed that TO_CPU frames where not reaching the CPU. This only
>> happened on hardware port 8. Looking at the DSA master interface
>> (dpaa-ethernet) we could see that an Rx error counter was bumped at
>> the same rate. The logs indicated a parser error.
>> 
>> It just so happens that a TO_CPU coming in on device 0, port 8, will
>> result in the first two bytes of the DSA tag being one of:
>> 
>> 00 40
>> 00 44
>> 00 46
>> 
>> My guess is that since these values look like 802.3 length fields, the
>> controller's parser will signal an error if the frame length does not
>> match what is in the header.
>
> Interesting assumption.
> Could you please try this patch out, just for my amusement? It is only
> compile-tested.
>
> -----------------------------[ cut here ]-----------------------------
> From ab75b63d1bfeccc3032060e6e6dbd2ea8f1d31ed Mon Sep 17 00:00:00 2001
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> Date: Tue, 23 Mar 2021 13:03:34 +0200
> Subject: [PATCH] fsl/fman: ignore RX parse errors on ports that are DSA
>  masters
>
> Tobias reports that when an FMan port receives a Marvell DSA tagged
> frame (normal/legacy tag, not EtherType tag) which is a TO_CPU frame
> coming in from device 0, port 8, that frame will be dropped.
>
> It appears that the first two bytes of this particular DSA tag (which
> overlap with what the FMan parser interprets as an EtherType/Length
> field) look like one of the possible values below:
>
> 00 40
> 00 44
> 00 46
>
> Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 65 ++++++++++---------
>  .../net/ethernet/freescale/fman/fman_port.c   |  8 ++-
>  .../net/ethernet/freescale/fman/fman_port.h   |  2 +-
>  3 files changed, 42 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> index 720dc99bd1fc..069d38cd63c5 100644
> --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> @@ -55,6 +55,7 @@
>  #include <linux/phy_fixed.h>
>  #include <linux/bpf.h>
>  #include <linux/bpf_trace.h>
> +#include <net/dsa.h>
>  #include <soc/fsl/bman.h>
>  #include <soc/fsl/qman.h>
>  #include "fman.h"
> @@ -2447,34 +2448,6 @@ static inline int dpaa_eth_napi_schedule(struct dpaa_percpu_priv *percpu_priv,
>  	return 0;
>  }
>  
> -static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
> -					      struct qman_fq *fq,
> -					      const struct qm_dqrr_entry *dq,
> -					      bool sched_napi)
> -{
> -	struct dpaa_fq *dpaa_fq = container_of(fq, struct dpaa_fq, fq_base);
> -	struct dpaa_percpu_priv *percpu_priv;
> -	struct net_device *net_dev;
> -	struct dpaa_bp *dpaa_bp;
> -	struct dpaa_priv *priv;
> -
> -	net_dev = dpaa_fq->net_dev;
> -	priv = netdev_priv(net_dev);
> -	dpaa_bp = dpaa_bpid2pool(dq->fd.bpid);
> -	if (!dpaa_bp)
> -		return qman_cb_dqrr_consume;
> -
> -	percpu_priv = this_cpu_ptr(priv->percpu_priv);
> -
> -	if (dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi))
> -		return qman_cb_dqrr_stop;
> -
> -	dpaa_eth_refill_bpools(priv);
> -	dpaa_rx_error(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
> -
> -	return qman_cb_dqrr_consume;
> -}
> -
>  static int dpaa_xdp_xmit_frame(struct net_device *net_dev,
>  			       struct xdp_frame *xdpf)
>  {
> @@ -2699,7 +2672,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
>  		return qman_cb_dqrr_consume;
>  	}
>  
> -	if (unlikely(fd_status & FM_FD_STAT_RX_ERRORS) != 0) {
> +	if (!netdev_uses_dsa(net_dev) && unlikely(fd_status & FM_FD_STAT_RX_ERRORS) != 0) {
>  		if (net_ratelimit())
>  			netif_warn(priv, hw, net_dev, "FD status = 0x%08x\n",
>  				   fd_status & FM_FD_STAT_RX_ERRORS);
> @@ -2802,6 +2775,37 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
>  	return qman_cb_dqrr_consume;
>  }
>  
> +static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
> +					      struct qman_fq *fq,
> +					      const struct qm_dqrr_entry *dq,
> +					      bool sched_napi)
> +{
> +	struct dpaa_fq *dpaa_fq = container_of(fq, struct dpaa_fq, fq_base);
> +	struct dpaa_percpu_priv *percpu_priv;
> +	struct net_device *net_dev;
> +	struct dpaa_bp *dpaa_bp;
> +	struct dpaa_priv *priv;
> +
> +	net_dev = dpaa_fq->net_dev;
> +	if (netdev_uses_dsa(net_dev))
> +		return rx_default_dqrr(portal, fq, dq, sched_napi);
> +
> +	priv = netdev_priv(net_dev);
> +	dpaa_bp = dpaa_bpid2pool(dq->fd.bpid);
> +	if (!dpaa_bp)
> +		return qman_cb_dqrr_consume;
> +
> +	percpu_priv = this_cpu_ptr(priv->percpu_priv);
> +
> +	if (dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi))
> +		return qman_cb_dqrr_stop;
> +
> +	dpaa_eth_refill_bpools(priv);
> +	dpaa_rx_error(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
> +
> +	return qman_cb_dqrr_consume;
> +}
> +
>  static enum qman_cb_dqrr_result conf_error_dqrr(struct qman_portal *portal,
>  						struct qman_fq *fq,
>  						const struct qm_dqrr_entry *dq,
> @@ -2955,6 +2959,7 @@ static int dpaa_phy_init(struct net_device *net_dev)
>  
>  static int dpaa_open(struct net_device *net_dev)
>  {
> +	bool ignore_errors = netdev_uses_dsa(net_dev);
>  	struct mac_device *mac_dev;
>  	struct dpaa_priv *priv;
>  	int err, i;
> @@ -2968,7 +2973,7 @@ static int dpaa_open(struct net_device *net_dev)
>  		goto phy_init_failed;
>  
>  	for (i = 0; i < ARRAY_SIZE(mac_dev->port); i++) {
> -		err = fman_port_enable(mac_dev->port[i]);
> +		err = fman_port_enable(mac_dev->port[i], ignore_errors);
>  		if (err)
>  			goto mac_start_failed;
>  	}
> diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
> index d9baac0dbc7d..763faec11f5c 100644
> --- a/drivers/net/ethernet/freescale/fman/fman_port.c
> +++ b/drivers/net/ethernet/freescale/fman/fman_port.c
> @@ -106,6 +106,7 @@
>  #define BMI_EBD_EN				0x80000000
>  
>  #define BMI_PORT_CFG_EN				0x80000000
> +#define BMI_PORT_CFG_FDOVR			0x02000000
>  
>  #define BMI_PORT_STATUS_BSY			0x80000000
>  
> @@ -1629,7 +1630,7 @@ int fman_port_disable(struct fman_port *port)
>  	}
>  
>  	/* Disable BMI */
> -	tmp = ioread32be(bmi_cfg_reg) & ~BMI_PORT_CFG_EN;
> +	tmp = ioread32be(bmi_cfg_reg) & ~(BMI_PORT_CFG_EN | BMI_PORT_CFG_FDOVR);
>  	iowrite32be(tmp, bmi_cfg_reg);
>  
>  	/* Wait for graceful stop end */
> @@ -1655,6 +1656,7 @@ EXPORT_SYMBOL(fman_port_disable);
>  /**
>   * fman_port_enable
>   * @port:	A pointer to a FM Port module.
> + * @ignore_errors: If set, do not discard frames received with errors.
>   *
>   * A runtime routine provided to allow disable/enable of port.
>   *
> @@ -1662,7 +1664,7 @@ EXPORT_SYMBOL(fman_port_disable);
>   *
>   * Return: 0 on success; Error code otherwise.
>   */
> -int fman_port_enable(struct fman_port *port)
> +int fman_port_enable(struct fman_port *port, bool ignore_errors)
>  {
>  	u32 __iomem *bmi_cfg_reg;
>  	u32 tmp;
> @@ -1692,6 +1694,8 @@ int fman_port_enable(struct fman_port *port)
>  
>  	/* Enable BMI */
>  	tmp = ioread32be(bmi_cfg_reg) | BMI_PORT_CFG_EN;
> +	if (ignore_errors)
> +		tmp |= BMI_PORT_CFG_FDOVR;
>  	iowrite32be(tmp, bmi_cfg_reg);
>  
>  	return 0;
> diff --git a/drivers/net/ethernet/freescale/fman/fman_port.h b/drivers/net/ethernet/freescale/fman/fman_port.h
> index 82f12661a46d..0928361b0e73 100644
> --- a/drivers/net/ethernet/freescale/fman/fman_port.h
> +++ b/drivers/net/ethernet/freescale/fman/fman_port.h
> @@ -147,7 +147,7 @@ int fman_port_cfg_buf_prefix_content(struct fman_port *port,
>  
>  int fman_port_disable(struct fman_port *port);
>  
> -int fman_port_enable(struct fman_port *port);
> +int fman_port_enable(struct fman_port *port, bool ignore_errors);
>  
>  u32 fman_port_get_qman_channel_id(struct fman_port *port);
>  
> -----------------------------[ cut here ]-----------------------------
>
> The netdev_uses_dsa thing is a bit trashy, I think that a more polished
> version should rather set NETIF_F_RXALL for the DSA master, and have the
> dpaa driver act upon that. But first I'm curious if it works.

It does work. Thank you!

Does setting RXALL mean that the master would accept frames with a bad
FCS as well? If so, would that mean that we would have to verify it in
software?

>> 
>> As a workaround, switching to EDSA (thereby always having a proper
>> EtherType in the frame) solves the issue.
>
> So basically every user needs to change the tag protocol manually to be
> able to receive from port 8? Not sure if that's too friendly.

No it is not friendly at all. My goal was to add it as a device-tree
property, but for reasons I will detail in my answer to Andrew, I did
not manage to figure out a good way to do that. Happy to take
suggestions.

>>  drivers/net/dsa/mv88e6xxx/chip.c | 41 +++++++++++++++++++++++++++++---
>>  drivers/net/dsa/mv88e6xxx/chip.h |  3 +++
>>  2 files changed, 41 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
>> index 95f07fcd4f85..e7ec883d5f6b 100644
>> --- a/drivers/net/dsa/mv88e6xxx/chip.c
>> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
>> @@ -2531,10 +2531,10 @@ static int mv88e6xxx_setup_port_mode(struct mv88e6xxx_chip *chip, int port)
>>  		return mv88e6xxx_set_port_mode_normal(chip, port);
>>  
>>  	/* Setup CPU port mode depending on its supported tag format */
>> -	if (chip->info->tag_protocol == DSA_TAG_PROTO_DSA)
>> +	if (chip->tag_protocol == DSA_TAG_PROTO_DSA)
>>  		return mv88e6xxx_set_port_mode_dsa(chip, port);
>>  
>> -	if (chip->info->tag_protocol == DSA_TAG_PROTO_EDSA)
>> +	if (chip->tag_protocol == DSA_TAG_PROTO_EDSA)
>>  		return mv88e6xxx_set_port_mode_edsa(chip, port);
>>  
>>  	return -EINVAL;
>> @@ -5564,7 +5564,39 @@ static enum dsa_tag_protocol mv88e6xxx_get_tag_protocol(struct dsa_switch *ds,
>>  {
>>  	struct mv88e6xxx_chip *chip = ds->priv;
>>  
>> -	return chip->info->tag_protocol;
>> +	return chip->tag_protocol;
>> +}
>> +
>> +static int mv88e6xxx_change_tag_protocol(struct dsa_switch *ds, int port,
>> +					 enum dsa_tag_protocol proto)
>> +{
>> +	struct mv88e6xxx_chip *chip = ds->priv;
>> +	enum dsa_tag_protocol old_protocol;
>> +	int err;
>> +
>> +	switch (proto) {
>> +	case DSA_TAG_PROTO_EDSA:
>> +		if (chip->info->tag_protocol != DSA_TAG_PROTO_EDSA)
>> +			dev_warn(chip->dev, "Relying on undocumented EDSA tagging behavior\n");
>> +
>> +		break;
>> +	case DSA_TAG_PROTO_DSA:
>> +		break;
>> +	default:
>> +		return -EPROTONOSUPPORT;
>> +	}
>> +
>> +	old_protocol = chip->tag_protocol;
>> +	chip->tag_protocol = proto;
>> +
>> +	mv88e6xxx_reg_lock(chip);
>> +	err = mv88e6xxx_setup_port_mode(chip, port);
>> +	mv88e6xxx_reg_unlock(chip);
>> +
>> +	if (err)
>> +		chip->tag_protocol = old_protocol;
>> +
>> +	return err;
>>  }
>>  
>>  static int mv88e6xxx_port_mdb_add(struct dsa_switch *ds, int port,
>> @@ -6029,6 +6061,7 @@ static int mv88e6xxx_crosschip_lag_leave(struct dsa_switch *ds, int sw_index,
>>  
>>  static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
>>  	.get_tag_protocol	= mv88e6xxx_get_tag_protocol,
>> +	.change_tag_protocol	= mv88e6xxx_change_tag_protocol,
>>  	.setup			= mv88e6xxx_setup,
>>  	.teardown		= mv88e6xxx_teardown,
>>  	.phylink_validate	= mv88e6xxx_validate,
>> @@ -6209,6 +6242,8 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
>>  	if (err)
>>  		goto out;
>>  
>> +	chip->tag_protocol = chip->info->tag_protocol;
>> +
>>  	mv88e6xxx_phy_init(chip);
>>  
>>  	if (chip->info->ops->get_eeprom) {
>> diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
>> index bce6e0dc8535..96b775f3fda2 100644
>> --- a/drivers/net/dsa/mv88e6xxx/chip.h
>> +++ b/drivers/net/dsa/mv88e6xxx/chip.h
>> @@ -261,6 +261,9 @@ struct mv88e6xxx_region_priv {
>>  struct mv88e6xxx_chip {
>>  	const struct mv88e6xxx_info *info;
>>  
>> +	/* Currently configured tagging protocol */
>> +	enum dsa_tag_protocol tag_protocol;
>> +
>>  	/* The dsa_switch this private structure is related to */
>>  	struct dsa_switch *ds;
>>  
>> -- 
>> 2.25.1
>>
Florian Fainelli March 23, 2021, 4:30 p.m. UTC | #3
On 3/23/2021 7:48 AM, Tobias Waldekranz wrote:
> On Tue, Mar 23, 2021 at 13:35, Vladimir Oltean <olteanv@gmail.com> wrote:
>> On Tue, Mar 23, 2021 at 11:23:26AM +0100, Tobias Waldekranz wrote:
>>> All devices are capable of using regular DSA tags. Support for
>>> Ethertyped DSA tags sort into three categories:
>>>
>>> 1. No support. Older chips fall into this category.
>>>
>>> 2. Full support. Datasheet explicitly supports configuring the CPU
>>>    port to receive FORWARDs with a DSA tag.
>>>
>>> 3. Undocumented support. Datasheet lists the configuration from
>>>    category 2 as "reserved for future use", but does empirically
>>>    behave like a category 2 device.
>>>
>>> Because there are ethernet controllers that do not handle regular DSA
>>> tags in all cases, it is sometimes preferable to rely on the
>>> undocumented behavior, as the alternative is a very crippled
>>> system. But, in those cases, make sure to log the fact that an
>>> undocumented feature has been enabled.
>>>
>>> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
>>> ---
>>>
>>> In a system using an NXP T1023 SoC connected to a 6390X switch, we
>>> noticed that TO_CPU frames where not reaching the CPU. This only
>>> happened on hardware port 8. Looking at the DSA master interface
>>> (dpaa-ethernet) we could see that an Rx error counter was bumped at
>>> the same rate. The logs indicated a parser error.
>>>
>>> It just so happens that a TO_CPU coming in on device 0, port 8, will
>>> result in the first two bytes of the DSA tag being one of:
>>>
>>> 00 40
>>> 00 44
>>> 00 46
>>>
>>> My guess is that since these values look like 802.3 length fields, the
>>> controller's parser will signal an error if the frame length does not
>>> match what is in the header.
>>
>> Interesting assumption.
>> Could you please try this patch out, just for my amusement? It is only
>> compile-tested.
>>
>> -----------------------------[ cut here ]-----------------------------
>> From ab75b63d1bfeccc3032060e6e6dbd2ea8f1d31ed Mon Sep 17 00:00:00 2001
>> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>> Date: Tue, 23 Mar 2021 13:03:34 +0200
>> Subject: [PATCH] fsl/fman: ignore RX parse errors on ports that are DSA
>>  masters
>>
>> Tobias reports that when an FMan port receives a Marvell DSA tagged
>> frame (normal/legacy tag, not EtherType tag) which is a TO_CPU frame
>> coming in from device 0, port 8, that frame will be dropped.
>>
>> It appears that the first two bytes of this particular DSA tag (which
>> overlap with what the FMan parser interprets as an EtherType/Length
>> field) look like one of the possible values below:
>>
>> 00 40
>> 00 44
>> 00 46
>>
>> Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
>> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
>> ---
>>  .../net/ethernet/freescale/dpaa/dpaa_eth.c    | 65 ++++++++++---------
>>  .../net/ethernet/freescale/fman/fman_port.c   |  8 ++-
>>  .../net/ethernet/freescale/fman/fman_port.h   |  2 +-
>>  3 files changed, 42 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
>> index 720dc99bd1fc..069d38cd63c5 100644
>> --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
>> +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
>> @@ -55,6 +55,7 @@
>>  #include <linux/phy_fixed.h>
>>  #include <linux/bpf.h>
>>  #include <linux/bpf_trace.h>
>> +#include <net/dsa.h>
>>  #include <soc/fsl/bman.h>
>>  #include <soc/fsl/qman.h>
>>  #include "fman.h"
>> @@ -2447,34 +2448,6 @@ static inline int dpaa_eth_napi_schedule(struct dpaa_percpu_priv *percpu_priv,
>>  	return 0;
>>  }
>>  
>> -static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
>> -					      struct qman_fq *fq,
>> -					      const struct qm_dqrr_entry *dq,
>> -					      bool sched_napi)
>> -{
>> -	struct dpaa_fq *dpaa_fq = container_of(fq, struct dpaa_fq, fq_base);
>> -	struct dpaa_percpu_priv *percpu_priv;
>> -	struct net_device *net_dev;
>> -	struct dpaa_bp *dpaa_bp;
>> -	struct dpaa_priv *priv;
>> -
>> -	net_dev = dpaa_fq->net_dev;
>> -	priv = netdev_priv(net_dev);
>> -	dpaa_bp = dpaa_bpid2pool(dq->fd.bpid);
>> -	if (!dpaa_bp)
>> -		return qman_cb_dqrr_consume;
>> -
>> -	percpu_priv = this_cpu_ptr(priv->percpu_priv);
>> -
>> -	if (dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi))
>> -		return qman_cb_dqrr_stop;
>> -
>> -	dpaa_eth_refill_bpools(priv);
>> -	dpaa_rx_error(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
>> -
>> -	return qman_cb_dqrr_consume;
>> -}
>> -
>>  static int dpaa_xdp_xmit_frame(struct net_device *net_dev,
>>  			       struct xdp_frame *xdpf)
>>  {
>> @@ -2699,7 +2672,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
>>  		return qman_cb_dqrr_consume;
>>  	}
>>  
>> -	if (unlikely(fd_status & FM_FD_STAT_RX_ERRORS) != 0) {
>> +	if (!netdev_uses_dsa(net_dev) && unlikely(fd_status & FM_FD_STAT_RX_ERRORS) != 0) {
>>  		if (net_ratelimit())
>>  			netif_warn(priv, hw, net_dev, "FD status = 0x%08x\n",
>>  				   fd_status & FM_FD_STAT_RX_ERRORS);
>> @@ -2802,6 +2775,37 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
>>  	return qman_cb_dqrr_consume;
>>  }
>>  
>> +static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
>> +					      struct qman_fq *fq,
>> +					      const struct qm_dqrr_entry *dq,
>> +					      bool sched_napi)
>> +{
>> +	struct dpaa_fq *dpaa_fq = container_of(fq, struct dpaa_fq, fq_base);
>> +	struct dpaa_percpu_priv *percpu_priv;
>> +	struct net_device *net_dev;
>> +	struct dpaa_bp *dpaa_bp;
>> +	struct dpaa_priv *priv;
>> +
>> +	net_dev = dpaa_fq->net_dev;
>> +	if (netdev_uses_dsa(net_dev))
>> +		return rx_default_dqrr(portal, fq, dq, sched_napi);
>> +
>> +	priv = netdev_priv(net_dev);
>> +	dpaa_bp = dpaa_bpid2pool(dq->fd.bpid);
>> +	if (!dpaa_bp)
>> +		return qman_cb_dqrr_consume;
>> +
>> +	percpu_priv = this_cpu_ptr(priv->percpu_priv);
>> +
>> +	if (dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi))
>> +		return qman_cb_dqrr_stop;
>> +
>> +	dpaa_eth_refill_bpools(priv);
>> +	dpaa_rx_error(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
>> +
>> +	return qman_cb_dqrr_consume;
>> +}
>> +
>>  static enum qman_cb_dqrr_result conf_error_dqrr(struct qman_portal *portal,
>>  						struct qman_fq *fq,
>>  						const struct qm_dqrr_entry *dq,
>> @@ -2955,6 +2959,7 @@ static int dpaa_phy_init(struct net_device *net_dev)
>>  
>>  static int dpaa_open(struct net_device *net_dev)
>>  {
>> +	bool ignore_errors = netdev_uses_dsa(net_dev);
>>  	struct mac_device *mac_dev;
>>  	struct dpaa_priv *priv;
>>  	int err, i;
>> @@ -2968,7 +2973,7 @@ static int dpaa_open(struct net_device *net_dev)
>>  		goto phy_init_failed;
>>  
>>  	for (i = 0; i < ARRAY_SIZE(mac_dev->port); i++) {
>> -		err = fman_port_enable(mac_dev->port[i]);
>> +		err = fman_port_enable(mac_dev->port[i], ignore_errors);
>>  		if (err)
>>  			goto mac_start_failed;
>>  	}
>> diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
>> index d9baac0dbc7d..763faec11f5c 100644
>> --- a/drivers/net/ethernet/freescale/fman/fman_port.c
>> +++ b/drivers/net/ethernet/freescale/fman/fman_port.c
>> @@ -106,6 +106,7 @@
>>  #define BMI_EBD_EN				0x80000000
>>  
>>  #define BMI_PORT_CFG_EN				0x80000000
>> +#define BMI_PORT_CFG_FDOVR			0x02000000
>>  
>>  #define BMI_PORT_STATUS_BSY			0x80000000
>>  
>> @@ -1629,7 +1630,7 @@ int fman_port_disable(struct fman_port *port)
>>  	}
>>  
>>  	/* Disable BMI */
>> -	tmp = ioread32be(bmi_cfg_reg) & ~BMI_PORT_CFG_EN;
>> +	tmp = ioread32be(bmi_cfg_reg) & ~(BMI_PORT_CFG_EN | BMI_PORT_CFG_FDOVR);
>>  	iowrite32be(tmp, bmi_cfg_reg);
>>  
>>  	/* Wait for graceful stop end */
>> @@ -1655,6 +1656,7 @@ EXPORT_SYMBOL(fman_port_disable);
>>  /**
>>   * fman_port_enable
>>   * @port:	A pointer to a FM Port module.
>> + * @ignore_errors: If set, do not discard frames received with errors.
>>   *
>>   * A runtime routine provided to allow disable/enable of port.
>>   *
>> @@ -1662,7 +1664,7 @@ EXPORT_SYMBOL(fman_port_disable);
>>   *
>>   * Return: 0 on success; Error code otherwise.
>>   */
>> -int fman_port_enable(struct fman_port *port)
>> +int fman_port_enable(struct fman_port *port, bool ignore_errors)
>>  {
>>  	u32 __iomem *bmi_cfg_reg;
>>  	u32 tmp;
>> @@ -1692,6 +1694,8 @@ int fman_port_enable(struct fman_port *port)
>>  
>>  	/* Enable BMI */
>>  	tmp = ioread32be(bmi_cfg_reg) | BMI_PORT_CFG_EN;
>> +	if (ignore_errors)
>> +		tmp |= BMI_PORT_CFG_FDOVR;
>>  	iowrite32be(tmp, bmi_cfg_reg);
>>  
>>  	return 0;
>> diff --git a/drivers/net/ethernet/freescale/fman/fman_port.h b/drivers/net/ethernet/freescale/fman/fman_port.h
>> index 82f12661a46d..0928361b0e73 100644
>> --- a/drivers/net/ethernet/freescale/fman/fman_port.h
>> +++ b/drivers/net/ethernet/freescale/fman/fman_port.h
>> @@ -147,7 +147,7 @@ int fman_port_cfg_buf_prefix_content(struct fman_port *port,
>>  
>>  int fman_port_disable(struct fman_port *port);
>>  
>> -int fman_port_enable(struct fman_port *port);
>> +int fman_port_enable(struct fman_port *port, bool ignore_errors);
>>  
>>  u32 fman_port_get_qman_channel_id(struct fman_port *port);
>>  
>> -----------------------------[ cut here ]-----------------------------
>>
>> The netdev_uses_dsa thing is a bit trashy, I think that a more polished
>> version should rather set NETIF_F_RXALL for the DSA master, and have the
>> dpaa driver act upon that. But first I'm curious if it works.
> 
> It does work. Thank you!

This is unfortunately not new, the stmmac choked on me with Broadcom
tags as well:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8cad443eacf661796a740903a75cb8944c675b4e
Vladimir Oltean March 23, 2021, 7:03 p.m. UTC | #4
On Tue, Mar 23, 2021 at 03:48:51PM +0100, Tobias Waldekranz wrote:
> On Tue, Mar 23, 2021 at 13:35, Vladimir Oltean <olteanv@gmail.com> wrote:
> > The netdev_uses_dsa thing is a bit trashy, I think that a more polished
> > version should rather set NETIF_F_RXALL for the DSA master, and have the
> > dpaa driver act upon that. But first I'm curious if it works.
> 
> It does work. Thank you!

Happy to hear that.

> Does setting RXALL mean that the master would accept frames with a bad
> FCS as well?

Do you mean from the perspective of the network stack, or of the hardware?

As far as the hardware is concerned, here is what the manual has to say:

Frame reception from the network may encounter certain error conditions.
Such errors are reported by the Ethernet MAC when the frame is transferred
to the Buffer Manager Interface (BMI). The action taken per error case
is described below. Besides the interrupts, the BMI is capable of
recognizing several conditions and setting a corresponding flag in FD
status field for Host usage. These conditions are as follows:

* Physical Error. One of the following events were detected by the
  Ethernet MAC: Rx FIFO overflow, FCS error, code error, running
  disparity error (in applicable modes), FIFO parity error, PHY Sequence
  error, PHY error control character detected, CRC error. The BMI
  discards the frame, or enqueue directly to EFQID if FMBM_RCFG[FDOVR]
  is set [ editor's note: this is what my patch does ]. FPE bit is set
  in the FD status.
* Frame size error. The Ethernet MAC detected a frame that its length
  exceeds the maximum allowed as configured in the MAC registers. The
  frame is truncated by the MAC to the maximum allowed, and it is marked
  as truncated. The BMI sets FSE in the FD status and forwards the frame
  to next module in the FMan as usual.
* Some other network error may result in the frame being discarded by
  the MAC and not shown to the BMI. However, the MAC is responsible for
  counting such errors in its own statistics counters.

So yes, packets with bad FCS are accepted with FMBM_RCFG[FDOVR] set.
But it would be interesting to see what is the value of "fd_status" in
rx_default_dqrr() for bad packets. You know, in the DPAA world, the
correct approach to solve this problem would be to create a
configuration to describe a "soft examination sequence" for the
programmable hardware "soft parser", which identifies the DSA tag and
skips over a programmable number of octets. This allows you to be able
to continue to do things such as flow steering based on IP headers
located after the DSA tag, etc. This is not supported in the upstream
FMan driver however, neither the soft parser itself nor an abstraction
for making DSA masters DSA-aware. I think it would also require more
work than it took me to hack up this patch. But at least, if I
understand correctly, with a soft parser in place, the MAC error
counters should at least stop incrementing, if that is of any importance
to you.

> If so, would that mean that we would have to verify it in software?

I don't see any place in the network stack that recalculates the FCS if
NETIF_F_RXALL is set. Additionally, without NETIF_F_RXFCS, I don't even
know how could the stack even tell a packet with bad FCS apart from one
with good FCS. If NETIF_F_RXALL is set, then once a packet is received,
it's taken for granted as good.

There is a separate hardware bit to include the FCS in the RX buffer, I
don't think this is what you want/need.

> >> 
> >> As a workaround, switching to EDSA (thereby always having a proper
> >> EtherType in the frame) solves the issue.
> >
> > So basically every user needs to change the tag protocol manually to be
> > able to receive from port 8? Not sure if that's too friendly.
> 
> No it is not friendly at all. My goal was to add it as a device-tree
> property, but for reasons I will detail in my answer to Andrew, I did
> not manage to figure out a good way to do that. Happy to take
> suggestions.

My two cents here are that you should think for the long term. If you
need it due to a limitation which you have today but might no longer
have tomorrow, don't put it in the device tree unless you want to
support it even when you don't need it anymore.
Vladimir Oltean March 23, 2021, 11:15 p.m. UTC | #5
On Tue, Mar 23, 2021 at 10:17:30PM +0100, Tobias Waldekranz wrote:
> On Tue, Mar 23, 2021 at 21:03, Vladimir Oltean <olteanv@gmail.com> wrote:
> > On Tue, Mar 23, 2021 at 03:48:51PM +0100, Tobias Waldekranz wrote:
> >> On Tue, Mar 23, 2021 at 13:35, Vladimir Oltean <olteanv@gmail.com> wrote:
> >> > The netdev_uses_dsa thing is a bit trashy, I think that a more polished
> >> > version should rather set NETIF_F_RXALL for the DSA master, and have the
> >> > dpaa driver act upon that. But first I'm curious if it works.
> >> 
> >> It does work. Thank you!
> >
> > Happy to hear that.
> >
> >> Does setting RXALL mean that the master would accept frames with a bad
> >> FCS as well?
> >
> > Do you mean from the perspective of the network stack, or of the hardware?
> >
> > As far as the hardware is concerned, here is what the manual has to say:
> >
> > Frame reception from the network may encounter certain error conditions.
> > Such errors are reported by the Ethernet MAC when the frame is transferred
> > to the Buffer Manager Interface (BMI). The action taken per error case
> > is described below. Besides the interrupts, the BMI is capable of
> > recognizing several conditions and setting a corresponding flag in FD
> > status field for Host usage. These conditions are as follows:
> >
> > * Physical Error. One of the following events were detected by the
> >   Ethernet MAC: Rx FIFO overflow, FCS error, code error, running
> >   disparity error (in applicable modes), FIFO parity error, PHY Sequence
> >   error, PHY error control character detected, CRC error. The BMI
> >   discards the frame, or enqueue directly to EFQID if FMBM_RCFG[FDOVR]
> >   is set [ editor's note: this is what my patch does ]. FPE bit is set
> >   in the FD status.
> > * Frame size error. The Ethernet MAC detected a frame that its length
> >   exceeds the maximum allowed as configured in the MAC registers. The
> >   frame is truncated by the MAC to the maximum allowed, and it is marked
> >   as truncated. The BMI sets FSE in the FD status and forwards the frame
> >   to next module in the FMan as usual.
> > * Some other network error may result in the frame being discarded by
> >   the MAC and not shown to the BMI. However, the MAC is responsible for
> >   counting such errors in its own statistics counters.
> >
> > So yes, packets with bad FCS are accepted with FMBM_RCFG[FDOVR] set.
> > But it would be interesting to see what is the value of "fd_status" in
> > rx_default_dqrr() for bad packets. You know, in the DPAA world, the
> > correct approach to solve this problem would be to create a
> > configuration to describe a "soft examination sequence" for the
> > programmable hardware "soft parser", which identifies the DSA tag and
> 
> Yeah I know you can do that. It is a very flexible chip that can do all
> kinds of fancy stuff...
> 
> > skips over a programmable number of octets. This allows you to be able
> > to continue to do things such as flow steering based on IP headers
> > located after the DSA tag, etc. This is not supported in the upstream
> > FMan driver however, neither the soft parser itself nor an abstraction
> > for making DSA masters DSA-aware. I think it would also require more
> 
> ...but this is the problem. These accelerators are always guarded by
> NDAs and proprietary code.

Hey, that is simply not true. [ this is even more hilarous, given that
you are criticizing NXP about openness in a thread about Marvell hardware ]

I just created an account on nxp.com using my gmail email address, then
I typed "T1023" in the search bar, clicked on "T1024", clicked the
"Documentation" tab, went to the "Reference Manual" section, hit "More",
selected "T1024DPAArm, QorIQ T1024 Data Path Acceleration Architecture
(DPAA) Reference Manual", and it downloaded T1042DPAARM.pdf right away.
And that has 1373 pages of 'all you can eat' about the DPAA hardware.
And this user manual is _specifically_ for the network subsystem, the
SoC has a separate user manual (T1024RM.pdf) which has another 2121
pages about the other peripherals.

> If NXP could transpile XDP to dpaa/dpaa2 in the kernel like how
> Netronome does it, we would never even talk to another SoC-vendor.

'More complicated than it's worth' is, I believe, what the verdict on
that was. DPAA is an unbelievably complicated architecture living in a
universe of its own. Also, it was designed at the beginning of the
multi-core/multi-queue era, when it wasn't at all clear that Linux would
become the dominant operating system for embedded networking, but merely
one of the many contenders. So you'll find things like a queuing and
buffer management system optimized for dispatching towards many packet
processing engines arranged in composable pipelines, and with seemingly
exotic features such as order preservation for parallel processing of a
single network flow. Whereas in Linux, the dominant model [ coming from x86 ]
is that where 1 queue + 1 buffer pool are compressed into 1 ring which
goes to 1 CPU, which is a much more efficient data structure for
single-core, CPU-intensive processing, so that model won out. The way
the Linux drivers for DPAA make use of the hardware features is so poor
not because of lack of willpower, but due to an explosive combination of
dated hardware design and overengineering.

And DPAA2 is basically the response to the criticism that with DPAA1 it was
stupidly complicated to send a packet. With DPAA2 it's just as complicated,
except that now, most of the hardware intricacies are managed by a
firmware. The idea behind this was that it was supposed to make the
integration with operating systems easier, and many people smarter than
me say it's for the better.

Of course, all of these look like mistakes when seen through the lens of
hindsight, but I don't think I can really judge, just observe.

But many open source frameworks for packet acceleration with DPAA were
tried, rest assured. If you scan the "external/qoriq" project from
https://source.codeaurora.org/, I'm sure you'll find more code than you
can digest.

On the other hand, there _is_ support in the mainline kernel for both
dpaa and dpaa2 for native XDP, which is a modern framework that gives
you decent enough throughput for CPU-based packet processing.

> > work than it took me to hack up this patch. But at least, if I
> > understand correctly, with a soft parser in place, the MAC error
> > counters should at least stop incrementing, if that is of any importance
> > to you.
> 
> This is the tragedy: I know for a fact that a DSA soft parser exists,
> but because of the aforementioned maze of NDAs and license agreements
> we, the community, cannot have nice things.

Oh yeah? You can even create your own, if you have nerves of steel and a
thick enough skin to learn to use the "fmc" (Frame Manager Configuration
Tool) program, which is fully open source if you search for it on CAF
(and if you can actually make something out of the source code).

And this PDF (hidden so well behind the maze of NDAs, that I just had to
google for it, and you don't even need to register to read it):
https://www.nxp.com/docs/en/user-guide/LSDKUG_Rev20.12.pdf
is chock full of information on what you can do with it, see chapters 8.2.5 and 8.2.6.

Personally, I am not ashamed to admit that I'm too stupid to use it.
But to blame the mainline unavailability of these features on lack of
openness is unfair. Poor integration with Linux networking concepts?
Lack of popularity or demand from Linux users? Maybe, but that would
imply a certain circularity, and that would paint things in a
not-so-black-and-white tone, which you want to avoid.

If you want to do any sort of development around that area, I'm sure
you'd be raised a statue and sung odes to.

[ enough about DPAA ]

> >> If so, would that mean that we would have to verify it in software?
> >
> > I don't see any place in the network stack that recalculates the FCS if
> > NETIF_F_RXALL is set. Additionally, without NETIF_F_RXFCS, I don't even
> > know how could the stack even tell a packet with bad FCS apart from one
> > with good FCS. If NETIF_F_RXALL is set, then once a packet is received,
> > it's taken for granted as good.
> 
> Right, but there is a difference between a user explicitly enabling it
> on a device and us enabling it because we need it internally in the
> kernel.
> 
> In the first scenario, the user can hardly complain as they have
> explicitly requested to see all packets on that device. That would not
> be true in the second one because there would be no way for the user to
> turn it off. It feels like you would end up in a similar situation as
> with the user- vs. kernel- promiscuous setting.
> 
> It seems to me if we enable it, we are responsible for not letting crap
> through to the port netdevs.

Yes, the advantage of NETIF_F_RXALL is that you treat error packets as
normal, the disadvantage is that you treat error packets as normal.

So I think all the options were laid out:
- Make the driver use EDSA by default when it can, because that has
  better compatibility with masters, then users who care about
  performance can dynamically switch [ back ] to DSA. Pro: is simple,
  con: may affect somebody relying on the default behavior
- Make your board parse a custom device tree binding which tells it what
  initial tagging protocol to use. Pro: addresses the con of the above.
  Con: kinda hacky.
- Make the DSA master ignore parser errors. Pro: see above. Con: see above.
- Teach the DSA master to have a minimal understanding of the DSA header.
  Pro: is the right thing to do, is compatible and better for more
  advanced use cases. Con: is more complicated than the alternatives.
Andrew Lunn March 24, 2021, 12:44 a.m. UTC | #6
> This was my initial approach. It gets quite messy though. Since taggers

> can be modules, there is no way of knowing if a supplied protocol name

> is garbage ("asdf"), or just part of a module in an initrd that is not

> loaded yet when you are probing the tree.


Hi Tobias

I don't think that is an issue. We currently lookup the tagger in
dsa_port_parse_cpu(). If it does not exist, we return
-EPROBE_DEFER. Either it eventually gets loaded, or the driver core
gives up. I don't see why the same cannot be done for a DT
property. If dsa_find_tagger_by_name() does not find the tagger return
-EPROBE_DEFER. Garbage will result in the switch never loading, and
the DT writer will go find their typo.

> Even when the tagger is available, there is no way to verify if the

> driver is compatible with it.


I would of though, calling the switch drivers change_tag_protocol() op
will that for you. If it comes back with -EINVAL, or -EOPNOTSUPP, you
know it is not compatible.

So i guess i would keep all the code you are adding here to allow
dynamic setting of the protocol. And add more code in
dsa_switch_parse_of() to parse the optional tagging protocol name,
error out -EPROBE_DEFER if it is not known yet, otherwise store it
away in something like dst->tag_ops_name. And then probably in
dsa_switch_setup(), if dst->tag_ops_name is not NULL, invoke the
dynamic change code to perform the actual change.

	Andrew
Tobias Waldekranz March 24, 2021, 10:52 a.m. UTC | #7
On Wed, Mar 24, 2021 at 01:15, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Tue, Mar 23, 2021 at 10:17:30PM +0100, Tobias Waldekranz wrote:

>> On Tue, Mar 23, 2021 at 21:03, Vladimir Oltean <olteanv@gmail.com> wrote:

>> > On Tue, Mar 23, 2021 at 03:48:51PM +0100, Tobias Waldekranz wrote:

>> >> On Tue, Mar 23, 2021 at 13:35, Vladimir Oltean <olteanv@gmail.com> wrote:

>> >> > The netdev_uses_dsa thing is a bit trashy, I think that a more polished

>> >> > version should rather set NETIF_F_RXALL for the DSA master, and have the

>> >> > dpaa driver act upon that. But first I'm curious if it works.

>> >> 

>> >> It does work. Thank you!

>> >

>> > Happy to hear that.

>> >

>> >> Does setting RXALL mean that the master would accept frames with a bad

>> >> FCS as well?

>> >

>> > Do you mean from the perspective of the network stack, or of the hardware?

>> >

>> > As far as the hardware is concerned, here is what the manual has to say:

>> >

>> > Frame reception from the network may encounter certain error conditions.

>> > Such errors are reported by the Ethernet MAC when the frame is transferred

>> > to the Buffer Manager Interface (BMI). The action taken per error case

>> > is described below. Besides the interrupts, the BMI is capable of

>> > recognizing several conditions and setting a corresponding flag in FD

>> > status field for Host usage. These conditions are as follows:

>> >

>> > * Physical Error. One of the following events were detected by the

>> >   Ethernet MAC: Rx FIFO overflow, FCS error, code error, running

>> >   disparity error (in applicable modes), FIFO parity error, PHY Sequence

>> >   error, PHY error control character detected, CRC error. The BMI

>> >   discards the frame, or enqueue directly to EFQID if FMBM_RCFG[FDOVR]

>> >   is set [ editor's note: this is what my patch does ]. FPE bit is set

>> >   in the FD status.

>> > * Frame size error. The Ethernet MAC detected a frame that its length

>> >   exceeds the maximum allowed as configured in the MAC registers. The

>> >   frame is truncated by the MAC to the maximum allowed, and it is marked

>> >   as truncated. The BMI sets FSE in the FD status and forwards the frame

>> >   to next module in the FMan as usual.

>> > * Some other network error may result in the frame being discarded by

>> >   the MAC and not shown to the BMI. However, the MAC is responsible for

>> >   counting such errors in its own statistics counters.

>> >

>> > So yes, packets with bad FCS are accepted with FMBM_RCFG[FDOVR] set.

>> > But it would be interesting to see what is the value of "fd_status" in

>> > rx_default_dqrr() for bad packets. You know, in the DPAA world, the

>> > correct approach to solve this problem would be to create a

>> > configuration to describe a "soft examination sequence" for the

>> > programmable hardware "soft parser", which identifies the DSA tag and

>> 

>> Yeah I know you can do that. It is a very flexible chip that can do all

>> kinds of fancy stuff...

>> 

>> > skips over a programmable number of octets. This allows you to be able

>> > to continue to do things such as flow steering based on IP headers

>> > located after the DSA tag, etc. This is not supported in the upstream

>> > FMan driver however, neither the soft parser itself nor an abstraction

>> > for making DSA masters DSA-aware. I think it would also require more

>> 

>> ...but this is the problem. These accelerators are always guarded by

>> NDAs and proprietary code.

>

> Hey, that is simply not true. [ this is even more hilarous, given that

> you are criticizing NXP about openness in a thread about Marvell hardware ]


NXP is much better than most in this area. That is a big reason why you
will find NXP/Freescale silicon in a majority of Westermo products.

> I just created an account on nxp.com using my gmail email address, then

> I typed "T1023" in the search bar, clicked on "T1024", clicked the

> "Documentation" tab, went to the "Reference Manual" section, hit "More",

> selected "T1024DPAArm, QorIQ T1024 Data Path Acceleration Architecture

> (DPAA) Reference Manual", and it downloaded T1042DPAARM.pdf right away.

> And that has 1373 pages of 'all you can eat' about the DPAA hardware.

> And this user manual is _specifically_ for the network subsystem, the

> SoC has a separate user manual (T1024RM.pdf) which has another 2121

> pages about the other peripherals.


Yes I know all this.

>> If NXP could transpile XDP to dpaa/dpaa2 in the kernel like how

>> Netronome does it, we would never even talk to another SoC-vendor.

>

> 'More complicated than it's worth' is, I believe, what the verdict on

> that was. DPAA is an unbelievably complicated architecture living in a

> universe of its own. Also, it was designed at the beginning of the

> multi-core/multi-queue era, when it wasn't at all clear that Linux would

> become the dominant operating system for embedded networking, but merely

> one of the many contenders. So you'll find things like a queuing and

> buffer management system optimized for dispatching towards many packet

> processing engines arranged in composable pipelines, and with seemingly

> exotic features such as order preservation for parallel processing of a

> single network flow. Whereas in Linux, the dominant model [ coming from x86 ]

> is that where 1 queue + 1 buffer pool are compressed into 1 ring which

> goes to 1 CPU, which is a much more efficient data structure for

> single-core, CPU-intensive processing, so that model won out. The way

> the Linux drivers for DPAA make use of the hardware features is so poor

> not because of lack of willpower, but due to an explosive combination of

> dated hardware design and overengineering.

>

> And DPAA2 is basically the response to the criticism that with DPAA1 it was

> stupidly complicated to send a packet. With DPAA2 it's just as complicated,

> except that now, most of the hardware intricacies are managed by a

> firmware. The idea behind this was that it was supposed to make the

> integration with operating systems easier, and many people smarter than

> me say it's for the better.

>

> Of course, all of these look like mistakes when seen through the lens of

> hindsight, but I don't think I can really judge, just observe.


I do not mean to judge either. I am sure that all these desicions were
the right ones at the time they were taken. It is just surprising to me
that no SoC vendor has tried on a sub-50 Gbps device.

> But many open source frameworks for packet acceleration with DPAA were

> tried, rest assured. If you scan the "external/qoriq" project from

> https://source.codeaurora.org/, I'm sure you'll find more code than you

> can digest.

>

> On the other hand, there _is_ support in the mainline kernel for both

> dpaa and dpaa2 for native XDP, which is a modern framework that gives

> you decent enough throughput for CPU-based packet processing.


Yes, that is very much appreciated.

>> > work than it took me to hack up this patch. But at least, if I

>> > understand correctly, with a soft parser in place, the MAC error

>> > counters should at least stop incrementing, if that is of any importance

>> > to you.

>> 

>> This is the tragedy: I know for a fact that a DSA soft parser exists,

>> but because of the aforementioned maze of NDAs and license agreements

>> we, the community, cannot have nice things.

>

> Oh yeah? You can even create your own, if you have nerves of steel and a

> thick enough skin to learn to use the "fmc" (Frame Manager Configuration

> Tool) program, which is fully open source if you search for it on CAF

> (and if you can actually make something out of the source code).


Yes, this is what a colleague of mine has done. Which is how I know that
one exists :)

> And this PDF (hidden so well behind the maze of NDAs, that I just had to

> google for it, and you don't even need to register to read it):

> https://www.nxp.com/docs/en/user-guide/LSDKUG_Rev20.12.pdf

> is chock full of information on what you can do with it, see chapters 8.2.5 and 8.2.6.


Right, but this is where it ends. Using the wealth of information you
have laid out so far you can use DPAA to do amazing things using open
components.

...unless you have to do something so incredibly advanced and exotic as
a masked update of a field. At this point you have two options:

1. Buy the firmware toolchain, which requires signing an NDA.
2. Buy a single-drop firmware binary for lots of $$$ without any
   possibility of getting further updates because "you should really be
   using DPAA2".

> Personally, I am not ashamed to admit that I'm too stupid to use it.


Neither am I, I am very thankful to have people around me who can do
amazing things that are out of reach for me.

> But to blame the mainline unavailability of these features on lack of

> openness is unfair. Poor integration with Linux networking concepts?

> Lack of popularity or demand from Linux users? Maybe, but that would

> imply a certain circularity, and that would paint things in a

> not-so-black-and-white tone, which you want to avoid.


Objectively, it is certainly not black-and-white. My guess is that there
many many happy DPAA users out there. I was describing my subjective
experience. Perhaps some residual pain from old wounds led me to use a
harsher tone than necessary.

> If you want to do any sort of development around that area, I'm sure

> you'd be raised a statue and sung odes to.


We could maybe open up our soft-parser without having to use a custom
firmware. But I do not know enough about DPAA to judge how hard that
would be to integrate with what is in the open driver. I will consult
with the people who do.

> [ enough about DPAA ]

>

>> >> If so, would that mean that we would have to verify it in software?

>> >

>> > I don't see any place in the network stack that recalculates the FCS if

>> > NETIF_F_RXALL is set. Additionally, without NETIF_F_RXFCS, I don't even

>> > know how could the stack even tell a packet with bad FCS apart from one

>> > with good FCS. If NETIF_F_RXALL is set, then once a packet is received,

>> > it's taken for granted as good.

>> 

>> Right, but there is a difference between a user explicitly enabling it

>> on a device and us enabling it because we need it internally in the

>> kernel.

>> 

>> In the first scenario, the user can hardly complain as they have

>> explicitly requested to see all packets on that device. That would not

>> be true in the second one because there would be no way for the user to

>> turn it off. It feels like you would end up in a similar situation as

>> with the user- vs. kernel- promiscuous setting.

>> 

>> It seems to me if we enable it, we are responsible for not letting crap

>> through to the port netdevs.

>

> Yes, the advantage of NETIF_F_RXALL is that you treat error packets as

> normal, the disadvantage is that you treat error packets as normal.

>

> So I think all the options were laid out:

> - Make the driver use EDSA by default when it can, because that has

>   better compatibility with masters, then users who care about

>   performance can dynamically switch [ back ] to DSA. Pro: is simple,

>   con: may affect somebody relying on the default behavior


I think this is pretty much what is done today for "category 2" devices
in my original message. The problem is with the devices from "category
3" where we would prefer to only use documented features of the device
as long as the master can cope with it. But if the controller refuses to
cooperate, the scales might tip in favor of using undocumented features.

> - Make your board parse a custom device tree binding which tells it what

>   initial tagging protocol to use. Pro: addresses the con of the above.

>   Con: kinda hacky.


Kinda hacky, yes. OTOH, it is a workaround and not something that you
have to be aware of unless you actually need it.

> - Make the DSA master ignore parser errors. Pro: see above. Con: see above.


Yeah I think the performace impact of having to checksum every frame
makes this unattractive.

> - Teach the DSA master to have a minimal understanding of the DSA header.

>   Pro: is the right thing to do, is compatible and better for more

>   advanced use cases. Con: is more complicated than the alternatives.


In cases where this is possible, absolutely!
Vladimir Oltean March 24, 2021, 11:34 a.m. UTC | #8
On Wed, Mar 24, 2021 at 11:52:49AM +0100, Tobias Waldekranz wrote:
> >> This is the tragedy: I know for a fact that a DSA soft parser exists,

> >> but because of the aforementioned maze of NDAs and license agreements

> >> we, the community, cannot have nice things.

> >

> > Oh yeah? You can even create your own, if you have nerves of steel and a

> > thick enough skin to learn to use the "fmc" (Frame Manager Configuration

> > Tool) program, which is fully open source if you search for it on CAF

> > (and if you can actually make something out of the source code).

> 

> Yes, this is what a colleague of mine has done. Which is how I know that

> one exists :)

> 

> > And this PDF (hidden so well behind the maze of NDAs, that I just had to

> > google for it, and you don't even need to register to read it):

> > https://www.nxp.com/docs/en/user-guide/LSDKUG_Rev20.12.pdf

> > is chock full of information on what you can do with it, see chapters 8.2.5 and 8.2.6.

> 

> Right, but this is where it ends. Using the wealth of information you

> have laid out so far you can use DPAA to do amazing things using open

> components.

> 

> ...unless you have to do something so incredibly advanced and exotic as

> a masked update of a field. At this point you have two options:

> 

> 1. Buy the firmware toolchain, which requires signing an NDA.

> 2. Buy a single-drop firmware binary for lots of $$$ without any

>    possibility of getting further updates because "you should really be

>    using DPAA2".


Uhm, what?
By "firmware" I assume you mean "FMan microcode"?

To my knowledge, the standard FMan microcode distributed _freely_ with
the LSDK has support for Header Manipulation, you just need to create a
Header Manipulation Command Descriptor (HMCD) and pass it to the
microcode through an O/H port. I believe that:
(a) the Header Manipulation descriptors allow you to perform raw mask
    based field updates too, not just for standard protocols
(b) fmc already has some support for sending Header Manipulation
    descriptors to the microcode

And by "firmware toolchain" you mean the FMan microcode SDK?
https://www.nxp.com/design/software/embedded-software/linux-software-and-development-tools/dpaa-fman-microcode-sdk-source-code-software-kit:DPAA-FMAN-SDK

In the description for that product it says:

  For MOST of NXP communications customers, the microcode that is freely
  accessible via the NXP LSDK or SDK for QorIQ or Layerscape processors
  will handle any communications offload task you could throw at the DPAA.

So why on earth would you need that? And does it really surprise you
that it costs money, especially considering the fact that you're going
to need heaps of support for it anyway?

Seriously, what is your point? You're complaining about having the
option to write your own microcode for the RISC cores inside the network
controller, when the standard one already comes with a lot of features?
What would you prefer, not having that option?

This is a strawman. None of the features we talked about in this thread,
soft parser for DSA tags or masked header manipulation, should require
custom microcode.
Tobias Waldekranz March 24, 2021, 12:53 p.m. UTC | #9
On Wed, Mar 24, 2021 at 01:44, Andrew Lunn <andrew@lunn.ch> wrote:
>> This was my initial approach. It gets quite messy though. Since taggers

>> can be modules, there is no way of knowing if a supplied protocol name

>> is garbage ("asdf"), or just part of a module in an initrd that is not

>> loaded yet when you are probing the tree.

>

> Hi Tobias

>

> I don't think that is an issue. We currently lookup the tagger in

> dsa_port_parse_cpu(). If it does not exist, we return

> -EPROBE_DEFER. Either it eventually gets loaded, or the driver core

> gives up. I don't see why the same cannot be done for a DT

> property. If dsa_find_tagger_by_name() does not find the tagger return

> -EPROBE_DEFER. Garbage will result in the switch never loading, and

> the DT writer will go find their typo.

>

>> Even when the tagger is available, there is no way to verify if the

>> driver is compatible with it.

>

> I would of though, calling the switch drivers change_tag_protocol() op

> will that for you. If it comes back with -EINVAL, or -EOPNOTSUPP, you

> know it is not compatible.

>

> So i guess i would keep all the code you are adding here to allow

> dynamic setting of the protocol. And add more code in

> dsa_switch_parse_of() to parse the optional tagging protocol name,

> error out -EPROBE_DEFER if it is not known yet, otherwise store it

> away in something like dst->tag_ops_name. And then probably in

> dsa_switch_setup(), if dst->tag_ops_name is not NULL, invoke the

> dynamic change code to perform the actual change.


Sounds like a plan. I will try it out and get back with a v2. Thanks.
Tobias Waldekranz March 24, 2021, 1:01 p.m. UTC | #10
On Wed, Mar 24, 2021 at 13:34, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Wed, Mar 24, 2021 at 11:52:49AM +0100, Tobias Waldekranz wrote:

>> >> This is the tragedy: I know for a fact that a DSA soft parser exists,

>> >> but because of the aforementioned maze of NDAs and license agreements

>> >> we, the community, cannot have nice things.

>> >

>> > Oh yeah? You can even create your own, if you have nerves of steel and a

>> > thick enough skin to learn to use the "fmc" (Frame Manager Configuration

>> > Tool) program, which is fully open source if you search for it on CAF

>> > (and if you can actually make something out of the source code).

>> 

>> Yes, this is what a colleague of mine has done. Which is how I know that

>> one exists :)

>> 

>> > And this PDF (hidden so well behind the maze of NDAs, that I just had to

>> > google for it, and you don't even need to register to read it):

>> > https://www.nxp.com/docs/en/user-guide/LSDKUG_Rev20.12.pdf

>> > is chock full of information on what you can do with it, see chapters 8.2.5 and 8.2.6.

>> 

>> Right, but this is where it ends. Using the wealth of information you

>> have laid out so far you can use DPAA to do amazing things using open

>> components.

>> 

>> ...unless you have to do something so incredibly advanced and exotic as

>> a masked update of a field. At this point you have two options:

>> 

>> 1. Buy the firmware toolchain, which requires signing an NDA.

>> 2. Buy a single-drop firmware binary for lots of $$$ without any

>>    possibility of getting further updates because "you should really be

>>    using DPAA2".

>

> Uhm, what?

> By "firmware" I assume you mean "FMan microcode"?

>

> To my knowledge, the standard FMan microcode distributed _freely_ with

> the LSDK has support for Header Manipulation, you just need to create a

> Header Manipulation Command Descriptor (HMCD) and pass it to the

> microcode through an O/H port. I believe that:

> (a) the Header Manipulation descriptors allow you to perform raw mask

>     based field updates too, not just for standard protocols


This is not the story we were told.

> (b) fmc already has some support for sending Header Manipulation

>     descriptors to the microcode

>

> And by "firmware toolchain" you mean the FMan microcode SDK?

> https://www.nxp.com/design/software/embedded-software/linux-software-and-development-tools/dpaa-fman-microcode-sdk-source-code-software-kit:DPAA-FMAN-SDK

>

> In the description for that product it says:

>

>   For MOST of NXP communications customers, the microcode that is freely

>   accessible via the NXP LSDK or SDK for QorIQ or Layerscape processors

>   will handle any communications offload task you could throw at the DPAA.

>

> So why on earth would you need that? And does it really surprise you


Because NXP said we needed it.

> that it costs money, especially considering the fact that you're going

> to need heaps of support for it anyway?


No, it surprised me that we had to pay for a solution to a problem that
we were promised would be solvable using the stock firmware.

> Seriously, what is your point? You're complaining about having the

> option to write your own microcode for the RISC cores inside the network

> controller, when the standard one already comes with a lot of features?

> What would you prefer, not having that option?

>

> This is a strawman. None of the features we talked about in this thread,

> soft parser for DSA tags or masked header manipulation, should require

> custom microcode.


I never made that claim. I was describing our experience with DPAA on
the whole.
Vladimir Oltean March 24, 2021, 1:24 p.m. UTC | #11
On Wed, Mar 24, 2021 at 02:01:14PM +0100, Tobias Waldekranz wrote:
> On Wed, Mar 24, 2021 at 13:34, Vladimir Oltean <olteanv@gmail.com> wrote:

> > On Wed, Mar 24, 2021 at 11:52:49AM +0100, Tobias Waldekranz wrote:

> >> >> This is the tragedy: I know for a fact that a DSA soft parser exists,

> >> >> but because of the aforementioned maze of NDAs and license agreements

> >> >> we, the community, cannot have nice things.

> >> >

> >> > Oh yeah? You can even create your own, if you have nerves of steel and a

> >> > thick enough skin to learn to use the "fmc" (Frame Manager Configuration

> >> > Tool) program, which is fully open source if you search for it on CAF

> >> > (and if you can actually make something out of the source code).

> >> 

> >> Yes, this is what a colleague of mine has done. Which is how I know that

> >> one exists :)

> >> 

> >> > And this PDF (hidden so well behind the maze of NDAs, that I just had to

> >> > google for it, and you don't even need to register to read it):

> >> > https://www.nxp.com/docs/en/user-guide/LSDKUG_Rev20.12.pdf

> >> > is chock full of information on what you can do with it, see chapters 8.2.5 and 8.2.6.

> >> 

> >> Right, but this is where it ends. Using the wealth of information you

> >> have laid out so far you can use DPAA to do amazing things using open

> >> components.

> >> 

> >> ...unless you have to do something so incredibly advanced and exotic as

> >> a masked update of a field. At this point you have two options:

> >> 

> >> 1. Buy the firmware toolchain, which requires signing an NDA.

> >> 2. Buy a single-drop firmware binary for lots of $$$ without any

> >>    possibility of getting further updates because "you should really be

> >>    using DPAA2".

> >

> > Uhm, what?

> > By "firmware" I assume you mean "FMan microcode"?

> >

> > To my knowledge, the standard FMan microcode distributed _freely_ with

> > the LSDK has support for Header Manipulation, you just need to create a

> > Header Manipulation Command Descriptor (HMCD) and pass it to the

> > microcode through an O/H port. I believe that:

> > (a) the Header Manipulation descriptors allow you to perform raw mask

> >     based field updates too, not just for standard protocols

> 

> This is not the story we were told.


Wait, aren't we talking about HdrMan OPCODE 0x19 ("Replace Field in Header")?

> > (b) fmc already has some support for sending Header Manipulation

> >     descriptors to the microcode

> >

> > And by "firmware toolchain" you mean the FMan microcode SDK?

> > https://www.nxp.com/design/software/embedded-software/linux-software-and-development-tools/dpaa-fman-microcode-sdk-source-code-software-kit:DPAA-FMAN-SDK

> >

> > In the description for that product it says:

> >

> >   For MOST of NXP communications customers, the microcode that is freely

> >   accessible via the NXP LSDK or SDK for QorIQ or Layerscape processors

> >   will handle any communications offload task you could throw at the DPAA.

> >

> > So why on earth would you need that? And does it really surprise you

> 

> Because NXP said we needed it.

> 

> > that it costs money, especially considering the fact that you're going

> > to need heaps of support for it anyway?

> 

> No, it surprised me that we had to pay for a solution to a problem that

> we were promised would be solvable using the stock firmware.


Maybe the FMan version of your particular device does not support that
HM command, or maybe you needed a slightly different behavior compared
to what HM opcode 0x19 does, and there was a misunderstanding on either
ends resulting in the impression that what you need could be achievable
through that type of descriptor? Either way, the way you phrased things:
| unless you have to do something so incredibly advanced and exotic as
| a masked update of a field
is very unfair, oversimplifying and misleading.

> > Seriously, what is your point? You're complaining about having the

> > option to write your own microcode for the RISC cores inside the network

> > controller, when the standard one already comes with a lot of features?

> > What would you prefer, not having that option?

> >

> > This is a strawman. None of the features we talked about in this thread,

> > soft parser for DSA tags or masked header manipulation, should require

> > custom microcode.

> 

> I never made that claim. I was describing our experience with DPAA on

> the whole.


I fail to see how we ended up talking about custom FMan microcode then.
I did not bring it up, and it is completely irrelevant to the discussion
about soft parser for DSA.
Vladimir Oltean March 24, 2021, 2:03 p.m. UTC | #12
On Tue, Mar 23, 2021 at 10:17:30PM +0100, Tobias Waldekranz wrote:
> > I don't see any place in the network stack that recalculates the FCS if

> > NETIF_F_RXALL is set. Additionally, without NETIF_F_RXFCS, I don't even

> > know how could the stack even tell a packet with bad FCS apart from one

> > with good FCS. If NETIF_F_RXALL is set, then once a packet is received,

> > it's taken for granted as good.

> 

> Right, but there is a difference between a user explicitly enabling it

> on a device and us enabling it because we need it internally in the

> kernel.

> 

> In the first scenario, the user can hardly complain as they have

> explicitly requested to see all packets on that device. That would not

> be true in the second one because there would be no way for the user to

> turn it off. It feels like you would end up in a similar situation as

> with the user- vs. kernel- promiscuous setting.

> 

> It seems to me if we enable it, we are responsible for not letting crap

> through to the port netdevs.


I think there exists an intermediate approach between processing the
frames on the RX queue and installing a soft parser.

The BMI of FMan RX ports has a configurable pipeline through Next
Invoked Actions (NIA). Through the FMBM_RFNE register (Rx Frame Next
Engine), it is possible to change the Next Invoked Action from the
default value (which is the hardware parser). You can choose to make the
Buffer Manager Interface enqueue the packet directly to the Queue
Manager Interface (QMI). This will effectively bypass the hardware
parser, so DSA frames will never be sent to the error queue if they have
an invalid EtherType/Length field.

Additionally, frames with a bad FCS should still be discarded, as that
is done by the MAC (an earlier stage compared to the BMI).
Vladimir Oltean March 24, 2021, 2:10 p.m. UTC | #13
On Wed, Mar 24, 2021 at 04:03:17PM +0200, Vladimir Oltean wrote:
> I think there exists an intermediate approach between processing the

> frames on the RX queue and installing a soft parser.


I meant "RX error queue", sorry for the confusion.
Tobias Waldekranz March 24, 2021, 3:02 p.m. UTC | #14
On Wed, Mar 24, 2021 at 16:03, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Tue, Mar 23, 2021 at 10:17:30PM +0100, Tobias Waldekranz wrote:

>> > I don't see any place in the network stack that recalculates the FCS if

>> > NETIF_F_RXALL is set. Additionally, without NETIF_F_RXFCS, I don't even

>> > know how could the stack even tell a packet with bad FCS apart from one

>> > with good FCS. If NETIF_F_RXALL is set, then once a packet is received,

>> > it's taken for granted as good.

>> 

>> Right, but there is a difference between a user explicitly enabling it

>> on a device and us enabling it because we need it internally in the

>> kernel.

>> 

>> In the first scenario, the user can hardly complain as they have

>> explicitly requested to see all packets on that device. That would not

>> be true in the second one because there would be no way for the user to

>> turn it off. It feels like you would end up in a similar situation as

>> with the user- vs. kernel- promiscuous setting.

>> 

>> It seems to me if we enable it, we are responsible for not letting crap

>> through to the port netdevs.

>

> I think there exists an intermediate approach between processing the

> frames on the RX queue and installing a soft parser.

>

> The BMI of FMan RX ports has a configurable pipeline through Next

> Invoked Actions (NIA). Through the FMBM_RFNE register (Rx Frame Next

> Engine), it is possible to change the Next Invoked Action from the

> default value (which is the hardware parser). You can choose to make the

> Buffer Manager Interface enqueue the packet directly to the Queue

> Manager Interface (QMI). This will effectively bypass the hardware

> parser, so DSA frames will never be sent to the error queue if they have

> an invalid EtherType/Length field.

>

> Additionally, frames with a bad FCS should still be discarded, as that

> is done by the MAC (an earlier stage compared to the BMI).


Yeah this sounds like the perfect middle ground. I guess that would then
be activated with an `if (netdev_uses_dsa(dev))`-guard in the driver,
like how Florian solved it for stmmac? Since it is not quite "rx-all".
Vladimir Oltean March 24, 2021, 3:08 p.m. UTC | #15
On Wed, Mar 24, 2021 at 04:02:52PM +0100, Tobias Waldekranz wrote:
> On Wed, Mar 24, 2021 at 16:03, Vladimir Oltean <olteanv@gmail.com> wrote:

> > On Tue, Mar 23, 2021 at 10:17:30PM +0100, Tobias Waldekranz wrote:

> >> > I don't see any place in the network stack that recalculates the FCS if

> >> > NETIF_F_RXALL is set. Additionally, without NETIF_F_RXFCS, I don't even

> >> > know how could the stack even tell a packet with bad FCS apart from one

> >> > with good FCS. If NETIF_F_RXALL is set, then once a packet is received,

> >> > it's taken for granted as good.

> >> 

> >> Right, but there is a difference between a user explicitly enabling it

> >> on a device and us enabling it because we need it internally in the

> >> kernel.

> >> 

> >> In the first scenario, the user can hardly complain as they have

> >> explicitly requested to see all packets on that device. That would not

> >> be true in the second one because there would be no way for the user to

> >> turn it off. It feels like you would end up in a similar situation as

> >> with the user- vs. kernel- promiscuous setting.

> >> 

> >> It seems to me if we enable it, we are responsible for not letting crap

> >> through to the port netdevs.

> >

> > I think there exists an intermediate approach between processing the

> > frames on the RX queue and installing a soft parser.

> >

> > The BMI of FMan RX ports has a configurable pipeline through Next

> > Invoked Actions (NIA). Through the FMBM_RFNE register (Rx Frame Next

> > Engine), it is possible to change the Next Invoked Action from the

> > default value (which is the hardware parser). You can choose to make the

> > Buffer Manager Interface enqueue the packet directly to the Queue

> > Manager Interface (QMI). This will effectively bypass the hardware

> > parser, so DSA frames will never be sent to the error queue if they have

> > an invalid EtherType/Length field.

> >

> > Additionally, frames with a bad FCS should still be discarded, as that

> > is done by the MAC (an earlier stage compared to the BMI).

> 

> Yeah this sounds like the perfect middle ground. I guess that would then

> be activated with an `if (netdev_uses_dsa(dev))`-guard in the driver,

> like how Florian solved it for stmmac? Since it is not quite "rx-all".


I think this would have to be guarded by netdev_uses_dsa for now, yes.
Also, it is far from being a "perfect" middle ground, because if you
disable the hardware parser, you also lose the ability to do frame
classification and hashing/flow steering to multiple RX queues on that
port, I think.
Tobias Waldekranz March 24, 2021, 4:07 p.m. UTC | #16
On Wed, Mar 24, 2021 at 17:08, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Wed, Mar 24, 2021 at 04:02:52PM +0100, Tobias Waldekranz wrote:

>> On Wed, Mar 24, 2021 at 16:03, Vladimir Oltean <olteanv@gmail.com> wrote:

>> > On Tue, Mar 23, 2021 at 10:17:30PM +0100, Tobias Waldekranz wrote:

>> >> > I don't see any place in the network stack that recalculates the FCS if

>> >> > NETIF_F_RXALL is set. Additionally, without NETIF_F_RXFCS, I don't even

>> >> > know how could the stack even tell a packet with bad FCS apart from one

>> >> > with good FCS. If NETIF_F_RXALL is set, then once a packet is received,

>> >> > it's taken for granted as good.

>> >> 

>> >> Right, but there is a difference between a user explicitly enabling it

>> >> on a device and us enabling it because we need it internally in the

>> >> kernel.

>> >> 

>> >> In the first scenario, the user can hardly complain as they have

>> >> explicitly requested to see all packets on that device. That would not

>> >> be true in the second one because there would be no way for the user to

>> >> turn it off. It feels like you would end up in a similar situation as

>> >> with the user- vs. kernel- promiscuous setting.

>> >> 

>> >> It seems to me if we enable it, we are responsible for not letting crap

>> >> through to the port netdevs.

>> >

>> > I think there exists an intermediate approach between processing the

>> > frames on the RX queue and installing a soft parser.

>> >

>> > The BMI of FMan RX ports has a configurable pipeline through Next

>> > Invoked Actions (NIA). Through the FMBM_RFNE register (Rx Frame Next

>> > Engine), it is possible to change the Next Invoked Action from the

>> > default value (which is the hardware parser). You can choose to make the

>> > Buffer Manager Interface enqueue the packet directly to the Queue

>> > Manager Interface (QMI). This will effectively bypass the hardware

>> > parser, so DSA frames will never be sent to the error queue if they have

>> > an invalid EtherType/Length field.

>> >

>> > Additionally, frames with a bad FCS should still be discarded, as that

>> > is done by the MAC (an earlier stage compared to the BMI).

>> 

>> Yeah this sounds like the perfect middle ground. I guess that would then

>> be activated with an `if (netdev_uses_dsa(dev))`-guard in the driver,

>> like how Florian solved it for stmmac? Since it is not quite "rx-all".

>

> I think this would have to be guarded by netdev_uses_dsa for now, yes.

> Also, it is far from being a "perfect" middle ground, because if you

> disable the hardware parser, you also lose the ability to do frame

> classification and hashing/flow steering to multiple RX queues on that

> port, I think.


But even if the parser was enabled, it would never get anywhere since
the Ethertype would look like random garbage. Unless we have the soft
parser, but then it is not the middle ground anymore :)

I suppose you would like to test for netdev_uses_dsa_and_violates_8023,
that way you could still do RSS on DSA devices using regular 1Q-tags for
example. Do we want to add this property to the taggers so that we do
not degrade performance for any existing users?
Vladimir Oltean March 25, 2021, 1:34 a.m. UTC | #17
On Wed, Mar 24, 2021 at 05:07:09PM +0100, Tobias Waldekranz wrote:
> But even if the parser was enabled, it would never get anywhere since

> the Ethertype would look like random garbage. Unless we have the soft

> parser, but then it is not the middle ground anymore :)


Garbage, true, but garbage with enough entropy to allow for some sort of
RFS (ideally you can get the source port field from the DSA tag into the
area covered by the n-tuple on which the master performs hashing). This
is the way in which the switches inside NXP LS1028A and T1040 work.

> I suppose you would like to test for netdev_uses_dsa_and_violates_8023,

> that way you could still do RSS on DSA devices using regular 1Q-tags for

> example. Do we want to add this property to the taggers so that we do

> not degrade performance for any existing users?


Yes, so T1040 is one such example of device that would be negatively
affected by this change. There isn't a good solution to solve all
problems: there will be some Marvell switches which can't operate in
EDSA mode, and there will be some DSA masters that can't parse Marvell
DSA tags. Eventually all possible combinations of workarounds will have
to be implemented. But for now, I think I prefer to see the simplest
one, which has just become the one based on device tree.
Tobias Waldekranz March 25, 2021, 8:04 a.m. UTC | #18
On Thu, Mar 25, 2021 at 03:34, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Wed, Mar 24, 2021 at 05:07:09PM +0100, Tobias Waldekranz wrote:

>> But even if the parser was enabled, it would never get anywhere since

>> the Ethertype would look like random garbage. Unless we have the soft

>> parser, but then it is not the middle ground anymore :)

>

> Garbage, true, but garbage with enough entropy to allow for some sort of

> RFS (ideally you can get the source port field from the DSA tag into the

> area covered by the n-tuple on which the master performs hashing). This

> is the way in which the switches inside NXP LS1028A and T1040 work.


I see what you are saying. Any given flow would still have the same
not-really-an-Ethertype.

>> I suppose you would like to test for netdev_uses_dsa_and_violates_8023,

>> that way you could still do RSS on DSA devices using regular 1Q-tags for

>> example. Do we want to add this property to the taggers so that we do

>> not degrade performance for any existing users?

>

> Yes, so T1040 is one such example of device that would be negatively

> affected by this change. There isn't a good solution to solve all

> problems: there will be some Marvell switches which can't operate in

> EDSA mode, and there will be some DSA masters that can't parse Marvell

> DSA tags. Eventually all possible combinations of workarounds will have

> to be implemented. But for now, I think I prefer to see the simplest

> one, which has just become the one based on device tree.


Alright, it seems like everyone agrees then. I will look into it.

Just to avoid a DenverCoder9 situation; I tried changing the NIA in
FMBM_RFNE like you suggested:

8< ---

diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
index d9baac0dbc7d..5aa5b4068f2d 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -543,7 +543,7 @@ static int init_bmi_rx(struct fman_port *port)
        /* NIA */
        tmp = (u32)cfg->rx_fd_bits << BMI_NEXT_ENG_FD_BITS_SHIFT;
 
-       tmp |= NIA_ENG_HWP;
+       tmp |= NIA_ENG_BMI | NIA_BMI_AC_ENQ_FRAME;
        iowrite32be(tmp, &regs->fmbm_rfne);
 
        /* Parser Next Engine NIA */

8< ---

From what I can tell, this works as expected. TO_CPUs from port 8 can
ingress the device with this in place.
diff mbox series

Patch

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 95f07fcd4f85..e7ec883d5f6b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2531,10 +2531,10 @@  static int mv88e6xxx_setup_port_mode(struct mv88e6xxx_chip *chip, int port)
 		return mv88e6xxx_set_port_mode_normal(chip, port);
 
 	/* Setup CPU port mode depending on its supported tag format */
-	if (chip->info->tag_protocol == DSA_TAG_PROTO_DSA)
+	if (chip->tag_protocol == DSA_TAG_PROTO_DSA)
 		return mv88e6xxx_set_port_mode_dsa(chip, port);
 
-	if (chip->info->tag_protocol == DSA_TAG_PROTO_EDSA)
+	if (chip->tag_protocol == DSA_TAG_PROTO_EDSA)
 		return mv88e6xxx_set_port_mode_edsa(chip, port);
 
 	return -EINVAL;
@@ -5564,7 +5564,39 @@  static enum dsa_tag_protocol mv88e6xxx_get_tag_protocol(struct dsa_switch *ds,
 {
 	struct mv88e6xxx_chip *chip = ds->priv;
 
-	return chip->info->tag_protocol;
+	return chip->tag_protocol;
+}
+
+static int mv88e6xxx_change_tag_protocol(struct dsa_switch *ds, int port,
+					 enum dsa_tag_protocol proto)
+{
+	struct mv88e6xxx_chip *chip = ds->priv;
+	enum dsa_tag_protocol old_protocol;
+	int err;
+
+	switch (proto) {
+	case DSA_TAG_PROTO_EDSA:
+		if (chip->info->tag_protocol != DSA_TAG_PROTO_EDSA)
+			dev_warn(chip->dev, "Relying on undocumented EDSA tagging behavior\n");
+
+		break;
+	case DSA_TAG_PROTO_DSA:
+		break;
+	default:
+		return -EPROTONOSUPPORT;
+	}
+
+	old_protocol = chip->tag_protocol;
+	chip->tag_protocol = proto;
+
+	mv88e6xxx_reg_lock(chip);
+	err = mv88e6xxx_setup_port_mode(chip, port);
+	mv88e6xxx_reg_unlock(chip);
+
+	if (err)
+		chip->tag_protocol = old_protocol;
+
+	return err;
 }
 
 static int mv88e6xxx_port_mdb_add(struct dsa_switch *ds, int port,
@@ -6029,6 +6061,7 @@  static int mv88e6xxx_crosschip_lag_leave(struct dsa_switch *ds, int sw_index,
 
 static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.get_tag_protocol	= mv88e6xxx_get_tag_protocol,
+	.change_tag_protocol	= mv88e6xxx_change_tag_protocol,
 	.setup			= mv88e6xxx_setup,
 	.teardown		= mv88e6xxx_teardown,
 	.phylink_validate	= mv88e6xxx_validate,
@@ -6209,6 +6242,8 @@  static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 	if (err)
 		goto out;
 
+	chip->tag_protocol = chip->info->tag_protocol;
+
 	mv88e6xxx_phy_init(chip);
 
 	if (chip->info->ops->get_eeprom) {
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index bce6e0dc8535..96b775f3fda2 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -261,6 +261,9 @@  struct mv88e6xxx_region_priv {
 struct mv88e6xxx_chip {
 	const struct mv88e6xxx_info *info;
 
+	/* Currently configured tagging protocol */
+	enum dsa_tag_protocol tag_protocol;
+
 	/* The dsa_switch this private structure is related to */
 	struct dsa_switch *ds;