[net-next,v4,00/12] ethtool: Add support for frame preemption

Message ID	20210626003314.3159402-1-vinicius.gomes@intel.com
Headers	show Return-Path: <netdev-owner@kernel.org> IronPort-SDR: Obom+VQb4pxWp1V+5SUae+snJIleDQzXE4FszBawh6CMHmfqtOUoAhyurPpRkGUXCw7+uP5qKe NvHPP08edE4g== IronPort-SDR: GqdBDBdxUL9Es7PEfXblA2eVd+27IFX+2tzCKGQ3JALDbpTNcWD6agBTwaafUBbiVKe7x4EA9D YIFPDxci/0jw== From: Vinicius Costa Gomes <vinicius.gomes@intel.com> To: netdev@vger.kernel.org Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>, jhs@mojatatu.com, xiyou.wangcong@gmail.com, jiri@resnulli.us, kuba@kernel.org, vladimir.oltean@nxp.com, po.liu@nxp.com, intel-wired-lan@lists.osuosl.org, anthony.l.nguyen@intel.com, mkubecek@suse.cz Subject: [PATCH net-next v4 00/12] ethtool: Add support for frame preemption Date: Fri, 25 Jun 2021 17:33:02 -0700 Message-Id: <20210626003314.3159402-1-vinicius.gomes@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	ethtool: Add support for frame preemption \| expand [net-next,v4,00/12] ethtool: Add support for frame preemption [net-next,v4,01/12] ethtool: Add support for configuring frame preemption [net-next,v4,02/12] taprio: Add support for frame preemption offload [net-next,v4,03/12] core: Introduce netdev_tc_map_to_queue_mask() [net-next,v4,04/12] taprio: Replace tc_map_to_queue_mask() [net-next,v4,05/12] mqprio: Add support for frame preemption offload [net-next,v4,06/12] igc: Add support for enabling frame preemption via ethtool [net-next,v4,07/12] igc: Add support for TC_SETUP_PREEMPT [net-next,v4,08/12] igc: Simplify TSN flags handling [net-next,v4,09/12] igc: Add support for setting frame preemption configuration [net-next,v4,10/12] ethtool: Add support for Frame Preemption verification [net-next,v4,11/12] igc: Check incompatible configs for Frame Preemption [net-next,v4,12/12] igc: Add support for Frame Preemption verification

Message ID

20210626003314.3159402-1-vinicius.gomes@intel.com

Headers

IronPort-SDR: Obom+VQb4pxWp1V+5SUae+snJIleDQzXE4FszBawh6CMHmfqtOUoAhyurPpRkGUXCw7+uP5qKe
	NvHPP08edE4g==
IronPort-SDR: GqdBDBdxUL9Es7PEfXblA2eVd+27IFX+2tzCKGQ3JALDbpTNcWD6agBTwaafUBbiVKe7x4EA9D
	YIFPDxci/0jw==
From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
To: netdev@vger.kernel.org
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, kuba@kernel.org,
	vladimir.oltean@nxp.com, po.liu@nxp.com,
	intel-wired-lan@lists.osuosl.org, anthony.l.nguyen@intel.com,
	mkubecek@suse.cz
Subject: [PATCH net-next v4 00/12] ethtool: Add support for frame preemption
Date: Fri, 25 Jun 2021 17:33:02 -0700
Message-Id: <20210626003314.3159402-1-vinicius.gomes@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

ethtool: Add support for frame preemption | expand

Message

Vinicius Costa Gomes June 26, 2021, 12:33 a.m. UTC

Hi,

When the APIs, now including verification, are fine, I can separate
this series into smaller pieces, to make further review easier. I am
proposing this as one series so it's easier to get the full picture.


Changes from v3:
 - Added early support for sending/receiving support for verification
   frames (Vladimir Oltean). This is a bit more than RFC-quality, but
   adding this so people can see how it fits together with the rest.
   The driver specific bits are interesting because the hardware does
   the absolute minimum, the driver needs to do the heavy lifting.

 - Added support for setting preemptible/express traffic classes via
   tc-mqprio (Vladimir Oltean). mqprio parsing of configuration
   options is... interesting, so comments here are going to be useful,
   I may have missed something.

Changes from v2:
 - Fixed some copy&paste mistakes, documentation formatting and
   slightly improved error reporting (Jakub Kicinski);

Changes from v1:
 - The minimum fragment size configuration was changed to be
   configured in bytes to be more future proof, in case the standard
   changes this (the previous definition was '(X + 1) * 64', X being
   [0..3]) (Michal Kubecek);
 - In taprio, frame preemption is now configured by traffic classes (was
   done by queues) (Jakub Kicinski, Vladimir Oltean);
 - Various netlink protocol validation improvements (Jakub Kicinski);
 - Dropped the IGC register dump for frame preemption registers, until a
   stardandized way of exposing that is agreed (Jakub Kicinski);

Changes from RFC v2:
 - Reorganised the offload enabling/disabling on the driver size;
 - Added a few igc fixes;

Changes from RFC v1:
 - The per-queue preemptible/express setting is moved to applicable
   qdiscs (Jakub Kicinski and others);
 - "min-frag-size" now follows the 802.3br specification more closely,
   it's expressed as X in '64(1 + X) + 4' (Joergen Andreasen);

Another point that should be noted is the addition of the
TC_SETUP_PREEMPT offload type, the idea behind this is to allow other
qdiscs (was thinking of mqprio) to also configure which traffic
classes should be marked as express/preemptible.

Original cover letter (lightly edited):

This is still an RFC because two main reasons, I want to confirm that
this approach (per-queue settings via qdiscs, device settings via
ethtool) looks good, even though there aren't much more options left ;-)
The other reason is that while testing this I found some weirdness
in the driver that I would need a bit more time to investigate.

(In case these patches are not enough to give an idea of how things
work, I can send the userspace patches, of course.)

The idea of this "hybrid" approach is that applications/users would do
the following steps to configure frame preemption:

$ tc qdisc replace dev $IFACE parent root handle 100 taprio \
      num_tc 3 \
      map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      queues 1@0 1@1 2@2 \
      base-time $BASE_TIME \
      sched-entry S 0f 10000000 \
      preempt 1110 \
      flags 0x2 

The "preempt" parameter is the only difference, it configures which
traffic classes are marked as preemptible, in this example, traffic
class 0 is marked as "not preemptible", so it is express, the rest of
the four traffic classes are preemptible.

The next step, of this example, would be to enable frame preemption in
the device, via ethtool, and set the minimum fragment size to 192 bytes:

$ sudo ./ethtool --set-frame-preemption $IFACE fp on min-frag-size 192

Cheers,


Vinicius Costa Gomes (12):
  ethtool: Add support for configuring frame preemption
  taprio: Add support for frame preemption offload
  core: Introduce netdev_tc_map_to_queue_mask()
  taprio: Replace tc_map_to_queue_mask()
  mqprio: Add support for frame preemption offload
  igc: Add support for enabling frame preemption via ethtool
  igc: Add support for TC_SETUP_PREEMPT
  igc: Simplify TSN flags handling
  igc: Add support for setting frame preemption configuration
  ethtool: Add support for Frame Preemption verification
  igc: Check incompatible configs for Frame Preemption
  igc: Add support for Frame Preemption verification

 Documentation/networking/ethtool-netlink.rst |  41 +++
 drivers/net/ethernet/intel/igc/igc.h         |  27 ++
 drivers/net/ethernet/intel/igc/igc_defines.h |  17 ++
 drivers/net/ethernet/intel/igc/igc_ethtool.c |  45 ++++
 drivers/net/ethernet/intel/igc/igc_main.c    | 249 ++++++++++++++++++-
 drivers/net/ethernet/intel/igc/igc_tsn.c     | 127 ++++++----
 drivers/net/ethernet/intel/igc/igc_tsn.h     |   1 +
 include/linux/ethtool.h                      |  24 ++
 include/linux/netdevice.h                    |   2 +
 include/net/pkt_sched.h                      |   4 +
 include/uapi/linux/ethtool_netlink.h         |  19 ++
 include/uapi/linux/pkt_sched.h               |   2 +
 net/core/dev.c                               |  20 ++
 net/ethtool/Makefile                         |   2 +-
 net/ethtool/common.c                         |  25 ++
 net/ethtool/netlink.c                        |  19 ++
 net/ethtool/netlink.h                        |   4 +
 net/ethtool/preempt.c                        | 157 ++++++++++++
 net/sched/sch_mqprio.c                       |  41 ++-
 net/sched/sch_taprio.c                       |  65 +++--
 20 files changed, 815 insertions(+), 76 deletions(-)
 create mode 100644 net/ethtool/preempt.c

Comments

Vladimir Oltean June 27, 2021, 7:43 p.m. UTC | #1

On Fri, Jun 25, 2021 at 05:33:03PM -0700, Vinicius Costa Gomes wrote:
> Frame preemption (described in IEEE 802.3-2018, Section 99 in

> particular) defines the concept of preemptible and express queues. It

> allows traffic from express queues to "interrupt" traffic from

> preemptible queues, which are "resumed" after the express traffic has

> finished transmitting.

> 

> Frame preemption can only be used when both the local device and the

> link partner support it.

> 

> Only parameters for enabling/disabling frame preemption and

> configuring the minimum fragment size are included here. Expressing

> which queues are marked as preemptible is left to mqprio/taprio, as

> having that information there should be easier on the user.

> 

> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>

> ---

>  Documentation/networking/ethtool-netlink.rst |  38 +++++

>  include/linux/ethtool.h                      |  22 +++

>  include/uapi/linux/ethtool_netlink.h         |  17 +++

>  net/ethtool/Makefile                         |   2 +-

>  net/ethtool/common.c                         |  25 ++++

>  net/ethtool/netlink.c                        |  19 +++

>  net/ethtool/netlink.h                        |   4 +

>  net/ethtool/preempt.c                        | 146 +++++++++++++++++++

>  8 files changed, 272 insertions(+), 1 deletion(-)

>  create mode 100644 net/ethtool/preempt.c

> 

> diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst

> index 6ea91e41593f..a87f1716944e 100644

> --- a/Documentation/networking/ethtool-netlink.rst

> +++ b/Documentation/networking/ethtool-netlink.rst

> @@ -1477,6 +1477,44 @@ Low and high bounds are inclusive, for example:

>   etherStatsPkts512to1023Octets 512  1023

>   ============================= ==== ====


I think you need to add some extra documentation bits to the

List of message types
=====================

and

Request translation
===================

sections.

>  

> +PREEMPT_GET

> +===========

> +

> +Get information about frame preemption state.

> +

> +Request contents:

> +

> +  ====================================  ======  ==========================

> +  ``ETHTOOL_A_PREEMPT_HEADER``          nested  request header

> +  ====================================  ======  ==========================

> +

> +Request contents:

> +

> +  =====================================  ======  ==========================

> +  ``ETHTOOL_A_PREEMPT_HEADER``           nested  reply header

> +  ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled

> +  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size

> +  =====================================  ======  ==========================

> +

> +``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final

> +fragment size that the receiver device supports.

> +

> +PREEMPT_SET

> +===========

> +

> +Sets frame preemption parameters.

> +

> +Request contents:

> +

> +  =====================================  ======  ==========================

> +  ``ETHTOOL_A_CHANNELS_HEADER``          nested  reply header

> +  ``ETHTOOL_A_PREEMPT_ENABLED``          u8      frame preemption enabled

> +  ``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE``    u32     Min additional frag size

> +  =====================================  ======  ==========================

> +

> +``ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE`` configures the minimum non-final

> +fragment size that the receiver device supports.

> +

>  Request translation

>  ===================

>  

> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h

> index 29dbb603bc91..7e449be8f335 100644

> --- a/include/linux/ethtool.h

> +++ b/include/linux/ethtool.h

> @@ -409,6 +409,19 @@ struct ethtool_module_eeprom {

>  	u8	*data;

>  };

>  

> +/**

> + * struct ethtool_fp - Frame Preemption information

> + *

> + * @enabled: Enable frame preemption.

> + * @add_frag_size: Minimum size for additional (non-final) fragments

> + * in bytes, for the value defined in the IEEE 802.3-2018 standard see

> + * ethtool_frag_size_to_mult().

> + */

> +struct ethtool_fp {

> +	u8 enabled;

> +	u32 add_frag_size;


Strange that the verify_disable bit is not in here? I haven't looked at
further patches in detail but I saw in the commit message that you added
support for it, maybe it needs to be squashed with this?

Can we make "enabled" a bool?

> +};

> +

>  /**

>   * struct ethtool_ops - optional netdev operations

>   * @cap_link_lanes_supported: indicates if the driver supports lanes

> @@ -561,6 +574,8 @@ struct ethtool_module_eeprom {

>   *	not report statistics.

>   * @get_fecparam: Get the network device Forward Error Correction parameters.

>   * @set_fecparam: Set the network device Forward Error Correction parameters.

> + * @get_preempt: Get the network device Frame Preemption parameters.

> + * @set_preempt: Set the network device Frame Preemption parameters.

>   * @get_ethtool_phy_stats: Return extended statistics about the PHY device.

>   *	This is only useful if the device maintains PHY statistics and

>   *	cannot use the standard PHY library helpers.

> @@ -675,6 +690,10 @@ struct ethtool_ops {

>  				      struct ethtool_fecparam *);

>  	int	(*set_fecparam)(struct net_device *,

>  				      struct ethtool_fecparam *);

> +	int	(*get_preempt)(struct net_device *,

> +			       struct ethtool_fp *);

> +	int	(*set_preempt)(struct net_device *, struct ethtool_fp *,

> +			       struct netlink_ext_ack *);

>  	void	(*get_ethtool_phy_stats)(struct net_device *,

>  					 struct ethtool_stats *, u64 *);

>  	int	(*get_phy_tunable)(struct net_device *,

> @@ -766,4 +785,7 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings,

>   * next string.

>   */

>  extern __printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...);

> +

> +u8 ethtool_frag_size_to_mult(u32 frag_size);

> +

>  #endif /* _LINUX_ETHTOOL_H */

> diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h

> index c7135c9c37a5..4600aba1c693 100644

> --- a/include/uapi/linux/ethtool_netlink.h

> +++ b/include/uapi/linux/ethtool_netlink.h

> @@ -44,6 +44,8 @@ enum {

>  	ETHTOOL_MSG_TUNNEL_INFO_GET,

>  	ETHTOOL_MSG_FEC_GET,

>  	ETHTOOL_MSG_FEC_SET,

> +	ETHTOOL_MSG_PREEMPT_GET,

> +	ETHTOOL_MSG_PREEMPT_SET,

>  	ETHTOOL_MSG_MODULE_EEPROM_GET,

>  	ETHTOOL_MSG_STATS_GET,

>  

> @@ -86,6 +88,8 @@ enum {

>  	ETHTOOL_MSG_TUNNEL_INFO_GET_REPLY,

>  	ETHTOOL_MSG_FEC_GET_REPLY,

>  	ETHTOOL_MSG_FEC_NTF,

> +	ETHTOOL_MSG_PREEMPT_GET_REPLY,

> +	ETHTOOL_MSG_PREEMPT_NTF,

>  	ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY,

>  	ETHTOOL_MSG_STATS_GET_REPLY,


Correct me if I'm wrong, but enums in uapi should always be added at the
end, otherwise you break value with user space binaries which use
ETHTOOL_MSG_MODULE_EEPROM_GET and are compiled against old kernel
headers.

>  

> @@ -664,6 +668,19 @@ enum {

>  	ETHTOOL_A_FEC_STAT_MAX = (__ETHTOOL_A_FEC_STAT_CNT - 1)

>  };

>  

> +/* FRAME PREEMPTION */

> +

> +enum {

> +	ETHTOOL_A_PREEMPT_UNSPEC,

> +	ETHTOOL_A_PREEMPT_HEADER,			/* nest - _A_HEADER_* */

> +	ETHTOOL_A_PREEMPT_ENABLED,			/* u8 */

> +	ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,		/* u32 */

> +

> +	/* add new constants above here */

> +	__ETHTOOL_A_PREEMPT_CNT,

> +	ETHTOOL_A_PREEMPT_MAX = (__ETHTOOL_A_PREEMPT_CNT - 1)

> +};

> +

>  /* MODULE EEPROM */

>  

>  enum {

> diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile

> index 723c9a8a8cdf..4b84b2d34c7a 100644

> --- a/net/ethtool/Makefile

> +++ b/net/ethtool/Makefile

> @@ -7,4 +7,4 @@ obj-$(CONFIG_ETHTOOL_NETLINK)	+= ethtool_nl.o

>  ethtool_nl-y	:= netlink.o bitset.o strset.o linkinfo.o linkmodes.o \

>  		   linkstate.o debug.o wol.o features.o privflags.o rings.o \

>  		   channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \

> -		   tunnels.o fec.o eeprom.o stats.o

> +		   tunnels.o fec.o preempt.o eeprom.o stats.o

> diff --git a/net/ethtool/common.c b/net/ethtool/common.c

> index f9dcbad84788..68d123dd500b 100644

> --- a/net/ethtool/common.c

> +++ b/net/ethtool/common.c

> @@ -579,3 +579,28 @@ ethtool_params_from_link_mode(struct ethtool_link_ksettings *link_ksettings,

>  	link_ksettings->base.duplex = link_info->duplex;

>  }

>  EXPORT_SYMBOL_GPL(ethtool_params_from_link_mode);

> +

> +/**

> + * ethtool_frag_size_to_mult() - Convert from a Frame Preemption

> + * Additional Fragment size in bytes to a multiplier.

> + * @frag_size: minimum non-final fragment size in bytes.

> + *

> + * The multiplier is defined as:

> + *	"A 2-bit integer value used to indicate the minimum size of

> + *	non-final fragments supported by the receiver on the given port

> + *	associated with the local System. This value is expressed in units

> + *	of 64 octets of additional fragment length."

> + *	Equivalent to `30.14.1.7 aMACMergeAddFragSize` from the IEEE 802.3-2018

> + *	standard.

> + *

> + * Return: the multiplier is a number in the [0, 2] interval.

> + */

> +u8 ethtool_frag_size_to_mult(u32 frag_size)

> +{

> +	u8 mult = (frag_size / 64) - 1;

> +

> +	mult = clamp_t(u8, mult, 0, 3);

> +

> +	return mult;


I think it would look better as "return clamp_t(u8, mult, 0, 3);"

> +}

> +EXPORT_SYMBOL_GPL(ethtool_frag_size_to_mult);

> diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c

> index a7346346114f..f4e07b740790 100644

> --- a/net/ethtool/netlink.c

> +++ b/net/ethtool/netlink.c

> @@ -246,6 +246,7 @@ ethnl_default_requests[__ETHTOOL_MSG_USER_CNT] = {

>  	[ETHTOOL_MSG_EEE_GET]		= &ethnl_eee_request_ops,

>  	[ETHTOOL_MSG_FEC_GET]		= &ethnl_fec_request_ops,

>  	[ETHTOOL_MSG_TSINFO_GET]	= &ethnl_tsinfo_request_ops,

> +	[ETHTOOL_MSG_PREEMPT_GET]	= &ethnl_preempt_request_ops,

>  	[ETHTOOL_MSG_MODULE_EEPROM_GET]	= &ethnl_module_eeprom_request_ops,

>  	[ETHTOOL_MSG_STATS_GET]		= &ethnl_stats_request_ops,

>  };

> @@ -561,6 +562,7 @@ ethnl_default_notify_ops[ETHTOOL_MSG_KERNEL_MAX + 1] = {

>  	[ETHTOOL_MSG_PAUSE_NTF]		= &ethnl_pause_request_ops,

>  	[ETHTOOL_MSG_EEE_NTF]		= &ethnl_eee_request_ops,

>  	[ETHTOOL_MSG_FEC_NTF]		= &ethnl_fec_request_ops,

> +	[ETHTOOL_MSG_PREEMPT_NTF]	= &ethnl_preempt_request_ops,

>  };

>  

>  /* default notification handler */

> @@ -654,6 +656,7 @@ static const ethnl_notify_handler_t ethnl_notify_handlers[] = {

>  	[ETHTOOL_MSG_PAUSE_NTF]		= ethnl_default_notify,

>  	[ETHTOOL_MSG_EEE_NTF]		= ethnl_default_notify,

>  	[ETHTOOL_MSG_FEC_NTF]		= ethnl_default_notify,

> +	[ETHTOOL_MSG_PREEMPT_NTF]	= ethnl_default_notify,

>  };

>  

>  void ethtool_notify(struct net_device *dev, unsigned int cmd, const void *data)

> @@ -958,6 +961,22 @@ static const struct genl_ops ethtool_genl_ops[] = {

>  		.policy = ethnl_stats_get_policy,

>  		.maxattr = ARRAY_SIZE(ethnl_stats_get_policy) - 1,

>  	},

> +	{

> +		.cmd	= ETHTOOL_MSG_PREEMPT_GET,

> +		.doit	= ethnl_default_doit,

> +		.start	= ethnl_default_start,

> +		.dumpit	= ethnl_default_dumpit,

> +		.done	= ethnl_default_done,

> +		.policy = ethnl_preempt_get_policy,

> +		.maxattr = ARRAY_SIZE(ethnl_preempt_get_policy) - 1,

> +	},

> +	{

> +		.cmd	= ETHTOOL_MSG_PREEMPT_SET,

> +		.flags	= GENL_UNS_ADMIN_PERM,

> +		.doit	= ethnl_set_preempt,

> +		.policy = ethnl_preempt_set_policy,

> +		.maxattr = ARRAY_SIZE(ethnl_preempt_set_policy) - 1,

> +	},

>  };

>  

>  static const struct genl_multicast_group ethtool_nl_mcgrps[] = {

> diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h

> index 3e25a47fd482..cc90a463a81c 100644

> --- a/net/ethtool/netlink.h

> +++ b/net/ethtool/netlink.h

> @@ -345,6 +345,7 @@ extern const struct ethnl_request_ops ethnl_pause_request_ops;

>  extern const struct ethnl_request_ops ethnl_eee_request_ops;

>  extern const struct ethnl_request_ops ethnl_tsinfo_request_ops;

>  extern const struct ethnl_request_ops ethnl_fec_request_ops;

> +extern const struct ethnl_request_ops ethnl_preempt_request_ops;

>  extern const struct ethnl_request_ops ethnl_module_eeprom_request_ops;

>  extern const struct ethnl_request_ops ethnl_stats_request_ops;

>  

> @@ -381,6 +382,8 @@ extern const struct nla_policy ethnl_tunnel_info_get_policy[ETHTOOL_A_TUNNEL_INF

>  extern const struct nla_policy ethnl_fec_get_policy[ETHTOOL_A_FEC_HEADER + 1];

>  extern const struct nla_policy ethnl_fec_set_policy[ETHTOOL_A_FEC_AUTO + 1];

>  extern const struct nla_policy ethnl_module_eeprom_get_policy[ETHTOOL_A_MODULE_EEPROM_I2C_ADDRESS + 1];

> +extern const struct nla_policy ethnl_preempt_get_policy[ETHTOOL_A_PREEMPT_HEADER + 1];

> +extern const struct nla_policy ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE + 1];

>  extern const struct nla_policy ethnl_stats_get_policy[ETHTOOL_A_STATS_GROUPS + 1];

>  

>  int ethnl_set_linkinfo(struct sk_buff *skb, struct genl_info *info);

> @@ -400,6 +403,7 @@ int ethnl_tunnel_info_doit(struct sk_buff *skb, struct genl_info *info);

>  int ethnl_tunnel_info_start(struct netlink_callback *cb);

>  int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);

>  int ethnl_set_fec(struct sk_buff *skb, struct genl_info *info);

> +int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info);

>  

>  extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];

>  extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];

> diff --git a/net/ethtool/preempt.c b/net/ethtool/preempt.c

> new file mode 100644

> index 000000000000..4f96d3c2b1d5

> --- /dev/null

> +++ b/net/ethtool/preempt.c

> @@ -0,0 +1,146 @@

> +// SPDX-License-Identifier: GPL-2.0-only

> +

> +#include "netlink.h"

> +#include "common.h"

> +

> +struct preempt_req_info {

> +	struct ethnl_req_info		base;

> +};

> +

> +struct preempt_reply_data {

> +	struct ethnl_reply_data		base;

> +	struct ethtool_fp		fp;

> +};

> +

> +#define PREEMPT_REPDATA(__reply_base) \

> +	container_of(__reply_base, struct preempt_reply_data, base)

> +

> +const struct nla_policy

> +ethnl_preempt_get_policy[] = {

> +	[ETHTOOL_A_PREEMPT_HEADER]		= NLA_POLICY_NESTED(ethnl_header_policy),

> +};

> +

> +static int preempt_prepare_data(const struct ethnl_req_info *req_base,

> +				struct ethnl_reply_data *reply_base,

> +				struct genl_info *info)

> +{

> +	struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);

> +	struct net_device *dev = reply_base->dev;

> +	int ret;

> +

> +	if (!dev->ethtool_ops->get_preempt)

> +		return -EOPNOTSUPP;

> +

> +	ret = ethnl_ops_begin(dev);

> +	if (ret < 0)

> +		return ret;

> +

> +	ret = dev->ethtool_ops->get_preempt(dev, &data->fp);

> +	ethnl_ops_complete(dev);

> +

> +	return ret;

> +}

> +

> +static int preempt_reply_size(const struct ethnl_req_info *req_base,

> +			      const struct ethnl_reply_data *reply_base)

> +{

> +	int len = 0;

> +

> +	len += nla_total_size(sizeof(u8)); /* _PREEMPT_ENABLED */

> +	len += nla_total_size(sizeof(u32)); /* _PREEMPT_ADD_FRAG_SIZE */

> +

> +	return len;

> +}

> +

> +static int preempt_fill_reply(struct sk_buff *skb,

> +			      const struct ethnl_req_info *req_base,

> +			      const struct ethnl_reply_data *reply_base)

> +{

> +	const struct preempt_reply_data *data = PREEMPT_REPDATA(reply_base);

> +	const struct ethtool_fp *preempt = &data->fp;

> +

> +	if (nla_put_u8(skb, ETHTOOL_A_PREEMPT_ENABLED, preempt->enabled))

> +		return -EMSGSIZE;

> +

> +	if (nla_put_u32(skb, ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE,

> +			preempt->add_frag_size))

> +		return -EMSGSIZE;

> +

> +	return 0;

> +}

> +

> +const struct ethnl_request_ops ethnl_preempt_request_ops = {

> +	.request_cmd		= ETHTOOL_MSG_PREEMPT_GET,

> +	.reply_cmd		= ETHTOOL_MSG_PREEMPT_GET_REPLY,

> +	.hdr_attr		= ETHTOOL_A_PREEMPT_HEADER,

> +	.req_info_size		= sizeof(struct preempt_req_info),

> +	.reply_data_size	= sizeof(struct preempt_reply_data),

> +

> +	.prepare_data		= preempt_prepare_data,

> +	.reply_size		= preempt_reply_size,

> +	.fill_reply		= preempt_fill_reply,

> +};

> +

> +const struct nla_policy

> +ethnl_preempt_set_policy[ETHTOOL_A_PREEMPT_MAX + 1] = {

> +	[ETHTOOL_A_PREEMPT_HEADER]			= NLA_POLICY_NESTED(ethnl_header_policy),

> +	[ETHTOOL_A_PREEMPT_ENABLED]			= NLA_POLICY_RANGE(NLA_U8, 0, 1),

> +	[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE]		= { .type = NLA_U32 },

> +};

> +

> +int ethnl_set_preempt(struct sk_buff *skb, struct genl_info *info)

> +{

> +	struct ethnl_req_info req_info = {};

> +	struct nlattr **tb = info->attrs;

> +	struct ethtool_fp preempt = {};

> +	struct net_device *dev;

> +	bool mod = false;

> +	int ret;

> +

> +	ret = ethnl_parse_header_dev_get(&req_info,

> +					 tb[ETHTOOL_A_PREEMPT_HEADER],

> +					 genl_info_net(info), info->extack,

> +					 true);

> +	if (ret < 0)

> +		return ret;

> +	dev = req_info.dev;

> +	ret = -EOPNOTSUPP;


Some new lines around here please? And maybe it would look a bit cleaner
if you could assign "ret = -EOPNOTSUPP" in the "preempt ops not present"
if condition body?

> +	if (!dev->ethtool_ops->get_preempt ||

> +	    !dev->ethtool_ops->set_preempt)

> +		goto out_dev;

> +

> +	rtnl_lock();

> +	ret = ethnl_ops_begin(dev);

> +	if (ret < 0)

> +		goto out_rtnl;

> +

> +	ret = dev->ethtool_ops->get_preempt(dev, &preempt);


I don't know much about the background of ethtool netlink, but why does
the .doit of ETHTOOL_MSG_*_SET go through a getter first? Is it because
all the netlink attributes from the message are optional, and we need to
default to the current state?

> +	if (ret < 0) {

> +		GENL_SET_ERR_MSG(info, "failed to retrieve frame preemption settings");

> +		goto out_ops;

> +	}

> +

> +	ethnl_update_u8(&preempt.enabled,

> +			tb[ETHTOOL_A_PREEMPT_ENABLED], &mod);

> +	ethnl_update_u32(&preempt.add_frag_size,

> +			 tb[ETHTOOL_A_PREEMPT_ADD_FRAG_SIZE], &mod);

> +	ret = 0;


This reinitialization of ret to zero is interesting. It implies
->get_preempt() is allowed to return > 0 as a success error code.
However ->set_preempt() below isn't? (its return value is directly
propagated to callers of ethnl_set_preempt().

> +	if (!mod)

> +		goto out_ops;

> +

> +	ret = dev->ethtool_ops->set_preempt(dev, &preempt, info->extack);

> +	if (ret < 0) {

> +		GENL_SET_ERR_MSG(info, "frame preemption settings update failed");

> +		goto out_ops;

> +	}

> +

> +	ethtool_notify(dev, ETHTOOL_MSG_PREEMPT_NTF, NULL);

> +

> +out_ops:

> +	ethnl_ops_complete(dev);

> +out_rtnl:

> +	rtnl_unlock();

> +out_dev:

> +	dev_put(dev);

> +	return ret;

> +}

> -- 

> 2.32.0

>

Vladimir Oltean June 27, 2021, 7:58 p.m. UTC | #2

On Fri, Jun 25, 2021 at 05:33:04PM -0700, Vinicius Costa Gomes wrote:
> Adds a way to configure which traffic classes are marked as

> preemptible and which are marked as express.

> 

> Even if frame preemption is not a "real" offload, because it can't be

> executed purely in software, having this information near where the

> mapping of traffic classes to queues is specified, makes it,

> hopefully, easier to use.

> 

> taprio will receive the information of which traffic classes are

> marked as express/preemptible, and when offloading frame preemption to

> the driver will convert the information, so the driver receives which

> queues are marked as express/preemptible.

> 

> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>

> ---

>  include/linux/netdevice.h      |  1 +

>  include/net/pkt_sched.h        |  4 ++++

>  include/uapi/linux/pkt_sched.h |  1 +

>  net/sched/sch_taprio.c         | 43 ++++++++++++++++++++++++++++++----

>  4 files changed, 44 insertions(+), 5 deletions(-)

> 

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h

> index be1dcceda5e4..af5d4c5b0ad5 100644

> --- a/include/linux/netdevice.h

> +++ b/include/linux/netdevice.h

> @@ -923,6 +923,7 @@ enum tc_setup_type {

>  	TC_SETUP_QDISC_TBF,

>  	TC_SETUP_QDISC_FIFO,

>  	TC_SETUP_QDISC_HTB,

> +	TC_SETUP_PREEMPT,

>  };

>  

>  /* These structures hold the attributes of bpf state that are being passed

> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h

> index 6d7b12cba015..b4cb479d1cf5 100644

> --- a/include/net/pkt_sched.h

> +++ b/include/net/pkt_sched.h

> @@ -178,6 +178,10 @@ struct tc_taprio_qopt_offload {

>  	struct tc_taprio_sched_entry entries[];

>  };

>  

> +struct tc_preempt_qopt_offload {

> +	u32 preemptible_queues;

> +};

> +

>  /* Reference counting */

>  struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload

>  						  *offload);

> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h

> index 79a699f106b1..830ce9c9ec6f 100644

> --- a/include/uapi/linux/pkt_sched.h

> +++ b/include/uapi/linux/pkt_sched.h

> @@ -1241,6 +1241,7 @@ enum {

>  	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */

>  	TCA_TAPRIO_ATTR_FLAGS, /* u32 */

>  	TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */

> +	TCA_TAPRIO_ATTR_PREEMPT_TCS, /* u32 */

>  	__TCA_TAPRIO_ATTR_MAX,

>  };

>  

> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c

> index 66fe2b82af9a..58586f98c648 100644

> --- a/net/sched/sch_taprio.c

> +++ b/net/sched/sch_taprio.c

> @@ -64,6 +64,7 @@ struct taprio_sched {

>  	struct Qdisc **qdiscs;

>  	struct Qdisc *root;

>  	u32 flags;

> +	u32 preemptible_tcs;

>  	enum tk_offsets tk_offset;

>  	int clockid;

>  	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+

> @@ -786,6 +787,7 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {

>  	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },

>  	[TCA_TAPRIO_ATTR_FLAGS]                      = { .type = NLA_U32 },

>  	[TCA_TAPRIO_ATTR_TXTIME_DELAY]		     = { .type = NLA_U32 },

> +	[TCA_TAPRIO_ATTR_PREEMPT_TCS]                = { .type = NLA_U32 },

>  };

>  

>  static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,

> @@ -1284,6 +1286,7 @@ static int taprio_disable_offload(struct net_device *dev,

>  				  struct netlink_ext_ack *extack)

>  {

>  	const struct net_device_ops *ops = dev->netdev_ops;

> +	struct tc_preempt_qopt_offload preempt = { };

>  	struct tc_taprio_qopt_offload *offload;

>  	int err;

>  

> @@ -1302,13 +1305,15 @@ static int taprio_disable_offload(struct net_device *dev,

>  	offload->enable = 0;

>  

>  	err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload);

> -	if (err < 0) {

> +	if (err < 0)

>  		NL_SET_ERR_MSG(extack,

> -			       "Device failed to disable offload");

> -		goto out;

> -	}

> +			       "Device failed to disable taprio offload");

> +

> +	err = ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT, &preempt);

> +	if (err < 0)

> +		NL_SET_ERR_MSG(extack,

> +			       "Device failed to disable frame preemption offload");


First line in taprio_disable_offload() is:

	if (!FULL_OFFLOAD_IS_ENABLED(q->flags))
		return 0;

but you said it yourself below that the preemptible queues thing is
independent of whether you have taprio offload or not (or taprio at
all). So the queues will never be reset back to the eMAC if you don't
use full offload (yes, this includes txtime offload too). In fact, it's
so independent, that I don't even know why we add them to taprio in the
first place :)
I think the argument had to do with the hold/advance commands (other
frame preemption stuff that's already in taprio), but those are really
special and only to be used in the Qbv+Qbu combination, but the pMAC
traffic classes? I don't know... Honestly I thought that me asking to
see preemptible queues implemented for mqprio as well was going to
discourage you, but oh well...

>  

> -out:

>  	taprio_offload_free(offload);

>  

>  	return err;

> @@ -1525,6 +1530,29 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,

>  					       mqprio->prio_tc_map[i]);

>  	}

>  

> +	/* It's valid to enable frame preemption without any kind of

> +	 * offloading being enabled, so keep it separated.

> +	 */

> +	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {

> +		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);

> +		struct tc_preempt_qopt_offload qopt = { };

> +

> +		if (preempt == U32_MAX) {

> +			NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible");

> +			err = -EINVAL;

> +			goto free_sched;

> +		}


Hmmm, did we somehow agree that at least one traffic class must not be
preemptible? Citation needed.

> +

> +		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);

> +

> +		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,

> +						    &qopt);

> +		if (err)

> +			goto free_sched;

> +

> +		q->preemptible_tcs = preempt;

> +	}

> +

>  	if (FULL_OFFLOAD_IS_ENABLED(q->flags))

>  		err = taprio_enable_offload(dev, q, new_admin, extack);

>  	else

> @@ -1681,6 +1709,7 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,

>  	 */

>  	q->clockid = -1;

>  	q->flags = TAPRIO_FLAGS_INVALID;

> +	q->preemptible_tcs = U32_MAX;

>  

>  	spin_lock(&taprio_list_lock);

>  	list_add(&q->taprio_list, &taprio_list);

> @@ -1899,6 +1928,10 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)

>  	if (q->flags && nla_put_u32(skb, TCA_TAPRIO_ATTR_FLAGS, q->flags))

>  		goto options_error;

>  

> +	if (q->preemptible_tcs != U32_MAX &&

> +	    nla_put_u32(skb, TCA_TAPRIO_ATTR_PREEMPT_TCS, q->preemptible_tcs))

> +		goto options_error;

> +

>  	if (q->txtime_delay &&

>  	    nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))

>  		goto options_error;

> -- 

> 2.32.0

>

Vladimir Oltean June 27, 2021, 8:02 p.m. UTC | #3

On Fri, Jun 25, 2021 at 05:33:06PM -0700, Vinicius Costa Gomes wrote:
> Replaces tc_map_to_queue_mask() by netdev_tc_map_to_queue_mask() that

> was just introduced.

> 

> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>

> ---

>  net/sched/sch_taprio.c | 26 ++++----------------------

>  1 file changed, 4 insertions(+), 22 deletions(-)

> 

> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c

> index 58586f98c648..4e411ca3a9eb 100644

> --- a/net/sched/sch_taprio.c

> +++ b/net/sched/sch_taprio.c

> @@ -1201,25 +1201,6 @@ static void taprio_offload_config_changed(struct taprio_sched *q)

>  	spin_unlock(&q->current_entry_lock);

>  }

>  

> -static u32 tc_map_to_queue_mask(struct net_device *dev, u32 tc_mask)

> -{

> -	u32 i, queue_mask = 0;

> -

> -	for (i = 0; i < dev->num_tc; i++) {

> -		u32 offset, count;

> -

> -		if (!(tc_mask & BIT(i)))

> -			continue;

> -

> -		offset = dev->tc_to_txq[i].offset;

> -		count = dev->tc_to_txq[i].count;

> -

> -		queue_mask |= GENMASK(offset + count - 1, offset);

> -	}

> -

> -	return queue_mask;

> -}

> -

>  static void taprio_sched_to_offload(struct net_device *dev,

>  				    struct sched_gate_list *sched,

>  				    struct tc_taprio_qopt_offload *offload)

> @@ -1236,7 +1217,7 @@ static void taprio_sched_to_offload(struct net_device *dev,

>  

>  		e->command = entry->command;

>  		e->interval = entry->interval;

> -		e->gate_mask = tc_map_to_queue_mask(dev, entry->gate_mask);

> +		e->gate_mask = netdev_tc_map_to_queue_mask(dev, entry->gate_mask);

>  

>  		i++;

>  	}

> @@ -1536,14 +1517,15 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,

>  	if (tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]) {

>  		u32 preempt = nla_get_u32(tb[TCA_TAPRIO_ATTR_PREEMPT_TCS]);

>  		struct tc_preempt_qopt_offload qopt = { };

> +		u32 all_tcs_mask = GENMASK(mqprio->num_tc, 0);

>  

> -		if (preempt == U32_MAX) {

> +		if ((preempt & all_tcs_mask) == all_tcs_mask) {


Ouch, this patch does more than it says on the box.
If it did only what the commit message said, it could have just as well
been squashed with the previous one (and this extra change squashed with
the "preemptible queues in taprio" patch. Practically it means that
these last two patches should go before the "preemptible queues in taprio" one.

>  			NL_SET_ERR_MSG(extack, "At least one queue must be not be preemptible");

>  			err = -EINVAL;

>  			goto free_sched;

>  		}

>  

> -		qopt.preemptible_queues = tc_map_to_queue_mask(dev, preempt);

> +		qopt.preemptible_queues = netdev_tc_map_to_queue_mask(dev, preempt);

>  

>  		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_PREEMPT,

>  						    &qopt);

> -- 

> 2.32.0

>