mbox series

[net-next,0/6] ethtool: add standard FEC statistics

Message ID 20210414034454.1970967-1-kuba@kernel.org
Headers show
Series ethtool: add standard FEC statistics | expand

Message

Jakub Kicinski April 14, 2021, 3:44 a.m. UTC
Hi!

This set adds uAPI for reporting standard FEC statistics, and
implements it in a handful of drivers.

The statistics are taken from the IEEE standard, with one
extra seemingly popular but not standard statistics added.

The implementation is similar to that of the pause frame
statistics, user requests the stats by setting a bit
(ETHTOOL_FLAG_STATS) in the common ethtool header of
ETHTOOL_MSG_FEC_GET.

Since standard defines the statistics per lane what's
reported is both total and per-lane counters:

 # ethtool  -I --show-fec eth0
 FEC parameters for eth0:
 Configured FEC encodings: None
 Active FEC encoding: None
 Statistics:
  corrected_blocks: 256
    Lane 0: 255
    Lane 1: 1
  uncorrectable_blocks: 145
    Lane 0: 128
    Lane 1: 17


The driver implementations are compile-tested only.
I'm also guessing the semantics, so careful review
from maintainers would be much appreciated!

Jakub Kicinski (6):
  ethtool: move ethtool_stats_init
  ethtool: fec_prepare_data() - jump to error handling
  ethtool: add FEC statistics
  bnxt: implement ethtool::get_fec_stats
  sfc: ef10: implement ethtool::get_fec_stats
  mlx5: implement ethtool::get_fec_stats

 Documentation/networking/ethtool-netlink.rst  | 21 +++++
 Documentation/networking/statistics.rst       |  2 +
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 15 ++++
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  |  9 +++
 .../ethernet/mellanox/mlx5/core/en_stats.c    | 28 ++++++-
 .../ethernet/mellanox/mlx5/core/en_stats.h    |  2 +
 drivers/net/ethernet/sfc/ef10.c               | 17 ++++
 drivers/net/ethernet/sfc/ethtool.c            | 10 +++
 drivers/net/ethernet/sfc/net_driver.h         |  3 +
 include/linux/ethtool.h                       | 46 +++++++++++
 include/uapi/linux/ethtool_netlink.h          | 14 ++++
 net/ethtool/fec.c                             | 80 ++++++++++++++++++-
 net/ethtool/pause.c                           |  6 --
 13 files changed, 241 insertions(+), 12 deletions(-)

Comments

Saeed Mahameed April 15, 2021, 6:25 a.m. UTC | #1
On Tue, 2021-04-13 at 20:44 -0700, Jakub Kicinski wrote:
> ethtool_link_ksettings *);

> +       void    (*get_fec_stats)(struct net_device *dev,

> +                                struct ethtool_fec_stats

> *fec_stats);


why void ? some drivers need to access the FW and it could be an old
FW/device where the fec stats are not supported.
and sometimes e.g. in mlx5 case FW can fail for FW related businesses
:)..
Jakub Kicinski April 15, 2021, 3:21 p.m. UTC | #2
On Wed, 14 Apr 2021 23:25:43 -0700 Saeed Mahameed wrote:
> On Tue, 2021-04-13 at 20:44 -0700, Jakub Kicinski wrote:

> > ethtool_link_ksettings *);

> > +       void    (*get_fec_stats)(struct net_device *dev,

> > +                                struct ethtool_fec_stats *fec_stats);  

> 

> why void ? some drivers need to access the FW and it could be an old

> FW/device where the fec stats are not supported.


When stats are not supported just returning is fine. Stats are
initialized to -1, core will not dump them into the netlink message 
if driver didn't assign anything.

> and sometimes e.g. in mlx5 case FW can fail for FW related businesses

> :)..


Can do. I was wondering if the entity reading the stats (from user
space) can do anything useful with the error, and didn't really come 
up with anything other than printing an error. Which the kernel can 
do as well. OTOH if there are multiple stats to read and one of them
fails its probably better to return partial results than fail 
the entire op. Therefore I went for no error - if something fails - 
the stats will be missing.

Does that make any sense? Or do you think errors are rare enough that
it's okay if they are fatal? (with the caveat that -EOPNOTSUPP should
be ignored).
Saeed Mahameed April 15, 2021, 10:33 p.m. UTC | #3
On Thu, 2021-04-15 at 08:21 -0700, Jakub Kicinski wrote:
> On Wed, 14 Apr 2021 23:25:43 -0700 Saeed Mahameed wrote:

> > On Tue, 2021-04-13 at 20:44 -0700, Jakub Kicinski wrote:

> > > ethtool_link_ksettings *);

> > > +       void    (*get_fec_stats)(struct net_device *dev,

> > > +                                struct ethtool_fec_stats

> > > *fec_stats);  

> > 

> > why void ? some drivers need to access the FW and it could be an

> > old

> > FW/device where the fec stats are not supported.

> 

> When stats are not supported just returning is fine. Stats are

> initialized to -1, core will not dump them into the netlink message 

> if driver didn't assign anything.

> 

> > and sometimes e.g. in mlx5 case FW can fail for FW related

> > businesses

> > :)..

> 

> Can do. I was wondering if the entity reading the stats (from user

> space) can do anything useful with the error, and didn't really come 

> up with anything other than printing an error. Which the kernel can 

> do as well. OTOH if there are multiple stats to read and one of them

> fails its probably better to return partial results than fail 

> the entire op. Therefore I went for no error - if something fails - 

> the stats will be missing.

> 

> Does that make any sense? Or do you think errors are rare enough that

> it's okay if they are fatal? (with the caveat that -EOPNOTSUPP should

> be ignored).


Agreed, Thanks for the explanation
but you still need to handle the error internally in the driver,
otherwise the command returns garbage or 0 if you didn't check return
status.
Edward Cree April 19, 2021, 12:39 p.m. UTC | #4
On 14/04/2021 04:44, Jakub Kicinski wrote:
> Report what appears to be the standard block counts:

>  - 30.5.1.1.17 aFECCorrectedBlocks

>  - 30.5.1.1.18 aFECUncorrectableBlocks

> 

> Don't report the per-lane symbol counts, if those really

> count symbols they are not what the standard calls for

> (even if symbols seem like the most useful thing to count.)

> 

> Fingers crossed that fec_corrected_errors is not in symbols.

> 

> Signed-off-by: Jakub Kicinski <kuba@kernel.org>


I'm trying to find out whether these count the right thing or not.
Some component documentation says that the per-lane _signal_ "asserts at
 maximum once per FEC coding block".  I haven't yet been able to track
 down how the counters aggregate that, but it would seem to suggest that
 they in fact count blocks and not symbols, and are just misnamed.
Investigation continues.

-ed