mbox series

[00/15] Adding GAUDI NIC code to habanalabs driver

Message ID 20200910161126.30948-1-oded.gabbay@gmail.com
Headers show
Series Adding GAUDI NIC code to habanalabs driver | expand

Message

Oded Gabbay Sept. 10, 2020, 4:11 p.m. UTC
This patch-set adds support for initializing and using the GAUDI NIC ports,
functioning as scale-out interconnect when doing distributed Deep Learning
training. The training can be performed over tens of thousands of GAUDIs
and it is done using the RDMA-over-converged-Ethernet (RoCE) v2 protocol.

Each GAUDI exposes 10x100GbE ports that are designed to scale-out the
inter-GAUDI communication by integrating a complete communication engine
on-die. This native integration allows users to use the same scaling
technology, both inside the server and rack (termed as scale-up), as well
as for scaling across racks (scale-out). The racks can be connected
directly between GAUDI processors, or through any number of standard
Ethernet switches.

The driver exposes the NIC ports to the user as standard Ethernet ports by
registering each port to the networking subsystem. This allows the user to
manage the ports with standard tools such as ifconfig, ethtool, etc. It
also enables us to connect to the Linux networking stack and thus support
standard networking protocols, such as IPv4, IPv6, TCP, etc. In addition,
we can also leverage protocols such as DCB for dynamically configuring
priorities to avoid congestion.

For each NIC port there is a matching QMAN entity. For RoCE, the user
submits workloads to the NIC through the QMAN, same as he does for the
compute engines. For regular Ethernet, the user sends and receives packets
through the standard Ethernet sockets. Those sockets are used only as a
control path. The data path that is used for AI training goes through the
RoCE interface.

It is important to note that there are some limitations and uniqueness
in GAUDI's NIC H/W, compared to other networking adapters that enforced us
to use a less-than-common driver design:

1. The NIC functionality is NOT exposed as different PCI Physical
   Functions. There is a single PF which is used for compute and
   networking, as the main goal of the NIC ports is to be used as
   intra-communication and not as standard network interfaces. This
   implies we can't connect different drivers to handle the networking
   ports because it is the same device, from the kernel POV, as the
   compute. Therefore, we must integrate the networking code into the
   main habanalabs driver.

2. Although our communication engine implements RDMA, and the driver code
   uses well-known RDMA concepts such as QP context, CQ, WQ, etc., the
   GAUDI architecture does NOT support other basic IBverbs concepts, such
   as MR and protection domain. Therefore, we can't connect to the standard
   IBverb infrastructure in the user-space and kernel (rdma-core library
   and infiniband subsystem, respectively) because the standard RDMA s/w
   and tools won't work on our H/W. Instead, we added a new IOCTL to the
   driver's existing IOCTL API. The new IOCTL exposes the available
   NIC control operations to the user (e.g. Create a QP context).

3. The die-on communication engine provides minimal offloading for standard
   Ethernet and TCP/IP protocols, as those are only used for control plane.
   E.g. the packets are copied rather than using descriptors.
   Therefore, the Ethernet performance is quite low compared to standard
   Ethernet adapters.

4. There is no virtualization support per port.

Most or all of the above limitations will hopefully be improved in future
ASIC generations.

Patch-set organization:

- Patches 1 & 2 are just adding some auto-generated register header files
  and NIC-related definitions to the interface between the driver and the
  GAUDI firmware. 

- Patch 3 adds initialization of security restrictions on the NIC engines.

- Patch 4 adds initialization of the NIC QMANs. The QMANs are needed to
  send RDMA packets through the NIC engines.

- Patches 5-11 adds the NIC driver code. It contains the basic Ethernet
  driver and H/W initialization, the NIC PHY driver code and the new NIC
  control IOCTL operations.

- Patch 12-14 adds support for debugfs, ethtool and DCB.

- Patch 15 adds the implementation of the high-level init/fini functions
  and their calls from the common code. This is the patch that actually
  enables the NIC ports and allows the user to work with them.

Thanks,
Oded


Omer Shpigelman (15):
  habanalabs/gaudi: add NIC H/W and registers definitions
  habanalabs/gaudi: add NIC firmware-related definitions
  habanalabs/gaudi: add NIC security configuration
  habanalabs/gaudi: add support for NIC QMANs
  habanalabs/gaudi: add NIC Ethernet support
  habanalabs/gaudi: add NIC PHY code
  habanalabs/gaudi: allow user to get MAC addresses in INFO IOCTL
  habanalabs/gaudi: add a new IOCTL for NIC control operations
  habanalabs/gaudi: add CQ control operations
  habanalabs/gaudi: add WQ control operations
  habanalabs/gaudi: add QP error handling
  habanalabs/gaudi: add debugfs entries for the NIC
  habanalabs/gaudi: Add ethtool support using coresight
  habanalabs/gaudi: support DCB protocol
  habanalabs/gaudi: add NIC init/fini calls from common code

 .../ABI/testing/debugfs-driver-habanalabs     |   69 +
 drivers/misc/habanalabs/common/context.c      |    1 +
 drivers/misc/habanalabs/common/device.c       |   24 +-
 drivers/misc/habanalabs/common/firmware_if.c  |   44 +
 drivers/misc/habanalabs/common/habanalabs.h   |   33 +-
 .../misc/habanalabs/common/habanalabs_drv.c   |   11 +
 .../misc/habanalabs/common/habanalabs_ioctl.c |  151 +-
 drivers/misc/habanalabs/common/pci.c          |    1 +
 drivers/misc/habanalabs/gaudi/Makefile        |    4 +
 drivers/misc/habanalabs/gaudi/gaudi.c         |  958 +++-
 drivers/misc/habanalabs/gaudi/gaudiP.h        |  333 +-
 .../misc/habanalabs/gaudi/gaudi_coresight.c   |  144 +
 drivers/misc/habanalabs/gaudi/gaudi_nic.c     | 4063 +++++++++++++++++
 drivers/misc/habanalabs/gaudi/gaudi_nic.h     |  354 ++
 .../misc/habanalabs/gaudi/gaudi_nic_dcbnl.c   |  108 +
 .../misc/habanalabs/gaudi/gaudi_nic_debugfs.c |  402 ++
 .../misc/habanalabs/gaudi/gaudi_nic_ethtool.c |  582 +++
 drivers/misc/habanalabs/gaudi/gaudi_phy.c     | 1272 ++++++
 .../misc/habanalabs/gaudi/gaudi_security.c    | 3973 ++++++++++++++++
 drivers/misc/habanalabs/goya/goya.c           |   44 +
 .../misc/habanalabs/include/common/cpucp_if.h |   34 +-
 .../include/gaudi/asic_reg/gaudi_regs.h       |   26 +-
 .../include/gaudi/asic_reg/nic0_qm0_masks.h   |  800 ++++
 .../include/gaudi/asic_reg/nic0_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic0_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic0_qpc0_masks.h  |  500 ++
 .../include/gaudi/asic_reg/nic0_qpc0_regs.h   |  710 +++
 .../include/gaudi/asic_reg/nic0_qpc1_regs.h   |  710 +++
 .../include/gaudi/asic_reg/nic0_rxb_regs.h    |  508 +++
 .../include/gaudi/asic_reg/nic0_rxe0_masks.h  |  354 ++
 .../include/gaudi/asic_reg/nic0_rxe0_regs.h   |  158 +
 .../include/gaudi/asic_reg/nic0_rxe1_regs.h   |  158 +
 .../include/gaudi/asic_reg/nic0_stat_regs.h   |  518 +++
 .../include/gaudi/asic_reg/nic0_tmr_regs.h    |  184 +
 .../include/gaudi/asic_reg/nic0_txe0_masks.h  |  336 ++
 .../include/gaudi/asic_reg/nic0_txe0_regs.h   |  264 ++
 .../include/gaudi/asic_reg/nic0_txe1_regs.h   |  264 ++
 .../include/gaudi/asic_reg/nic0_txs0_masks.h  |  336 ++
 .../include/gaudi/asic_reg/nic0_txs0_regs.h   |  214 +
 .../include/gaudi/asic_reg/nic0_txs1_regs.h   |  214 +
 .../include/gaudi/asic_reg/nic1_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic1_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic2_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic2_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic3_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic3_qm1_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic4_qm0_regs.h    |  834 ++++
 .../include/gaudi/asic_reg/nic4_qm1_regs.h    |  834 ++++
 drivers/misc/habanalabs/include/gaudi/gaudi.h |   12 +
 .../habanalabs/include/gaudi/gaudi_fw_if.h    |   24 +
 .../habanalabs/include/gaudi/gaudi_masks.h    |   15 +
 .../include/hw_ip/nic/nic_general.h           |   13 +
 include/uapi/misc/habanalabs.h                |  296 +-
 53 files changed, 27497 insertions(+), 62 deletions(-)
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
 create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
 create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
 create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h

Comments

Oded Gabbay Sept. 10, 2020, 8:16 p.m. UTC | #1
On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
> >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
> >  create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h
>
> The relevant code needs to live under drivers/net/(ethernet/).
> For one thing our automation won't trigger for drivers in random
> (/misc) part of the tree.

Can you please elaborate on how to do this with a single driver that
is already in misc ?
As I mentioned in the cover letter, we are not developing a
stand-alone NIC. We have a deep-learning accelerator with a NIC
interface.
Therefore, we don't have a separate PCI physical function for the NIC
and I can't have a second driver registering to it.

We did this design based on existing examples in the kernel
(registering to netdev), such as (just a few examples):
1. sgi-xp driver in drivers/misc/sgi-xp (see file xpnet.c)
2. bnx2fc in drivers/scsi/bnx2fc

Thanks,
Oded
Oded Gabbay Sept. 10, 2020, 8:18 p.m. UTC | #2
On Thu, Sep 10, 2020 at 11:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Sep 2020 19:11:16 +0300 Oded Gabbay wrote:
> > +module_param(nic_rx_poll, int, 0444);
> > +MODULE_PARM_DESC(nic_rx_poll,
> > +     "Enable NIC Rx polling mode (0 = no, 1 = yes, default no)");
>
> If your chip does not support IRQ coalescing you can configure polling
> and the timeout via ethtool -C, rather than a module parameter.

Thanks for the pointer, we will take a look at that.

Oded
Andrew Lunn Sept. 10, 2020, 8:19 p.m. UTC | #3
> +static int gaudi_nic_get_link_ksettings(struct net_device *netdev,
> +					struct ethtool_link_ksettings *cmd)
> +{
> +	struct gaudi_nic_device **ptr = netdev_priv(netdev);
> +	struct gaudi_nic_device *gaudi_nic = *ptr;
> +	struct hl_device *hdev = gaudi_nic->hdev;
> +	u32 port = gaudi_nic->port, speed = gaudi_nic->speed;

Please go through the code and fixup Reverse Christmas tree.

> +
> +	cmd->base.speed = speed;
> +	cmd->base.duplex = DUPLEX_FULL;
> +
> +	ethtool_link_ksettings_zero_link_mode(cmd, supported);
> +	ethtool_link_ksettings_zero_link_mode(cmd, advertising);
> +
> +	ethtool_add_mode(cmd, supported, 100000baseCR4_Full);
> +	ethtool_add_mode(cmd, supported, 100000baseSR4_Full);
> +	ethtool_add_mode(cmd, supported, 100000baseKR4_Full);
> +	ethtool_add_mode(cmd, supported, 100000baseLR4_ER4_Full);
> +
> +	ethtool_add_mode(cmd, supported, 50000baseSR2_Full);
> +	ethtool_add_mode(cmd, supported, 50000baseCR2_Full);
> +	ethtool_add_mode(cmd, supported, 50000baseKR2_Full);
> +
> +	if (speed == SPEED_100000) {
> +		ethtool_add_mode(cmd, advertising, 100000baseCR4_Full);
> +		ethtool_add_mode(cmd, advertising, 100000baseSR4_Full);
> +		ethtool_add_mode(cmd, advertising, 100000baseKR4_Full);
> +		ethtool_add_mode(cmd, advertising, 100000baseLR4_ER4_Full);
> +
> +		cmd->base.port = PORT_FIBRE;
> +
> +		ethtool_add_mode(cmd, supported, FIBRE);
> +		ethtool_add_mode(cmd, advertising, FIBRE);
> +
> +		ethtool_add_mode(cmd, supported, Backplane);
> +		ethtool_add_mode(cmd, advertising, Backplane);
> +	} else if (speed == SPEED_50000) {
> +		ethtool_add_mode(cmd, advertising, 50000baseSR2_Full);
> +		ethtool_add_mode(cmd, advertising, 50000baseCR2_Full);
> +		ethtool_add_mode(cmd, advertising, 50000baseKR2_Full);
> +	} else {
> +		dev_err(hdev->dev, "unknown speed %d, port %d\n", speed, port);
> +		return -EFAULT;
> +	}
> +
> +	ethtool_add_mode(cmd, supported, Autoneg);
> +
> +	if (gaudi_nic->auto_neg_enable) {
> +		ethtool_add_mode(cmd, advertising, Autoneg);
> +		cmd->base.autoneg = AUTONEG_ENABLE;
> +		if (gaudi_nic->auto_neg_resolved)
> +			ethtool_add_mode(cmd, lp_advertising, Autoneg);
> +	} else {
> +		cmd->base.autoneg = AUTONEG_DISABLE;
> +	}
> +
> +	ethtool_add_mode(cmd, supported, Pause);
> +
> +	if (gaudi_nic->pfc_enable)
> +		ethtool_add_mode(cmd, advertising, Pause);
> +
> +	return 0;
> +}
> +
> +static int gaudi_nic_set_link_ksettings(struct net_device *netdev,
> +				const struct ethtool_link_ksettings *cmd)
> +{
> +	struct gaudi_nic_device **ptr = netdev_priv(netdev);
> +	struct gaudi_nic_device *gaudi_nic = *ptr;
> +	struct hl_device *hdev = gaudi_nic->hdev;
> +	u32 port = gaudi_nic->port;
> +	int rc = 0, speed = cmd->base.speed;
> +	bool auto_neg = cmd->base.autoneg == AUTONEG_ENABLE;

It appears you only support speed and auto_neg. You should check that
all other things which could be configured are empty, e.g. none of the
bits are set in cmd->link_modes.advertising. If you are requested to
configure something which is not supported, you need to return
-EOPNOTSUPP.

	Andrew
Oded Gabbay Sept. 10, 2020, 8:22 p.m. UTC | #4
On Thu, Sep 10, 2020 at 11:19 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > +static int gaudi_nic_get_link_ksettings(struct net_device *netdev,
> > +                                     struct ethtool_link_ksettings *cmd)
> > +{
> > +     struct gaudi_nic_device **ptr = netdev_priv(netdev);
> > +     struct gaudi_nic_device *gaudi_nic = *ptr;
> > +     struct hl_device *hdev = gaudi_nic->hdev;
> > +     u32 port = gaudi_nic->port, speed = gaudi_nic->speed;
>
> Please go through the code and fixup Reverse Christmas tree.
Of course, we will fix this.

>
> > +
> > +     cmd->base.speed = speed;
> > +     cmd->base.duplex = DUPLEX_FULL;
> > +
> > +     ethtool_link_ksettings_zero_link_mode(cmd, supported);
> > +     ethtool_link_ksettings_zero_link_mode(cmd, advertising);
> > +
> > +     ethtool_add_mode(cmd, supported, 100000baseCR4_Full);
> > +     ethtool_add_mode(cmd, supported, 100000baseSR4_Full);
> > +     ethtool_add_mode(cmd, supported, 100000baseKR4_Full);
> > +     ethtool_add_mode(cmd, supported, 100000baseLR4_ER4_Full);
> > +
> > +     ethtool_add_mode(cmd, supported, 50000baseSR2_Full);
> > +     ethtool_add_mode(cmd, supported, 50000baseCR2_Full);
> > +     ethtool_add_mode(cmd, supported, 50000baseKR2_Full);
> > +
> > +     if (speed == SPEED_100000) {
> > +             ethtool_add_mode(cmd, advertising, 100000baseCR4_Full);
> > +             ethtool_add_mode(cmd, advertising, 100000baseSR4_Full);
> > +             ethtool_add_mode(cmd, advertising, 100000baseKR4_Full);
> > +             ethtool_add_mode(cmd, advertising, 100000baseLR4_ER4_Full);
> > +
> > +             cmd->base.port = PORT_FIBRE;
> > +
> > +             ethtool_add_mode(cmd, supported, FIBRE);
> > +             ethtool_add_mode(cmd, advertising, FIBRE);
> > +
> > +             ethtool_add_mode(cmd, supported, Backplane);
> > +             ethtool_add_mode(cmd, advertising, Backplane);
> > +     } else if (speed == SPEED_50000) {
> > +             ethtool_add_mode(cmd, advertising, 50000baseSR2_Full);
> > +             ethtool_add_mode(cmd, advertising, 50000baseCR2_Full);
> > +             ethtool_add_mode(cmd, advertising, 50000baseKR2_Full);
> > +     } else {
> > +             dev_err(hdev->dev, "unknown speed %d, port %d\n", speed, port);
> > +             return -EFAULT;
> > +     }
> > +
> > +     ethtool_add_mode(cmd, supported, Autoneg);
> > +
> > +     if (gaudi_nic->auto_neg_enable) {
> > +             ethtool_add_mode(cmd, advertising, Autoneg);
> > +             cmd->base.autoneg = AUTONEG_ENABLE;
> > +             if (gaudi_nic->auto_neg_resolved)
> > +                     ethtool_add_mode(cmd, lp_advertising, Autoneg);
> > +     } else {
> > +             cmd->base.autoneg = AUTONEG_DISABLE;
> > +     }
> > +
> > +     ethtool_add_mode(cmd, supported, Pause);
> > +
> > +     if (gaudi_nic->pfc_enable)
> > +             ethtool_add_mode(cmd, advertising, Pause);
> > +
> > +     return 0;
> > +}
> > +
> > +static int gaudi_nic_set_link_ksettings(struct net_device *netdev,
> > +                             const struct ethtool_link_ksettings *cmd)
> > +{
> > +     struct gaudi_nic_device **ptr = netdev_priv(netdev);
> > +     struct gaudi_nic_device *gaudi_nic = *ptr;
> > +     struct hl_device *hdev = gaudi_nic->hdev;
> > +     u32 port = gaudi_nic->port;
> > +     int rc = 0, speed = cmd->base.speed;
> > +     bool auto_neg = cmd->base.autoneg == AUTONEG_ENABLE;
>
> It appears you only support speed and auto_neg. You should check that
> all other things which could be configured are empty, e.g. none of the
> bits are set in cmd->link_modes.advertising. If you are requested to
> configure something which is not supported, you need to return
> -EOPNOTSUPP.
>
>         Andrew

Thanks Andrew,
We will do that and send an updated version.
Oded
Andrew Lunn Sept. 10, 2020, 8:25 p.m. UTC | #5
> Can you please elaborate on how to do this with a single driver that
> is already in misc ?
> As I mentioned in the cover letter, we are not developing a
> stand-alone NIC. We have a deep-learning accelerator with a NIC
> interface.

This sounds like an MFD.

     Andrew
Jakub Kicinski Sept. 10, 2020, 8:28 p.m. UTC | #6
On Thu, 10 Sep 2020 23:16:22 +0300 Oded Gabbay wrote:
> On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:  
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
> > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
> > >  create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h  
> >
> > The relevant code needs to live under drivers/net/(ethernet/).
> > For one thing our automation won't trigger for drivers in random
> > (/misc) part of the tree.  
> 
> Can you please elaborate on how to do this with a single driver that
> is already in misc ?
> As I mentioned in the cover letter, we are not developing a
> stand-alone NIC. We have a deep-learning accelerator with a NIC
> interface.
> Therefore, we don't have a separate PCI physical function for the NIC
> and I can't have a second driver registering to it.

Is it not possible to move the files and still build them into a single
module?

> We did this design based on existing examples in the kernel
> (registering to netdev), such as (just a few examples):
> 1. sgi-xp driver in drivers/misc/sgi-xp (see file xpnet.c)
> 2. bnx2fc in drivers/scsi/bnx2fc
> 
> Thanks,
> Oded
Oded Gabbay Sept. 10, 2020, 8:30 p.m. UTC | #7
On Thu, Sep 10, 2020 at 11:25 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Can you please elaborate on how to do this with a single driver that
> > is already in misc ?
> > As I mentioned in the cover letter, we are not developing a
> > stand-alone NIC. We have a deep-learning accelerator with a NIC
> > interface.
>
> This sounds like an MFD.
>
>      Andrew

Yes and no. There is only one functionality - training of deep
learning (Accelerating compute operations) :)
The rdma is just our method of scaling-out - our method of
intra-connection between GAUDI devices (similar to NVlink or AMD
crossfire).
So the H/W exposes a single physical function at the PCI level. And
thus Linux can call a single driver for it during the PCI probe.

I hope that in future generations we will improve that, but it is what
it is for GAUDI.
I don't see how to do it otherwise currently but if you have ideas please share.
Oded
Oded Gabbay Sept. 10, 2020, 8:32 p.m. UTC | #8
On Thu, Sep 10, 2020 at 11:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Sep 2020 23:16:22 +0300 Oded Gabbay wrote:
> > On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:
> > > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
> > > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
> > > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
> > > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
> > > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
> > > >  create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
> > > >  create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h
> > >
> > > The relevant code needs to live under drivers/net/(ethernet/).
> > > For one thing our automation won't trigger for drivers in random
> > > (/misc) part of the tree.
> >
> > Can you please elaborate on how to do this with a single driver that
> > is already in misc ?
> > As I mentioned in the cover letter, we are not developing a
> > stand-alone NIC. We have a deep-learning accelerator with a NIC
> > interface.
> > Therefore, we don't have a separate PCI physical function for the NIC
> > and I can't have a second driver registering to it.
>
> Is it not possible to move the files and still build them into a single
> module?
hmm...
I actually didn't try that as I thought it will be very strange and
I'm not familiar with other drivers that build as a single ko but have
files spread out in different subsystems.
I don't feel it is a better option than what we did here.

Will I need to split pull requests to different subsystem maintainers
? For the same driver ?
Sounds to me this is not going to fly.
Thanks,
Oded


>
> > We did this design based on existing examples in the kernel
> > (registering to netdev), such as (just a few examples):
> > 1. sgi-xp driver in drivers/misc/sgi-xp (see file xpnet.c)
> > 2. bnx2fc in drivers/scsi/bnx2fc
> >
> > Thanks,
> > Oded
>
Oded Gabbay Sept. 10, 2020, 8:52 p.m. UTC | #9
On Thu, Sep 10, 2020 at 11:38 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Thu, Sep 10, 2020 at 11:30:33PM +0300, Oded Gabbay wrote:
> > On Thu, Sep 10, 2020 at 11:25 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > Can you please elaborate on how to do this with a single driver that
> > > > is already in misc ?
> > > > As I mentioned in the cover letter, we are not developing a
> > > > stand-alone NIC. We have a deep-learning accelerator with a NIC
> > > > interface.
> > >
> > > This sounds like an MFD.
> > >
> > >      Andrew
> >
> > Yes and no. There is only one functionality - training of deep
> > learning (Accelerating compute operations) :)
> > The rdma is just our method of scaling-out - our method of
> > intra-connection between GAUDI devices (similar to NVlink or AMD
> > crossfire).
> > So the H/W exposes a single physical function at the PCI level. And
> > thus Linux can call a single driver for it during the PCI probe.
>
> Yes, it probes the MFD driver. The MFD driver then creates platform
> drivers for the sub functions. i.e. it would create an Ethernet
> platform driver. That then gets probed in the usual way. The child
> device can get access to the parent device, if it needs to share
> things, e.g. a device on a bus. This is typically I2C or SPI, but
> there is no reason it cannot be a PCI device.
>
> Go look in drivers/mfd.
>
>       Andrew

I'm slightly familiar with drivers/mfd and as you mentioned, those are
for "simple" devices, which use a bus with different functionality on
them, like I2C with many devices (sensors for various things, etc).
I've never seen anyone doing a PCI device there and frankly, I don't
see the benefit of trying to migrate our complex PCI driver to that
subsystem, if it will even work.
And I would like to reiterate that our NIC ports are highly integrated
with our compute engines.
They "talk" to each other via sync objects inside the SOC, and all of
them are used as part of the training of the deep learning network.
Another example why this is not MFD - when a compute engine gets
stuck, all the NIC ports are going through reset.
So it's not the same as multiple devices that use the same bus or H/W.
It's a single device with some engines that work in harmony.
The bottom line is that we have single functionality and the scale-out
is done via RDMA that is integrated on the device.
We could have chosen other ways to scale-out (like some proprietary
bus) and then would that count as another functionality ? I think not.

So I'm not going to drivers/mfd with our driver. I wish that I had
multiple PCI PF so I could do a proper Ethernet driver but I can't for
this H/W.
And I think that physically splitting the files into two subsystems
will be very hard to maintain and definitely I will want to hear
Greg's opinion on that.

Thanks,
Oded
Oded Gabbay Sept. 10, 2020, 9:15 p.m. UTC | #10
On Fri, Sep 11, 2020 at 12:05 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 9/10/2020 1:32 PM, Oded Gabbay wrote:
> > On Thu, Sep 10, 2020 at 11:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> On Thu, 10 Sep 2020 23:16:22 +0300 Oded Gabbay wrote:
> >>> On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >>>> On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:
> >>>>>   create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
> >>>>>   create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
> >>>>>   create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
> >>>>>   create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
> >>>>>   create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
> >>>>>   create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h
> >>>>
> >>>> The relevant code needs to live under drivers/net/(ethernet/).
> >>>> For one thing our automation won't trigger for drivers in random
> >>>> (/misc) part of the tree.
> >>>
> >>> Can you please elaborate on how to do this with a single driver that
> >>> is already in misc ?
> >>> As I mentioned in the cover letter, we are not developing a
> >>> stand-alone NIC. We have a deep-learning accelerator with a NIC
> >>> interface.
> >>> Therefore, we don't have a separate PCI physical function for the NIC
> >>> and I can't have a second driver registering to it.
> >>
> >> Is it not possible to move the files and still build them into a single
> >> module?
> > hmm...
> > I actually didn't try that as I thought it will be very strange and
> > I'm not familiar with other drivers that build as a single ko but have
> > files spread out in different subsystems.
> > I don't feel it is a better option than what we did here.
> >
> > Will I need to split pull requests to different subsystem maintainers
> > ? For the same driver ?
> > Sounds to me this is not going to fly.
>
> Not necessarily, you can post your patches to all relevant lists and
> seek maintainer review/acked-by tags from the relevant maintainers. This
> is not unheard of with mlx5 for instance.
Yeah, I see what you are saying, the problem is that sometimes,
because everything is tightly integrated in our SOC, the patches
contain code from common code (common to ALL our ASICs, even those who
don't have NIC at all), GAUDI specific code which is not NIC related
and the NIC code itself.
But I guess that as a last resort if this is a *must* I can do that.
Though I would like to hear Greg's opinion on this as he is my current
maintainer.

Personally I do want to send relevant patches to netdev because I want
to get your expert reviews on them, but still keep the code in a
single location.

>
> Have you considered using notifiers to get your NIC driver registered
> while the NIC code lives in a different module?
Yes, and I prefered to keep it simple. I didn't want to start sending
notifications to the NIC driver every time, for example, I needed to
reset the SOC because a compute engine got stuck. Or vice-versa - when
some error happened in the NIC to start sending notifications to the
common driver.

In addition, from my AMD days, we had a very tough time managing two
drivers that "talk" to each other and manage the same H/W. I'm talking
about amdgpu for graphics and amdkfd for compute (which I was the
maintainer). AMD is working in the past years to unite those two
drivers to get out of that mess. That's why I didn't want to go down
that road.

Thanks,
Oded

> --
> Florian
Florian Fainelli Sept. 10, 2020, 9:23 p.m. UTC | #11
On 9/10/2020 2:15 PM, Oded Gabbay wrote:
> On Fri, Sep 11, 2020 at 12:05 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>>
>>
>> On 9/10/2020 1:32 PM, Oded Gabbay wrote:
>>> On Thu, Sep 10, 2020 at 11:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>
>>>> On Thu, 10 Sep 2020 23:16:22 +0300 Oded Gabbay wrote:
>>>>> On Thu, Sep 10, 2020 at 11:01 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>>> On Thu, 10 Sep 2020 19:11:11 +0300 Oded Gabbay wrote:
>>>>>>>    create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.c
>>>>>>>    create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_dcbnl.c
>>>>>>>    create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_debugfs.c
>>>>>>>    create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_nic_ethtool.c
>>>>>>>    create mode 100644 drivers/misc/habanalabs/gaudi/gaudi_phy.c
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_masks.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm0_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qm1_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_masks.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc0_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_qpc1_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxb_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_masks.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe0_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_rxe1_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_stat_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_tmr_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_masks.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe0_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txe1_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_masks.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs0_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic0_txs1_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm0_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic1_qm1_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm0_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic2_qm1_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm0_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic3_qm1_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm0_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/gaudi/asic_reg/nic4_qm1_regs.h
>>>>>>>    create mode 100644 drivers/misc/habanalabs/include/hw_ip/nic/nic_general.h
>>>>>>
>>>>>> The relevant code needs to live under drivers/net/(ethernet/).
>>>>>> For one thing our automation won't trigger for drivers in random
>>>>>> (/misc) part of the tree.
>>>>>
>>>>> Can you please elaborate on how to do this with a single driver that
>>>>> is already in misc ?
>>>>> As I mentioned in the cover letter, we are not developing a
>>>>> stand-alone NIC. We have a deep-learning accelerator with a NIC
>>>>> interface.
>>>>> Therefore, we don't have a separate PCI physical function for the NIC
>>>>> and I can't have a second driver registering to it.
>>>>
>>>> Is it not possible to move the files and still build them into a single
>>>> module?
>>> hmm...
>>> I actually didn't try that as I thought it will be very strange and
>>> I'm not familiar with other drivers that build as a single ko but have
>>> files spread out in different subsystems.
>>> I don't feel it is a better option than what we did here.
>>>
>>> Will I need to split pull requests to different subsystem maintainers
>>> ? For the same driver ?
>>> Sounds to me this is not going to fly.
>>
>> Not necessarily, you can post your patches to all relevant lists and
>> seek maintainer review/acked-by tags from the relevant maintainers. This
>> is not unheard of with mlx5 for instance.
> Yeah, I see what you are saying, the problem is that sometimes,
> because everything is tightly integrated in our SOC, the patches
> contain code from common code (common to ALL our ASICs, even those who
> don't have NIC at all), GAUDI specific code which is not NIC related
> and the NIC code itself.
> But I guess that as a last resort if this is a *must* I can do that.
> Though I would like to hear Greg's opinion on this as he is my current
> maintainer.
> 
> Personally I do want to send relevant patches to netdev because I want
> to get your expert reviews on them, but still keep the code in a
> single location.

We do have network drivers sprinkled across the kernel tree already, but 
I would agree that from a networking maintainer perspective this makes 
auditing code harder, you would naturally grep for net/ and drivers/net 
and easily miss arch/uml/ for instance. When you do treewide changes, 
having all your ducklings in the same pond is a lot easier.

There is a possible "risk" with posting a patch series for the 
habanalabs driver to netdev that people will be wondering what this is 
about and completely miss it is about the networking bits. If there is a 
NIC driver under drivers/net then people will start to filter or pay 
attention based on the directory.

> 
>>
>> Have you considered using notifiers to get your NIC driver registered
>> while the NIC code lives in a different module?
> Yes, and I prefered to keep it simple. I didn't want to start sending
> notifications to the NIC driver every time, for example, I needed to
> reset the SOC because a compute engine got stuck. Or vice-versa - when
> some error happened in the NIC to start sending notifications to the
> common driver.
> 
> In addition, from my AMD days, we had a very tough time managing two
> drivers that "talk" to each other and manage the same H/W. I'm talking
> about amdgpu for graphics and amdkfd for compute (which I was the
> maintainer). AMD is working in the past years to unite those two
> drivers to get out of that mess. That's why I didn't want to go down
> that road.

You are trading an indirect call for a direct call, and it does provide 
some nice interface, but it could be challenging to work with given the 
context in which the notifier is called can be problematic. You could 
still have direct module references then, and that would avoid the need 
for notifiers.

You are the driver maintainer, so you definitively have a bigger say in 
the matter than most of us, drive by contributors.
Omer Shpigelman Sept. 14, 2020, 9:52 a.m. UTC | #12
On Thu, Sep 10, 2020 at 11:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 10 Sep 2020 19:11:16 +0300 Oded Gabbay wrote:

> > +module_param(nic_rx_poll, int, 0444);

> MODULE_PARM_DESC(nic_rx_poll,

> > +	"Enable NIC Rx polling mode (0 = no, 1 = yes, default no)");

> 

> If your chip does not support IRQ coalescing you can configure polling and the

> timeout via ethtool -C, rather than a module parameter.


I couldn't find an example for that in other drivers and I didn't see anything regarding polling mode in the parameters description of this ethtool callback.
Can you please specify some pointer for that? Or in other words, what parameter can we use to enable polling/setting the timeout?

Thanks,
Omer
Jakub Kicinski Sept. 14, 2020, 4:47 p.m. UTC | #13
On Mon, 14 Sep 2020 09:52:00 +0000 Omer Shpigelman wrote:
> On Thu, Sep 10, 2020 at 11:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Thu, 10 Sep 2020 19:11:16 +0300 Oded Gabbay wrote:  
> > > +module_param(nic_rx_poll, int, 0444);  
> > MODULE_PARM_DESC(nic_rx_poll,  
> > > +	"Enable NIC Rx polling mode (0 = no, 1 = yes, default no)");  
> > 
> > If your chip does not support IRQ coalescing you can configure polling and the
> > timeout via ethtool -C, rather than a module parameter.  
> 
> I couldn't find an example for that in other drivers and I didn't see
> anything regarding polling mode in the parameters description of this
> ethtool callback.
> Can you please specify some pointer for that? Or in other words, what
> parameter can we use to enable polling/setting the timeout?

Look at stmmac, hip04_eth..