mbox series

[RFC,v4,00/27] NVMeTCP Offload ULP and QEDN Device Driver

Message ID 20210429190926.5086-1-smalin@marvell.com
Headers show
Series NVMeTCP Offload ULP and QEDN Device Driver | expand

Message

Shai Malin April 29, 2021, 7:08 p.m. UTC
With the goal of enabling a generic infrastructure that allows NVMe/TCP 
offload devices like NICs to seamlessly plug into the NVMe-oF stack, this 
patch series introduces the nvme-tcp-offload ULP host layer, which will 
be a new transport type called "tcp-offload" and will serve as an 
abstraction layer to work with vendor specific nvme-tcp offload drivers.

NVMeTCP offload is a full offload of the NVMeTCP protocol, this includes 
both the TCP level and the NVMeTCP level.

The nvme-tcp-offload transport can co-exist with the existing tcp and 
other transports. The tcp offload was designed so that stack changes are 
kept to a bare minimum: only registering new transports. 
All other APIs, ops etc. are identical to the regular tcp transport.
Representing the TCP offload as a new transport allows clear and manageable
differentiation between the connections which should use the offload path
and those that are not offloaded (even on the same device).


The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:

* NVMe layer: *

       [ nvme/nvme-fabrics/blk-mq ]
             |
        (nvme API and blk-mq API)
             |
             |			 
* Vendor agnostic transport layer: *

      [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
             |        |             |
           (Verbs) 
             |        |             |
             |     (Socket)
             |        |             |
             |        |        (nvme-tcp-offload API)
             |        |             |
             |        |             |
* Vendor Specific Driver: *

             |        |             |
           [ qedr ]       
                      |             |
                   [ qede ]
                                    |
                                  [ qedn ]


Performance:
============
With this implementation on top of the Marvell qedn driver (using the
Marvell FastLinQ NIC), we were able to demonstrate the following CPU 
utilization improvement:

On AMD EPYC 7402, 2.80GHz, 28 cores:
- For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate): 
  Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with 
  NVMeTCP offload.

On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores: 
- For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate): 
  Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with 
  NVMeTCP offload.

In addition, we were able to demonstrate the following latency improvement:
- For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
  Improved the average latency from 105 usec with NVMeTCP SW to 39 usec 
  with NVMeTCP offload.
  
  Improved the 99.99 tail latency from 570 usec with NVMeTCP SW to 91 usec 
  with NVMeTCP offload.

The end-to-end offload latency was measured from fio while running against 
back end of null device.


Upstream plan:
==============
Following this RFC, the series will be sent in a modular way so that changes 
in each part will not impact the previous part.

- Part 1 (Patches 1-7):
  The qed infrastructure, will be sent to 'netdev@vger.kernel.org'.

- Part 2 (Patch 8-15): 
  The nvme-tcp-offload patches, will be sent to 
  'linux-nvme@lists.infradead.org'.

- Part 3 (Packet 16-27):
  The qedn patches, will be sent to 'linux-nvme@lists.infradead.org'.
 

Queue Initialization Design:
============================
The nvme-tcp-offload ULP module shall register with the existing 
nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following ops:
- claim_dev() - in order to resolve the route to the target according to
                the paired net_dev.
- create_queue() - in order to create offloaded nvme-tcp queue.

The nvme-tcp-offload ULP module shall manage all the controller level
functionalities, call claim_dev and based on the return values shall call
the relevant module create_queue in order to create the admin queue and
the IO queues.


IO-path Design:
===============
The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload 
ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor
driver and later, the nvme-tcp-offload vendor driver returns the request
completion (the IO completion).
No additional handling is needed in between; this design will reduce the
CPU utilization as we will describe below.

The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following IO-path ops:
- init_req()
- send_req() - in order to pass the request to the handling of the
               offload driver that shall pass it to the vendor specific device.
- poll_queue()

Once the IO completes, the nvme-tcp-offload vendor driver shall call 
command.done() that will invoke the nvme-tcp-offload ULP layer to
complete the request.


TCP events:
===========
The Marvell FastLinQ NIC HW engine handle all the TCP re-transmissions
and OOO events.


Teardown and errors:
====================
In case of NVMeTCP queue error the nvme-tcp-offload vendor driver shall
call the nvme_tcp_ofld_report_queue_err.
The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following teardown ops:
- drain_queue()
- destroy_queue()


The Marvell FastLinQ NIC HW engine:
====================================
The Marvell NIC HW engine is capable of offloading the entire TCP/IP
stack and managing up to 64K connections per PF, already implemented and 
upstream use cases for this include iWARP (by the Marvell qedr driver) 
and iSCSI (by the Marvell qedi driver).
In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
and is able to manage the IO level also in case of TCP re-transmissions
and OOO events.
The HW engine enables direct data placement (including the data digest CRC
calculation and validation) and direct data transmission (including data
digest CRC calculation).


The Marvell qedn driver:
========================
The new driver will be added under "drivers/nvme/hw" and will be enabled
by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
As part of the qedn init, the driver will register as a pci device driver 
and will work with the Marvell fastlinQ NIC.
As part of the probe, the driver will register to the nvme_tcp_offload
(ULP) and to the qed module (qed_nvmetcp_ops) - similar to other
"qed_*_ops" which are used by the qede, qedr, qedf and qedi device
drivers.
  

QEDN Future work:
=================
- Support extended HW resources.
- Digest support.
- Devlink support for device configuration and TCP offload configurations.
- Statistics

 
Long term future work:
======================
- The nvme-tcp-offload ULP target abstraction layer.
- The Marvell nvme-tcp-offload "qednt" target driver.


Changes since RFC v1:
=====================
- Fix nvme_tcp_ofld_ops return values.
- Remove NVMF_TRTYPE_TCP_OFFLOAD.
- Add nvme_tcp_ofld_poll() implementation.
- Fix nvme_tcp_ofld_queue_rq() to check map_sg() and send_req() return
  values.

Changes since RFC v2:
=====================
- Add qedn - Marvell's NVMeTCP HW offload vendor driver init and probe
  (patches 8-11).
- Fixes in controller and queue level (patches 3-6).
  
Changes since RFC v3:
=====================
- Add the full implementation of the nvme-tcp-offload layer including the 
  new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new flows (ASYNC
  and timeout).
- Add nvme-tcp-offload device maximums: max_hw_sectors, max_segments.
- Add nvme-tcp-offload layer design and optimization changes.
- Add the qedn full implementation for the conn level, IO path and error 
  handling.
- Add qed support for the new AHP HW. 


Arie Gershberg (3):
  nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
    definitions
  nvme-tcp-offload: Add controller level implementation
  nvme-tcp-offload: Add controller level error recovery implementation

Dean Balandin (3):
  nvme-tcp-offload: Add device scan implementation
  nvme-tcp-offload: Add queue level implementation
  nvme-tcp-offload: Add IO level implementation

Nikolay Assa (2):
  qed: Add IP services APIs support
  qedn: Add qedn_claim_dev API support

Omkar Kulkarni (1):
  qed: Add qed-NVMeTCP personality

Prabhakar Kushwaha (6):
  qed: Add support of HW filter block
  qedn: Add connection-level slowpath functionality
  qedn: Add support of configuring HW filter block
  qedn: Add support of Task and SGL
  qedn: Add support of NVME ICReq & ICResp
  qedn: Add support of ASYNC

Shai Malin (12):
  qed: Add NVMeTCP Offload PF Level FW and HW HSI
  qed: Add NVMeTCP Offload Connection Level FW and HW HSI
  qed: Add NVMeTCP Offload IO Level FW and HW HSI
  qed: Add NVMeTCP Offload IO Level FW Initializations
  nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  nvme-tcp-offload: Add Timeout and ASYNC Support
  qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver
  qedn: Add qedn probe
  qedn: Add IRQ and fast-path resources initializations
  qedn: Add IO level nvme_req and fw_cq workqueues
  qedn: Add IO level fastpath functionality
  qedn: Add Connection and IO level recovery flows

 MAINTAINERS                                   |   10 +
 drivers/net/ethernet/qlogic/Kconfig           |    3 +
 drivers/net/ethernet/qlogic/qed/Makefile      |    5 +
 drivers/net/ethernet/qlogic/qed/qed.h         |   16 +
 drivers/net/ethernet/qlogic/qed/qed_cxt.c     |   32 +
 drivers/net/ethernet/qlogic/qed/qed_cxt.h     |    1 +
 drivers/net/ethernet/qlogic/qed/qed_dev.c     |  151 +-
 drivers/net/ethernet/qlogic/qed/qed_hsi.h     |    4 +-
 drivers/net/ethernet/qlogic/qed/qed_ll2.c     |   31 +-
 drivers/net/ethernet/qlogic/qed/qed_mcp.c     |    3 +
 drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |    3 +-
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |  868 +++++++++++
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  114 ++
 .../qlogic/qed/qed_nvmetcp_fw_funcs.c         |  372 +++++
 .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |   43 +
 .../qlogic/qed/qed_nvmetcp_ip_services.c      |  239 +++
 drivers/net/ethernet/qlogic/qed/qed_ooo.c     |    5 +-
 drivers/net/ethernet/qlogic/qed/qed_sp.h      |    5 +
 .../net/ethernet/qlogic/qed/qed_sp_commands.c |    1 +
 drivers/nvme/Kconfig                          |    1 +
 drivers/nvme/Makefile                         |    1 +
 drivers/nvme/host/Kconfig                     |   16 +
 drivers/nvme/host/Makefile                    |    3 +
 drivers/nvme/host/fabrics.c                   |    7 -
 drivers/nvme/host/fabrics.h                   |    7 +
 drivers/nvme/host/tcp-offload.c               | 1330 +++++++++++++++++
 drivers/nvme/host/tcp-offload.h               |  209 +++
 drivers/nvme/hw/Kconfig                       |    9 +
 drivers/nvme/hw/Makefile                      |    3 +
 drivers/nvme/hw/qedn/Makefile                 |    4 +
 drivers/nvme/hw/qedn/qedn.h                   |  435 ++++++
 drivers/nvme/hw/qedn/qedn_conn.c              |  999 +++++++++++++
 drivers/nvme/hw/qedn/qedn_main.c              | 1153 ++++++++++++++
 drivers/nvme/hw/qedn/qedn_task.c              |  977 ++++++++++++
 include/linux/qed/common_hsi.h                |    1 +
 include/linux/qed/nvmetcp_common.h            |  616 ++++++++
 include/linux/qed/qed_if.h                    |   22 +
 include/linux/qed/qed_nvmetcp_if.h            |  244 +++
 .../linux/qed/qed_nvmetcp_ip_services_if.h    |   29 +
 39 files changed, 7947 insertions(+), 25 deletions(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
 create mode 100644 drivers/nvme/host/tcp-offload.c
 create mode 100644 drivers/nvme/host/tcp-offload.h
 create mode 100644 drivers/nvme/hw/Kconfig
 create mode 100644 drivers/nvme/hw/Makefile
 create mode 100644 drivers/nvme/hw/qedn/Makefile
 create mode 100644 drivers/nvme/hw/qedn/qedn.h
 create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c
 create mode 100644 drivers/nvme/hw/qedn/qedn_main.c
 create mode 100644 drivers/nvme/hw/qedn/qedn_task.c
 create mode 100644 include/linux/qed/nvmetcp_common.h
 create mode 100644 include/linux/qed/qed_nvmetcp_if.h
 create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h

Comments

Hannes Reinecke May 1, 2021, 12:18 p.m. UTC | #1
On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch will present the structure for the NVMeTCP offload common

> layer driver. This module is added under "drivers/nvme/host/" and future

> offload drivers which will register to it will be placed under

> "drivers/nvme/hw".

> This new driver will be enabled by the Kconfig "NVM Express over Fabrics

> TCP offload commmon layer".

> In order to support the new transport type, for host mode, no change is

> needed.

> 

> Each new vendor-specific offload driver will register to this ULP during

> its probe function, by filling out the nvme_tcp_ofld_dev->ops and

> nvme_tcp_ofld_dev->private_data and calling nvme_tcp_ofld_register_dev

> with the initialized struct.

> 

> The internal implementation:

> - tcp-offload.h:

>    Includes all common structs and ops to be used and shared by offload

>    drivers.

> 

> - tcp-offload.c:

>    Includes the init function which registers as a NVMf transport just

>    like any other transport.

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Dean Balandin <dbalandin@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> ---

>   drivers/nvme/host/Kconfig       |  16 +++

>   drivers/nvme/host/Makefile      |   3 +

>   drivers/nvme/host/tcp-offload.c | 126 +++++++++++++++++++

>   drivers/nvme/host/tcp-offload.h | 206 ++++++++++++++++++++++++++++++++

>   4 files changed, 351 insertions(+)

>   create mode 100644 drivers/nvme/host/tcp-offload.c

>   create mode 100644 drivers/nvme/host/tcp-offload.h

> 

It will be tricky to select the correct transport eg when traversing the 
discovery log page; the discovery log page only knows about 'tcp' (not 
'tcp_offload'), so the offload won't be picked up.
But that can we worked on / fixed later on, as it's arguably a policy 
decision.

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 1, 2021, 4:29 p.m. UTC | #2
On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Arie Gershberg <agershberg@marvell.com>

> 

> In this patch, we implement controller level error handling and recovery.

> Upon an error discovered by the ULP or reset controller initiated by the

> nvme-core (using reset_ctrl workqueue), the ULP will initiate a controller

> recovery which includes teardown and re-connect of all queues.

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Arie Gershberg <agershberg@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> ---

>   drivers/nvme/host/tcp-offload.c | 138 +++++++++++++++++++++++++++++++-

>   drivers/nvme/host/tcp-offload.h |   1 +

>   2 files changed, 137 insertions(+), 2 deletions(-)

> 

> diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c

> index 59e1955e02ec..9082b11c133f 100644

> --- a/drivers/nvme/host/tcp-offload.c

> +++ b/drivers/nvme/host/tcp-offload.c

> @@ -74,6 +74,23 @@ void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)

>   }

>   EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);

>   

> +/**

> + * nvme_tcp_ofld_error_recovery() - NVMeTCP Offload Library error recovery.

> + * function.

> + * @nctrl:	NVMe controller instance to change to resetting.

> + *

> + * API function that change the controller state to resseting.

> + * Part of the overall controller reset sequence.

> + */

> +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl)

> +{

> +	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_RESETTING))

> +		return;

> +

> +	queue_work(nvme_reset_wq, &to_tcp_ofld_ctrl(nctrl)->err_work);

> +}

> +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_error_recovery);

> +

>   /**

>    * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event

>    * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.

> @@ -84,7 +101,8 @@ EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);

>    */

>   int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)

>   {

> -	/* Placeholder - invoke error recovery flow */

> +	pr_err("nvme-tcp-offload queue error\n");

> +	nvme_tcp_ofld_error_recovery(&queue->ctrl->nctrl);

>   

>   	return 0;

>   }

> @@ -296,6 +314,28 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)

>   	return rc;

>   }

>   

> +static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)

> +{

> +	/* If we are resetting/deleting then do nothing */

> +	if (nctrl->state != NVME_CTRL_CONNECTING) {

> +		WARN_ON_ONCE(nctrl->state == NVME_CTRL_NEW ||

> +			     nctrl->state == NVME_CTRL_LIVE);

> +

> +		return;

> +	}

> +

> +	if (nvmf_should_reconnect(nctrl)) {

> +		dev_info(nctrl->device, "Reconnecting in %d seconds...\n",

> +			 nctrl->opts->reconnect_delay);

> +		queue_delayed_work(nvme_wq,

> +				   &to_tcp_ofld_ctrl(nctrl)->connect_work,

> +				   nctrl->opts->reconnect_delay * HZ);

> +	} else {

> +		dev_info(nctrl->device, "Removing controller...\n");

> +		nvme_delete_ctrl(nctrl);

> +	}

> +}

> +

>   static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)

>   {

>   	struct nvmf_ctrl_options *opts = nctrl->opts;

> @@ -407,10 +447,68 @@ nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)

>   	/* Placeholder - teardown_io_queues */

>   }

>   

> +static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)

> +{

> +	struct nvme_tcp_ofld_ctrl *ctrl =

> +				container_of(to_delayed_work(work),

> +					     struct nvme_tcp_ofld_ctrl,

> +					     connect_work);

> +	struct nvme_ctrl *nctrl = &ctrl->nctrl;

> +

> +	++nctrl->nr_reconnects;

> +

> +	if (ctrl->dev->ops->setup_ctrl(ctrl, false))

> +		goto requeue;

> +

> +	if (nvme_tcp_ofld_setup_ctrl(nctrl, false))

> +		goto release_and_requeue;

> +

> +	dev_info(nctrl->device, "Successfully reconnected (%d attempt)\n",

> +		 nctrl->nr_reconnects);

> +

> +	nctrl->nr_reconnects = 0;

> +

> +	return;

> +

> +release_and_requeue:

> +	ctrl->dev->ops->release_ctrl(ctrl);

> +requeue:

> +	dev_info(nctrl->device, "Failed reconnect attempt %d\n",

> +		 nctrl->nr_reconnects);

> +	nvme_tcp_ofld_reconnect_or_remove(nctrl);

> +}

> +

> +static void nvme_tcp_ofld_error_recovery_work(struct work_struct *work)

> +{

> +	struct nvme_tcp_ofld_ctrl *ctrl =

> +		container_of(work, struct nvme_tcp_ofld_ctrl, err_work);

> +	struct nvme_ctrl *nctrl = &ctrl->nctrl;

> +

> +	nvme_stop_keep_alive(nctrl);

> +	nvme_tcp_ofld_teardown_io_queues(nctrl, false);

> +	/* unquiesce to fail fast pending requests */

> +	nvme_start_queues(nctrl);

> +	nvme_tcp_ofld_teardown_admin_queue(nctrl, false);

> +	blk_mq_unquiesce_queue(nctrl->admin_q);

> +

> +	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {

> +		/* state change failure is ok if we started nctrl delete */

> +		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&

> +			     nctrl->state != NVME_CTRL_DELETING_NOIO);

> +

> +		return;

> +	}

> +

> +	nvme_tcp_ofld_reconnect_or_remove(nctrl);

> +}

> +

>   static void

>   nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown)

>   {

> -	/* Placeholder - err_work and connect_work */

> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);

> +

> +	cancel_work_sync(&ctrl->err_work);

> +	cancel_delayed_work_sync(&ctrl->connect_work);

>   	nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown);

>   	blk_mq_quiesce_queue(nctrl->admin_q);

>   	if (shutdown)

> @@ -425,6 +523,38 @@ static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl)

>   	nvme_tcp_ofld_teardown_ctrl(nctrl, true);

>   }

>   

> +static void nvme_tcp_ofld_reset_ctrl_work(struct work_struct *work)

> +{

> +	struct nvme_ctrl *nctrl =

> +		container_of(work, struct nvme_ctrl, reset_work);

> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);

> +

> +	nvme_stop_ctrl(nctrl);

> +	nvme_tcp_ofld_teardown_ctrl(nctrl, false);

> +

> +	if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {

> +		/* state change failure is ok if we started ctrl delete */

> +		WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&

> +			     nctrl->state != NVME_CTRL_DELETING_NOIO);

> +

> +		return;

> +	}

> +

> +	if (ctrl->dev->ops->setup_ctrl(ctrl, false))

> +		goto out_fail;

> +

> +	if (nvme_tcp_ofld_setup_ctrl(nctrl, false))

> +		goto release_ctrl;

> +

> +	return;

> +

> +release_ctrl:

> +	ctrl->dev->ops->release_ctrl(ctrl);

> +out_fail:

> +	++nctrl->nr_reconnects;

> +	nvme_tcp_ofld_reconnect_or_remove(nctrl);

> +}

> +

>   static int

>   nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,

>   			   struct request *rq,

> @@ -521,6 +651,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)

>   			     opts->nr_poll_queues + 1;

>   	nctrl->sqsize = opts->queue_size - 1;

>   	nctrl->kato = opts->kato;

> +	INIT_DELAYED_WORK(&ctrl->connect_work,

> +			  nvme_tcp_ofld_reconnect_ctrl_work);

> +	INIT_WORK(&ctrl->err_work, nvme_tcp_ofld_error_recovery_work);

> +	INIT_WORK(&nctrl->reset_work, nvme_tcp_ofld_reset_ctrl_work);

>   	if (!(opts->mask & NVMF_OPT_TRSVCID)) {

>   		opts->trsvcid =

>   			kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL);

> diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h

> index 9fd270240eaa..b23b1d7ea6fa 100644

> --- a/drivers/nvme/host/tcp-offload.h

> +++ b/drivers/nvme/host/tcp-offload.h

> @@ -204,3 +204,4 @@ struct nvme_tcp_ofld_ops {

>   /* Exported functions for lower vendor specific offload drivers */

>   int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);

>   void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);

> +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);

> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 1, 2021, 4:45 p.m. UTC | #3
On 4/29/21 9:09 PM, Shai Malin wrote:
> In this patch, we present the nvme-tcp-offload timeout support

> nvme_tcp_ofld_timeout() and ASYNC support

> nvme_tcp_ofld_submit_async_event().

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> ---

>   drivers/nvme/host/tcp-offload.c | 85 ++++++++++++++++++++++++++++++++-

>   drivers/nvme/host/tcp-offload.h |  2 +

>   2 files changed, 86 insertions(+), 1 deletion(-)

> 

> diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c

> index 0cdf5a432208..1d62f921f109 100644

> --- a/drivers/nvme/host/tcp-offload.c

> +++ b/drivers/nvme/host/tcp-offload.c

> @@ -133,6 +133,26 @@ void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,

>   		nvme_complete_rq(rq);

>   }

>   

> +/**

> + * nvme_tcp_ofld_async_req_done() - NVMeTCP Offload request done callback

> + * function for async request. Pointed to by nvme_tcp_ofld_req->done.

> + * Handles both NVME_TCP_F_DATA_SUCCESS flag and NVMe CQ.

> + * @req:	NVMeTCP offload request to complete.

> + * @result:     The nvme_result.

> + * @status:     The completion status.

> + *

> + * API function that allows the vendor specific offload driver to report request

> + * completions to the common offload layer.

> + */

> +void nvme_tcp_ofld_async_req_done(struct nvme_tcp_ofld_req *req,

> +				  union nvme_result *result, __le16 status)

> +{

> +	struct nvme_tcp_ofld_queue *queue = req->queue;

> +	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;

> +

> +	nvme_complete_async_event(&ctrl->nctrl, status, result);

> +}

> +

>   struct nvme_tcp_ofld_dev *

>   nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)

>   {

> @@ -719,7 +739,23 @@ void nvme_tcp_ofld_map_data(struct nvme_command *c, u32 data_len)

>   

>   static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)

>   {

> -	/* Placeholder - submit_async_event */

> +	struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(arg);

> +	struct nvme_tcp_ofld_queue *queue = &ctrl->queues[0];

> +	struct nvme_tcp_ofld_dev *dev = queue->dev;

> +	struct nvme_tcp_ofld_ops *ops = dev->ops;

> +

> +	ctrl->async_req.nvme_cmd.common.opcode = nvme_admin_async_event;

> +	ctrl->async_req.nvme_cmd.common.command_id = NVME_AQ_BLK_MQ_DEPTH;

> +	ctrl->async_req.nvme_cmd.common.flags |= NVME_CMD_SGL_METABUF;

> +

> +	nvme_tcp_ofld_set_sg_null(&ctrl->async_req.nvme_cmd);

> +

> +	ctrl->async_req.async = true;

> +	ctrl->async_req.queue = queue;

> +	ctrl->async_req.last = true;

> +	ctrl->async_req.done = nvme_tcp_ofld_async_req_done;

> +

> +	ops->send_req(&ctrl->async_req);

>   }

>   

>   static void

> @@ -1024,6 +1060,51 @@ static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)

>   	return ops->poll_queue(queue);

>   }

>   

> +static void nvme_tcp_ofld_complete_timed_out(struct request *rq)

> +{

> +	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);

> +	struct nvme_ctrl *nctrl = &req->queue->ctrl->nctrl;

> +

> +	nvme_tcp_ofld_stop_queue(nctrl, nvme_tcp_ofld_qid(req->queue));

> +	if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {

> +		nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;

> +		blk_mq_complete_request(rq);

> +	}

> +}

> +

> +static enum blk_eh_timer_return nvme_tcp_ofld_timeout(struct request *rq, bool reserved)

> +{

> +	struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);

> +	struct nvme_tcp_ofld_ctrl *ctrl = req->queue->ctrl;

> +

> +	dev_warn(ctrl->nctrl.device,

> +		 "queue %d: timeout request %#x type %d\n",

> +		 nvme_tcp_ofld_qid(req->queue), rq->tag, req->nvme_cmd.common.opcode);

> +

> +	if (ctrl->nctrl.state != NVME_CTRL_LIVE) {

> +		/*

> +		 * If we are resetting, connecting or deleting we should

> +		 * complete immediately because we may block controller

> +		 * teardown or setup sequence

> +		 * - ctrl disable/shutdown fabrics requests

> +		 * - connect requests

> +		 * - initialization admin requests

> +		 * - I/O requests that entered after unquiescing and

> +		 *   the controller stopped responding

> +		 *

> +		 * All other requests should be cancelled by the error

> +		 * recovery work, so it's fine that we fail it here.

> +		 */

> +		nvme_tcp_ofld_complete_timed_out(rq);

> +

> +		return BLK_EH_DONE;

> +	}


And this particular error code has been causing _so_ _many_ issues 
during testing, that I'd rather get rid of it altogether.
But probably not your fault, your just copying what tcp and rdma is doing.

> +

> +	nvme_tcp_ofld_error_recovery(&ctrl->nctrl);

> +

> +	return BLK_EH_RESET_TIMER;

> +}

> +

>   static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {

>   	.queue_rq	= nvme_tcp_ofld_queue_rq,

>   	.commit_rqs     = nvme_tcp_ofld_commit_rqs,

> @@ -1031,6 +1112,7 @@ static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {

>   	.init_request	= nvme_tcp_ofld_init_request,

>   	.exit_request	= nvme_tcp_ofld_exit_request,

>   	.init_hctx	= nvme_tcp_ofld_init_hctx,

> +	.timeout	= nvme_tcp_ofld_timeout,

>   	.map_queues	= nvme_tcp_ofld_map_queues,

>   	.poll		= nvme_tcp_ofld_poll,

>   };

> @@ -1041,6 +1123,7 @@ static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {

>   	.init_request	= nvme_tcp_ofld_init_request,

>   	.exit_request	= nvme_tcp_ofld_exit_request,

>   	.init_hctx	= nvme_tcp_ofld_init_admin_hctx,

> +	.timeout	= nvme_tcp_ofld_timeout,

>   };

>   

>   static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {

> diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h

> index d82645fcf9da..275a7e2d9d8a 100644

> --- a/drivers/nvme/host/tcp-offload.h

> +++ b/drivers/nvme/host/tcp-offload.h

> @@ -110,6 +110,8 @@ struct nvme_tcp_ofld_ctrl {

>   	/* Connectivity params */

>   	struct nvme_tcp_ofld_ctrl_con_params conn_params;

>   

> +	struct nvme_tcp_ofld_req async_req;

> +

>   	/* Vendor specific driver context */

>   	void *private_data;

>   };

> 

So:

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 1, 2021, 4:47 p.m. UTC | #4
On 4/29/21 9:08 PM, Shai Malin wrote:
> With the goal of enabling a generic infrastructure that allows NVMe/TCP

> offload devices like NICs to seamlessly plug into the NVMe-oF stack, this

> patch series introduces the nvme-tcp-offload ULP host layer, which will

> be a new transport type called "tcp-offload" and will serve as an

> abstraction layer to work with vendor specific nvme-tcp offload drivers.

> 

> NVMeTCP offload is a full offload of the NVMeTCP protocol, this includes

> both the TCP level and the NVMeTCP level.

> 

> The nvme-tcp-offload transport can co-exist with the existing tcp and

> other transports. The tcp offload was designed so that stack changes are

> kept to a bare minimum: only registering new transports.

> All other APIs, ops etc. are identical to the regular tcp transport.

> Representing the TCP offload as a new transport allows clear and manageable

> differentiation between the connections which should use the offload path

> and those that are not offloaded (even on the same device).

> 

> 

> The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:

> 

> * NVMe layer: *

> 

>         [ nvme/nvme-fabrics/blk-mq ]

>               |

>          (nvme API and blk-mq API)

>               |

>               |			

> * Vendor agnostic transport layer: *

> 

>        [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]

>               |        |             |

>             (Verbs)

>               |        |             |

>               |     (Socket)

>               |        |             |

>               |        |        (nvme-tcp-offload API)

>               |        |             |

>               |        |             |

> * Vendor Specific Driver: *

> 

>               |        |             |

>             [ qedr ]

>                        |             |

>                     [ qede ]

>                                      |

>                                    [ qedn ]

> 

> 

> Performance:

> ============

> With this implementation on top of the Marvell qedn driver (using the

> Marvell FastLinQ NIC), we were able to demonstrate the following CPU

> utilization improvement:

> 

> On AMD EPYC 7402, 2.80GHz, 28 cores:

> - For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate):

>    Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with

>    NVMeTCP offload.

> 

> On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores:

> - For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate):

>    Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with

>    NVMeTCP offload.

> 

> In addition, we were able to demonstrate the following latency improvement:

> - For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):

>    Improved the average latency from 105 usec with NVMeTCP SW to 39 usec

>    with NVMeTCP offload.

>    

>    Improved the 99.99 tail latency from 570 usec with NVMeTCP SW to 91 usec

>    with NVMeTCP offload.

> 

> The end-to-end offload latency was measured from fio while running against

> back end of null device.

> 

> 

> Upstream plan:

> ==============

> Following this RFC, the series will be sent in a modular way so that changes

> in each part will not impact the previous part.

> 

> - Part 1 (Patches 1-7):

>    The qed infrastructure, will be sent to 'netdev@vger.kernel.org'.

> 

> - Part 2 (Patch 8-15):

>    The nvme-tcp-offload patches, will be sent to

>    'linux-nvme@lists.infradead.org'.

> 

> - Part 3 (Packet 16-27):

>    The qedn patches, will be sent to 'linux-nvme@lists.infradead.org'.

>   

> 

> Queue Initialization Design:

> ============================

> The nvme-tcp-offload ULP module shall register with the existing

> nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.

> The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP

> with the following ops:

> - claim_dev() - in order to resolve the route to the target according to

>                  the paired net_dev.

> - create_queue() - in order to create offloaded nvme-tcp queue.

> 

> The nvme-tcp-offload ULP module shall manage all the controller level

> functionalities, call claim_dev and based on the return values shall call

> the relevant module create_queue in order to create the admin queue and

> the IO queues.

> 

> 

> IO-path Design:

> ===============

> The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload

> ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor

> driver and later, the nvme-tcp-offload vendor driver returns the request

> completion (the IO completion).

> No additional handling is needed in between; this design will reduce the

> CPU utilization as we will describe below.

> 

> The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP

> with the following IO-path ops:

> - init_req()

> - send_req() - in order to pass the request to the handling of the

>                 offload driver that shall pass it to the vendor specific device.

> - poll_queue()

> 

> Once the IO completes, the nvme-tcp-offload vendor driver shall call

> command.done() that will invoke the nvme-tcp-offload ULP layer to

> complete the request.

> 

> 

> TCP events:

> ===========

> The Marvell FastLinQ NIC HW engine handle all the TCP re-transmissions

> and OOO events.

> 

> 

> Teardown and errors:

> ====================

> In case of NVMeTCP queue error the nvme-tcp-offload vendor driver shall

> call the nvme_tcp_ofld_report_queue_err.

> The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP

> with the following teardown ops:

> - drain_queue()

> - destroy_queue()

> 

> 

> The Marvell FastLinQ NIC HW engine:

> ====================================

> The Marvell NIC HW engine is capable of offloading the entire TCP/IP

> stack and managing up to 64K connections per PF, already implemented and

> upstream use cases for this include iWARP (by the Marvell qedr driver)

> and iSCSI (by the Marvell qedi driver).

> In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer

> and is able to manage the IO level also in case of TCP re-transmissions

> and OOO events.

> The HW engine enables direct data placement (including the data digest CRC

> calculation and validation) and direct data transmission (including data

> digest CRC calculation).

> 

> 

> The Marvell qedn driver:

> ========================

> The new driver will be added under "drivers/nvme/hw" and will be enabled

> by the Kconfig "Marvell NVM Express over Fabrics TCP offload".

> As part of the qedn init, the driver will register as a pci device driver

> and will work with the Marvell fastlinQ NIC.

> As part of the probe, the driver will register to the nvme_tcp_offload

> (ULP) and to the qed module (qed_nvmetcp_ops) - similar to other

> "qed_*_ops" which are used by the qede, qedr, qedf and qedi device

> drivers.

>    

> 

> QEDN Future work:

> =================

> - Support extended HW resources.

> - Digest support.

> - Devlink support for device configuration and TCP offload configurations.

> - Statistics

> 

>   

> Long term future work:

> ======================

> - The nvme-tcp-offload ULP target abstraction layer.

> - The Marvell nvme-tcp-offload "qednt" target driver.

> 

> 

> Changes since RFC v1:

> =====================

> - Fix nvme_tcp_ofld_ops return values.

> - Remove NVMF_TRTYPE_TCP_OFFLOAD.

> - Add nvme_tcp_ofld_poll() implementation.

> - Fix nvme_tcp_ofld_queue_rq() to check map_sg() and send_req() return

>    values.

> 

> Changes since RFC v2:

> =====================

> - Add qedn - Marvell's NVMeTCP HW offload vendor driver init and probe

>    (patches 8-11).

> - Fixes in controller and queue level (patches 3-6).

>    

> Changes since RFC v3:

> =====================

> - Add the full implementation of the nvme-tcp-offload layer including the

>    new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new flows (ASYNC

>    and timeout).

> - Add nvme-tcp-offload device maximums: max_hw_sectors, max_segments.

> - Add nvme-tcp-offload layer design and optimization changes.

> - Add the qedn full implementation for the conn level, IO path and error

>    handling.

> - Add qed support for the new AHP HW.

> 

> 

> Arie Gershberg (3):

>    nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS

>      definitions

>    nvme-tcp-offload: Add controller level implementation

>    nvme-tcp-offload: Add controller level error recovery implementation

> 

> Dean Balandin (3):

>    nvme-tcp-offload: Add device scan implementation

>    nvme-tcp-offload: Add queue level implementation

>    nvme-tcp-offload: Add IO level implementation

> 

> Nikolay Assa (2):

>    qed: Add IP services APIs support

>    qedn: Add qedn_claim_dev API support

> 

> Omkar Kulkarni (1):

>    qed: Add qed-NVMeTCP personality

> 

> Prabhakar Kushwaha (6):

>    qed: Add support of HW filter block

>    qedn: Add connection-level slowpath functionality

>    qedn: Add support of configuring HW filter block

>    qedn: Add support of Task and SGL

>    qedn: Add support of NVME ICReq & ICResp

>    qedn: Add support of ASYNC

> 

> Shai Malin (12):

>    qed: Add NVMeTCP Offload PF Level FW and HW HSI

>    qed: Add NVMeTCP Offload Connection Level FW and HW HSI

>    qed: Add NVMeTCP Offload IO Level FW and HW HSI

>    qed: Add NVMeTCP Offload IO Level FW Initializations

>    nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP

>    nvme-tcp-offload: Add Timeout and ASYNC Support

>    qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver

>    qedn: Add qedn probe

>    qedn: Add IRQ and fast-path resources initializations

>    qedn: Add IO level nvme_req and fw_cq workqueues

>    qedn: Add IO level fastpath functionality

>    qedn: Add Connection and IO level recovery flows

> 

>   MAINTAINERS                                   |   10 +

>   drivers/net/ethernet/qlogic/Kconfig           |    3 +

>   drivers/net/ethernet/qlogic/qed/Makefile      |    5 +

>   drivers/net/ethernet/qlogic/qed/qed.h         |   16 +

>   drivers/net/ethernet/qlogic/qed/qed_cxt.c     |   32 +

>   drivers/net/ethernet/qlogic/qed/qed_cxt.h     |    1 +

>   drivers/net/ethernet/qlogic/qed/qed_dev.c     |  151 +-

>   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |    4 +-

>   drivers/net/ethernet/qlogic/qed/qed_ll2.c     |   31 +-

>   drivers/net/ethernet/qlogic/qed/qed_mcp.c     |    3 +

>   drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |    3 +-

>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |  868 +++++++++++

>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  114 ++

>   .../qlogic/qed/qed_nvmetcp_fw_funcs.c         |  372 +++++

>   .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |   43 +

>   .../qlogic/qed/qed_nvmetcp_ip_services.c      |  239 +++

>   drivers/net/ethernet/qlogic/qed/qed_ooo.c     |    5 +-

>   drivers/net/ethernet/qlogic/qed/qed_sp.h      |    5 +

>   .../net/ethernet/qlogic/qed/qed_sp_commands.c |    1 +

>   drivers/nvme/Kconfig                          |    1 +

>   drivers/nvme/Makefile                         |    1 +

>   drivers/nvme/host/Kconfig                     |   16 +

>   drivers/nvme/host/Makefile                    |    3 +

>   drivers/nvme/host/fabrics.c                   |    7 -

>   drivers/nvme/host/fabrics.h                   |    7 +

>   drivers/nvme/host/tcp-offload.c               | 1330 +++++++++++++++++

>   drivers/nvme/host/tcp-offload.h               |  209 +++

>   drivers/nvme/hw/Kconfig                       |    9 +

>   drivers/nvme/hw/Makefile                      |    3 +

>   drivers/nvme/hw/qedn/Makefile                 |    4 +

>   drivers/nvme/hw/qedn/qedn.h                   |  435 ++++++

>   drivers/nvme/hw/qedn/qedn_conn.c              |  999 +++++++++++++

>   drivers/nvme/hw/qedn/qedn_main.c              | 1153 ++++++++++++++

>   drivers/nvme/hw/qedn/qedn_task.c              |  977 ++++++++++++

>   include/linux/qed/common_hsi.h                |    1 +

>   include/linux/qed/nvmetcp_common.h            |  616 ++++++++

>   include/linux/qed/qed_if.h                    |   22 +

>   include/linux/qed/qed_nvmetcp_if.h            |  244 +++

>   .../linux/qed/qed_nvmetcp_ip_services_if.h    |   29 +

>   39 files changed, 7947 insertions(+), 25 deletions(-)

>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c

>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h

>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c

>   create mode 100644 drivers/nvme/host/tcp-offload.c

>   create mode 100644 drivers/nvme/host/tcp-offload.h

>   create mode 100644 drivers/nvme/hw/Kconfig

>   create mode 100644 drivers/nvme/hw/Makefile

>   create mode 100644 drivers/nvme/hw/qedn/Makefile

>   create mode 100644 drivers/nvme/hw/qedn/qedn.h

>   create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c

>   create mode 100644 drivers/nvme/hw/qedn/qedn_main.c

>   create mode 100644 drivers/nvme/hw/qedn/qedn_task.c

>   create mode 100644 include/linux/qed/nvmetcp_common.h

>   create mode 100644 include/linux/qed/qed_nvmetcp_if.h

>   create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h

> 

I would structure this patchset slightly different, in putting the 
NVMe-oF implementation at the start of the patchset; this will be where 
you get most of the comment, and any change there will potentially 
reflect back on the driver implementation, too.

Something to consider for the next round.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 1, 2021, 4:50 p.m. UTC | #5
On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch introduces the NVMeTCP device and PF level HSI and HSI

> functionality in order to initialize and interact with the HW device.

> 

> This patch is based on the qede, qedr, qedi, qedf drivers HSI.

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Dean Balandin <dbalandin@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> ---

>   drivers/net/ethernet/qlogic/Kconfig           |   3 +

>   drivers/net/ethernet/qlogic/qed/Makefile      |   2 +

>   drivers/net/ethernet/qlogic/qed/qed.h         |   3 +

>   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |   1 +

>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 282 ++++++++++++++++++

>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  51 ++++

>   drivers/net/ethernet/qlogic/qed/qed_sp.h      |   2 +

>   include/linux/qed/common_hsi.h                |   1 +

>   include/linux/qed/nvmetcp_common.h            |  54 ++++

>   include/linux/qed/qed_if.h                    |  22 ++

>   include/linux/qed/qed_nvmetcp_if.h            |  72 +++++

>   11 files changed, 493 insertions(+)

>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

>   create mode 100644 include/linux/qed/nvmetcp_common.h

>   create mode 100644 include/linux/qed/qed_nvmetcp_if.h

> 

> diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig

> index 6b5ddb07ee83..98f430905ffa 100644

> --- a/drivers/net/ethernet/qlogic/Kconfig

> +++ b/drivers/net/ethernet/qlogic/Kconfig

> @@ -110,6 +110,9 @@ config QED_RDMA

>   config QED_ISCSI

>   	bool

>   

> +config QED_NVMETCP

> +	bool

> +

>   config QED_FCOE

>   	bool

>   

> diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile

> index 8251755ec18c..7cb0db67ba5b 100644

> --- a/drivers/net/ethernet/qlogic/qed/Makefile

> +++ b/drivers/net/ethernet/qlogic/qed/Makefile

> @@ -28,6 +28,8 @@ qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o

>   qed-$(CONFIG_QED_LL2) += qed_ll2.o

>   qed-$(CONFIG_QED_OOO) += qed_ooo.o

>   

> +qed-$(CONFIG_QED_NVMETCP) += qed_nvmetcp.o

> +

>   qed-$(CONFIG_QED_RDMA) +=	\

>   	qed_iwarp.o		\

>   	qed_rdma.o		\

> diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h

> index a20cb8a0c377..91d4635009ab 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed.h

> +++ b/drivers/net/ethernet/qlogic/qed/qed.h

> @@ -240,6 +240,7 @@ enum QED_FEATURE {

>   	QED_VF,

>   	QED_RDMA_CNQ,

>   	QED_ISCSI_CQ,

> +	QED_NVMETCP_CQ = QED_ISCSI_CQ,

>   	QED_FCOE_CQ,

>   	QED_VF_L2_QUE,

>   	QED_MAX_FEATURES,

> @@ -592,6 +593,7 @@ struct qed_hwfn {

>   	struct qed_ooo_info		*p_ooo_info;

>   	struct qed_rdma_info		*p_rdma_info;

>   	struct qed_iscsi_info		*p_iscsi_info;

> +	struct qed_nvmetcp_info		*p_nvmetcp_info;

>   	struct qed_fcoe_info		*p_fcoe_info;

>   	struct qed_pf_params		pf_params;

>   

> @@ -828,6 +830,7 @@ struct qed_dev {

>   		struct qed_eth_cb_ops		*eth;

>   		struct qed_fcoe_cb_ops		*fcoe;

>   		struct qed_iscsi_cb_ops		*iscsi;

> +		struct qed_nvmetcp_cb_ops	*nvmetcp;

>   	} protocol_ops;

>   	void				*ops_cookie;

>   

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> index 559df9f4d656..24472f6a83c2 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> @@ -20,6 +20,7 @@

>   #include <linux/qed/fcoe_common.h>

>   #include <linux/qed/eth_common.h>

>   #include <linux/qed/iscsi_common.h>

> +#include <linux/qed/nvmetcp_common.h>

>   #include <linux/qed/iwarp_common.h>

>   #include <linux/qed/rdma_common.h>

>   #include <linux/qed/roce_common.h>

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> new file mode 100644

> index 000000000000..da3b5002d216

> --- /dev/null

> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> @@ -0,0 +1,282 @@

> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)

> +/* Copyright 2021 Marvell. All rights reserved. */

> +

> +#include <linux/types.h>

> +#include <asm/byteorder.h>

> +#include <asm/param.h>

> +#include <linux/delay.h>

> +#include <linux/dma-mapping.h>

> +#include <linux/etherdevice.h>

> +#include <linux/kernel.h>

> +#include <linux/log2.h>

> +#include <linux/module.h>

> +#include <linux/pci.h>

> +#include <linux/stddef.h>

> +#include <linux/string.h>

> +#include <linux/errno.h>

> +#include <linux/list.h>

> +#include <linux/qed/qed_nvmetcp_if.h>

> +#include "qed.h"

> +#include "qed_cxt.h"

> +#include "qed_dev_api.h"

> +#include "qed_hsi.h"

> +#include "qed_hw.h"

> +#include "qed_int.h"

> +#include "qed_nvmetcp.h"

> +#include "qed_ll2.h"

> +#include "qed_mcp.h"

> +#include "qed_sp.h"

> +#include "qed_reg_addr.h"

> +

> +static int qed_nvmetcp_async_event(struct qed_hwfn *p_hwfn, u8 fw_event_code,

> +				   u16 echo, union event_ring_data *data,

> +				   u8 fw_return_code)

> +{

> +	if (p_hwfn->p_nvmetcp_info->event_cb) {

> +		struct qed_nvmetcp_info *p_nvmetcp = p_hwfn->p_nvmetcp_info;

> +

> +		return p_nvmetcp->event_cb(p_nvmetcp->event_context,

> +					 fw_event_code, data);

> +	} else {

> +		DP_NOTICE(p_hwfn, "nvmetcp async completion is not set\n");

> +

> +		return -EINVAL;

> +	}

> +}

> +

> +static int qed_sp_nvmetcp_func_start(struct qed_hwfn *p_hwfn,

> +				     enum spq_mode comp_mode,

> +				     struct qed_spq_comp_cb *p_comp_addr,

> +				     void *event_context,

> +				     nvmetcp_event_cb_t async_event_cb)

> +{

> +	struct nvmetcp_init_ramrod_params *p_ramrod = NULL;

> +	struct qed_nvmetcp_pf_params *p_params = NULL;

> +	struct scsi_init_func_queues *p_queue = NULL;

> +	struct nvmetcp_spe_func_init *p_init = NULL;

> +	struct qed_sp_init_data init_data = {};

> +	struct qed_spq_entry *p_ent = NULL;

> +	int rc = 0;

> +	u16 val;

> +	u8 i;

> +

> +	/* Get SPQ entry */

> +	init_data.cid = qed_spq_get_cid(p_hwfn);

> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> +	init_data.comp_mode = comp_mode;

> +	init_data.p_comp_data = p_comp_addr;

> +

> +	rc = qed_sp_init_request(p_hwfn, &p_ent,

> +				 NVMETCP_RAMROD_CMD_ID_INIT_FUNC,

> +				 PROTOCOLID_NVMETCP, &init_data);

> +	if (rc)

> +		return rc;

> +

> +	p_ramrod = &p_ent->ramrod.nvmetcp_init;

> +	p_init = &p_ramrod->nvmetcp_init_spe;

> +	p_params = &p_hwfn->pf_params.nvmetcp_pf_params;

> +	p_queue = &p_init->q_params;

> +

> +	p_init->num_sq_pages_in_ring = p_params->num_sq_pages_in_ring;

> +	p_init->num_r2tq_pages_in_ring = p_params->num_r2tq_pages_in_ring;

> +	p_init->num_uhq_pages_in_ring = p_params->num_uhq_pages_in_ring;

> +	p_init->ll2_rx_queue_id = RESC_START(p_hwfn, QED_LL2_RAM_QUEUE) +

> +					p_params->ll2_ooo_queue_id;

> +

> +	SET_FIELD(p_init->flags, NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE, 1);

> +

> +	p_init->func_params.log_page_size = ilog2(PAGE_SIZE);

> +	p_init->func_params.num_tasks = cpu_to_le16(p_params->num_tasks);

> +	p_init->debug_flags = p_params->debug_mode;

> +

> +	DMA_REGPAIR_LE(p_queue->glbl_q_params_addr,

> +		       p_params->glbl_q_params_addr);

> +

> +	p_queue->cq_num_entries = cpu_to_le16(QED_NVMETCP_FW_CQ_SIZE);

> +	p_queue->num_queues = p_params->num_queues;

> +	val = RESC_START(p_hwfn, QED_CMDQS_CQS);

> +	p_queue->queue_relative_offset = cpu_to_le16((u16)val);

> +	p_queue->cq_sb_pi = p_params->gl_rq_pi;

> +

> +	for (i = 0; i < p_params->num_queues; i++) {

> +		val = qed_get_igu_sb_id(p_hwfn, i);

> +		p_queue->cq_cmdq_sb_num_arr[i] = cpu_to_le16(val);

> +	}

> +

> +	SET_FIELD(p_queue->q_validity,

> +		  SCSI_INIT_FUNC_QUEUES_CMD_VALID, 0);

> +	p_queue->cmdq_num_entries = 0;

> +	p_queue->bdq_resource_id = (u8)RESC_START(p_hwfn, QED_BDQ);

> +

> +	/* p_ramrod->tcp_init.min_rto = cpu_to_le16(p_params->min_rto); */

> +	p_ramrod->tcp_init.two_msl_timer = cpu_to_le32(QED_TCP_TWO_MSL_TIMER);

> +	p_ramrod->tcp_init.tx_sws_timer = cpu_to_le16(QED_TCP_SWS_TIMER);

> +	p_init->half_way_close_timeout = cpu_to_le16(QED_TCP_HALF_WAY_CLOSE_TIMEOUT);

> +	p_ramrod->tcp_init.max_fin_rt = QED_TCP_MAX_FIN_RT;

> +

> +	SET_FIELD(p_ramrod->nvmetcp_init_spe.params,

> +		  NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT, QED_TCP_MAX_FIN_RT);

> +

> +	p_hwfn->p_nvmetcp_info->event_context = event_context;

> +	p_hwfn->p_nvmetcp_info->event_cb = async_event_cb;

> +

> +	qed_spq_register_async_cb(p_hwfn, PROTOCOLID_NVMETCP,

> +				  qed_nvmetcp_async_event);

> +

> +	return qed_spq_post(p_hwfn, p_ent, NULL);

> +}

> +

> +static int qed_sp_nvmetcp_func_stop(struct qed_hwfn *p_hwfn,

> +				    enum spq_mode comp_mode,

> +				    struct qed_spq_comp_cb *p_comp_addr)

> +{

> +	struct qed_spq_entry *p_ent = NULL;

> +	struct qed_sp_init_data init_data;

> +	int rc;

> +

> +	/* Get SPQ entry */

> +	memset(&init_data, 0, sizeof(init_data));

> +	init_data.cid = qed_spq_get_cid(p_hwfn);

> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> +	init_data.comp_mode = comp_mode;

> +	init_data.p_comp_data = p_comp_addr;

> +

> +	rc = qed_sp_init_request(p_hwfn, &p_ent,

> +				 NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC,

> +				 PROTOCOLID_NVMETCP, &init_data);

> +	if (rc)

> +		return rc;

> +

> +	rc = qed_spq_post(p_hwfn, p_ent, NULL);

> +

> +	qed_spq_unregister_async_cb(p_hwfn, PROTOCOLID_NVMETCP);

> +

> +	return rc;

> +}

> +

> +static int qed_fill_nvmetcp_dev_info(struct qed_dev *cdev,

> +				     struct qed_dev_nvmetcp_info *info)

> +{

> +	struct qed_hwfn *hwfn = QED_AFFIN_HWFN(cdev);

> +	int rc;

> +

> +	memset(info, 0, sizeof(*info));

> +	rc = qed_fill_dev_info(cdev, &info->common);

> +

> +	info->port_id = MFW_PORT(hwfn);

> +	info->num_cqs = FEAT_NUM(hwfn, QED_NVMETCP_CQ);

> +

> +	return rc;

> +}

> +

> +static void qed_register_nvmetcp_ops(struct qed_dev *cdev,

> +				     struct qed_nvmetcp_cb_ops *ops,

> +				     void *cookie)

> +{

> +	cdev->protocol_ops.nvmetcp = ops;

> +	cdev->ops_cookie = cookie;

> +}

> +

> +static int qed_nvmetcp_stop(struct qed_dev *cdev)

> +{

> +	int rc;

> +

> +	if (!(cdev->flags & QED_FLAG_STORAGE_STARTED)) {

> +		DP_NOTICE(cdev, "nvmetcp already stopped\n");

> +

> +		return 0;

> +	}

> +

> +	if (!hash_empty(cdev->connections)) {

> +		DP_NOTICE(cdev,

> +			  "Can't stop nvmetcp - not all connections were returned\n");

> +

> +		return -EINVAL;

> +	}

> +

> +	/* Stop the nvmetcp */

> +	rc = qed_sp_nvmetcp_func_stop(QED_AFFIN_HWFN(cdev), QED_SPQ_MODE_EBLOCK,

> +				      NULL);

> +	cdev->flags &= ~QED_FLAG_STORAGE_STARTED;

> +

> +	return rc;

> +}

> +

> +static int qed_nvmetcp_start(struct qed_dev *cdev,

> +			     struct qed_nvmetcp_tid *tasks,

> +			     void *event_context,

> +			     nvmetcp_event_cb_t async_event_cb)

> +{

> +	struct qed_tid_mem *tid_info;

> +	int rc;

> +

> +	if (cdev->flags & QED_FLAG_STORAGE_STARTED) {

> +		DP_NOTICE(cdev, "nvmetcp already started;\n");

> +

> +		return 0;

> +	}

> +

> +	rc = qed_sp_nvmetcp_func_start(QED_AFFIN_HWFN(cdev),

> +				       QED_SPQ_MODE_EBLOCK, NULL,

> +				       event_context, async_event_cb);

> +	if (rc) {

> +		DP_NOTICE(cdev, "Failed to start nvmetcp\n");

> +

> +		return rc;

> +	}

> +

> +	cdev->flags |= QED_FLAG_STORAGE_STARTED;

> +	hash_init(cdev->connections);

> +

> +	if (!tasks)

> +		return 0;

> +

> +	tid_info = kzalloc(sizeof(*tid_info), GFP_KERNEL);

> +

> +	if (!tid_info) {

> +		qed_nvmetcp_stop(cdev);

> +

> +		return -ENOMEM;

> +	}

> +

> +	rc = qed_cxt_get_tid_mem_info(QED_AFFIN_HWFN(cdev), tid_info);

> +	if (rc) {

> +		DP_NOTICE(cdev, "Failed to gather task information\n");

> +		qed_nvmetcp_stop(cdev);

> +		kfree(tid_info);

> +

> +		return rc;

> +	}

> +

> +	/* Fill task information */

> +	tasks->size = tid_info->tid_size;

> +	tasks->num_tids_per_block = tid_info->num_tids_per_block;

> +	memcpy(tasks->blocks, tid_info->blocks,

> +	       MAX_TID_BLOCKS_NVMETCP * sizeof(u8 *));

> +

> +	kfree(tid_info);

> +

> +	return 0;

> +}

> +

> +static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {

> +	.common = &qed_common_ops_pass,

> +	.ll2 = &qed_ll2_ops_pass,

> +	.fill_dev_info = &qed_fill_nvmetcp_dev_info,

> +	.register_ops = &qed_register_nvmetcp_ops,

> +	.start = &qed_nvmetcp_start,

> +	.stop = &qed_nvmetcp_stop,

> +

> +	/* Placeholder - Connection level ops */

> +};

> +

> +const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)

> +{

> +	return &qed_nvmetcp_ops_pass;

> +}

> +EXPORT_SYMBOL(qed_get_nvmetcp_ops);

> +

> +void qed_put_nvmetcp_ops(void)

> +{

> +}

> +EXPORT_SYMBOL(qed_put_nvmetcp_ops);

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> new file mode 100644

> index 000000000000..774b46ade408

> --- /dev/null

> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> @@ -0,0 +1,51 @@

> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */

> +/* Copyright 2021 Marvell. All rights reserved. */

> +

> +#ifndef _QED_NVMETCP_H

> +#define _QED_NVMETCP_H

> +

> +#include <linux/types.h>

> +#include <linux/list.h>

> +#include <linux/slab.h>

> +#include <linux/spinlock.h>

> +#include <linux/qed/tcp_common.h>

> +#include <linux/qed/qed_nvmetcp_if.h>

> +#include <linux/qed/qed_chain.h>

> +#include "qed.h"

> +#include "qed_hsi.h"

> +#include "qed_mcp.h"

> +#include "qed_sp.h"

> +

> +#define QED_NVMETCP_FW_CQ_SIZE (4 * 1024)

> +

> +/* tcp parameters */

> +#define QED_TCP_TWO_MSL_TIMER 4000

> +#define QED_TCP_HALF_WAY_CLOSE_TIMEOUT 10

> +#define QED_TCP_MAX_FIN_RT 2

> +#define QED_TCP_SWS_TIMER 5000

> +

> +struct qed_nvmetcp_info {

> +	spinlock_t lock; /* Connection resources. */

> +	struct list_head free_list;

> +	u16 max_num_outstanding_tasks;

> +	void *event_context;

> +	nvmetcp_event_cb_t event_cb;

> +};

> +

> +#if IS_ENABLED(CONFIG_QED_NVMETCP)

> +int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn);

> +void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn);

> +void qed_nvmetcp_free(struct qed_hwfn *p_hwfn);

> +

> +#else /* IS_ENABLED(CONFIG_QED_NVMETCP) */

> +static inline int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn)

> +{

> +	return -EINVAL;

> +}

> +

> +static inline void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn) {}

> +static inline void qed_nvmetcp_free(struct qed_hwfn *p_hwfn) {}

> +

> +#endif /* IS_ENABLED(CONFIG_QED_NVMETCP) */

> +

> +#endif

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h

> index 993f1357b6fc..525159e747a5 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_sp.h

> +++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h

> @@ -100,6 +100,8 @@ union ramrod_data {

>   	struct iscsi_spe_conn_mac_update iscsi_conn_mac_update;

>   	struct iscsi_spe_conn_termination iscsi_conn_terminate;

>   

> +	struct nvmetcp_init_ramrod_params nvmetcp_init;

> +

>   	struct vf_start_ramrod_data vf_start;

>   	struct vf_stop_ramrod_data vf_stop;

>   };

> diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h

> index 977807e1be53..59c5e5866607 100644

> --- a/include/linux/qed/common_hsi.h

> +++ b/include/linux/qed/common_hsi.h

> @@ -703,6 +703,7 @@ enum mf_mode {

>   /* Per-protocol connection types */

>   enum protocol_type {

>   	PROTOCOLID_ISCSI,

> +	PROTOCOLID_NVMETCP = PROTOCOLID_ISCSI,

>   	PROTOCOLID_FCOE,

>   	PROTOCOLID_ROCE,

>   	PROTOCOLID_CORE,


Why not a separate Protocol ID?
Don't you expect iSCSI and NVMe-TCP to be run at the same time?

> diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h

> new file mode 100644

> index 000000000000..e9ccfc07041d

> --- /dev/null

> +++ b/include/linux/qed/nvmetcp_common.h

> @@ -0,0 +1,54 @@

> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */

> +/* Copyright 2021 Marvell. All rights reserved. */

> +

> +#ifndef __NVMETCP_COMMON__

> +#define __NVMETCP_COMMON__

> +

> +#include "tcp_common.h"

> +

> +/* NVMeTCP firmware function init parameters */

> +struct nvmetcp_spe_func_init {

> +	__le16 half_way_close_timeout;

> +	u8 num_sq_pages_in_ring;

> +	u8 num_r2tq_pages_in_ring;

> +	u8 num_uhq_pages_in_ring;

> +	u8 ll2_rx_queue_id;

> +	u8 flags;

> +#define NVMETCP_SPE_FUNC_INIT_COUNTERS_EN_MASK 0x1

> +#define NVMETCP_SPE_FUNC_INIT_COUNTERS_EN_SHIFT 0

> +#define NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE_MASK 0x1

> +#define NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE_SHIFT 1

> +#define NVMETCP_SPE_FUNC_INIT_RESERVED0_MASK 0x3F

> +#define NVMETCP_SPE_FUNC_INIT_RESERVED0_SHIFT 2

> +	u8 debug_flags;

> +	__le16 reserved1;

> +	u8 params;

> +#define NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT_MASK	0xF

> +#define NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT_SHIFT	0

> +#define NVMETCP_SPE_FUNC_INIT_RESERVED1_MASK	0xF

> +#define NVMETCP_SPE_FUNC_INIT_RESERVED1_SHIFT	4

> +	u8 reserved2[5];

> +	struct scsi_init_func_params func_params;

> +	struct scsi_init_func_queues q_params;

> +};

> +

> +/* NVMeTCP init params passed by driver to FW in NVMeTCP init ramrod. */

> +struct nvmetcp_init_ramrod_params {

> +	struct nvmetcp_spe_func_init nvmetcp_init_spe;

> +	struct tcp_init_params tcp_init;

> +};

> +

> +/* NVMeTCP Ramrod Command IDs */

> +enum nvmetcp_ramrod_cmd_id {

> +	NVMETCP_RAMROD_CMD_ID_UNUSED = 0,

> +	NVMETCP_RAMROD_CMD_ID_INIT_FUNC = 1,

> +	NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC = 2,

> +	MAX_NVMETCP_RAMROD_CMD_ID

> +};

> +

> +struct nvmetcp_glbl_queue_entry {

> +	struct regpair cq_pbl_addr;

> +	struct regpair reserved;

> +};

> +

> +#endif /* __NVMETCP_COMMON__ */

> diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h

> index 68d17a4fbf20..524f57821ba2 100644

> --- a/include/linux/qed/qed_if.h

> +++ b/include/linux/qed/qed_if.h

> @@ -542,6 +542,26 @@ struct qed_iscsi_pf_params {

>   	u8 bdq_pbl_num_entries[3];

>   };

>   

> +struct qed_nvmetcp_pf_params {

> +	u64 glbl_q_params_addr;

> +	u16 cq_num_entries;

> +

> +	u16 num_cons;

> +	u16 num_tasks;

> +

> +	u8 num_sq_pages_in_ring;

> +	u8 num_r2tq_pages_in_ring;

> +	u8 num_uhq_pages_in_ring;

> +

> +	u8 num_queues;

> +	u8 gl_rq_pi;

> +	u8 gl_cmd_pi;

> +	u8 debug_mode;

> +	u8 ll2_ooo_queue_id;

> +

> +	u16 min_rto;

> +};

> +

>   struct qed_rdma_pf_params {

>   	/* Supplied to QED during resource allocation (may affect the ILT and

>   	 * the doorbell BAR).

> @@ -560,6 +580,7 @@ struct qed_pf_params {

>   	struct qed_eth_pf_params eth_pf_params;

>   	struct qed_fcoe_pf_params fcoe_pf_params;

>   	struct qed_iscsi_pf_params iscsi_pf_params;

> +	struct qed_nvmetcp_pf_params nvmetcp_pf_params;

>   	struct qed_rdma_pf_params rdma_pf_params;

>   };

>   

> @@ -662,6 +683,7 @@ enum qed_sb_type {

>   enum qed_protocol {

>   	QED_PROTOCOL_ETH,

>   	QED_PROTOCOL_ISCSI,

> +	QED_PROTOCOL_NVMETCP = QED_PROTOCOL_ISCSI,

>   	QED_PROTOCOL_FCOE,

>   };

>   

> diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h

> new file mode 100644

> index 000000000000..abc1f41862e3

> --- /dev/null

> +++ b/include/linux/qed/qed_nvmetcp_if.h

> @@ -0,0 +1,72 @@

> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */

> +/* Copyright 2021 Marvell. All rights reserved. */

> +

> +#ifndef _QED_NVMETCP_IF_H

> +#define _QED_NVMETCP_IF_H

> +#include <linux/types.h>

> +#include <linux/qed/qed_if.h>

> +

> +#define QED_NVMETCP_MAX_IO_SIZE	0x800000

> +

> +typedef int (*nvmetcp_event_cb_t) (void *context,

> +				   u8 fw_event_code, void *fw_handle);

> +

> +struct qed_dev_nvmetcp_info {

> +	struct qed_dev_info common;

> +

> +	u8 port_id;  /* Physical port */

> +	u8 num_cqs;

> +};

> +

> +#define MAX_TID_BLOCKS_NVMETCP (512)

> +struct qed_nvmetcp_tid {

> +	u32 size;		/* In bytes per task */

> +	u32 num_tids_per_block;

> +	u8 *blocks[MAX_TID_BLOCKS_NVMETCP];

> +};

> +

> +struct qed_nvmetcp_cb_ops {

> +	struct qed_common_cb_ops common;

> +};

> +

> +/**

> + * struct qed_nvmetcp_ops - qed NVMeTCP operations.

> + * @common:		common operations pointer

> + * @ll2:		light L2 operations pointer

> + * @fill_dev_info:	fills NVMeTCP specific information

> + *			@param cdev

> + *			@param info

> + *			@return 0 on success, otherwise error value.

> + * @register_ops:	register nvmetcp operations

> + *			@param cdev

> + *			@param ops - specified using qed_nvmetcp_cb_ops

> + *			@param cookie - driver private

> + * @start:		nvmetcp in FW

> + *			@param cdev

> + *			@param tasks - qed will fill information about tasks

> + *			return 0 on success, otherwise error value.

> + * @stop:		nvmetcp in FW

> + *			@param cdev

> + *			return 0 on success, otherwise error value.

> + */

> +struct qed_nvmetcp_ops {

> +	const struct qed_common_ops *common;

> +

> +	const struct qed_ll2_ops *ll2;

> +

> +	int (*fill_dev_info)(struct qed_dev *cdev,

> +			     struct qed_dev_nvmetcp_info *info);

> +

> +	void (*register_ops)(struct qed_dev *cdev,

> +			     struct qed_nvmetcp_cb_ops *ops, void *cookie);

> +

> +	int (*start)(struct qed_dev *cdev,

> +		     struct qed_nvmetcp_tid *tasks,

> +		     void *event_context, nvmetcp_event_cb_t async_event_cb);

> +

> +	int (*stop)(struct qed_dev *cdev);

> +};

> +

> +const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);

> +void qed_put_nvmetcp_ops(void);

> +#endif

> 

As mentioned, please rearrange the patchset to have the NVMe-TCP patches 
first, then the driver specific bits.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 1, 2021, 5:28 p.m. UTC | #6
On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch introduces the NVMeTCP HSI and HSI functionality in order to

> initialize and interact with the HW device as part of the connection level

> HSI.

> 

> This includes:

> - Connection offload: offload a TCP connection to the FW.

> - Connection update: update the ICReq-ICResp params

> - Connection clear SQ: outstanding IOs FW flush.

> - Connection termination: terminate the TCP connection and flush the FW.

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> ---

>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 580 +++++++++++++++++-

>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  63 ++

>   drivers/net/ethernet/qlogic/qed/qed_sp.h      |   3 +

>   include/linux/qed/nvmetcp_common.h            | 143 +++++

>   include/linux/qed/qed_nvmetcp_if.h            |  94 +++

>   5 files changed, 881 insertions(+), 2 deletions(-)

> 

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> index da3b5002d216..79bd1cc6677f 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> @@ -259,6 +259,578 @@ static int qed_nvmetcp_start(struct qed_dev *cdev,

>   	return 0;

>   }

>   

> +static struct qed_hash_nvmetcp_con *qed_nvmetcp_get_hash(struct qed_dev *cdev,

> +							 u32 handle)

> +{

> +	struct qed_hash_nvmetcp_con *hash_con = NULL;

> +

> +	if (!(cdev->flags & QED_FLAG_STORAGE_STARTED))

> +		return NULL;

> +

> +	hash_for_each_possible(cdev->connections, hash_con, node, handle) {

> +		if (hash_con->con->icid == handle)

> +			break;

> +	}

> +

> +	if (!hash_con || hash_con->con->icid != handle)

> +		return NULL;

> +

> +	return hash_con;

> +}

> +

> +static int qed_sp_nvmetcp_conn_offload(struct qed_hwfn *p_hwfn,

> +				       struct qed_nvmetcp_conn *p_conn,

> +				       enum spq_mode comp_mode,

> +				       struct qed_spq_comp_cb *p_comp_addr)

> +{

> +	struct nvmetcp_spe_conn_offload *p_ramrod = NULL;

> +	struct tcp_offload_params_opt2 *p_tcp2 = NULL;

> +	struct qed_sp_init_data init_data = { 0 };

> +	struct qed_spq_entry *p_ent = NULL;

> +	dma_addr_t r2tq_pbl_addr;

> +	dma_addr_t xhq_pbl_addr;

> +	dma_addr_t uhq_pbl_addr;

> +	u16 physical_q;

> +	int rc = 0;

> +	u32 dval;

> +	u8 i;

> +

> +	/* Get SPQ entry */

> +	init_data.cid = p_conn->icid;

> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> +	init_data.comp_mode = comp_mode;

> +	init_data.p_comp_data = p_comp_addr;

> +

> +	rc = qed_sp_init_request(p_hwfn, &p_ent,

> +				 NVMETCP_RAMROD_CMD_ID_OFFLOAD_CONN,

> +				 PROTOCOLID_NVMETCP, &init_data);

> +	if (rc)

> +		return rc;

> +

> +	p_ramrod = &p_ent->ramrod.nvmetcp_conn_offload;

> +

> +	/* Transmission PQ is the first of the PF */

> +	physical_q = qed_get_cm_pq_idx(p_hwfn, PQ_FLAGS_OFLD);

> +	p_conn->physical_q0 = cpu_to_le16(physical_q);

> +	p_ramrod->nvmetcp.physical_q0 = cpu_to_le16(physical_q);

> +

> +	/* nvmetcp Pure-ACK PQ */

> +	physical_q = qed_get_cm_pq_idx(p_hwfn, PQ_FLAGS_ACK);

> +	p_conn->physical_q1 = cpu_to_le16(physical_q);

> +	p_ramrod->nvmetcp.physical_q1 = cpu_to_le16(physical_q);

> +

> +	p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);

> +

> +	DMA_REGPAIR_LE(p_ramrod->nvmetcp.sq_pbl_addr, p_conn->sq_pbl_addr);

> +

> +	r2tq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->r2tq);

> +	DMA_REGPAIR_LE(p_ramrod->nvmetcp.r2tq_pbl_addr, r2tq_pbl_addr);

> +

> +	xhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->xhq);

> +	DMA_REGPAIR_LE(p_ramrod->nvmetcp.xhq_pbl_addr, xhq_pbl_addr);

> +

> +	uhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->uhq);

> +	DMA_REGPAIR_LE(p_ramrod->nvmetcp.uhq_pbl_addr, uhq_pbl_addr);

> +

> +	p_ramrod->nvmetcp.flags = p_conn->offl_flags;

> +	p_ramrod->nvmetcp.default_cq = p_conn->default_cq;

> +	p_ramrod->nvmetcp.initial_ack = 0;

> +

> +	DMA_REGPAIR_LE(p_ramrod->nvmetcp.nvmetcp.cccid_itid_table_addr,

> +		       p_conn->nvmetcp_cccid_itid_table_addr);

> +	p_ramrod->nvmetcp.nvmetcp.cccid_max_range =

> +		 cpu_to_le16(p_conn->nvmetcp_cccid_max_range);

> +

> +	p_tcp2 = &p_ramrod->tcp;

> +

> +	qed_set_fw_mac_addr(&p_tcp2->remote_mac_addr_hi,

> +			    &p_tcp2->remote_mac_addr_mid,

> +			    &p_tcp2->remote_mac_addr_lo, p_conn->remote_mac);

> +	qed_set_fw_mac_addr(&p_tcp2->local_mac_addr_hi,

> +			    &p_tcp2->local_mac_addr_mid,

> +			    &p_tcp2->local_mac_addr_lo, p_conn->local_mac);

> +

> +	p_tcp2->vlan_id = cpu_to_le16(p_conn->vlan_id);

> +	p_tcp2->flags = cpu_to_le16(p_conn->tcp_flags);

> +

> +	p_tcp2->ip_version = p_conn->ip_version;

> +	for (i = 0; i < 4; i++) {

> +		dval = p_conn->remote_ip[i];

> +		p_tcp2->remote_ip[i] = cpu_to_le32(dval);

> +		dval = p_conn->local_ip[i];

> +		p_tcp2->local_ip[i] = cpu_to_le32(dval);

> +	}

> +


What is this?
Some convoluted way of assigning the IP address in little endian?
Pointless if it's IPv4, as then each bit is just one byte.
And if it's for IPv6, what do you do for IPv4?
And isn't there a helper for it?

> +	p_tcp2->flow_label = cpu_to_le32(p_conn->flow_label);

> +	p_tcp2->ttl = p_conn->ttl;

> +	p_tcp2->tos_or_tc = p_conn->tos_or_tc;

> +	p_tcp2->remote_port = cpu_to_le16(p_conn->remote_port);

> +	p_tcp2->local_port = cpu_to_le16(p_conn->local_port);

> +	p_tcp2->mss = cpu_to_le16(p_conn->mss);

> +	p_tcp2->rcv_wnd_scale = p_conn->rcv_wnd_scale;

> +	p_tcp2->connect_mode = p_conn->connect_mode;

> +	p_tcp2->cwnd = cpu_to_le32(p_conn->cwnd);

> +	p_tcp2->ka_max_probe_cnt = p_conn->ka_max_probe_cnt;

> +	p_tcp2->ka_timeout = cpu_to_le32(p_conn->ka_timeout);

> +	p_tcp2->max_rt_time = cpu_to_le32(p_conn->max_rt_time);

> +	p_tcp2->ka_interval = cpu_to_le32(p_conn->ka_interval);

> +

> +	return qed_spq_post(p_hwfn, p_ent, NULL);

> +}

> +

> +static int qed_sp_nvmetcp_conn_update(struct qed_hwfn *p_hwfn,

> +				      struct qed_nvmetcp_conn *p_conn,

> +				      enum spq_mode comp_mode,

> +				      struct qed_spq_comp_cb *p_comp_addr)

> +{

> +	struct nvmetcp_conn_update_ramrod_params *p_ramrod = NULL;

> +	struct qed_spq_entry *p_ent = NULL;

> +	struct qed_sp_init_data init_data;

> +	int rc = -EINVAL;

> +	u32 dval;

> +

> +	/* Get SPQ entry */

> +	memset(&init_data, 0, sizeof(init_data));

> +	init_data.cid = p_conn->icid;

> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> +	init_data.comp_mode = comp_mode;

> +	init_data.p_comp_data = p_comp_addr;

> +

> +	rc = qed_sp_init_request(p_hwfn, &p_ent,

> +				 NVMETCP_RAMROD_CMD_ID_UPDATE_CONN,

> +				 PROTOCOLID_NVMETCP, &init_data);

> +	if (rc)

> +		return rc;

> +

> +	p_ramrod = &p_ent->ramrod.nvmetcp_conn_update;

> +	p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);

> +	p_ramrod->flags = p_conn->update_flag;

> +	p_ramrod->max_seq_size = cpu_to_le32(p_conn->max_seq_size);

> +	dval = p_conn->max_recv_pdu_length;

> +	p_ramrod->max_recv_pdu_length = cpu_to_le32(dval);

> +	dval = p_conn->max_send_pdu_length;

> +	p_ramrod->max_send_pdu_length = cpu_to_le32(dval);

> +	dval = p_conn->first_seq_length;

> +	p_ramrod->first_seq_length = cpu_to_le32(dval);

> +

> +	return qed_spq_post(p_hwfn, p_ent, NULL);

> +}

> +

> +static int qed_sp_nvmetcp_conn_terminate(struct qed_hwfn *p_hwfn,

> +					 struct qed_nvmetcp_conn *p_conn,

> +					 enum spq_mode comp_mode,

> +					 struct qed_spq_comp_cb *p_comp_addr)

> +{

> +	struct nvmetcp_spe_conn_termination *p_ramrod = NULL;

> +	struct qed_spq_entry *p_ent = NULL;

> +	struct qed_sp_init_data init_data;

> +	int rc = -EINVAL;

> +

> +	/* Get SPQ entry */

> +	memset(&init_data, 0, sizeof(init_data));

> +	init_data.cid = p_conn->icid;

> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> +	init_data.comp_mode = comp_mode;

> +	init_data.p_comp_data = p_comp_addr;

> +

> +	rc = qed_sp_init_request(p_hwfn, &p_ent,

> +				 NVMETCP_RAMROD_CMD_ID_TERMINATION_CONN,

> +				 PROTOCOLID_NVMETCP, &init_data);

> +	if (rc)

> +		return rc;

> +

> +	p_ramrod = &p_ent->ramrod.nvmetcp_conn_terminate;

> +	p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);

> +	p_ramrod->abortive = p_conn->abortive_dsconnect;

> +

> +	return qed_spq_post(p_hwfn, p_ent, NULL);

> +}

> +

> +static int qed_sp_nvmetcp_conn_clear_sq(struct qed_hwfn *p_hwfn,

> +					struct qed_nvmetcp_conn *p_conn,

> +					enum spq_mode comp_mode,

> +					struct qed_spq_comp_cb *p_comp_addr)

> +{

> +	struct qed_spq_entry *p_ent = NULL;

> +	struct qed_sp_init_data init_data;

> +	int rc = -EINVAL;

> +

> +	/* Get SPQ entry */

> +	memset(&init_data, 0, sizeof(init_data));

> +	init_data.cid = p_conn->icid;

> +	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> +	init_data.comp_mode = comp_mode;

> +	init_data.p_comp_data = p_comp_addr;

> +

> +	rc = qed_sp_init_request(p_hwfn, &p_ent,

> +				 NVMETCP_RAMROD_CMD_ID_CLEAR_SQ,

> +				 PROTOCOLID_NVMETCP, &init_data);

> +	if (rc)

> +		return rc;

> +

> +	return qed_spq_post(p_hwfn, p_ent, NULL);

> +}

> +

> +static void __iomem *qed_nvmetcp_get_db_addr(struct qed_hwfn *p_hwfn, u32 cid)

> +{

> +	return (u8 __iomem *)p_hwfn->doorbells +

> +			     qed_db_addr(cid, DQ_DEMS_LEGACY);

> +}

> +

> +static int qed_nvmetcp_allocate_connection(struct qed_hwfn *p_hwfn,

> +					   struct qed_nvmetcp_conn **p_out_conn)

> +{

> +	struct qed_chain_init_params params = {

> +		.mode		= QED_CHAIN_MODE_PBL,

> +		.intended_use	= QED_CHAIN_USE_TO_CONSUME_PRODUCE,

> +		.cnt_type	= QED_CHAIN_CNT_TYPE_U16,

> +	};

> +	struct qed_nvmetcp_pf_params *p_params = NULL;

> +	struct qed_nvmetcp_conn *p_conn = NULL;

> +	int rc = 0;

> +

> +	/* Try finding a free connection that can be used */

> +	spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);

> +	if (!list_empty(&p_hwfn->p_nvmetcp_info->free_list))

> +		p_conn = list_first_entry(&p_hwfn->p_nvmetcp_info->free_list,

> +					  struct qed_nvmetcp_conn, list_entry);

> +	if (p_conn) {

> +		list_del(&p_conn->list_entry);

> +		spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);

> +		*p_out_conn = p_conn;

> +

> +		return 0;

> +	}

> +	spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);

> +

> +	/* Need to allocate a new connection */

> +	p_params = &p_hwfn->pf_params.nvmetcp_pf_params;

> +

> +	p_conn = kzalloc(sizeof(*p_conn), GFP_KERNEL);

> +	if (!p_conn)

> +		return -ENOMEM;

> +

> +	params.num_elems = p_params->num_r2tq_pages_in_ring *

> +			   QED_CHAIN_PAGE_SIZE / sizeof(struct nvmetcp_wqe);

> +	params.elem_size = sizeof(struct nvmetcp_wqe);

> +

> +	rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->r2tq, &params);

> +	if (rc)

> +		goto nomem_r2tq;

> +

> +	params.num_elems = p_params->num_uhq_pages_in_ring *

> +			   QED_CHAIN_PAGE_SIZE / sizeof(struct iscsi_uhqe);

> +	params.elem_size = sizeof(struct iscsi_uhqe);

> +

> +	rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->uhq, &params);

> +	if (rc)

> +		goto nomem_uhq;

> +

> +	params.elem_size = sizeof(struct iscsi_xhqe);

> +

> +	rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->xhq, &params);

> +	if (rc)

> +		goto nomem;

> +

> +	p_conn->free_on_delete = true;

> +	*p_out_conn = p_conn;

> +

> +	return 0;

> +

> +nomem:

> +	qed_chain_free(p_hwfn->cdev, &p_conn->uhq);

> +nomem_uhq:

> +	qed_chain_free(p_hwfn->cdev, &p_conn->r2tq);

> +nomem_r2tq:

> +	kfree(p_conn);

> +

> +	return -ENOMEM;

> +}

> +

> +static int qed_nvmetcp_acquire_connection(struct qed_hwfn *p_hwfn,

> +					  struct qed_nvmetcp_conn **p_out_conn)

> +{

> +	struct qed_nvmetcp_conn *p_conn = NULL;

> +	int rc = 0;

> +	u32 icid;

> +

> +	spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);

> +	rc = qed_cxt_acquire_cid(p_hwfn, PROTOCOLID_NVMETCP, &icid);

> +	spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);

> +

> +	if (rc)

> +		return rc;

> +

> +	rc = qed_nvmetcp_allocate_connection(p_hwfn, &p_conn);

> +	if (rc) {

> +		spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);

> +		qed_cxt_release_cid(p_hwfn, icid);

> +		spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);

> +

> +		return rc;

> +	}

> +

> +	p_conn->icid = icid;

> +	p_conn->conn_id = (u16)icid;

> +	p_conn->fw_cid = (p_hwfn->hw_info.opaque_fid << 16) | icid;

> +	*p_out_conn = p_conn;

> +

> +	return rc;

> +}

> +

> +static void qed_nvmetcp_release_connection(struct qed_hwfn *p_hwfn,

> +					   struct qed_nvmetcp_conn *p_conn)

> +{

> +	spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);

> +	list_add_tail(&p_conn->list_entry, &p_hwfn->p_nvmetcp_info->free_list);

> +	qed_cxt_release_cid(p_hwfn, p_conn->icid);

> +	spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);

> +}

> +

> +static void qed_nvmetcp_free_connection(struct qed_hwfn *p_hwfn,

> +					struct qed_nvmetcp_conn *p_conn)

> +{

> +	qed_chain_free(p_hwfn->cdev, &p_conn->xhq);

> +	qed_chain_free(p_hwfn->cdev, &p_conn->uhq);

> +	qed_chain_free(p_hwfn->cdev, &p_conn->r2tq);

> +

> +	kfree(p_conn);

> +}

> +

> +int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn)

> +{

> +	struct qed_nvmetcp_info *p_nvmetcp_info;

> +

> +	p_nvmetcp_info = kzalloc(sizeof(*p_nvmetcp_info), GFP_KERNEL);

> +	if (!p_nvmetcp_info)

> +		return -ENOMEM;

> +

> +	INIT_LIST_HEAD(&p_nvmetcp_info->free_list);

> +

> +	p_hwfn->p_nvmetcp_info = p_nvmetcp_info;

> +

> +	return 0;

> +}

> +

> +void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn)

> +{

> +	spin_lock_init(&p_hwfn->p_nvmetcp_info->lock);

> +}

> +

> +void qed_nvmetcp_free(struct qed_hwfn *p_hwfn)

> +{

> +	struct qed_nvmetcp_conn *p_conn = NULL;

> +

> +	if (!p_hwfn->p_nvmetcp_info)

> +		return;

> +

> +	while (!list_empty(&p_hwfn->p_nvmetcp_info->free_list)) {

> +		p_conn = list_first_entry(&p_hwfn->p_nvmetcp_info->free_list,

> +					  struct qed_nvmetcp_conn, list_entry);

> +		if (p_conn) {

> +			list_del(&p_conn->list_entry);

> +			qed_nvmetcp_free_connection(p_hwfn, p_conn);

> +		}

> +	}

> +

> +	kfree(p_hwfn->p_nvmetcp_info);

> +	p_hwfn->p_nvmetcp_info = NULL;

> +}

> +

> +static int qed_nvmetcp_acquire_conn(struct qed_dev *cdev,

> +				    u32 *handle,

> +				    u32 *fw_cid, void __iomem **p_doorbell)

> +{

> +	struct qed_hash_nvmetcp_con *hash_con;

> +	int rc;

> +

> +	/* Allocate a hashed connection */

> +	hash_con = kzalloc(sizeof(*hash_con), GFP_ATOMIC);

> +	if (!hash_con)

> +		return -ENOMEM;

> +

> +	/* Acquire the connection */

> +	rc = qed_nvmetcp_acquire_connection(QED_AFFIN_HWFN(cdev),

> +					    &hash_con->con);

> +	if (rc) {

> +		DP_NOTICE(cdev, "Failed to acquire Connection\n");

> +		kfree(hash_con);

> +

> +		return rc;

> +	}

> +

> +	/* Added the connection to hash table */

> +	*handle = hash_con->con->icid;

> +	*fw_cid = hash_con->con->fw_cid;

> +	hash_add(cdev->connections, &hash_con->node, *handle);

> +

> +	if (p_doorbell)

> +		*p_doorbell = qed_nvmetcp_get_db_addr(QED_AFFIN_HWFN(cdev),

> +						      *handle);

> +

> +	return 0;

> +}

> +

> +static int qed_nvmetcp_release_conn(struct qed_dev *cdev, u32 handle)

> +{

> +	struct qed_hash_nvmetcp_con *hash_con;

> +

> +	hash_con = qed_nvmetcp_get_hash(cdev, handle);

> +	if (!hash_con) {

> +		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",

> +			  handle);

> +

> +		return -EINVAL;

> +	}

> +

> +	hlist_del(&hash_con->node);

> +	qed_nvmetcp_release_connection(QED_AFFIN_HWFN(cdev), hash_con->con);

> +	kfree(hash_con);

> +

> +	return 0;

> +}

> +

> +static int qed_nvmetcp_offload_conn(struct qed_dev *cdev, u32 handle,

> +				    struct qed_nvmetcp_params_offload *conn_info)

> +{

> +	struct qed_hash_nvmetcp_con *hash_con;

> +	struct qed_nvmetcp_conn *con;

> +

> +	hash_con = qed_nvmetcp_get_hash(cdev, handle);

> +	if (!hash_con) {

> +		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",

> +			  handle);

> +

> +		return -EINVAL;

> +	}

> +

> +	/* Update the connection with information from the params */

> +	con = hash_con->con;

> +

> +	/* FW initializations */

> +	con->layer_code = NVMETCP_SLOW_PATH_LAYER_CODE;

> +	con->sq_pbl_addr = conn_info->sq_pbl_addr;

> +	con->nvmetcp_cccid_max_range = conn_info->nvmetcp_cccid_max_range;

> +	con->nvmetcp_cccid_itid_table_addr = conn_info->nvmetcp_cccid_itid_table_addr;

> +	con->default_cq = conn_info->default_cq;

> +

> +	SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE, 0);

> +	SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE, 1);

> +	SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B, 1);

> +

> +	/* Networking and TCP stack initializations */

> +	ether_addr_copy(con->local_mac, conn_info->src.mac);

> +	ether_addr_copy(con->remote_mac, conn_info->dst.mac);

> +	memcpy(con->local_ip, conn_info->src.ip, sizeof(con->local_ip));

> +	memcpy(con->remote_ip, conn_info->dst.ip, sizeof(con->remote_ip));

> +	con->local_port = conn_info->src.port;

> +	con->remote_port = conn_info->dst.port;

> +	con->vlan_id = conn_info->vlan_id;

> +

> +	if (conn_info->timestamp_en)

> +		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_TS_EN, 1);

> +

> +	if (conn_info->delayed_ack_en)

> +		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_DA_EN, 1);

> +

> +	if (conn_info->tcp_keep_alive_en)

> +		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_KA_EN, 1);

> +

> +	if (conn_info->ecn_en)

> +		SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_ECN_EN, 1);

> +

> +	con->ip_version = conn_info->ip_version;

> +	con->flow_label = QED_TCP_FLOW_LABEL;

> +	con->ka_max_probe_cnt = conn_info->ka_max_probe_cnt;

> +	con->ka_timeout = conn_info->ka_timeout;

> +	con->ka_interval = conn_info->ka_interval;

> +	con->max_rt_time = conn_info->max_rt_time;

> +	con->ttl = conn_info->ttl;

> +	con->tos_or_tc = conn_info->tos_or_tc;

> +	con->mss = conn_info->mss;

> +	con->cwnd = conn_info->cwnd;

> +	con->rcv_wnd_scale = conn_info->rcv_wnd_scale;

> +	con->connect_mode = 0; /* TCP_CONNECT_ACTIVE */

> +

> +	return qed_sp_nvmetcp_conn_offload(QED_AFFIN_HWFN(cdev), con,

> +					 QED_SPQ_MODE_EBLOCK, NULL);

> +}

> +

> +static int qed_nvmetcp_update_conn(struct qed_dev *cdev,

> +				   u32 handle,

> +				   struct qed_nvmetcp_params_update *conn_info)

> +{

> +	struct qed_hash_nvmetcp_con *hash_con;

> +	struct qed_nvmetcp_conn *con;

> +

> +	hash_con = qed_nvmetcp_get_hash(cdev, handle);

> +	if (!hash_con) {

> +		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",

> +			  handle);

> +

> +		return -EINVAL;

> +	}

> +

> +	/* Update the connection with information from the params */

> +	con = hash_con->con;

> +

> +	SET_FIELD(con->update_flag,

> +		  ISCSI_CONN_UPDATE_RAMROD_PARAMS_INITIAL_R2T, 0);

> +	SET_FIELD(con->update_flag,

> +		  ISCSI_CONN_UPDATE_RAMROD_PARAMS_IMMEDIATE_DATA, 1);

> +

> +	if (conn_info->hdr_digest_en)

> +		SET_FIELD(con->update_flag, ISCSI_CONN_UPDATE_RAMROD_PARAMS_HD_EN, 1);

> +

> +	if (conn_info->data_digest_en)

> +		SET_FIELD(con->update_flag, ISCSI_CONN_UPDATE_RAMROD_PARAMS_DD_EN, 1);

> +

> +	/* Placeholder - initialize pfv, cpda, hpda */

> +

> +	con->max_seq_size = conn_info->max_io_size;

> +	con->max_recv_pdu_length = conn_info->max_recv_pdu_length;

> +	con->max_send_pdu_length = conn_info->max_send_pdu_length;

> +	con->first_seq_length = conn_info->max_io_size;

> +

> +	return qed_sp_nvmetcp_conn_update(QED_AFFIN_HWFN(cdev), con,

> +					QED_SPQ_MODE_EBLOCK, NULL);

> +}

> +

> +static int qed_nvmetcp_clear_conn_sq(struct qed_dev *cdev, u32 handle)

> +{

> +	struct qed_hash_nvmetcp_con *hash_con;

> +

> +	hash_con = qed_nvmetcp_get_hash(cdev, handle);

> +	if (!hash_con) {

> +		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",

> +			  handle);

> +

> +		return -EINVAL;

> +	}

> +

> +	return qed_sp_nvmetcp_conn_clear_sq(QED_AFFIN_HWFN(cdev), hash_con->con,

> +					    QED_SPQ_MODE_EBLOCK, NULL);

> +}

> +

> +static int qed_nvmetcp_destroy_conn(struct qed_dev *cdev,

> +				    u32 handle, u8 abrt_conn)

> +{

> +	struct qed_hash_nvmetcp_con *hash_con;

> +

> +	hash_con = qed_nvmetcp_get_hash(cdev, handle);

> +	if (!hash_con) {

> +		DP_NOTICE(cdev, "Failed to find connection for handle %d\n",

> +			  handle);

> +

> +		return -EINVAL;

> +	}

> +

> +	hash_con->con->abortive_dsconnect = abrt_conn;

> +

> +	return qed_sp_nvmetcp_conn_terminate(QED_AFFIN_HWFN(cdev), hash_con->con,

> +					   QED_SPQ_MODE_EBLOCK, NULL);

> +}

> +

>   static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {

>   	.common = &qed_common_ops_pass,

>   	.ll2 = &qed_ll2_ops_pass,

> @@ -266,8 +838,12 @@ static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {

>   	.register_ops = &qed_register_nvmetcp_ops,

>   	.start = &qed_nvmetcp_start,

>   	.stop = &qed_nvmetcp_stop,

> -

> -	/* Placeholder - Connection level ops */

> +	.acquire_conn = &qed_nvmetcp_acquire_conn,

> +	.release_conn = &qed_nvmetcp_release_conn,

> +	.offload_conn = &qed_nvmetcp_offload_conn,

> +	.update_conn = &qed_nvmetcp_update_conn,

> +	.destroy_conn = &qed_nvmetcp_destroy_conn,

> +	.clear_sq = &qed_nvmetcp_clear_conn_sq,

>   };

>   

>   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> index 774b46ade408..749169f0bdb1 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> @@ -19,6 +19,7 @@

>   #define QED_NVMETCP_FW_CQ_SIZE (4 * 1024)

>   

>   /* tcp parameters */

> +#define QED_TCP_FLOW_LABEL 0

>   #define QED_TCP_TWO_MSL_TIMER 4000

>   #define QED_TCP_HALF_WAY_CLOSE_TIMEOUT 10

>   #define QED_TCP_MAX_FIN_RT 2

> @@ -32,6 +33,68 @@ struct qed_nvmetcp_info {

>   	nvmetcp_event_cb_t event_cb;

>   };

>   

> +struct qed_hash_nvmetcp_con {

> +	struct hlist_node node;

> +	struct qed_nvmetcp_conn *con;

> +};

> +

> +struct qed_nvmetcp_conn {

> +	struct list_head list_entry;

> +	bool free_on_delete;

> +

> +	u16 conn_id;

> +	u32 icid;

> +	u32 fw_cid;

> +

> +	u8 layer_code;

> +	u8 offl_flags;

> +	u8 connect_mode;

> +

> +	dma_addr_t sq_pbl_addr;

> +	struct qed_chain r2tq;

> +	struct qed_chain xhq;

> +	struct qed_chain uhq;

> +

> +	u8 local_mac[6];

> +	u8 remote_mac[6];

> +	u8 ip_version;

> +	u8 ka_max_probe_cnt;

> +

> +	u16 vlan_id;

> +	u16 tcp_flags;

> +	u32 remote_ip[4];

> +	u32 local_ip[4];

> +

> +	u32 flow_label;

> +	u32 ka_timeout;

> +	u32 ka_interval;

> +	u32 max_rt_time;

> +

> +	u8 ttl;

> +	u8 tos_or_tc;

> +	u16 remote_port;

> +	u16 local_port;

> +	u16 mss;

> +	u8 rcv_wnd_scale;

> +	u32 rcv_wnd;

> +	u32 cwnd;

> +

> +	u8 update_flag;

> +	u8 default_cq;

> +	u8 abortive_dsconnect;

> +

> +	u32 max_seq_size;

> +	u32 max_recv_pdu_length;

> +	u32 max_send_pdu_length;

> +	u32 first_seq_length;

> +

> +	u16 physical_q0;

> +	u16 physical_q1;

> +

> +	u16 nvmetcp_cccid_max_range;

> +	dma_addr_t nvmetcp_cccid_itid_table_addr;

> +};

> +

>   #if IS_ENABLED(CONFIG_QED_NVMETCP)

>   int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn);

>   void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn);

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h

> index 525159e747a5..60ff3222bf55 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_sp.h

> +++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h

> @@ -101,6 +101,9 @@ union ramrod_data {

>   	struct iscsi_spe_conn_termination iscsi_conn_terminate;

>   

>   	struct nvmetcp_init_ramrod_params nvmetcp_init;

> +	struct nvmetcp_spe_conn_offload nvmetcp_conn_offload;

> +	struct nvmetcp_conn_update_ramrod_params nvmetcp_conn_update;

> +	struct nvmetcp_spe_conn_termination nvmetcp_conn_terminate;

>   

>   	struct vf_start_ramrod_data vf_start;

>   	struct vf_stop_ramrod_data vf_stop;

> diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h

> index e9ccfc07041d..c8836b71b866 100644

> --- a/include/linux/qed/nvmetcp_common.h

> +++ b/include/linux/qed/nvmetcp_common.h

> @@ -6,6 +6,8 @@

>   

>   #include "tcp_common.h"

>   

> +#define NVMETCP_SLOW_PATH_LAYER_CODE (6)

> +

>   /* NVMeTCP firmware function init parameters */

>   struct nvmetcp_spe_func_init {

>   	__le16 half_way_close_timeout;

> @@ -43,6 +45,10 @@ enum nvmetcp_ramrod_cmd_id {

>   	NVMETCP_RAMROD_CMD_ID_UNUSED = 0,

>   	NVMETCP_RAMROD_CMD_ID_INIT_FUNC = 1,

>   	NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC = 2,

> +	NVMETCP_RAMROD_CMD_ID_OFFLOAD_CONN = 3,

> +	NVMETCP_RAMROD_CMD_ID_UPDATE_CONN = 4,

> +	NVMETCP_RAMROD_CMD_ID_TERMINATION_CONN = 5,

> +	NVMETCP_RAMROD_CMD_ID_CLEAR_SQ = 6,

>   	MAX_NVMETCP_RAMROD_CMD_ID

>   };

>   

> @@ -51,4 +57,141 @@ struct nvmetcp_glbl_queue_entry {

>   	struct regpair reserved;

>   };

>   

> +/* NVMeTCP conn level EQEs */

> +enum nvmetcp_eqe_opcode {

> +	NVMETCP_EVENT_TYPE_INIT_FUNC = 0, /* Response after init Ramrod */

> +	NVMETCP_EVENT_TYPE_DESTROY_FUNC, /* Response after destroy Ramrod */

> +	NVMETCP_EVENT_TYPE_OFFLOAD_CONN,/* Response after option 2 offload Ramrod */

> +	NVMETCP_EVENT_TYPE_UPDATE_CONN, /* Response after update Ramrod */

> +	NVMETCP_EVENT_TYPE_CLEAR_SQ, /* Response after clear sq Ramrod */

> +	NVMETCP_EVENT_TYPE_TERMINATE_CONN, /* Response after termination Ramrod */

> +	NVMETCP_EVENT_TYPE_RESERVED0,

> +	NVMETCP_EVENT_TYPE_RESERVED1,

> +	NVMETCP_EVENT_TYPE_ASYN_CONNECT_COMPLETE, /* Connect completed (A-syn EQE) */

> +	NVMETCP_EVENT_TYPE_ASYN_TERMINATE_DONE, /* Termination completed (A-syn EQE) */

> +	NVMETCP_EVENT_TYPE_START_OF_ERROR_TYPES = 10, /* Separate EQs from err EQs */

> +	NVMETCP_EVENT_TYPE_ASYN_ABORT_RCVD, /* TCP RST packet receive (A-syn EQE) */

> +	NVMETCP_EVENT_TYPE_ASYN_CLOSE_RCVD, /* TCP FIN packet receive (A-syn EQE) */

> +	NVMETCP_EVENT_TYPE_ASYN_SYN_RCVD, /* TCP SYN+ACK packet receive (A-syn EQE) */

> +	NVMETCP_EVENT_TYPE_ASYN_MAX_RT_TIME, /* TCP max retransmit time (A-syn EQE) */

> +	NVMETCP_EVENT_TYPE_ASYN_MAX_RT_CNT, /* TCP max retransmit count (A-syn EQE) */

> +	NVMETCP_EVENT_TYPE_ASYN_MAX_KA_PROBES_CNT, /* TCP ka probes count (A-syn EQE) */

> +	NVMETCP_EVENT_TYPE_ASYN_FIN_WAIT2, /* TCP fin wait 2 (A-syn EQE) */

> +	NVMETCP_EVENT_TYPE_NVMETCP_CONN_ERROR, /* NVMeTCP error response (A-syn EQE) */

> +	NVMETCP_EVENT_TYPE_TCP_CONN_ERROR, /* NVMeTCP error - tcp error (A-syn EQE) */

> +	MAX_NVMETCP_EQE_OPCODE

> +};

> +

> +struct nvmetcp_conn_offload_section {

> +	struct regpair cccid_itid_table_addr; /* CCCID to iTID table address */

> +	__le16 cccid_max_range; /* CCCID max value - used for validation */

> +	__le16 reserved[3];

> +};

> +

> +/* NVMe TCP connection offload params passed by driver to FW in NVMeTCP offload ramrod */

> +struct nvmetcp_conn_offload_params {

> +	struct regpair sq_pbl_addr;

> +	struct regpair r2tq_pbl_addr;

> +	struct regpair xhq_pbl_addr;

> +	struct regpair uhq_pbl_addr;

> +	__le16 physical_q0;

> +	__le16 physical_q1;

> +	u8 flags;

> +#define NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B_MASK 0x1

> +#define NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B_SHIFT 0

> +#define NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE_MASK 0x1

> +#define NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE_SHIFT 1

> +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_MASK 0x1

> +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_SHIFT 2

> +#define NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE_MASK 0x1

> +#define NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE_SHIFT 3

> +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESERVED1_MASK 0xF

> +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESERVED1_SHIFT 4

> +	u8 default_cq;

> +	__le16 reserved0;

> +	__le32 reserved1;

> +	__le32 initial_ack;

> +

> +	struct nvmetcp_conn_offload_section nvmetcp; /* NVMe/TCP section */

> +};

> +

> +/* NVMe TCP and TCP connection offload params passed by driver to FW in NVMeTCP offload ramrod. */

> +struct nvmetcp_spe_conn_offload {

> +	__le16 reserved;

> +	__le16 conn_id;

> +	__le32 fw_cid;

> +	struct nvmetcp_conn_offload_params nvmetcp;

> +	struct tcp_offload_params_opt2 tcp;

> +};

> +

> +/* NVMeTCP connection update params passed by driver to FW in NVMETCP update ramrod. */

> +struct nvmetcp_conn_update_ramrod_params {

> +	__le16 reserved0;

> +	__le16 conn_id;

> +	__le32 reserved1;

> +	u8 flags;

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_HD_EN_MASK 0x1

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_HD_EN_SHIFT 0

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_DD_EN_MASK 0x1

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_DD_EN_SHIFT 1

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED0_MASK 0x1

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED0_SHIFT 2

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED1_MASK 0x1

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED1_DATA_SHIFT 3

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED2_MASK 0x1

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED2_SHIFT 4

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED3_MASK 0x1

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED3_SHIFT 5

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED4_MASK 0x1

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED4_SHIFT 6

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED5_MASK 0x1

> +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED5_SHIFT 7

> +	u8 reserved3[3];

> +	__le32 max_seq_size;

> +	__le32 max_send_pdu_length;

> +	__le32 max_recv_pdu_length;

> +	__le32 first_seq_length;

> +	__le32 reserved4[5];

> +};

> +

> +/* NVMeTCP connection termination request */

> +struct nvmetcp_spe_conn_termination {

> +	__le16 reserved0;

> +	__le16 conn_id;

> +	__le32 reserved1;

> +	u8 abortive;

> +	u8 reserved2[7];

> +	struct regpair reserved3;

> +	struct regpair reserved4;

> +};

> +

> +struct nvmetcp_dif_flags {

> +	u8 flags;

> +};

> +

> +enum nvmetcp_wqe_type {

> +	NVMETCP_WQE_TYPE_NORMAL,

> +	NVMETCP_WQE_TYPE_TASK_CLEANUP,

> +	NVMETCP_WQE_TYPE_MIDDLE_PATH,

> +	NVMETCP_WQE_TYPE_IC,

> +	MAX_NVMETCP_WQE_TYPE

> +};

> +

> +struct nvmetcp_wqe {

> +	__le16 task_id;

> +	u8 flags;

> +#define NVMETCP_WQE_WQE_TYPE_MASK 0x7 /* [use nvmetcp_wqe_type] */

> +#define NVMETCP_WQE_WQE_TYPE_SHIFT 0

> +#define NVMETCP_WQE_NUM_SGES_MASK 0xF

> +#define NVMETCP_WQE_NUM_SGES_SHIFT 3

> +#define NVMETCP_WQE_RESPONSE_MASK 0x1

> +#define NVMETCP_WQE_RESPONSE_SHIFT 7

> +	struct nvmetcp_dif_flags prot_flags;

> +	__le32 contlen_cdbsize;

> +#define NVMETCP_WQE_CONT_LEN_MASK 0xFFFFFF

> +#define NVMETCP_WQE_CONT_LEN_SHIFT 0

> +#define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_MASK 0xFF

> +#define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_SHIFT 24

> +};

> +

>   #endif /* __NVMETCP_COMMON__ */

> diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h

> index abc1f41862e3..96263e3cfa1e 100644

> --- a/include/linux/qed/qed_nvmetcp_if.h

> +++ b/include/linux/qed/qed_nvmetcp_if.h

> @@ -25,6 +25,50 @@ struct qed_nvmetcp_tid {

>   	u8 *blocks[MAX_TID_BLOCKS_NVMETCP];

>   };

>   

> +struct qed_nvmetcp_id_params {

> +	u8 mac[ETH_ALEN];

> +	u32 ip[4];

> +	u16 port;

> +};

> +

> +struct qed_nvmetcp_params_offload {

> +	/* FW initializations */

> +	dma_addr_t sq_pbl_addr;

> +	dma_addr_t nvmetcp_cccid_itid_table_addr;

> +	u16 nvmetcp_cccid_max_range;

> +	u8 default_cq;

> +

> +	/* Networking and TCP stack initializations */

> +	struct qed_nvmetcp_id_params src;

> +	struct qed_nvmetcp_id_params dst;

> +	u32 ka_timeout;

> +	u32 ka_interval;

> +	u32 max_rt_time;

> +	u32 cwnd;

> +	u16 mss;

> +	u16 vlan_id;

> +	bool timestamp_en;

> +	bool delayed_ack_en;

> +	bool tcp_keep_alive_en;

> +	bool ecn_en;

> +	u8 ip_version;

> +	u8 ka_max_probe_cnt;

> +	u8 ttl;

> +	u8 tos_or_tc;

> +	u8 rcv_wnd_scale;

> +};

> +

> +struct qed_nvmetcp_params_update {

> +	u32 max_io_size;

> +	u32 max_recv_pdu_length;

> +	u32 max_send_pdu_length;

> +

> +	/* Placeholder: pfv, cpda, hpda */

> +

> +	bool hdr_digest_en;

> +	bool data_digest_en;

> +};

> +

>   struct qed_nvmetcp_cb_ops {

>   	struct qed_common_cb_ops common;

>   };

> @@ -48,6 +92,38 @@ struct qed_nvmetcp_cb_ops {

>    * @stop:		nvmetcp in FW

>    *			@param cdev

>    *			return 0 on success, otherwise error value.

> + * @acquire_conn:	acquire a new nvmetcp connection

> + *			@param cdev

> + *			@param handle - qed will fill handle that should be

> + *				used henceforth as identifier of the

> + *				connection.

> + *			@param p_doorbell - qed will fill the address of the

> + *				doorbell.

> + *			@return 0 on sucesss, otherwise error value.

> + * @release_conn:	release a previously acquired nvmetcp connection

> + *			@param cdev

> + *			@param handle - the connection handle.

> + *			@return 0 on success, otherwise error value.

> + * @offload_conn:	configures an offloaded connection

> + *			@param cdev

> + *			@param handle - the connection handle.

> + *			@param conn_info - the configuration to use for the

> + *				offload.

> + *			@return 0 on success, otherwise error value.

> + * @update_conn:	updates an offloaded connection

> + *			@param cdev

> + *			@param handle - the connection handle.

> + *			@param conn_info - the configuration to use for the

> + *				offload.

> + *			@return 0 on success, otherwise error value.

> + * @destroy_conn:	stops an offloaded connection

> + *			@param cdev

> + *			@param handle - the connection handle.

> + *			@return 0 on success, otherwise error value.

> + * @clear_sq:		clear all task in sq

> + *			@param cdev

> + *			@param handle - the connection handle.

> + *			@return 0 on success, otherwise error value.

>    */

>   struct qed_nvmetcp_ops {

>   	const struct qed_common_ops *common;

> @@ -65,6 +141,24 @@ struct qed_nvmetcp_ops {

>   		     void *event_context, nvmetcp_event_cb_t async_event_cb);

>   

>   	int (*stop)(struct qed_dev *cdev);

> +

> +	int (*acquire_conn)(struct qed_dev *cdev,

> +			    u32 *handle,

> +			    u32 *fw_cid, void __iomem **p_doorbell);

> +

> +	int (*release_conn)(struct qed_dev *cdev, u32 handle);

> +

> +	int (*offload_conn)(struct qed_dev *cdev,

> +			    u32 handle,

> +			    struct qed_nvmetcp_params_offload *conn_info);

> +

> +	int (*update_conn)(struct qed_dev *cdev,

> +			   u32 handle,

> +			   struct qed_nvmetcp_params_update *conn_info);

> +

> +	int (*destroy_conn)(struct qed_dev *cdev, u32 handle, u8 abrt_conn);

> +

> +	int (*clear_sq)(struct qed_dev *cdev, u32 handle);

>   };

>   

>   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);

> 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 2, 2021, 11:11 a.m. UTC | #7
On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Omkar Kulkarni <okulkarni@marvell.com>

> 

> This patch adds qed NVMeTCP personality in order to support the NVMeTCP

> qed functionalities and manage the HW device shared resources.

> The same design is used with Eth (qede), RDMA(qedr), iSCSI (qedi) and

> FCoE (qedf).

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> ---

>   drivers/net/ethernet/qlogic/qed/qed.h         |  3 ++

>   drivers/net/ethernet/qlogic/qed/qed_cxt.c     | 32 ++++++++++++++

>   drivers/net/ethernet/qlogic/qed/qed_cxt.h     |  1 +

>   drivers/net/ethernet/qlogic/qed/qed_dev.c     | 44 ++++++++++++++++---

>   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |  3 +-

>   drivers/net/ethernet/qlogic/qed/qed_ll2.c     | 31 ++++++++-----

>   drivers/net/ethernet/qlogic/qed/qed_mcp.c     |  3 ++

>   drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |  3 +-

>   drivers/net/ethernet/qlogic/qed/qed_ooo.c     |  5 ++-

>   .../net/ethernet/qlogic/qed/qed_sp_commands.c |  1 +

>   10 files changed, 108 insertions(+), 18 deletions(-)

> 

> diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h

> index 91d4635009ab..7ae648c4edba 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed.h

> +++ b/drivers/net/ethernet/qlogic/qed/qed.h

> @@ -200,6 +200,7 @@ enum qed_pci_personality {

>   	QED_PCI_ETH,

>   	QED_PCI_FCOE,

>   	QED_PCI_ISCSI,

> +	QED_PCI_NVMETCP,

>   	QED_PCI_ETH_ROCE,

>   	QED_PCI_ETH_IWARP,

>   	QED_PCI_ETH_RDMA,

> @@ -285,6 +286,8 @@ struct qed_hw_info {

>   	((dev)->hw_info.personality == QED_PCI_FCOE)

>   #define QED_IS_ISCSI_PERSONALITY(dev)					\

>   	((dev)->hw_info.personality == QED_PCI_ISCSI)

> +#define QED_IS_NVMETCP_PERSONALITY(dev)					\

> +	((dev)->hw_info.personality == QED_PCI_NVMETCP)

>   

So you have a distinct PCI personality for NVMe-oF, but not for the 
protocol? Strange.
Why don't you have a distinct NVMe-oF protocol ID?

>   	/* Resource Allocation scheme results */

>   	u32				resc_start[QED_MAX_RESC];

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c

> index 0a22f8ce9a2c..6cef75723e38 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c

> +++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c

> @@ -2106,6 +2106,30 @@ int qed_cxt_set_pf_params(struct qed_hwfn *p_hwfn, u32 rdma_tasks)

>   		}

>   		break;

>   	}

> +	case QED_PCI_NVMETCP:

> +	{

> +		struct qed_nvmetcp_pf_params *p_params;

> +

> +		p_params = &p_hwfn->pf_params.nvmetcp_pf_params;

> +

> +		if (p_params->num_cons && p_params->num_tasks) {

> +			qed_cxt_set_proto_cid_count(p_hwfn,

> +						    PROTOCOLID_NVMETCP,

> +						    p_params->num_cons,

> +						    0);

> +

> +			qed_cxt_set_proto_tid_count(p_hwfn,

> +						    PROTOCOLID_NVMETCP,

> +						    QED_CTX_NVMETCP_TID_SEG,

> +						    0,

> +						    p_params->num_tasks,

> +						    true);

> +		} else {

> +			DP_INFO(p_hwfn->cdev,

> +				"NvmeTCP personality used without setting params!\n");

> +		}

> +		break;

> +	}

>   	default:

>   		return -EINVAL;

>   	}

> @@ -2132,6 +2156,10 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,

>   		proto = PROTOCOLID_ISCSI;

>   		seg = QED_CXT_ISCSI_TID_SEG;

>   		break;

> +	case QED_PCI_NVMETCP:

> +		proto = PROTOCOLID_NVMETCP;

> +		seg = QED_CTX_NVMETCP_TID_SEG;

> +		break;

>   	default:

>   		return -EINVAL;

>   	}

> @@ -2458,6 +2486,10 @@ int qed_cxt_get_task_ctx(struct qed_hwfn *p_hwfn,

>   		proto = PROTOCOLID_ISCSI;

>   		seg = QED_CXT_ISCSI_TID_SEG;

>   		break;

> +	case QED_PCI_NVMETCP:

> +		proto = PROTOCOLID_NVMETCP;

> +		seg = QED_CTX_NVMETCP_TID_SEG;

> +		break;

>   	default:

>   		return -EINVAL;

>   	}

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.h b/drivers/net/ethernet/qlogic/qed/qed_cxt.h

> index 056e79620a0e..8f1a77cb33f6 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_cxt.h

> +++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.h

> @@ -51,6 +51,7 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,

>   			     struct qed_tid_mem *p_info);

>   

>   #define QED_CXT_ISCSI_TID_SEG	PROTOCOLID_ISCSI

> +#define QED_CTX_NVMETCP_TID_SEG PROTOCOLID_NVMETCP

>   #define QED_CXT_ROCE_TID_SEG	PROTOCOLID_ROCE

>   #define QED_CXT_FCOE_TID_SEG	PROTOCOLID_FCOE

>   enum qed_cxt_elem_type {

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c

> index d2f5855b2ea7..d3f8cc42d07e 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c

> +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c

> @@ -37,6 +37,7 @@

>   #include "qed_sriov.h"

>   #include "qed_vf.h"

>   #include "qed_rdma.h"

> +#include "qed_nvmetcp.h"

>   

>   static DEFINE_SPINLOCK(qm_lock);

>   

> @@ -667,7 +668,8 @@ qed_llh_set_engine_affin(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)

>   	}

>   

>   	/* Storage PF is bound to a single engine while L2 PF uses both */

> -	if (QED_IS_FCOE_PERSONALITY(p_hwfn) || QED_IS_ISCSI_PERSONALITY(p_hwfn))

> +	if (QED_IS_FCOE_PERSONALITY(p_hwfn) || QED_IS_ISCSI_PERSONALITY(p_hwfn) ||

> +	    QED_IS_NVMETCP_PERSONALITY(p_hwfn))

>   		eng = cdev->fir_affin ? QED_ENG1 : QED_ENG0;

>   	else			/* L2_PERSONALITY */

>   		eng = QED_BOTH_ENG;

> @@ -1164,6 +1166,9 @@ void qed_llh_remove_mac_filter(struct qed_dev *cdev,

>   	if (!test_bit(QED_MF_LLH_MAC_CLSS, &cdev->mf_bits))

>   		goto out;

>   

> +	if (QED_IS_NVMETCP_PERSONALITY(p_hwfn))

> +		return;

> +

>   	ether_addr_copy(filter.mac.addr, mac_addr);

>   	rc = qed_llh_shadow_remove_filter(cdev, ppfid, &filter, &filter_idx,

>   					  &ref_cnt);

> @@ -1381,6 +1386,11 @@ void qed_resc_free(struct qed_dev *cdev)

>   			qed_ooo_free(p_hwfn);

>   		}

>   

> +		if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {

> +			qed_nvmetcp_free(p_hwfn);

> +			qed_ooo_free(p_hwfn);

> +		}

> +

>   		if (QED_IS_RDMA_PERSONALITY(p_hwfn) && rdma_info) {

>   			qed_spq_unregister_async_cb(p_hwfn, rdma_info->proto);

>   			qed_rdma_info_free(p_hwfn);

> @@ -1423,6 +1433,7 @@ static u32 qed_get_pq_flags(struct qed_hwfn *p_hwfn)

>   		flags |= PQ_FLAGS_OFLD;

>   		break;

>   	case QED_PCI_ISCSI:

> +	case QED_PCI_NVMETCP:

>   		flags |= PQ_FLAGS_ACK | PQ_FLAGS_OOO | PQ_FLAGS_OFLD;

>   		break;

>   	case QED_PCI_ETH_ROCE:

> @@ -2269,6 +2280,12 @@ int qed_resc_alloc(struct qed_dev *cdev)

>   							PROTOCOLID_ISCSI,

>   							NULL);

>   			n_eqes += 2 * num_cons;

> +		} else if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {

> +			num_cons =

> +			    qed_cxt_get_proto_cid_count(p_hwfn,

> +							PROTOCOLID_NVMETCP,

> +							NULL);

> +			n_eqes += 2 * num_cons;

>   		}

>   

>   		if (n_eqes > 0xFFFF) {

> @@ -2313,6 +2330,15 @@ int qed_resc_alloc(struct qed_dev *cdev)

>   				goto alloc_err;

>   		}

>   

> +		if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {

> +			rc = qed_nvmetcp_alloc(p_hwfn);

> +			if (rc)

> +				goto alloc_err;

> +			rc = qed_ooo_alloc(p_hwfn);

> +			if (rc)

> +				goto alloc_err;

> +		}

> +

>   		if (QED_IS_RDMA_PERSONALITY(p_hwfn)) {

>   			rc = qed_rdma_info_alloc(p_hwfn);

>   			if (rc)

> @@ -2393,6 +2419,11 @@ void qed_resc_setup(struct qed_dev *cdev)

>   			qed_iscsi_setup(p_hwfn);

>   			qed_ooo_setup(p_hwfn);

>   		}

> +

> +		if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {

> +			qed_nvmetcp_setup(p_hwfn);

> +			qed_ooo_setup(p_hwfn);

> +		}

>   	}

>   }

>   

> @@ -2854,7 +2885,8 @@ static int qed_hw_init_pf(struct qed_hwfn *p_hwfn,

>   

>   	/* Protocol Configuration */

>   	STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_TCP_RT_OFFSET,

> -		     (p_hwfn->hw_info.personality == QED_PCI_ISCSI) ? 1 : 0);

> +		     ((p_hwfn->hw_info.personality == QED_PCI_ISCSI) ||

> +			 (p_hwfn->hw_info.personality == QED_PCI_NVMETCP)) ? 1 : 0);

>   	STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_FCOE_RT_OFFSET,

>   		     (p_hwfn->hw_info.personality == QED_PCI_FCOE) ? 1 : 0);

>   	STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_ROCE_RT_OFFSET, 0);

> @@ -3531,7 +3563,7 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn)

>   					       RESC_NUM(p_hwfn,

>   							QED_CMDQS_CQS));

>   

> -	if (QED_IS_ISCSI_PERSONALITY(p_hwfn))

> +	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))

>   		feat_num[QED_ISCSI_CQ] = min_t(u32, sb_cnt.cnt,

>   					       RESC_NUM(p_hwfn,

>   							QED_CMDQS_CQS));

> @@ -3734,7 +3766,8 @@ int qed_hw_get_dflt_resc(struct qed_hwfn *p_hwfn,

>   		break;

>   	case QED_BDQ:

>   		if (p_hwfn->hw_info.personality != QED_PCI_ISCSI &&

> -		    p_hwfn->hw_info.personality != QED_PCI_FCOE)

> +		    p_hwfn->hw_info.personality != QED_PCI_FCOE &&

> +			p_hwfn->hw_info.personality != QED_PCI_NVMETCP)

>   			*p_resc_num = 0;

>   		else

>   			*p_resc_num = 1;

> @@ -3755,7 +3788,8 @@ int qed_hw_get_dflt_resc(struct qed_hwfn *p_hwfn,

>   			*p_resc_start = 0;

>   		else if (p_hwfn->cdev->num_ports_in_engine == 4)

>   			*p_resc_start = p_hwfn->port_id;

> -		else if (p_hwfn->hw_info.personality == QED_PCI_ISCSI)

> +		else if (p_hwfn->hw_info.personality == QED_PCI_ISCSI ||

> +			 p_hwfn->hw_info.personality == QED_PCI_NVMETCP)

>   			*p_resc_start = p_hwfn->port_id;

>   		else if (p_hwfn->hw_info.personality == QED_PCI_FCOE)

>   			*p_resc_start = p_hwfn->port_id + 2;

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> index 24472f6a83c2..9c9ec8f53ef8 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> @@ -12148,7 +12148,8 @@ struct public_func {

>   #define FUNC_MF_CFG_PROTOCOL_ISCSI              0x00000010

>   #define FUNC_MF_CFG_PROTOCOL_FCOE               0x00000020

>   #define FUNC_MF_CFG_PROTOCOL_ROCE               0x00000030

> -#define FUNC_MF_CFG_PROTOCOL_MAX	0x00000030

> +#define FUNC_MF_CFG_PROTOCOL_NVMETCP    0x00000040

> +#define FUNC_MF_CFG_PROTOCOL_MAX	0x00000040

>   

>   #define FUNC_MF_CFG_MIN_BW_MASK		0x0000ff00

>   #define FUNC_MF_CFG_MIN_BW_SHIFT	8

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c

> index 49783f365079..88bfcdcd4a4c 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c

> +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c

> @@ -960,7 +960,8 @@ static int qed_sp_ll2_rx_queue_start(struct qed_hwfn *p_hwfn,

>   

>   	if (test_bit(QED_MF_LL2_NON_UNICAST, &p_hwfn->cdev->mf_bits) &&

>   	    p_ramrod->main_func_queue && conn_type != QED_LL2_TYPE_ROCE &&

> -	    conn_type != QED_LL2_TYPE_IWARP) {

> +	    conn_type != QED_LL2_TYPE_IWARP &&

> +		(!QED_IS_NVMETCP_PERSONALITY(p_hwfn))) {

>   		p_ramrod->mf_si_bcast_accept_all = 1;

>   		p_ramrod->mf_si_mcast_accept_all = 1;

>   	} else {

> @@ -1049,6 +1050,8 @@ static int qed_sp_ll2_tx_queue_start(struct qed_hwfn *p_hwfn,

>   	case QED_LL2_TYPE_OOO:

>   		if (p_hwfn->hw_info.personality == QED_PCI_ISCSI)

>   			p_ramrod->conn_type = PROTOCOLID_ISCSI;

> +		else if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP)

> +			p_ramrod->conn_type = PROTOCOLID_NVMETCP;

>   		else

>   			p_ramrod->conn_type = PROTOCOLID_IWARP;

>   		break;

> @@ -1634,7 +1637,8 @@ int qed_ll2_establish_connection(void *cxt, u8 connection_handle)

>   	if (rc)

>   		goto out;

>   

> -	if (!QED_IS_RDMA_PERSONALITY(p_hwfn))

> +	if (!QED_IS_RDMA_PERSONALITY(p_hwfn) &&

> +	    !QED_IS_NVMETCP_PERSONALITY(p_hwfn))

>   		qed_wr(p_hwfn, p_ptt, PRS_REG_USE_LIGHT_L2, 1);

>   

>   	qed_ll2_establish_connection_ooo(p_hwfn, p_ll2_conn);

> @@ -2376,7 +2380,8 @@ static int qed_ll2_start_ooo(struct qed_hwfn *p_hwfn,

>   static bool qed_ll2_is_storage_eng1(struct qed_dev *cdev)

>   {

>   	return (QED_IS_FCOE_PERSONALITY(QED_LEADING_HWFN(cdev)) ||

> -		QED_IS_ISCSI_PERSONALITY(QED_LEADING_HWFN(cdev))) &&

> +		QED_IS_ISCSI_PERSONALITY(QED_LEADING_HWFN(cdev)) ||

> +		QED_IS_NVMETCP_PERSONALITY(QED_LEADING_HWFN(cdev))) &&

>   		(QED_AFFIN_HWFN(cdev) != QED_LEADING_HWFN(cdev));

>   }

>   

> @@ -2402,11 +2407,13 @@ static int qed_ll2_stop(struct qed_dev *cdev)

>   

>   	if (cdev->ll2->handle == QED_LL2_UNUSED_HANDLE)

>   		return 0;

> +	if (!QED_IS_NVMETCP_PERSONALITY(p_hwfn))

> +		qed_llh_remove_mac_filter(cdev, 0, cdev->ll2_mac_address);

>   

>   	qed_llh_remove_mac_filter(cdev, 0, cdev->ll2_mac_address);

>   	eth_zero_addr(cdev->ll2_mac_address);

>   

> -	if (QED_IS_ISCSI_PERSONALITY(p_hwfn))

> +	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))

>   		qed_ll2_stop_ooo(p_hwfn);

>   

>   	/* In CMT mode, LL2 is always started on engine 0 for a storage PF */

> @@ -2442,6 +2449,7 @@ static int __qed_ll2_start(struct qed_hwfn *p_hwfn,

>   		conn_type = QED_LL2_TYPE_FCOE;

>   		break;

>   	case QED_PCI_ISCSI:

> +	case QED_PCI_NVMETCP:

>   		conn_type = QED_LL2_TYPE_ISCSI;

>   		break;

>   	case QED_PCI_ETH_ROCE:

> @@ -2567,7 +2575,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)

>   		}

>   	}

>   

> -	if (QED_IS_ISCSI_PERSONALITY(p_hwfn)) {

> +	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {

>   		DP_VERBOSE(cdev, QED_MSG_STORAGE, "Starting OOO LL2 queue\n");

>   		rc = qed_ll2_start_ooo(p_hwfn, params);

>   		if (rc) {

> @@ -2576,10 +2584,13 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)

>   		}

>   	}

>   

> -	rc = qed_llh_add_mac_filter(cdev, 0, params->ll2_mac_address);

> -	if (rc) {

> -		DP_NOTICE(cdev, "Failed to add an LLH filter\n");

> -		goto err3;

> +	if (!QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {

> +		rc = qed_llh_add_mac_filter(cdev, 0, params->ll2_mac_address);

> +		if (rc) {

> +			DP_NOTICE(cdev, "Failed to add an LLH filter\n");

> +			goto err3;

> +		}

> +

>   	}

>   

>   	ether_addr_copy(cdev->ll2_mac_address, params->ll2_mac_address);

> @@ -2587,7 +2598,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)

>   	return 0;

>   

>   err3:

> -	if (QED_IS_ISCSI_PERSONALITY(p_hwfn))

> +	if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))

>   		qed_ll2_stop_ooo(p_hwfn);

>   err2:

>   	if (b_is_storage_eng1)

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c

> index cd882c453394..4387292c37e2 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c

> +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c

> @@ -2446,6 +2446,9 @@ qed_mcp_get_shmem_proto(struct qed_hwfn *p_hwfn,

>   	case FUNC_MF_CFG_PROTOCOL_ISCSI:

>   		*p_proto = QED_PCI_ISCSI;

>   		break;

> +	case FUNC_MF_CFG_PROTOCOL_NVMETCP:

> +		*p_proto = QED_PCI_NVMETCP;

> +		break;

>   	case FUNC_MF_CFG_PROTOCOL_FCOE:

>   		*p_proto = QED_PCI_FCOE;

>   		break;

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c

> index 3e3192a3ad9b..6190adf965bc 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c

> +++ b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c

> @@ -1306,7 +1306,8 @@ int qed_mfw_process_tlv_req(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)

>   	}

>   

>   	if ((tlv_group & QED_MFW_TLV_ISCSI) &&

> -	    p_hwfn->hw_info.personality != QED_PCI_ISCSI) {

> +	    p_hwfn->hw_info.personality != QED_PCI_ISCSI &&

> +		p_hwfn->hw_info.personality != QED_PCI_NVMETCP) {

>   		DP_VERBOSE(p_hwfn, QED_MSG_SP,

>   			   "Skipping iSCSI TLVs for non-iSCSI function\n");

>   		tlv_group &= ~QED_MFW_TLV_ISCSI;

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_ooo.c b/drivers/net/ethernet/qlogic/qed/qed_ooo.c

> index 88353aa404dc..d37bb2463f98 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_ooo.c

> +++ b/drivers/net/ethernet/qlogic/qed/qed_ooo.c

> @@ -16,7 +16,7 @@

>   #include "qed_ll2.h"

>   #include "qed_ooo.h"

>   #include "qed_cxt.h"

> -

> +#include "qed_nvmetcp.h"

>   static struct qed_ooo_archipelago

>   *qed_ooo_seek_archipelago(struct qed_hwfn *p_hwfn,

>   			  struct qed_ooo_info

> @@ -85,6 +85,9 @@ int qed_ooo_alloc(struct qed_hwfn *p_hwfn)

>   	case QED_PCI_ISCSI:

>   		proto = PROTOCOLID_ISCSI;

>   		break;

> +	case QED_PCI_NVMETCP:

> +		proto = PROTOCOLID_NVMETCP;

> +		break;

>   	case QED_PCI_ETH_RDMA:

>   	case QED_PCI_ETH_IWARP:

>   		proto = PROTOCOLID_IWARP;

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c

> index aa71adcf31ee..60b3876387a9 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c

> +++ b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c

> @@ -385,6 +385,7 @@ int qed_sp_pf_start(struct qed_hwfn *p_hwfn,

>   		p_ramrod->personality = PERSONALITY_FCOE;

>   		break;

>   	case QED_PCI_ISCSI:

> +	case QED_PCI_NVMETCP:

>   		p_ramrod->personality = PERSONALITY_ISCSI;

>   		break;

>   	case QED_PCI_ETH_ROCE:

> 

As indicated, I do find this mix of 'nvmetcp is nearly iscsi' a bit 
strange. I would have preferred to have distinct types for nvmetcp.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 2, 2021, 11:24 a.m. UTC | #8
On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch introduces the NVMeTCP FW initializations which is used

> to initialize the IO level configuration into a per IO HW

> resource ("task") as part of the IO path flow.

> 

> This includes:

> - Write IO FW initialization

> - Read IO FW initialization.

> - IC-Req and IC-Resp FW exchange.

> - FW Cleanup flow (Flush IO).

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> ---

>   drivers/net/ethernet/qlogic/qed/Makefile      |   5 +-

>   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |   7 +-

>   .../qlogic/qed/qed_nvmetcp_fw_funcs.c         | 372 ++++++++++++++++++

>   .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |  43 ++

>   include/linux/qed/nvmetcp_common.h            |   3 +

>   include/linux/qed/qed_nvmetcp_if.h            |  17 +

>   6 files changed, 445 insertions(+), 2 deletions(-)

>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c

>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h

> 

> diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile

> index 7cb0db67ba5b..0d9c2fe0245d 100644

> --- a/drivers/net/ethernet/qlogic/qed/Makefile

> +++ b/drivers/net/ethernet/qlogic/qed/Makefile

> @@ -28,7 +28,10 @@ qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o

>   qed-$(CONFIG_QED_LL2) += qed_ll2.o

>   qed-$(CONFIG_QED_OOO) += qed_ooo.o

>   

> -qed-$(CONFIG_QED_NVMETCP) += qed_nvmetcp.o

> +qed-$(CONFIG_QED_NVMETCP) +=	\

> +	qed_nvmetcp.o		\

> +	qed_nvmetcp_fw_funcs.o	\

> +	qed_nvmetcp_ip_services.o

>   

>   qed-$(CONFIG_QED_RDMA) +=	\

>   	qed_iwarp.o		\

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> index 1e2eb6dcbd6e..434363f8b5c0 100644

> --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> @@ -27,6 +27,7 @@

>   #include "qed_mcp.h"

>   #include "qed_sp.h"

>   #include "qed_reg_addr.h"

> +#include "qed_nvmetcp_fw_funcs.h"

>   

>   static int qed_nvmetcp_async_event(struct qed_hwfn *p_hwfn, u8 fw_event_code,

>   				   u16 echo, union event_ring_data *data,

> @@ -848,7 +849,11 @@ static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {

>   	.remove_src_tcp_port_filter = &qed_llh_remove_src_tcp_port_filter,

>   	.add_dst_tcp_port_filter = &qed_llh_add_dst_tcp_port_filter,

>   	.remove_dst_tcp_port_filter = &qed_llh_remove_dst_tcp_port_filter,

> -	.clear_all_filters = &qed_llh_clear_all_filters

> +	.clear_all_filters = &qed_llh_clear_all_filters,

> +	.init_read_io = &init_nvmetcp_host_read_task,

> +	.init_write_io = &init_nvmetcp_host_write_task,

> +	.init_icreq_exchange = &init_nvmetcp_init_conn_req_task,

> +	.init_task_cleanup = &init_cleanup_task_nvmetcp

>   };

>   

>   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c

> new file mode 100644

> index 000000000000..8485ad678284

> --- /dev/null

> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c

> @@ -0,0 +1,372 @@

> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)

> +/* Copyright 2021 Marvell. All rights reserved. */

> +

> +#include <linux/kernel.h>

> +#include <linux/module.h>

> +#include <linux/pci.h>

> +#include <linux/kernel.h>

> +#include <linux/list.h>

> +#include <linux/mm.h>

> +#include <linux/types.h>

> +#include <asm/byteorder.h>

> +#include <linux/qed/common_hsi.h>

> +#include <linux/qed/storage_common.h>

> +#include <linux/qed/nvmetcp_common.h>

> +#include <linux/qed/qed_nvmetcp_if.h>

> +#include "qed_nvmetcp_fw_funcs.h"

> +

> +#define NVMETCP_NUM_SGES_IN_CACHE 0x4

> +

> +bool nvmetcp_is_slow_sgl(u16 num_sges, bool small_mid_sge)

> +{

> +	return (num_sges > SCSI_NUM_SGES_SLOW_SGL_THR && small_mid_sge);

> +}

> +

> +void init_scsi_sgl_context(struct scsi_sgl_params *ctx_sgl_params,

> +			   struct scsi_cached_sges *ctx_data_desc,

> +			   struct storage_sgl_task_params *sgl_params)

> +{

> +	u8 num_sges_to_init = (u8)(sgl_params->num_sges > NVMETCP_NUM_SGES_IN_CACHE ?

> +				   NVMETCP_NUM_SGES_IN_CACHE : sgl_params->num_sges);

> +	u8 sge_index;

> +

> +	/* sgl params */

> +	ctx_sgl_params->sgl_addr.lo = cpu_to_le32(sgl_params->sgl_phys_addr.lo);

> +	ctx_sgl_params->sgl_addr.hi = cpu_to_le32(sgl_params->sgl_phys_addr.hi);

> +	ctx_sgl_params->sgl_total_length = cpu_to_le32(sgl_params->total_buffer_size);

> +	ctx_sgl_params->sgl_num_sges = cpu_to_le16(sgl_params->num_sges);

> +

> +	for (sge_index = 0; sge_index < num_sges_to_init; sge_index++) {

> +		ctx_data_desc->sge[sge_index].sge_addr.lo =

> +			cpu_to_le32(sgl_params->sgl[sge_index].sge_addr.lo);

> +		ctx_data_desc->sge[sge_index].sge_addr.hi =

> +			cpu_to_le32(sgl_params->sgl[sge_index].sge_addr.hi);

> +		ctx_data_desc->sge[sge_index].sge_len =

> +			cpu_to_le32(sgl_params->sgl[sge_index].sge_len);

> +	}

> +}

> +

> +static inline u32 calc_rw_task_size(struct nvmetcp_task_params *task_params,

> +				    enum nvmetcp_task_type task_type)

> +{

> +	u32 io_size;

> +

> +	if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE)

> +		io_size = task_params->tx_io_size;

> +	else

> +		io_size = task_params->rx_io_size;

> +

> +	if (unlikely(!io_size))

> +		return 0;

> +

> +	return io_size;

> +}

> +

> +static inline void init_sqe(struct nvmetcp_task_params *task_params,

> +			    struct storage_sgl_task_params *sgl_task_params,

> +			    enum nvmetcp_task_type task_type)

> +{

> +	if (!task_params->sqe)

> +		return;

> +

> +	memset(task_params->sqe, 0, sizeof(*task_params->sqe));

> +	task_params->sqe->task_id = cpu_to_le16(task_params->itid);

> +

> +	switch (task_type) {

> +	case NVMETCP_TASK_TYPE_HOST_WRITE: {

> +		u32 buf_size = 0;

> +		u32 num_sges = 0;

> +

> +		SET_FIELD(task_params->sqe->contlen_cdbsize,

> +			  NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD, 1);

> +		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,

> +			  NVMETCP_WQE_TYPE_NORMAL);

> +		if (task_params->tx_io_size) {

> +			if (task_params->send_write_incapsule)

> +				buf_size = calc_rw_task_size(task_params, task_type);

> +

> +			if (nvmetcp_is_slow_sgl(sgl_task_params->num_sges,

> +						sgl_task_params->small_mid_sge))

> +				num_sges = NVMETCP_WQE_NUM_SGES_SLOWIO;

> +			else

> +				num_sges = min((u16)sgl_task_params->num_sges,

> +					       (u16)SCSI_NUM_SGES_SLOW_SGL_THR);

> +		}

> +		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_NUM_SGES, num_sges);

> +		SET_FIELD(task_params->sqe->contlen_cdbsize, NVMETCP_WQE_CONT_LEN, buf_size);

> +	} break;

> +

> +	case NVMETCP_TASK_TYPE_HOST_READ: {

> +		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,

> +			  NVMETCP_WQE_TYPE_NORMAL);

> +		SET_FIELD(task_params->sqe->contlen_cdbsize,

> +			  NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD, 1);

> +	} break;

> +

> +	case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST: {

> +		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,

> +			  NVMETCP_WQE_TYPE_MIDDLE_PATH);

> +

> +		if (task_params->tx_io_size) {

> +			SET_FIELD(task_params->sqe->contlen_cdbsize, NVMETCP_WQE_CONT_LEN,

> +				  task_params->tx_io_size);

> +			SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_NUM_SGES,

> +				  min((u16)sgl_task_params->num_sges,

> +				      (u16)SCSI_NUM_SGES_SLOW_SGL_THR));

> +		}

> +	} break;

> +

> +	case NVMETCP_TASK_TYPE_CLEANUP:

> +		SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,

> +			  NVMETCP_WQE_TYPE_TASK_CLEANUP);

> +

> +	default:

> +		break;

> +	}

> +}

> +

> +/* The following function initializes of NVMeTCP task params */

> +static inline void

> +init_nvmetcp_task_params(struct e5_nvmetcp_task_context *context,

> +			 struct nvmetcp_task_params *task_params,

> +			 enum nvmetcp_task_type task_type)

> +{

> +	context->ystorm_st_context.state.cccid = task_params->host_cccid;

> +	SET_FIELD(context->ustorm_st_context.error_flags, USTORM_NVMETCP_TASK_ST_CTX_NVME_TCP, 1);

> +	context->ustorm_st_context.nvme_tcp_opaque_lo = cpu_to_le32(task_params->opq.lo);

> +	context->ustorm_st_context.nvme_tcp_opaque_hi = cpu_to_le32(task_params->opq.hi);

> +}

> +

> +/* The following function initializes default values to all tasks */

> +static inline void

> +init_default_nvmetcp_task(struct nvmetcp_task_params *task_params, void *pdu_header,

> +			  enum nvmetcp_task_type task_type)

> +{

> +	struct e5_nvmetcp_task_context *context = task_params->context;

> +	const u8 val_byte = context->mstorm_ag_context.cdu_validation;

> +	u8 dw_index;

> +

> +	memset(context, 0, sizeof(*context));

> +

> +	init_nvmetcp_task_params(context, task_params,

> +				 (enum nvmetcp_task_type)task_type);

> +

> +	if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE ||

> +	    task_type == NVMETCP_TASK_TYPE_HOST_READ) {

> +		for (dw_index = 0; dw_index < QED_NVMETCP_CMD_HDR_SIZE / 4; dw_index++)

> +			context->ystorm_st_context.pdu_hdr.task_hdr.reg[dw_index] =

> +				cpu_to_le32(((u32 *)pdu_header)[dw_index]);

> +	} else {

> +		for (dw_index = 0; dw_index < QED_NVMETCP_CMN_HDR_SIZE / 4; dw_index++)

> +			context->ystorm_st_context.pdu_hdr.task_hdr.reg[dw_index] =

> +				cpu_to_le32(((u32 *)pdu_header)[dw_index]);

> +	}

> +


And this is what I meant. You are twiddling with the bytes already, so 
why bother with a separate struct at all?

> +	/* M-Storm Context: */

> +	context->mstorm_ag_context.cdu_validation = val_byte;

> +	context->mstorm_st_context.task_type = (u8)(task_type);

> +	context->mstorm_ag_context.task_cid = cpu_to_le16(task_params->conn_icid);

> +

> +	/* Ustorm Context: */

> +	SET_FIELD(context->ustorm_ag_context.flags1, E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV, 1);

> +	context->ustorm_st_context.task_type = (u8)(task_type);

> +	context->ustorm_st_context.cq_rss_number = task_params->cq_rss_number;

> +	context->ustorm_ag_context.icid = cpu_to_le16(task_params->conn_icid);

> +}

> +

> +/* The following function initializes the U-Storm Task Contexts */

> +static inline void

> +init_ustorm_task_contexts(struct ustorm_nvmetcp_task_st_ctx *ustorm_st_context,

> +			  struct e5_ustorm_nvmetcp_task_ag_ctx *ustorm_ag_context,

> +			  u32 remaining_recv_len,

> +			  u32 expected_data_transfer_len, u8 num_sges,

> +			  bool tx_dif_conn_err_en)

> +{

> +	/* Remaining data to be received in bytes. Used in validations*/

> +	ustorm_st_context->rem_rcv_len = cpu_to_le32(remaining_recv_len);

> +	ustorm_ag_context->exp_data_acked = cpu_to_le32(expected_data_transfer_len);

> +	ustorm_st_context->exp_data_transfer_len = cpu_to_le32(expected_data_transfer_len);

> +	SET_FIELD(ustorm_st_context->reg1.reg1_map, NVMETCP_REG1_NUM_SGES, num_sges);

> +	SET_FIELD(ustorm_ag_context->flags2, E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_EN,

> +		  tx_dif_conn_err_en ? 1 : 0);

> +}

> +

> +/* The following function initializes Local Completion Contexts: */

> +static inline void

> +set_local_completion_context(struct e5_nvmetcp_task_context *context)

> +{

> +	SET_FIELD(context->ystorm_st_context.state.flags,

> +		  YSTORM_NVMETCP_TASK_STATE_LOCAL_COMP, 1);

> +	SET_FIELD(context->ustorm_st_context.flags,

> +		  USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP, 1);

> +}

> +

> +/* Common Fastpath task init function: */

> +static inline void

> +init_rw_nvmetcp_task(struct nvmetcp_task_params *task_params,

> +		     enum nvmetcp_task_type task_type,

> +		     struct nvmetcp_conn_params *conn_params, void *pdu_header,

> +		     struct storage_sgl_task_params *sgl_task_params)

> +{

> +	struct e5_nvmetcp_task_context *context = task_params->context;

> +	u32 task_size = calc_rw_task_size(task_params, task_type);

> +	u32 exp_data_transfer_len = conn_params->max_burst_length;

> +	bool slow_io = false;

> +	u8 num_sges = 0;

> +

> +	init_default_nvmetcp_task(task_params, pdu_header, task_type);

> +

> +	/* Tx/Rx: */

> +	if (task_params->tx_io_size) {

> +		/* if data to transmit: */

> +		init_scsi_sgl_context(&context->ystorm_st_context.state.sgl_params,

> +				      &context->ystorm_st_context.state.data_desc,

> +				      sgl_task_params);

> +		slow_io = nvmetcp_is_slow_sgl(sgl_task_params->num_sges,

> +					      sgl_task_params->small_mid_sge);

> +		num_sges =

> +			(u8)(!slow_io ? min((u32)sgl_task_params->num_sges,

> +					    (u32)SCSI_NUM_SGES_SLOW_SGL_THR) :

> +					    NVMETCP_WQE_NUM_SGES_SLOWIO);

> +		if (slow_io) {

> +			SET_FIELD(context->ystorm_st_context.state.flags,

> +				  YSTORM_NVMETCP_TASK_STATE_SLOW_IO, 1);

> +		}

> +	} else if (task_params->rx_io_size) {

> +		/* if data to receive: */

> +		init_scsi_sgl_context(&context->mstorm_st_context.sgl_params,

> +				      &context->mstorm_st_context.data_desc,

> +				      sgl_task_params);

> +		num_sges =

> +			(u8)(!nvmetcp_is_slow_sgl(sgl_task_params->num_sges,

> +						  sgl_task_params->small_mid_sge) ?

> +						  min((u32)sgl_task_params->num_sges,

> +						      (u32)SCSI_NUM_SGES_SLOW_SGL_THR) :

> +						      NVMETCP_WQE_NUM_SGES_SLOWIO);

> +		context->mstorm_st_context.rem_task_size = cpu_to_le32(task_size);

> +	}

> +

> +	/* Ustorm context: */

> +	if (exp_data_transfer_len > task_size)

> +		/* The size of the transmitted task*/

> +		exp_data_transfer_len = task_size;

> +	init_ustorm_task_contexts(&context->ustorm_st_context,

> +				  &context->ustorm_ag_context,

> +				  /* Remaining Receive length is the Task Size */

> +				  task_size,

> +				  /* The size of the transmitted task */

> +				  exp_data_transfer_len,

> +				  /* num_sges */

> +				  num_sges,

> +				  false);

> +

> +	/* Set exp_data_acked */

> +	if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE) {

> +		if (task_params->send_write_incapsule)

> +			context->ustorm_ag_context.exp_data_acked = task_size;

> +		else

> +			context->ustorm_ag_context.exp_data_acked = 0;

> +	} else if (task_type == NVMETCP_TASK_TYPE_HOST_READ) {

> +		context->ustorm_ag_context.exp_data_acked = 0;

> +	}

> +

> +	context->ustorm_ag_context.exp_cont_len = 0;

> +

> +	init_sqe(task_params, sgl_task_params, task_type);

> +}

> +

> +static void

> +init_common_initiator_read_task(struct nvmetcp_task_params *task_params,

> +				struct nvmetcp_conn_params *conn_params,

> +				struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> +				struct storage_sgl_task_params *sgl_task_params)

> +{

> +	init_rw_nvmetcp_task(task_params, NVMETCP_TASK_TYPE_HOST_READ,

> +			     conn_params, cmd_pdu_header, sgl_task_params);

> +}

> +

> +void init_nvmetcp_host_read_task(struct nvmetcp_task_params *task_params,

> +				 struct nvmetcp_conn_params *conn_params,

> +				 struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> +				 struct storage_sgl_task_params *sgl_task_params)

> +{

> +	init_common_initiator_read_task(task_params, conn_params,

> +					(void *)cmd_pdu_header, sgl_task_params);

> +}

> +

> +static void

> +init_common_initiator_write_task(struct nvmetcp_task_params *task_params,

> +				 struct nvmetcp_conn_params *conn_params,

> +				 struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> +				 struct storage_sgl_task_params *sgl_task_params)

> +{

> +	init_rw_nvmetcp_task(task_params, NVMETCP_TASK_TYPE_HOST_WRITE,

> +			     conn_params, cmd_pdu_header, sgl_task_params);

> +}

> +

> +void init_nvmetcp_host_write_task(struct nvmetcp_task_params *task_params,

> +				  struct nvmetcp_conn_params *conn_params,

> +				  struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> +				  struct storage_sgl_task_params *sgl_task_params)

> +{

> +	init_common_initiator_write_task(task_params, conn_params,

> +					 (void *)cmd_pdu_header,

> +					 sgl_task_params);

> +}

> +

> +static void

> +init_common_login_request_task(struct nvmetcp_task_params *task_params,

> +			       void *login_req_pdu_header,

> +			       struct storage_sgl_task_params *tx_sgl_task_params,

> +			       struct storage_sgl_task_params *rx_sgl_task_params)

> +{

> +	struct e5_nvmetcp_task_context *context = task_params->context;

> +

> +	init_default_nvmetcp_task(task_params, (void *)login_req_pdu_header,

> +				  NVMETCP_TASK_TYPE_INIT_CONN_REQUEST);

> +

> +	/* Ustorm Context: */

> +	init_ustorm_task_contexts(&context->ustorm_st_context,

> +				  &context->ustorm_ag_context,

> +

> +				  /* Remaining Receive length is the Task Size */

> +				  task_params->rx_io_size ?

> +				  rx_sgl_task_params->total_buffer_size : 0,

> +

> +				  /* The size of the transmitted task */

> +				  task_params->tx_io_size ?

> +				  tx_sgl_task_params->total_buffer_size : 0,

> +				  0, /* num_sges */

> +				  0); /* tx_dif_conn_err_en */

> +

> +	/* SGL context: */

> +	if (task_params->tx_io_size)

> +		init_scsi_sgl_context(&context->ystorm_st_context.state.sgl_params,

> +				      &context->ystorm_st_context.state.data_desc,

> +				      tx_sgl_task_params);

> +	if (task_params->rx_io_size)

> +		init_scsi_sgl_context(&context->mstorm_st_context.sgl_params,

> +				      &context->mstorm_st_context.data_desc,

> +				      rx_sgl_task_params);

> +

> +	context->mstorm_st_context.rem_task_size =

> +		cpu_to_le32(task_params->rx_io_size ?

> +				 rx_sgl_task_params->total_buffer_size : 0);

> +

> +	init_sqe(task_params, tx_sgl_task_params, NVMETCP_TASK_TYPE_INIT_CONN_REQUEST);

> +}

> +

> +/* The following function initializes Login task in Host mode: */

> +void init_nvmetcp_init_conn_req_task(struct nvmetcp_task_params *task_params,

> +				     struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,

> +				     struct storage_sgl_task_params *tx_sgl_task_params,

> +				     struct storage_sgl_task_params *rx_sgl_task_params)

> +{

> +	init_common_login_request_task(task_params, init_conn_req_pdu_hdr,

> +				       tx_sgl_task_params, rx_sgl_task_params);

> +}

> +

> +void init_cleanup_task_nvmetcp(struct nvmetcp_task_params *task_params)

> +{

> +	init_sqe(task_params, NULL, NVMETCP_TASK_TYPE_CLEANUP);

> +}

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h

> new file mode 100644

> index 000000000000..3a8c74356c4c

> --- /dev/null

> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h

> @@ -0,0 +1,43 @@

> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */

> +/* Copyright 2021 Marvell. All rights reserved. */

> +

> +#ifndef _QED_NVMETCP_FW_FUNCS_H

> +#define _QED_NVMETCP_FW_FUNCS_H

> +

> +#include <linux/kernel.h>

> +#include <linux/module.h>

> +#include <linux/pci.h>

> +#include <linux/kernel.h>

> +#include <linux/list.h>

> +#include <linux/mm.h>

> +#include <linux/types.h>

> +#include <asm/byteorder.h>

> +#include <linux/qed/common_hsi.h>

> +#include <linux/qed/storage_common.h>

> +#include <linux/qed/nvmetcp_common.h>

> +#include <linux/qed/qed_nvmetcp_if.h>

> +

> +#if IS_ENABLED(CONFIG_QED_NVMETCP)

> +

> +void init_nvmetcp_host_read_task(struct nvmetcp_task_params *task_params,

> +				 struct nvmetcp_conn_params *conn_params,

> +				 struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> +				 struct storage_sgl_task_params *sgl_task_params);

> +

> +void init_nvmetcp_host_write_task(struct nvmetcp_task_params *task_params,

> +				  struct nvmetcp_conn_params *conn_params,

> +				  struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> +				  struct storage_sgl_task_params *sgl_task_params);

> +

> +void init_nvmetcp_init_conn_req_task(struct nvmetcp_task_params *task_params,

> +				     struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,

> +				     struct storage_sgl_task_params *tx_sgl_task_params,

> +				     struct storage_sgl_task_params *rx_sgl_task_params);

> +

> +void init_cleanup_task_nvmetcp(struct nvmetcp_task_params *task_params);

> +

> +#else /* IS_ENABLED(CONFIG_QED_NVMETCP) */

> +

> +#endif /* IS_ENABLED(CONFIG_QED_NVMETCP) */

> +

> +#endif /* _QED_NVMETCP_FW_FUNCS_H */

> diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h

> index dda7a785c321..c0023bb185dd 100644

> --- a/include/linux/qed/nvmetcp_common.h

> +++ b/include/linux/qed/nvmetcp_common.h

> @@ -9,6 +9,9 @@

>   #define NVMETCP_SLOW_PATH_LAYER_CODE (6)

>   #define NVMETCP_WQE_NUM_SGES_SLOWIO (0xf)

>   

> +#define QED_NVMETCP_CMD_HDR_SIZE 72

> +#define QED_NVMETCP_CMN_HDR_SIZE 24

> +

>   /* NVMeTCP firmware function init parameters */

>   struct nvmetcp_spe_func_init {

>   	__le16 half_way_close_timeout;

> diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h

> index 04e90dc42c12..d971be84f804 100644

> --- a/include/linux/qed/qed_nvmetcp_if.h

> +++ b/include/linux/qed/qed_nvmetcp_if.h

> @@ -220,6 +220,23 @@ struct qed_nvmetcp_ops {

>   	void (*remove_dst_tcp_port_filter)(struct qed_dev *cdev, u16 dest_port);

>   

>   	void (*clear_all_filters)(struct qed_dev *cdev);

> +

> +	void (*init_read_io)(struct nvmetcp_task_params *task_params,

> +			     struct nvmetcp_conn_params *conn_params,

> +			     struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> +			     struct storage_sgl_task_params *sgl_task_params);

> +

> +	void (*init_write_io)(struct nvmetcp_task_params *task_params,

> +			      struct nvmetcp_conn_params *conn_params,

> +			      struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> +			      struct storage_sgl_task_params *sgl_task_params);

> +

> +	void (*init_icreq_exchange)(struct nvmetcp_task_params *task_params,

> +				    struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,

> +				    struct storage_sgl_task_params *tx_sgl_task_params,

> +				    struct storage_sgl_task_params *rx_sgl_task_params);

> +

> +	void (*init_task_cleanup)(struct nvmetcp_task_params *task_params);

>   };

>   

>   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);

> 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 2, 2021, 11:26 a.m. UTC | #9
On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Nikolay Assa <nassa@marvell.com>

> 

> This patch introduces APIs which the NVMeTCP Offload device (qedn)

> will use through the paired net-device (qede).

> It includes APIs for:

> - ipv4/ipv6 routing

> - get VLAN from net-device

> - TCP ports reservation

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Nikolay Assa <nassa@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> ---

>   .../qlogic/qed/qed_nvmetcp_ip_services.c      | 239 ++++++++++++++++++

>   .../linux/qed/qed_nvmetcp_ip_services_if.h    |  29 +++

>   2 files changed, 268 insertions(+)

>   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c

>   create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h

> 

> diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c

> new file mode 100644

> index 000000000000..2904b1a0830a

> --- /dev/null

> +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c

> @@ -0,0 +1,239 @@

> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)

> +/*

> + * Copyright 2021 Marvell. All rights reserved.

> + */

> +

> +#include <linux/types.h>

> +#include <asm/byteorder.h>

> +#include <asm/param.h>

> +#include <linux/delay.h>

> +#include <linux/pci.h>

> +#include <linux/dma-mapping.h>

> +#include <linux/etherdevice.h>

> +#include <linux/kernel.h>

> +#include <linux/stddef.h>

> +#include <linux/errno.h>

> +

> +#include <net/tcp.h>

> +

> +#include <linux/qed/qed_nvmetcp_ip_services_if.h>

> +

> +#define QED_IP_RESOL_TIMEOUT  4

> +

> +int qed_route_ipv4(struct sockaddr_storage *local_addr,

> +		   struct sockaddr_storage *remote_addr,

> +		   struct sockaddr *hardware_address,

> +		   struct net_device **ndev)

> +{

> +	struct neighbour *neigh = NULL;

> +	__be32 *loc_ip, *rem_ip;

> +	struct rtable *rt;

> +	int rc = -ENXIO;

> +	int retry;

> +

> +	loc_ip = &((struct sockaddr_in *)local_addr)->sin_addr.s_addr;

> +	rem_ip = &((struct sockaddr_in *)remote_addr)->sin_addr.s_addr;

> +	*ndev = NULL;

> +	rt = ip_route_output(&init_net, *rem_ip, *loc_ip, 0/*tos*/, 0/*oif*/);

> +	if (IS_ERR(rt)) {

> +		pr_err("lookup route failed\n");

> +		rc = PTR_ERR(rt);

> +		goto return_err;

> +	}

> +

> +	neigh = dst_neigh_lookup(&rt->dst, rem_ip);

> +	if (!neigh) {

> +		rc = -ENOMEM;

> +		ip_rt_put(rt);

> +		goto return_err;

> +	}

> +

> +	*ndev = rt->dst.dev;

> +	ip_rt_put(rt);

> +

> +	/* If not resolved, kick-off state machine towards resolution */

> +	if (!(neigh->nud_state & NUD_VALID))

> +		neigh_event_send(neigh, NULL);

> +

> +	/* query neighbor until resolved or timeout */

> +	retry = QED_IP_RESOL_TIMEOUT;

> +	while (!(neigh->nud_state & NUD_VALID) && retry > 0) {

> +		msleep(1000);

> +		retry--;

> +	}

> +

> +	if (neigh->nud_state & NUD_VALID) {

> +		/* copy resolved MAC address */

> +		neigh_ha_snapshot(hardware_address->sa_data, neigh, *ndev);

> +

> +		hardware_address->sa_family = (*ndev)->type;

> +		rc = 0;

> +	}

> +

> +	neigh_release(neigh);

> +	if (!(*loc_ip)) {

> +		*loc_ip = inet_select_addr(*ndev, *rem_ip, RT_SCOPE_UNIVERSE);

> +		local_addr->ss_family = AF_INET;

> +	}

> +

> +return_err:

> +

> +	return rc;

> +}

> +EXPORT_SYMBOL(qed_route_ipv4);

> +

> +int qed_route_ipv6(struct sockaddr_storage *local_addr,

> +		   struct sockaddr_storage *remote_addr,

> +		   struct sockaddr *hardware_address,

> +		   struct net_device **ndev)

> +{

> +	struct neighbour *neigh = NULL;

> +	struct dst_entry *dst;

> +	struct flowi6 fl6;

> +	int rc = -ENXIO;

> +	int retry;

> +

> +	memset(&fl6, 0, sizeof(fl6));

> +	fl6.saddr = ((struct sockaddr_in6 *)local_addr)->sin6_addr;

> +	fl6.daddr = ((struct sockaddr_in6 *)remote_addr)->sin6_addr;

> +

> +	dst = ip6_route_output(&init_net, NULL, &fl6);

> +	if (!dst || dst->error) {

> +		if (dst) {

> +			dst_release(dst);

> +			pr_err("lookup route failed %d\n", dst->error);

> +		}

> +

> +		goto out;

> +	}

> +

> +	neigh = dst_neigh_lookup(dst, &fl6.daddr);

> +	if (neigh) {

> +		*ndev = ip6_dst_idev(dst)->dev;

> +

> +		/* If not resolved, kick-off state machine towards resolution */

> +		if (!(neigh->nud_state & NUD_VALID))

> +			neigh_event_send(neigh, NULL);

> +

> +		/* query neighbor until resolved or timeout */

> +		retry = QED_IP_RESOL_TIMEOUT;

> +		while (!(neigh->nud_state & NUD_VALID) && retry > 0) {

> +			msleep(1000);

> +			retry--;

> +		}

> +

> +		if (neigh->nud_state & NUD_VALID) {

> +			neigh_ha_snapshot((u8 *)hardware_address->sa_data, neigh, *ndev);

> +

> +			hardware_address->sa_family = (*ndev)->type;

> +			rc = 0;

> +		}

> +

> +		neigh_release(neigh);

> +

> +		if (ipv6_addr_any(&fl6.saddr)) {

> +			if (ipv6_dev_get_saddr(dev_net(*ndev), *ndev,

> +					       &fl6.daddr, 0, &fl6.saddr)) {

> +				pr_err("Unable to find source IP address\n");

> +				goto out;

> +			}

> +

> +			local_addr->ss_family = AF_INET6;

> +			((struct sockaddr_in6 *)local_addr)->sin6_addr =

> +								fl6.saddr;

> +		}

> +	}

> +

> +	dst_release(dst);

> +

> +out:

> +

> +	return rc;

> +}

> +EXPORT_SYMBOL(qed_route_ipv6);

> +

> +void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id)

> +{

> +	if (is_vlan_dev(*ndev)) {

> +		*vlan_id = vlan_dev_vlan_id(*ndev);

> +		*ndev = vlan_dev_real_dev(*ndev);

> +	}

> +}

> +EXPORT_SYMBOL(qed_vlan_get_ndev);

> +

> +struct pci_dev *qed_validate_ndev(struct net_device *ndev)

> +{

> +	struct pci_dev *pdev = NULL;

> +	struct net_device *upper;

> +

> +	for_each_pci_dev(pdev) {

> +		if (pdev && pdev->driver &&

> +		    !strcmp(pdev->driver->name, "qede")) {

> +			upper = pci_get_drvdata(pdev);

> +			if (upper->ifindex == ndev->ifindex)

> +				return pdev;

> +		}

> +	}

> +

> +	return NULL;

> +}

> +EXPORT_SYMBOL(qed_validate_ndev);

> +

> +__be16 qed_get_in_port(struct sockaddr_storage *sa)

> +{

> +	return sa->ss_family == AF_INET

> +		? ((struct sockaddr_in *)sa)->sin_port

> +		: ((struct sockaddr_in6 *)sa)->sin6_port;

> +}

> +EXPORT_SYMBOL(qed_get_in_port);

> +

> +int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr,

> +		       struct socket **sock, u16 *port)

> +{

> +	struct sockaddr_storage sa;

> +	int rc = 0;

> +

> +	rc = sock_create(local_ip_addr.ss_family, SOCK_STREAM, IPPROTO_TCP, sock);

> +	if (rc) {

> +		pr_warn("failed to create socket: %d\n", rc);

> +		goto err;

> +	}

> +

> +	(*sock)->sk->sk_allocation = GFP_KERNEL;

> +	sk_set_memalloc((*sock)->sk);

> +

> +	rc = kernel_bind(*sock, (struct sockaddr *)&local_ip_addr,

> +			 sizeof(local_ip_addr));

> +

> +	if (rc) {

> +		pr_warn("failed to bind socket: %d\n", rc);

> +		goto err_sock;

> +	}

> +

> +	rc = kernel_getsockname(*sock, (struct sockaddr *)&sa);

> +	if (rc < 0) {

> +		pr_warn("getsockname() failed: %d\n", rc);

> +		goto err_sock;

> +	}

> +

> +	*port = ntohs(qed_get_in_port(&sa));

> +

> +	return 0;

> +

> +err_sock:

> +	sock_release(*sock);

> +	sock = NULL;

> +err:

> +

> +	return rc;

> +}

> +EXPORT_SYMBOL(qed_fetch_tcp_port);

> +

> +void qed_return_tcp_port(struct socket *sock)

> +{

> +	if (sock && sock->sk) {

> +		tcp_set_state(sock->sk, TCP_CLOSE);

> +		sock_release(sock);

> +	}

> +}

> +EXPORT_SYMBOL(qed_return_tcp_port);

> diff --git a/include/linux/qed/qed_nvmetcp_ip_services_if.h b/include/linux/qed/qed_nvmetcp_ip_services_if.h

> new file mode 100644

> index 000000000000..3604aee53796

> --- /dev/null

> +++ b/include/linux/qed/qed_nvmetcp_ip_services_if.h

> @@ -0,0 +1,29 @@

> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */

> +/*

> + * Copyright 2021 Marvell. All rights reserved.

> + */

> +

> +#ifndef _QED_IP_SERVICES_IF_H

> +#define _QED_IP_SERVICES_IF_H

> +

> +#include <linux/types.h>

> +#include <net/route.h>

> +#include <net/ip6_route.h>

> +#include <linux/inetdevice.h>

> +

> +int qed_route_ipv4(struct sockaddr_storage *local_addr,

> +		   struct sockaddr_storage *remote_addr,

> +		   struct sockaddr *hardware_address,

> +		   struct net_device **ndev);

> +int qed_route_ipv6(struct sockaddr_storage *local_addr,

> +		   struct sockaddr_storage *remote_addr,

> +		   struct sockaddr *hardware_address,

> +		   struct net_device **ndev);

> +void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id);

> +struct pci_dev *qed_validate_ndev(struct net_device *ndev);

> +void qed_return_tcp_port(struct socket *sock);

> +int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr,

> +		       struct socket **sock, u16 *port);

> +__be16 qed_get_in_port(struct sockaddr_storage *sa);

> +

> +#endif /* _QED_IP_SERVICES_IF_H */

> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 2, 2021, 11:27 a.m. UTC | #10
On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch will present the skeleton of the qedn driver.

> The new driver will be added under "drivers/nvme/hw/qedn" and will be

> enabled by the Kconfig "Marvell NVM Express over Fabrics TCP offload".

> 

> The internal implementation:

> - qedn.h:

>    Includes all common structs to be used by the qedn vendor driver.

> 

> - qedn_main.c

>    Includes the qedn_init and qedn_cleanup implementation.

>    As part of the qedn init, the driver will register as a pci device and

>    will work with the Marvell fastlinQ NICs.

>    As part of the probe, the driver will register to the nvme_tcp_offload

>    (ULP).

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Arie Gershberg <agershberg@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> ---

>   MAINTAINERS                      |  10 ++

>   drivers/nvme/Kconfig             |   1 +

>   drivers/nvme/Makefile            |   1 +

>   drivers/nvme/hw/Kconfig          |   8 ++

>   drivers/nvme/hw/Makefile         |   3 +

>   drivers/nvme/hw/qedn/Makefile    |   5 +

>   drivers/nvme/hw/qedn/qedn.h      |  19 +++

>   drivers/nvme/hw/qedn/qedn_main.c | 201 +++++++++++++++++++++++++++++++

>   8 files changed, 248 insertions(+)

>   create mode 100644 drivers/nvme/hw/Kconfig

>   create mode 100644 drivers/nvme/hw/Makefile

>   create mode 100644 drivers/nvme/hw/qedn/Makefile

>   create mode 100644 drivers/nvme/hw/qedn/qedn.h

>   create mode 100644 drivers/nvme/hw/qedn/qedn_main.c

> Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 2, 2021, 11:32 a.m. UTC | #11
On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch will present the adding of qedn_fp_queue - this is a per cpu

> core element which handles all of the connections on that cpu core.

> The qedn_fp_queue will handle a group of connections (NVMeoF QPs) which

> are handled on the same cpu core, and will only use the same FW-driver

> resources with no need to be related to the same NVMeoF controller.

> 

> The per qedn_fq_queue resources are the FW CQ and FW status block:

> - The FW CQ will be used for the FW to notify the driver that the

>    the exchange has ended and the FW will pass the incoming NVMeoF CQE

>    (if exist) to the driver.

> - FW status block - which is used for the FW to notify the driver with

>    the producer update of the FW CQE chain.

> 

> The FW fast-path queues are based on qed_chain.h

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> ---

>   drivers/nvme/hw/qedn/qedn.h      |  26 +++

>   drivers/nvme/hw/qedn/qedn_main.c | 287 ++++++++++++++++++++++++++++++-

>   2 files changed, 310 insertions(+), 3 deletions(-)

> 

> diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h

> index 7efe2366eb7c..5d4d04d144e4 100644

> --- a/drivers/nvme/hw/qedn/qedn.h

> +++ b/drivers/nvme/hw/qedn/qedn.h

> @@ -33,18 +33,41 @@

>   #define QEDN_PROTO_CQ_PROD_IDX	0

>   #define QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES 2

>   

> +#define QEDN_PAGE_SIZE	4096 /* FW page size - Configurable */

> +#define QEDN_IRQ_NAME_LEN 24

> +#define QEDN_IRQ_NO_FLAGS 0

> +

> +/* TCP defines */

> +#define QEDN_TCP_RTO_DEFAULT 280

> +

>   enum qedn_state {

>   	QEDN_STATE_CORE_PROBED = 0,

>   	QEDN_STATE_CORE_OPEN,

>   	QEDN_STATE_GL_PF_LIST_ADDED,

>   	QEDN_STATE_MFW_STATE,

> +	QEDN_STATE_NVMETCP_OPEN,

> +	QEDN_STATE_IRQ_SET,

> +	QEDN_STATE_FP_WORK_THREAD_SET,

>   	QEDN_STATE_REGISTERED_OFFLOAD_DEV,

>   	QEDN_STATE_MODULE_REMOVE_ONGOING,

>   };

>   

> +/* Per CPU core params */

> +struct qedn_fp_queue {

> +	struct qed_chain cq_chain;

> +	u16 *cq_prod;

> +	struct mutex cq_mutex; /* cq handler mutex */

> +	struct qedn_ctx	*qedn;

> +	struct qed_sb_info *sb_info;

> +	unsigned int cpu;

> +	u16 sb_id;

> +	char irqname[QEDN_IRQ_NAME_LEN];

> +};

> +

>   struct qedn_ctx {

>   	struct pci_dev *pdev;

>   	struct qed_dev *cdev;

> +	struct qed_int_info int_info;

>   	struct qed_dev_nvmetcp_info dev_info;

>   	struct nvme_tcp_ofld_dev qedn_ofld_dev;

>   	struct qed_pf_params pf_params;

> @@ -57,6 +80,9 @@ struct qedn_ctx {

>   

>   	/* Fast path queues */

>   	u8 num_fw_cqs;

> +	struct qedn_fp_queue *fp_q_arr;

> +	struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;

> +	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */

>   };

>   

>   struct qedn_global {

> diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c

> index 52007d35622d..0135a1f490da 100644

> --- a/drivers/nvme/hw/qedn/qedn_main.c

> +++ b/drivers/nvme/hw/qedn/qedn_main.c

> @@ -141,6 +141,104 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {

>   	.commit_rqs = qedn_commit_rqs,

>   };

>   

> +/* Fastpath IRQ handler */

> +static irqreturn_t qedn_irq_handler(int irq, void *dev_id)

> +{

> +	/* Placeholder */

> +

> +	return IRQ_HANDLED;

> +}

> +

> +static void qedn_sync_free_irqs(struct qedn_ctx *qedn)

> +{

> +	u16 vector_idx;

> +	int i;

> +

> +	for (i = 0; i < qedn->num_fw_cqs; i++) {

> +		vector_idx = i * qedn->dev_info.common.num_hwfns +

> +			     qed_ops->common->get_affin_hwfn_idx(qedn->cdev);

> +		synchronize_irq(qedn->int_info.msix[vector_idx].vector);

> +		irq_set_affinity_hint(qedn->int_info.msix[vector_idx].vector,

> +				      NULL);

> +		free_irq(qedn->int_info.msix[vector_idx].vector,

> +			 &qedn->fp_q_arr[i]);

> +	}

> +

> +	qedn->int_info.used_cnt = 0;

> +	qed_ops->common->set_fp_int(qedn->cdev, 0);

> +}

> +

> +static int qedn_request_msix_irq(struct qedn_ctx *qedn)

> +{

> +	struct pci_dev *pdev = qedn->pdev;

> +	struct qedn_fp_queue *fp_q = NULL;

> +	int i, rc, cpu;

> +	u16 vector_idx;

> +	u32 vector;

> +

> +	/* numa-awareness will be added in future enhancements */

> +	cpu = cpumask_first(cpu_online_mask);

> +	for (i = 0; i < qedn->num_fw_cqs; i++) {

> +		fp_q = &qedn->fp_q_arr[i];

> +		vector_idx = i * qedn->dev_info.common.num_hwfns +

> +			     qed_ops->common->get_affin_hwfn_idx(qedn->cdev);

> +		vector = qedn->int_info.msix[vector_idx].vector;

> +		sprintf(fp_q->irqname, "qedn_queue_%x.%x.%x_%d",

> +			pdev->bus->number, PCI_SLOT(pdev->devfn),

> +			PCI_FUNC(pdev->devfn), i);

> +		rc = request_irq(vector, qedn_irq_handler, QEDN_IRQ_NO_FLAGS,

> +				 fp_q->irqname, fp_q);

> +		if (rc) {

> +			pr_err("request_irq failed.\n");

> +			qedn_sync_free_irqs(qedn);

> +

> +			return rc;

> +		}

> +

> +		fp_q->cpu = cpu;

> +		qedn->int_info.used_cnt++;

> +		rc = irq_set_affinity_hint(vector, get_cpu_mask(cpu));

> +		cpu = cpumask_next_wrap(cpu, cpu_online_mask, -1, false);

> +	}

> +

> +	return 0;

> +}

> +


Hah. I knew it.
So you _do_ have a limited number of MSIx interrupts.
And that should limit the number of queue pairs, too.

> +static int qedn_setup_irq(struct qedn_ctx *qedn)

> +{

> +	int rc = 0;

> +	u8 rval;

> +

> +	rval = qed_ops->common->set_fp_int(qedn->cdev, qedn->num_fw_cqs);

> +	if (rval < qedn->num_fw_cqs) {

> +		qedn->num_fw_cqs = rval;

> +		if (rval == 0) {

> +			pr_err("set_fp_int return 0 IRQs\n");

> +

> +			return -ENODEV;

> +		}

> +	}

> +

> +	rc = qed_ops->common->get_fp_int(qedn->cdev, &qedn->int_info);

> +	if (rc) {

> +		pr_err("get_fp_int failed\n");

> +		goto exit_setup_int;

> +	}

> +

> +	if (qedn->int_info.msix_cnt) {

> +		rc = qedn_request_msix_irq(qedn);

> +		goto exit_setup_int;

> +	} else {

> +		pr_err("msix_cnt = 0\n");

> +		rc = -EINVAL;

> +		goto exit_setup_int;

> +	}

> +

> +exit_setup_int:

> +

> +	return rc;

> +}

> +

>   static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)

>   {

>   	/* Placeholder - Initialize qedn fields */

> @@ -185,21 +283,173 @@ static void qedn_remove_pf_from_gl_list(struct qedn_ctx *qedn)

>   	mutex_unlock(&qedn_glb.glb_mutex);

>   }

>   

> +static void qedn_free_function_queues(struct qedn_ctx *qedn)

> +{

> +	struct qed_sb_info *sb_info = NULL;

> +	struct qedn_fp_queue *fp_q;

> +	int i;

> +

> +	/* Free workqueues */

> +

> +	/* Free the fast path queues*/

> +	for (i = 0; i < qedn->num_fw_cqs; i++) {

> +		fp_q = &qedn->fp_q_arr[i];

> +

> +		/* Free SB */

> +		sb_info = fp_q->sb_info;

> +		if (sb_info->sb_virt) {

> +			qed_ops->common->sb_release(qedn->cdev, sb_info,

> +						    fp_q->sb_id,

> +						    QED_SB_TYPE_STORAGE);

> +			dma_free_coherent(&qedn->pdev->dev,

> +					  sizeof(*sb_info->sb_virt),

> +					  (void *)sb_info->sb_virt,

> +					  sb_info->sb_phys);

> +			memset(sb_info, 0, sizeof(*sb_info));

> +			kfree(sb_info);

> +			fp_q->sb_info = NULL;

> +		}

> +

> +		qed_ops->common->chain_free(qedn->cdev, &fp_q->cq_chain);

> +	}

> +

> +	if (qedn->fw_cq_array_virt)

> +		dma_free_coherent(&qedn->pdev->dev,

> +				  qedn->num_fw_cqs * sizeof(u64),

> +				  qedn->fw_cq_array_virt,

> +				  qedn->fw_cq_array_phy);

> +	kfree(qedn->fp_q_arr);

> +	qedn->fp_q_arr = NULL;

> +}

> +

> +static int qedn_alloc_and_init_sb(struct qedn_ctx *qedn,

> +				  struct qed_sb_info *sb_info, u16 sb_id)

> +{

> +	int rc = 0;

> +

> +	sb_info->sb_virt = dma_alloc_coherent(&qedn->pdev->dev,

> +					      sizeof(struct status_block_e4),

> +					      &sb_info->sb_phys, GFP_KERNEL);

> +	if (!sb_info->sb_virt) {

> +		pr_err("Status block allocation failed\n");

> +

> +		return -ENOMEM;

> +	}

> +

> +	rc = qed_ops->common->sb_init(qedn->cdev, sb_info, sb_info->sb_virt,

> +				      sb_info->sb_phys, sb_id,

> +				      QED_SB_TYPE_STORAGE);

> +	if (rc) {

> +		pr_err("Status block initialization failed\n");

> +

> +		return rc;

> +	}

> +

> +	return 0;

> +}

> +

> +static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

> +{

> +	struct qed_chain_init_params chain_params = {};

> +	struct status_block_e4 *sb = NULL;  /* To change to status_block_e4 */

> +	struct qedn_fp_queue *fp_q = NULL;

> +	int rc = 0, arr_size;

> +	u64 cq_phy_addr;

> +	int i;

> +

> +	/* Place holder - IO-path workqueues */

> +

> +	qedn->fp_q_arr = kcalloc(qedn->num_fw_cqs,

> +				 sizeof(struct qedn_fp_queue), GFP_KERNEL);

> +	if (!qedn->fp_q_arr)

> +		return -ENOMEM;

> +

> +	arr_size = qedn->num_fw_cqs * sizeof(struct nvmetcp_glbl_queue_entry);

> +	qedn->fw_cq_array_virt = dma_alloc_coherent(&qedn->pdev->dev,

> +						    arr_size,

> +						    &qedn->fw_cq_array_phy,

> +						    GFP_KERNEL);

> +	if (!qedn->fw_cq_array_virt) {

> +		rc = -ENOMEM;

> +		goto mem_alloc_failure;

> +	}

> +

> +	/* placeholder - create task pools */

> +

> +	for (i = 0; i < qedn->num_fw_cqs; i++) {

> +		fp_q = &qedn->fp_q_arr[i];

> +		mutex_init(&fp_q->cq_mutex);

> +

> +		/* FW CQ */

> +		chain_params.intended_use = QED_CHAIN_USE_TO_CONSUME,

> +		chain_params.mode = QED_CHAIN_MODE_PBL,

> +		chain_params.cnt_type = QED_CHAIN_CNT_TYPE_U16,

> +		chain_params.num_elems = QEDN_FW_CQ_SIZE;

> +		chain_params.elem_size = 64; /*Placeholder - sizeof(struct nvmetcp_fw_cqe)*/

> +

> +		rc = qed_ops->common->chain_alloc(qedn->cdev,

> +						  &fp_q->cq_chain,

> +						  &chain_params);

> +		if (rc) {

> +			pr_err("CQ chain pci_alloc_consistent fail\n");

> +			goto mem_alloc_failure;

> +		}

> +

> +		cq_phy_addr = qed_chain_get_pbl_phys(&fp_q->cq_chain);

> +		qedn->fw_cq_array_virt[i].cq_pbl_addr.hi = PTR_HI(cq_phy_addr);

> +		qedn->fw_cq_array_virt[i].cq_pbl_addr.lo = PTR_LO(cq_phy_addr);

> +

> +		/* SB */

> +		fp_q->sb_info = kzalloc(sizeof(*fp_q->sb_info), GFP_KERNEL);

> +		if (!fp_q->sb_info)

> +			goto mem_alloc_failure;

> +

> +		fp_q->sb_id = i;

> +		rc = qedn_alloc_and_init_sb(qedn, fp_q->sb_info, fp_q->sb_id);

> +		if (rc) {

> +			pr_err("SB allocation and initialization failed.\n");

> +			goto mem_alloc_failure;

> +		}

> +

> +		sb = fp_q->sb_info->sb_virt;

> +		fp_q->cq_prod = (u16 *)&sb->pi_array[QEDN_PROTO_CQ_PROD_IDX];

> +		fp_q->qedn = qedn;

> +

> +		/* Placeholder - Init IO-path workqueue */

> +

> +		/* Placeholder - Init IO-path resources */

> +	}

> +

> +	return 0;

> +

> +mem_alloc_failure:

> +	pr_err("Function allocation failed\n");

> +	qedn_free_function_queues(qedn);

> +

> +	return rc;

> +}

> +

>   static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)

>   {

>   	u32 fw_conn_queue_pages = QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES;

>   	struct qed_nvmetcp_pf_params *pf_params;

> +	int rc;

>   

>   	pf_params = &qedn->pf_params.nvmetcp_pf_params;

>   	memset(pf_params, 0, sizeof(*pf_params));

>   	qedn->num_fw_cqs = min_t(u8, qedn->dev_info.num_cqs, num_online_cpus());

> +	pr_info("Num qedn CPU cores is %u\n", qedn->num_fw_cqs);

>   

>   	pf_params->num_cons = QEDN_MAX_CONNS_PER_PF;

>   	pf_params->num_tasks = QEDN_MAX_TASKS_PER_PF;

>   

> -	/* Placeholder - Initialize function level queues */

> +	rc = qedn_alloc_function_queues(qedn);

> +	if (rc) {

> +		pr_err("Global queue allocation failed.\n");

> +		goto err_alloc_mem;

> +	}

>   

> -	/* Placeholder - Initialize TCP params */

> +	set_bit(QEDN_STATE_FP_WORK_THREAD_SET, &qedn->state);

>   

>   	/* Queues */

>   	pf_params->num_sq_pages_in_ring = fw_conn_queue_pages;

> @@ -207,11 +457,14 @@ static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)

>   	pf_params->num_uhq_pages_in_ring = fw_conn_queue_pages;

>   	pf_params->num_queues = qedn->num_fw_cqs;

>   	pf_params->cq_num_entries = QEDN_FW_CQ_SIZE;

> +	pf_params->glbl_q_params_addr = qedn->fw_cq_array_phy;

>   

>   	/* the CQ SB pi */

>   	pf_params->gl_rq_pi = QEDN_PROTO_CQ_PROD_IDX;

>   

> -	return 0;

> +err_alloc_mem:

> +

> +	return rc;

>   }

>   

>   static inline int qedn_slowpath_start(struct qedn_ctx *qedn)

> @@ -255,6 +508,12 @@ static void __qedn_remove(struct pci_dev *pdev)

>   	else

>   		pr_err("Failed to remove from global PF list\n");

>   

> +	if (test_and_clear_bit(QEDN_STATE_IRQ_SET, &qedn->state))

> +		qedn_sync_free_irqs(qedn);

> +

> +	if (test_and_clear_bit(QEDN_STATE_NVMETCP_OPEN, &qedn->state))

> +		qed_ops->stop(qedn->cdev);

> +

>   	if (test_and_clear_bit(QEDN_STATE_MFW_STATE, &qedn->state)) {

>   		rc = qed_ops->common->update_drv_state(qedn->cdev, false);

>   		if (rc)

> @@ -264,6 +523,9 @@ static void __qedn_remove(struct pci_dev *pdev)

>   	if (test_and_clear_bit(QEDN_STATE_CORE_OPEN, &qedn->state))

>   		qed_ops->common->slowpath_stop(qedn->cdev);

>   

> +	if (test_and_clear_bit(QEDN_STATE_FP_WORK_THREAD_SET, &qedn->state))

> +		qedn_free_function_queues(qedn);

> +

>   	if (test_and_clear_bit(QEDN_STATE_CORE_PROBED, &qedn->state))

>   		qed_ops->common->remove(qedn->cdev);

>   

> @@ -335,6 +597,25 @@ static int __qedn_probe(struct pci_dev *pdev)

>   

>   	set_bit(QEDN_STATE_CORE_OPEN, &qedn->state);

>   

> +	rc = qedn_setup_irq(qedn);

> +	if (rc)

> +		goto exit_probe_and_release_mem;

> +

> +	set_bit(QEDN_STATE_IRQ_SET, &qedn->state);

> +

> +	/* NVMeTCP start HW PF */

> +	rc = qed_ops->start(qedn->cdev,

> +			    NULL /* Placeholder for FW IO-path resources */,

> +			    qedn,

> +			    NULL /* Placeholder for FW Event callback */);

> +	if (rc) {

> +		rc = -ENODEV;

> +		pr_err("Cannot start NVMeTCP Function\n");

> +		goto exit_probe_and_release_mem;

> +	}

> +

> +	set_bit(QEDN_STATE_NVMETCP_OPEN, &qedn->state);

> +

>   	rc = qed_ops->common->update_drv_state(qedn->cdev, true);

>   	if (rc) {

>   		pr_err("Failed to send drv state to MFW\n");

> 

So you have a limited number of MSI-x interrupts, but don't limit the 
number of hw queues to that. Why?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 2, 2021, 11:38 a.m. UTC | #12
On 4/29/21 9:09 PM, Shai Malin wrote:
> From: Prabhakar Kushwaha <pkushwaha@marvell.com>

> 

> HW filter can be configured to filter TCP packets based on either

> source or target TCP port. QEDN leverage this feature to route

> NVMeTCP traffic.

> 

> This patch configures HW filter block based on source port for all

> receiving packets to deliver correct QEDN PF.

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> ---

>   drivers/nvme/hw/qedn/qedn.h      |  15 +++++

>   drivers/nvme/hw/qedn/qedn_main.c | 108 ++++++++++++++++++++++++++++++-

>   2 files changed, 122 insertions(+), 1 deletion(-)

> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 2, 2021, 11:42 a.m. UTC | #13
On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch will present the IO level workqueues:

> 

> - qedn_nvme_req_fp_wq(): process new requests, similar to

> 			 nvme_tcp_io_work(). The flow starts from

> 			 send_req() and will aggregate all the requests

> 			 on this CPU core.

> 

> - qedn_fw_cq_fp_wq():   process new FW completions, the flow starts from

> 			the IRQ handler and for a single interrupt it will

> 			process all the pending NVMeoF Completions under

> 			polling mode.

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> ---

>   drivers/nvme/hw/qedn/Makefile    |   2 +-

>   drivers/nvme/hw/qedn/qedn.h      |  29 +++++++

>   drivers/nvme/hw/qedn/qedn_conn.c |   3 +

>   drivers/nvme/hw/qedn/qedn_main.c | 114 +++++++++++++++++++++++--

>   drivers/nvme/hw/qedn/qedn_task.c | 138 +++++++++++++++++++++++++++++++

>   5 files changed, 278 insertions(+), 8 deletions(-)

>   create mode 100644 drivers/nvme/hw/qedn/qedn_task.c

> 

> diff --git a/drivers/nvme/hw/qedn/Makefile b/drivers/nvme/hw/qedn/Makefile

> index d8b343afcd16..c7d838a61ae6 100644

> --- a/drivers/nvme/hw/qedn/Makefile

> +++ b/drivers/nvme/hw/qedn/Makefile

> @@ -1,4 +1,4 @@

>   # SPDX-License-Identifier: GPL-2.0-only

>   

>   obj-$(CONFIG_NVME_QEDN) += qedn.o

> -qedn-y := qedn_main.o qedn_conn.o

> +qedn-y := qedn_main.o qedn_conn.o qedn_task.o

> \ No newline at end of file

> diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h

> index c15cac37ec1e..bd9a250cb2f5 100644

> --- a/drivers/nvme/hw/qedn/qedn.h

> +++ b/drivers/nvme/hw/qedn/qedn.h

> @@ -47,6 +47,9 @@

>   #define QEDN_NON_ABORTIVE_TERMINATION 0

>   #define QEDN_ABORTIVE_TERMINATION 1

>   

> +#define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"

> +#define QEDN_NVME_REQ_FP_WQ_WORKQUEUE "qedn_nvme_req_fp_wq"

> +

>   /*

>    * TCP offload stack default configurations and defines.

>    * Future enhancements will allow controlling the configurable

> @@ -100,6 +103,7 @@ struct qedn_fp_queue {

>   	struct qedn_ctx	*qedn;

>   	struct qed_sb_info *sb_info;

>   	unsigned int cpu;

> +	struct work_struct fw_cq_fp_wq_entry;

>   	u16 sb_id;

>   	char irqname[QEDN_IRQ_NAME_LEN];

>   };

> @@ -131,6 +135,8 @@ struct qedn_ctx {

>   	struct qedn_fp_queue *fp_q_arr;

>   	struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;

>   	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */

> +	struct workqueue_struct *nvme_req_fp_wq;

> +	struct workqueue_struct *fw_cq_fp_wq;

>   };

>   

>   struct qedn_endpoint {

> @@ -213,6 +219,25 @@ struct qedn_ctrl {

>   

>   /* Connection level struct */

>   struct qedn_conn_ctx {

> +	/* IO path */

> +	struct workqueue_struct	*nvme_req_fp_wq; /* ptr to qedn->nvme_req_fp_wq */

> +	struct nvme_tcp_ofld_req *req; /* currently proccessed request */

> +

> +	struct list_head host_pend_req_list;

> +	/* Spinlock to access pending request list */

> +	spinlock_t nvme_req_lock;

> +	unsigned int cpu;

> +

> +	/* Entry for registering to nvme_req_fp_wq */

> +	struct work_struct nvme_req_fp_wq_entry;

> +	/*

> +	 * Spinlock for accessing qedn_process_req as it can be called

> +	 * from multiple place like queue_rq, async, self requeued

> +	 */

> +	struct mutex nvme_req_mutex;

> +	struct qedn_fp_queue *fp_q;

> +	int qid;

> +

>   	struct qedn_ctx *qedn;

>   	struct nvme_tcp_ofld_queue *queue;

>   	struct nvme_tcp_ofld_ctrl *ctrl;

> @@ -280,5 +305,9 @@ int qedn_wait_for_conn_est(struct qedn_conn_ctx *conn_ctx);

>   int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state);

>   void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx, int abrt_flag);

>   __be16 qedn_get_in_port(struct sockaddr_storage *sa);

> +inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 cccid);

> +void qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req);

> +void qedn_nvme_req_fp_wq_handler(struct work_struct *work);

> +void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe);

>   

>   #endif /* _QEDN_H_ */

> diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c

> index 9bfc0a5f0cdb..90d8aa36d219 100644

> --- a/drivers/nvme/hw/qedn/qedn_conn.c

> +++ b/drivers/nvme/hw/qedn/qedn_conn.c

> @@ -385,6 +385,9 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)

>   	}

>   

>   	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);

> +	INIT_LIST_HEAD(&conn_ctx->host_pend_req_list);

> +	spin_lock_init(&conn_ctx->nvme_req_lock);

> +

>   	rc = qed_ops->acquire_conn(qedn->cdev,

>   				   &conn_ctx->conn_handle,

>   				   &conn_ctx->fw_cid,

> diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c

> index 8b5714e7e2bb..38f23dbb03a5 100644

> --- a/drivers/nvme/hw/qedn/qedn_main.c

> +++ b/drivers/nvme/hw/qedn/qedn_main.c

> @@ -267,6 +267,18 @@ static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)

>   	return 0;

>   }

>   

> +static void qedn_set_ctrl_io_cpus(struct qedn_conn_ctx *conn_ctx, int qid)

> +{

> +	struct qedn_ctx *qedn = conn_ctx->qedn;

> +	struct qedn_fp_queue *fp_q = NULL;

> +	int index;

> +

> +	index = qid ? (qid - 1) % qedn->num_fw_cqs : 0;

> +	fp_q = &qedn->fp_q_arr[index];

> +

> +	conn_ctx->cpu = fp_q->cpu;

> +}

> +

>   static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t q_size)

>   {

>   	struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;

> @@ -288,6 +300,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t

>   	conn_ctx->queue = queue;

>   	conn_ctx->ctrl = ctrl;

>   	conn_ctx->sq_depth = q_size;

> +	qedn_set_ctrl_io_cpus(conn_ctx, qid);

>   

>   	init_waitqueue_head(&conn_ctx->conn_waitq);

>   	atomic_set(&conn_ctx->est_conn_indicator, 0);

> @@ -295,6 +308,10 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t

>   

>   	spin_lock_init(&conn_ctx->conn_state_lock);

>   

> +	INIT_WORK(&conn_ctx->nvme_req_fp_wq_entry, qedn_nvme_req_fp_wq_handler);

> +	conn_ctx->nvme_req_fp_wq = qedn->nvme_req_fp_wq;

> +	conn_ctx->qid = qid;

> +

>   	qedn_initialize_endpoint(&conn_ctx->ep, qedn->local_mac_addr,

>   				 &ctrl->conn_params);

>   

> @@ -356,6 +373,7 @@ static void qedn_destroy_queue(struct nvme_tcp_ofld_queue *queue)

>   	if (!conn_ctx)

>   		return;

>   

> +	cancel_work_sync(&conn_ctx->nvme_req_fp_wq_entry);

>   	qedn_terminate_connection(conn_ctx, QEDN_ABORTIVE_TERMINATION);

>   

>   	qedn_queue_wait_for_terminate_complete(conn_ctx);

> @@ -385,12 +403,24 @@ static int qedn_init_req(struct nvme_tcp_ofld_req *req)

>   

>   static void qedn_commit_rqs(struct nvme_tcp_ofld_queue *queue)

>   {

> -	/* Placeholder - queue work */

> +	struct qedn_conn_ctx *conn_ctx;

> +

> +	conn_ctx = (struct qedn_conn_ctx *)queue->private_data;

> +

> +	if (!list_empty(&conn_ctx->host_pend_req_list))

> +		queue_work_on(conn_ctx->cpu, conn_ctx->nvme_req_fp_wq,

> +			      &conn_ctx->nvme_req_fp_wq_entry);

>   }

>   

>   static int qedn_send_req(struct nvme_tcp_ofld_req *req)

>   {

> -	/* Placeholder - qedn_send_req */

> +	struct qedn_conn_ctx *qedn_conn = (struct qedn_conn_ctx *)req->queue->private_data;

> +

> +	/* Under the assumption that the cccid/tag will be in the range of 0 to sq_depth-1. */

> +	if (!req->async && qedn_validate_cccid_in_range(qedn_conn, req->rq->tag))

> +		return BLK_STS_NOTSUPP;

> +

> +	qedn_queue_request(qedn_conn, req);

>   

>   	return 0;

>   }

> @@ -434,9 +464,59 @@ struct qedn_conn_ctx *qedn_get_conn_hash(struct qedn_ctx *qedn, u16 icid)

>   }

>   

>   /* Fastpath IRQ handler */

> +void qedn_fw_cq_fp_handler(struct qedn_fp_queue *fp_q)

> +{

> +	u16 sb_id, cq_prod_idx, cq_cons_idx;

> +	struct qedn_ctx *qedn = fp_q->qedn;

> +	struct nvmetcp_fw_cqe *cqe = NULL;

> +

> +	sb_id = fp_q->sb_id;

> +	qed_sb_update_sb_idx(fp_q->sb_info);

> +

> +	/* rmb - to prevent missing new cqes */

> +	rmb();

> +

> +	/* Read the latest cq_prod from the SB */

> +	cq_prod_idx = *fp_q->cq_prod;

> +	cq_cons_idx = qed_chain_get_cons_idx(&fp_q->cq_chain);

> +

> +	while (cq_cons_idx != cq_prod_idx) {

> +		cqe = qed_chain_consume(&fp_q->cq_chain);

> +		if (likely(cqe))

> +			qedn_io_work_cq(qedn, cqe);

> +		else

> +			pr_err("Failed consuming cqe\n");

> +

> +		cq_cons_idx = qed_chain_get_cons_idx(&fp_q->cq_chain);

> +

> +		/* Check if new completions were posted */

> +		if (unlikely(cq_prod_idx == cq_cons_idx)) {

> +			/* rmb - to prevent missing new cqes */

> +			rmb();

> +

> +			/* Update the latest cq_prod from the SB */

> +			cq_prod_idx = *fp_q->cq_prod;

> +		}

> +	}

> +}

> +

> +static void qedn_fw_cq_fq_wq_handler(struct work_struct *work)

> +{

> +	struct qedn_fp_queue *fp_q = container_of(work, struct qedn_fp_queue, fw_cq_fp_wq_entry);

> +

> +	qedn_fw_cq_fp_handler(fp_q);

> +	qed_sb_ack(fp_q->sb_info, IGU_INT_ENABLE, 1);

> +}

> +

>   static irqreturn_t qedn_irq_handler(int irq, void *dev_id)

>   {

> -	/* Placeholder */

> +	struct qedn_fp_queue *fp_q = dev_id;

> +	struct qedn_ctx *qedn = fp_q->qedn;

> +

> +	fp_q->cpu = smp_processor_id();

> +

> +	qed_sb_ack(fp_q->sb_info, IGU_INT_DISABLE, 0);

> +	queue_work_on(fp_q->cpu, qedn->fw_cq_fp_wq, &fp_q->fw_cq_fp_wq_entry);

>   

>   	return IRQ_HANDLED;

>   }

> @@ -584,6 +664,11 @@ static void qedn_free_function_queues(struct qedn_ctx *qedn)

>   	int i;

>   

>   	/* Free workqueues */

> +	destroy_workqueue(qedn->fw_cq_fp_wq);

> +	qedn->fw_cq_fp_wq = NULL;

> +

> +	destroy_workqueue(qedn->nvme_req_fp_wq);

> +	qedn->nvme_req_fp_wq = NULL;

>   

>   	/* Free the fast path queues*/

>   	for (i = 0; i < qedn->num_fw_cqs; i++) {

> @@ -651,7 +736,23 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

>   	u64 cq_phy_addr;

>   	int i;

>   

> -	/* Place holder - IO-path workqueues */

> +	qedn->fw_cq_fp_wq = alloc_workqueue(QEDN_FW_CQ_FP_WQ_WORKQUEUE,

> +					    WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);

> +	if (!qedn->fw_cq_fp_wq) {

> +		rc = -ENODEV;

> +		pr_err("Unable to create fastpath FW CQ workqueue!\n");

> +

> +		return rc;

> +	}

> +

> +	qedn->nvme_req_fp_wq = alloc_workqueue(QEDN_NVME_REQ_FP_WQ_WORKQUEUE,

> +					       WQ_HIGHPRI | WQ_MEM_RECLAIM, 1);

> +	if (!qedn->nvme_req_fp_wq) {

> +		rc = -ENODEV;

> +		pr_err("Unable to create fastpath qedn nvme workqueue!\n");

> +

> +		return rc;

> +	}

>   

>   	qedn->fp_q_arr = kcalloc(qedn->num_fw_cqs,

>   				 sizeof(struct qedn_fp_queue), GFP_KERNEL);


Why don't you use threaded interrupts if you're spinning off a workqueue 
for handling interrupts anyway?

> @@ -679,7 +780,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

>   		chain_params.mode = QED_CHAIN_MODE_PBL,

>   		chain_params.cnt_type = QED_CHAIN_CNT_TYPE_U16,

>   		chain_params.num_elems = QEDN_FW_CQ_SIZE;

> -		chain_params.elem_size = 64; /*Placeholder - sizeof(struct nvmetcp_fw_cqe)*/

> +		chain_params.elem_size = sizeof(struct nvmetcp_fw_cqe);

>   

>   		rc = qed_ops->common->chain_alloc(qedn->cdev,

>   						  &fp_q->cq_chain,

> @@ -708,8 +809,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

>   		sb = fp_q->sb_info->sb_virt;

>   		fp_q->cq_prod = (u16 *)&sb->pi_array[QEDN_PROTO_CQ_PROD_IDX];

>   		fp_q->qedn = qedn;

> -

> -		/* Placeholder - Init IO-path workqueue */

> +		INIT_WORK(&fp_q->fw_cq_fp_wq_entry, qedn_fw_cq_fq_wq_handler);

>   

>   		/* Placeholder - Init IO-path resources */

>   	}

> diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c

> new file mode 100644

> index 000000000000..d3474188efdc

> --- /dev/null

> +++ b/drivers/nvme/hw/qedn/qedn_task.c

> @@ -0,0 +1,138 @@

> +// SPDX-License-Identifier: GPL-2.0

> +/*

> + * Copyright 2021 Marvell. All rights reserved.

> + */

> +

> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

> +

> + /* Kernel includes */

> +#include <linux/kernel.h>

> +

> +/* Driver includes */

> +#include "qedn.h"

> +

> +inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 cccid)

> +{

> +	int rc = 0;

> +

> +	if (unlikely(cccid >= conn_ctx->sq_depth)) {

> +		pr_err("cccid 0x%x out of range ( > sq depth)\n", cccid);

> +		rc = -EINVAL;

> +	}

> +

> +	return rc;

> +}

> +

> +static bool qedn_process_req(struct qedn_conn_ctx *qedn_conn)

> +{

> +	return true;

> +}

> +

> +/* The WQ handler can be call from 3 flows:

> + *	1. queue_rq.

> + *	2. async.

> + *	3. self requeued

> + * Try to send requests from the pending list. If a request proccess has failed,

> + * re-register to the workqueue.

> + * If there are no additional pending requests - exit the handler.

> + */

> +void qedn_nvme_req_fp_wq_handler(struct work_struct *work)

> +{

> +	struct qedn_conn_ctx *qedn_conn;

> +	bool more = false;

> +

> +	qedn_conn = container_of(work, struct qedn_conn_ctx, nvme_req_fp_wq_entry);

> +	do {

> +		if (mutex_trylock(&qedn_conn->nvme_req_mutex)) {

> +			more = qedn_process_req(qedn_conn);

> +			qedn_conn->req = NULL;

> +			mutex_unlock(&qedn_conn->nvme_req_mutex);

> +		}

> +	} while (more);

> +

> +	if (!list_empty(&qedn_conn->host_pend_req_list))

> +		queue_work_on(qedn_conn->cpu, qedn_conn->nvme_req_fp_wq,

> +			      &qedn_conn->nvme_req_fp_wq_entry);

> +}

> +

> +void qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req)

> +{

> +	bool empty, res = false;

> +

> +	spin_lock(&qedn_conn->nvme_req_lock);

> +	empty = list_empty(&qedn_conn->host_pend_req_list) && !qedn_conn->req;

> +	list_add_tail(&req->queue_entry, &qedn_conn->host_pend_req_list);

> +	spin_unlock(&qedn_conn->nvme_req_lock);

> +

> +	/* attempt workqueue bypass */

> +	if (qedn_conn->cpu == smp_processor_id() && empty &&

> +	    mutex_trylock(&qedn_conn->nvme_req_mutex)) {

> +		res = qedn_process_req(qedn_conn);

> +		qedn_conn->req = NULL;

> +		mutex_unlock(&qedn_conn->nvme_req_mutex);

> +		if (res || list_empty(&qedn_conn->host_pend_req_list))

> +			return;

> +	} else if (req->last) {

> +		queue_work_on(qedn_conn->cpu, qedn_conn->nvme_req_fp_wq,

> +			      &qedn_conn->nvme_req_fp_wq_entry);

> +	}

> +}

> +


Queueing a request?
Does wonders for your latency ... Can't you do without?

> +struct qedn_task_ctx *qedn_cqe_get_active_task(struct nvmetcp_fw_cqe *cqe)

> +{

> +	struct regpair *p = &cqe->task_opaque;

> +

> +	return (struct qedn_task_ctx *)((((u64)(le32_to_cpu(p->hi)) << 32)

> +					+ le32_to_cpu(p->lo)));

> +}

> +

> +void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)

> +{

> +	struct qedn_task_ctx *qedn_task = NULL;

> +	struct qedn_conn_ctx *conn_ctx = NULL;

> +	u16 itid;

> +	u32 cid;

> +

> +	conn_ctx = qedn_get_conn_hash(qedn, le16_to_cpu(cqe->conn_id));

> +	if (unlikely(!conn_ctx)) {

> +		pr_err("CID 0x%x: Failed to fetch conn_ctx from hash\n",

> +		       le16_to_cpu(cqe->conn_id));

> +

> +		return;

> +	}

> +

> +	cid = conn_ctx->fw_cid;

> +	itid = le16_to_cpu(cqe->itid);

> +	qedn_task = qedn_cqe_get_active_task(cqe);

> +	if (unlikely(!qedn_task))

> +		return;

> +

> +	if (likely(cqe->cqe_type == NVMETCP_FW_CQE_TYPE_NORMAL)) {

> +		/* Placeholder - verify the connection was established */

> +

> +		switch (cqe->task_type) {

> +		case NVMETCP_TASK_TYPE_HOST_WRITE:

> +		case NVMETCP_TASK_TYPE_HOST_READ:

> +

> +			/* Placeholder - IO flow */

> +

> +			break;

> +

> +		case NVMETCP_TASK_TYPE_HOST_READ_NO_CQE:

> +

> +			/* Placeholder - IO flow */

> +

> +			break;

> +

> +		case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST:

> +

> +			/* Placeholder - ICReq flow */

> +

> +			break;

> +		default:

> +			pr_info("Could not identify task type\n");

> +		}

> +	} else {

> +		/* Placeholder - Recovery flows */

> +	}

> +}

> 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke May 2, 2021, 11:54 a.m. UTC | #14
On 4/29/21 9:09 PM, Shai Malin wrote:
> This patch will present the IO level functionality of qedn

> nvme-tcp-offload host mode. The qedn_task_ctx structure is containing

> various params and state of the current IO, and is mapped 1x1 to the

> fw_task_ctx which is a HW and FW IO context.

> A qedn_task is mapped directly to its parent connection.

> For every new IO a qedn_task structure will be assigned and they will be

> linked for the entire IO's life span.

> 

> The patch will include 2 flows:

>    1. Send new command to the FW:

> 	 The flow is: nvme_tcp_ofld_queue_rq() which invokes qedn_send_req()

> 	 which invokes qedn_queue_request() which will:

>       - Assign fw_task_ctx.

> 	 - Prepare the Read/Write SG buffer.

> 	 -  Initialize the HW and FW context.

> 	 - Pass the IO to the FW.

> 

>    2. Process the IO completion:

>       The flow is: qedn_irq_handler() which invokes qedn_fw_cq_fp_handler()

> 	 which invokes qedn_io_work_cq() which will:

> 	 - process the FW completion.

> 	 - Return the fw_task_ctx to the task pool.

> 	 - complete the nvme req.

> 

> Acked-by: Igor Russkikh <irusskikh@marvell.com>

> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> Signed-off-by: Ariel Elior <aelior@marvell.com>

> Signed-off-by: Shai Malin <smalin@marvell.com>

> ---

>   drivers/nvme/hw/qedn/qedn.h      |   4 +

>   drivers/nvme/hw/qedn/qedn_conn.c |   1 +

>   drivers/nvme/hw/qedn/qedn_task.c | 269 ++++++++++++++++++++++++++++++-

>   3 files changed, 272 insertions(+), 2 deletions(-)

> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 3, 2021, 3:13 p.m. UTC | #15
On 5/1/21 7:47 PM, Hannes Reinecke wrote:
> On 4/29/21 9:08 PM, Shai Malin wrote:

> > With the goal of enabling a generic infrastructure that allows NVMe/TCP

> > offload devices like NICs to seamlessly plug into the NVMe-oF stack, this

> > patch series introduces the nvme-tcp-offload ULP host layer, which will

> > be a new transport type called "tcp-offload" and will serve as an

> > abstraction layer to work with vendor specific nvme-tcp offload drivers.

> >

> > NVMeTCP offload is a full offload of the NVMeTCP protocol, this includes

> > both the TCP level and the NVMeTCP level.

> >

> > The nvme-tcp-offload transport can co-exist with the existing tcp and

> > other transports. The tcp offload was designed so that stack changes are

> > kept to a bare minimum: only registering new transports.

> > All other APIs, ops etc. are identical to the regular tcp transport.

> > Representing the TCP offload as a new transport allows clear and manageable

> > differentiation between the connections which should use the offload path

> > and those that are not offloaded (even on the same device).

> >

> >

> > The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:

> >

> > * NVMe layer: *

> >

> >         [ nvme/nvme-fabrics/blk-mq ]

> >               |

> >          (nvme API and blk-mq API)

> >               |

> >               |

> > * Vendor agnostic transport layer: *

> >

> >        [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]

> >               |        |             |

> >             (Verbs)

> >               |        |             |

> >               |     (Socket)

> >               |        |             |

> >               |        |        (nvme-tcp-offload API)

> >               |        |             |

> >               |        |             |

> > * Vendor Specific Driver: *

> >

> >               |        |             |

> >             [ qedr ]

> >                        |             |

> >                     [ qede ]

> >                                      |

> >                                    [ qedn ]

> >

> >

> > Performance:

> > ============

> > With this implementation on top of the Marvell qedn driver (using the

> > Marvell FastLinQ NIC), we were able to demonstrate the following CPU

> > utilization improvement:

> >

> > On AMD EPYC 7402, 2.80GHz, 28 cores:

> > - For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate):

> >    Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with

> >    NVMeTCP offload.

> >

> > On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores:

> > - For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate):

> >    Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with

> >    NVMeTCP offload.

> >

> > In addition, we were able to demonstrate the following latency improvement:

> > - For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):

> >    Improved the average latency from 105 usec with NVMeTCP SW to 39 usec

> >    with NVMeTCP offload.

> >

> >    Improved the 99.99 tail latency from 570 usec with NVMeTCP SW to 91 usec

> >    with NVMeTCP offload.

> >

> > The end-to-end offload latency was measured from fio while running against

> > back end of null device.

> >

> >

> > Upstream plan:

> > ==============

> > Following this RFC, the series will be sent in a modular way so that changes

> > in each part will not impact the previous part.

> >

> > - Part 1 (Patches 1-7):

> >    The qed infrastructure, will be sent to 'netdev@vger.kernel.org'.

> >

> > - Part 2 (Patch 8-15):

> >    The nvme-tcp-offload patches, will be sent to

> >    'linux-nvme@lists.infradead.org'.

> >

> > - Part 3 (Packet 16-27):

> >    The qedn patches, will be sent to 'linux-nvme@lists.infradead.org'.

> >

> >

> > Queue Initialization Design:

> > ============================

> > The nvme-tcp-offload ULP module shall register with the existing

> > nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.

> > The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP

> > with the following ops:

> > - claim_dev() - in order to resolve the route to the target according to

> >                  the paired net_dev.

> > - create_queue() - in order to create offloaded nvme-tcp queue.

> >

> > The nvme-tcp-offload ULP module shall manage all the controller level

> > functionalities, call claim_dev and based on the return values shall call

> > the relevant module create_queue in order to create the admin queue and

> > the IO queues.

> >

> >

> > IO-path Design:

> > ===============

> > The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload

> > ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor

> > driver and later, the nvme-tcp-offload vendor driver returns the request

> > completion (the IO completion).

> > No additional handling is needed in between; this design will reduce the

> > CPU utilization as we will describe below.

> >

> > The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP

> > with the following IO-path ops:

> > - init_req()

> > - send_req() - in order to pass the request to the handling of the

> >                 offload driver that shall pass it to the vendor specific device.

> > - poll_queue()

> >

> > Once the IO completes, the nvme-tcp-offload vendor driver shall call

> > command.done() that will invoke the nvme-tcp-offload ULP layer to

> > complete the request.

> >

> >

> > TCP events:

> > ===========

> > The Marvell FastLinQ NIC HW engine handle all the TCP re-transmissions

> > and OOO events.

> >

> >

> > Teardown and errors:

> > ====================

> > In case of NVMeTCP queue error the nvme-tcp-offload vendor driver shall

> > call the nvme_tcp_ofld_report_queue_err.

> > The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP

> > with the following teardown ops:

> > - drain_queue()

> > - destroy_queue()

> >

> >

> > The Marvell FastLinQ NIC HW engine:

> > ====================================

> > The Marvell NIC HW engine is capable of offloading the entire TCP/IP

> > stack and managing up to 64K connections per PF, already implemented and

> > upstream use cases for this include iWARP (by the Marvell qedr driver)

> > and iSCSI (by the Marvell qedi driver).

> > In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer

> > and is able to manage the IO level also in case of TCP re-transmissions

> > and OOO events.

> > The HW engine enables direct data placement (including the data digest CRC

> > calculation and validation) and direct data transmission (including data

> > digest CRC calculation).

> >

> >

> > The Marvell qedn driver:

> > ========================

> > The new driver will be added under "drivers/nvme/hw" and will be enabled

> > by the Kconfig "Marvell NVM Express over Fabrics TCP offload".

> > As part of the qedn init, the driver will register as a pci device driver

> > and will work with the Marvell fastlinQ NIC.

> > As part of the probe, the driver will register to the nvme_tcp_offload

> > (ULP) and to the qed module (qed_nvmetcp_ops) - similar to other

> > "qed_*_ops" which are used by the qede, qedr, qedf and qedi device

> > drivers.

> >

> >

> > QEDN Future work:

> > =================

> > - Support extended HW resources.

> > - Digest support.

> > - Devlink support for device configuration and TCP offload configurations.

> > - Statistics

> >

> >

> > Long term future work:

> > ======================

> > - The nvme-tcp-offload ULP target abstraction layer.

> > - The Marvell nvme-tcp-offload "qednt" target driver.

> >

> >

> > Changes since RFC v1:

> > =====================

> > - Fix nvme_tcp_ofld_ops return values.

> > - Remove NVMF_TRTYPE_TCP_OFFLOAD.

> > - Add nvme_tcp_ofld_poll() implementation.

> > - Fix nvme_tcp_ofld_queue_rq() to check map_sg() and send_req() return

> >    values.

> >

> > Changes since RFC v2:

> > =====================

> > - Add qedn - Marvell's NVMeTCP HW offload vendor driver init and probe

> >    (patches 8-11).

> > - Fixes in controller and queue level (patches 3-6).

> >

> > Changes since RFC v3:

> > =====================

> > - Add the full implementation of the nvme-tcp-offload layer including the

> >    new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new flows (ASYNC

> >    and timeout).

> > - Add nvme-tcp-offload device maximums: max_hw_sectors, max_segments.

> > - Add nvme-tcp-offload layer design and optimization changes.

> > - Add the qedn full implementation for the conn level, IO path and error

> >    handling.

> > - Add qed support for the new AHP HW.

> >

> >

> > Arie Gershberg (3):

> >    nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS

> >      definitions

> >    nvme-tcp-offload: Add controller level implementation

> >    nvme-tcp-offload: Add controller level error recovery implementation

> >

> > Dean Balandin (3):

> >    nvme-tcp-offload: Add device scan implementation

> >    nvme-tcp-offload: Add queue level implementation

> >    nvme-tcp-offload: Add IO level implementation

> >

> > Nikolay Assa (2):

> >    qed: Add IP services APIs support

> >    qedn: Add qedn_claim_dev API support

> >

> > Omkar Kulkarni (1):

> >    qed: Add qed-NVMeTCP personality

> >

> > Prabhakar Kushwaha (6):

> >    qed: Add support of HW filter block

> >    qedn: Add connection-level slowpath functionality

> >    qedn: Add support of configuring HW filter block

> >    qedn: Add support of Task and SGL

> >    qedn: Add support of NVME ICReq & ICResp

> >    qedn: Add support of ASYNC

> >

> > Shai Malin (12):

> >    qed: Add NVMeTCP Offload PF Level FW and HW HSI

> >    qed: Add NVMeTCP Offload Connection Level FW and HW HSI

> >    qed: Add NVMeTCP Offload IO Level FW and HW HSI

> >    qed: Add NVMeTCP Offload IO Level FW Initializations

> >    nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP

> >    nvme-tcp-offload: Add Timeout and ASYNC Support

> >    qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver

> >    qedn: Add qedn probe

> >    qedn: Add IRQ and fast-path resources initializations

> >    qedn: Add IO level nvme_req and fw_cq workqueues

> >    qedn: Add IO level fastpath functionality

> >    qedn: Add Connection and IO level recovery flows

> >

> >   MAINTAINERS                                   |   10 +

> >   drivers/net/ethernet/qlogic/Kconfig           |    3 +

> >   drivers/net/ethernet/qlogic/qed/Makefile      |    5 +

> >   drivers/net/ethernet/qlogic/qed/qed.h         |   16 +

> >   drivers/net/ethernet/qlogic/qed/qed_cxt.c     |   32 +

> >   drivers/net/ethernet/qlogic/qed/qed_cxt.h     |    1 +

> >   drivers/net/ethernet/qlogic/qed/qed_dev.c     |  151 +-

> >   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |    4 +-

> >   drivers/net/ethernet/qlogic/qed/qed_ll2.c     |   31 +-

> >   drivers/net/ethernet/qlogic/qed/qed_mcp.c     |    3 +

> >   drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |    3 +-

> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |  868 +++++++++++

> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  114 ++

> >   .../qlogic/qed/qed_nvmetcp_fw_funcs.c         |  372 +++++

> >   .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |   43 +

> >   .../qlogic/qed/qed_nvmetcp_ip_services.c      |  239 +++

> >   drivers/net/ethernet/qlogic/qed/qed_ooo.c     |    5 +-

> >   drivers/net/ethernet/qlogic/qed/qed_sp.h      |    5 +

> >   .../net/ethernet/qlogic/qed/qed_sp_commands.c |    1 +

> >   drivers/nvme/Kconfig                          |    1 +

> >   drivers/nvme/Makefile                         |    1 +

> >   drivers/nvme/host/Kconfig                     |   16 +

> >   drivers/nvme/host/Makefile                    |    3 +

> >   drivers/nvme/host/fabrics.c                   |    7 -

> >   drivers/nvme/host/fabrics.h                   |    7 +

> >   drivers/nvme/host/tcp-offload.c               | 1330 +++++++++++++++++

> >   drivers/nvme/host/tcp-offload.h               |  209 +++

> >   drivers/nvme/hw/Kconfig                       |    9 +

> >   drivers/nvme/hw/Makefile                      |    3 +

> >   drivers/nvme/hw/qedn/Makefile                 |    4 +

> >   drivers/nvme/hw/qedn/qedn.h                   |  435 ++++++

> >   drivers/nvme/hw/qedn/qedn_conn.c              |  999 +++++++++++++

> >   drivers/nvme/hw/qedn/qedn_main.c              | 1153 ++++++++++++++

> >   drivers/nvme/hw/qedn/qedn_task.c              |  977 ++++++++++++

> >   include/linux/qed/common_hsi.h                |    1 +

> >   include/linux/qed/nvmetcp_common.h            |  616 ++++++++

> >   include/linux/qed/qed_if.h                    |   22 +

> >   include/linux/qed/qed_nvmetcp_if.h            |  244 +++

> >   .../linux/qed/qed_nvmetcp_ip_services_if.h    |   29 +

> >   39 files changed, 7947 insertions(+), 25 deletions(-)

> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c

> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h

> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c

> >   create mode 100644 drivers/nvme/host/tcp-offload.c

> >   create mode 100644 drivers/nvme/host/tcp-offload.h

> >   create mode 100644 drivers/nvme/hw/Kconfig

> >   create mode 100644 drivers/nvme/hw/Makefile

> >   create mode 100644 drivers/nvme/hw/qedn/Makefile

> >   create mode 100644 drivers/nvme/hw/qedn/qedn.h

> >   create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c

> >   create mode 100644 drivers/nvme/hw/qedn/qedn_main.c

> >   create mode 100644 drivers/nvme/hw/qedn/qedn_task.c

> >   create mode 100644 include/linux/qed/nvmetcp_common.h

> >   create mode 100644 include/linux/qed/qed_nvmetcp_if.h

> >   create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h

> >

> I would structure this patchset slightly different, in putting the

> NVMe-oF implementation at the start of the patchset; this will be where

> you get most of the comment, and any change there will potentially

> reflect back on the driver implementation, too.

>

> Something to consider for the next round.


Will do. Thanks.

>

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 3, 2021, 3:23 p.m. UTC | #16
On 5/1/21 7:50 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > This patch introduces the NVMeTCP device and PF level HSI and HSI

> > functionality in order to initialize and interact with the HW device.

> >

> > This patch is based on the qede, qedr, qedi, qedf drivers HSI.

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Dean Balandin <dbalandin@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > ---

> >   drivers/net/ethernet/qlogic/Kconfig           |   3 +

> >   drivers/net/ethernet/qlogic/qed/Makefile      |   2 +

> >   drivers/net/ethernet/qlogic/qed/qed.h         |   3 +

> >   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |   1 +

> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 282 ++++++++++++++++++

> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  51 ++++

> >   drivers/net/ethernet/qlogic/qed/qed_sp.h      |   2 +

> >   include/linux/qed/common_hsi.h                |   1 +

> >   include/linux/qed/nvmetcp_common.h            |  54 ++++

> >   include/linux/qed/qed_if.h                    |  22 ++

> >   include/linux/qed/qed_nvmetcp_if.h            |  72 +++++

> >   11 files changed, 493 insertions(+)

> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> >   create mode 100644 include/linux/qed/nvmetcp_common.h

> >   create mode 100644 include/linux/qed/qed_nvmetcp_if.h

> >

> > diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig

> > index 6b5ddb07ee83..98f430905ffa 100644

> > --- a/drivers/net/ethernet/qlogic/Kconfig

> > +++ b/drivers/net/ethernet/qlogic/Kconfig

> > @@ -110,6 +110,9 @@ config QED_RDMA

> >   config QED_ISCSI

> >       bool

> >

> > +config QED_NVMETCP

> > +     bool

> > +

> >   config QED_FCOE

> >       bool

> >

> > diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile

> > index 8251755ec18c..7cb0db67ba5b 100644

> > --- a/drivers/net/ethernet/qlogic/qed/Makefile

> > +++ b/drivers/net/ethernet/qlogic/qed/Makefile

> > @@ -28,6 +28,8 @@ qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o

> >   qed-$(CONFIG_QED_LL2) += qed_ll2.o

> >   qed-$(CONFIG_QED_OOO) += qed_ooo.o

> >

> > +qed-$(CONFIG_QED_NVMETCP) += qed_nvmetcp.o

> > +

> >   qed-$(CONFIG_QED_RDMA) +=   \

> >       qed_iwarp.o             \

> >       qed_rdma.o              \

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h

> > index a20cb8a0c377..91d4635009ab 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed.h

> > +++ b/drivers/net/ethernet/qlogic/qed/qed.h

> > @@ -240,6 +240,7 @@ enum QED_FEATURE {

> >       QED_VF,

> >       QED_RDMA_CNQ,

> >       QED_ISCSI_CQ,

> > +     QED_NVMETCP_CQ = QED_ISCSI_CQ,

> >       QED_FCOE_CQ,

> >       QED_VF_L2_QUE,

> >       QED_MAX_FEATURES,

> > @@ -592,6 +593,7 @@ struct qed_hwfn {

> >       struct qed_ooo_info             *p_ooo_info;

> >       struct qed_rdma_info            *p_rdma_info;

> >       struct qed_iscsi_info           *p_iscsi_info;

> > +     struct qed_nvmetcp_info         *p_nvmetcp_info;

> >       struct qed_fcoe_info            *p_fcoe_info;

> >       struct qed_pf_params            pf_params;

> >

> > @@ -828,6 +830,7 @@ struct qed_dev {

> >               struct qed_eth_cb_ops           *eth;

> >               struct qed_fcoe_cb_ops          *fcoe;

> >               struct qed_iscsi_cb_ops         *iscsi;

> > +             struct qed_nvmetcp_cb_ops       *nvmetcp;

> >       } protocol_ops;

> >       void                            *ops_cookie;

> >

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> > index 559df9f4d656..24472f6a83c2 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> > @@ -20,6 +20,7 @@

> >   #include <linux/qed/fcoe_common.h>

> >   #include <linux/qed/eth_common.h>

> >   #include <linux/qed/iscsi_common.h>

> > +#include <linux/qed/nvmetcp_common.h>

> >   #include <linux/qed/iwarp_common.h>

> >   #include <linux/qed/rdma_common.h>

> >   #include <linux/qed/roce_common.h>

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> > new file mode 100644

> > index 000000000000..da3b5002d216

> > --- /dev/null

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> > @@ -0,0 +1,282 @@

> > +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)

> > +/* Copyright 2021 Marvell. All rights reserved. */

> > +

> > +#include <linux/types.h>

> > +#include <asm/byteorder.h>

> > +#include <asm/param.h>

> > +#include <linux/delay.h>

> > +#include <linux/dma-mapping.h>

> > +#include <linux/etherdevice.h>

> > +#include <linux/kernel.h>

> > +#include <linux/log2.h>

> > +#include <linux/module.h>

> > +#include <linux/pci.h>

> > +#include <linux/stddef.h>

> > +#include <linux/string.h>

> > +#include <linux/errno.h>

> > +#include <linux/list.h>

> > +#include <linux/qed/qed_nvmetcp_if.h>

> > +#include "qed.h"

> > +#include "qed_cxt.h"

> > +#include "qed_dev_api.h"

> > +#include "qed_hsi.h"

> > +#include "qed_hw.h"

> > +#include "qed_int.h"

> > +#include "qed_nvmetcp.h"

> > +#include "qed_ll2.h"

> > +#include "qed_mcp.h"

> > +#include "qed_sp.h"

> > +#include "qed_reg_addr.h"

> > +

> > +static int qed_nvmetcp_async_event(struct qed_hwfn *p_hwfn, u8 fw_event_code,

> > +                                u16 echo, union event_ring_data *data,

> > +                                u8 fw_return_code)

> > +{

> > +     if (p_hwfn->p_nvmetcp_info->event_cb) {

> > +             struct qed_nvmetcp_info *p_nvmetcp = p_hwfn->p_nvmetcp_info;

> > +

> > +             return p_nvmetcp->event_cb(p_nvmetcp->event_context,

> > +                                      fw_event_code, data);

> > +     } else {

> > +             DP_NOTICE(p_hwfn, "nvmetcp async completion is not set\n");

> > +

> > +             return -EINVAL;

> > +     }

> > +}

> > +

> > +static int qed_sp_nvmetcp_func_start(struct qed_hwfn *p_hwfn,

> > +                                  enum spq_mode comp_mode,

> > +                                  struct qed_spq_comp_cb *p_comp_addr,

> > +                                  void *event_context,

> > +                                  nvmetcp_event_cb_t async_event_cb)

> > +{

> > +     struct nvmetcp_init_ramrod_params *p_ramrod = NULL;

> > +     struct qed_nvmetcp_pf_params *p_params = NULL;

> > +     struct scsi_init_func_queues *p_queue = NULL;

> > +     struct nvmetcp_spe_func_init *p_init = NULL;

> > +     struct qed_sp_init_data init_data = {};

> > +     struct qed_spq_entry *p_ent = NULL;

> > +     int rc = 0;

> > +     u16 val;

> > +     u8 i;

> > +

> > +     /* Get SPQ entry */

> > +     init_data.cid = qed_spq_get_cid(p_hwfn);

> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> > +     init_data.comp_mode = comp_mode;

> > +     init_data.p_comp_data = p_comp_addr;

> > +

> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,

> > +                              NVMETCP_RAMROD_CMD_ID_INIT_FUNC,

> > +                              PROTOCOLID_NVMETCP, &init_data);

> > +     if (rc)

> > +             return rc;

> > +

> > +     p_ramrod = &p_ent->ramrod.nvmetcp_init;

> > +     p_init = &p_ramrod->nvmetcp_init_spe;

> > +     p_params = &p_hwfn->pf_params.nvmetcp_pf_params;

> > +     p_queue = &p_init->q_params;

> > +

> > +     p_init->num_sq_pages_in_ring = p_params->num_sq_pages_in_ring;

> > +     p_init->num_r2tq_pages_in_ring = p_params->num_r2tq_pages_in_ring;

> > +     p_init->num_uhq_pages_in_ring = p_params->num_uhq_pages_in_ring;

> > +     p_init->ll2_rx_queue_id = RESC_START(p_hwfn, QED_LL2_RAM_QUEUE) +

> > +                                     p_params->ll2_ooo_queue_id;

> > +

> > +     SET_FIELD(p_init->flags, NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE, 1);

> > +

> > +     p_init->func_params.log_page_size = ilog2(PAGE_SIZE);

> > +     p_init->func_params.num_tasks = cpu_to_le16(p_params->num_tasks);

> > +     p_init->debug_flags = p_params->debug_mode;

> > +

> > +     DMA_REGPAIR_LE(p_queue->glbl_q_params_addr,

> > +                    p_params->glbl_q_params_addr);

> > +

> > +     p_queue->cq_num_entries = cpu_to_le16(QED_NVMETCP_FW_CQ_SIZE);

> > +     p_queue->num_queues = p_params->num_queues;

> > +     val = RESC_START(p_hwfn, QED_CMDQS_CQS);

> > +     p_queue->queue_relative_offset = cpu_to_le16((u16)val);

> > +     p_queue->cq_sb_pi = p_params->gl_rq_pi;

> > +

> > +     for (i = 0; i < p_params->num_queues; i++) {

> > +             val = qed_get_igu_sb_id(p_hwfn, i);

> > +             p_queue->cq_cmdq_sb_num_arr[i] = cpu_to_le16(val);

> > +     }

> > +

> > +     SET_FIELD(p_queue->q_validity,

> > +               SCSI_INIT_FUNC_QUEUES_CMD_VALID, 0);

> > +     p_queue->cmdq_num_entries = 0;

> > +     p_queue->bdq_resource_id = (u8)RESC_START(p_hwfn, QED_BDQ);

> > +

> > +     /* p_ramrod->tcp_init.min_rto = cpu_to_le16(p_params->min_rto); */

> > +     p_ramrod->tcp_init.two_msl_timer = cpu_to_le32(QED_TCP_TWO_MSL_TIMER);

> > +     p_ramrod->tcp_init.tx_sws_timer = cpu_to_le16(QED_TCP_SWS_TIMER);

> > +     p_init->half_way_close_timeout = cpu_to_le16(QED_TCP_HALF_WAY_CLOSE_TIMEOUT);

> > +     p_ramrod->tcp_init.max_fin_rt = QED_TCP_MAX_FIN_RT;

> > +

> > +     SET_FIELD(p_ramrod->nvmetcp_init_spe.params,

> > +               NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT, QED_TCP_MAX_FIN_RT);

> > +

> > +     p_hwfn->p_nvmetcp_info->event_context = event_context;

> > +     p_hwfn->p_nvmetcp_info->event_cb = async_event_cb;

> > +

> > +     qed_spq_register_async_cb(p_hwfn, PROTOCOLID_NVMETCP,

> > +                               qed_nvmetcp_async_event);

> > +

> > +     return qed_spq_post(p_hwfn, p_ent, NULL);

> > +}

> > +

> > +static int qed_sp_nvmetcp_func_stop(struct qed_hwfn *p_hwfn,

> > +                                 enum spq_mode comp_mode,

> > +                                 struct qed_spq_comp_cb *p_comp_addr)

> > +{

> > +     struct qed_spq_entry *p_ent = NULL;

> > +     struct qed_sp_init_data init_data;

> > +     int rc;

> > +

> > +     /* Get SPQ entry */

> > +     memset(&init_data, 0, sizeof(init_data));

> > +     init_data.cid = qed_spq_get_cid(p_hwfn);

> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> > +     init_data.comp_mode = comp_mode;

> > +     init_data.p_comp_data = p_comp_addr;

> > +

> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,

> > +                              NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC,

> > +                              PROTOCOLID_NVMETCP, &init_data);

> > +     if (rc)

> > +             return rc;

> > +

> > +     rc = qed_spq_post(p_hwfn, p_ent, NULL);

> > +

> > +     qed_spq_unregister_async_cb(p_hwfn, PROTOCOLID_NVMETCP);

> > +

> > +     return rc;

> > +}

> > +

> > +static int qed_fill_nvmetcp_dev_info(struct qed_dev *cdev,

> > +                                  struct qed_dev_nvmetcp_info *info)

> > +{

> > +     struct qed_hwfn *hwfn = QED_AFFIN_HWFN(cdev);

> > +     int rc;

> > +

> > +     memset(info, 0, sizeof(*info));

> > +     rc = qed_fill_dev_info(cdev, &info->common);

> > +

> > +     info->port_id = MFW_PORT(hwfn);

> > +     info->num_cqs = FEAT_NUM(hwfn, QED_NVMETCP_CQ);

> > +

> > +     return rc;

> > +}

> > +

> > +static void qed_register_nvmetcp_ops(struct qed_dev *cdev,

> > +                                  struct qed_nvmetcp_cb_ops *ops,

> > +                                  void *cookie)

> > +{

> > +     cdev->protocol_ops.nvmetcp = ops;

> > +     cdev->ops_cookie = cookie;

> > +}

> > +

> > +static int qed_nvmetcp_stop(struct qed_dev *cdev)

> > +{

> > +     int rc;

> > +

> > +     if (!(cdev->flags & QED_FLAG_STORAGE_STARTED)) {

> > +             DP_NOTICE(cdev, "nvmetcp already stopped\n");

> > +

> > +             return 0;

> > +     }

> > +

> > +     if (!hash_empty(cdev->connections)) {

> > +             DP_NOTICE(cdev,

> > +                       "Can't stop nvmetcp - not all connections were returned\n");

> > +

> > +             return -EINVAL;

> > +     }

> > +

> > +     /* Stop the nvmetcp */

> > +     rc = qed_sp_nvmetcp_func_stop(QED_AFFIN_HWFN(cdev), QED_SPQ_MODE_EBLOCK,

> > +                                   NULL);

> > +     cdev->flags &= ~QED_FLAG_STORAGE_STARTED;

> > +

> > +     return rc;

> > +}

> > +

> > +static int qed_nvmetcp_start(struct qed_dev *cdev,

> > +                          struct qed_nvmetcp_tid *tasks,

> > +                          void *event_context,

> > +                          nvmetcp_event_cb_t async_event_cb)

> > +{

> > +     struct qed_tid_mem *tid_info;

> > +     int rc;

> > +

> > +     if (cdev->flags & QED_FLAG_STORAGE_STARTED) {

> > +             DP_NOTICE(cdev, "nvmetcp already started;\n");

> > +

> > +             return 0;

> > +     }

> > +

> > +     rc = qed_sp_nvmetcp_func_start(QED_AFFIN_HWFN(cdev),

> > +                                    QED_SPQ_MODE_EBLOCK, NULL,

> > +                                    event_context, async_event_cb);

> > +     if (rc) {

> > +             DP_NOTICE(cdev, "Failed to start nvmetcp\n");

> > +

> > +             return rc;

> > +     }

> > +

> > +     cdev->flags |= QED_FLAG_STORAGE_STARTED;

> > +     hash_init(cdev->connections);

> > +

> > +     if (!tasks)

> > +             return 0;

> > +

> > +     tid_info = kzalloc(sizeof(*tid_info), GFP_KERNEL);

> > +

> > +     if (!tid_info) {

> > +             qed_nvmetcp_stop(cdev);

> > +

> > +             return -ENOMEM;

> > +     }

> > +

> > +     rc = qed_cxt_get_tid_mem_info(QED_AFFIN_HWFN(cdev), tid_info);

> > +     if (rc) {

> > +             DP_NOTICE(cdev, "Failed to gather task information\n");

> > +             qed_nvmetcp_stop(cdev);

> > +             kfree(tid_info);

> > +

> > +             return rc;

> > +     }

> > +

> > +     /* Fill task information */

> > +     tasks->size = tid_info->tid_size;

> > +     tasks->num_tids_per_block = tid_info->num_tids_per_block;

> > +     memcpy(tasks->blocks, tid_info->blocks,

> > +            MAX_TID_BLOCKS_NVMETCP * sizeof(u8 *));

> > +

> > +     kfree(tid_info);

> > +

> > +     return 0;

> > +}

> > +

> > +static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {

> > +     .common = &qed_common_ops_pass,

> > +     .ll2 = &qed_ll2_ops_pass,

> > +     .fill_dev_info = &qed_fill_nvmetcp_dev_info,

> > +     .register_ops = &qed_register_nvmetcp_ops,

> > +     .start = &qed_nvmetcp_start,

> > +     .stop = &qed_nvmetcp_stop,

> > +

> > +     /* Placeholder - Connection level ops */

> > +};

> > +

> > +const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)

> > +{

> > +     return &qed_nvmetcp_ops_pass;

> > +}

> > +EXPORT_SYMBOL(qed_get_nvmetcp_ops);

> > +

> > +void qed_put_nvmetcp_ops(void)

> > +{

> > +}

> > +EXPORT_SYMBOL(qed_put_nvmetcp_ops);

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> > new file mode 100644

> > index 000000000000..774b46ade408

> > --- /dev/null

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> > @@ -0,0 +1,51 @@

> > +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */

> > +/* Copyright 2021 Marvell. All rights reserved. */

> > +

> > +#ifndef _QED_NVMETCP_H

> > +#define _QED_NVMETCP_H

> > +

> > +#include <linux/types.h>

> > +#include <linux/list.h>

> > +#include <linux/slab.h>

> > +#include <linux/spinlock.h>

> > +#include <linux/qed/tcp_common.h>

> > +#include <linux/qed/qed_nvmetcp_if.h>

> > +#include <linux/qed/qed_chain.h>

> > +#include "qed.h"

> > +#include "qed_hsi.h"

> > +#include "qed_mcp.h"

> > +#include "qed_sp.h"

> > +

> > +#define QED_NVMETCP_FW_CQ_SIZE (4 * 1024)

> > +

> > +/* tcp parameters */

> > +#define QED_TCP_TWO_MSL_TIMER 4000

> > +#define QED_TCP_HALF_WAY_CLOSE_TIMEOUT 10

> > +#define QED_TCP_MAX_FIN_RT 2

> > +#define QED_TCP_SWS_TIMER 5000

> > +

> > +struct qed_nvmetcp_info {

> > +     spinlock_t lock; /* Connection resources. */

> > +     struct list_head free_list;

> > +     u16 max_num_outstanding_tasks;

> > +     void *event_context;

> > +     nvmetcp_event_cb_t event_cb;

> > +};

> > +

> > +#if IS_ENABLED(CONFIG_QED_NVMETCP)

> > +int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn);

> > +void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn);

> > +void qed_nvmetcp_free(struct qed_hwfn *p_hwfn);

> > +

> > +#else /* IS_ENABLED(CONFIG_QED_NVMETCP) */

> > +static inline int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn)

> > +{

> > +     return -EINVAL;

> > +}

> > +

> > +static inline void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn) {}

> > +static inline void qed_nvmetcp_free(struct qed_hwfn *p_hwfn) {}

> > +

> > +#endif /* IS_ENABLED(CONFIG_QED_NVMETCP) */

> > +

> > +#endif

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h

> > index 993f1357b6fc..525159e747a5 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_sp.h

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h

> > @@ -100,6 +100,8 @@ union ramrod_data {

> >       struct iscsi_spe_conn_mac_update iscsi_conn_mac_update;

> >       struct iscsi_spe_conn_termination iscsi_conn_terminate;

> >

> > +     struct nvmetcp_init_ramrod_params nvmetcp_init;

> > +

> >       struct vf_start_ramrod_data vf_start;

> >       struct vf_stop_ramrod_data vf_stop;

> >   };

> > diff --git a/include/linux/qed/common_hsi.h b/include/linux/qed/common_hsi.h

> > index 977807e1be53..59c5e5866607 100644

> > --- a/include/linux/qed/common_hsi.h

> > +++ b/include/linux/qed/common_hsi.h

> > @@ -703,6 +703,7 @@ enum mf_mode {

> >   /* Per-protocol connection types */

> >   enum protocol_type {

> >       PROTOCOLID_ISCSI,

> > +     PROTOCOLID_NVMETCP = PROTOCOLID_ISCSI,

> >       PROTOCOLID_FCOE,

> >       PROTOCOLID_ROCE,

> >       PROTOCOLID_CORE,

>

> Why not a separate Protocol ID?

> Don't you expect iSCSI and NVMe-TCP to be run at the same time?


PROTOCOLID determines the FW resource layout, which is the same for iSCSI
and NVMeTCP.
I will change PROTOCOLID_NVMETCP and PROTOCOLID_ISCSI to
PROTOCOLID_TCP_ULP.
iSCSI and NVMeTCP can run concurrently on the device, but not on the same PF.
Both iSCSI and NVMeTCP PFs will use PROTOCOLID_TCP_ULP

>

> > diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h

> > new file mode 100644

> > index 000000000000..e9ccfc07041d

> > --- /dev/null

> > +++ b/include/linux/qed/nvmetcp_common.h

> > @@ -0,0 +1,54 @@

> > +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */

> > +/* Copyright 2021 Marvell. All rights reserved. */

> > +

> > +#ifndef __NVMETCP_COMMON__

> > +#define __NVMETCP_COMMON__

> > +

> > +#include "tcp_common.h"

> > +

> > +/* NVMeTCP firmware function init parameters */

> > +struct nvmetcp_spe_func_init {

> > +     __le16 half_way_close_timeout;

> > +     u8 num_sq_pages_in_ring;

> > +     u8 num_r2tq_pages_in_ring;

> > +     u8 num_uhq_pages_in_ring;

> > +     u8 ll2_rx_queue_id;

> > +     u8 flags;

> > +#define NVMETCP_SPE_FUNC_INIT_COUNTERS_EN_MASK 0x1

> > +#define NVMETCP_SPE_FUNC_INIT_COUNTERS_EN_SHIFT 0

> > +#define NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE_MASK 0x1

> > +#define NVMETCP_SPE_FUNC_INIT_NVMETCP_MODE_SHIFT 1

> > +#define NVMETCP_SPE_FUNC_INIT_RESERVED0_MASK 0x3F

> > +#define NVMETCP_SPE_FUNC_INIT_RESERVED0_SHIFT 2

> > +     u8 debug_flags;

> > +     __le16 reserved1;

> > +     u8 params;

> > +#define NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT_MASK        0xF

> > +#define NVMETCP_SPE_FUNC_INIT_MAX_SYN_RT_SHIFT       0

> > +#define NVMETCP_SPE_FUNC_INIT_RESERVED1_MASK 0xF

> > +#define NVMETCP_SPE_FUNC_INIT_RESERVED1_SHIFT        4

> > +     u8 reserved2[5];

> > +     struct scsi_init_func_params func_params;

> > +     struct scsi_init_func_queues q_params;

> > +};

> > +

> > +/* NVMeTCP init params passed by driver to FW in NVMeTCP init ramrod. */

> > +struct nvmetcp_init_ramrod_params {

> > +     struct nvmetcp_spe_func_init nvmetcp_init_spe;

> > +     struct tcp_init_params tcp_init;

> > +};

> > +

> > +/* NVMeTCP Ramrod Command IDs */

> > +enum nvmetcp_ramrod_cmd_id {

> > +     NVMETCP_RAMROD_CMD_ID_UNUSED = 0,

> > +     NVMETCP_RAMROD_CMD_ID_INIT_FUNC = 1,

> > +     NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC = 2,

> > +     MAX_NVMETCP_RAMROD_CMD_ID

> > +};

> > +

> > +struct nvmetcp_glbl_queue_entry {

> > +     struct regpair cq_pbl_addr;

> > +     struct regpair reserved;

> > +};

> > +

> > +#endif /* __NVMETCP_COMMON__ */

> > diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h

> > index 68d17a4fbf20..524f57821ba2 100644

> > --- a/include/linux/qed/qed_if.h

> > +++ b/include/linux/qed/qed_if.h

> > @@ -542,6 +542,26 @@ struct qed_iscsi_pf_params {

> >       u8 bdq_pbl_num_entries[3];

> >   };

> >

> > +struct qed_nvmetcp_pf_params {

> > +     u64 glbl_q_params_addr;

> > +     u16 cq_num_entries;

> > +

> > +     u16 num_cons;

> > +     u16 num_tasks;

> > +

> > +     u8 num_sq_pages_in_ring;

> > +     u8 num_r2tq_pages_in_ring;

> > +     u8 num_uhq_pages_in_ring;

> > +

> > +     u8 num_queues;

> > +     u8 gl_rq_pi;

> > +     u8 gl_cmd_pi;

> > +     u8 debug_mode;

> > +     u8 ll2_ooo_queue_id;

> > +

> > +     u16 min_rto;

> > +};

> > +

> >   struct qed_rdma_pf_params {

> >       /* Supplied to QED during resource allocation (may affect the ILT and

> >        * the doorbell BAR).

> > @@ -560,6 +580,7 @@ struct qed_pf_params {

> >       struct qed_eth_pf_params eth_pf_params;

> >       struct qed_fcoe_pf_params fcoe_pf_params;

> >       struct qed_iscsi_pf_params iscsi_pf_params;

> > +     struct qed_nvmetcp_pf_params nvmetcp_pf_params;

> >       struct qed_rdma_pf_params rdma_pf_params;

> >   };

> >

> > @@ -662,6 +683,7 @@ enum qed_sb_type {

> >   enum qed_protocol {

> >       QED_PROTOCOL_ETH,

> >       QED_PROTOCOL_ISCSI,

> > +     QED_PROTOCOL_NVMETCP = QED_PROTOCOL_ISCSI,

> >       QED_PROTOCOL_FCOE,

> >   };

> >

> > diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h

> > new file mode 100644

> > index 000000000000..abc1f41862e3

> > --- /dev/null

> > +++ b/include/linux/qed/qed_nvmetcp_if.h

> > @@ -0,0 +1,72 @@

> > +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */

> > +/* Copyright 2021 Marvell. All rights reserved. */

> > +

> > +#ifndef _QED_NVMETCP_IF_H

> > +#define _QED_NVMETCP_IF_H

> > +#include <linux/types.h>

> > +#include <linux/qed/qed_if.h>

> > +

> > +#define QED_NVMETCP_MAX_IO_SIZE      0x800000

> > +

> > +typedef int (*nvmetcp_event_cb_t) (void *context,

> > +                                u8 fw_event_code, void *fw_handle);

> > +

> > +struct qed_dev_nvmetcp_info {

> > +     struct qed_dev_info common;

> > +

> > +     u8 port_id;  /* Physical port */

> > +     u8 num_cqs;

> > +};

> > +

> > +#define MAX_TID_BLOCKS_NVMETCP (512)

> > +struct qed_nvmetcp_tid {

> > +     u32 size;               /* In bytes per task */

> > +     u32 num_tids_per_block;

> > +     u8 *blocks[MAX_TID_BLOCKS_NVMETCP];

> > +};

> > +

> > +struct qed_nvmetcp_cb_ops {

> > +     struct qed_common_cb_ops common;

> > +};

> > +

> > +/**

> > + * struct qed_nvmetcp_ops - qed NVMeTCP operations.

> > + * @common:          common operations pointer

> > + * @ll2:             light L2 operations pointer

> > + * @fill_dev_info:   fills NVMeTCP specific information

> > + *                   @param cdev

> > + *                   @param info

> > + *                   @return 0 on success, otherwise error value.

> > + * @register_ops:    register nvmetcp operations

> > + *                   @param cdev

> > + *                   @param ops - specified using qed_nvmetcp_cb_ops

> > + *                   @param cookie - driver private

> > + * @start:           nvmetcp in FW

> > + *                   @param cdev

> > + *                   @param tasks - qed will fill information about tasks

> > + *                   return 0 on success, otherwise error value.

> > + * @stop:            nvmetcp in FW

> > + *                   @param cdev

> > + *                   return 0 on success, otherwise error value.

> > + */

> > +struct qed_nvmetcp_ops {

> > +     const struct qed_common_ops *common;

> > +

> > +     const struct qed_ll2_ops *ll2;

> > +

> > +     int (*fill_dev_info)(struct qed_dev *cdev,

> > +                          struct qed_dev_nvmetcp_info *info);

> > +

> > +     void (*register_ops)(struct qed_dev *cdev,

> > +                          struct qed_nvmetcp_cb_ops *ops, void *cookie);

> > +

> > +     int (*start)(struct qed_dev *cdev,

> > +                  struct qed_nvmetcp_tid *tasks,

> > +                  void *event_context, nvmetcp_event_cb_t async_event_cb);

> > +

> > +     int (*stop)(struct qed_dev *cdev);

> > +};

> > +

> > +const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);

> > +void qed_put_nvmetcp_ops(void);

> > +#endif

> >

> As mentioned, please rearrange the patchset to have the NVMe-TCP patches

> first, then the driver specific bits.


Sure.

>

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 3, 2021, 3:25 p.m. UTC | #17
On 5/1/21 8:28 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > This patch introduces the NVMeTCP HSI and HSI functionality in order to

> > initialize and interact with the HW device as part of the connection level

> > HSI.

> >

> > This includes:

> > - Connection offload: offload a TCP connection to the FW.

> > - Connection update: update the ICReq-ICResp params

> > - Connection clear SQ: outstanding IOs FW flush.

> > - Connection termination: terminate the TCP connection and flush the FW.

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > ---

> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c | 580 +++++++++++++++++-

> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  63 ++

> >   drivers/net/ethernet/qlogic/qed/qed_sp.h      |   3 +

> >   include/linux/qed/nvmetcp_common.h            | 143 +++++

> >   include/linux/qed/qed_nvmetcp_if.h            |  94 +++

> >   5 files changed, 881 insertions(+), 2 deletions(-)

> >

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> > index da3b5002d216..79bd1cc6677f 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> > @@ -259,6 +259,578 @@ static int qed_nvmetcp_start(struct qed_dev *cdev,

> >       return 0;

> >   }

> >

> > +static struct qed_hash_nvmetcp_con *qed_nvmetcp_get_hash(struct qed_dev *cdev,

> > +                                                      u32 handle)

> > +{

> > +     struct qed_hash_nvmetcp_con *hash_con = NULL;

> > +

> > +     if (!(cdev->flags & QED_FLAG_STORAGE_STARTED))

> > +             return NULL;

> > +

> > +     hash_for_each_possible(cdev->connections, hash_con, node, handle) {

> > +             if (hash_con->con->icid == handle)

> > +                     break;

> > +     }

> > +

> > +     if (!hash_con || hash_con->con->icid != handle)

> > +             return NULL;

> > +

> > +     return hash_con;

> > +}

> > +

> > +static int qed_sp_nvmetcp_conn_offload(struct qed_hwfn *p_hwfn,

> > +                                    struct qed_nvmetcp_conn *p_conn,

> > +                                    enum spq_mode comp_mode,

> > +                                    struct qed_spq_comp_cb *p_comp_addr)

> > +{

> > +     struct nvmetcp_spe_conn_offload *p_ramrod = NULL;

> > +     struct tcp_offload_params_opt2 *p_tcp2 = NULL;

> > +     struct qed_sp_init_data init_data = { 0 };

> > +     struct qed_spq_entry *p_ent = NULL;

> > +     dma_addr_t r2tq_pbl_addr;

> > +     dma_addr_t xhq_pbl_addr;

> > +     dma_addr_t uhq_pbl_addr;

> > +     u16 physical_q;

> > +     int rc = 0;

> > +     u32 dval;

> > +     u8 i;

> > +

> > +     /* Get SPQ entry */

> > +     init_data.cid = p_conn->icid;

> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> > +     init_data.comp_mode = comp_mode;

> > +     init_data.p_comp_data = p_comp_addr;

> > +

> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,

> > +                              NVMETCP_RAMROD_CMD_ID_OFFLOAD_CONN,

> > +                              PROTOCOLID_NVMETCP, &init_data);

> > +     if (rc)

> > +             return rc;

> > +

> > +     p_ramrod = &p_ent->ramrod.nvmetcp_conn_offload;

> > +

> > +     /* Transmission PQ is the first of the PF */

> > +     physical_q = qed_get_cm_pq_idx(p_hwfn, PQ_FLAGS_OFLD);

> > +     p_conn->physical_q0 = cpu_to_le16(physical_q);

> > +     p_ramrod->nvmetcp.physical_q0 = cpu_to_le16(physical_q);

> > +

> > +     /* nvmetcp Pure-ACK PQ */

> > +     physical_q = qed_get_cm_pq_idx(p_hwfn, PQ_FLAGS_ACK);

> > +     p_conn->physical_q1 = cpu_to_le16(physical_q);

> > +     p_ramrod->nvmetcp.physical_q1 = cpu_to_le16(physical_q);

> > +

> > +     p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);

> > +

> > +     DMA_REGPAIR_LE(p_ramrod->nvmetcp.sq_pbl_addr, p_conn->sq_pbl_addr);

> > +

> > +     r2tq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->r2tq);

> > +     DMA_REGPAIR_LE(p_ramrod->nvmetcp.r2tq_pbl_addr, r2tq_pbl_addr);

> > +

> > +     xhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->xhq);

> > +     DMA_REGPAIR_LE(p_ramrod->nvmetcp.xhq_pbl_addr, xhq_pbl_addr);

> > +

> > +     uhq_pbl_addr = qed_chain_get_pbl_phys(&p_conn->uhq);

> > +     DMA_REGPAIR_LE(p_ramrod->nvmetcp.uhq_pbl_addr, uhq_pbl_addr);

> > +

> > +     p_ramrod->nvmetcp.flags = p_conn->offl_flags;

> > +     p_ramrod->nvmetcp.default_cq = p_conn->default_cq;

> > +     p_ramrod->nvmetcp.initial_ack = 0;

> > +

> > +     DMA_REGPAIR_LE(p_ramrod->nvmetcp.nvmetcp.cccid_itid_table_addr,

> > +                    p_conn->nvmetcp_cccid_itid_table_addr);

> > +     p_ramrod->nvmetcp.nvmetcp.cccid_max_range =

> > +              cpu_to_le16(p_conn->nvmetcp_cccid_max_range);

> > +

> > +     p_tcp2 = &p_ramrod->tcp;

> > +

> > +     qed_set_fw_mac_addr(&p_tcp2->remote_mac_addr_hi,

> > +                         &p_tcp2->remote_mac_addr_mid,

> > +                         &p_tcp2->remote_mac_addr_lo, p_conn->remote_mac);

> > +     qed_set_fw_mac_addr(&p_tcp2->local_mac_addr_hi,

> > +                         &p_tcp2->local_mac_addr_mid,

> > +                         &p_tcp2->local_mac_addr_lo, p_conn->local_mac);

> > +

> > +     p_tcp2->vlan_id = cpu_to_le16(p_conn->vlan_id);

> > +     p_tcp2->flags = cpu_to_le16(p_conn->tcp_flags);

> > +

> > +     p_tcp2->ip_version = p_conn->ip_version;

> > +     for (i = 0; i < 4; i++) {

> > +             dval = p_conn->remote_ip[i];

> > +             p_tcp2->remote_ip[i] = cpu_to_le32(dval);

> > +             dval = p_conn->local_ip[i];

> > +             p_tcp2->local_ip[i] = cpu_to_le32(dval);

> > +     }

> > +

>

> What is this?

> Some convoluted way of assigning the IP address in little endian?

> Pointless if it's IPv4, as then each bit is just one byte.

> And if it's for IPv6, what do you do for IPv4?

> And isn't there a helper for it?


Endianity here is only for BE machines.
I haven't found a relevant helper function,
Will re-write to have cleaner implementation separately for ipv4 and ipv6.

>

> > +     p_tcp2->flow_label = cpu_to_le32(p_conn->flow_label);

> > +     p_tcp2->ttl = p_conn->ttl;

> > +     p_tcp2->tos_or_tc = p_conn->tos_or_tc;

> > +     p_tcp2->remote_port = cpu_to_le16(p_conn->remote_port);

> > +     p_tcp2->local_port = cpu_to_le16(p_conn->local_port);

> > +     p_tcp2->mss = cpu_to_le16(p_conn->mss);

> > +     p_tcp2->rcv_wnd_scale = p_conn->rcv_wnd_scale;

> > +     p_tcp2->connect_mode = p_conn->connect_mode;

> > +     p_tcp2->cwnd = cpu_to_le32(p_conn->cwnd);

> > +     p_tcp2->ka_max_probe_cnt = p_conn->ka_max_probe_cnt;

> > +     p_tcp2->ka_timeout = cpu_to_le32(p_conn->ka_timeout);

> > +     p_tcp2->max_rt_time = cpu_to_le32(p_conn->max_rt_time);

> > +     p_tcp2->ka_interval = cpu_to_le32(p_conn->ka_interval);

> > +

> > +     return qed_spq_post(p_hwfn, p_ent, NULL);

> > +}

> > +

> > +static int qed_sp_nvmetcp_conn_update(struct qed_hwfn *p_hwfn,

> > +                                   struct qed_nvmetcp_conn *p_conn,

> > +                                   enum spq_mode comp_mode,

> > +                                   struct qed_spq_comp_cb *p_comp_addr)

> > +{

> > +     struct nvmetcp_conn_update_ramrod_params *p_ramrod = NULL;

> > +     struct qed_spq_entry *p_ent = NULL;

> > +     struct qed_sp_init_data init_data;

> > +     int rc = -EINVAL;

> > +     u32 dval;

> > +

> > +     /* Get SPQ entry */

> > +     memset(&init_data, 0, sizeof(init_data));

> > +     init_data.cid = p_conn->icid;

> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> > +     init_data.comp_mode = comp_mode;

> > +     init_data.p_comp_data = p_comp_addr;

> > +

> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,

> > +                              NVMETCP_RAMROD_CMD_ID_UPDATE_CONN,

> > +                              PROTOCOLID_NVMETCP, &init_data);

> > +     if (rc)

> > +             return rc;

> > +

> > +     p_ramrod = &p_ent->ramrod.nvmetcp_conn_update;

> > +     p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);

> > +     p_ramrod->flags = p_conn->update_flag;

> > +     p_ramrod->max_seq_size = cpu_to_le32(p_conn->max_seq_size);

> > +     dval = p_conn->max_recv_pdu_length;

> > +     p_ramrod->max_recv_pdu_length = cpu_to_le32(dval);

> > +     dval = p_conn->max_send_pdu_length;

> > +     p_ramrod->max_send_pdu_length = cpu_to_le32(dval);

> > +     dval = p_conn->first_seq_length;

> > +     p_ramrod->first_seq_length = cpu_to_le32(dval);

> > +

> > +     return qed_spq_post(p_hwfn, p_ent, NULL);

> > +}

> > +

> > +static int qed_sp_nvmetcp_conn_terminate(struct qed_hwfn *p_hwfn,

> > +                                      struct qed_nvmetcp_conn *p_conn,

> > +                                      enum spq_mode comp_mode,

> > +                                      struct qed_spq_comp_cb *p_comp_addr)

> > +{

> > +     struct nvmetcp_spe_conn_termination *p_ramrod = NULL;

> > +     struct qed_spq_entry *p_ent = NULL;

> > +     struct qed_sp_init_data init_data;

> > +     int rc = -EINVAL;

> > +

> > +     /* Get SPQ entry */

> > +     memset(&init_data, 0, sizeof(init_data));

> > +     init_data.cid = p_conn->icid;

> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> > +     init_data.comp_mode = comp_mode;

> > +     init_data.p_comp_data = p_comp_addr;

> > +

> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,

> > +                              NVMETCP_RAMROD_CMD_ID_TERMINATION_CONN,

> > +                              PROTOCOLID_NVMETCP, &init_data);

> > +     if (rc)

> > +             return rc;

> > +

> > +     p_ramrod = &p_ent->ramrod.nvmetcp_conn_terminate;

> > +     p_ramrod->conn_id = cpu_to_le16(p_conn->conn_id);

> > +     p_ramrod->abortive = p_conn->abortive_dsconnect;

> > +

> > +     return qed_spq_post(p_hwfn, p_ent, NULL);

> > +}

> > +

> > +static int qed_sp_nvmetcp_conn_clear_sq(struct qed_hwfn *p_hwfn,

> > +                                     struct qed_nvmetcp_conn *p_conn,

> > +                                     enum spq_mode comp_mode,

> > +                                     struct qed_spq_comp_cb *p_comp_addr)

> > +{

> > +     struct qed_spq_entry *p_ent = NULL;

> > +     struct qed_sp_init_data init_data;

> > +     int rc = -EINVAL;

> > +

> > +     /* Get SPQ entry */

> > +     memset(&init_data, 0, sizeof(init_data));

> > +     init_data.cid = p_conn->icid;

> > +     init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;

> > +     init_data.comp_mode = comp_mode;

> > +     init_data.p_comp_data = p_comp_addr;

> > +

> > +     rc = qed_sp_init_request(p_hwfn, &p_ent,

> > +                              NVMETCP_RAMROD_CMD_ID_CLEAR_SQ,

> > +                              PROTOCOLID_NVMETCP, &init_data);

> > +     if (rc)

> > +             return rc;

> > +

> > +     return qed_spq_post(p_hwfn, p_ent, NULL);

> > +}

> > +

> > +static void __iomem *qed_nvmetcp_get_db_addr(struct qed_hwfn *p_hwfn, u32 cid)

> > +{

> > +     return (u8 __iomem *)p_hwfn->doorbells +

> > +                          qed_db_addr(cid, DQ_DEMS_LEGACY);

> > +}

> > +

> > +static int qed_nvmetcp_allocate_connection(struct qed_hwfn *p_hwfn,

> > +                                        struct qed_nvmetcp_conn **p_out_conn)

> > +{

> > +     struct qed_chain_init_params params = {

> > +             .mode           = QED_CHAIN_MODE_PBL,

> > +             .intended_use   = QED_CHAIN_USE_TO_CONSUME_PRODUCE,

> > +             .cnt_type       = QED_CHAIN_CNT_TYPE_U16,

> > +     };

> > +     struct qed_nvmetcp_pf_params *p_params = NULL;

> > +     struct qed_nvmetcp_conn *p_conn = NULL;

> > +     int rc = 0;

> > +

> > +     /* Try finding a free connection that can be used */

> > +     spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);

> > +     if (!list_empty(&p_hwfn->p_nvmetcp_info->free_list))

> > +             p_conn = list_first_entry(&p_hwfn->p_nvmetcp_info->free_list,

> > +                                       struct qed_nvmetcp_conn, list_entry);

> > +     if (p_conn) {

> > +             list_del(&p_conn->list_entry);

> > +             spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);

> > +             *p_out_conn = p_conn;

> > +

> > +             return 0;

> > +     }

> > +     spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);

> > +

> > +     /* Need to allocate a new connection */

> > +     p_params = &p_hwfn->pf_params.nvmetcp_pf_params;

> > +

> > +     p_conn = kzalloc(sizeof(*p_conn), GFP_KERNEL);

> > +     if (!p_conn)

> > +             return -ENOMEM;

> > +

> > +     params.num_elems = p_params->num_r2tq_pages_in_ring *

> > +                        QED_CHAIN_PAGE_SIZE / sizeof(struct nvmetcp_wqe);

> > +     params.elem_size = sizeof(struct nvmetcp_wqe);

> > +

> > +     rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->r2tq, &params);

> > +     if (rc)

> > +             goto nomem_r2tq;

> > +

> > +     params.num_elems = p_params->num_uhq_pages_in_ring *

> > +                        QED_CHAIN_PAGE_SIZE / sizeof(struct iscsi_uhqe);

> > +     params.elem_size = sizeof(struct iscsi_uhqe);

> > +

> > +     rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->uhq, &params);

> > +     if (rc)

> > +             goto nomem_uhq;

> > +

> > +     params.elem_size = sizeof(struct iscsi_xhqe);

> > +

> > +     rc = qed_chain_alloc(p_hwfn->cdev, &p_conn->xhq, &params);

> > +     if (rc)

> > +             goto nomem;

> > +

> > +     p_conn->free_on_delete = true;

> > +     *p_out_conn = p_conn;

> > +

> > +     return 0;

> > +

> > +nomem:

> > +     qed_chain_free(p_hwfn->cdev, &p_conn->uhq);

> > +nomem_uhq:

> > +     qed_chain_free(p_hwfn->cdev, &p_conn->r2tq);

> > +nomem_r2tq:

> > +     kfree(p_conn);

> > +

> > +     return -ENOMEM;

> > +}

> > +

> > +static int qed_nvmetcp_acquire_connection(struct qed_hwfn *p_hwfn,

> > +                                       struct qed_nvmetcp_conn **p_out_conn)

> > +{

> > +     struct qed_nvmetcp_conn *p_conn = NULL;

> > +     int rc = 0;

> > +     u32 icid;

> > +

> > +     spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);

> > +     rc = qed_cxt_acquire_cid(p_hwfn, PROTOCOLID_NVMETCP, &icid);

> > +     spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);

> > +

> > +     if (rc)

> > +             return rc;

> > +

> > +     rc = qed_nvmetcp_allocate_connection(p_hwfn, &p_conn);

> > +     if (rc) {

> > +             spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);

> > +             qed_cxt_release_cid(p_hwfn, icid);

> > +             spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);

> > +

> > +             return rc;

> > +     }

> > +

> > +     p_conn->icid = icid;

> > +     p_conn->conn_id = (u16)icid;

> > +     p_conn->fw_cid = (p_hwfn->hw_info.opaque_fid << 16) | icid;

> > +     *p_out_conn = p_conn;

> > +

> > +     return rc;

> > +}

> > +

> > +static void qed_nvmetcp_release_connection(struct qed_hwfn *p_hwfn,

> > +                                        struct qed_nvmetcp_conn *p_conn)

> > +{

> > +     spin_lock_bh(&p_hwfn->p_nvmetcp_info->lock);

> > +     list_add_tail(&p_conn->list_entry, &p_hwfn->p_nvmetcp_info->free_list);

> > +     qed_cxt_release_cid(p_hwfn, p_conn->icid);

> > +     spin_unlock_bh(&p_hwfn->p_nvmetcp_info->lock);

> > +}

> > +

> > +static void qed_nvmetcp_free_connection(struct qed_hwfn *p_hwfn,

> > +                                     struct qed_nvmetcp_conn *p_conn)

> > +{

> > +     qed_chain_free(p_hwfn->cdev, &p_conn->xhq);

> > +     qed_chain_free(p_hwfn->cdev, &p_conn->uhq);

> > +     qed_chain_free(p_hwfn->cdev, &p_conn->r2tq);

> > +

> > +     kfree(p_conn);

> > +}

> > +

> > +int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn)

> > +{

> > +     struct qed_nvmetcp_info *p_nvmetcp_info;

> > +

> > +     p_nvmetcp_info = kzalloc(sizeof(*p_nvmetcp_info), GFP_KERNEL);

> > +     if (!p_nvmetcp_info)

> > +             return -ENOMEM;

> > +

> > +     INIT_LIST_HEAD(&p_nvmetcp_info->free_list);

> > +

> > +     p_hwfn->p_nvmetcp_info = p_nvmetcp_info;

> > +

> > +     return 0;

> > +}

> > +

> > +void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn)

> > +{

> > +     spin_lock_init(&p_hwfn->p_nvmetcp_info->lock);

> > +}

> > +

> > +void qed_nvmetcp_free(struct qed_hwfn *p_hwfn)

> > +{

> > +     struct qed_nvmetcp_conn *p_conn = NULL;

> > +

> > +     if (!p_hwfn->p_nvmetcp_info)

> > +             return;

> > +

> > +     while (!list_empty(&p_hwfn->p_nvmetcp_info->free_list)) {

> > +             p_conn = list_first_entry(&p_hwfn->p_nvmetcp_info->free_list,

> > +                                       struct qed_nvmetcp_conn, list_entry);

> > +             if (p_conn) {

> > +                     list_del(&p_conn->list_entry);

> > +                     qed_nvmetcp_free_connection(p_hwfn, p_conn);

> > +             }

> > +     }

> > +

> > +     kfree(p_hwfn->p_nvmetcp_info);

> > +     p_hwfn->p_nvmetcp_info = NULL;

> > +}

> > +

> > +static int qed_nvmetcp_acquire_conn(struct qed_dev *cdev,

> > +                                 u32 *handle,

> > +                                 u32 *fw_cid, void __iomem **p_doorbell)

> > +{

> > +     struct qed_hash_nvmetcp_con *hash_con;

> > +     int rc;

> > +

> > +     /* Allocate a hashed connection */

> > +     hash_con = kzalloc(sizeof(*hash_con), GFP_ATOMIC);

> > +     if (!hash_con)

> > +             return -ENOMEM;

> > +

> > +     /* Acquire the connection */

> > +     rc = qed_nvmetcp_acquire_connection(QED_AFFIN_HWFN(cdev),

> > +                                         &hash_con->con);

> > +     if (rc) {

> > +             DP_NOTICE(cdev, "Failed to acquire Connection\n");

> > +             kfree(hash_con);

> > +

> > +             return rc;

> > +     }

> > +

> > +     /* Added the connection to hash table */

> > +     *handle = hash_con->con->icid;

> > +     *fw_cid = hash_con->con->fw_cid;

> > +     hash_add(cdev->connections, &hash_con->node, *handle);

> > +

> > +     if (p_doorbell)

> > +             *p_doorbell = qed_nvmetcp_get_db_addr(QED_AFFIN_HWFN(cdev),

> > +                                                   *handle);

> > +

> > +     return 0;

> > +}

> > +

> > +static int qed_nvmetcp_release_conn(struct qed_dev *cdev, u32 handle)

> > +{

> > +     struct qed_hash_nvmetcp_con *hash_con;

> > +

> > +     hash_con = qed_nvmetcp_get_hash(cdev, handle);

> > +     if (!hash_con) {

> > +             DP_NOTICE(cdev, "Failed to find connection for handle %d\n",

> > +                       handle);

> > +

> > +             return -EINVAL;

> > +     }

> > +

> > +     hlist_del(&hash_con->node);

> > +     qed_nvmetcp_release_connection(QED_AFFIN_HWFN(cdev), hash_con->con);

> > +     kfree(hash_con);

> > +

> > +     return 0;

> > +}

> > +

> > +static int qed_nvmetcp_offload_conn(struct qed_dev *cdev, u32 handle,

> > +                                 struct qed_nvmetcp_params_offload *conn_info)

> > +{

> > +     struct qed_hash_nvmetcp_con *hash_con;

> > +     struct qed_nvmetcp_conn *con;

> > +

> > +     hash_con = qed_nvmetcp_get_hash(cdev, handle);

> > +     if (!hash_con) {

> > +             DP_NOTICE(cdev, "Failed to find connection for handle %d\n",

> > +                       handle);

> > +

> > +             return -EINVAL;

> > +     }

> > +

> > +     /* Update the connection with information from the params */

> > +     con = hash_con->con;

> > +

> > +     /* FW initializations */

> > +     con->layer_code = NVMETCP_SLOW_PATH_LAYER_CODE;

> > +     con->sq_pbl_addr = conn_info->sq_pbl_addr;

> > +     con->nvmetcp_cccid_max_range = conn_info->nvmetcp_cccid_max_range;

> > +     con->nvmetcp_cccid_itid_table_addr = conn_info->nvmetcp_cccid_itid_table_addr;

> > +     con->default_cq = conn_info->default_cq;

> > +

> > +     SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE, 0);

> > +     SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE, 1);

> > +     SET_FIELD(con->offl_flags, NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B, 1);

> > +

> > +     /* Networking and TCP stack initializations */

> > +     ether_addr_copy(con->local_mac, conn_info->src.mac);

> > +     ether_addr_copy(con->remote_mac, conn_info->dst.mac);

> > +     memcpy(con->local_ip, conn_info->src.ip, sizeof(con->local_ip));

> > +     memcpy(con->remote_ip, conn_info->dst.ip, sizeof(con->remote_ip));

> > +     con->local_port = conn_info->src.port;

> > +     con->remote_port = conn_info->dst.port;

> > +     con->vlan_id = conn_info->vlan_id;

> > +

> > +     if (conn_info->timestamp_en)

> > +             SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_TS_EN, 1);

> > +

> > +     if (conn_info->delayed_ack_en)

> > +             SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_DA_EN, 1);

> > +

> > +     if (conn_info->tcp_keep_alive_en)

> > +             SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_KA_EN, 1);

> > +

> > +     if (conn_info->ecn_en)

> > +             SET_FIELD(con->tcp_flags, TCP_OFFLOAD_PARAMS_OPT2_ECN_EN, 1);

> > +

> > +     con->ip_version = conn_info->ip_version;

> > +     con->flow_label = QED_TCP_FLOW_LABEL;

> > +     con->ka_max_probe_cnt = conn_info->ka_max_probe_cnt;

> > +     con->ka_timeout = conn_info->ka_timeout;

> > +     con->ka_interval = conn_info->ka_interval;

> > +     con->max_rt_time = conn_info->max_rt_time;

> > +     con->ttl = conn_info->ttl;

> > +     con->tos_or_tc = conn_info->tos_or_tc;

> > +     con->mss = conn_info->mss;

> > +     con->cwnd = conn_info->cwnd;

> > +     con->rcv_wnd_scale = conn_info->rcv_wnd_scale;

> > +     con->connect_mode = 0; /* TCP_CONNECT_ACTIVE */

> > +

> > +     return qed_sp_nvmetcp_conn_offload(QED_AFFIN_HWFN(cdev), con,

> > +                                      QED_SPQ_MODE_EBLOCK, NULL);

> > +}

> > +

> > +static int qed_nvmetcp_update_conn(struct qed_dev *cdev,

> > +                                u32 handle,

> > +                                struct qed_nvmetcp_params_update *conn_info)

> > +{

> > +     struct qed_hash_nvmetcp_con *hash_con;

> > +     struct qed_nvmetcp_conn *con;

> > +

> > +     hash_con = qed_nvmetcp_get_hash(cdev, handle);

> > +     if (!hash_con) {

> > +             DP_NOTICE(cdev, "Failed to find connection for handle %d\n",

> > +                       handle);

> > +

> > +             return -EINVAL;

> > +     }

> > +

> > +     /* Update the connection with information from the params */

> > +     con = hash_con->con;

> > +

> > +     SET_FIELD(con->update_flag,

> > +               ISCSI_CONN_UPDATE_RAMROD_PARAMS_INITIAL_R2T, 0);

> > +     SET_FIELD(con->update_flag,

> > +               ISCSI_CONN_UPDATE_RAMROD_PARAMS_IMMEDIATE_DATA, 1);

> > +

> > +     if (conn_info->hdr_digest_en)

> > +             SET_FIELD(con->update_flag, ISCSI_CONN_UPDATE_RAMROD_PARAMS_HD_EN, 1);

> > +

> > +     if (conn_info->data_digest_en)

> > +             SET_FIELD(con->update_flag, ISCSI_CONN_UPDATE_RAMROD_PARAMS_DD_EN, 1);

> > +

> > +     /* Placeholder - initialize pfv, cpda, hpda */

> > +

> > +     con->max_seq_size = conn_info->max_io_size;

> > +     con->max_recv_pdu_length = conn_info->max_recv_pdu_length;

> > +     con->max_send_pdu_length = conn_info->max_send_pdu_length;

> > +     con->first_seq_length = conn_info->max_io_size;

> > +

> > +     return qed_sp_nvmetcp_conn_update(QED_AFFIN_HWFN(cdev), con,

> > +                                     QED_SPQ_MODE_EBLOCK, NULL);

> > +}

> > +

> > +static int qed_nvmetcp_clear_conn_sq(struct qed_dev *cdev, u32 handle)

> > +{

> > +     struct qed_hash_nvmetcp_con *hash_con;

> > +

> > +     hash_con = qed_nvmetcp_get_hash(cdev, handle);

> > +     if (!hash_con) {

> > +             DP_NOTICE(cdev, "Failed to find connection for handle %d\n",

> > +                       handle);

> > +

> > +             return -EINVAL;

> > +     }

> > +

> > +     return qed_sp_nvmetcp_conn_clear_sq(QED_AFFIN_HWFN(cdev), hash_con->con,

> > +                                         QED_SPQ_MODE_EBLOCK, NULL);

> > +}

> > +

> > +static int qed_nvmetcp_destroy_conn(struct qed_dev *cdev,

> > +                                 u32 handle, u8 abrt_conn)

> > +{

> > +     struct qed_hash_nvmetcp_con *hash_con;

> > +

> > +     hash_con = qed_nvmetcp_get_hash(cdev, handle);

> > +     if (!hash_con) {

> > +             DP_NOTICE(cdev, "Failed to find connection for handle %d\n",

> > +                       handle);

> > +

> > +             return -EINVAL;

> > +     }

> > +

> > +     hash_con->con->abortive_dsconnect = abrt_conn;

> > +

> > +     return qed_sp_nvmetcp_conn_terminate(QED_AFFIN_HWFN(cdev), hash_con->con,

> > +                                        QED_SPQ_MODE_EBLOCK, NULL);

> > +}

> > +

> >   static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {

> >       .common = &qed_common_ops_pass,

> >       .ll2 = &qed_ll2_ops_pass,

> > @@ -266,8 +838,12 @@ static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {

> >       .register_ops = &qed_register_nvmetcp_ops,

> >       .start = &qed_nvmetcp_start,

> >       .stop = &qed_nvmetcp_stop,

> > -

> > -     /* Placeholder - Connection level ops */

> > +     .acquire_conn = &qed_nvmetcp_acquire_conn,

> > +     .release_conn = &qed_nvmetcp_release_conn,

> > +     .offload_conn = &qed_nvmetcp_offload_conn,

> > +     .update_conn = &qed_nvmetcp_update_conn,

> > +     .destroy_conn = &qed_nvmetcp_destroy_conn,

> > +     .clear_sq = &qed_nvmetcp_clear_conn_sq,

> >   };

> >

> >   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> > index 774b46ade408..749169f0bdb1 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h

> > @@ -19,6 +19,7 @@

> >   #define QED_NVMETCP_FW_CQ_SIZE (4 * 1024)

> >

> >   /* tcp parameters */

> > +#define QED_TCP_FLOW_LABEL 0

> >   #define QED_TCP_TWO_MSL_TIMER 4000

> >   #define QED_TCP_HALF_WAY_CLOSE_TIMEOUT 10

> >   #define QED_TCP_MAX_FIN_RT 2

> > @@ -32,6 +33,68 @@ struct qed_nvmetcp_info {

> >       nvmetcp_event_cb_t event_cb;

> >   };

> >

> > +struct qed_hash_nvmetcp_con {

> > +     struct hlist_node node;

> > +     struct qed_nvmetcp_conn *con;

> > +};

> > +

> > +struct qed_nvmetcp_conn {

> > +     struct list_head list_entry;

> > +     bool free_on_delete;

> > +

> > +     u16 conn_id;

> > +     u32 icid;

> > +     u32 fw_cid;

> > +

> > +     u8 layer_code;

> > +     u8 offl_flags;

> > +     u8 connect_mode;

> > +

> > +     dma_addr_t sq_pbl_addr;

> > +     struct qed_chain r2tq;

> > +     struct qed_chain xhq;

> > +     struct qed_chain uhq;

> > +

> > +     u8 local_mac[6];

> > +     u8 remote_mac[6];

> > +     u8 ip_version;

> > +     u8 ka_max_probe_cnt;

> > +

> > +     u16 vlan_id;

> > +     u16 tcp_flags;

> > +     u32 remote_ip[4];

> > +     u32 local_ip[4];

> > +

> > +     u32 flow_label;

> > +     u32 ka_timeout;

> > +     u32 ka_interval;

> > +     u32 max_rt_time;

> > +

> > +     u8 ttl;

> > +     u8 tos_or_tc;

> > +     u16 remote_port;

> > +     u16 local_port;

> > +     u16 mss;

> > +     u8 rcv_wnd_scale;

> > +     u32 rcv_wnd;

> > +     u32 cwnd;

> > +

> > +     u8 update_flag;

> > +     u8 default_cq;

> > +     u8 abortive_dsconnect;

> > +

> > +     u32 max_seq_size;

> > +     u32 max_recv_pdu_length;

> > +     u32 max_send_pdu_length;

> > +     u32 first_seq_length;

> > +

> > +     u16 physical_q0;

> > +     u16 physical_q1;

> > +

> > +     u16 nvmetcp_cccid_max_range;

> > +     dma_addr_t nvmetcp_cccid_itid_table_addr;

> > +};

> > +

> >   #if IS_ENABLED(CONFIG_QED_NVMETCP)

> >   int qed_nvmetcp_alloc(struct qed_hwfn *p_hwfn);

> >   void qed_nvmetcp_setup(struct qed_hwfn *p_hwfn);

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h

> > index 525159e747a5..60ff3222bf55 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_sp.h

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h

> > @@ -101,6 +101,9 @@ union ramrod_data {

> >       struct iscsi_spe_conn_termination iscsi_conn_terminate;

> >

> >       struct nvmetcp_init_ramrod_params nvmetcp_init;

> > +     struct nvmetcp_spe_conn_offload nvmetcp_conn_offload;

> > +     struct nvmetcp_conn_update_ramrod_params nvmetcp_conn_update;

> > +     struct nvmetcp_spe_conn_termination nvmetcp_conn_terminate;

> >

> >       struct vf_start_ramrod_data vf_start;

> >       struct vf_stop_ramrod_data vf_stop;

> > diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h

> > index e9ccfc07041d..c8836b71b866 100644

> > --- a/include/linux/qed/nvmetcp_common.h

> > +++ b/include/linux/qed/nvmetcp_common.h

> > @@ -6,6 +6,8 @@

> >

> >   #include "tcp_common.h"

> >

> > +#define NVMETCP_SLOW_PATH_LAYER_CODE (6)

> > +

> >   /* NVMeTCP firmware function init parameters */

> >   struct nvmetcp_spe_func_init {

> >       __le16 half_way_close_timeout;

> > @@ -43,6 +45,10 @@ enum nvmetcp_ramrod_cmd_id {

> >       NVMETCP_RAMROD_CMD_ID_UNUSED = 0,

> >       NVMETCP_RAMROD_CMD_ID_INIT_FUNC = 1,

> >       NVMETCP_RAMROD_CMD_ID_DESTROY_FUNC = 2,

> > +     NVMETCP_RAMROD_CMD_ID_OFFLOAD_CONN = 3,

> > +     NVMETCP_RAMROD_CMD_ID_UPDATE_CONN = 4,

> > +     NVMETCP_RAMROD_CMD_ID_TERMINATION_CONN = 5,

> > +     NVMETCP_RAMROD_CMD_ID_CLEAR_SQ = 6,

> >       MAX_NVMETCP_RAMROD_CMD_ID

> >   };

> >

> > @@ -51,4 +57,141 @@ struct nvmetcp_glbl_queue_entry {

> >       struct regpair reserved;

> >   };

> >

> > +/* NVMeTCP conn level EQEs */

> > +enum nvmetcp_eqe_opcode {

> > +     NVMETCP_EVENT_TYPE_INIT_FUNC = 0, /* Response after init Ramrod */

> > +     NVMETCP_EVENT_TYPE_DESTROY_FUNC, /* Response after destroy Ramrod */

> > +     NVMETCP_EVENT_TYPE_OFFLOAD_CONN,/* Response after option 2 offload Ramrod */

> > +     NVMETCP_EVENT_TYPE_UPDATE_CONN, /* Response after update Ramrod */

> > +     NVMETCP_EVENT_TYPE_CLEAR_SQ, /* Response after clear sq Ramrod */

> > +     NVMETCP_EVENT_TYPE_TERMINATE_CONN, /* Response after termination Ramrod */

> > +     NVMETCP_EVENT_TYPE_RESERVED0,

> > +     NVMETCP_EVENT_TYPE_RESERVED1,

> > +     NVMETCP_EVENT_TYPE_ASYN_CONNECT_COMPLETE, /* Connect completed (A-syn EQE) */

> > +     NVMETCP_EVENT_TYPE_ASYN_TERMINATE_DONE, /* Termination completed (A-syn EQE) */

> > +     NVMETCP_EVENT_TYPE_START_OF_ERROR_TYPES = 10, /* Separate EQs from err EQs */

> > +     NVMETCP_EVENT_TYPE_ASYN_ABORT_RCVD, /* TCP RST packet receive (A-syn EQE) */

> > +     NVMETCP_EVENT_TYPE_ASYN_CLOSE_RCVD, /* TCP FIN packet receive (A-syn EQE) */

> > +     NVMETCP_EVENT_TYPE_ASYN_SYN_RCVD, /* TCP SYN+ACK packet receive (A-syn EQE) */

> > +     NVMETCP_EVENT_TYPE_ASYN_MAX_RT_TIME, /* TCP max retransmit time (A-syn EQE) */

> > +     NVMETCP_EVENT_TYPE_ASYN_MAX_RT_CNT, /* TCP max retransmit count (A-syn EQE) */

> > +     NVMETCP_EVENT_TYPE_ASYN_MAX_KA_PROBES_CNT, /* TCP ka probes count (A-syn EQE) */

> > +     NVMETCP_EVENT_TYPE_ASYN_FIN_WAIT2, /* TCP fin wait 2 (A-syn EQE) */

> > +     NVMETCP_EVENT_TYPE_NVMETCP_CONN_ERROR, /* NVMeTCP error response (A-syn EQE) */

> > +     NVMETCP_EVENT_TYPE_TCP_CONN_ERROR, /* NVMeTCP error - tcp error (A-syn EQE) */

> > +     MAX_NVMETCP_EQE_OPCODE

> > +};

> > +

> > +struct nvmetcp_conn_offload_section {

> > +     struct regpair cccid_itid_table_addr; /* CCCID to iTID table address */

> > +     __le16 cccid_max_range; /* CCCID max value - used for validation */

> > +     __le16 reserved[3];

> > +};

> > +

> > +/* NVMe TCP connection offload params passed by driver to FW in NVMeTCP offload ramrod */

> > +struct nvmetcp_conn_offload_params {

> > +     struct regpair sq_pbl_addr;

> > +     struct regpair r2tq_pbl_addr;

> > +     struct regpair xhq_pbl_addr;

> > +     struct regpair uhq_pbl_addr;

> > +     __le16 physical_q0;

> > +     __le16 physical_q1;

> > +     u8 flags;

> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B_MASK 0x1

> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_TCP_ON_CHIP_1B_SHIFT 0

> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE_MASK 0x1

> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_TARGET_MODE_SHIFT 1

> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_MASK 0x1

> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESTRICTED_MODE_SHIFT 2

> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE_MASK 0x1

> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_NVMETCP_MODE_SHIFT 3

> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESERVED1_MASK 0xF

> > +#define NVMETCP_CONN_OFFLOAD_PARAMS_RESERVED1_SHIFT 4

> > +     u8 default_cq;

> > +     __le16 reserved0;

> > +     __le32 reserved1;

> > +     __le32 initial_ack;

> > +

> > +     struct nvmetcp_conn_offload_section nvmetcp; /* NVMe/TCP section */

> > +};

> > +

> > +/* NVMe TCP and TCP connection offload params passed by driver to FW in NVMeTCP offload ramrod. */

> > +struct nvmetcp_spe_conn_offload {

> > +     __le16 reserved;

> > +     __le16 conn_id;

> > +     __le32 fw_cid;

> > +     struct nvmetcp_conn_offload_params nvmetcp;

> > +     struct tcp_offload_params_opt2 tcp;

> > +};

> > +

> > +/* NVMeTCP connection update params passed by driver to FW in NVMETCP update ramrod. */

> > +struct nvmetcp_conn_update_ramrod_params {

> > +     __le16 reserved0;

> > +     __le16 conn_id;

> > +     __le32 reserved1;

> > +     u8 flags;

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_HD_EN_MASK 0x1

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_HD_EN_SHIFT 0

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_DD_EN_MASK 0x1

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_DD_EN_SHIFT 1

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED0_MASK 0x1

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED0_SHIFT 2

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED1_MASK 0x1

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED1_DATA_SHIFT 3

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED2_MASK 0x1

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED2_SHIFT 4

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED3_MASK 0x1

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED3_SHIFT 5

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED4_MASK 0x1

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED4_SHIFT 6

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED5_MASK 0x1

> > +#define NVMETCP_CONN_UPDATE_RAMROD_PARAMS_RESERVED5_SHIFT 7

> > +     u8 reserved3[3];

> > +     __le32 max_seq_size;

> > +     __le32 max_send_pdu_length;

> > +     __le32 max_recv_pdu_length;

> > +     __le32 first_seq_length;

> > +     __le32 reserved4[5];

> > +};

> > +

> > +/* NVMeTCP connection termination request */

> > +struct nvmetcp_spe_conn_termination {

> > +     __le16 reserved0;

> > +     __le16 conn_id;

> > +     __le32 reserved1;

> > +     u8 abortive;

> > +     u8 reserved2[7];

> > +     struct regpair reserved3;

> > +     struct regpair reserved4;

> > +};

> > +

> > +struct nvmetcp_dif_flags {

> > +     u8 flags;

> > +};

> > +

> > +enum nvmetcp_wqe_type {

> > +     NVMETCP_WQE_TYPE_NORMAL,

> > +     NVMETCP_WQE_TYPE_TASK_CLEANUP,

> > +     NVMETCP_WQE_TYPE_MIDDLE_PATH,

> > +     NVMETCP_WQE_TYPE_IC,

> > +     MAX_NVMETCP_WQE_TYPE

> > +};

> > +

> > +struct nvmetcp_wqe {

> > +     __le16 task_id;

> > +     u8 flags;

> > +#define NVMETCP_WQE_WQE_TYPE_MASK 0x7 /* [use nvmetcp_wqe_type] */

> > +#define NVMETCP_WQE_WQE_TYPE_SHIFT 0

> > +#define NVMETCP_WQE_NUM_SGES_MASK 0xF

> > +#define NVMETCP_WQE_NUM_SGES_SHIFT 3

> > +#define NVMETCP_WQE_RESPONSE_MASK 0x1

> > +#define NVMETCP_WQE_RESPONSE_SHIFT 7

> > +     struct nvmetcp_dif_flags prot_flags;

> > +     __le32 contlen_cdbsize;

> > +#define NVMETCP_WQE_CONT_LEN_MASK 0xFFFFFF

> > +#define NVMETCP_WQE_CONT_LEN_SHIFT 0

> > +#define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_MASK 0xFF

> > +#define NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD_SHIFT 24

> > +};

> > +

> >   #endif /* __NVMETCP_COMMON__ */

> > diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h

> > index abc1f41862e3..96263e3cfa1e 100644

> > --- a/include/linux/qed/qed_nvmetcp_if.h

> > +++ b/include/linux/qed/qed_nvmetcp_if.h

> > @@ -25,6 +25,50 @@ struct qed_nvmetcp_tid {

> >       u8 *blocks[MAX_TID_BLOCKS_NVMETCP];

> >   };

> >

> > +struct qed_nvmetcp_id_params {

> > +     u8 mac[ETH_ALEN];

> > +     u32 ip[4];

> > +     u16 port;

> > +};

> > +

> > +struct qed_nvmetcp_params_offload {

> > +     /* FW initializations */

> > +     dma_addr_t sq_pbl_addr;

> > +     dma_addr_t nvmetcp_cccid_itid_table_addr;

> > +     u16 nvmetcp_cccid_max_range;

> > +     u8 default_cq;

> > +

> > +     /* Networking and TCP stack initializations */

> > +     struct qed_nvmetcp_id_params src;

> > +     struct qed_nvmetcp_id_params dst;

> > +     u32 ka_timeout;

> > +     u32 ka_interval;

> > +     u32 max_rt_time;

> > +     u32 cwnd;

> > +     u16 mss;

> > +     u16 vlan_id;

> > +     bool timestamp_en;

> > +     bool delayed_ack_en;

> > +     bool tcp_keep_alive_en;

> > +     bool ecn_en;

> > +     u8 ip_version;

> > +     u8 ka_max_probe_cnt;

> > +     u8 ttl;

> > +     u8 tos_or_tc;

> > +     u8 rcv_wnd_scale;

> > +};

> > +

> > +struct qed_nvmetcp_params_update {

> > +     u32 max_io_size;

> > +     u32 max_recv_pdu_length;

> > +     u32 max_send_pdu_length;

> > +

> > +     /* Placeholder: pfv, cpda, hpda */

> > +

> > +     bool hdr_digest_en;

> > +     bool data_digest_en;

> > +};

> > +

> >   struct qed_nvmetcp_cb_ops {

> >       struct qed_common_cb_ops common;

> >   };

> > @@ -48,6 +92,38 @@ struct qed_nvmetcp_cb_ops {

> >    * @stop:           nvmetcp in FW

> >    *                  @param cdev

> >    *                  return 0 on success, otherwise error value.

> > + * @acquire_conn:    acquire a new nvmetcp connection

> > + *                   @param cdev

> > + *                   @param handle - qed will fill handle that should be

> > + *                           used henceforth as identifier of the

> > + *                           connection.

> > + *                   @param p_doorbell - qed will fill the address of the

> > + *                           doorbell.

> > + *                   @return 0 on sucesss, otherwise error value.

> > + * @release_conn:    release a previously acquired nvmetcp connection

> > + *                   @param cdev

> > + *                   @param handle - the connection handle.

> > + *                   @return 0 on success, otherwise error value.

> > + * @offload_conn:    configures an offloaded connection

> > + *                   @param cdev

> > + *                   @param handle - the connection handle.

> > + *                   @param conn_info - the configuration to use for the

> > + *                           offload.

> > + *                   @return 0 on success, otherwise error value.

> > + * @update_conn:     updates an offloaded connection

> > + *                   @param cdev

> > + *                   @param handle - the connection handle.

> > + *                   @param conn_info - the configuration to use for the

> > + *                           offload.

> > + *                   @return 0 on success, otherwise error value.

> > + * @destroy_conn:    stops an offloaded connection

> > + *                   @param cdev

> > + *                   @param handle - the connection handle.

> > + *                   @return 0 on success, otherwise error value.

> > + * @clear_sq:                clear all task in sq

> > + *                   @param cdev

> > + *                   @param handle - the connection handle.

> > + *                   @return 0 on success, otherwise error value.

> >    */

> >   struct qed_nvmetcp_ops {

> >       const struct qed_common_ops *common;

> > @@ -65,6 +141,24 @@ struct qed_nvmetcp_ops {

> >                    void *event_context, nvmetcp_event_cb_t async_event_cb);

> >

> >       int (*stop)(struct qed_dev *cdev);

> > +

> > +     int (*acquire_conn)(struct qed_dev *cdev,

> > +                         u32 *handle,

> > +                         u32 *fw_cid, void __iomem **p_doorbell);

> > +

> > +     int (*release_conn)(struct qed_dev *cdev, u32 handle);

> > +

> > +     int (*offload_conn)(struct qed_dev *cdev,

> > +                         u32 handle,

> > +                         struct qed_nvmetcp_params_offload *conn_info);

> > +

> > +     int (*update_conn)(struct qed_dev *cdev,

> > +                        u32 handle,

> > +                        struct qed_nvmetcp_params_update *conn_info);

> > +

> > +     int (*destroy_conn)(struct qed_dev *cdev, u32 handle, u8 abrt_conn);

> > +

> > +     int (*clear_sq)(struct qed_dev *cdev, u32 handle);

> >   };

> >

> >   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);

> >

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 3, 2021, 3:26 p.m. UTC | #18
On 5/1/21 2:11 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > From: Omkar Kulkarni <okulkarni@marvell.com>

> >

> > This patch adds qed NVMeTCP personality in order to support the NVMeTCP

> > qed functionalities and manage the HW device shared resources.

> > The same design is used with Eth (qede), RDMA(qedr), iSCSI (qedi) and

> > FCoE (qedf).

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > ---

> >   drivers/net/ethernet/qlogic/qed/qed.h         |  3 ++

> >   drivers/net/ethernet/qlogic/qed/qed_cxt.c     | 32 ++++++++++++++

> >   drivers/net/ethernet/qlogic/qed/qed_cxt.h     |  1 +

> >   drivers/net/ethernet/qlogic/qed/qed_dev.c     | 44 ++++++++++++++++---

> >   drivers/net/ethernet/qlogic/qed/qed_hsi.h     |  3 +-

> >   drivers/net/ethernet/qlogic/qed/qed_ll2.c     | 31 ++++++++-----

> >   drivers/net/ethernet/qlogic/qed/qed_mcp.c     |  3 ++

> >   drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |  3 +-

> >   drivers/net/ethernet/qlogic/qed/qed_ooo.c     |  5 ++-

> >   .../net/ethernet/qlogic/qed/qed_sp_commands.c |  1 +

> >   10 files changed, 108 insertions(+), 18 deletions(-)

> >

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h

> > index 91d4635009ab..7ae648c4edba 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed.h

> > +++ b/drivers/net/ethernet/qlogic/qed/qed.h

> > @@ -200,6 +200,7 @@ enum qed_pci_personality {

> >       QED_PCI_ETH,

> >       QED_PCI_FCOE,

> >       QED_PCI_ISCSI,

> > +     QED_PCI_NVMETCP,

> >       QED_PCI_ETH_ROCE,

> >       QED_PCI_ETH_IWARP,

> >       QED_PCI_ETH_RDMA,

> > @@ -285,6 +286,8 @@ struct qed_hw_info {

> >       ((dev)->hw_info.personality == QED_PCI_FCOE)

> >   #define QED_IS_ISCSI_PERSONALITY(dev)                                       \

> >       ((dev)->hw_info.personality == QED_PCI_ISCSI)

> > +#define QED_IS_NVMETCP_PERSONALITY(dev)                                      \

> > +     ((dev)->hw_info.personality == QED_PCI_NVMETCP)

> >

> So you have a distinct PCI personality for NVMe-oF, but not for the

> protocol? Strange.

> Why don't you have a distinct NVMe-oF protocol ID?

>

> >       /* Resource Allocation scheme results */

> >       u32                             resc_start[QED_MAX_RESC];

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c

> > index 0a22f8ce9a2c..6cef75723e38 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c

> > @@ -2106,6 +2106,30 @@ int qed_cxt_set_pf_params(struct qed_hwfn *p_hwfn, u32 rdma_tasks)

> >               }

> >               break;

> >       }

> > +     case QED_PCI_NVMETCP:

> > +     {

> > +             struct qed_nvmetcp_pf_params *p_params;

> > +

> > +             p_params = &p_hwfn->pf_params.nvmetcp_pf_params;

> > +

> > +             if (p_params->num_cons && p_params->num_tasks) {

> > +                     qed_cxt_set_proto_cid_count(p_hwfn,

> > +                                                 PROTOCOLID_NVMETCP,

> > +                                                 p_params->num_cons,

> > +                                                 0);

> > +

> > +                     qed_cxt_set_proto_tid_count(p_hwfn,

> > +                                                 PROTOCOLID_NVMETCP,

> > +                                                 QED_CTX_NVMETCP_TID_SEG,

> > +                                                 0,

> > +                                                 p_params->num_tasks,

> > +                                                 true);

> > +             } else {

> > +                     DP_INFO(p_hwfn->cdev,

> > +                             "NvmeTCP personality used without setting params!\n");

> > +             }

> > +             break;

> > +     }

> >       default:

> >               return -EINVAL;

> >       }

> > @@ -2132,6 +2156,10 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,

> >               proto = PROTOCOLID_ISCSI;

> >               seg = QED_CXT_ISCSI_TID_SEG;

> >               break;

> > +     case QED_PCI_NVMETCP:

> > +             proto = PROTOCOLID_NVMETCP;

> > +             seg = QED_CTX_NVMETCP_TID_SEG;

> > +             break;

> >       default:

> >               return -EINVAL;

> >       }

> > @@ -2458,6 +2486,10 @@ int qed_cxt_get_task_ctx(struct qed_hwfn *p_hwfn,

> >               proto = PROTOCOLID_ISCSI;

> >               seg = QED_CXT_ISCSI_TID_SEG;

> >               break;

> > +     case QED_PCI_NVMETCP:

> > +             proto = PROTOCOLID_NVMETCP;

> > +             seg = QED_CTX_NVMETCP_TID_SEG;

> > +             break;

> >       default:

> >               return -EINVAL;

> >       }

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.h b/drivers/net/ethernet/qlogic/qed/qed_cxt.h

> > index 056e79620a0e..8f1a77cb33f6 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_cxt.h

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.h

> > @@ -51,6 +51,7 @@ int qed_cxt_get_tid_mem_info(struct qed_hwfn *p_hwfn,

> >                            struct qed_tid_mem *p_info);

> >

> >   #define QED_CXT_ISCSI_TID_SEG       PROTOCOLID_ISCSI

> > +#define QED_CTX_NVMETCP_TID_SEG PROTOCOLID_NVMETCP

> >   #define QED_CXT_ROCE_TID_SEG        PROTOCOLID_ROCE

> >   #define QED_CXT_FCOE_TID_SEG        PROTOCOLID_FCOE

> >   enum qed_cxt_elem_type {

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c

> > index d2f5855b2ea7..d3f8cc42d07e 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c

> > @@ -37,6 +37,7 @@

> >   #include "qed_sriov.h"

> >   #include "qed_vf.h"

> >   #include "qed_rdma.h"

> > +#include "qed_nvmetcp.h"

> >

> >   static DEFINE_SPINLOCK(qm_lock);

> >

> > @@ -667,7 +668,8 @@ qed_llh_set_engine_affin(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)

> >       }

> >

> >       /* Storage PF is bound to a single engine while L2 PF uses both */

> > -     if (QED_IS_FCOE_PERSONALITY(p_hwfn) || QED_IS_ISCSI_PERSONALITY(p_hwfn))

> > +     if (QED_IS_FCOE_PERSONALITY(p_hwfn) || QED_IS_ISCSI_PERSONALITY(p_hwfn) ||

> > +         QED_IS_NVMETCP_PERSONALITY(p_hwfn))

> >               eng = cdev->fir_affin ? QED_ENG1 : QED_ENG0;

> >       else                    /* L2_PERSONALITY */

> >               eng = QED_BOTH_ENG;

> > @@ -1164,6 +1166,9 @@ void qed_llh_remove_mac_filter(struct qed_dev *cdev,

> >       if (!test_bit(QED_MF_LLH_MAC_CLSS, &cdev->mf_bits))

> >               goto out;

> >

> > +     if (QED_IS_NVMETCP_PERSONALITY(p_hwfn))

> > +             return;

> > +

> >       ether_addr_copy(filter.mac.addr, mac_addr);

> >       rc = qed_llh_shadow_remove_filter(cdev, ppfid, &filter, &filter_idx,

> >                                         &ref_cnt);

> > @@ -1381,6 +1386,11 @@ void qed_resc_free(struct qed_dev *cdev)

> >                       qed_ooo_free(p_hwfn);

> >               }

> >

> > +             if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {

> > +                     qed_nvmetcp_free(p_hwfn);

> > +                     qed_ooo_free(p_hwfn);

> > +             }

> > +

> >               if (QED_IS_RDMA_PERSONALITY(p_hwfn) && rdma_info) {

> >                       qed_spq_unregister_async_cb(p_hwfn, rdma_info->proto);

> >                       qed_rdma_info_free(p_hwfn);

> > @@ -1423,6 +1433,7 @@ static u32 qed_get_pq_flags(struct qed_hwfn *p_hwfn)

> >               flags |= PQ_FLAGS_OFLD;

> >               break;

> >       case QED_PCI_ISCSI:

> > +     case QED_PCI_NVMETCP:

> >               flags |= PQ_FLAGS_ACK | PQ_FLAGS_OOO | PQ_FLAGS_OFLD;

> >               break;

> >       case QED_PCI_ETH_ROCE:

> > @@ -2269,6 +2280,12 @@ int qed_resc_alloc(struct qed_dev *cdev)

> >                                                       PROTOCOLID_ISCSI,

> >                                                       NULL);

> >                       n_eqes += 2 * num_cons;

> > +             } else if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {

> > +                     num_cons =

> > +                         qed_cxt_get_proto_cid_count(p_hwfn,

> > +                                                     PROTOCOLID_NVMETCP,

> > +                                                     NULL);

> > +                     n_eqes += 2 * num_cons;

> >               }

> >

> >               if (n_eqes > 0xFFFF) {

> > @@ -2313,6 +2330,15 @@ int qed_resc_alloc(struct qed_dev *cdev)

> >                               goto alloc_err;

> >               }

> >

> > +             if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {

> > +                     rc = qed_nvmetcp_alloc(p_hwfn);

> > +                     if (rc)

> > +                             goto alloc_err;

> > +                     rc = qed_ooo_alloc(p_hwfn);

> > +                     if (rc)

> > +                             goto alloc_err;

> > +             }

> > +

> >               if (QED_IS_RDMA_PERSONALITY(p_hwfn)) {

> >                       rc = qed_rdma_info_alloc(p_hwfn);

> >                       if (rc)

> > @@ -2393,6 +2419,11 @@ void qed_resc_setup(struct qed_dev *cdev)

> >                       qed_iscsi_setup(p_hwfn);

> >                       qed_ooo_setup(p_hwfn);

> >               }

> > +

> > +             if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP) {

> > +                     qed_nvmetcp_setup(p_hwfn);

> > +                     qed_ooo_setup(p_hwfn);

> > +             }

> >       }

> >   }

> >

> > @@ -2854,7 +2885,8 @@ static int qed_hw_init_pf(struct qed_hwfn *p_hwfn,

> >

> >       /* Protocol Configuration */

> >       STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_TCP_RT_OFFSET,

> > -                  (p_hwfn->hw_info.personality == QED_PCI_ISCSI) ? 1 : 0);

> > +                  ((p_hwfn->hw_info.personality == QED_PCI_ISCSI) ||

> > +                      (p_hwfn->hw_info.personality == QED_PCI_NVMETCP)) ? 1 : 0);

> >       STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_FCOE_RT_OFFSET,

> >                    (p_hwfn->hw_info.personality == QED_PCI_FCOE) ? 1 : 0);

> >       STORE_RT_REG(p_hwfn, PRS_REG_SEARCH_ROCE_RT_OFFSET, 0);

> > @@ -3531,7 +3563,7 @@ static void qed_hw_set_feat(struct qed_hwfn *p_hwfn)

> >                                              RESC_NUM(p_hwfn,

> >                                                       QED_CMDQS_CQS));

> >

> > -     if (QED_IS_ISCSI_PERSONALITY(p_hwfn))

> > +     if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))

> >               feat_num[QED_ISCSI_CQ] = min_t(u32, sb_cnt.cnt,

> >                                              RESC_NUM(p_hwfn,

> >                                                       QED_CMDQS_CQS));

> > @@ -3734,7 +3766,8 @@ int qed_hw_get_dflt_resc(struct qed_hwfn *p_hwfn,

> >               break;

> >       case QED_BDQ:

> >               if (p_hwfn->hw_info.personality != QED_PCI_ISCSI &&

> > -                 p_hwfn->hw_info.personality != QED_PCI_FCOE)

> > +                 p_hwfn->hw_info.personality != QED_PCI_FCOE &&

> > +                     p_hwfn->hw_info.personality != QED_PCI_NVMETCP)

> >                       *p_resc_num = 0;

> >               else

> >                       *p_resc_num = 1;

> > @@ -3755,7 +3788,8 @@ int qed_hw_get_dflt_resc(struct qed_hwfn *p_hwfn,

> >                       *p_resc_start = 0;

> >               else if (p_hwfn->cdev->num_ports_in_engine == 4)

> >                       *p_resc_start = p_hwfn->port_id;

> > -             else if (p_hwfn->hw_info.personality == QED_PCI_ISCSI)

> > +             else if (p_hwfn->hw_info.personality == QED_PCI_ISCSI ||

> > +                      p_hwfn->hw_info.personality == QED_PCI_NVMETCP)

> >                       *p_resc_start = p_hwfn->port_id;

> >               else if (p_hwfn->hw_info.personality == QED_PCI_FCOE)

> >                       *p_resc_start = p_hwfn->port_id + 2;

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> > index 24472f6a83c2..9c9ec8f53ef8 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h

> > @@ -12148,7 +12148,8 @@ struct public_func {

> >   #define FUNC_MF_CFG_PROTOCOL_ISCSI              0x00000010

> >   #define FUNC_MF_CFG_PROTOCOL_FCOE               0x00000020

> >   #define FUNC_MF_CFG_PROTOCOL_ROCE               0x00000030

> > -#define FUNC_MF_CFG_PROTOCOL_MAX     0x00000030

> > +#define FUNC_MF_CFG_PROTOCOL_NVMETCP    0x00000040

> > +#define FUNC_MF_CFG_PROTOCOL_MAX     0x00000040

> >

> >   #define FUNC_MF_CFG_MIN_BW_MASK             0x0000ff00

> >   #define FUNC_MF_CFG_MIN_BW_SHIFT    8

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c

> > index 49783f365079..88bfcdcd4a4c 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c

> > @@ -960,7 +960,8 @@ static int qed_sp_ll2_rx_queue_start(struct qed_hwfn *p_hwfn,

> >

> >       if (test_bit(QED_MF_LL2_NON_UNICAST, &p_hwfn->cdev->mf_bits) &&

> >           p_ramrod->main_func_queue && conn_type != QED_LL2_TYPE_ROCE &&

> > -         conn_type != QED_LL2_TYPE_IWARP) {

> > +         conn_type != QED_LL2_TYPE_IWARP &&

> > +             (!QED_IS_NVMETCP_PERSONALITY(p_hwfn))) {

> >               p_ramrod->mf_si_bcast_accept_all = 1;

> >               p_ramrod->mf_si_mcast_accept_all = 1;

> >       } else {

> > @@ -1049,6 +1050,8 @@ static int qed_sp_ll2_tx_queue_start(struct qed_hwfn *p_hwfn,

> >       case QED_LL2_TYPE_OOO:

> >               if (p_hwfn->hw_info.personality == QED_PCI_ISCSI)

> >                       p_ramrod->conn_type = PROTOCOLID_ISCSI;

> > +             else if (p_hwfn->hw_info.personality == QED_PCI_NVMETCP)

> > +                     p_ramrod->conn_type = PROTOCOLID_NVMETCP;

> >               else

> >                       p_ramrod->conn_type = PROTOCOLID_IWARP;

> >               break;

> > @@ -1634,7 +1637,8 @@ int qed_ll2_establish_connection(void *cxt, u8 connection_handle)

> >       if (rc)

> >               goto out;

> >

> > -     if (!QED_IS_RDMA_PERSONALITY(p_hwfn))

> > +     if (!QED_IS_RDMA_PERSONALITY(p_hwfn) &&

> > +         !QED_IS_NVMETCP_PERSONALITY(p_hwfn))

> >               qed_wr(p_hwfn, p_ptt, PRS_REG_USE_LIGHT_L2, 1);

> >

> >       qed_ll2_establish_connection_ooo(p_hwfn, p_ll2_conn);

> > @@ -2376,7 +2380,8 @@ static int qed_ll2_start_ooo(struct qed_hwfn *p_hwfn,

> >   static bool qed_ll2_is_storage_eng1(struct qed_dev *cdev)

> >   {

> >       return (QED_IS_FCOE_PERSONALITY(QED_LEADING_HWFN(cdev)) ||

> > -             QED_IS_ISCSI_PERSONALITY(QED_LEADING_HWFN(cdev))) &&

> > +             QED_IS_ISCSI_PERSONALITY(QED_LEADING_HWFN(cdev)) ||

> > +             QED_IS_NVMETCP_PERSONALITY(QED_LEADING_HWFN(cdev))) &&

> >               (QED_AFFIN_HWFN(cdev) != QED_LEADING_HWFN(cdev));

> >   }

> >

> > @@ -2402,11 +2407,13 @@ static int qed_ll2_stop(struct qed_dev *cdev)

> >

> >       if (cdev->ll2->handle == QED_LL2_UNUSED_HANDLE)

> >               return 0;

> > +     if (!QED_IS_NVMETCP_PERSONALITY(p_hwfn))

> > +             qed_llh_remove_mac_filter(cdev, 0, cdev->ll2_mac_address);

> >

> >       qed_llh_remove_mac_filter(cdev, 0, cdev->ll2_mac_address);

> >       eth_zero_addr(cdev->ll2_mac_address);

> >

> > -     if (QED_IS_ISCSI_PERSONALITY(p_hwfn))

> > +     if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))

> >               qed_ll2_stop_ooo(p_hwfn);

> >

> >       /* In CMT mode, LL2 is always started on engine 0 for a storage PF */

> > @@ -2442,6 +2449,7 @@ static int __qed_ll2_start(struct qed_hwfn *p_hwfn,

> >               conn_type = QED_LL2_TYPE_FCOE;

> >               break;

> >       case QED_PCI_ISCSI:

> > +     case QED_PCI_NVMETCP:

> >               conn_type = QED_LL2_TYPE_ISCSI;

> >               break;

> >       case QED_PCI_ETH_ROCE:

> > @@ -2567,7 +2575,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)

> >               }

> >       }

> >

> > -     if (QED_IS_ISCSI_PERSONALITY(p_hwfn)) {

> > +     if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {

> >               DP_VERBOSE(cdev, QED_MSG_STORAGE, "Starting OOO LL2 queue\n");

> >               rc = qed_ll2_start_ooo(p_hwfn, params);

> >               if (rc) {

> > @@ -2576,10 +2584,13 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)

> >               }

> >       }

> >

> > -     rc = qed_llh_add_mac_filter(cdev, 0, params->ll2_mac_address);

> > -     if (rc) {

> > -             DP_NOTICE(cdev, "Failed to add an LLH filter\n");

> > -             goto err3;

> > +     if (!QED_IS_NVMETCP_PERSONALITY(p_hwfn)) {

> > +             rc = qed_llh_add_mac_filter(cdev, 0, params->ll2_mac_address);

> > +             if (rc) {

> > +                     DP_NOTICE(cdev, "Failed to add an LLH filter\n");

> > +                     goto err3;

> > +             }

> > +

> >       }

> >

> >       ether_addr_copy(cdev->ll2_mac_address, params->ll2_mac_address);

> > @@ -2587,7 +2598,7 @@ static int qed_ll2_start(struct qed_dev *cdev, struct qed_ll2_params *params)

> >       return 0;

> >

> >   err3:

> > -     if (QED_IS_ISCSI_PERSONALITY(p_hwfn))

> > +     if (QED_IS_ISCSI_PERSONALITY(p_hwfn) || QED_IS_NVMETCP_PERSONALITY(p_hwfn))

> >               qed_ll2_stop_ooo(p_hwfn);

> >   err2:

> >       if (b_is_storage_eng1)

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c

> > index cd882c453394..4387292c37e2 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c

> > @@ -2446,6 +2446,9 @@ qed_mcp_get_shmem_proto(struct qed_hwfn *p_hwfn,

> >       case FUNC_MF_CFG_PROTOCOL_ISCSI:

> >               *p_proto = QED_PCI_ISCSI;

> >               break;

> > +     case FUNC_MF_CFG_PROTOCOL_NVMETCP:

> > +             *p_proto = QED_PCI_NVMETCP;

> > +             break;

> >       case FUNC_MF_CFG_PROTOCOL_FCOE:

> >               *p_proto = QED_PCI_FCOE;

> >               break;

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c

> > index 3e3192a3ad9b..6190adf965bc 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c

> > @@ -1306,7 +1306,8 @@ int qed_mfw_process_tlv_req(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt)

> >       }

> >

> >       if ((tlv_group & QED_MFW_TLV_ISCSI) &&

> > -         p_hwfn->hw_info.personality != QED_PCI_ISCSI) {

> > +         p_hwfn->hw_info.personality != QED_PCI_ISCSI &&

> > +             p_hwfn->hw_info.personality != QED_PCI_NVMETCP) {

> >               DP_VERBOSE(p_hwfn, QED_MSG_SP,

> >                          "Skipping iSCSI TLVs for non-iSCSI function\n");

> >               tlv_group &= ~QED_MFW_TLV_ISCSI;

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_ooo.c b/drivers/net/ethernet/qlogic/qed/qed_ooo.c

> > index 88353aa404dc..d37bb2463f98 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_ooo.c

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_ooo.c

> > @@ -16,7 +16,7 @@

> >   #include "qed_ll2.h"

> >   #include "qed_ooo.h"

> >   #include "qed_cxt.h"

> > -

> > +#include "qed_nvmetcp.h"

> >   static struct qed_ooo_archipelago

> >   *qed_ooo_seek_archipelago(struct qed_hwfn *p_hwfn,

> >                         struct qed_ooo_info

> > @@ -85,6 +85,9 @@ int qed_ooo_alloc(struct qed_hwfn *p_hwfn)

> >       case QED_PCI_ISCSI:

> >               proto = PROTOCOLID_ISCSI;

> >               break;

> > +     case QED_PCI_NVMETCP:

> > +             proto = PROTOCOLID_NVMETCP;

> > +             break;

> >       case QED_PCI_ETH_RDMA:

> >       case QED_PCI_ETH_IWARP:

> >               proto = PROTOCOLID_IWARP;

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c

> > index aa71adcf31ee..60b3876387a9 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c

> > @@ -385,6 +385,7 @@ int qed_sp_pf_start(struct qed_hwfn *p_hwfn,

> >               p_ramrod->personality = PERSONALITY_FCOE;

> >               break;

> >       case QED_PCI_ISCSI:

> > +     case QED_PCI_NVMETCP:

> >               p_ramrod->personality = PERSONALITY_ISCSI;

> >               break;

> >       case QED_PCI_ETH_ROCE:

> >

> As indicated, I do find this mix of 'nvmetcp is nearly iscsi' a bit

> strange. I would have preferred to have distinct types for nvmetcp.

>


PERSONALITY_ determines the FW resource layout, which is the same for iSCSI and
NVMeTCP. I will change PERSONALITY_ISCSI to PERSONALITY_TCP_ULP.

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 3, 2021, 3:44 p.m. UTC | #19
On 5/2/21 2:26 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > From: Nikolay Assa <nassa@marvell.com>

> >

> > This patch introduces APIs which the NVMeTCP Offload device (qedn)

> > will use through the paired net-device (qede).

> > It includes APIs for:

> > - ipv4/ipv6 routing

> > - get VLAN from net-device

> > - TCP ports reservation

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Nikolay Assa <nassa@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > ---

> >   .../qlogic/qed/qed_nvmetcp_ip_services.c      | 239 ++++++++++++++++++

> >   .../linux/qed/qed_nvmetcp_ip_services_if.h    |  29 +++

> >   2 files changed, 268 insertions(+)

> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c

> >   create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h

> >

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c

> > new file mode 100644

> > index 000000000000..2904b1a0830a

> > --- /dev/null

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c

> > @@ -0,0 +1,239 @@

> > +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)

> > +/*

> > + * Copyright 2021 Marvell. All rights reserved.

> > + */

> > +

> > +#include <linux/types.h>

> > +#include <asm/byteorder.h>

> > +#include <asm/param.h>

> > +#include <linux/delay.h>

> > +#include <linux/pci.h>

> > +#include <linux/dma-mapping.h>

> > +#include <linux/etherdevice.h>

> > +#include <linux/kernel.h>

> > +#include <linux/stddef.h>

> > +#include <linux/errno.h>

> > +

> > +#include <net/tcp.h>

> > +

> > +#include <linux/qed/qed_nvmetcp_ip_services_if.h>

> > +

> > +#define QED_IP_RESOL_TIMEOUT  4

> > +

> > +int qed_route_ipv4(struct sockaddr_storage *local_addr,

> > +                struct sockaddr_storage *remote_addr,

> > +                struct sockaddr *hardware_address,

> > +                struct net_device **ndev)

> > +{

> > +     struct neighbour *neigh = NULL;

> > +     __be32 *loc_ip, *rem_ip;

> > +     struct rtable *rt;

> > +     int rc = -ENXIO;

> > +     int retry;

> > +

> > +     loc_ip = &((struct sockaddr_in *)local_addr)->sin_addr.s_addr;

> > +     rem_ip = &((struct sockaddr_in *)remote_addr)->sin_addr.s_addr;

> > +     *ndev = NULL;

> > +     rt = ip_route_output(&init_net, *rem_ip, *loc_ip, 0/*tos*/, 0/*oif*/);

> > +     if (IS_ERR(rt)) {

> > +             pr_err("lookup route failed\n");

> > +             rc = PTR_ERR(rt);

> > +             goto return_err;

> > +     }

> > +

> > +     neigh = dst_neigh_lookup(&rt->dst, rem_ip);

> > +     if (!neigh) {

> > +             rc = -ENOMEM;

> > +             ip_rt_put(rt);

> > +             goto return_err;

> > +     }

> > +

> > +     *ndev = rt->dst.dev;

> > +     ip_rt_put(rt);

> > +

> > +     /* If not resolved, kick-off state machine towards resolution */

> > +     if (!(neigh->nud_state & NUD_VALID))

> > +             neigh_event_send(neigh, NULL);

> > +

> > +     /* query neighbor until resolved or timeout */

> > +     retry = QED_IP_RESOL_TIMEOUT;

> > +     while (!(neigh->nud_state & NUD_VALID) && retry > 0) {

> > +             msleep(1000);

> > +             retry--;

> > +     }

> > +

> > +     if (neigh->nud_state & NUD_VALID) {

> > +             /* copy resolved MAC address */

> > +             neigh_ha_snapshot(hardware_address->sa_data, neigh, *ndev);

> > +

> > +             hardware_address->sa_family = (*ndev)->type;

> > +             rc = 0;

> > +     }

> > +

> > +     neigh_release(neigh);

> > +     if (!(*loc_ip)) {

> > +             *loc_ip = inet_select_addr(*ndev, *rem_ip, RT_SCOPE_UNIVERSE);

> > +             local_addr->ss_family = AF_INET;

> > +     }

> > +

> > +return_err:

> > +

> > +     return rc;

> > +}

> > +EXPORT_SYMBOL(qed_route_ipv4);

> > +

> > +int qed_route_ipv6(struct sockaddr_storage *local_addr,

> > +                struct sockaddr_storage *remote_addr,

> > +                struct sockaddr *hardware_address,

> > +                struct net_device **ndev)

> > +{

> > +     struct neighbour *neigh = NULL;

> > +     struct dst_entry *dst;

> > +     struct flowi6 fl6;

> > +     int rc = -ENXIO;

> > +     int retry;

> > +

> > +     memset(&fl6, 0, sizeof(fl6));

> > +     fl6.saddr = ((struct sockaddr_in6 *)local_addr)->sin6_addr;

> > +     fl6.daddr = ((struct sockaddr_in6 *)remote_addr)->sin6_addr;

> > +

> > +     dst = ip6_route_output(&init_net, NULL, &fl6);

> > +     if (!dst || dst->error) {

> > +             if (dst) {

> > +                     dst_release(dst);

> > +                     pr_err("lookup route failed %d\n", dst->error);

> > +             }

> > +

> > +             goto out;

> > +     }

> > +

> > +     neigh = dst_neigh_lookup(dst, &fl6.daddr);

> > +     if (neigh) {

> > +             *ndev = ip6_dst_idev(dst)->dev;

> > +

> > +             /* If not resolved, kick-off state machine towards resolution */

> > +             if (!(neigh->nud_state & NUD_VALID))

> > +                     neigh_event_send(neigh, NULL);

> > +

> > +             /* query neighbor until resolved or timeout */

> > +             retry = QED_IP_RESOL_TIMEOUT;

> > +             while (!(neigh->nud_state & NUD_VALID) && retry > 0) {

> > +                     msleep(1000);

> > +                     retry--;

> > +             }

> > +

> > +             if (neigh->nud_state & NUD_VALID) {

> > +                     neigh_ha_snapshot((u8 *)hardware_address->sa_data, neigh, *ndev);

> > +

> > +                     hardware_address->sa_family = (*ndev)->type;

> > +                     rc = 0;

> > +             }

> > +

> > +             neigh_release(neigh);

> > +

> > +             if (ipv6_addr_any(&fl6.saddr)) {

> > +                     if (ipv6_dev_get_saddr(dev_net(*ndev), *ndev,

> > +                                            &fl6.daddr, 0, &fl6.saddr)) {

> > +                             pr_err("Unable to find source IP address\n");

> > +                             goto out;

> > +                     }

> > +

> > +                     local_addr->ss_family = AF_INET6;

> > +                     ((struct sockaddr_in6 *)local_addr)->sin6_addr =

> > +                                                             fl6.saddr;

> > +             }

> > +     }

> > +

> > +     dst_release(dst);

> > +

> > +out:

> > +

> > +     return rc;

> > +}

> > +EXPORT_SYMBOL(qed_route_ipv6);

> > +

> > +void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id)

> > +{

> > +     if (is_vlan_dev(*ndev)) {

> > +             *vlan_id = vlan_dev_vlan_id(*ndev);

> > +             *ndev = vlan_dev_real_dev(*ndev);

> > +     }

> > +}

> > +EXPORT_SYMBOL(qed_vlan_get_ndev);

> > +

> > +struct pci_dev *qed_validate_ndev(struct net_device *ndev)

> > +{

> > +     struct pci_dev *pdev = NULL;

> > +     struct net_device *upper;

> > +

> > +     for_each_pci_dev(pdev) {

> > +             if (pdev && pdev->driver &&

> > +                 !strcmp(pdev->driver->name, "qede")) {

> > +                     upper = pci_get_drvdata(pdev);

> > +                     if (upper->ifindex == ndev->ifindex)

> > +                             return pdev;

> > +             }

> > +     }

> > +

> > +     return NULL;

> > +}

> > +EXPORT_SYMBOL(qed_validate_ndev);

> > +

> > +__be16 qed_get_in_port(struct sockaddr_storage *sa)

> > +{

> > +     return sa->ss_family == AF_INET

> > +             ? ((struct sockaddr_in *)sa)->sin_port

> > +             : ((struct sockaddr_in6 *)sa)->sin6_port;

> > +}

> > +EXPORT_SYMBOL(qed_get_in_port);

> > +

> > +int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr,

> > +                    struct socket **sock, u16 *port)

> > +{

> > +     struct sockaddr_storage sa;

> > +     int rc = 0;

> > +

> > +     rc = sock_create(local_ip_addr.ss_family, SOCK_STREAM, IPPROTO_TCP, sock);

> > +     if (rc) {

> > +             pr_warn("failed to create socket: %d\n", rc);

> > +             goto err;

> > +     }

> > +

> > +     (*sock)->sk->sk_allocation = GFP_KERNEL;

> > +     sk_set_memalloc((*sock)->sk);

> > +

> > +     rc = kernel_bind(*sock, (struct sockaddr *)&local_ip_addr,

> > +                      sizeof(local_ip_addr));

> > +

> > +     if (rc) {

> > +             pr_warn("failed to bind socket: %d\n", rc);

> > +             goto err_sock;

> > +     }

> > +

> > +     rc = kernel_getsockname(*sock, (struct sockaddr *)&sa);

> > +     if (rc < 0) {

> > +             pr_warn("getsockname() failed: %d\n", rc);

> > +             goto err_sock;

> > +     }

> > +

> > +     *port = ntohs(qed_get_in_port(&sa));

> > +

> > +     return 0;

> > +

> > +err_sock:

> > +     sock_release(*sock);

> > +     sock = NULL;

> > +err:

> > +

> > +     return rc;

> > +}

> > +EXPORT_SYMBOL(qed_fetch_tcp_port);

> > +

> > +void qed_return_tcp_port(struct socket *sock)

> > +{

> > +     if (sock && sock->sk) {

> > +             tcp_set_state(sock->sk, TCP_CLOSE);

> > +             sock_release(sock);

> > +     }

> > +}

> > +EXPORT_SYMBOL(qed_return_tcp_port);

> > diff --git a/include/linux/qed/qed_nvmetcp_ip_services_if.h b/include/linux/qed/qed_nvmetcp_ip_services_if.h

> > new file mode 100644

> > index 000000000000..3604aee53796

> > --- /dev/null

> > +++ b/include/linux/qed/qed_nvmetcp_ip_services_if.h

> > @@ -0,0 +1,29 @@

> > +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */

> > +/*

> > + * Copyright 2021 Marvell. All rights reserved.

> > + */

> > +

> > +#ifndef _QED_IP_SERVICES_IF_H

> > +#define _QED_IP_SERVICES_IF_H

> > +

> > +#include <linux/types.h>

> > +#include <net/route.h>

> > +#include <net/ip6_route.h>

> > +#include <linux/inetdevice.h>

> > +

> > +int qed_route_ipv4(struct sockaddr_storage *local_addr,

> > +                struct sockaddr_storage *remote_addr,

> > +                struct sockaddr *hardware_address,

> > +                struct net_device **ndev);

> > +int qed_route_ipv6(struct sockaddr_storage *local_addr,

> > +                struct sockaddr_storage *remote_addr,

> > +                struct sockaddr *hardware_address,

> > +                struct net_device **ndev);

> > +void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id);

> > +struct pci_dev *qed_validate_ndev(struct net_device *ndev);

> > +void qed_return_tcp_port(struct socket *sock);

> > +int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr,

> > +                    struct socket **sock, u16 *port);

> > +__be16 qed_get_in_port(struct sockaddr_storage *sa);

> > +

> > +#endif /* _QED_IP_SERVICES_IF_H */

> >

> Reviewed-by: Hannes Reinecke <hare@suse.de>

>


Thanks.

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 3, 2021, 3:46 p.m. UTC | #20
On 5/1/21 3:18 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > This patch will present the structure for the NVMeTCP offload common

> > layer driver. This module is added under "drivers/nvme/host/" and future

> > offload drivers which will register to it will be placed under

> > "drivers/nvme/hw".

> > This new driver will be enabled by the Kconfig "NVM Express over Fabrics

> > TCP offload commmon layer".

> > In order to support the new transport type, for host mode, no change is

> > needed.

> >

> > Each new vendor-specific offload driver will register to this ULP during

> > its probe function, by filling out the nvme_tcp_ofld_dev->ops and

> > nvme_tcp_ofld_dev->private_data and calling nvme_tcp_ofld_register_dev

> > with the initialized struct.

> >

> > The internal implementation:

> > - tcp-offload.h:

> >    Includes all common structs and ops to be used and shared by offload

> >    drivers.

> >

> > - tcp-offload.c:

> >    Includes the init function which registers as a NVMf transport just

> >    like any other transport.

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Dean Balandin <dbalandin@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > ---

> >   drivers/nvme/host/Kconfig       |  16 +++

> >   drivers/nvme/host/Makefile      |   3 +

> >   drivers/nvme/host/tcp-offload.c | 126 +++++++++++++++++++

> >   drivers/nvme/host/tcp-offload.h | 206 ++++++++++++++++++++++++++++++++

> >   4 files changed, 351 insertions(+)

> >   create mode 100644 drivers/nvme/host/tcp-offload.c

> >   create mode 100644 drivers/nvme/host/tcp-offload.h

> >

> It will be tricky to select the correct transport eg when traversing the

> discovery log page; the discovery log page only knows about 'tcp' (not

> 'tcp_offload'), so the offload won't be picked up.

> But that can we worked on / fixed later on, as it's arguably a policy

> decision.


I agree that we should improve the policy decision and allow additional
capabilities and it may be discussed as a new NVMe TPAR.

>

> Reviewed-by: Hannes Reinecke <hare@suse.de>


Thanks.

>

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 3, 2021, 3:52 p.m. UTC | #21
On 5/1/21 7:29 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > From: Arie Gershberg <agershberg@marvell.com>

> >

> > In this patch, we implement controller level error handling and recovery.

> > Upon an error discovered by the ULP or reset controller initiated by the

> > nvme-core (using reset_ctrl workqueue), the ULP will initiate a controller

> > recovery which includes teardown and re-connect of all queues.

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Arie Gershberg <agershberg@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > ---

> >   drivers/nvme/host/tcp-offload.c | 138 +++++++++++++++++++++++++++++++-

> >   drivers/nvme/host/tcp-offload.h |   1 +

> >   2 files changed, 137 insertions(+), 2 deletions(-)

> >

> > diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c

> > index 59e1955e02ec..9082b11c133f 100644

> > --- a/drivers/nvme/host/tcp-offload.c

> > +++ b/drivers/nvme/host/tcp-offload.c

> > @@ -74,6 +74,23 @@ void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev)

> >   }

> >   EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);

> >

> > +/**

> > + * nvme_tcp_ofld_error_recovery() - NVMeTCP Offload Library error recovery.

> > + * function.

> > + * @nctrl:   NVMe controller instance to change to resetting.

> > + *

> > + * API function that change the controller state to resseting.

> > + * Part of the overall controller reset sequence.

> > + */

> > +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl)

> > +{

> > +     if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_RESETTING))

> > +             return;

> > +

> > +     queue_work(nvme_reset_wq, &to_tcp_ofld_ctrl(nctrl)->err_work);

> > +}

> > +EXPORT_SYMBOL_GPL(nvme_tcp_ofld_error_recovery);

> > +

> >   /**

> >    * nvme_tcp_ofld_report_queue_err() - NVMeTCP Offload report error event

> >    * callback function. Pointed to by nvme_tcp_ofld_queue->report_err.

> > @@ -84,7 +101,8 @@ EXPORT_SYMBOL_GPL(nvme_tcp_ofld_unregister_dev);

> >    */

> >   int nvme_tcp_ofld_report_queue_err(struct nvme_tcp_ofld_queue *queue)

> >   {

> > -     /* Placeholder - invoke error recovery flow */

> > +     pr_err("nvme-tcp-offload queue error\n");

> > +     nvme_tcp_ofld_error_recovery(&queue->ctrl->nctrl);

> >

> >       return 0;

> >   }

> > @@ -296,6 +314,28 @@ nvme_tcp_ofld_configure_io_queues(struct nvme_ctrl *nctrl, bool new)

> >       return rc;

> >   }

> >

> > +static void nvme_tcp_ofld_reconnect_or_remove(struct nvme_ctrl *nctrl)

> > +{

> > +     /* If we are resetting/deleting then do nothing */

> > +     if (nctrl->state != NVME_CTRL_CONNECTING) {

> > +             WARN_ON_ONCE(nctrl->state == NVME_CTRL_NEW ||

> > +                          nctrl->state == NVME_CTRL_LIVE);

> > +

> > +             return;

> > +     }

> > +

> > +     if (nvmf_should_reconnect(nctrl)) {

> > +             dev_info(nctrl->device, "Reconnecting in %d seconds...\n",

> > +                      nctrl->opts->reconnect_delay);

> > +             queue_delayed_work(nvme_wq,

> > +                                &to_tcp_ofld_ctrl(nctrl)->connect_work,

> > +                                nctrl->opts->reconnect_delay * HZ);

> > +     } else {

> > +             dev_info(nctrl->device, "Removing controller...\n");

> > +             nvme_delete_ctrl(nctrl);

> > +     }

> > +}

> > +

> >   static int nvme_tcp_ofld_setup_ctrl(struct nvme_ctrl *nctrl, bool new)

> >   {

> >       struct nvmf_ctrl_options *opts = nctrl->opts;

> > @@ -407,10 +447,68 @@ nvme_tcp_ofld_teardown_io_queues(struct nvme_ctrl *nctrl, bool remove)

> >       /* Placeholder - teardown_io_queues */

> >   }

> >

> > +static void nvme_tcp_ofld_reconnect_ctrl_work(struct work_struct *work)

> > +{

> > +     struct nvme_tcp_ofld_ctrl *ctrl =

> > +                             container_of(to_delayed_work(work),

> > +                                          struct nvme_tcp_ofld_ctrl,

> > +                                          connect_work);

> > +     struct nvme_ctrl *nctrl = &ctrl->nctrl;

> > +

> > +     ++nctrl->nr_reconnects;

> > +

> > +     if (ctrl->dev->ops->setup_ctrl(ctrl, false))

> > +             goto requeue;

> > +

> > +     if (nvme_tcp_ofld_setup_ctrl(nctrl, false))

> > +             goto release_and_requeue;

> > +

> > +     dev_info(nctrl->device, "Successfully reconnected (%d attempt)\n",

> > +              nctrl->nr_reconnects);

> > +

> > +     nctrl->nr_reconnects = 0;

> > +

> > +     return;

> > +

> > +release_and_requeue:

> > +     ctrl->dev->ops->release_ctrl(ctrl);

> > +requeue:

> > +     dev_info(nctrl->device, "Failed reconnect attempt %d\n",

> > +              nctrl->nr_reconnects);

> > +     nvme_tcp_ofld_reconnect_or_remove(nctrl);

> > +}

> > +

> > +static void nvme_tcp_ofld_error_recovery_work(struct work_struct *work)

> > +{

> > +     struct nvme_tcp_ofld_ctrl *ctrl =

> > +             container_of(work, struct nvme_tcp_ofld_ctrl, err_work);

> > +     struct nvme_ctrl *nctrl = &ctrl->nctrl;

> > +

> > +     nvme_stop_keep_alive(nctrl);

> > +     nvme_tcp_ofld_teardown_io_queues(nctrl, false);

> > +     /* unquiesce to fail fast pending requests */

> > +     nvme_start_queues(nctrl);

> > +     nvme_tcp_ofld_teardown_admin_queue(nctrl, false);

> > +     blk_mq_unquiesce_queue(nctrl->admin_q);

> > +

> > +     if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {

> > +             /* state change failure is ok if we started nctrl delete */

> > +             WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&

> > +                          nctrl->state != NVME_CTRL_DELETING_NOIO);

> > +

> > +             return;

> > +     }

> > +

> > +     nvme_tcp_ofld_reconnect_or_remove(nctrl);

> > +}

> > +

> >   static void

> >   nvme_tcp_ofld_teardown_ctrl(struct nvme_ctrl *nctrl, bool shutdown)

> >   {

> > -     /* Placeholder - err_work and connect_work */

> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);

> > +

> > +     cancel_work_sync(&ctrl->err_work);

> > +     cancel_delayed_work_sync(&ctrl->connect_work);

> >       nvme_tcp_ofld_teardown_io_queues(nctrl, shutdown);

> >       blk_mq_quiesce_queue(nctrl->admin_q);

> >       if (shutdown)

> > @@ -425,6 +523,38 @@ static void nvme_tcp_ofld_delete_ctrl(struct nvme_ctrl *nctrl)

> >       nvme_tcp_ofld_teardown_ctrl(nctrl, true);

> >   }

> >

> > +static void nvme_tcp_ofld_reset_ctrl_work(struct work_struct *work)

> > +{

> > +     struct nvme_ctrl *nctrl =

> > +             container_of(work, struct nvme_ctrl, reset_work);

> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(nctrl);

> > +

> > +     nvme_stop_ctrl(nctrl);

> > +     nvme_tcp_ofld_teardown_ctrl(nctrl, false);

> > +

> > +     if (!nvme_change_ctrl_state(nctrl, NVME_CTRL_CONNECTING)) {

> > +             /* state change failure is ok if we started ctrl delete */

> > +             WARN_ON_ONCE(nctrl->state != NVME_CTRL_DELETING &&

> > +                          nctrl->state != NVME_CTRL_DELETING_NOIO);

> > +

> > +             return;

> > +     }

> > +

> > +     if (ctrl->dev->ops->setup_ctrl(ctrl, false))

> > +             goto out_fail;

> > +

> > +     if (nvme_tcp_ofld_setup_ctrl(nctrl, false))

> > +             goto release_ctrl;

> > +

> > +     return;

> > +

> > +release_ctrl:

> > +     ctrl->dev->ops->release_ctrl(ctrl);

> > +out_fail:

> > +     ++nctrl->nr_reconnects;

> > +     nvme_tcp_ofld_reconnect_or_remove(nctrl);

> > +}

> > +

> >   static int

> >   nvme_tcp_ofld_init_request(struct blk_mq_tag_set *set,

> >                          struct request *rq,

> > @@ -521,6 +651,10 @@ nvme_tcp_ofld_create_ctrl(struct device *ndev, struct nvmf_ctrl_options *opts)

> >                            opts->nr_poll_queues + 1;

> >       nctrl->sqsize = opts->queue_size - 1;

> >       nctrl->kato = opts->kato;

> > +     INIT_DELAYED_WORK(&ctrl->connect_work,

> > +                       nvme_tcp_ofld_reconnect_ctrl_work);

> > +     INIT_WORK(&ctrl->err_work, nvme_tcp_ofld_error_recovery_work);

> > +     INIT_WORK(&nctrl->reset_work, nvme_tcp_ofld_reset_ctrl_work);

> >       if (!(opts->mask & NVMF_OPT_TRSVCID)) {

> >               opts->trsvcid =

> >                       kstrdup(__stringify(NVME_TCP_DISC_PORT), GFP_KERNEL);

> > diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h

> > index 9fd270240eaa..b23b1d7ea6fa 100644

> > --- a/drivers/nvme/host/tcp-offload.h

> > +++ b/drivers/nvme/host/tcp-offload.h

> > @@ -204,3 +204,4 @@ struct nvme_tcp_ofld_ops {

> >   /* Exported functions for lower vendor specific offload drivers */

> >   int nvme_tcp_ofld_register_dev(struct nvme_tcp_ofld_dev *dev);

> >   void nvme_tcp_ofld_unregister_dev(struct nvme_tcp_ofld_dev *dev);

> > +void nvme_tcp_ofld_error_recovery(struct nvme_ctrl *nctrl);

> >

> Reviewed-by: Hannes Reinecke <hare@suse.de>


Thanks.


>

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 4, 2021, 4:28 p.m. UTC | #22
On 5/2/21 2:25 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > This patch introduces the NVMeTCP FW initializations which is used

> > to initialize the IO level configuration into a per IO HW

> > resource ("task") as part of the IO path flow.

> >

> > This includes:

> > - Write IO FW initialization

> > - Read IO FW initialization.

> > - IC-Req and IC-Resp FW exchange.

> > - FW Cleanup flow (Flush IO).

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > ---

> >   drivers/net/ethernet/qlogic/qed/Makefile      |   5 +-

> >   drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |   7 +-

> >   .../qlogic/qed/qed_nvmetcp_fw_funcs.c         | 372 ++++++++++++++++++

> >   .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |  43 ++

> >   include/linux/qed/nvmetcp_common.h            |   3 +

> >   include/linux/qed/qed_nvmetcp_if.h            |  17 +

> >   6 files changed, 445 insertions(+), 2 deletions(-)

> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c

> >   create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h

> >

> > diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile

> > index 7cb0db67ba5b..0d9c2fe0245d 100644

> > --- a/drivers/net/ethernet/qlogic/qed/Makefile

> > +++ b/drivers/net/ethernet/qlogic/qed/Makefile

> > @@ -28,7 +28,10 @@ qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o

> >   qed-$(CONFIG_QED_LL2) += qed_ll2.o

> >   qed-$(CONFIG_QED_OOO) += qed_ooo.o

> >

> > -qed-$(CONFIG_QED_NVMETCP) += qed_nvmetcp.o

> > +qed-$(CONFIG_QED_NVMETCP) += \

> > +     qed_nvmetcp.o           \

> > +     qed_nvmetcp_fw_funcs.o  \

> > +     qed_nvmetcp_ip_services.o

> >

> >   qed-$(CONFIG_QED_RDMA) +=   \

> >       qed_iwarp.o             \

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> > index 1e2eb6dcbd6e..434363f8b5c0 100644

> > --- a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c

> > @@ -27,6 +27,7 @@

> >   #include "qed_mcp.h"

> >   #include "qed_sp.h"

> >   #include "qed_reg_addr.h"

> > +#include "qed_nvmetcp_fw_funcs.h"

> >

> >   static int qed_nvmetcp_async_event(struct qed_hwfn *p_hwfn, u8 fw_event_code,

> >                                  u16 echo, union event_ring_data *data,

> > @@ -848,7 +849,11 @@ static const struct qed_nvmetcp_ops qed_nvmetcp_ops_pass = {

> >       .remove_src_tcp_port_filter = &qed_llh_remove_src_tcp_port_filter,

> >       .add_dst_tcp_port_filter = &qed_llh_add_dst_tcp_port_filter,

> >       .remove_dst_tcp_port_filter = &qed_llh_remove_dst_tcp_port_filter,

> > -     .clear_all_filters = &qed_llh_clear_all_filters

> > +     .clear_all_filters = &qed_llh_clear_all_filters,

> > +     .init_read_io = &init_nvmetcp_host_read_task,

> > +     .init_write_io = &init_nvmetcp_host_write_task,

> > +     .init_icreq_exchange = &init_nvmetcp_init_conn_req_task,

> > +     .init_task_cleanup = &init_cleanup_task_nvmetcp

> >   };

> >

> >   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void)

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c

> > new file mode 100644

> > index 000000000000..8485ad678284

> > --- /dev/null

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c

> > @@ -0,0 +1,372 @@

> > +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause)

> > +/* Copyright 2021 Marvell. All rights reserved. */

> > +

> > +#include <linux/kernel.h>

> > +#include <linux/module.h>

> > +#include <linux/pci.h>

> > +#include <linux/kernel.h>

> > +#include <linux/list.h>

> > +#include <linux/mm.h>

> > +#include <linux/types.h>

> > +#include <asm/byteorder.h>

> > +#include <linux/qed/common_hsi.h>

> > +#include <linux/qed/storage_common.h>

> > +#include <linux/qed/nvmetcp_common.h>

> > +#include <linux/qed/qed_nvmetcp_if.h>

> > +#include "qed_nvmetcp_fw_funcs.h"

> > +

> > +#define NVMETCP_NUM_SGES_IN_CACHE 0x4

> > +

> > +bool nvmetcp_is_slow_sgl(u16 num_sges, bool small_mid_sge)

> > +{

> > +     return (num_sges > SCSI_NUM_SGES_SLOW_SGL_THR && small_mid_sge);

> > +}

> > +

> > +void init_scsi_sgl_context(struct scsi_sgl_params *ctx_sgl_params,

> > +                        struct scsi_cached_sges *ctx_data_desc,

> > +                        struct storage_sgl_task_params *sgl_params)

> > +{

> > +     u8 num_sges_to_init = (u8)(sgl_params->num_sges > NVMETCP_NUM_SGES_IN_CACHE ?

> > +                                NVMETCP_NUM_SGES_IN_CACHE : sgl_params->num_sges);

> > +     u8 sge_index;

> > +

> > +     /* sgl params */

> > +     ctx_sgl_params->sgl_addr.lo = cpu_to_le32(sgl_params->sgl_phys_addr.lo);

> > +     ctx_sgl_params->sgl_addr.hi = cpu_to_le32(sgl_params->sgl_phys_addr.hi);

> > +     ctx_sgl_params->sgl_total_length = cpu_to_le32(sgl_params->total_buffer_size);

> > +     ctx_sgl_params->sgl_num_sges = cpu_to_le16(sgl_params->num_sges);

> > +

> > +     for (sge_index = 0; sge_index < num_sges_to_init; sge_index++) {

> > +             ctx_data_desc->sge[sge_index].sge_addr.lo =

> > +                     cpu_to_le32(sgl_params->sgl[sge_index].sge_addr.lo);

> > +             ctx_data_desc->sge[sge_index].sge_addr.hi =

> > +                     cpu_to_le32(sgl_params->sgl[sge_index].sge_addr.hi);

> > +             ctx_data_desc->sge[sge_index].sge_len =

> > +                     cpu_to_le32(sgl_params->sgl[sge_index].sge_len);

> > +     }

> > +}

> > +

> > +static inline u32 calc_rw_task_size(struct nvmetcp_task_params *task_params,

> > +                                 enum nvmetcp_task_type task_type)

> > +{

> > +     u32 io_size;

> > +

> > +     if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE)

> > +             io_size = task_params->tx_io_size;

> > +     else

> > +             io_size = task_params->rx_io_size;

> > +

> > +     if (unlikely(!io_size))

> > +             return 0;

> > +

> > +     return io_size;

> > +}

> > +

> > +static inline void init_sqe(struct nvmetcp_task_params *task_params,

> > +                         struct storage_sgl_task_params *sgl_task_params,

> > +                         enum nvmetcp_task_type task_type)

> > +{

> > +     if (!task_params->sqe)

> > +             return;

> > +

> > +     memset(task_params->sqe, 0, sizeof(*task_params->sqe));

> > +     task_params->sqe->task_id = cpu_to_le16(task_params->itid);

> > +

> > +     switch (task_type) {

> > +     case NVMETCP_TASK_TYPE_HOST_WRITE: {

> > +             u32 buf_size = 0;

> > +             u32 num_sges = 0;

> > +

> > +             SET_FIELD(task_params->sqe->contlen_cdbsize,

> > +                       NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD, 1);

> > +             SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,

> > +                       NVMETCP_WQE_TYPE_NORMAL);

> > +             if (task_params->tx_io_size) {

> > +                     if (task_params->send_write_incapsule)

> > +                             buf_size = calc_rw_task_size(task_params, task_type);

> > +

> > +                     if (nvmetcp_is_slow_sgl(sgl_task_params->num_sges,

> > +                                             sgl_task_params->small_mid_sge))

> > +                             num_sges = NVMETCP_WQE_NUM_SGES_SLOWIO;

> > +                     else

> > +                             num_sges = min((u16)sgl_task_params->num_sges,

> > +                                            (u16)SCSI_NUM_SGES_SLOW_SGL_THR);

> > +             }

> > +             SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_NUM_SGES, num_sges);

> > +             SET_FIELD(task_params->sqe->contlen_cdbsize, NVMETCP_WQE_CONT_LEN, buf_size);

> > +     } break;

> > +

> > +     case NVMETCP_TASK_TYPE_HOST_READ: {

> > +             SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,

> > +                       NVMETCP_WQE_TYPE_NORMAL);

> > +             SET_FIELD(task_params->sqe->contlen_cdbsize,

> > +                       NVMETCP_WQE_CDB_SIZE_OR_NVMETCP_CMD, 1);

> > +     } break;

> > +

> > +     case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST: {

> > +             SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,

> > +                       NVMETCP_WQE_TYPE_MIDDLE_PATH);

> > +

> > +             if (task_params->tx_io_size) {

> > +                     SET_FIELD(task_params->sqe->contlen_cdbsize, NVMETCP_WQE_CONT_LEN,

> > +                               task_params->tx_io_size);

> > +                     SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_NUM_SGES,

> > +                               min((u16)sgl_task_params->num_sges,

> > +                                   (u16)SCSI_NUM_SGES_SLOW_SGL_THR));

> > +             }

> > +     } break;

> > +

> > +     case NVMETCP_TASK_TYPE_CLEANUP:

> > +             SET_FIELD(task_params->sqe->flags, NVMETCP_WQE_WQE_TYPE,

> > +                       NVMETCP_WQE_TYPE_TASK_CLEANUP);

> > +

> > +     default:

> > +             break;

> > +     }

> > +}

> > +

> > +/* The following function initializes of NVMeTCP task params */

> > +static inline void

> > +init_nvmetcp_task_params(struct e5_nvmetcp_task_context *context,

> > +                      struct nvmetcp_task_params *task_params,

> > +                      enum nvmetcp_task_type task_type)

> > +{

> > +     context->ystorm_st_context.state.cccid = task_params->host_cccid;

> > +     SET_FIELD(context->ustorm_st_context.error_flags, USTORM_NVMETCP_TASK_ST_CTX_NVME_TCP, 1);

> > +     context->ustorm_st_context.nvme_tcp_opaque_lo = cpu_to_le32(task_params->opq.lo);

> > +     context->ustorm_st_context.nvme_tcp_opaque_hi = cpu_to_le32(task_params->opq.hi);

> > +}

> > +

> > +/* The following function initializes default values to all tasks */

> > +static inline void

> > +init_default_nvmetcp_task(struct nvmetcp_task_params *task_params, void *pdu_header,

> > +                       enum nvmetcp_task_type task_type)

> > +{

> > +     struct e5_nvmetcp_task_context *context = task_params->context;

> > +     const u8 val_byte = context->mstorm_ag_context.cdu_validation;

> > +     u8 dw_index;

> > +

> > +     memset(context, 0, sizeof(*context));

> > +

> > +     init_nvmetcp_task_params(context, task_params,

> > +                              (enum nvmetcp_task_type)task_type);

> > +

> > +     if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE ||

> > +         task_type == NVMETCP_TASK_TYPE_HOST_READ) {

> > +             for (dw_index = 0; dw_index < QED_NVMETCP_CMD_HDR_SIZE / 4; dw_index++)

> > +                     context->ystorm_st_context.pdu_hdr.task_hdr.reg[dw_index] =

> > +                             cpu_to_le32(((u32 *)pdu_header)[dw_index]);

> > +     } else {

> > +             for (dw_index = 0; dw_index < QED_NVMETCP_CMN_HDR_SIZE / 4; dw_index++)

> > +                     context->ystorm_st_context.pdu_hdr.task_hdr.reg[dw_index] =

> > +                             cpu_to_le32(((u32 *)pdu_header)[dw_index]);

> > +     }

> > +

>

> And this is what I meant. You are twiddling with the bytes already, so

> why bother with a separate struct at all?


We agree, will be fixed in V5.

>

> > +     /* M-Storm Context: */

> > +     context->mstorm_ag_context.cdu_validation = val_byte;

> > +     context->mstorm_st_context.task_type = (u8)(task_type);

> > +     context->mstorm_ag_context.task_cid = cpu_to_le16(task_params->conn_icid);

> > +

> > +     /* Ustorm Context: */

> > +     SET_FIELD(context->ustorm_ag_context.flags1, E5_USTORM_NVMETCP_TASK_AG_CTX_R2T2RECV, 1);

> > +     context->ustorm_st_context.task_type = (u8)(task_type);

> > +     context->ustorm_st_context.cq_rss_number = task_params->cq_rss_number;

> > +     context->ustorm_ag_context.icid = cpu_to_le16(task_params->conn_icid);

> > +}

> > +

> > +/* The following function initializes the U-Storm Task Contexts */

> > +static inline void

> > +init_ustorm_task_contexts(struct ustorm_nvmetcp_task_st_ctx *ustorm_st_context,

> > +                       struct e5_ustorm_nvmetcp_task_ag_ctx *ustorm_ag_context,

> > +                       u32 remaining_recv_len,

> > +                       u32 expected_data_transfer_len, u8 num_sges,

> > +                       bool tx_dif_conn_err_en)

> > +{

> > +     /* Remaining data to be received in bytes. Used in validations*/

> > +     ustorm_st_context->rem_rcv_len = cpu_to_le32(remaining_recv_len);

> > +     ustorm_ag_context->exp_data_acked = cpu_to_le32(expected_data_transfer_len);

> > +     ustorm_st_context->exp_data_transfer_len = cpu_to_le32(expected_data_transfer_len);

> > +     SET_FIELD(ustorm_st_context->reg1.reg1_map, NVMETCP_REG1_NUM_SGES, num_sges);

> > +     SET_FIELD(ustorm_ag_context->flags2, E5_USTORM_NVMETCP_TASK_AG_CTX_DIF_ERROR_CF_EN,

> > +               tx_dif_conn_err_en ? 1 : 0);

> > +}

> > +

> > +/* The following function initializes Local Completion Contexts: */

> > +static inline void

> > +set_local_completion_context(struct e5_nvmetcp_task_context *context)

> > +{

> > +     SET_FIELD(context->ystorm_st_context.state.flags,

> > +               YSTORM_NVMETCP_TASK_STATE_LOCAL_COMP, 1);

> > +     SET_FIELD(context->ustorm_st_context.flags,

> > +               USTORM_NVMETCP_TASK_ST_CTX_LOCAL_COMP, 1);

> > +}

> > +

> > +/* Common Fastpath task init function: */

> > +static inline void

> > +init_rw_nvmetcp_task(struct nvmetcp_task_params *task_params,

> > +                  enum nvmetcp_task_type task_type,

> > +                  struct nvmetcp_conn_params *conn_params, void *pdu_header,

> > +                  struct storage_sgl_task_params *sgl_task_params)

> > +{

> > +     struct e5_nvmetcp_task_context *context = task_params->context;

> > +     u32 task_size = calc_rw_task_size(task_params, task_type);

> > +     u32 exp_data_transfer_len = conn_params->max_burst_length;

> > +     bool slow_io = false;

> > +     u8 num_sges = 0;

> > +

> > +     init_default_nvmetcp_task(task_params, pdu_header, task_type);

> > +

> > +     /* Tx/Rx: */

> > +     if (task_params->tx_io_size) {

> > +             /* if data to transmit: */

> > +             init_scsi_sgl_context(&context->ystorm_st_context.state.sgl_params,

> > +                                   &context->ystorm_st_context.state.data_desc,

> > +                                   sgl_task_params);

> > +             slow_io = nvmetcp_is_slow_sgl(sgl_task_params->num_sges,

> > +                                           sgl_task_params->small_mid_sge);

> > +             num_sges =

> > +                     (u8)(!slow_io ? min((u32)sgl_task_params->num_sges,

> > +                                         (u32)SCSI_NUM_SGES_SLOW_SGL_THR) :

> > +                                         NVMETCP_WQE_NUM_SGES_SLOWIO);

> > +             if (slow_io) {

> > +                     SET_FIELD(context->ystorm_st_context.state.flags,

> > +                               YSTORM_NVMETCP_TASK_STATE_SLOW_IO, 1);

> > +             }

> > +     } else if (task_params->rx_io_size) {

> > +             /* if data to receive: */

> > +             init_scsi_sgl_context(&context->mstorm_st_context.sgl_params,

> > +                                   &context->mstorm_st_context.data_desc,

> > +                                   sgl_task_params);

> > +             num_sges =

> > +                     (u8)(!nvmetcp_is_slow_sgl(sgl_task_params->num_sges,

> > +                                               sgl_task_params->small_mid_sge) ?

> > +                                               min((u32)sgl_task_params->num_sges,

> > +                                                   (u32)SCSI_NUM_SGES_SLOW_SGL_THR) :

> > +                                                   NVMETCP_WQE_NUM_SGES_SLOWIO);

> > +             context->mstorm_st_context.rem_task_size = cpu_to_le32(task_size);

> > +     }

> > +

> > +     /* Ustorm context: */

> > +     if (exp_data_transfer_len > task_size)

> > +             /* The size of the transmitted task*/

> > +             exp_data_transfer_len = task_size;

> > +     init_ustorm_task_contexts(&context->ustorm_st_context,

> > +                               &context->ustorm_ag_context,

> > +                               /* Remaining Receive length is the Task Size */

> > +                               task_size,

> > +                               /* The size of the transmitted task */

> > +                               exp_data_transfer_len,

> > +                               /* num_sges */

> > +                               num_sges,

> > +                               false);

> > +

> > +     /* Set exp_data_acked */

> > +     if (task_type == NVMETCP_TASK_TYPE_HOST_WRITE) {

> > +             if (task_params->send_write_incapsule)

> > +                     context->ustorm_ag_context.exp_data_acked = task_size;

> > +             else

> > +                     context->ustorm_ag_context.exp_data_acked = 0;

> > +     } else if (task_type == NVMETCP_TASK_TYPE_HOST_READ) {

> > +             context->ustorm_ag_context.exp_data_acked = 0;

> > +     }

> > +

> > +     context->ustorm_ag_context.exp_cont_len = 0;

> > +

> > +     init_sqe(task_params, sgl_task_params, task_type);

> > +}

> > +

> > +static void

> > +init_common_initiator_read_task(struct nvmetcp_task_params *task_params,

> > +                             struct nvmetcp_conn_params *conn_params,

> > +                             struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> > +                             struct storage_sgl_task_params *sgl_task_params)

> > +{

> > +     init_rw_nvmetcp_task(task_params, NVMETCP_TASK_TYPE_HOST_READ,

> > +                          conn_params, cmd_pdu_header, sgl_task_params);

> > +}

> > +

> > +void init_nvmetcp_host_read_task(struct nvmetcp_task_params *task_params,

> > +                              struct nvmetcp_conn_params *conn_params,

> > +                              struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> > +                              struct storage_sgl_task_params *sgl_task_params)

> > +{

> > +     init_common_initiator_read_task(task_params, conn_params,

> > +                                     (void *)cmd_pdu_header, sgl_task_params);

> > +}

> > +

> > +static void

> > +init_common_initiator_write_task(struct nvmetcp_task_params *task_params,

> > +                              struct nvmetcp_conn_params *conn_params,

> > +                              struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> > +                              struct storage_sgl_task_params *sgl_task_params)

> > +{

> > +     init_rw_nvmetcp_task(task_params, NVMETCP_TASK_TYPE_HOST_WRITE,

> > +                          conn_params, cmd_pdu_header, sgl_task_params);

> > +}

> > +

> > +void init_nvmetcp_host_write_task(struct nvmetcp_task_params *task_params,

> > +                               struct nvmetcp_conn_params *conn_params,

> > +                               struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> > +                               struct storage_sgl_task_params *sgl_task_params)

> > +{

> > +     init_common_initiator_write_task(task_params, conn_params,

> > +                                      (void *)cmd_pdu_header,

> > +                                      sgl_task_params);

> > +}

> > +

> > +static void

> > +init_common_login_request_task(struct nvmetcp_task_params *task_params,

> > +                            void *login_req_pdu_header,

> > +                            struct storage_sgl_task_params *tx_sgl_task_params,

> > +                            struct storage_sgl_task_params *rx_sgl_task_params)

> > +{

> > +     struct e5_nvmetcp_task_context *context = task_params->context;

> > +

> > +     init_default_nvmetcp_task(task_params, (void *)login_req_pdu_header,

> > +                               NVMETCP_TASK_TYPE_INIT_CONN_REQUEST);

> > +

> > +     /* Ustorm Context: */

> > +     init_ustorm_task_contexts(&context->ustorm_st_context,

> > +                               &context->ustorm_ag_context,

> > +

> > +                               /* Remaining Receive length is the Task Size */

> > +                               task_params->rx_io_size ?

> > +                               rx_sgl_task_params->total_buffer_size : 0,

> > +

> > +                               /* The size of the transmitted task */

> > +                               task_params->tx_io_size ?

> > +                               tx_sgl_task_params->total_buffer_size : 0,

> > +                               0, /* num_sges */

> > +                               0); /* tx_dif_conn_err_en */

> > +

> > +     /* SGL context: */

> > +     if (task_params->tx_io_size)

> > +             init_scsi_sgl_context(&context->ystorm_st_context.state.sgl_params,

> > +                                   &context->ystorm_st_context.state.data_desc,

> > +                                   tx_sgl_task_params);

> > +     if (task_params->rx_io_size)

> > +             init_scsi_sgl_context(&context->mstorm_st_context.sgl_params,

> > +                                   &context->mstorm_st_context.data_desc,

> > +                                   rx_sgl_task_params);

> > +

> > +     context->mstorm_st_context.rem_task_size =

> > +             cpu_to_le32(task_params->rx_io_size ?

> > +                              rx_sgl_task_params->total_buffer_size : 0);

> > +

> > +     init_sqe(task_params, tx_sgl_task_params, NVMETCP_TASK_TYPE_INIT_CONN_REQUEST);

> > +}

> > +

> > +/* The following function initializes Login task in Host mode: */

> > +void init_nvmetcp_init_conn_req_task(struct nvmetcp_task_params *task_params,

> > +                                  struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,

> > +                                  struct storage_sgl_task_params *tx_sgl_task_params,

> > +                                  struct storage_sgl_task_params *rx_sgl_task_params)

> > +{

> > +     init_common_login_request_task(task_params, init_conn_req_pdu_hdr,

> > +                                    tx_sgl_task_params, rx_sgl_task_params);

> > +}

> > +

> > +void init_cleanup_task_nvmetcp(struct nvmetcp_task_params *task_params)

> > +{

> > +     init_sqe(task_params, NULL, NVMETCP_TASK_TYPE_CLEANUP);

> > +}

> > diff --git a/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h

> > new file mode 100644

> > index 000000000000..3a8c74356c4c

> > --- /dev/null

> > +++ b/drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h

> > @@ -0,0 +1,43 @@

> > +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */

> > +/* Copyright 2021 Marvell. All rights reserved. */

> > +

> > +#ifndef _QED_NVMETCP_FW_FUNCS_H

> > +#define _QED_NVMETCP_FW_FUNCS_H

> > +

> > +#include <linux/kernel.h>

> > +#include <linux/module.h>

> > +#include <linux/pci.h>

> > +#include <linux/kernel.h>

> > +#include <linux/list.h>

> > +#include <linux/mm.h>

> > +#include <linux/types.h>

> > +#include <asm/byteorder.h>

> > +#include <linux/qed/common_hsi.h>

> > +#include <linux/qed/storage_common.h>

> > +#include <linux/qed/nvmetcp_common.h>

> > +#include <linux/qed/qed_nvmetcp_if.h>

> > +

> > +#if IS_ENABLED(CONFIG_QED_NVMETCP)

> > +

> > +void init_nvmetcp_host_read_task(struct nvmetcp_task_params *task_params,

> > +                              struct nvmetcp_conn_params *conn_params,

> > +                              struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> > +                              struct storage_sgl_task_params *sgl_task_params);

> > +

> > +void init_nvmetcp_host_write_task(struct nvmetcp_task_params *task_params,

> > +                               struct nvmetcp_conn_params *conn_params,

> > +                               struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> > +                               struct storage_sgl_task_params *sgl_task_params);

> > +

> > +void init_nvmetcp_init_conn_req_task(struct nvmetcp_task_params *task_params,

> > +                                  struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,

> > +                                  struct storage_sgl_task_params *tx_sgl_task_params,

> > +                                  struct storage_sgl_task_params *rx_sgl_task_params);

> > +

> > +void init_cleanup_task_nvmetcp(struct nvmetcp_task_params *task_params);

> > +

> > +#else /* IS_ENABLED(CONFIG_QED_NVMETCP) */

> > +

> > +#endif /* IS_ENABLED(CONFIG_QED_NVMETCP) */

> > +

> > +#endif /* _QED_NVMETCP_FW_FUNCS_H */

> > diff --git a/include/linux/qed/nvmetcp_common.h b/include/linux/qed/nvmetcp_common.h

> > index dda7a785c321..c0023bb185dd 100644

> > --- a/include/linux/qed/nvmetcp_common.h

> > +++ b/include/linux/qed/nvmetcp_common.h

> > @@ -9,6 +9,9 @@

> >   #define NVMETCP_SLOW_PATH_LAYER_CODE (6)

> >   #define NVMETCP_WQE_NUM_SGES_SLOWIO (0xf)

> >

> > +#define QED_NVMETCP_CMD_HDR_SIZE 72

> > +#define QED_NVMETCP_CMN_HDR_SIZE 24

> > +

> >   /* NVMeTCP firmware function init parameters */

> >   struct nvmetcp_spe_func_init {

> >       __le16 half_way_close_timeout;

> > diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h

> > index 04e90dc42c12..d971be84f804 100644

> > --- a/include/linux/qed/qed_nvmetcp_if.h

> > +++ b/include/linux/qed/qed_nvmetcp_if.h

> > @@ -220,6 +220,23 @@ struct qed_nvmetcp_ops {

> >       void (*remove_dst_tcp_port_filter)(struct qed_dev *cdev, u16 dest_port);

> >

> >       void (*clear_all_filters)(struct qed_dev *cdev);

> > +

> > +     void (*init_read_io)(struct nvmetcp_task_params *task_params,

> > +                          struct nvmetcp_conn_params *conn_params,

> > +                          struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> > +                          struct storage_sgl_task_params *sgl_task_params);

> > +

> > +     void (*init_write_io)(struct nvmetcp_task_params *task_params,

> > +                           struct nvmetcp_conn_params *conn_params,

> > +                           struct nvmetcp_cmd_capsule_hdr *cmd_pdu_header,

> > +                           struct storage_sgl_task_params *sgl_task_params);

> > +

> > +     void (*init_icreq_exchange)(struct nvmetcp_task_params *task_params,

> > +                                 struct nvmetcp_init_conn_req_hdr *init_conn_req_pdu_hdr,

> > +                                 struct storage_sgl_task_params *tx_sgl_task_params,

> > +                                 struct storage_sgl_task_params *rx_sgl_task_params);

> > +

> > +     void (*init_task_cleanup)(struct nvmetcp_task_params *task_params);

> >   };

> >

> >   const struct qed_nvmetcp_ops *qed_get_nvmetcp_ops(void);

> >

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 4, 2021, 4:49 p.m. UTC | #23
On 5/1/21 7:45 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > In this patch, we present the nvme-tcp-offload timeout support

> > nvme_tcp_ofld_timeout() and ASYNC support

> > nvme_tcp_ofld_submit_async_event().

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > ---

> >   drivers/nvme/host/tcp-offload.c | 85 ++++++++++++++++++++++++++++++++-

> >   drivers/nvme/host/tcp-offload.h |  2 +

> >   2 files changed, 86 insertions(+), 1 deletion(-)

> >

> > diff --git a/drivers/nvme/host/tcp-offload.c b/drivers/nvme/host/tcp-offload.c

> > index 0cdf5a432208..1d62f921f109 100644

> > --- a/drivers/nvme/host/tcp-offload.c

> > +++ b/drivers/nvme/host/tcp-offload.c

> > @@ -133,6 +133,26 @@ void nvme_tcp_ofld_req_done(struct nvme_tcp_ofld_req *req,

> >               nvme_complete_rq(rq);

> >   }

> >

> > +/**

> > + * nvme_tcp_ofld_async_req_done() - NVMeTCP Offload request done callback

> > + * function for async request. Pointed to by nvme_tcp_ofld_req->done.

> > + * Handles both NVME_TCP_F_DATA_SUCCESS flag and NVMe CQ.

> > + * @req:     NVMeTCP offload request to complete.

> > + * @result:     The nvme_result.

> > + * @status:     The completion status.

> > + *

> > + * API function that allows the vendor specific offload driver to report request

> > + * completions to the common offload layer.

> > + */

> > +void nvme_tcp_ofld_async_req_done(struct nvme_tcp_ofld_req *req,

> > +                               union nvme_result *result, __le16 status)

> > +{

> > +     struct nvme_tcp_ofld_queue *queue = req->queue;

> > +     struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;

> > +

> > +     nvme_complete_async_event(&ctrl->nctrl, status, result);

> > +}

> > +

> >   struct nvme_tcp_ofld_dev *

> >   nvme_tcp_ofld_lookup_dev(struct nvme_tcp_ofld_ctrl *ctrl)

> >   {

> > @@ -719,7 +739,23 @@ void nvme_tcp_ofld_map_data(struct nvme_command *c, u32 data_len)

> >

> >   static void nvme_tcp_ofld_submit_async_event(struct nvme_ctrl *arg)

> >   {

> > -     /* Placeholder - submit_async_event */

> > +     struct nvme_tcp_ofld_ctrl *ctrl = to_tcp_ofld_ctrl(arg);

> > +     struct nvme_tcp_ofld_queue *queue = &ctrl->queues[0];

> > +     struct nvme_tcp_ofld_dev *dev = queue->dev;

> > +     struct nvme_tcp_ofld_ops *ops = dev->ops;

> > +

> > +     ctrl->async_req.nvme_cmd.common.opcode = nvme_admin_async_event;

> > +     ctrl->async_req.nvme_cmd.common.command_id = NVME_AQ_BLK_MQ_DEPTH;

> > +     ctrl->async_req.nvme_cmd.common.flags |= NVME_CMD_SGL_METABUF;

> > +

> > +     nvme_tcp_ofld_set_sg_null(&ctrl->async_req.nvme_cmd);

> > +

> > +     ctrl->async_req.async = true;

> > +     ctrl->async_req.queue = queue;

> > +     ctrl->async_req.last = true;

> > +     ctrl->async_req.done = nvme_tcp_ofld_async_req_done;

> > +

> > +     ops->send_req(&ctrl->async_req);

> >   }

> >

> >   static void

> > @@ -1024,6 +1060,51 @@ static int nvme_tcp_ofld_poll(struct blk_mq_hw_ctx *hctx)

> >       return ops->poll_queue(queue);

> >   }

> >

> > +static void nvme_tcp_ofld_complete_timed_out(struct request *rq)

> > +{

> > +     struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);

> > +     struct nvme_ctrl *nctrl = &req->queue->ctrl->nctrl;

> > +

> > +     nvme_tcp_ofld_stop_queue(nctrl, nvme_tcp_ofld_qid(req->queue));

> > +     if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) {

> > +             nvme_req(rq)->status = NVME_SC_HOST_ABORTED_CMD;

> > +             blk_mq_complete_request(rq);

> > +     }

> > +}

> > +

> > +static enum blk_eh_timer_return nvme_tcp_ofld_timeout(struct request *rq, bool reserved)

> > +{

> > +     struct nvme_tcp_ofld_req *req = blk_mq_rq_to_pdu(rq);

> > +     struct nvme_tcp_ofld_ctrl *ctrl = req->queue->ctrl;

> > +

> > +     dev_warn(ctrl->nctrl.device,

> > +              "queue %d: timeout request %#x type %d\n",

> > +              nvme_tcp_ofld_qid(req->queue), rq->tag, req->nvme_cmd.common.opcode);

> > +

> > +     if (ctrl->nctrl.state != NVME_CTRL_LIVE) {

> > +             /*

> > +              * If we are resetting, connecting or deleting we should

> > +              * complete immediately because we may block controller

> > +              * teardown or setup sequence

> > +              * - ctrl disable/shutdown fabrics requests

> > +              * - connect requests

> > +              * - initialization admin requests

> > +              * - I/O requests that entered after unquiescing and

> > +              *   the controller stopped responding

> > +              *

> > +              * All other requests should be cancelled by the error

> > +              * recovery work, so it's fine that we fail it here.

> > +              */

> > +             nvme_tcp_ofld_complete_timed_out(rq);

> > +

> > +             return BLK_EH_DONE;

> > +     }

>

> And this particular error code has been causing _so_ _many_ issues

> during testing, that I'd rather get rid of it altogether.

> But probably not your fault, your just copying what tcp and rdma is doing.

>


I agree. We preferred to keep all the teardown/error flows similar to the tcp
and rdma design in order to be able to align the tcp-offload to any future
changes. Would you like us to do anything differently?

> > +

> > +     nvme_tcp_ofld_error_recovery(&ctrl->nctrl);

> > +

> > +     return BLK_EH_RESET_TIMER;

> > +}

> > +

> >   static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {

> >       .queue_rq       = nvme_tcp_ofld_queue_rq,

> >       .commit_rqs     = nvme_tcp_ofld_commit_rqs,

> > @@ -1031,6 +1112,7 @@ static struct blk_mq_ops nvme_tcp_ofld_mq_ops = {

> >       .init_request   = nvme_tcp_ofld_init_request,

> >       .exit_request   = nvme_tcp_ofld_exit_request,

> >       .init_hctx      = nvme_tcp_ofld_init_hctx,

> > +     .timeout        = nvme_tcp_ofld_timeout,

> >       .map_queues     = nvme_tcp_ofld_map_queues,

> >       .poll           = nvme_tcp_ofld_poll,

> >   };

> > @@ -1041,6 +1123,7 @@ static struct blk_mq_ops nvme_tcp_ofld_admin_mq_ops = {

> >       .init_request   = nvme_tcp_ofld_init_request,

> >       .exit_request   = nvme_tcp_ofld_exit_request,

> >       .init_hctx      = nvme_tcp_ofld_init_admin_hctx,

> > +     .timeout        = nvme_tcp_ofld_timeout,

> >   };

> >

> >   static const struct nvme_ctrl_ops nvme_tcp_ofld_ctrl_ops = {

> > diff --git a/drivers/nvme/host/tcp-offload.h b/drivers/nvme/host/tcp-offload.h

> > index d82645fcf9da..275a7e2d9d8a 100644

> > --- a/drivers/nvme/host/tcp-offload.h

> > +++ b/drivers/nvme/host/tcp-offload.h

> > @@ -110,6 +110,8 @@ struct nvme_tcp_ofld_ctrl {

> >       /* Connectivity params */

> >       struct nvme_tcp_ofld_ctrl_con_params conn_params;

> >

> > +     struct nvme_tcp_ofld_req async_req;

> > +

> >       /* Vendor specific driver context */

> >       void *private_data;

> >   };

> >

> So:

>

> Reviewed-by: Hannes Reinecke <hare@suse.de>


Thanks.

>

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 4, 2021, 4:52 p.m. UTC | #24
On 5/2/21 2:27 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > This patch will present the skeleton of the qedn driver.

> > The new driver will be added under "drivers/nvme/hw/qedn" and will be

> > enabled by the Kconfig "Marvell NVM Express over Fabrics TCP offload".

> >

> > The internal implementation:

> > - qedn.h:

> >    Includes all common structs to be used by the qedn vendor driver.

> >

> > - qedn_main.c

> >    Includes the qedn_init and qedn_cleanup implementation.

> >    As part of the qedn init, the driver will register as a pci device and

> >    will work with the Marvell fastlinQ NICs.

> >    As part of the probe, the driver will register to the nvme_tcp_offload

> >    (ULP).

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Arie Gershberg <agershberg@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > ---

> >   MAINTAINERS                      |  10 ++

> >   drivers/nvme/Kconfig             |   1 +

> >   drivers/nvme/Makefile            |   1 +

> >   drivers/nvme/hw/Kconfig          |   8 ++

> >   drivers/nvme/hw/Makefile         |   3 +

> >   drivers/nvme/hw/qedn/Makefile    |   5 +

> >   drivers/nvme/hw/qedn/qedn.h      |  19 +++

> >   drivers/nvme/hw/qedn/qedn_main.c | 201 +++++++++++++++++++++++++++++++

> >   8 files changed, 248 insertions(+)

> >   create mode 100644 drivers/nvme/hw/Kconfig

> >   create mode 100644 drivers/nvme/hw/Makefile

> >   create mode 100644 drivers/nvme/hw/qedn/Makefile

> >   create mode 100644 drivers/nvme/hw/qedn/qedn.h

> >   create mode 100644 drivers/nvme/hw/qedn/qedn_main.c

> > Reviewed-by: Hannes Reinecke <hare@suse.de>


Thanks.

>

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 5, 2021, 5:54 p.m. UTC | #25
On 5/2/21 4:32 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > This patch will present the adding of qedn_fp_queue - this is a per cpu

> > core element which handles all of the connections on that cpu core.

> > The qedn_fp_queue will handle a group of connections (NVMeoF QPs) which

> > are handled on the same cpu core, and will only use the same FW-driver

> > resources with no need to be related to the same NVMeoF controller.

> >

> > The per qedn_fq_queue resources are the FW CQ and FW status block:

> > - The FW CQ will be used for the FW to notify the driver that the

> >    the exchange has ended and the FW will pass the incoming NVMeoF CQE

> >    (if exist) to the driver.

> > - FW status block - which is used for the FW to notify the driver with

> >    the producer update of the FW CQE chain.

> >

> > The FW fast-path queues are based on qed_chain.h

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > ---

> >   drivers/nvme/hw/qedn/qedn.h      |  26 +++

> >   drivers/nvme/hw/qedn/qedn_main.c | 287 ++++++++++++++++++++++++++++++-

> >   2 files changed, 310 insertions(+), 3 deletions(-)

> >

> > diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h

> > index 7efe2366eb7c..5d4d04d144e4 100644

> > --- a/drivers/nvme/hw/qedn/qedn.h

> > +++ b/drivers/nvme/hw/qedn/qedn.h

> > @@ -33,18 +33,41 @@

> >   #define QEDN_PROTO_CQ_PROD_IDX      0

> >   #define QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES 2

> >

> > +#define QEDN_PAGE_SIZE       4096 /* FW page size - Configurable */

> > +#define QEDN_IRQ_NAME_LEN 24

> > +#define QEDN_IRQ_NO_FLAGS 0

> > +

> > +/* TCP defines */

> > +#define QEDN_TCP_RTO_DEFAULT 280

> > +

> >   enum qedn_state {

> >       QEDN_STATE_CORE_PROBED = 0,

> >       QEDN_STATE_CORE_OPEN,

> >       QEDN_STATE_GL_PF_LIST_ADDED,

> >       QEDN_STATE_MFW_STATE,

> > +     QEDN_STATE_NVMETCP_OPEN,

> > +     QEDN_STATE_IRQ_SET,

> > +     QEDN_STATE_FP_WORK_THREAD_SET,

> >       QEDN_STATE_REGISTERED_OFFLOAD_DEV,

> >       QEDN_STATE_MODULE_REMOVE_ONGOING,

> >   };

> >

> > +/* Per CPU core params */

> > +struct qedn_fp_queue {

> > +     struct qed_chain cq_chain;

> > +     u16 *cq_prod;

> > +     struct mutex cq_mutex; /* cq handler mutex */

> > +     struct qedn_ctx *qedn;

> > +     struct qed_sb_info *sb_info;

> > +     unsigned int cpu;

> > +     u16 sb_id;

> > +     char irqname[QEDN_IRQ_NAME_LEN];

> > +};

> > +

> >   struct qedn_ctx {

> >       struct pci_dev *pdev;

> >       struct qed_dev *cdev;

> > +     struct qed_int_info int_info;

> >       struct qed_dev_nvmetcp_info dev_info;

> >       struct nvme_tcp_ofld_dev qedn_ofld_dev;

> >       struct qed_pf_params pf_params;

> > @@ -57,6 +80,9 @@ struct qedn_ctx {

> >

> >       /* Fast path queues */

> >       u8 num_fw_cqs;

> > +     struct qedn_fp_queue *fp_q_arr;

> > +     struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;

> > +     dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */

> >   };

> >

> >   struct qedn_global {

> > diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c

> > index 52007d35622d..0135a1f490da 100644

> > --- a/drivers/nvme/hw/qedn/qedn_main.c

> > +++ b/drivers/nvme/hw/qedn/qedn_main.c

> > @@ -141,6 +141,104 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {

> >       .commit_rqs = qedn_commit_rqs,

> >   };

> >

> > +/* Fastpath IRQ handler */

> > +static irqreturn_t qedn_irq_handler(int irq, void *dev_id)

> > +{

> > +     /* Placeholder */

> > +

> > +     return IRQ_HANDLED;

> > +}

> > +

> > +static void qedn_sync_free_irqs(struct qedn_ctx *qedn)

> > +{

> > +     u16 vector_idx;

> > +     int i;

> > +

> > +     for (i = 0; i < qedn->num_fw_cqs; i++) {

> > +             vector_idx = i * qedn->dev_info.common.num_hwfns +

> > +                          qed_ops->common->get_affin_hwfn_idx(qedn->cdev);

> > +             synchronize_irq(qedn->int_info.msix[vector_idx].vector);

> > +             irq_set_affinity_hint(qedn->int_info.msix[vector_idx].vector,

> > +                                   NULL);

> > +             free_irq(qedn->int_info.msix[vector_idx].vector,

> > +                      &qedn->fp_q_arr[i]);

> > +     }

> > +

> > +     qedn->int_info.used_cnt = 0;

> > +     qed_ops->common->set_fp_int(qedn->cdev, 0);

> > +}

> > +

> > +static int qedn_request_msix_irq(struct qedn_ctx *qedn)

> > +{

> > +     struct pci_dev *pdev = qedn->pdev;

> > +     struct qedn_fp_queue *fp_q = NULL;

> > +     int i, rc, cpu;

> > +     u16 vector_idx;

> > +     u32 vector;

> > +

> > +     /* numa-awareness will be added in future enhancements */

> > +     cpu = cpumask_first(cpu_online_mask);

> > +     for (i = 0; i < qedn->num_fw_cqs; i++) {

> > +             fp_q = &qedn->fp_q_arr[i];

> > +             vector_idx = i * qedn->dev_info.common.num_hwfns +

> > +                          qed_ops->common->get_affin_hwfn_idx(qedn->cdev);

> > +             vector = qedn->int_info.msix[vector_idx].vector;

> > +             sprintf(fp_q->irqname, "qedn_queue_%x.%x.%x_%d",

> > +                     pdev->bus->number, PCI_SLOT(pdev->devfn),

> > +                     PCI_FUNC(pdev->devfn), i);

> > +             rc = request_irq(vector, qedn_irq_handler, QEDN_IRQ_NO_FLAGS,

> > +                              fp_q->irqname, fp_q);

> > +             if (rc) {

> > +                     pr_err("request_irq failed.\n");

> > +                     qedn_sync_free_irqs(qedn);

> > +

> > +                     return rc;

> > +             }

> > +

> > +             fp_q->cpu = cpu;

> > +             qedn->int_info.used_cnt++;

> > +             rc = irq_set_affinity_hint(vector, get_cpu_mask(cpu));

> > +             cpu = cpumask_next_wrap(cpu, cpu_online_mask, -1, false);

> > +     }

> > +

> > +     return 0;

> > +}

> > +

>

> Hah. I knew it.

> So you _do_ have a limited number of MSIx interrupts.

> And that should limit the number of queue pairs, too.


Yes. Thanks!
Will be fixed in the relevant patch in V5.

>

> > +static int qedn_setup_irq(struct qedn_ctx *qedn)

> > +{

> > +     int rc = 0;

> > +     u8 rval;

> > +

> > +     rval = qed_ops->common->set_fp_int(qedn->cdev, qedn->num_fw_cqs);

> > +     if (rval < qedn->num_fw_cqs) {

> > +             qedn->num_fw_cqs = rval;

> > +             if (rval == 0) {

> > +                     pr_err("set_fp_int return 0 IRQs\n");

> > +

> > +                     return -ENODEV;

> > +             }

> > +     }

> > +

> > +     rc = qed_ops->common->get_fp_int(qedn->cdev, &qedn->int_info);

> > +     if (rc) {

> > +             pr_err("get_fp_int failed\n");

> > +             goto exit_setup_int;

> > +     }

> > +

> > +     if (qedn->int_info.msix_cnt) {

> > +             rc = qedn_request_msix_irq(qedn);

> > +             goto exit_setup_int;

> > +     } else {

> > +             pr_err("msix_cnt = 0\n");

> > +             rc = -EINVAL;

> > +             goto exit_setup_int;

> > +     }

> > +

> > +exit_setup_int:

> > +

> > +     return rc;

> > +}

> > +

> >   static inline void qedn_init_pf_struct(struct qedn_ctx *qedn)

> >   {

> >       /* Placeholder - Initialize qedn fields */

> > @@ -185,21 +283,173 @@ static void qedn_remove_pf_from_gl_list(struct qedn_ctx *qedn)

> >       mutex_unlock(&qedn_glb.glb_mutex);

> >   }

> >

> > +static void qedn_free_function_queues(struct qedn_ctx *qedn)

> > +{

> > +     struct qed_sb_info *sb_info = NULL;

> > +     struct qedn_fp_queue *fp_q;

> > +     int i;

> > +

> > +     /* Free workqueues */

> > +

> > +     /* Free the fast path queues*/

> > +     for (i = 0; i < qedn->num_fw_cqs; i++) {

> > +             fp_q = &qedn->fp_q_arr[i];

> > +

> > +             /* Free SB */

> > +             sb_info = fp_q->sb_info;

> > +             if (sb_info->sb_virt) {

> > +                     qed_ops->common->sb_release(qedn->cdev, sb_info,

> > +                                                 fp_q->sb_id,

> > +                                                 QED_SB_TYPE_STORAGE);

> > +                     dma_free_coherent(&qedn->pdev->dev,

> > +                                       sizeof(*sb_info->sb_virt),

> > +                                       (void *)sb_info->sb_virt,

> > +                                       sb_info->sb_phys);

> > +                     memset(sb_info, 0, sizeof(*sb_info));

> > +                     kfree(sb_info);

> > +                     fp_q->sb_info = NULL;

> > +             }

> > +

> > +             qed_ops->common->chain_free(qedn->cdev, &fp_q->cq_chain);

> > +     }

> > +

> > +     if (qedn->fw_cq_array_virt)

> > +             dma_free_coherent(&qedn->pdev->dev,

> > +                               qedn->num_fw_cqs * sizeof(u64),

> > +                               qedn->fw_cq_array_virt,

> > +                               qedn->fw_cq_array_phy);

> > +     kfree(qedn->fp_q_arr);

> > +     qedn->fp_q_arr = NULL;

> > +}

> > +

> > +static int qedn_alloc_and_init_sb(struct qedn_ctx *qedn,

> > +                               struct qed_sb_info *sb_info, u16 sb_id)

> > +{

> > +     int rc = 0;

> > +

> > +     sb_info->sb_virt = dma_alloc_coherent(&qedn->pdev->dev,

> > +                                           sizeof(struct status_block_e4),

> > +                                           &sb_info->sb_phys, GFP_KERNEL);

> > +     if (!sb_info->sb_virt) {

> > +             pr_err("Status block allocation failed\n");

> > +

> > +             return -ENOMEM;

> > +     }

> > +

> > +     rc = qed_ops->common->sb_init(qedn->cdev, sb_info, sb_info->sb_virt,

> > +                                   sb_info->sb_phys, sb_id,

> > +                                   QED_SB_TYPE_STORAGE);

> > +     if (rc) {

> > +             pr_err("Status block initialization failed\n");

> > +

> > +             return rc;

> > +     }

> > +

> > +     return 0;

> > +}

> > +

> > +static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

> > +{

> > +     struct qed_chain_init_params chain_params = {};

> > +     struct status_block_e4 *sb = NULL;  /* To change to status_block_e4 */

> > +     struct qedn_fp_queue *fp_q = NULL;

> > +     int rc = 0, arr_size;

> > +     u64 cq_phy_addr;

> > +     int i;

> > +

> > +     /* Place holder - IO-path workqueues */

> > +

> > +     qedn->fp_q_arr = kcalloc(qedn->num_fw_cqs,

> > +                              sizeof(struct qedn_fp_queue), GFP_KERNEL);

> > +     if (!qedn->fp_q_arr)

> > +             return -ENOMEM;

> > +

> > +     arr_size = qedn->num_fw_cqs * sizeof(struct nvmetcp_glbl_queue_entry);

> > +     qedn->fw_cq_array_virt = dma_alloc_coherent(&qedn->pdev->dev,

> > +                                                 arr_size,

> > +                                                 &qedn->fw_cq_array_phy,

> > +                                                 GFP_KERNEL);

> > +     if (!qedn->fw_cq_array_virt) {

> > +             rc = -ENOMEM;

> > +             goto mem_alloc_failure;

> > +     }

> > +

> > +     /* placeholder - create task pools */

> > +

> > +     for (i = 0; i < qedn->num_fw_cqs; i++) {

> > +             fp_q = &qedn->fp_q_arr[i];

> > +             mutex_init(&fp_q->cq_mutex);

> > +

> > +             /* FW CQ */

> > +             chain_params.intended_use = QED_CHAIN_USE_TO_CONSUME,

> > +             chain_params.mode = QED_CHAIN_MODE_PBL,

> > +             chain_params.cnt_type = QED_CHAIN_CNT_TYPE_U16,

> > +             chain_params.num_elems = QEDN_FW_CQ_SIZE;

> > +             chain_params.elem_size = 64; /*Placeholder - sizeof(struct nvmetcp_fw_cqe)*/

> > +

> > +             rc = qed_ops->common->chain_alloc(qedn->cdev,

> > +                                               &fp_q->cq_chain,

> > +                                               &chain_params);

> > +             if (rc) {

> > +                     pr_err("CQ chain pci_alloc_consistent fail\n");

> > +                     goto mem_alloc_failure;

> > +             }

> > +

> > +             cq_phy_addr = qed_chain_get_pbl_phys(&fp_q->cq_chain);

> > +             qedn->fw_cq_array_virt[i].cq_pbl_addr.hi = PTR_HI(cq_phy_addr);

> > +             qedn->fw_cq_array_virt[i].cq_pbl_addr.lo = PTR_LO(cq_phy_addr);

> > +

> > +             /* SB */

> > +             fp_q->sb_info = kzalloc(sizeof(*fp_q->sb_info), GFP_KERNEL);

> > +             if (!fp_q->sb_info)

> > +                     goto mem_alloc_failure;

> > +

> > +             fp_q->sb_id = i;

> > +             rc = qedn_alloc_and_init_sb(qedn, fp_q->sb_info, fp_q->sb_id);

> > +             if (rc) {

> > +                     pr_err("SB allocation and initialization failed.\n");

> > +                     goto mem_alloc_failure;

> > +             }

> > +

> > +             sb = fp_q->sb_info->sb_virt;

> > +             fp_q->cq_prod = (u16 *)&sb->pi_array[QEDN_PROTO_CQ_PROD_IDX];

> > +             fp_q->qedn = qedn;

> > +

> > +             /* Placeholder - Init IO-path workqueue */

> > +

> > +             /* Placeholder - Init IO-path resources */

> > +     }

> > +

> > +     return 0;

> > +

> > +mem_alloc_failure:

> > +     pr_err("Function allocation failed\n");

> > +     qedn_free_function_queues(qedn);

> > +

> > +     return rc;

> > +}

> > +

> >   static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)

> >   {

> >       u32 fw_conn_queue_pages = QEDN_NVMETCP_NUM_FW_CONN_QUEUE_PAGES;

> >       struct qed_nvmetcp_pf_params *pf_params;

> > +     int rc;

> >

> >       pf_params = &qedn->pf_params.nvmetcp_pf_params;

> >       memset(pf_params, 0, sizeof(*pf_params));

> >       qedn->num_fw_cqs = min_t(u8, qedn->dev_info.num_cqs, num_online_cpus());

> > +     pr_info("Num qedn CPU cores is %u\n", qedn->num_fw_cqs);

> >

> >       pf_params->num_cons = QEDN_MAX_CONNS_PER_PF;

> >       pf_params->num_tasks = QEDN_MAX_TASKS_PER_PF;

> >

> > -     /* Placeholder - Initialize function level queues */

> > +     rc = qedn_alloc_function_queues(qedn);

> > +     if (rc) {

> > +             pr_err("Global queue allocation failed.\n");

> > +             goto err_alloc_mem;

> > +     }

> >

> > -     /* Placeholder - Initialize TCP params */

> > +     set_bit(QEDN_STATE_FP_WORK_THREAD_SET, &qedn->state);

> >

> >       /* Queues */

> >       pf_params->num_sq_pages_in_ring = fw_conn_queue_pages;

> > @@ -207,11 +457,14 @@ static int qedn_set_nvmetcp_pf_param(struct qedn_ctx *qedn)

> >       pf_params->num_uhq_pages_in_ring = fw_conn_queue_pages;

> >       pf_params->num_queues = qedn->num_fw_cqs;

> >       pf_params->cq_num_entries = QEDN_FW_CQ_SIZE;

> > +     pf_params->glbl_q_params_addr = qedn->fw_cq_array_phy;

> >

> >       /* the CQ SB pi */

> >       pf_params->gl_rq_pi = QEDN_PROTO_CQ_PROD_IDX;

> >

> > -     return 0;

> > +err_alloc_mem:

> > +

> > +     return rc;

> >   }

> >

> >   static inline int qedn_slowpath_start(struct qedn_ctx *qedn)

> > @@ -255,6 +508,12 @@ static void __qedn_remove(struct pci_dev *pdev)

> >       else

> >               pr_err("Failed to remove from global PF list\n");

> >

> > +     if (test_and_clear_bit(QEDN_STATE_IRQ_SET, &qedn->state))

> > +             qedn_sync_free_irqs(qedn);

> > +

> > +     if (test_and_clear_bit(QEDN_STATE_NVMETCP_OPEN, &qedn->state))

> > +             qed_ops->stop(qedn->cdev);

> > +

> >       if (test_and_clear_bit(QEDN_STATE_MFW_STATE, &qedn->state)) {

> >               rc = qed_ops->common->update_drv_state(qedn->cdev, false);

> >               if (rc)

> > @@ -264,6 +523,9 @@ static void __qedn_remove(struct pci_dev *pdev)

> >       if (test_and_clear_bit(QEDN_STATE_CORE_OPEN, &qedn->state))

> >               qed_ops->common->slowpath_stop(qedn->cdev);

> >

> > +     if (test_and_clear_bit(QEDN_STATE_FP_WORK_THREAD_SET, &qedn->state))

> > +             qedn_free_function_queues(qedn);

> > +

> >       if (test_and_clear_bit(QEDN_STATE_CORE_PROBED, &qedn->state))

> >               qed_ops->common->remove(qedn->cdev);

> >

> > @@ -335,6 +597,25 @@ static int __qedn_probe(struct pci_dev *pdev)

> >

> >       set_bit(QEDN_STATE_CORE_OPEN, &qedn->state);

> >

> > +     rc = qedn_setup_irq(qedn);

> > +     if (rc)

> > +             goto exit_probe_and_release_mem;

> > +

> > +     set_bit(QEDN_STATE_IRQ_SET, &qedn->state);

> > +

> > +     /* NVMeTCP start HW PF */

> > +     rc = qed_ops->start(qedn->cdev,

> > +                         NULL /* Placeholder for FW IO-path resources */,

> > +                         qedn,

> > +                         NULL /* Placeholder for FW Event callback */);

> > +     if (rc) {

> > +             rc = -ENODEV;

> > +             pr_err("Cannot start NVMeTCP Function\n");

> > +             goto exit_probe_and_release_mem;

> > +     }

> > +

> > +     set_bit(QEDN_STATE_NVMETCP_OPEN, &qedn->state);

> > +

> >       rc = qed_ops->common->update_drv_state(qedn->cdev, true);

> >       if (rc) {

> >               pr_err("Failed to send drv state to MFW\n");

> >

> So you have a limited number of MSI-x interrupts, but don't limit the

> number of hw queues to that. Why?


Will be fixed in V5.

>

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 5, 2021, 5:57 p.m. UTC | #26
On 5/2/21 2:38 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > From: Prabhakar Kushwaha <pkushwaha@marvell.com>

> >

> > HW filter can be configured to filter TCP packets based on either

> > source or target TCP port. QEDN leverage this feature to route

> > NVMeTCP traffic.

> >

> > This patch configures HW filter block based on source port for all

> > receiving packets to deliver correct QEDN PF.

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > ---

> >   drivers/nvme/hw/qedn/qedn.h      |  15 +++++

> >   drivers/nvme/hw/qedn/qedn_main.c | 108 ++++++++++++++++++++++++++++++-

> >   2 files changed, 122 insertions(+), 1 deletion(-)

> >

> Reviewed-by: Hannes Reinecke <hare@suse.de>


Thanks.

>

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 5, 2021, 6:04 p.m. UTC | #27
On 5/2/21 2:54 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > This patch will present the IO level functionality of qedn

> > nvme-tcp-offload host mode. The qedn_task_ctx structure is containing

> > various params and state of the current IO, and is mapped 1x1 to the

> > fw_task_ctx which is a HW and FW IO context.

> > A qedn_task is mapped directly to its parent connection.

> > For every new IO a qedn_task structure will be assigned and they will be

> > linked for the entire IO's life span.

> >

> > The patch will include 2 flows:

> >    1. Send new command to the FW:

> >        The flow is: nvme_tcp_ofld_queue_rq() which invokes qedn_send_req()

> >        which invokes qedn_queue_request() which will:

> >       - Assign fw_task_ctx.

> >        - Prepare the Read/Write SG buffer.

> >        -  Initialize the HW and FW context.

> >        - Pass the IO to the FW.

> >

> >    2. Process the IO completion:

> >       The flow is: qedn_irq_handler() which invokes qedn_fw_cq_fp_handler()

> >        which invokes qedn_io_work_cq() which will:

> >        - process the FW completion.

> >        - Return the fw_task_ctx to the task pool.

> >        - complete the nvme req.

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > ---

> >   drivers/nvme/hw/qedn/qedn.h      |   4 +

> >   drivers/nvme/hw/qedn/qedn_conn.c |   1 +

> >   drivers/nvme/hw/qedn/qedn_task.c | 269 ++++++++++++++++++++++++++++++-

> >   3 files changed, 272 insertions(+), 2 deletions(-)

> >

> Reviewed-by: Hannes Reinecke <hare@suse.de>


Thanks.

>

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Shai Malin May 7, 2021, 1:56 p.m. UTC | #28
On 5/2/21 2:42 PM, Hannes Reinecke wrote:
> On 4/29/21 9:09 PM, Shai Malin wrote:

> > This patch will present the IO level workqueues:

> >

> > - qedn_nvme_req_fp_wq(): process new requests, similar to

> >                        nvme_tcp_io_work(). The flow starts from

> >                        send_req() and will aggregate all the requests

> >                        on this CPU core.

> >

> > - qedn_fw_cq_fp_wq():   process new FW completions, the flow starts from

> >                       the IRQ handler and for a single interrupt it will

> >                       process all the pending NVMeoF Completions under

> >                       polling mode.

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > ---

> >   drivers/nvme/hw/qedn/Makefile    |   2 +-

> >   drivers/nvme/hw/qedn/qedn.h      |  29 +++++++

> >   drivers/nvme/hw/qedn/qedn_conn.c |   3 +

> >   drivers/nvme/hw/qedn/qedn_main.c | 114 +++++++++++++++++++++++--

> >   drivers/nvme/hw/qedn/qedn_task.c | 138 +++++++++++++++++++++++++++++++

> >   5 files changed, 278 insertions(+), 8 deletions(-)

> >   create mode 100644 drivers/nvme/hw/qedn/qedn_task.c

> >

> > diff --git a/drivers/nvme/hw/qedn/Makefile b/drivers/nvme/hw/qedn/Makefile

> > index d8b343afcd16..c7d838a61ae6 100644

> > --- a/drivers/nvme/hw/qedn/Makefile

> > +++ b/drivers/nvme/hw/qedn/Makefile

> > @@ -1,4 +1,4 @@

> >   # SPDX-License-Identifier: GPL-2.0-only

> >

> >   obj-$(CONFIG_NVME_QEDN) += qedn.o

> > -qedn-y := qedn_main.o qedn_conn.o

> > +qedn-y := qedn_main.o qedn_conn.o qedn_task.o

> > \ No newline at end of file

> > diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h

> > index c15cac37ec1e..bd9a250cb2f5 100644

> > --- a/drivers/nvme/hw/qedn/qedn.h

> > +++ b/drivers/nvme/hw/qedn/qedn.h

> > @@ -47,6 +47,9 @@

> >   #define QEDN_NON_ABORTIVE_TERMINATION 0

> >   #define QEDN_ABORTIVE_TERMINATION 1

> >

> > +#define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"

> > +#define QEDN_NVME_REQ_FP_WQ_WORKQUEUE "qedn_nvme_req_fp_wq"

> > +

> >   /*

> >    * TCP offload stack default configurations and defines.

> >    * Future enhancements will allow controlling the configurable

> > @@ -100,6 +103,7 @@ struct qedn_fp_queue {

> >       struct qedn_ctx *qedn;

> >       struct qed_sb_info *sb_info;

> >       unsigned int cpu;

> > +     struct work_struct fw_cq_fp_wq_entry;

> >       u16 sb_id;

> >       char irqname[QEDN_IRQ_NAME_LEN];

> >   };

> > @@ -131,6 +135,8 @@ struct qedn_ctx {

> >       struct qedn_fp_queue *fp_q_arr;

> >       struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;

> >       dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */

> > +     struct workqueue_struct *nvme_req_fp_wq;

> > +     struct workqueue_struct *fw_cq_fp_wq;

> >   };

> >

> >   struct qedn_endpoint {

> > @@ -213,6 +219,25 @@ struct qedn_ctrl {

> >

> >   /* Connection level struct */

> >   struct qedn_conn_ctx {

> > +     /* IO path */

> > +     struct workqueue_struct *nvme_req_fp_wq; /* ptr to qedn->nvme_req_fp_wq */

> > +     struct nvme_tcp_ofld_req *req; /* currently proccessed request */

> > +

> > +     struct list_head host_pend_req_list;

> > +     /* Spinlock to access pending request list */

> > +     spinlock_t nvme_req_lock;

> > +     unsigned int cpu;

> > +

> > +     /* Entry for registering to nvme_req_fp_wq */

> > +     struct work_struct nvme_req_fp_wq_entry;

> > +     /*

> > +      * Spinlock for accessing qedn_process_req as it can be called

> > +      * from multiple place like queue_rq, async, self requeued

> > +      */

> > +     struct mutex nvme_req_mutex;

> > +     struct qedn_fp_queue *fp_q;

> > +     int qid;

> > +

> >       struct qedn_ctx *qedn;

> >       struct nvme_tcp_ofld_queue *queue;

> >       struct nvme_tcp_ofld_ctrl *ctrl;

> > @@ -280,5 +305,9 @@ int qedn_wait_for_conn_est(struct qedn_conn_ctx *conn_ctx);

> >   int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state);

> >   void qedn_terminate_connection(struct qedn_conn_ctx *conn_ctx, int abrt_flag);

> >   __be16 qedn_get_in_port(struct sockaddr_storage *sa);

> > +inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 cccid);

> > +void qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req);

> > +void qedn_nvme_req_fp_wq_handler(struct work_struct *work);

> > +void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe);

> >

> >   #endif /* _QEDN_H_ */

> > diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c

> > index 9bfc0a5f0cdb..90d8aa36d219 100644

> > --- a/drivers/nvme/hw/qedn/qedn_conn.c

> > +++ b/drivers/nvme/hw/qedn/qedn_conn.c

> > @@ -385,6 +385,9 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)

> >       }

> >

> >       set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);

> > +     INIT_LIST_HEAD(&conn_ctx->host_pend_req_list);

> > +     spin_lock_init(&conn_ctx->nvme_req_lock);

> > +

> >       rc = qed_ops->acquire_conn(qedn->cdev,

> >                                  &conn_ctx->conn_handle,

> >                                  &conn_ctx->fw_cid,

> > diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c

> > index 8b5714e7e2bb..38f23dbb03a5 100644

> > --- a/drivers/nvme/hw/qedn/qedn_main.c

> > +++ b/drivers/nvme/hw/qedn/qedn_main.c

> > @@ -267,6 +267,18 @@ static int qedn_release_ctrl(struct nvme_tcp_ofld_ctrl *ctrl)

> >       return 0;

> >   }

> >

> > +static void qedn_set_ctrl_io_cpus(struct qedn_conn_ctx *conn_ctx, int qid)

> > +{

> > +     struct qedn_ctx *qedn = conn_ctx->qedn;

> > +     struct qedn_fp_queue *fp_q = NULL;

> > +     int index;

> > +

> > +     index = qid ? (qid - 1) % qedn->num_fw_cqs : 0;

> > +     fp_q = &qedn->fp_q_arr[index];

> > +

> > +     conn_ctx->cpu = fp_q->cpu;

> > +}

> > +

> >   static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t q_size)

> >   {

> >       struct nvme_tcp_ofld_ctrl *ctrl = queue->ctrl;

> > @@ -288,6 +300,7 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t

> >       conn_ctx->queue = queue;

> >       conn_ctx->ctrl = ctrl;

> >       conn_ctx->sq_depth = q_size;

> > +     qedn_set_ctrl_io_cpus(conn_ctx, qid);

> >

> >       init_waitqueue_head(&conn_ctx->conn_waitq);

> >       atomic_set(&conn_ctx->est_conn_indicator, 0);

> > @@ -295,6 +308,10 @@ static int qedn_create_queue(struct nvme_tcp_ofld_queue *queue, int qid, size_t

> >

> >       spin_lock_init(&conn_ctx->conn_state_lock);

> >

> > +     INIT_WORK(&conn_ctx->nvme_req_fp_wq_entry, qedn_nvme_req_fp_wq_handler);

> > +     conn_ctx->nvme_req_fp_wq = qedn->nvme_req_fp_wq;

> > +     conn_ctx->qid = qid;

> > +

> >       qedn_initialize_endpoint(&conn_ctx->ep, qedn->local_mac_addr,

> >                                &ctrl->conn_params);

> >

> > @@ -356,6 +373,7 @@ static void qedn_destroy_queue(struct nvme_tcp_ofld_queue *queue)

> >       if (!conn_ctx)

> >               return;

> >

> > +     cancel_work_sync(&conn_ctx->nvme_req_fp_wq_entry);

> >       qedn_terminate_connection(conn_ctx, QEDN_ABORTIVE_TERMINATION);

> >

> >       qedn_queue_wait_for_terminate_complete(conn_ctx);

> > @@ -385,12 +403,24 @@ static int qedn_init_req(struct nvme_tcp_ofld_req *req)

> >

> >   static void qedn_commit_rqs(struct nvme_tcp_ofld_queue *queue)

> >   {

> > -     /* Placeholder - queue work */

> > +     struct qedn_conn_ctx *conn_ctx;

> > +

> > +     conn_ctx = (struct qedn_conn_ctx *)queue->private_data;

> > +

> > +     if (!list_empty(&conn_ctx->host_pend_req_list))

> > +             queue_work_on(conn_ctx->cpu, conn_ctx->nvme_req_fp_wq,

> > +                           &conn_ctx->nvme_req_fp_wq_entry);

> >   }

> >

> >   static int qedn_send_req(struct nvme_tcp_ofld_req *req)

> >   {

> > -     /* Placeholder - qedn_send_req */

> > +     struct qedn_conn_ctx *qedn_conn = (struct qedn_conn_ctx *)req->queue->private_data;

> > +

> > +     /* Under the assumption that the cccid/tag will be in the range of 0 to sq_depth-1. */

> > +     if (!req->async && qedn_validate_cccid_in_range(qedn_conn, req->rq->tag))

> > +             return BLK_STS_NOTSUPP;

> > +

> > +     qedn_queue_request(qedn_conn, req);

> >

> >       return 0;

> >   }

> > @@ -434,9 +464,59 @@ struct qedn_conn_ctx *qedn_get_conn_hash(struct qedn_ctx *qedn, u16 icid)

> >   }

> >

> >   /* Fastpath IRQ handler */

> > +void qedn_fw_cq_fp_handler(struct qedn_fp_queue *fp_q)

> > +{

> > +     u16 sb_id, cq_prod_idx, cq_cons_idx;

> > +     struct qedn_ctx *qedn = fp_q->qedn;

> > +     struct nvmetcp_fw_cqe *cqe = NULL;

> > +

> > +     sb_id = fp_q->sb_id;

> > +     qed_sb_update_sb_idx(fp_q->sb_info);

> > +

> > +     /* rmb - to prevent missing new cqes */

> > +     rmb();

> > +

> > +     /* Read the latest cq_prod from the SB */

> > +     cq_prod_idx = *fp_q->cq_prod;

> > +     cq_cons_idx = qed_chain_get_cons_idx(&fp_q->cq_chain);

> > +

> > +     while (cq_cons_idx != cq_prod_idx) {

> > +             cqe = qed_chain_consume(&fp_q->cq_chain);

> > +             if (likely(cqe))

> > +                     qedn_io_work_cq(qedn, cqe);

> > +             else

> > +                     pr_err("Failed consuming cqe\n");

> > +

> > +             cq_cons_idx = qed_chain_get_cons_idx(&fp_q->cq_chain);

> > +

> > +             /* Check if new completions were posted */

> > +             if (unlikely(cq_prod_idx == cq_cons_idx)) {

> > +                     /* rmb - to prevent missing new cqes */

> > +                     rmb();

> > +

> > +                     /* Update the latest cq_prod from the SB */

> > +                     cq_prod_idx = *fp_q->cq_prod;

> > +             }

> > +     }

> > +}

> > +

> > +static void qedn_fw_cq_fq_wq_handler(struct work_struct *work)

> > +{

> > +     struct qedn_fp_queue *fp_q = container_of(work, struct qedn_fp_queue, fw_cq_fp_wq_entry);

> > +

> > +     qedn_fw_cq_fp_handler(fp_q);

> > +     qed_sb_ack(fp_q->sb_info, IGU_INT_ENABLE, 1);

> > +}

> > +

> >   static irqreturn_t qedn_irq_handler(int irq, void *dev_id)

> >   {

> > -     /* Placeholder */

> > +     struct qedn_fp_queue *fp_q = dev_id;

> > +     struct qedn_ctx *qedn = fp_q->qedn;

> > +

> > +     fp_q->cpu = smp_processor_id();

> > +

> > +     qed_sb_ack(fp_q->sb_info, IGU_INT_DISABLE, 0);

> > +     queue_work_on(fp_q->cpu, qedn->fw_cq_fp_wq, &fp_q->fw_cq_fp_wq_entry);

> >

> >       return IRQ_HANDLED;

> >   }

> > @@ -584,6 +664,11 @@ static void qedn_free_function_queues(struct qedn_ctx *qedn)

> >       int i;

> >

> >       /* Free workqueues */

> > +     destroy_workqueue(qedn->fw_cq_fp_wq);

> > +     qedn->fw_cq_fp_wq = NULL;

> > +

> > +     destroy_workqueue(qedn->nvme_req_fp_wq);

> > +     qedn->nvme_req_fp_wq = NULL;

> >

> >       /* Free the fast path queues*/

> >       for (i = 0; i < qedn->num_fw_cqs; i++) {

> > @@ -651,7 +736,23 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

> >       u64 cq_phy_addr;

> >       int i;

> >

> > -     /* Place holder - IO-path workqueues */

> > +     qedn->fw_cq_fp_wq = alloc_workqueue(QEDN_FW_CQ_FP_WQ_WORKQUEUE,

> > +                                         WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);

> > +     if (!qedn->fw_cq_fp_wq) {

> > +             rc = -ENODEV;

> > +             pr_err("Unable to create fastpath FW CQ workqueue!\n");

> > +

> > +             return rc;

> > +     }

> > +

> > +     qedn->nvme_req_fp_wq = alloc_workqueue(QEDN_NVME_REQ_FP_WQ_WORKQUEUE,

> > +                                            WQ_HIGHPRI | WQ_MEM_RECLAIM, 1);

> > +     if (!qedn->nvme_req_fp_wq) {

> > +             rc = -ENODEV;

> > +             pr_err("Unable to create fastpath qedn nvme workqueue!\n");

> > +

> > +             return rc;

> > +     }

> >

> >       qedn->fp_q_arr = kcalloc(qedn->num_fw_cqs,

> >                                sizeof(struct qedn_fp_queue), GFP_KERNEL);

>

> Why don't you use threaded interrupts if you're spinning off a workqueue

> for handling interrupts anyway?


We compared the performance (IOPS, CPU utilization, average latency,
and 99.99% tail latency) between workqueue and threaded interrupts and we are
seeing the same results under different workloads.

We will continue to evaluate the threaded interrupts design and if we will see
performance improvement we will change it in V5.

>

> > @@ -679,7 +780,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

> >               chain_params.mode = QED_CHAIN_MODE_PBL,

> >               chain_params.cnt_type = QED_CHAIN_CNT_TYPE_U16,

> >               chain_params.num_elems = QEDN_FW_CQ_SIZE;

> > -             chain_params.elem_size = 64; /*Placeholder - sizeof(struct nvmetcp_fw_cqe)*/

> > +             chain_params.elem_size = sizeof(struct nvmetcp_fw_cqe);

> >

> >               rc = qed_ops->common->chain_alloc(qedn->cdev,

> >                                                 &fp_q->cq_chain,

> > @@ -708,8 +809,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

> >               sb = fp_q->sb_info->sb_virt;

> >               fp_q->cq_prod = (u16 *)&sb->pi_array[QEDN_PROTO_CQ_PROD_IDX];

> >               fp_q->qedn = qedn;

> > -

> > -             /* Placeholder - Init IO-path workqueue */

> > +             INIT_WORK(&fp_q->fw_cq_fp_wq_entry, qedn_fw_cq_fq_wq_handler);

> >

> >               /* Placeholder - Init IO-path resources */

> >       }

> > diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c

> > new file mode 100644

> > index 000000000000..d3474188efdc

> > --- /dev/null

> > +++ b/drivers/nvme/hw/qedn/qedn_task.c

> > @@ -0,0 +1,138 @@

> > +// SPDX-License-Identifier: GPL-2.0

> > +/*

> > + * Copyright 2021 Marvell. All rights reserved.

> > + */

> > +

> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

> > +

> > + /* Kernel includes */

> > +#include <linux/kernel.h>

> > +

> > +/* Driver includes */

> > +#include "qedn.h"

> > +

> > +inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 cccid)

> > +{

> > +     int rc = 0;

> > +

> > +     if (unlikely(cccid >= conn_ctx->sq_depth)) {

> > +             pr_err("cccid 0x%x out of range ( > sq depth)\n", cccid);

> > +             rc = -EINVAL;

> > +     }

> > +

> > +     return rc;

> > +}

> > +

> > +static bool qedn_process_req(struct qedn_conn_ctx *qedn_conn)

> > +{

> > +     return true;

> > +}

> > +

> > +/* The WQ handler can be call from 3 flows:

> > + *   1. queue_rq.

> > + *   2. async.

> > + *   3. self requeued

> > + * Try to send requests from the pending list. If a request proccess has failed,

> > + * re-register to the workqueue.

> > + * If there are no additional pending requests - exit the handler.

> > + */

> > +void qedn_nvme_req_fp_wq_handler(struct work_struct *work)

> > +{

> > +     struct qedn_conn_ctx *qedn_conn;

> > +     bool more = false;

> > +

> > +     qedn_conn = container_of(work, struct qedn_conn_ctx, nvme_req_fp_wq_entry);

> > +     do {

> > +             if (mutex_trylock(&qedn_conn->nvme_req_mutex)) {

> > +                     more = qedn_process_req(qedn_conn);

> > +                     qedn_conn->req = NULL;

> > +                     mutex_unlock(&qedn_conn->nvme_req_mutex);

> > +             }

> > +     } while (more);

> > +

> > +     if (!list_empty(&qedn_conn->host_pend_req_list))

> > +             queue_work_on(qedn_conn->cpu, qedn_conn->nvme_req_fp_wq,

> > +                           &qedn_conn->nvme_req_fp_wq_entry);

> > +}

> > +

> > +void qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req)

> > +{

> > +     bool empty, res = false;

> > +

> > +     spin_lock(&qedn_conn->nvme_req_lock);

> > +     empty = list_empty(&qedn_conn->host_pend_req_list) && !qedn_conn->req;

> > +     list_add_tail(&req->queue_entry, &qedn_conn->host_pend_req_list);

> > +     spin_unlock(&qedn_conn->nvme_req_lock);

> > +

> > +     /* attempt workqueue bypass */

> > +     if (qedn_conn->cpu == smp_processor_id() && empty &&

> > +         mutex_trylock(&qedn_conn->nvme_req_mutex)) {

> > +             res = qedn_process_req(qedn_conn);

> > +             qedn_conn->req = NULL;

> > +             mutex_unlock(&qedn_conn->nvme_req_mutex);

> > +             if (res || list_empty(&qedn_conn->host_pend_req_list))

> > +                     return;

> > +     } else if (req->last) {

> > +             queue_work_on(qedn_conn->cpu, qedn_conn->nvme_req_fp_wq,

> > +                           &qedn_conn->nvme_req_fp_wq_entry);

> > +     }

> > +}

> > +

>

> Queueing a request?

> Does wonders for your latency ... Can't you do without?


Yes, we can.
Will be fixed in V5.

>

> > +struct qedn_task_ctx *qedn_cqe_get_active_task(struct nvmetcp_fw_cqe *cqe)

> > +{

> > +     struct regpair *p = &cqe->task_opaque;

> > +

> > +     return (struct qedn_task_ctx *)((((u64)(le32_to_cpu(p->hi)) << 32)

> > +                                     + le32_to_cpu(p->lo)));

> > +}

> > +

> > +void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe)

> > +{

> > +     struct qedn_task_ctx *qedn_task = NULL;

> > +     struct qedn_conn_ctx *conn_ctx = NULL;

> > +     u16 itid;

> > +     u32 cid;

> > +

> > +     conn_ctx = qedn_get_conn_hash(qedn, le16_to_cpu(cqe->conn_id));

> > +     if (unlikely(!conn_ctx)) {

> > +             pr_err("CID 0x%x: Failed to fetch conn_ctx from hash\n",

> > +                    le16_to_cpu(cqe->conn_id));

> > +

> > +             return;

> > +     }

> > +

> > +     cid = conn_ctx->fw_cid;

> > +     itid = le16_to_cpu(cqe->itid);

> > +     qedn_task = qedn_cqe_get_active_task(cqe);

> > +     if (unlikely(!qedn_task))

> > +             return;

> > +

> > +     if (likely(cqe->cqe_type == NVMETCP_FW_CQE_TYPE_NORMAL)) {

> > +             /* Placeholder - verify the connection was established */

> > +

> > +             switch (cqe->task_type) {

> > +             case NVMETCP_TASK_TYPE_HOST_WRITE:

> > +             case NVMETCP_TASK_TYPE_HOST_READ:

> > +

> > +                     /* Placeholder - IO flow */

> > +

> > +                     break;

> > +

> > +             case NVMETCP_TASK_TYPE_HOST_READ_NO_CQE:

> > +

> > +                     /* Placeholder - IO flow */

> > +

> > +                     break;

> > +

> > +             case NVMETCP_TASK_TYPE_INIT_CONN_REQUEST:

> > +

> > +                     /* Placeholder - ICReq flow */

> > +

> > +                     break;

> > +             default:

> > +                     pr_info("Could not identify task type\n");

> > +             }

> > +     } else {

> > +             /* Placeholder - Recovery flows */

> > +     }

> > +}

> >

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                Kernel Storage Architect

> hare@suse.de                              +49 911 74053 688

> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg

> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer