[RFC,v6,00/27] NVMeTCP Offload ULP and QEDN Device Driver

Message ID	20210527235902.2185-1-smalin@marvell.com
Headers	Return-Path: <netdev-owner@kernel.org>
	From: Shai Malin <smalin@marvell.com>
	To: <netdev@vger.kernel.org>, <linux-nvme@lists.infradead.org>, <davem@davemloft.net>, <kuba@kernel.org>, <sagi@grimberg.me>, <hch@lst.de>, <axboe@fb.com>, <kbusch@kernel.org>
	CC: <aelior@marvell.com>, <mkalderon@marvell.com>, <okulkarni@marvell.com>, <pkushwaha@marvell.com>, <malin1024@gmail.com>, <smalin@marvell.com>
	Subject: [RFC PATCH v6 00/27] NVMeTCP Offload ULP and QEDN Device Driver
	Date: Fri, 28 May 2021 02:58:35 +0300
	Message-ID: <20210527235902.2185-1-smalin@marvell.com>
	MIME-Version: 1.0
	Content-Type: text/plain
	Precedence: bulk
Series	NVMeTCP Offload ULP and QEDN Device Driver

Shai Malin May 27, 2021, 11:58 p.m. UTC

With the goal of enabling a generic infrastructure that allows NVMe/TCP
offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
patch series introduces the nvme-tcp-offload ULP host layer, which will
be a new transport type called "tcp_offload" and will serve as an
abstraction layer to work with vendor specific nvme-tcp offload drivers.

NVMeTCP offload is a full offload of the NVMeTCP protocol; it covers
both the TCP level and the NVMeTCP level.

The nvme-tcp-offload transport can co-exist with the existing tcp and 
other transports. The tcp offload was designed so that stack changes are 
kept to a bare minimum: only registering new transports.
All other APIs, ops etc. are identical to the regular tcp transport.
Representing the TCP offload as a new transport allows clear and manageable
differentiation between the connections which should use the offload path 
and those that are not offloaded (even on the same device).
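
As a rough illustration of why a separate transport name keeps the two
paths apart, here is a minimal user-space sketch of a transport registry
with exact-name lookup. It is not the nvme-fabrics code: struct
demo_transport, demo_register_transport() and demo_lookup_transport() are
hypothetical names used only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a fabrics transport descriptor. */
struct demo_transport {
	const char *name;               /* matched against "nvme connect -t <name>" */
	struct demo_transport *next;    /* singly linked registry */
};

static struct demo_transport *demo_transports;

/* Exact-name lookup: "tcp" never resolves to "tcp_offload" or vice versa. */
static struct demo_transport *demo_lookup_transport(const char *name)
{
	struct demo_transport *t;

	for (t = demo_transports; t; t = t->next)
		if (strcmp(t->name, name) == 0)
			return t;
	return NULL;
}

/* Register a transport; returns 0, or -1 if the name is already taken. */
static int demo_register_transport(struct demo_transport *t)
{
	if (demo_lookup_transport(t->name))
		return -1;
	t->next = demo_transports;
	demo_transports = t;
	return 0;
}
```

With both "tcp" and "tcp_offload" registered, each connect request is
routed purely by the name it carries, so the two transports can serve
connections on the same device without interfering with each other.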

The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:

* NVMe layer: *

       [ nvme/nvme-fabrics/blk-mq ]
             |
        (nvme API and blk-mq API)
             |
             |			 
* Vendor agnostic transport layer: *

      [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
             |        |             |
           (Verbs) 
             |        |             |
             |     (Socket)
             |        |             |
             |        |        (nvme-tcp-offload API)
             |        |             |
             |        |             |
* Vendor Specific Driver: *

             |        |             |
           [ qedr ]       
                      |             |
                   [ qede ]
                                    |
                                  [ qedn ]


Performance:
============
With this implementation on top of the Marvell qedn driver (using the
Marvell FastLinQ NIC), we were able to demonstrate the following CPU
utilization improvement:

On AMD EPYC 7402, 2.80GHz, 28 cores:
- For 16K queued read IOs, 16 jobs, 4 qd (50Gbps line rate):
  Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with 
  NVMeTCP offload.

On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores: 
- For 512K queued read IOs, 16 jobs, 4 qd (25Gbps line rate):
  Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with 
  NVMeTCP offload.

In addition, we were able to demonstrate the following latency improvement:
- For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
  Improved the average latency from 105 usec with NVMeTCP SW to 39 usec 
  with NVMeTCP offload.
  
  Improved the 99.99th percentile tail latency from 570 usec with NVMeTCP SW
  to 91 usec with NVMeTCP offload.

The end-to-end offload latency was measured from fio while running against
a null-device back end.


Upstream plan:
==============
Following this RFC, the series will be sent in a modular way so that 
part 1 (nvme-tcp-offload) and part 2 (qed) are independent and part 3 (qedn) 
depends on both parts 1+2.

- Part 1 (Patches 1-8):
  The nvme-tcp-offload patches will be sent to
  'linux-nvme@lists.infradead.org'.

- Part 2 (Patches 9-15):
  The qed infrastructure will be sent to 'netdev@vger.kernel.org'.

- Part 3 (Patches 16-27):
  The qedn patches will be sent to 'linux-nvme@lists.infradead.org'.
 
Marvell is fully committed to maintaining, testing, and addressing issues
with the new nvme-tcp-offload layer.
 

Usage:
======
With the Marvell NVMeTCP offload design, the network-device (qede) and the
offload-device (qedn) are paired on each port, logically similar to the
RDMA model.
The user will interact with the network-device in order to configure
the IP/VLAN. The NVMeTCP configuration is populated as part of the
nvme connect command.

Example:
Assign IP to the net-device (from any existing Linux tool):

    ip addr add 100.100.0.101/24 dev p1p1

This IP will be used by both net-device (qede) and offload-device (qedn).

In order to connect from "sw" nvme-tcp through the net-device (qede):

    nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn

In order to connect from "offload" nvme-tcp through the offload-device (qedn):

    nvme connect -t tcp_offload -s 4420 -a 100.100.0.100 -n testnqn
	
An alternative approach, as a future enhancement that will not impact this
series, would be to modify nvme-cli with a new flag that determines
whether "-t tcp" selects the regular nvme-tcp (which will be the default)
or nvme-tcp-offload.

Example:

    nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn -[new flag]


Queue Initialization Design:
============================
The nvme-tcp-offload ULP module shall register with the existing 
nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following ops:
- claim_dev() - in order to resolve the route to the target according to
                the paired net_dev.
- create_queue() - in order to create offloaded nvme-tcp queue.

The nvme-tcp-offload ULP module shall manage all the controller level
functionalities, call claim_dev and based on the return values shall call
the relevant module create_queue in order to create the admin queue and
the IO queues.
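
The flow above can be sketched in user-space C as follows. This is only an
illustration under stated assumptions: every demo_* and toy_* name, the
conn-params layout and the /24 routing rule are hypothetical and are not
taken from the patches. The ULP scans the registered devices, takes the
first one whose claim_dev() accepts the connection, and then calls
create_queue() for the admin queue (qid 0) and for each IO queue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; they only mirror the roles in the text. */
struct demo_conn_params {
	uint32_t target_ip;             /* IPv4 target address, host byte order */
};

struct demo_ofld_dev;

struct demo_ofld_ops {
	/* True if this device can route to the target via its paired net_dev. */
	bool (*claim_dev)(struct demo_ofld_dev *dev,
			  const struct demo_conn_params *params);
	/* Create offloaded queue 'qid' (0 is the admin queue); 0 on success. */
	int (*create_queue)(struct demo_ofld_dev *dev, int qid);
};

struct demo_ofld_dev {
	const struct demo_ofld_ops *ops;
	uint32_t subnet;                /* toy routing rule: address >> 8 */
	int refcnt;
	int queues_created;
};

/* ULP side: return the first registered device that claims the connection. */
static struct demo_ofld_dev *
demo_lookup_dev(struct demo_ofld_dev **devs, size_t ndevs,
		const struct demo_conn_params *params)
{
	size_t i;

	for (i = 0; i < ndevs; i++) {
		if (devs[i]->ops->claim_dev(devs[i], params)) {
			devs[i]->refcnt++;
			return devs[i];
		}
	}
	return NULL;
}

/* ULP side: claim a device, then create the admin queue and the IO queues. */
static int demo_create_ctrl(struct demo_ofld_dev **devs, size_t ndevs,
			    const struct demo_conn_params *params,
			    int nr_io_queues)
{
	struct demo_ofld_dev *dev = demo_lookup_dev(devs, ndevs, params);
	int qid, rc;

	if (!dev)
		return -1;

	for (qid = 0; qid <= nr_io_queues; qid++) {
		rc = dev->ops->create_queue(dev, qid);
		if (rc) {
			dev->refcnt--;  /* drop the claim on failure */
			return rc;
		}
	}
	return 0;
}

/* Toy vendor driver callbacks. */
static bool toy_claim_dev(struct demo_ofld_dev *dev,
			  const struct demo_conn_params *params)
{
	return (params->target_ip >> 8) == dev->subnet;
}

static int toy_create_queue(struct demo_ofld_dev *dev, int qid)
{
	(void)qid;
	dev->queues_created++;
	return 0;
}

static const struct demo_ofld_ops toy_ops = {
	.claim_dev = toy_claim_dev,
	.create_queue = toy_create_queue,
};
```

In this sketch a request for four IO queues results in five
create_queue() calls on the claimed device, and a target that no device
claims fails before any queue is created.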


IO-path Design:
===============
The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload 
ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor
driver and later, the nvme-tcp-offload vendor driver returns the request
completion (the IO completion).
No additional handling is needed in between; this design reduces the
CPU utilization, as shown in the Performance section above.

The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following IO-path ops:
- send_req() - in order to pass the request to the handling of the
               offload driver that shall pass it to the vendor specific device.
- poll_queue()

Once the IO completes, the nvme-tcp-offload vendor driver shall call 
command.done() that will invoke the nvme-tcp-offload ULP layer to
complete the request.
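
A minimal user-space sketch of this request/completion hand-off is shown
below. It assumes a toy device that holds a single in-flight request and
completes it when polled; all demo_* and toy_* names are hypothetical and
the structures are far simpler than the ones in the patches.

```c
#include <stddef.h>

/* Hypothetical request: the ULP installs done(), the driver invokes it. */
struct demo_req {
	int status;                     /* completion status from the device */
	int completed;                  /* set by the ULP completion handler */
	void (*done)(struct demo_req *req, int status);
};

struct demo_queue;

struct demo_io_ops {
	int (*send_req)(struct demo_queue *q, struct demo_req *req);
	int (*poll_queue)(struct demo_queue *q);
};

struct demo_queue {
	const struct demo_io_ops *ops;
	struct demo_req *inflight;      /* toy device: one request at a time */
};

/* ULP side: completion callback handed to the vendor driver. */
static void demo_ulp_done(struct demo_req *req, int status)
{
	req->status = status;
	req->completed = 1;
}

/* ULP side: pass the request to the vendor driver, nothing in between. */
static int demo_ulp_queue_rq(struct demo_queue *q, struct demo_req *req)
{
	req->completed = 0;
	req->done = demo_ulp_done;
	return q->ops->send_req(q, req);
}

/* Vendor side: accept the request if the toy device is idle. */
static int toy_send_req(struct demo_queue *q, struct demo_req *req)
{
	if (q->inflight)
		return -1;
	q->inflight = req;
	return 0;
}

/* Vendor side: complete the in-flight request; returns completions found. */
static int toy_poll_queue(struct demo_queue *q)
{
	struct demo_req *req = q->inflight;

	if (!req)
		return 0;
	q->inflight = NULL;
	req->done(req, 0);
	return 1;
}

static const struct demo_io_ops toy_io_ops = {
	.send_req = toy_send_req,
	.poll_queue = toy_poll_queue,
};
```

The point of the sketch is the shape of the interface: the ULP touches the
request only when it is submitted and when done() is called.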


TCP events:
===========
The Marvell FastLinQ NIC HW engine handles all the TCP re-transmissions
and out-of-order (OOO) events.


Teardown and errors:
====================
In case of an NVMeTCP queue error, the nvme-tcp-offload vendor driver shall
call nvme_tcp_ofld_report_queue_err().
The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
with the following teardown ops:
- drain_queue()
- destroy_queue()
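
One possible reading of the teardown contract is sketched below in
user-space C. It assumes, and this is an assumption of the example rather
than something stated above, that the ULP always drains a queue before
destroying it and that only the first error report on a live queue starts
recovery. All demo_* and toy_* names are hypothetical;
demo_report_queue_err() merely stands in for the role of
nvme_tcp_ofld_report_queue_err().

```c
#include <stddef.h>

enum demo_q_state {
	DEMO_Q_LIVE,
	DEMO_Q_ERROR,
	DEMO_Q_DRAINED,
	DEMO_Q_DESTROYED,
};

struct demo_tq;

struct demo_teardown_ops {
	void (*drain_queue)(struct demo_tq *q);
	int (*destroy_queue)(struct demo_tq *q);
};

struct demo_tq {
	const struct demo_teardown_ops *ops;
	enum demo_q_state state;
	int inflight;                   /* requests still owned by the device */
	int err_reports;                /* error reports that were acted upon */
};

/* ULP side: record a queue error reported by the vendor driver. */
static void demo_report_queue_err(struct demo_tq *q)
{
	if (q->state != DEMO_Q_LIVE)
		return;                 /* only the first report is acted upon */
	q->state = DEMO_Q_ERROR;
	q->err_reports++;
}

/* ULP side: drain first so nothing is in flight, then destroy. */
static int demo_teardown_queue(struct demo_tq *q)
{
	q->ops->drain_queue(q);
	return q->ops->destroy_queue(q);
}

/* Vendor side: flush everything the toy device still holds. */
static void toy_drain_queue(struct demo_tq *q)
{
	q->inflight = 0;
	q->state = DEMO_Q_DRAINED;
}

/* Vendor side: refuse to free a queue that was not drained. */
static int toy_destroy_queue(struct demo_tq *q)
{
	if (q->state != DEMO_Q_DRAINED)
		return -1;
	q->state = DEMO_Q_DESTROYED;
	return 0;
}

static const struct demo_teardown_ops toy_teardown_ops = {
	.drain_queue = toy_drain_queue,
	.destroy_queue = toy_destroy_queue,
};
```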


The Marvell FastLinQ NIC HW engine:
====================================
The Marvell NIC HW engine is capable of offloading the entire TCP/IP
stack and managing up to 64K connections per PF. Use cases that are
already implemented and upstream include iWARP (by the Marvell qedr
driver) and iSCSI (by the Marvell qedi driver).
In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
and is able to manage the IO level also in case of TCP re-transmissions
and OOO events.
The HW engine enables direct data placement (including the data digest CRC
calculation and validation) and direct data transmission (including data
digest CRC calculation).
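
For reference, the NVMe/TCP header and data digests are CRC32C values. The
following bitwise software routine is a minimal sketch of the calculation
that the HW engine takes over. It is written for clarity rather than speed,
it is not code from this series, and demo_crc32c() and demo_ddgst_ok() are
hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t demo_crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;
	int k;

	while (len--) {
		crc ^= *p++;
		for (k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Validate received data against the digest carried with it. */
static int demo_ddgst_ok(const void *data, size_t len, uint32_t ddgst)
{
	return demo_crc32c(data, len) == ddgst;
}
```

The standard check value applies: the CRC32C of the nine ASCII bytes
"123456789" is 0xE3069283.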


The Marvell qedn driver:
========================
The new driver will be added under "drivers/nvme/hw" and will be enabled
by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
As part of the qedn init, the driver will register as a PCI device driver
and will work with the Marvell FastLinQ NIC.
As part of the probe, the driver will register to the nvme_tcp_offload
(ULP) and to the qed module (qed_nvmetcp_ops) - similar to other
"qed_*_ops" which are used by the qede, qedr, qedf and qedi device
drivers.


nvme-tcp-offload Future work:
=============================
- NVMF_OPT_HOST_IFACE Support.

QEDN Future work:
=================
- FW changes in order to remove swapping requirements from qedn driver.
- Support extended HW resources.
- Digest support.
- Devlink support for device configuration and TCP offload configurations.
- Statistics.

 
Long term future work:
======================
- The nvme-tcp-offload ULP target abstraction layer.
- The Marvell nvme-tcp-offload "qednt" target driver.


Changes since RFC v1:
=====================
- nvme-tcp-offload: Fix nvme_tcp_ofld_ops return values.
- nvme-tcp-offload: Remove NVMF_TRTYPE_TCP_OFFLOAD.
- nvme-tcp-offload: Add nvme_tcp_ofld_poll() implementation.
- nvme-tcp-offload: Fix nvme_tcp_ofld_queue_rq() to check map_sg() and 
  send_req() return values.

Changes since RFC v2:
=====================
- nvme-tcp-offload: Fixes in controller and queue level (patches 3-6).
- qedn: Add Marvell's NVMeTCP HW offload vendor driver init and probe
  (patches 8-11).
  
Changes since RFC v3:
=====================
- nvme-tcp-offload: Add the full implementation of the nvme-tcp-offload layer 
  including the new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new 
  flows (ASYNC and timeout).
- nvme-tcp-offload: Add device maximums: max_hw_sectors, max_segments.
- nvme-tcp-offload: layer design and optimization changes.
- qedn: Add full implementation for the conn level, IO path and error handling.
- qed: Add support for the new AHP HW. 

Changes since RFC v4:
=====================
(Many thanks to Hannes Reinecke for his feedback)
- nvme_tcp_offload: Add num_hw_vectors in order to limit the number of queues.
- nvme_tcp_offload: Add per device private_data.
- nvme_tcp_offload: Fix header digest, data digest and tos initialization.
- qed: Add TCP_ULP FW resource layout.
- qed: Fix ipv4/ipv6 address initialization.
- qed, qedn: Replace structures with nvme-tcp.h structures.
- qedn: Remove the qedn_global list.
- qedn: Remove the workqueue flow from send_req.
- qedn: Add db_recovery support.

Changes since RFC v5:
=====================
(Many thanks to Sagi Grimberg for his feedback)
- nvme-fabrics: Expose nvmf_check_required_opts() globally (as a new patch).
- nvme_tcp_offload: Remove io-queues BLK_MQ_F_BLOCKING.
- nvme_tcp_offload: Fix the nvme_tcp_ofld_stop_queue (drain_queue) flow.
- nvme_tcp_offload: Fix the nvme_tcp_ofld_free_queue (destroy_queue) flow.
- nvme_tcp_offload: Change rwsem to mutex.
- nvme_tcp_offload: remove redundant fields.
- nvme_tcp_offload: Remove the "new" from setup_ctrl().
- nvme_tcp_offload: Remove the init_req() and commit_rqs() ops.
- nvme_tcp_offload: Minor fixes in nvme_tcp_ofld_create_ctrl() and 
  nvme_tcp_ofld_free_queue().
- nvme_tcp_offload: Patch 8 (timeout and async) was squashed into
  patch 7 (io level).
- qedn: Fix the free_queue flow and the destroy_queue flow.
- qedn: Remove version number.


Arie Gershberg (3):
  nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS
    definitions
  nvme-tcp-offload: Add controller level implementation
  nvme-tcp-offload: Add controller level error recovery implementation

Dean Balandin (3):
  nvme-tcp-offload: Add device scan implementation
  nvme-tcp-offload: Add queue level implementation
  nvme-tcp-offload: Add IO level implementation

Nikolay Assa (2):
  qed: Add IP services APIs support
  qedn: Add qedn_claim_dev API support

Omkar Kulkarni (1):
  qed: Add TCP_ULP FW resource layout

Prabhakar Kushwaha (7):
  nvme-fabrics: Expose nvmf_check_required_opts() globally
  qed: Add support of HW filter block
  qedn: Add connection-level slowpath functionality
  qedn: Add support of configuring HW filter block
  qedn: Add support of Task and SGL
  qedn: Add support of NVME ICReq & ICResp
  qedn: Add support of ASYNC

Shai Malin (11):
  nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP
  qed: Add NVMeTCP Offload PF Level FW and HW HSI
  qed: Add NVMeTCP Offload Connection Level FW and HW HSI
  qed: Add NVMeTCP Offload IO Level FW and HW HSI
  qed: Add NVMeTCP Offload IO Level FW Initializations
  qedn: Add qedn - Marvell's NVMeTCP HW offload vendor driver
  qedn: Add qedn probe
  qedn: Add IRQ and fast-path resources initializations
  qedn: Add IO level qedn_send_req and fw_cq workqueue
  qedn: Add IO level fastpath functionality
  qedn: Add Connection and IO level recovery flows

 MAINTAINERS                                   |   18 +
 drivers/net/ethernet/qlogic/Kconfig           |    3 +
 drivers/net/ethernet/qlogic/qed/Makefile      |    5 +
 drivers/net/ethernet/qlogic/qed/qed.h         |   16 +
 drivers/net/ethernet/qlogic/qed/qed_cxt.c     |   44 +-
 drivers/net/ethernet/qlogic/qed/qed_cxt.h     |    2 +-
 drivers/net/ethernet/qlogic/qed/qed_dev.c     |  157 +-
 drivers/net/ethernet/qlogic/qed/qed_hsi.h     |    6 +-
 drivers/net/ethernet/qlogic/qed/qed_iscsi.c   |   20 +-
 drivers/net/ethernet/qlogic/qed/qed_ll2.c     |   40 +-
 drivers/net/ethernet/qlogic/qed/qed_mcp.c     |    3 +
 drivers/net/ethernet/qlogic/qed/qed_mng_tlv.c |    3 +-
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c |  870 +++++++++++
 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h |  114 ++
 .../qlogic/qed/qed_nvmetcp_fw_funcs.c         |  379 +++++
 .../qlogic/qed/qed_nvmetcp_fw_funcs.h         |   43 +
 .../qlogic/qed/qed_nvmetcp_ip_services.c      |  239 +++
 drivers/net/ethernet/qlogic/qed/qed_ooo.c     |    5 +-
 drivers/net/ethernet/qlogic/qed/qed_sp.h      |    5 +
 .../net/ethernet/qlogic/qed/qed_sp_commands.c |    3 +-
 drivers/nvme/Kconfig                          |    1 +
 drivers/nvme/Makefile                         |    1 +
 drivers/nvme/host/Kconfig                     |   16 +
 drivers/nvme/host/Makefile                    |    3 +
 drivers/nvme/host/fabrics.c                   |   12 +-
 drivers/nvme/host/fabrics.h                   |    9 +
 drivers/nvme/host/tcp-offload.c               | 1319 +++++++++++++++++
 drivers/nvme/host/tcp-offload.h               |  205 +++
 drivers/nvme/hw/Kconfig                       |    9 +
 drivers/nvme/hw/Makefile                      |    3 +
 drivers/nvme/hw/qedn/Makefile                 |    4 +
 drivers/nvme/hw/qedn/qedn.h                   |  405 +++++
 drivers/nvme/hw/qedn/qedn_conn.c              | 1034 +++++++++++++
 drivers/nvme/hw/qedn/qedn_main.c              | 1111 ++++++++++++++
 drivers/nvme/hw/qedn/qedn_task.c              |  863 +++++++++++
 include/linux/qed/common_hsi.h                |    2 +-
 include/linux/qed/nvmetcp_common.h            |  531 +++++++
 include/linux/qed/qed_if.h                    |   22 +
 include/linux/qed/qed_ll2_if.h                |    2 +-
 include/linux/qed/qed_nvmetcp_if.h            |  241 +++
 .../linux/qed/qed_nvmetcp_ip_services_if.h    |   29 +
 41 files changed, 7738 insertions(+), 59 deletions(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp.h
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_fw_funcs.h
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_nvmetcp_ip_services.c
 create mode 100644 drivers/nvme/host/tcp-offload.c
 create mode 100644 drivers/nvme/host/tcp-offload.h
 create mode 100644 drivers/nvme/hw/Kconfig
 create mode 100644 drivers/nvme/hw/Makefile
 create mode 100644 drivers/nvme/hw/qedn/Makefile
 create mode 100644 drivers/nvme/hw/qedn/qedn.h
 create mode 100644 drivers/nvme/hw/qedn/qedn_conn.c
 create mode 100644 drivers/nvme/hw/qedn/qedn_main.c
 create mode 100644 drivers/nvme/hw/qedn/qedn_task.c
 create mode 100644 include/linux/qed/nvmetcp_common.h
 create mode 100644 include/linux/qed/qed_nvmetcp_if.h
 create mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h

Hannes Reinecke May 28, 2021, 10:28 a.m. UTC | #1

On 5/28/21 1:58 AM, Shai Malin wrote:
> From: Dean Balandin <dbalandin@marvell.com>
> 
> As part of create_ctrl(), it scans the registered devices and calls
> the claim_dev op on each of them, to find the first device that matches
> the connection params. Once the correct device is found (claim_dev
> returns true), we raise the refcnt of that device and return that device
> as the device to be used for the ctrl currently being created.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Dean Balandin <dbalandin@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
> ---
>  drivers/nvme/host/tcp-offload.c | 77 +++++++++++++++++++++++++++++++++
>  1 file changed, 77 insertions(+)
> 

I'm not entirely happy with the lookup mechanism; one should look at
converting it to a proper 'bus' like the Mellanox driver did.
But that's something which could be done at a later stage.

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)

Hannes Reinecke May 28, 2021, 11:31 a.m. UTC | #2

On 5/28/21 1:58 AM, Shai Malin wrote:
> From: Omkar Kulkarni <okulkarni@marvell.com>
> 
> Add TCP_ULP as a storage common TCP offload FW resource layout.
> This will be used by the core driver (QED) for both the NVMeTCP and iSCSI.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>  drivers/net/ethernet/qlogic/qed/qed.h         |  1 +
>  drivers/net/ethernet/qlogic/qed/qed_cxt.c     | 18 ++++++++---------
>  drivers/net/ethernet/qlogic/qed/qed_cxt.h     |  2 +-
>  drivers/net/ethernet/qlogic/qed/qed_dev.c     |  2 +-
>  drivers/net/ethernet/qlogic/qed/qed_hsi.h     |  2 +-
>  drivers/net/ethernet/qlogic/qed/qed_iscsi.c   | 20 +++++++++----------
>  drivers/net/ethernet/qlogic/qed/qed_ll2.c     |  8 ++++----
>  drivers/net/ethernet/qlogic/qed/qed_ooo.c     |  2 +-
>  .../net/ethernet/qlogic/qed/qed_sp_commands.c |  2 +-
>  include/linux/qed/common_hsi.h                |  2 +-
>  include/linux/qed/qed_ll2_if.h                |  2 +-
>  11 files changed, 31 insertions(+), 30 deletions(-)
> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes

Hannes Reinecke May 28, 2021, 12:41 p.m. UTC | #3

On 5/28/21 1:58 AM, Shai Malin wrote:
> This patch introduces the NVMeTCP Offload FW and HW HSI in order
> to initialize the IO level configuration into a per IO HW
> resource ("task") as part of the IO path flow.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> ---
>  include/linux/qed/nvmetcp_common.h | 335 ++++++++++++++++++++++++++++-
>  include/linux/qed/qed_nvmetcp_if.h |  31 +++
>  2 files changed, 365 insertions(+), 1 deletion(-)
> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes

Hannes Reinecke May 28, 2021, 12:46 p.m. UTC | #4

On 5/28/21 1:58 AM, Shai Malin wrote:
> This patch adds qedn_fp_queue - a per-CPU-core element which handles
> all of the connections on that CPU core.
> The qedn_fp_queue will handle a group of connections (NVMeoF QPs) which
> are handled on the same CPU core, and will only use the same FW-driver
> resources with no need to be related to the same NVMeoF controller.
> 
> The per-qedn_fp_queue resources are the FW CQ and FW status block:
> - The FW CQ will be used for the FW to notify the driver that
>   the exchange has ended and the FW will pass the incoming NVMeoF CQE
>   (if one exists) to the driver.
> - FW status block - which is used for the FW to notify the driver with
>   the producer update of the FW CQE chain.
> 
> The FW fast-path queues are based on qed_chain.h.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>  drivers/nvme/hw/qedn/qedn.h      |  25 +++
>  drivers/nvme/hw/qedn/qedn_main.c | 289 ++++++++++++++++++++++++++++++-
>  2 files changed, 311 insertions(+), 3 deletions(-)
> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes

Hannes Reinecke May 28, 2021, 1:06 p.m. UTC | #5

On 5/28/21 1:58 AM, Shai Malin wrote:
> From: Prabhakar Kushwaha <pkushwaha@marvell.com>
> 
> This patch adds support for Task and SGL, which are used
> for slowpath and fast-path IO. Here, a Task is the IO granule used
> by the firmware to perform tasks.
> 
> The internal implementation:
> - Create task/sgl resources used by all connections.
> - Provide APIs to allocate and free task.
> - Add task support during connection establishment, i.e. slowpath.
> 
> Acked-by: Igor Russkikh <irusskikh@marvell.com>
> Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
> Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>  drivers/nvme/hw/qedn/qedn.h      |  65 +++++++
>  drivers/nvme/hw/qedn/qedn_conn.c |  44 ++++-
>  drivers/nvme/hw/qedn/qedn_main.c |  34 +++-
>  drivers/nvme/hw/qedn/qedn_task.c | 320 +++++++++++++++++++++++++++++++
>  4 files changed, 459 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h
> index d56184f58840..cfb5e1b0fbaa 100644
> --- a/drivers/nvme/hw/qedn/qedn.h
> +++ b/drivers/nvme/hw/qedn/qedn.h
> @@ -40,6 +40,20 @@
>  
>  #define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"
>  
> +/* Protocol defines */
> +#define QEDN_MAX_IO_SIZE QED_NVMETCP_MAX_IO_SIZE
> +
> +#define QEDN_SGE_BUFF_SIZE 4096

Just one 4k page per SGE?
What about architectures with larger page sizes?

> +#define QEDN_MAX_SGES_PER_TASK DIV_ROUND_UP(QEDN_MAX_IO_SIZE, QEDN_SGE_BUFF_SIZE)
> +#define QEDN_FW_SGE_SIZE sizeof(struct nvmetcp_sge)
> +#define QEDN_MAX_FW_SGL_SIZE ((QEDN_MAX_SGES_PER_TASK) * QEDN_FW_SGE_SIZE)
> +#define QEDN_FW_SLOW_IO_MIN_SGE_LIMIT (9700 / 6)
> +
> +#define QEDN_MAX_HW_SECTORS (QEDN_MAX_IO_SIZE / 512)
> +#define QEDN_MAX_SEGMENTS QEDN_MAX_SGES_PER_TASK
> +
> +#define QEDN_INVALID_ITID 0xFFFF
> +
>  /*
>   * TCP offload stack default configurations and defines.
>   * Future enhancements will allow controlling the configurable
> @@ -84,6 +98,15 @@ enum qedn_state {
>  	QEDN_STATE_MODULE_REMOVE_ONGOING,
>  };
>  
> +struct qedn_io_resources {
> +	/* Lock for IO resources */
> +	spinlock_t resources_lock;
> +	struct list_head task_free_list;
> +	u32 num_alloc_tasks;
> +	u32 num_free_tasks;
> +	u32 no_avail_resrc_cnt;
> +};
> +
>  /* Per CPU core params */
>  struct qedn_fp_queue {
>  	struct qed_chain cq_chain;
> @@ -93,6 +116,10 @@ struct qedn_fp_queue {
>  	struct qed_sb_info *sb_info;
>  	unsigned int cpu;
>  	struct work_struct fw_cq_fp_wq_entry;
> +
> +	/* IO related resources for host */
> +	struct qedn_io_resources host_resrc;
> +
>  	u16 sb_id;
>  	char irqname[QEDN_IRQ_NAME_LEN];
>  };
> @@ -116,12 +143,35 @@ struct qedn_ctx {
>  	/* Connections */
>  	DECLARE_HASHTABLE(conn_ctx_hash, 16);
>  
> +	u32 num_tasks_per_pool;
> +
>  	/* Fast path queues */
>  	u8 num_fw_cqs;
>  	struct qedn_fp_queue *fp_q_arr;
>  	struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;
>  	dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */
>  	struct workqueue_struct *fw_cq_fp_wq;
> +
> +	/* Fast Path Tasks */
> +	struct qed_nvmetcp_tid	tasks;
> +};
> +
> +struct qedn_task_ctx {
> +	struct qedn_conn_ctx *qedn_conn;
> +	struct qedn_ctx *qedn;
> +	void *fw_task_ctx;
> +	struct qedn_fp_queue *fp_q;
> +	struct scatterlist *nvme_sg;
> +	struct nvme_tcp_ofld_req *req; /* currently proccessed request */
> +	struct list_head entry;
> +	spinlock_t lock; /* To protect task resources */
> +	bool valid;
> +	unsigned long flags; /* Used by qedn_task_flags */
> +	u32 task_size;
> +	u16 itid;
> +	u16 cccid;
> +	int req_direction;
> +	struct storage_sgl_task_params sgl_task_params;
>  };
>  
>  struct qedn_endpoint {
> @@ -220,6 +270,7 @@ struct qedn_conn_ctx {
>  	struct nvme_tcp_ofld_ctrl *ctrl;
>  	u32 conn_handle;
>  	u32 fw_cid;
> +	u8 default_cq;
>  
>  	atomic_t est_conn_indicator;
>  	atomic_t destroy_conn_indicator;
> @@ -237,6 +288,11 @@ struct qedn_conn_ctx {
>  	dma_addr_t host_cccid_itid_phy_addr;
>  	struct qedn_endpoint ep;
>  	int abrt_flag;
> +	/* Spinlock for accessing active_task_list */
> +	spinlock_t task_list_lock;
> +	struct list_head active_task_list;
> +	atomic_t num_active_tasks;
> +	atomic_t num_active_fw_tasks;
>  
>  	/* Connection resources - turned on to indicate what resource was
>  	 * allocated, to that it can later be released.
> @@ -256,6 +312,7 @@ struct qedn_conn_ctx {
>  enum qedn_conn_resources_state {
>  	QEDN_CONN_RESRC_FW_SQ,
>  	QEDN_CONN_RESRC_ACQUIRE_CONN,
> +	QEDN_CONN_RESRC_TASKS,
>  	QEDN_CONN_RESRC_CCCID_ITID_MAP,
>  	QEDN_CONN_RESRC_TCP_PORT,
>  	QEDN_CONN_RESRC_DB_ADD,
> @@ -278,5 +335,13 @@ inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 ccci
>  int qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req);
>  void qedn_nvme_req_fp_wq_handler(struct work_struct *work);
>  void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe);
> +int qedn_alloc_tasks(struct qedn_conn_ctx *conn_ctx);
> +inline int qedn_qid(struct nvme_tcp_ofld_queue *queue);
> +void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params);
> +void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx);
> +struct qedn_task_ctx *
> +qedn_get_free_task_from_pool(struct qedn_conn_ctx *conn_ctx, u16 cccid);
> +void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,
> +			     struct qedn_io_resources *io_resrc);
>  
>  #endif /* _QEDN_H_ */
> diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c
> index 049db20b69e8..7e38edccbb56 100644
> --- a/drivers/nvme/hw/qedn/qedn_conn.c
> +++ b/drivers/nvme/hw/qedn/qedn_conn.c
> @@ -29,6 +29,11 @@ static const char * const qedn_conn_state_str[] = {
>  	NULL
>  };
>  
> +inline int qedn_qid(struct nvme_tcp_ofld_queue *queue)
> +{
> +	return queue - queue->ctrl->queues;
> +}
> +
>  int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state)
>  {
>  	spin_lock_bh(&conn_ctx->conn_state_lock);
> @@ -159,6 +164,11 @@ static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)
>  		clear_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
>  	}
>  
> +	if (test_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state)) {
> +		clear_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
> +			qedn_return_active_tasks(conn_ctx);
> +	}
> +
>  	if (test_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state)) {
>  		dma_free_coherent(&qedn->pdev->dev,
>  				  conn_ctx->sq_depth *
> @@ -261,6 +271,7 @@ static int qedn_nvmetcp_offload_conn(struct qedn_conn_ctx *conn_ctx)
>  	offld_prms.max_rt_time = QEDN_TCP_MAX_RT_TIME;
>  	offld_prms.sq_pbl_addr =
>  		(u64)qed_chain_get_pbl_phys(&qedn_ep->fw_sq_chain);
> +	offld_prms.default_cq = conn_ctx->default_cq;
>  
>  	rc = qed_ops->offload_conn(qedn->cdev,
>  				   conn_ctx->conn_handle,
> @@ -398,6 +409,9 @@ void qedn_prep_db_data(struct qedn_conn_ctx *conn_ctx)
>  static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
>  {
>  	struct qedn_ctx *qedn = conn_ctx->qedn;
> +	struct qedn_io_resources *io_resrc;
> +	struct qedn_fp_queue *fp_q;
> +	u8 default_cq_idx, qid;
>  	size_t dma_size;
>  	int rc;
>  
> @@ -409,6 +423,9 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
>  
>  	set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);
>  
> +	atomic_set(&conn_ctx->num_active_tasks, 0);
> +	atomic_set(&conn_ctx->num_active_fw_tasks, 0);
> +
>  	rc = qed_ops->acquire_conn(qedn->cdev,
>  				   &conn_ctx->conn_handle,
>  				   &conn_ctx->fw_cid,
> @@ -422,7 +439,32 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)
>  		 conn_ctx->conn_handle);
>  	set_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);
>  
> -	/* Placeholder - Allocate task resources and initialize fields */
> +	qid = qedn_qid(conn_ctx->queue);
> +	default_cq_idx = qid ? qid - 1 : 0; /* Offset adminq */
> +
> +	conn_ctx->default_cq = (default_cq_idx % qedn->num_fw_cqs);
> +	fp_q = &qedn->fp_q_arr[conn_ctx->default_cq];
> +	conn_ctx->fp_q = fp_q;
> +	io_resrc = &fp_q->host_resrc;
> +
> +	/* The first connection on each fp_q will fill task
> +	 * resources
> +	 */
> +	spin_lock(&io_resrc->resources_lock);
> +	if (io_resrc->num_alloc_tasks == 0) {
> +		rc = qedn_alloc_tasks(conn_ctx);
> +		if (rc) {
> +			pr_err("Failed allocating tasks: CID=0x%x\n",
> +			       conn_ctx->fw_cid);
> +			spin_unlock(&io_resrc->resources_lock);
> +			goto rel_conn;
> +		}
> +	}
> +	spin_unlock(&io_resrc->resources_lock);
> +
> +	spin_lock_init(&conn_ctx->task_list_lock);
> +	INIT_LIST_HEAD(&conn_ctx->active_task_list);
> +	set_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);
>  
>  	rc = qedn_fetch_tcp_port(conn_ctx);
>  	if (rc)
> diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c
> index db8c27dd8876..444db6d58a0a 100644
> --- a/drivers/nvme/hw/qedn/qedn_main.c
> +++ b/drivers/nvme/hw/qedn/qedn_main.c
> @@ -29,6 +29,12 @@ __be16 qedn_get_in_port(struct sockaddr_storage *sa)
>  		: ((struct sockaddr_in6 *)sa)->sin6_port;
>  }
>  
> +static void qedn_init_io_resc(struct qedn_io_resources *io_resrc)
> +{
> +	spin_lock_init(&io_resrc->resources_lock);
> +	INIT_LIST_HEAD(&io_resrc->task_free_list);
> +}
> +
>  struct qedn_llh_filter *qedn_add_llh_filter(struct qedn_ctx *qedn, u16 tcp_port)
>  {
>  	struct qedn_llh_filter *llh_filter = NULL;
> @@ -437,6 +443,8 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {
>  		 *	NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
>  		 *	NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS
>  		 */
> +	.max_hw_sectors = QEDN_MAX_HW_SECTORS,
> +	.max_segments = QEDN_MAX_SEGMENTS,
>  	.claim_dev = qedn_claim_dev,
>  	.setup_ctrl = qedn_setup_ctrl,
>  	.release_ctrl = qedn_release_ctrl,
> @@ -642,8 +650,24 @@ static inline int qedn_core_probe(struct qedn_ctx *qedn)
>  	return rc;
>  }
>  
> +static void qedn_call_destroy_free_tasks(struct qedn_fp_queue *fp_q,
> +					 struct qedn_io_resources *io_resrc)
> +{
> +	if (list_empty(&io_resrc->task_free_list))
> +		return;
> +
> +	if (io_resrc->num_alloc_tasks != io_resrc->num_free_tasks)
> +		pr_err("Task Pool:Not all returned allocated=0x%x, free=0x%x\n",
> +		       io_resrc->num_alloc_tasks, io_resrc->num_free_tasks);
> +
> +	qedn_destroy_free_tasks(fp_q, io_resrc);
> +	if (io_resrc->num_free_tasks)
> +		pr_err("Expected num_free_tasks to be 0\n");
> +}
> +

>  static void qedn_free_function_queues(struct qedn_ctx *qedn)

>  {

> +	struct qedn_io_resources *host_resrc;

>  	struct qed_sb_info *sb_info = NULL;

>  	struct qedn_fp_queue *fp_q;

>  	int i;

> @@ -655,6 +679,9 @@ static void qedn_free_function_queues(struct qedn_ctx *qedn)

>  	/* Free the fast path queues*/

>  	for (i = 0; i < qedn->num_fw_cqs; i++) {

>  		fp_q = &qedn->fp_q_arr[i];

> +		host_resrc = &fp_q->host_resrc;

> +

> +		qedn_call_destroy_free_tasks(fp_q, host_resrc);

>  

>  		/* Free SB */

>  		sb_info = fp_q->sb_info;

> @@ -742,7 +769,8 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

>  		goto mem_alloc_failure;

>  	}

>  

> -	/* placeholder - create task pools */

> +	qedn->num_tasks_per_pool =

> +		qedn->pf_params.nvmetcp_pf_params.num_tasks / qedn->num_fw_cqs;

>  

>  	for (i = 0; i < qedn->num_fw_cqs; i++) {

>  		fp_q = &qedn->fp_q_arr[i];

> @@ -784,7 +812,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

>  		fp_q->qedn = qedn;

>  		INIT_WORK(&fp_q->fw_cq_fp_wq_entry, qedn_fw_cq_fq_wq_handler);

>  

> -		/* Placeholder - Init IO-path resources */

> +		qedn_init_io_resc(&fp_q->host_resrc);

>  	}

>  

>  	return 0;

> @@ -966,7 +994,7 @@ static int __qedn_probe(struct pci_dev *pdev)

>  

>  	/* NVMeTCP start HW PF */

>  	rc = qed_ops->start(qedn->cdev,

> -			    NULL /* Placeholder for FW IO-path resources */,

> +			    &qedn->tasks,

>  			    qedn,

>  			    qedn_event_cb);

>  	if (rc) {

> diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c

> index ea6745b94817..35cb5e8e4e61 100644

> --- a/drivers/nvme/hw/qedn/qedn_task.c

> +++ b/drivers/nvme/hw/qedn/qedn_task.c

> @@ -11,6 +11,198 @@

>  /* Driver includes */

>  #include "qedn.h"

>  

> +static void qedn_free_nvme_sg(struct qedn_task_ctx *qedn_task)

> +{

> +	kfree(qedn_task->nvme_sg);

> +	qedn_task->nvme_sg = NULL;

> +}

> +

> +static void qedn_free_fw_sgl(struct qedn_task_ctx *qedn_task)

> +{

> +	struct qedn_ctx *qedn = qedn_task->qedn;

> +	dma_addr_t sgl_pa;

> +

> +	sgl_pa = HILO_DMA_REGPAIR(qedn_task->sgl_task_params.sgl_phys_addr);

> +	dma_free_coherent(&qedn->pdev->dev,

> +			  QEDN_MAX_FW_SGL_SIZE,

> +			  qedn_task->sgl_task_params.sgl,

> +			  sgl_pa);

> +	qedn_task->sgl_task_params.sgl = NULL;

> +}

> +

> +static void qedn_destroy_single_task(struct qedn_task_ctx *qedn_task)

> +{

> +	u16 itid;

> +

> +	itid = qedn_task->itid;

> +	list_del(&qedn_task->entry);

> +	qedn_free_nvme_sg(qedn_task);

> +	qedn_free_fw_sgl(qedn_task);

> +	kfree(qedn_task);

> +	qedn_task = NULL;

> +}

> +

> +void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,

> +			     struct qedn_io_resources *io_resrc)

> +{

> +	struct qedn_task_ctx *qedn_task, *task_tmp;

> +

> +	/* Destroy tasks from the free task list */

> +	list_for_each_entry_safe(qedn_task, task_tmp,

> +				 &io_resrc->task_free_list, entry) {

> +		qedn_destroy_single_task(qedn_task);

> +		io_resrc->num_free_tasks -= 1;

> +	}

> +}

> +

> +static int qedn_alloc_nvme_sg(struct qedn_task_ctx *qedn_task)

> +{

> +	int rc;

> +

> +	qedn_task->nvme_sg = kcalloc(QEDN_MAX_SGES_PER_TASK,

> +				     sizeof(*qedn_task->nvme_sg), GFP_KERNEL);

> +	if (!qedn_task->nvme_sg) {

> +		rc = -ENOMEM;

> +

> +		return rc;

> +	}

> +

> +	return 0;

> +}

> +

> +static int qedn_alloc_fw_sgl(struct qedn_task_ctx *qedn_task)

> +{

> +	struct qedn_ctx *qedn = qedn_task->qedn_conn->qedn;

> +	dma_addr_t fw_sgl_phys;

> +

> +	qedn_task->sgl_task_params.sgl =

> +		dma_alloc_coherent(&qedn->pdev->dev, QEDN_MAX_FW_SGL_SIZE,

> +				   &fw_sgl_phys, GFP_KERNEL);

> +	if (!qedn_task->sgl_task_params.sgl) {

> +		pr_err("Couldn't allocate FW sgl\n");

> +

> +		return -ENOMEM;

> +	}

> +

> +	DMA_REGPAIR_LE(qedn_task->sgl_task_params.sgl_phys_addr, fw_sgl_phys);

> +

> +	return 0;

> +}

> +

> +static inline void *qedn_get_fw_task(struct qed_nvmetcp_tid *info, u16 itid)

> +{

> +	return (void *)(info->blocks[itid / info->num_tids_per_block] +

> +			(itid % info->num_tids_per_block) * info->size);

> +}

> +

> +static struct qedn_task_ctx *qedn_alloc_task(struct qedn_conn_ctx *conn_ctx, u16 itid)

> +{

> +	struct qedn_ctx *qedn = conn_ctx->qedn;

> +	struct qedn_task_ctx *qedn_task;

> +	void *fw_task_ctx;

> +	int rc = 0;

> +

> +	qedn_task = kzalloc(sizeof(*qedn_task), GFP_KERNEL);

> +	if (!qedn_task)

> +		return NULL;

> +

> +	spin_lock_init(&qedn_task->lock);

> +	fw_task_ctx = qedn_get_fw_task(&qedn->tasks, itid);

> +	if (!fw_task_ctx) {

> +		pr_err("iTID: 0x%x; Failed getting fw_task_ctx memory\n", itid);

> +		goto release_task;

> +	}

> +

> +	/* No need to memset fw_task_ctx - it's done in the HSI func */

> +	qedn_task->qedn_conn = conn_ctx;

> +	qedn_task->qedn = qedn;

> +	qedn_task->fw_task_ctx = fw_task_ctx;

> +	qedn_task->valid = 0;

> +	qedn_task->flags = 0;

> +	qedn_task->itid = itid;

> +	rc = qedn_alloc_fw_sgl(qedn_task);

> +	if (rc) {

> +		pr_err("iTID: 0x%x; Failed allocating FW sgl\n", itid);

> +		goto release_task;

> +	}

> +

> +	rc = qedn_alloc_nvme_sg(qedn_task);

> +	if (rc) {

> +		pr_err("iTID: 0x%x; Failed allocating NVMe sg\n", itid);

> +		goto release_fw_sgl;

> +	}

> +

> +	return qedn_task;

> +

> +release_fw_sgl:

> +	qedn_free_fw_sgl(qedn_task);

> +release_task:

> +	kfree(qedn_task);

> +

> +	return NULL;

> +}

> +

> +int qedn_alloc_tasks(struct qedn_conn_ctx *conn_ctx)

> +{

> +	struct qedn_ctx *qedn = conn_ctx->qedn;

> +	struct qedn_task_ctx *qedn_task = NULL;

> +	struct qedn_io_resources *io_resrc;

> +	u16 itid, start_itid, offset;

> +	struct qedn_fp_queue *fp_q;

> +	int i, rc;

> +

> +	fp_q = conn_ctx->fp_q;

> +

> +	offset = fp_q->sb_id;

> +	io_resrc = &fp_q->host_resrc;

> +

> +	start_itid = qedn->num_tasks_per_pool * offset;

> +	for (i = 0; i < qedn->num_tasks_per_pool; ++i) {

> +		itid = start_itid + i;

> +		qedn_task = qedn_alloc_task(conn_ctx, itid);

> +		if (!qedn_task) {

> +			pr_err("Failed allocating task\n");

> +			rc = -ENOMEM;

> +			goto release_tasks;

> +		}

> +

> +		qedn_task->fp_q = fp_q;

> +		io_resrc->num_free_tasks += 1;

> +		list_add_tail(&qedn_task->entry, &io_resrc->task_free_list);

> +	}

> +

> +	io_resrc->num_alloc_tasks = io_resrc->num_free_tasks;

> +

> +	return 0;

> +

> +release_tasks:

> +	qedn_destroy_free_tasks(fp_q, io_resrc);

> +

> +	return rc;

> +}

> +


Well ... this is less than optimal.
In effect you are splitting the available hardware tasks between pools.
And the way I see it you allocate one pool per connection.
Is that correct?

So what about the scaling here?
How many hardware tasks do you have in total?
And what happens if you add more and more connections?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)

Shai Malin May 31, 2021, 2:14 p.m. UTC | #6

On 5/28/21 4:06 PM, Hannes Reinecke wrote:
> On 5/28/21 1:58 AM, Shai Malin wrote:

> > From: Prabhakar Kushwaha <pkushwaha@marvell.com>

> >

> > This patch adds support for Task and SGL, which are used

> > for slowpath and fast-path IO. Here, a Task is the IO granule

> > used by firmware to perform tasks.

> >

> > The internal implementation:

> > - Create task/sgl resources used by all connections

> > - Provide APIs to allocate and free task.

> > - Add task support during connection establishment i.e. slowpath

> >

> > Acked-by: Igor Russkikh <irusskikh@marvell.com>

> > Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>

> > Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>

> > Signed-off-by: Michal Kalderon <mkalderon@marvell.com>

> > Signed-off-by: Ariel Elior <aelior@marvell.com>

> > Signed-off-by: Shai Malin <smalin@marvell.com>

> > ---

> >  drivers/nvme/hw/qedn/qedn.h      |  65 +++++++

> >  drivers/nvme/hw/qedn/qedn_conn.c |  44 ++++-

> >  drivers/nvme/hw/qedn/qedn_main.c |  34 +++-

> >  drivers/nvme/hw/qedn/qedn_task.c | 320 +++++++++++++++++++++++++++++++

> >  4 files changed, 459 insertions(+), 4 deletions(-)

> >

> > diff --git a/drivers/nvme/hw/qedn/qedn.h b/drivers/nvme/hw/qedn/qedn.h

> > index d56184f58840..cfb5e1b0fbaa 100644

> > --- a/drivers/nvme/hw/qedn/qedn.h

> > +++ b/drivers/nvme/hw/qedn/qedn.h

> > @@ -40,6 +40,20 @@

> >

> >  #define QEDN_FW_CQ_FP_WQ_WORKQUEUE "qedn_fw_cq_fp_wq"

> >

> > +/* Protocol defines */

> > +#define QEDN_MAX_IO_SIZE QED_NVMETCP_MAX_IO_SIZE

> > +

> > +#define QEDN_SGE_BUFF_SIZE 4096

>

> Just one 4k page per SGE?

> What about architectures with larger page sizes?


This define is not related to platform page size, and actually,
we can remove it.

>

> > +#define QEDN_MAX_SGES_PER_TASK DIV_ROUND_UP(QEDN_MAX_IO_SIZE, QEDN_SGE_BUFF_SIZE)

> > +#define QEDN_FW_SGE_SIZE sizeof(struct nvmetcp_sge)

> > +#define QEDN_MAX_FW_SGL_SIZE ((QEDN_MAX_SGES_PER_TASK) * QEDN_FW_SGE_SIZE)

> > +#define QEDN_FW_SLOW_IO_MIN_SGE_LIMIT (9700 / 6)

> > +

> > +#define QEDN_MAX_HW_SECTORS (QEDN_MAX_IO_SIZE / 512)

> > +#define QEDN_MAX_SEGMENTS QEDN_MAX_SGES_PER_TASK

> > +

> > +#define QEDN_INVALID_ITID 0xFFFF

> > +

> >  /*

> >   * TCP offload stack default configurations and defines.

> >   * Future enhancements will allow controlling the configurable

> > @@ -84,6 +98,15 @@ enum qedn_state {

> >       QEDN_STATE_MODULE_REMOVE_ONGOING,

> >  };

> >

> > +struct qedn_io_resources {

> > +     /* Lock for IO resources */

> > +     spinlock_t resources_lock;

> > +     struct list_head task_free_list;

> > +     u32 num_alloc_tasks;

> > +     u32 num_free_tasks;

> > +     u32 no_avail_resrc_cnt;

> > +};

> > +

> >  /* Per CPU core params */

> >  struct qedn_fp_queue {

> >       struct qed_chain cq_chain;

> > @@ -93,6 +116,10 @@ struct qedn_fp_queue {

> >       struct qed_sb_info *sb_info;

> >       unsigned int cpu;

> >       struct work_struct fw_cq_fp_wq_entry;

> > +

> > +     /* IO related resources for host */

> > +     struct qedn_io_resources host_resrc;

> > +

> >       u16 sb_id;

> >       char irqname[QEDN_IRQ_NAME_LEN];

> >  };

> > @@ -116,12 +143,35 @@ struct qedn_ctx {

> >       /* Connections */

> >       DECLARE_HASHTABLE(conn_ctx_hash, 16);

> >

> > +     u32 num_tasks_per_pool;

> > +

> >       /* Fast path queues */

> >       u8 num_fw_cqs;

> >       struct qedn_fp_queue *fp_q_arr;

> >       struct nvmetcp_glbl_queue_entry *fw_cq_array_virt;

> >       dma_addr_t fw_cq_array_phy; /* Physical address of fw_cq_array_virt */

> >       struct workqueue_struct *fw_cq_fp_wq;

> > +

> > +     /* Fast Path Tasks */

> > +     struct qed_nvmetcp_tid  tasks;

> > +};

> > +

> > +struct qedn_task_ctx {

> > +     struct qedn_conn_ctx *qedn_conn;

> > +     struct qedn_ctx *qedn;

> > +     void *fw_task_ctx;

> > +     struct qedn_fp_queue *fp_q;

> > +     struct scatterlist *nvme_sg;

> > +     struct nvme_tcp_ofld_req *req; /* currently processed request */

> > +     struct list_head entry;

> > +     spinlock_t lock; /* To protect task resources */

> > +     bool valid;

> > +     unsigned long flags; /* Used by qedn_task_flags */

> > +     u32 task_size;

> > +     u16 itid;

> > +     u16 cccid;

> > +     int req_direction;

> > +     struct storage_sgl_task_params sgl_task_params;

> >  };

> >

> >  struct qedn_endpoint {

> > @@ -220,6 +270,7 @@ struct qedn_conn_ctx {

> >       struct nvme_tcp_ofld_ctrl *ctrl;

> >       u32 conn_handle;

> >       u32 fw_cid;

> > +     u8 default_cq;

> >

> >       atomic_t est_conn_indicator;

> >       atomic_t destroy_conn_indicator;

> > @@ -237,6 +288,11 @@ struct qedn_conn_ctx {

> >       dma_addr_t host_cccid_itid_phy_addr;

> >       struct qedn_endpoint ep;

> >       int abrt_flag;

> > +     /* Spinlock for accessing active_task_list */

> > +     spinlock_t task_list_lock;

> > +     struct list_head active_task_list;

> > +     atomic_t num_active_tasks;

> > +     atomic_t num_active_fw_tasks;

> >

> >       /* Connection resources - turned on to indicate what resource was

> >        * allocated, to that it can later be released.

> > @@ -256,6 +312,7 @@ struct qedn_conn_ctx {

> >  enum qedn_conn_resources_state {

> >       QEDN_CONN_RESRC_FW_SQ,

> >       QEDN_CONN_RESRC_ACQUIRE_CONN,

> > +     QEDN_CONN_RESRC_TASKS,

> >       QEDN_CONN_RESRC_CCCID_ITID_MAP,

> >       QEDN_CONN_RESRC_TCP_PORT,

> >       QEDN_CONN_RESRC_DB_ADD,

> > @@ -278,5 +335,13 @@ inline int qedn_validate_cccid_in_range(struct qedn_conn_ctx *conn_ctx, u16 ccci

> >  int qedn_queue_request(struct qedn_conn_ctx *qedn_conn, struct nvme_tcp_ofld_req *req);

> >  void qedn_nvme_req_fp_wq_handler(struct work_struct *work);

> >  void qedn_io_work_cq(struct qedn_ctx *qedn, struct nvmetcp_fw_cqe *cqe);

> > +int qedn_alloc_tasks(struct qedn_conn_ctx *conn_ctx);

> > +inline int qedn_qid(struct nvme_tcp_ofld_queue *queue);

> > +void qedn_common_clear_fw_sgl(struct storage_sgl_task_params *sgl_task_params);

> > +void qedn_return_active_tasks(struct qedn_conn_ctx *conn_ctx);

> > +struct qedn_task_ctx *

> > +qedn_get_free_task_from_pool(struct qedn_conn_ctx *conn_ctx, u16 cccid);

> > +void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,

> > +                          struct qedn_io_resources *io_resrc);

> >

> >  #endif /* _QEDN_H_ */

> > diff --git a/drivers/nvme/hw/qedn/qedn_conn.c b/drivers/nvme/hw/qedn/qedn_conn.c

> > index 049db20b69e8..7e38edccbb56 100644

> > --- a/drivers/nvme/hw/qedn/qedn_conn.c

> > +++ b/drivers/nvme/hw/qedn/qedn_conn.c

> > @@ -29,6 +29,11 @@ static const char * const qedn_conn_state_str[] = {

> >       NULL

> >  };

> >

> > +inline int qedn_qid(struct nvme_tcp_ofld_queue *queue)

> > +{

> > +     return queue - queue->ctrl->queues;

> > +}

> > +

> >  int qedn_set_con_state(struct qedn_conn_ctx *conn_ctx, enum qedn_conn_state new_state)

> >  {

> >       spin_lock_bh(&conn_ctx->conn_state_lock);

> > @@ -159,6 +164,11 @@ static void qedn_release_conn_ctx(struct qedn_conn_ctx *conn_ctx)

> >               clear_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);

> >       }

> >

> > +     if (test_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state)) {

> > +             clear_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);

> > +                     qedn_return_active_tasks(conn_ctx);

> > +     }

> > +

> >       if (test_bit(QEDN_CONN_RESRC_CCCID_ITID_MAP, &conn_ctx->resrc_state)) {

> >               dma_free_coherent(&qedn->pdev->dev,

> >                                 conn_ctx->sq_depth *

> > @@ -261,6 +271,7 @@ static int qedn_nvmetcp_offload_conn(struct qedn_conn_ctx *conn_ctx)

> >       offld_prms.max_rt_time = QEDN_TCP_MAX_RT_TIME;

> >       offld_prms.sq_pbl_addr =

> >               (u64)qed_chain_get_pbl_phys(&qedn_ep->fw_sq_chain);

> > +     offld_prms.default_cq = conn_ctx->default_cq;

> >

> >       rc = qed_ops->offload_conn(qedn->cdev,

> >                                  conn_ctx->conn_handle,

> > @@ -398,6 +409,9 @@ void qedn_prep_db_data(struct qedn_conn_ctx *conn_ctx)

> >  static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)

> >  {

> >       struct qedn_ctx *qedn = conn_ctx->qedn;

> > +     struct qedn_io_resources *io_resrc;

> > +     struct qedn_fp_queue *fp_q;

> > +     u8 default_cq_idx, qid;

> >       size_t dma_size;

> >       int rc;

> >

> > @@ -409,6 +423,9 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)

> >

> >       set_bit(QEDN_CONN_RESRC_FW_SQ, &conn_ctx->resrc_state);

> >

> > +     atomic_set(&conn_ctx->num_active_tasks, 0);

> > +     atomic_set(&conn_ctx->num_active_fw_tasks, 0);

> > +

> >       rc = qed_ops->acquire_conn(qedn->cdev,

> >                                  &conn_ctx->conn_handle,

> >                                  &conn_ctx->fw_cid,

> > @@ -422,7 +439,32 @@ static int qedn_prep_and_offload_queue(struct qedn_conn_ctx *conn_ctx)

> >                conn_ctx->conn_handle);

> >       set_bit(QEDN_CONN_RESRC_ACQUIRE_CONN, &conn_ctx->resrc_state);

> >

> > -     /* Placeholder - Allocate task resources and initialize fields */

> > +     qid = qedn_qid(conn_ctx->queue);

> > +     default_cq_idx = qid ? qid - 1 : 0; /* Offset adminq */

> > +

> > +     conn_ctx->default_cq = (default_cq_idx % qedn->num_fw_cqs);

> > +     fp_q = &qedn->fp_q_arr[conn_ctx->default_cq];

> > +     conn_ctx->fp_q = fp_q;

> > +     io_resrc = &fp_q->host_resrc;

> > +

> > +     /* The first connection on each fp_q will fill task

> > +      * resources

> > +      */

> > +     spin_lock(&io_resrc->resources_lock);

> > +     if (io_resrc->num_alloc_tasks == 0) {

> > +             rc = qedn_alloc_tasks(conn_ctx);

> > +             if (rc) {

> > +                     pr_err("Failed allocating tasks: CID=0x%x\n",

> > +                            conn_ctx->fw_cid);

> > +                     spin_unlock(&io_resrc->resources_lock);

> > +                     goto rel_conn;

> > +             }

> > +     }

> > +     spin_unlock(&io_resrc->resources_lock);

> > +

> > +     spin_lock_init(&conn_ctx->task_list_lock);

> > +     INIT_LIST_HEAD(&conn_ctx->active_task_list);

> > +     set_bit(QEDN_CONN_RESRC_TASKS, &conn_ctx->resrc_state);

> >

> >       rc = qedn_fetch_tcp_port(conn_ctx);

> >       if (rc)

> > diff --git a/drivers/nvme/hw/qedn/qedn_main.c b/drivers/nvme/hw/qedn/qedn_main.c

> > index db8c27dd8876..444db6d58a0a 100644

> > --- a/drivers/nvme/hw/qedn/qedn_main.c

> > +++ b/drivers/nvme/hw/qedn/qedn_main.c

> > @@ -29,6 +29,12 @@ __be16 qedn_get_in_port(struct sockaddr_storage *sa)

> >               : ((struct sockaddr_in6 *)sa)->sin6_port;

> >  }

> >

> > +static void qedn_init_io_resc(struct qedn_io_resources *io_resrc)

> > +{

> > +     spin_lock_init(&io_resrc->resources_lock);

> > +     INIT_LIST_HEAD(&io_resrc->task_free_list);

> > +}

> > +

> >  struct qedn_llh_filter *qedn_add_llh_filter(struct qedn_ctx *qedn, u16 tcp_port)

> >  {

> >       struct qedn_llh_filter *llh_filter = NULL;

> > @@ -437,6 +443,8 @@ static struct nvme_tcp_ofld_ops qedn_ofld_ops = {

> >                *      NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |

> >                *      NVMF_OPT_NR_POLL_QUEUES | NVMF_OPT_TOS

> >                */

> > +     .max_hw_sectors = QEDN_MAX_HW_SECTORS,

> > +     .max_segments = QEDN_MAX_SEGMENTS,

> >       .claim_dev = qedn_claim_dev,

> >       .setup_ctrl = qedn_setup_ctrl,

> >       .release_ctrl = qedn_release_ctrl,

> > @@ -642,8 +650,24 @@ static inline int qedn_core_probe(struct qedn_ctx *qedn)

> >       return rc;

> >  }

> >

> > +static void qedn_call_destroy_free_tasks(struct qedn_fp_queue *fp_q,

> > +                                      struct qedn_io_resources *io_resrc)

> > +{

> > +     if (list_empty(&io_resrc->task_free_list))

> > +             return;

> > +

> > +     if (io_resrc->num_alloc_tasks != io_resrc->num_free_tasks)

> > +             pr_err("Task Pool:Not all returned allocated=0x%x, free=0x%x\n",

> > +                    io_resrc->num_alloc_tasks, io_resrc->num_free_tasks);

> > +

> > +     qedn_destroy_free_tasks(fp_q, io_resrc);

> > +     if (io_resrc->num_free_tasks)

> > +             pr_err("Expected num_free_tasks to be 0\n");

> > +}

> > +

> >  static void qedn_free_function_queues(struct qedn_ctx *qedn)

> >  {

> > +     struct qedn_io_resources *host_resrc;

> >       struct qed_sb_info *sb_info = NULL;

> >       struct qedn_fp_queue *fp_q;

> >       int i;

> > @@ -655,6 +679,9 @@ static void qedn_free_function_queues(struct qedn_ctx *qedn)

> >       /* Free the fast path queues*/

> >       for (i = 0; i < qedn->num_fw_cqs; i++) {

> >               fp_q = &qedn->fp_q_arr[i];

> > +             host_resrc = &fp_q->host_resrc;

> > +

> > +             qedn_call_destroy_free_tasks(fp_q, host_resrc);

> >

> >               /* Free SB */

> >               sb_info = fp_q->sb_info;

> > @@ -742,7 +769,8 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

> >               goto mem_alloc_failure;

> >       }

> >

> > -     /* placeholder - create task pools */

> > +     qedn->num_tasks_per_pool =

> > +             qedn->pf_params.nvmetcp_pf_params.num_tasks / qedn->num_fw_cqs;

> >

> >       for (i = 0; i < qedn->num_fw_cqs; i++) {

> >               fp_q = &qedn->fp_q_arr[i];

> > @@ -784,7 +812,7 @@ static int qedn_alloc_function_queues(struct qedn_ctx *qedn)

> >               fp_q->qedn = qedn;

> >               INIT_WORK(&fp_q->fw_cq_fp_wq_entry, qedn_fw_cq_fq_wq_handler);

> >

> > -             /* Placeholder - Init IO-path resources */

> > +             qedn_init_io_resc(&fp_q->host_resrc);

> >       }

> >

> >       return 0;

> > @@ -966,7 +994,7 @@ static int __qedn_probe(struct pci_dev *pdev)

> >

> >       /* NVMeTCP start HW PF */

> >       rc = qed_ops->start(qedn->cdev,

> > -                         NULL /* Placeholder for FW IO-path resources */,

> > +                         &qedn->tasks,

> >                           qedn,

> >                           qedn_event_cb);

> >       if (rc) {

> > diff --git a/drivers/nvme/hw/qedn/qedn_task.c b/drivers/nvme/hw/qedn/qedn_task.c

> > index ea6745b94817..35cb5e8e4e61 100644

> > --- a/drivers/nvme/hw/qedn/qedn_task.c

> > +++ b/drivers/nvme/hw/qedn/qedn_task.c

> > @@ -11,6 +11,198 @@

> >  /* Driver includes */

> >  #include "qedn.h"

> >

> > +static void qedn_free_nvme_sg(struct qedn_task_ctx *qedn_task)

> > +{

> > +     kfree(qedn_task->nvme_sg);

> > +     qedn_task->nvme_sg = NULL;

> > +}

> > +

> > +static void qedn_free_fw_sgl(struct qedn_task_ctx *qedn_task)

> > +{

> > +     struct qedn_ctx *qedn = qedn_task->qedn;

> > +     dma_addr_t sgl_pa;

> > +

> > +     sgl_pa = HILO_DMA_REGPAIR(qedn_task->sgl_task_params.sgl_phys_addr);

> > +     dma_free_coherent(&qedn->pdev->dev,

> > +                       QEDN_MAX_FW_SGL_SIZE,

> > +                       qedn_task->sgl_task_params.sgl,

> > +                       sgl_pa);

> > +     qedn_task->sgl_task_params.sgl = NULL;

> > +}

> > +

> > +static void qedn_destroy_single_task(struct qedn_task_ctx *qedn_task)

> > +{

> > +     u16 itid;

> > +

> > +     itid = qedn_task->itid;

> > +     list_del(&qedn_task->entry);

> > +     qedn_free_nvme_sg(qedn_task);

> > +     qedn_free_fw_sgl(qedn_task);

> > +     kfree(qedn_task);

> > +     qedn_task = NULL;

> > +}

> > +

> > +void qedn_destroy_free_tasks(struct qedn_fp_queue *fp_q,

> > +                          struct qedn_io_resources *io_resrc)

> > +{

> > +     struct qedn_task_ctx *qedn_task, *task_tmp;

> > +

> > +     /* Destroy tasks from the free task list */

> > +     list_for_each_entry_safe(qedn_task, task_tmp,

> > +                              &io_resrc->task_free_list, entry) {

> > +             qedn_destroy_single_task(qedn_task);

> > +             io_resrc->num_free_tasks -= 1;

> > +     }

> > +}

> > +

> > +static int qedn_alloc_nvme_sg(struct qedn_task_ctx *qedn_task)

> > +{

> > +     int rc;

> > +

> > +     qedn_task->nvme_sg = kcalloc(QEDN_MAX_SGES_PER_TASK,

> > +                                  sizeof(*qedn_task->nvme_sg), GFP_KERNEL);

> > +     if (!qedn_task->nvme_sg) {

> > +             rc = -ENOMEM;

> > +

> > +             return rc;

> > +     }

> > +

> > +     return 0;

> > +}

> > +

> > +static int qedn_alloc_fw_sgl(struct qedn_task_ctx *qedn_task)

> > +{

> > +     struct qedn_ctx *qedn = qedn_task->qedn_conn->qedn;

> > +     dma_addr_t fw_sgl_phys;

> > +

> > +     qedn_task->sgl_task_params.sgl =

> > +             dma_alloc_coherent(&qedn->pdev->dev, QEDN_MAX_FW_SGL_SIZE,

> > +                                &fw_sgl_phys, GFP_KERNEL);

> > +     if (!qedn_task->sgl_task_params.sgl) {

> > +             pr_err("Couldn't allocate FW sgl\n");

> > +

> > +             return -ENOMEM;

> > +     }

> > +

> > +     DMA_REGPAIR_LE(qedn_task->sgl_task_params.sgl_phys_addr, fw_sgl_phys);

> > +

> > +     return 0;

> > +}

> > +

> > +static inline void *qedn_get_fw_task(struct qed_nvmetcp_tid *info, u16 itid)

> > +{

> > +     return (void *)(info->blocks[itid / info->num_tids_per_block] +

> > +                     (itid % info->num_tids_per_block) * info->size);

> > +}

> > +

> > +static struct qedn_task_ctx *qedn_alloc_task(struct qedn_conn_ctx *conn_ctx, u16 itid)

> > +{

> > +     struct qedn_ctx *qedn = conn_ctx->qedn;

> > +     struct qedn_task_ctx *qedn_task;

> > +     void *fw_task_ctx;

> > +     int rc = 0;

> > +

> > +     qedn_task = kzalloc(sizeof(*qedn_task), GFP_KERNEL);

> > +     if (!qedn_task)

> > +             return NULL;

> > +

> > +     spin_lock_init(&qedn_task->lock);

> > +     fw_task_ctx = qedn_get_fw_task(&qedn->tasks, itid);

> > +     if (!fw_task_ctx) {

> > +             pr_err("iTID: 0x%x; Failed getting fw_task_ctx memory\n", itid);

> > +             goto release_task;

> > +     }

> > +

> > +     /* No need to memset fw_task_ctx - it's done in the HSI func */

> > +     qedn_task->qedn_conn = conn_ctx;

> > +     qedn_task->qedn = qedn;

> > +     qedn_task->fw_task_ctx = fw_task_ctx;

> > +     qedn_task->valid = 0;

> > +     qedn_task->flags = 0;

> > +     qedn_task->itid = itid;

> > +     rc = qedn_alloc_fw_sgl(qedn_task);

> > +     if (rc) {

> > +             pr_err("iTID: 0x%x; Failed allocating FW sgl\n", itid);

> > +             goto release_task;

> > +     }

> > +

> > +     rc = qedn_alloc_nvme_sg(qedn_task);

> > +     if (rc) {

> > +             pr_err("iTID: 0x%x; Failed allocating NVMe sg\n", itid);

> > +             goto release_fw_sgl;

> > +     }

> > +

> > +     return qedn_task;

> > +

> > +release_fw_sgl:

> > +     qedn_free_fw_sgl(qedn_task);

> > +release_task:

> > +     kfree(qedn_task);

> > +

> > +     return NULL;

> > +}

> > +

> > +int qedn_alloc_tasks(struct qedn_conn_ctx *conn_ctx)

> > +{

> > +     struct qedn_ctx *qedn = conn_ctx->qedn;

> > +     struct qedn_task_ctx *qedn_task = NULL;

> > +     struct qedn_io_resources *io_resrc;

> > +     u16 itid, start_itid, offset;

> > +     struct qedn_fp_queue *fp_q;

> > +     int i, rc;

> > +

> > +     fp_q = conn_ctx->fp_q;

> > +

> > +     offset = fp_q->sb_id;

> > +     io_resrc = &fp_q->host_resrc;

> > +

> > +     start_itid = qedn->num_tasks_per_pool * offset;

> > +     for (i = 0; i < qedn->num_tasks_per_pool; ++i) {

> > +             itid = start_itid + i;

> > +             qedn_task = qedn_alloc_task(conn_ctx, itid);

> > +             if (!qedn_task) {

> > +                     pr_err("Failed allocating task\n");

> > +                     rc = -ENOMEM;

> > +                     goto release_tasks;

> > +             }

> > +

> > +             qedn_task->fp_q = fp_q;

> > +             io_resrc->num_free_tasks += 1;

> > +             list_add_tail(&qedn_task->entry, &io_resrc->task_free_list);

> > +     }

> > +

> > +     io_resrc->num_alloc_tasks = io_resrc->num_free_tasks;

> > +

> > +     return 0;

> > +

> > +release_tasks:

> > +     qedn_destroy_free_tasks(fp_q, io_resrc);

> > +

> > +     return rc;

> > +}

> > +

>

> Well ... this is less than optimal.

> In effect you are splitting the available hardware tasks between pools.

> And the way I see it you allocate one pool per connection.

> Is that correct?


No, we have a pool per CPU core (per fp_q) and not per connection.
All the connections assigned to this core will use the same pool.
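
As a rough illustration of that mapping (not part of the patch), the index
arithmetic from qedn_prep_and_offload_queue() quoted above can be exercised in
a small standalone C sketch. The helper name pool_index_for_qid is
hypothetical, and the userspace types stand in for the kernel's u8. It only
assumes that qid 0 is the admin queue, as the "Offset adminq" comment in the
patch suggests.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the quoted arithmetic:
 *   default_cq_idx = qid ? qid - 1 : 0;
 *   default_cq     = default_cq_idx % num_fw_cqs;
 * The returned value is the index of the fp_q (and therefore of the
 * task pool) that a connection with this qid would use.
 */
static uint8_t pool_index_for_qid(uint8_t qid, uint8_t num_fw_cqs)
{
	uint8_t default_cq_idx;

	assert(num_fw_cqs > 0);

	/* qid 0 is assumed to be the admin queue; IO queues start at 1 */
	default_cq_idx = qid ? (uint8_t)(qid - 1) : 0;

	return (uint8_t)(default_cq_idx % num_fw_cqs);
}
```

With four fp_q instances, for example, qid 1 and qid 5 both map to pool 0, so
those two connections would draw tasks from the same pool.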

>

> So what about the scaling here?

> How many hardware tasks do you have in total?


We can scale up to 64K pending tasks.
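
The per-pool share follows from the division in qedn_alloc_function_queues()
and the start_itid computation in qedn_alloc_tasks(). Below is a minimal
sketch of that arithmetic, assuming a hypothetical helper pool_itid_range and
using 32-bit values for clarity (the patch stores iTIDs in u16). The task
counts in the example afterwards are made-up numbers, not device values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the iTID slice owned by one pool. */
struct itid_range {
	uint32_t first; /* first iTID in the pool */
	uint32_t count; /* number of iTIDs in the pool */
};

/*
 * Mirrors the quoted computations:
 *   num_tasks_per_pool = num_tasks / num_fw_cqs;
 *   start_itid         = num_tasks_per_pool * sb_id;
 */
static struct itid_range pool_itid_range(uint32_t num_tasks,
					 uint8_t num_fw_cqs,
					 uint16_t sb_id)
{
	struct itid_range r;
	uint32_t per_pool;

	assert(num_fw_cqs > 0);

	per_pool = num_tasks / num_fw_cqs; /* integer division */
	r.first = per_pool * sb_id;
	r.count = per_pool;

	return r;
}
```

For instance, 4096 tasks split across 8 pools gives 512 tasks per pool, and
the pool with sb_id 3 covers iTIDs 1536 through 2047. Because the division is
integral, any remainder tasks are left unassigned in this sketch.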

> And what happens if you add more and more connections?


In this series, if the per-core pool has no free task resource, the IO
fails. In the next version (not included in this series), the IO will be
queued until a task resource becomes free.
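
The body of qedn_get_free_task_from_pool() is not quoted in this thread, so
the following is only a hypothetical userspace sketch of the "fail the IO when
the pool is empty" behaviour. The names task_pool, pool_get and pool_put are
invented for illustration. It uses a singly linked free list and no locking,
whereas the driver uses a list_head protected by resources_lock, and it
assumes that no_avail_resrc_cnt counts failed attempts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4 /* hypothetical; tiny so exhaustion is easy to see */

struct task {
	uint16_t itid;
	struct task *next; /* link in the free list */
};

struct task_pool {
	struct task tasks[POOL_SIZE];
	struct task *free_head;
	uint32_t num_alloc_tasks;
	uint32_t num_free_tasks;
	uint32_t no_avail_resrc_cnt; /* assumed: counts failed gets */
};

/* Put every task of the pool on the free list. */
static void pool_init(struct task_pool *p, uint16_t start_itid)
{
	size_t i;

	p->free_head = NULL;
	p->num_free_tasks = 0;
	p->no_avail_resrc_cnt = 0;

	for (i = 0; i < POOL_SIZE; i++) {
		p->tasks[i].itid = (uint16_t)(start_itid + i);
		p->tasks[i].next = p->free_head;
		p->free_head = &p->tasks[i];
		p->num_free_tasks++;
	}
	p->num_alloc_tasks = p->num_free_tasks;
}

/* Returns NULL when the pool is empty; the caller then fails the IO. */
static struct task *pool_get(struct task_pool *p)
{
	struct task *t = p->free_head;

	if (!t) {
		p->no_avail_resrc_cnt++;
		return NULL;
	}
	p->free_head = t->next;
	t->next = NULL;
	p->num_free_tasks--;

	return t;
}

/* Return a completed task to the free list. */
static void pool_put(struct task_pool *p, struct task *t)
{
	assert(t != NULL);
	t->next = p->free_head;
	p->free_head = t;
	p->num_free_tasks++;
}
```

A queueing variant, as planned for the next version, would park the request
when pool_get() returns NULL and retry it from pool_put() instead of failing.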

>

> Cheers,

>

> Hannes

> --

> Dr. Hannes Reinecke                     Kernel Storage Architect

> hare@suse.de                                   +49 911 74053 688

> SUSE Software Solutions Germany GmbH, 90409 Nürnberg

> GF: F. Imendörffer, HRB 36809 (AG Nürnberg)
