mbox series

[v9,00/13] blkcg:Support to track FC storage blk io traffic

Message ID 1617750397-26466-1-git-send-email-muneendra.kumar@broadcom.com
Headers show
Series blkcg:Support to track FC storage blk io traffic | expand

Message

Muneendra Kumar April 6, 2021, 11:06 p.m. UTC
This Patch added a unique application identifier i.e
app_id  knob to  blkcg which allows identification of traffic
sources at an individual cgroup based Applications
(ex:virtual machine (VM))level in both host and
fabric infrastructure.

Added a new sysfs attribute appid_store to set the application identfier
in  the blkcg associted with cgroup id under
/sys/class/fc/fc_udev_device/*

With this new interface the user can set the application identfier
in  the blkcg associted with cgroup id.

This capability can be utilized by multiple block transport infrastructure
like fc,iscsi,roce.

Existing FC fabric will use this feature and the description of
the use case is below.

Various virtualization technologies used in Fibre Channel
SAN deployments have created the opportunity to identify
and associate traffic with specific virtualized applications.
The concepts behind the T11 Application Services standard is
to provide the general mechanisms needed to identify
virtualized services.
It enables the Fabric and the storage targets to
identify, monitor, and handle FC traffic
based on vm tags by inserting application specific identification
into the FC frame.

The patches were cut against  5.13/scsi-queue tree

v9:
Addressed the issues reported by kernel test robot
Replaced lpfc_get_vmid_from_hashtable with the
generic kernel based hashtable (include/linux/hashtable.h)
and made the changes in the code accordingly.
Addressed the locking issue and also merged few patches

v8:
Modified the structure member,log messages and function declarations
Added proper error codes and return values

v7:
Modified the Kconfig comments

v6:
Addressed the issues reported by kernel test robot
Modified the Kconfig files as per standard

v5:
Renamed the function cgroup_get_from_kernfs_id to
cgroup_get_from_id.

Moved the input validation at the beginning of the function in 
Renamed the arguments appropriatley.

Changed Return code to non-numeric/SymbolChanged Return code
to non-numeric/Symbol

Modified the comments.

v4:
Addressed the error reported by  kernel test robot

v3:
removed RFC.

Renamed the functions and app_id to more specific
Addressed the reference leaks in blkcg_set_app_identifier
Added a new config BLK_CGROUP_FC_APPID and made changes to 
select the same under SCSI_FC_ATTRS

V2:
renamed app_identifier to app_id.
removed the  sysfs interface blkio.app_identifie under
/sys/fs/cgroup/blkio
Ported the patch on top of 5.10/scsi-queue.
Removed redundant code due to changes since last submit.
Added a fix for issuing QFPA command.



Gaurav Srivastava (10):
  lpfc: vmid: Add the datastructure for supporting VMID in lpfc
  lpfc: vmid: VMID params initialization
  lpfc: vmid: Add support for vmid in mailbox command, does vmid
    resource allocation and vmid cleanup
  lpfc: vmid: Implements ELS commands for appid patch
  lpfc: vmid: Functions to manage vmids
  lpfc: vmid: Implements CT commands for appid.
  lpfc: vmid: Appends the vmid in the wqe before sending
  lpfc: vmid: Timeout implementation for vmid
  lpfc: vmid: Adding qfpa and vmid timeout check in worker thread
  lpfc: vmid: Introducing vmid in io path.

Muneendra (3):
  cgroup: Added cgroup_get_from_id
  blkcg: Added a app identifier support for blkcg
  nvme: Added a newsysfs attribute appid_store

 block/Kconfig                    |   9 +
 drivers/nvme/host/fc.c           |  73 +++++-
 drivers/scsi/Kconfig             |  13 ++
 drivers/scsi/lpfc/lpfc.h         | 122 +++++++++++
 drivers/scsi/lpfc/lpfc_attr.c    |  48 ++++
 drivers/scsi/lpfc/lpfc_crtn.h    |  11 +
 drivers/scsi/lpfc/lpfc_ct.c      | 257 +++++++++++++++++++++-
 drivers/scsi/lpfc/lpfc_disc.h    |   1 +
 drivers/scsi/lpfc/lpfc_els.c     | 366 ++++++++++++++++++++++++++++++-
 drivers/scsi/lpfc/lpfc_hbadisc.c | 147 +++++++++++++
 drivers/scsi/lpfc/lpfc_hw.h      | 124 ++++++++++-
 drivers/scsi/lpfc/lpfc_hw4.h     |  12 +
 drivers/scsi/lpfc/lpfc_init.c    | 104 +++++++++
 drivers/scsi/lpfc/lpfc_mbox.c    |   6 +
 drivers/scsi/lpfc/lpfc_scsi.c    | 316 ++++++++++++++++++++++++++
 drivers/scsi/lpfc/lpfc_sli.c     |  63 ++++++
 drivers/scsi/lpfc/lpfc_sli.h     |   8 +
 include/linux/blk-cgroup.h       |  56 +++++
 include/linux/cgroup.h           |   6 +
 kernel/cgroup/cgroup.c           |  26 +++
 20 files changed, 1758 insertions(+), 10 deletions(-)

Comments

Hannes Reinecke April 8, 2021, 8:26 a.m. UTC | #1
On 4/7/21 1:06 AM, Muneendra wrote:
> Added a new function cgroup_get_from_id  to retrieve the cgroup

> associated with cgroup id.

> Exported the same as this can be used by blk-cgorup.c

> 

> Added function declaration of cgroup_get_from_id in cgorup.h

> 

> This patch also exported the function cgroup_get_e_css

> as this is getting used in blk-cgroup.h

> 

> Reported-by: kernel test robot <lkp@intel.com>

> Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>

> 

> ---

> v9:

> Addressed the issues reported by kernel test robot

> 

> v8:

> No change

> 

> v7:

> No change

> 

> v6:

> No change

> 

> v5:

> renamed the function cgroup_get_from_kernfs_id to

> cgroup_get_from_id

> 

> v4:

> No change

> 

> v3:

> Exported the cgroup_get_e_css

> 

> v2:

> New patch

> ---

>   include/linux/cgroup.h |  6 ++++++

>   kernel/cgroup/cgroup.c | 26 ++++++++++++++++++++++++++

>   2 files changed, 32 insertions(+)

> Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke April 8, 2021, 8:29 a.m. UTC | #2
On 4/7/21 1:06 AM, Muneendra wrote:
> From: Gaurav Srivastava <gaurav.srivastava@broadcom.com>

> 

> This patch initializes the VMID parameters like the type of vmid, max

> number of vmids supported and timeout value for the vmid registration

> based on the user input.

> 

> Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>

> Signed-off-by: James Smart <jsmart2021@gmail.com>

> 

> ---

> v9:

> updated comments

> 

> v8:

> No change

> 

> v7:

> No change

> 

> v6:

> No change

> 

> v5:

> No change

> 

> v4:

> No change

> 

> v3:

> No change

> 

> v2:

> Ported the patch on top of 5.10/scsi-queue

> ---

>   drivers/scsi/lpfc/lpfc_attr.c | 48 +++++++++++++++++++++++++++++++++++

>   1 file changed, 48 insertions(+)

> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke April 8, 2021, 8:34 a.m. UTC | #3
On 4/7/21 1:06 AM, Muneendra wrote:
> From: Gaurav Srivastava <gaurav.srivastava@broadcom.com>

> 

> This patch implements ELS command like QFPA and UVEM for the priority

> tagging appid support. Other supporting functions are also part of this

> patch.

> 

> Signed-off-by: Gaurav Srivastava  <gaurav.srivastava@broadcom.com>

> Signed-off-by: James Smart <jsmart2021@gmail.com>

> 

> ---

> v9:

> Added a lock while accessing a flag

> 

> v8:

> Added log messages modifications, memory allocation API changes,

> return error codes

> 

> v7:

> No change

> 

> v6:

> Added Forward declarations, static functions and

> removed unused variables

> 

> v5:

> Changed Return code to non-numeric/Symbol.

> Addressed the review comments by Hannes

> 

> v4:

> No change

> 

> v3:

> No change

> 

> v2:

> Ported the patch on top of 5.10/scsi-queue

> ---

>   drivers/scsi/lpfc/lpfc_els.c | 366 ++++++++++++++++++++++++++++++++++-

>   1 file changed, 362 insertions(+), 4 deletions(-)

> 

> diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c

> index a04546eca18f..22a87559f62d 100644

> --- a/drivers/scsi/lpfc/lpfc_els.c

> +++ b/drivers/scsi/lpfc/lpfc_els.c

> @@ -25,6 +25,7 @@

>   #include <linux/pci.h>

>   #include <linux/slab.h>

>   #include <linux/interrupt.h>

> +#include <linux/delay.h>

>   

>   #include <scsi/scsi.h>

>   #include <scsi/scsi_device.h>

> @@ -55,9 +56,15 @@ static int lpfc_issue_els_fdisc(struct lpfc_vport *vport,

>   				struct lpfc_nodelist *ndlp, uint8_t retry);

>   static int lpfc_issue_fabric_iocb(struct lpfc_hba *phba,

>   				  struct lpfc_iocbq *iocb);

> +static void lpfc_cmpl_els_uvem(struct lpfc_hba *, struct lpfc_iocbq *,

> +			       struct lpfc_iocbq *);

>   

>   static int lpfc_max_els_tries = 3;

>   

> +static void lpfc_init_cs_ctl_bitmap(struct lpfc_vport *vport);

> +static void lpfc_vmid_set_cs_ctl_range(struct lpfc_vport *vport, u32 min, u32 max);

> +static void lpfc_vmid_put_cs_ctl(struct lpfc_vport *vport, u32 ctcl_vmid);

> +

>   /**

>    * lpfc_els_chk_latt - Check host link attention event for a vport

>    * @vport: pointer to a host virtual N_Port data structure.

> @@ -314,10 +321,10 @@ lpfc_prep_els_iocb(struct lpfc_vport *vport, uint8_t expectRsp,

>   		lpfc_printf_vlog(vport, KERN_INFO, LOG_ELS,

>   				 "0116 Xmit ELS command x%x to remote "

>   				 "NPORT x%x I/O tag: x%x, port state:x%x "

> -				 "rpi x%x fc_flag:x%x\n",

> +				 "rpi x%x fc_flag:x%x nlp_flag:x%x vport:x%p\n",

>   				 elscmd, did, elsiocb->iotag,

>   				 vport->port_state, ndlp->nlp_rpi,

> -				 vport->fc_flag);

> +				 vport->fc_flag, ndlp->nlp_flag, vport);

>   	} else {

>   		/* Xmit ELS response <elsCmd> to remote NPORT <did> */

>   		lpfc_printf_vlog(vport, KERN_INFO, LOG_ELS,

> @@ -1112,11 +1119,15 @@ lpfc_cmpl_els_flogi(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,

>   	/* FLOGI completes successfully */

>   	lpfc_printf_vlog(vport, KERN_INFO, LOG_ELS,

>   			 "0101 FLOGI completes successfully, I/O tag:x%x, "

> -			 "xri x%x Data: x%x x%x x%x x%x x%x %x\n",

> +			 "xri x%x Data: x%x x%x x%x x%x x%x x%x x%x\n",

>   			 cmdiocb->iotag, cmdiocb->sli4_xritag,

>   			 irsp->un.ulpWord[4], sp->cmn.e_d_tov,

>   			 sp->cmn.w2.r_a_tov, sp->cmn.edtovResolution,

> -			 vport->port_state, vport->fc_flag);

> +			 vport->port_state, vport->fc_flag,

> +			 sp->cmn.priority_tagging);

> +

> +	if (sp->cmn.priority_tagging)

> +		vport->vmid_flag |= LPFC_VMID_ISSUE_QFPA;

>   

>   	if (vport->port_state == LPFC_FLOGI) {

>   		/*

> @@ -1299,6 +1310,18 @@ lpfc_issue_els_flogi(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,

>   	if (sp->cmn.fcphHigh < FC_PH3)

>   		sp->cmn.fcphHigh = FC_PH3;

>   

> +	/* to deterine if switch supports priority tagging */


determine (sp) ...

> +	if (phba->cfg_vmid_priority_tagging) {

> +		sp->cmn.priority_tagging = 1;

> +		/* lpfc_vmid_host_uuid is combination of wwpn and wwnn */

> +		if (uuid_is_null((uuid_t *)vport->lpfc_vmid_host_uuid)) {

> +			memcpy(vport->lpfc_vmid_host_uuid, phba->wwpn,

> +			       sizeof(phba->wwpn));

> +			memcpy(&vport->lpfc_vmid_host_uuid[8], phba->wwnn,

> +			       sizeof(phba->wwnn));

> +		}

> +	}

> +

>   	if  (phba->sli_rev == LPFC_SLI_REV4) {

>   		if (bf_get(lpfc_sli_intf_if_type, &phba->sli4_hba.sli_intf) ==

>   		    LPFC_SLI_INTF_IF_TYPE_0) {

> @@ -1907,6 +1930,7 @@ lpfc_cmpl_els_plogi(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,

>   	struct lpfc_nodelist *ndlp, *free_ndlp;

>   	struct lpfc_dmabuf *prsp;

>   	int disc;

> +	struct serv_parm *sp = NULL;

>   

>   	/* we pass cmdiocb to state machine which needs rspiocb as well */

>   	cmdiocb->context_un.rsp_iocb = rspiocb;

> @@ -1997,6 +2021,23 @@ lpfc_cmpl_els_plogi(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,

>   				   cmdiocb->context2)->list.next,

>   				  struct lpfc_dmabuf, list);

>   		ndlp = lpfc_plogi_confirm_nport(phba, prsp->virt, ndlp);

> +

> +		sp = (struct serv_parm *)((u8 *)prsp->virt +

> +					  sizeof(u32));

> +

> +		ndlp->vmid_support = 0;

> +		if ((phba->cfg_vmid_app_header && sp->cmn.app_hdr_support) ||

> +		    (phba->cfg_vmid_priority_tagging &&

> +		     sp->cmn.priority_tagging)) {

> +			lpfc_printf_log(phba, KERN_DEBUG, LOG_ELS,

> +					"4018 app_hdr_support %d tagging %d DID x%x\n",

> +					sp->cmn.app_hdr_support,

> +					sp->cmn.priority_tagging,

> +					ndlp->nlp_DID);

> +			/* if the dest port supports VMID, mark it in ndlp */

> +			ndlp->vmid_support = 1;

> +		}

> +

>   		lpfc_disc_state_machine(vport, ndlp, cmdiocb,

>   					NLP_EVT_CMPL_PLOGI);

>   	}

> @@ -2119,6 +2160,14 @@ lpfc_issue_els_plogi(struct lpfc_vport *vport, uint32_t did, uint8_t retry)

>   	memset(sp->un.vendorVersion, 0, sizeof(sp->un.vendorVersion));

>   	sp->cmn.bbRcvSizeMsb &= 0xF;

>   

> +	/* check if the destination port supports VMID */

> +	ndlp->vmid_support = 0;

> +	if (vport->vmid_priority_tagging)

> +		sp->cmn.priority_tagging = 1;

> +	else if (phba->cfg_vmid_app_header &&

> +		 bf_get(lpfc_ftr_ashdr, &phba->sli4_hba.sli4_flags))

> +		sp->cmn.app_hdr_support = 1;

> +

>   	lpfc_debugfs_disc_trc(vport, LPFC_DISC_TRC_ELS_CMD,

>   		"Issue PLOGI:     did:x%x",

>   		did, 0, 0);

> @@ -10260,3 +10309,312 @@ lpfc_sli_abts_recover_port(struct lpfc_vport *vport,

>   	lpfc_unreg_rpi(vport, ndlp);

>   }

>   

> +static void lpfc_init_cs_ctl_bitmap(struct lpfc_vport *vport)

> +{

> +	bitmap_zero(vport->vmid_priority_range, LPFC_VMID_MAX_PRIORITY_RANGE);

> +}

> +

> +static void

> +lpfc_vmid_set_cs_ctl_range(struct lpfc_vport *vport, u32 min, u32 max)

> +{

> +	u32 i;

> +

> +	if ((min > max) || (max > LPFC_VMID_MAX_PRIORITY_RANGE))

> +		return;

> +

> +	for (i = min; i <= max; i++)

> +		set_bit(i, vport->vmid_priority_range);

> +}

> +

> +static void lpfc_vmid_put_cs_ctl(struct lpfc_vport *vport, u32 ctcl_vmid)

> +{

> +	set_bit(ctcl_vmid, vport->vmid_priority_range);

> +}

> +

> +u32 lpfc_vmid_get_cs_ctl(struct lpfc_vport *vport)

> +{

> +	u32 i;

> +

> +	i = find_first_bit(vport->vmid_priority_range,

> +			   LPFC_VMID_MAX_PRIORITY_RANGE);

> +

> +	if (i == LPFC_VMID_MAX_PRIORITY_RANGE)

> +		return 0;

> +

> +	clear_bit(i, vport->vmid_priority_range);

> +	return i;

> +}

> +

> +#define MAX_PRIORITY_DESC	255

> +

> +static void

> +lpfc_cmpl_els_qfpa(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,

> +		   struct lpfc_iocbq *rspiocb)

> +{

> +	struct lpfc_vport *vport = cmdiocb->vport;

> +	struct priority_range_desc *desc;

> +	struct lpfc_dmabuf *prsp = NULL;

> +	struct lpfc_vmid_priority_range *vmid_range = NULL;

> +	u32 *data;

> +	struct lpfc_dmabuf *dmabuf = cmdiocb->context2;

> +	IOCB_t *irsp = &rspiocb->iocb;

> +	u8 *pcmd, max_desc;

> +	u32 len, i;

> +	struct lpfc_nodelist *ndlp = (struct lpfc_nodelist *)cmdiocb->context1;

> +

> +	prsp = list_get_first(&dmabuf->list, struct lpfc_dmabuf, list);

> +	if (!prsp)

> +		goto out;

> +

> +	pcmd = prsp->virt;

> +	data = (u32 *)pcmd;

> +	if (data[0] == ELS_CMD_LS_RJT) {

> +		lpfc_printf_vlog(vport, KERN_WARNING, LOG_SLI,

> +				 "3277 QFPA LS_RJT x%x  x%x\n",

> +				 data[0], data[1]);

> +		goto out;

> +	}

> +	if (irsp->ulpStatus) {

> +		lpfc_printf_vlog(vport, KERN_ERR, LOG_SLI,

> +				 "6529 QFPA failed with status x%x  x%x\n",

> +				 irsp->ulpStatus, irsp->un.ulpWord[4]);

> +		goto out;

> +	}

> +

> +	if (!vport->qfpa_res) {

> +		max_desc = FCELSSIZE / sizeof(*vport->qfpa_res);

> +		vport->qfpa_res = kcalloc(max_desc, sizeof(*vport->qfpa_res),

> +					  GFP_KERNEL);

> +		if (!vport->qfpa_res)

> +			goto out;

> +	}

> +

> +	len = *((u32 *)(pcmd + 4));

> +	len = be32_to_cpu(len);

> +	memcpy(vport->qfpa_res, pcmd, len + 8);

> +	len = len / LPFC_PRIORITY_RANGE_DESC_SIZE;

> +

> +	desc = (struct priority_range_desc *)(pcmd + 8);

> +	vmid_range = vport->vmid_priority.vmid_range;

> +	if (!vmid_range) {

> +		vmid_range = kcalloc(MAX_PRIORITY_DESC, sizeof(*vmid_range),

> +				     GFP_KERNEL);

> +		if (!vmid_range) {

> +			kfree(vport->qfpa_res);

> +			goto out;

> +		}

> +		vport->vmid_priority.vmid_range = vmid_range;

> +	}

> +	vport->vmid_priority.num_descriptors = len;

> +

> +	for (i = 0; i < len; i++, vmid_range++, desc++) {

> +		lpfc_printf_vlog(vport, KERN_DEBUG, LOG_ELS,

> +				 "6539 vmid values low=%d, high=%d, qos=%d, "

> +				 "local ve id=%d\n", desc->lo_range,

> +				 desc->hi_range, desc->qos_priority,

> +				 desc->local_ve_id);

> +

> +		vmid_range->low = desc->lo_range << 1;

> +		if (desc->local_ve_id == QFPA_ODD_ONLY)

> +			vmid_range->low++;

> +		if (desc->qos_priority)

> +			vport->vmid_flag |= LPFC_VMID_QOS_ENABLED;

> +		vmid_range->qos = desc->qos_priority;

> +

> +		vmid_range->high = desc->hi_range << 1;

> +		if ((desc->local_ve_id == QFPA_ODD_ONLY) ||

> +		    (desc->local_ve_id == QFPA_EVEN_ODD))

> +			vmid_range->high++;

> +	}

> +	lpfc_init_cs_ctl_bitmap(vport);

> +	for (i = 0; i < vport->vmid_priority.num_descriptors; i++) {

> +		lpfc_vmid_set_cs_ctl_range(vport,

> +				vport->vmid_priority.vmid_range[i].low,

> +				vport->vmid_priority.vmid_range[i].high);

> +	}

> +

> +	vport->vmid_flag |= LPFC_VMID_QFPA_CMPL;

> + out:

> +	lpfc_els_free_iocb(phba, cmdiocb);

> +	lpfc_nlp_put(ndlp);

> +}

> +

> +int lpfc_issue_els_qfpa(struct lpfc_vport *vport)

> +{

> +	struct lpfc_hba *phba = vport->phba;

> +	struct lpfc_nodelist *ndlp;

> +	struct lpfc_iocbq *elsiocb;

> +	u8 *pcmd;

> +	int ret;

> +

> +	ndlp = lpfc_findnode_did(phba->pport, Fabric_DID);

> +	if (!ndlp || ndlp->nlp_state != NLP_STE_UNMAPPED_NODE)

> +		return -ENXIO;

> +

> +	elsiocb = lpfc_prep_els_iocb(vport, 1, LPFC_QFPA_SIZE, 2, ndlp,

> +				     ndlp->nlp_DID, ELS_CMD_QFPA);

> +	if (!elsiocb)

> +		return -ENOMEM;

> +

> +	pcmd = (u8 *)(((struct lpfc_dmabuf *)elsiocb->context2)->virt);

> +

> +	*((u32 *)(pcmd)) = ELS_CMD_QFPA;

> +	pcmd += 4;

> +

> +	elsiocb->iocb_cmpl = lpfc_cmpl_els_qfpa;

> +

> +	elsiocb->context1 = lpfc_nlp_get(ndlp);

> +	if (!elsiocb->context1) {

> +		lpfc_els_free_iocb(vport->phba, elsiocb);

> +		return -ENXIO;

> +	}

> +

> +	ret = lpfc_sli_issue_iocb(phba, LPFC_ELS_RING, elsiocb, 2);

> +	if (ret != IOCB_SUCCESS) {

> +		lpfc_els_free_iocb(phba, elsiocb);

> +		lpfc_nlp_put(ndlp);

> +		return -EIO;

> +	}

> +	vport->vmid_flag &= ~LPFC_VMID_QOS_ENABLED;

> +	return 0;

> +}

> +

> +int

> +lpfc_vmid_uvem(struct lpfc_vport *vport,

> +	       struct lpfc_vmid *vmid, bool instantiated)

> +{

> +	struct lpfc_vem_id_desc *vem_id_desc;

> +	struct lpfc_nodelist *ndlp;

> +	struct lpfc_iocbq *elsiocb;

> +	struct instantiated_ve_desc *inst_desc;

> +	struct lpfc_vmid_context *vmid_context;

> +	u8 *pcmd;

> +	u32 *len;

> +	int ret = 0;

> +

> +	ndlp = lpfc_findnode_did(vport, Fabric_DID);

> +	if (!ndlp || ndlp->nlp_state != NLP_STE_UNMAPPED_NODE)

> +		return -ENXIO;

> +

> +	vmid_context = kmalloc(sizeof(*vmid_context), GFP_KERNEL);

> +	if (!vmid_context)

> +		return -ENOMEM;

> +	elsiocb = lpfc_prep_els_iocb(vport, 1, LPFC_UVEM_SIZE, 2,

> +				     ndlp, Fabric_DID, ELS_CMD_UVEM);

> +	if (!elsiocb)

> +		goto out;

> +

> +	lpfc_printf_vlog(vport, KERN_DEBUG, LOG_ELS,

> +			 "3427 Host vmid %s %d\n",

> +			 vmid->host_vmid, instantiated);

> +	vmid_context->vmp = vmid;

> +	vmid_context->nlp = ndlp;

> +	vmid_context->instantiated = instantiated;

> +	elsiocb->vmid_tag.vmid_context = vmid_context;

> +	pcmd = (u8 *)(((struct lpfc_dmabuf *)elsiocb->context2)->virt);

> +

> +	if (uuid_is_null((uuid_t *)vport->lpfc_vmid_host_uuid))

> +		memcpy(vport->lpfc_vmid_host_uuid, vmid->host_vmid,

> +		       LPFC_COMPRESS_VMID_SIZE);

> +

> +	*((u32 *)(pcmd)) = ELS_CMD_UVEM;

> +	len = (u32 *)(pcmd + 4);

> +	*len = cpu_to_be32(LPFC_UVEM_SIZE - 8);

> +

> +	vem_id_desc = (struct lpfc_vem_id_desc *)(pcmd + 8);

> +	vem_id_desc->tag = be32_to_cpu(VEM_ID_DESC_TAG);

> +	vem_id_desc->length = be32_to_cpu(LPFC_UVEM_VEM_ID_DESC_SIZE);

> +	memcpy(vem_id_desc->vem_id, vport->lpfc_vmid_host_uuid,

> +	       LPFC_COMPRESS_VMID_SIZE);

> +

> +	inst_desc = (struct instantiated_ve_desc *)(pcmd + 32);

> +	inst_desc->tag = be32_to_cpu(INSTANTIATED_VE_DESC_TAG);

> +	inst_desc->length = be32_to_cpu(LPFC_UVEM_VE_MAP_DESC_SIZE);

> +	memcpy(inst_desc->global_vem_id, vmid->host_vmid,

> +	       LPFC_COMPRESS_VMID_SIZE);

> +

> +	bf_set(lpfc_instantiated_nport_id, inst_desc, vport->fc_myDID);

> +	bf_set(lpfc_instantiated_local_id, inst_desc,

> +	       vmid->un.cs_ctl_vmid);

> +	if (instantiated) {

> +		inst_desc->tag = be32_to_cpu(INSTANTIATED_VE_DESC_TAG);

> +	} else {

> +		inst_desc->tag = be32_to_cpu(DEINSTANTIATED_VE_DESC_TAG);

> +		lpfc_vmid_put_cs_ctl(vport, vmid->un.cs_ctl_vmid);

> +	}

> +	inst_desc->word6 = cpu_to_be32(inst_desc->word6);

> +

> +	elsiocb->iocb_cmpl = lpfc_cmpl_els_uvem;

> +

> +	elsiocb->context1 = lpfc_nlp_get(ndlp);

> +	if (!elsiocb->context1) {

> +		lpfc_els_free_iocb(vport->phba, elsiocb);

> +		goto out;

> +	}

> +

> +	ret = lpfc_sli_issue_iocb(vport->phba, LPFC_ELS_RING, elsiocb, 0);

> +	if (ret != IOCB_SUCCESS) {

> +		lpfc_els_free_iocb(vport->phba, elsiocb);

> +		lpfc_nlp_put(ndlp);

> +		goto out;

> +	}

> +

> +	return 0;

> + out:

> +	kfree(vmid_context);

> +	return -EIO;

> +}

> +

> +static void

> +lpfc_cmpl_els_uvem(struct lpfc_hba *phba, struct lpfc_iocbq *icmdiocb,

> +		   struct lpfc_iocbq *rspiocb)

> +{

> +	struct lpfc_vport *vport = icmdiocb->vport;

> +	struct lpfc_dmabuf *prsp = NULL;

> +	struct lpfc_vmid_context *vmid_context =

> +	    icmdiocb->vmid_tag.vmid_context;

> +	struct lpfc_nodelist *ndlp = icmdiocb->context1;

> +	u8 *pcmd;

> +	u32 *data;

> +	IOCB_t *irsp = &rspiocb->iocb;

> +	struct lpfc_dmabuf *dmabuf = icmdiocb->context2;

> +	struct lpfc_vmid *vmid;

> +

> +	vmid = vmid_context->vmp;

> +	if (!ndlp || ndlp->nlp_state != NLP_STE_UNMAPPED_NODE)

> +		ndlp = NULL;

> +

> +	prsp = list_get_first(&dmabuf->list, struct lpfc_dmabuf, list);

> +	if (!prsp)

> +		goto out;

> +	pcmd = prsp->virt;

> +	data = (u32 *)pcmd;

> +	if (data[0] == ELS_CMD_LS_RJT) {

> +		lpfc_printf_vlog(vport, KERN_WARNING, LOG_SLI,

> +				 "4532 UVEM LS_RJT %x %x\n", data[0], data[1]);

> +		goto out;

> +	}

> +	if (irsp->ulpStatus) {

> +		lpfc_printf_vlog(vport, KERN_WARNING, LOG_SLI,

> +				 "4533 UVEM error status %x: %x\n",

> +				 irsp->ulpStatus, irsp->un.ulpWord[4]);

> +		goto out;

> +	}

> +	spin_lock(&phba->hbalock);

> +	/* Set IN USE flag */

> +	vport->vmid_flag |= LPFC_VMID_IN_USE;

> +	phba->pport->vmid_flag |= LPFC_VMID_IN_USE;

> +	spin_unlock(&phba->hbalock);

> +

> +	if (vmid_context->instantiated) {

> +		write_lock(&vport->vmid_lock);

> +		vmid->flag |= LPFC_VMID_REGISTERED;

> +		vmid->flag &= ~LPFC_VMID_REQ_REGISTER;

> +		write_unlock(&vport->vmid_lock);

> +	}

> +

> + out:

> +	kfree(vmid_context);

> +	lpfc_els_free_iocb(phba, icmdiocb);

> +	lpfc_nlp_put(ndlp);

> +}

> 

Other than that:

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke April 8, 2021, 8:37 a.m. UTC | #4
On 4/7/21 1:06 AM, Muneendra wrote:
> From: Gaurav Srivastava <gaurav.srivastava@broadcom.com>

> 

> The patch implements CT commands for registering and deregistering the

> appid for the application. Also, a small change in decrementing the ndlp

> ref counter has been added.

> 

> Signed-off-by: Gaurav Srivastava  <gaurav.srivastava@broadcom.com>

> Signed-off-by: James Smart <jsmart2021@gmail.com>

> 

> ---

> v9:

> Added Changes related to locking and new hashtable implementation

> 

> v8:

> modified log messages and fixed the refcount

> 

> v7:

> No Change

> 

> v6:

> Added Forward declarations and functions to static

> 

> v5:

> No change

> 

> v4:

> No change

> 

> v3:

> No change

> 

> v2:

> Ported the patch on top of 5.10/scsi-queue

> Removed redundant code due to changes since last submit

> ---

>   drivers/scsi/lpfc/lpfc_ct.c | 257 +++++++++++++++++++++++++++++++++++-

>   1 file changed, 256 insertions(+), 1 deletion(-)

> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke April 8, 2021, 8:38 a.m. UTC | #5
On 4/7/21 1:06 AM, Muneendra wrote:
> From: Gaurav Srivastava <gaurav.srivastava@broadcom.com>

> 

> This patch implements the timeout functionality for the vmid. After the

> set time period of inactivity, the vmid is deregistered from the switch.

> 

> Signed-off-by: Gaurav Srivastava  <gaurav.srivastava@broadcom.com>

> Signed-off-by: James Smart <jsmart2021@gmail.com>

> 

> ---

> v9:

> Modified code for use of hashtable

> 

> v8:

> Fixed the uninitialized variable

> 

> v7:

> No change

> 

> v6:

> Added Forward declarations

> 

> v5:

> No change

> 

> v4:

> No change

> 

> v3:

> No change

> 

> v2:

> Ported the patch on top of 5.10/scsi-queue

> ---

>   drivers/scsi/lpfc/lpfc_hbadisc.c | 104 +++++++++++++++++++++++++++++++

>   drivers/scsi/lpfc/lpfc_init.c    |  40 ++++++++++++

>   2 files changed, 144 insertions(+)

> 

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
Hannes Reinecke April 8, 2021, 8:46 a.m. UTC | #6
On 4/7/21 1:06 AM, Muneendra wrote:
> From: Gaurav Srivastava <gaurav.srivastava@broadcom.com>

> 

> The patch introduces the vmid in the io path. It checks if the vmid is

> enabled and if io belongs to a vm or not and acts accordingly. Other

> supporing APIs are also included in the patch.


supporting (sp)

> 

> Signed-off-by: Gaurav Srivastava  <gaurav.srivastava@broadcom.com>

> Signed-off-by: James Smart <jsmart2021@gmail.com>

> 

> ---

> v9:

> Added changes related to locking and new hashtable implementation

> 

> v8:

> Added proper error codes

> updated logic while handling vmid

> 

> v7:

> No change

> 

> v6:

> No change

> 

> v5:

> No change

> 

> v4:

> No change

> 

> v3:

> Replaced blkcg_get_app_identifier with blkcg_get_fc_appid

> 

> v2:

> Ported the patch on top of 5.10/scsi-queue

> Added a fix for issuing QFPA command which was not included in the

> last submit

> ---

>   drivers/scsi/lpfc/lpfc_scsi.c | 169 ++++++++++++++++++++++++++++++++++

>   1 file changed, 169 insertions(+)

> 

> diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c

> index f81178464436..3267c5858238 100644

> --- a/drivers/scsi/lpfc/lpfc_scsi.c

> +++ b/drivers/scsi/lpfc/lpfc_scsi.c

> @@ -5263,6 +5263,155 @@ static void lpfc_vmid_assign_cs_ctl(struct lpfc_vport *vport,

>   	}

>   }

>   

> +/*

> + * lpfc_vmid_get_appid- get the vmid associated with the uuid

> + * @vport: The virtual port for which this call is being executed.

> + * @uuid: uuid associated with the VE

> + * @cmd: address of scsi cmmd descriptor

> + * @tag: VMID tag

> + * Returns status of the function

> + */

> +static int lpfc_vmid_get_appid(struct lpfc_vport *vport, char *uuid, struct

> +			       scsi_cmnd * cmd, union lpfc_vmid_io_tag *tag)

> +{

> +	struct lpfc_vmid *vmp = NULL;

> +	int hash, len, rc, i;

> +	u8 pending = 0;

> +

> +	/* check if QFPA is complete */

> +	if (lpfc_vmid_is_type_priority_tag(vport) && !(vport->vmid_flag &

> +	      LPFC_VMID_QFPA_CMPL)) {

> +		vport->work_port_events |= WORKER_CHECK_VMID_ISSUE_QFPA;

> +		return -EAGAIN;

> +	}

> +

> +	/* search if the uuid has already been mapped to the vmid */

> +	len = strlen(uuid);

> +	hash = lpfc_vmid_hash_fn(uuid, len);

> +

> +	/* search for the VMID in the table */

> +	read_lock(&vport->vmid_lock);

> +	vmp = lpfc_get_vmid_from_hastable(vport, hash, uuid);

> +

> +	/* if found, check if its already registered  */

> +	if (vmp  && vmp->flag & LPFC_VMID_REGISTERED) {

> +		read_unlock(&vport->vmid_lock);

> +		lpfc_vmid_update_entry(vport, cmd, vmp, tag);

> +		rc = 0;

> +	} else if (vmp && (vmp->flag & LPFC_VMID_REQ_REGISTER ||

> +			   vmp->flag & LPFC_VMID_DE_REGISTER)) {

> +		/* else if register or dereg request has already been sent */

> +		/* Hence vmid tag will not be added for this IO */

> +		read_unlock(&vport->vmid_lock);

> +		rc = -EBUSY;

> +	} else {

> +		/* The vmid was not found in the hashtable. At this point, */

> +		/* drop the read lock first before proceeding further */

> +		read_unlock(&vport->vmid_lock);

> +		/* start the process to obtain one as per the */

> +		/* type of the vmid indicated */

> +		write_lock(&vport->vmid_lock);

> +		vmp = lpfc_get_vmid_from_hastable(vport, hash, uuid);

> +

> +		/* while the read lock was released, in case the entry was */

> +		/* added by other context or is in process of being added */

> +		if (vmp && vmp->flag & LPFC_VMID_REGISTERED) {

> +			lpfc_vmid_update_entry(vport, cmd, vmp, tag);

> +			write_unlock(&vport->vmid_lock);

> +			return 0;

> +		} else if (vmp && vmp->flag & LPFC_VMID_REQ_REGISTER) {

> +			write_unlock(&vport->vmid_lock);

> +			return -EBUSY;

> +		}

> +

> +		/* else search and allocate a free slot in the hash table */

> +		if (vport->cur_vmid_cnt < vport->max_vmid) {

> +			for (i = 0; i < vport->max_vmid; ++i) {

> +				vmp = vport->vmid + i;

> +				if (vmp->flag == LPFC_VMID_SLOT_FREE) {

> +					vmp = vport->vmid + i;

> +					break;

> +				}

> +			}

> +		} else {

> +			write_unlock(&vport->vmid_lock);

> +			return -ENOMEM;

> +		}

> +

> +		if (vmp && (vmp->flag == LPFC_VMID_SLOT_FREE)) {

> +			/* Add the vmid and register  */

> +			lpfc_put_vmid_in_hashtable(vport, hash, vmp);

> +			vmp->vmid_len = len;

> +			memcpy(vmp->host_vmid, uuid, vmp->vmid_len);

> +			vmp->io_rd_cnt = 0;

> +			vmp->io_wr_cnt = 0;

> +			vmp->flag = LPFC_VMID_SLOT_USED;

> +

> +			vmp->delete_inactive =

> +			    vport->vmid_inactivity_timeout ? 1 : 0;

> +

> +			/* if type priority tag, get next available vmid */

> +			if (lpfc_vmid_is_type_priority_tag(vport))

> +				lpfc_vmid_assign_cs_ctl(vport, vmp);

> +

> +			/* allocate the per cpu variable for holding */

> +			/* the last access time stamp only if vmid is enabled */

> +			if (!vmp->last_io_time)

> +				vmp->last_io_time =

> +				    __alloc_percpu(sizeof(u64),

> +						   __alignof__(struct

> +							       lpfc_vmid));

> +

> +			/* registration pending */

> +			pending = 1;

> +		} else {

> +			rc = -ENOMEM;

> +		}

> +		write_unlock(&vport->vmid_lock);

> +

> +		/* complete transaction with switch */

> +		if (pending) {

> +			if (lpfc_vmid_is_type_priority_tag(vport))

> +				rc = lpfc_vmid_uvem(vport, vmp, true);

> +			else

> +				rc = lpfc_vmid_cmd(vport,

> +						   SLI_CTAS_RAPP_IDENT,

> +						   vmp);

> +			if (!rc) {

> +				write_lock(&vport->vmid_lock);

> +				vport->cur_vmid_cnt++;

> +				vmp->flag |= LPFC_VMID_REQ_REGISTER;

> +				write_unlock(&vport->vmid_lock);

> +			}

> +		}

> +

> +		/* finally, enable the idle timer once */

> +		if (!(vport->phba->pport->vmid_flag & LPFC_VMID_TIMER_ENBLD)) {

> +			mod_timer(&vport->phba->inactive_vmid_poll,

> +				  jiffies +

> +				  msecs_to_jiffies(1000 * LPFC_VMID_TIMER));

> +			vport->phba->pport->vmid_flag |= LPFC_VMID_TIMER_ENBLD;

> +		}

> +	}

> +	return rc;

> +}

> +

> +/*

> + * lpfc_is_command_vm_io - get the uuid from blk cgroup

> + * @cmd:Pointer to scsi_cmnd data structure

> + * Returns uuid if present if not null

> + */

> +static char *lpfc_is_command_vm_io(struct scsi_cmnd *cmd)

> +{

> +	char *uuid = NULL;

> +

> +	if (cmd->request) {

> +		if (cmd->request->bio)

> +			uuid = blkcg_get_fc_appid(cmd->request->bio);

> +	}

> +	return uuid;

> +}

> +

>   /**

>    * lpfc_queuecommand - scsi_host_template queuecommand entry point

>    * @shost: kernel scsi host pointer.

> @@ -5288,6 +5437,7 @@ lpfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd)

>   	int err, idx;

>   #ifdef CONFIG_SCSI_LPFC_DEBUG_FS

>   	uint64_t start = 0L;

> +	u8 *uuid = NULL;

>   

>   	if (phba->ktime_on)

>   		start = ktime_get_ns();

> @@ -5415,6 +5565,25 @@ lpfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd)

>   	}

>   

>   

> +	/* check the necessary and sufficient condition to support VMID */

> +	if (lpfc_is_vmid_enabled(phba) &&

> +	    (ndlp->vmid_support ||

> +	     phba->pport->vmid_priority_tagging ==

> +	     LPFC_VMID_PRIO_TAG_ALL_TARGETS)) {

> +		/* is the IO generated by a VM, get the associated virtual */

> +		/* entity id */

> +		uuid = lpfc_is_command_vm_io(cmnd);

> +

> +		if (uuid) {

> +			err = lpfc_vmid_get_appid(vport, uuid, cmnd,

> +				(union lpfc_vmid_io_tag *)

> +					&lpfc_cmd->cur_iocbq.vmid_tag);

> +			if (!err)

> +				lpfc_cmd->cur_iocbq.iocb_flag |= LPFC_IO_VMID;

> +		}

> +	}

> +

> +	atomic_inc(&ndlp->cmd_pending);

>   #ifdef CONFIG_SCSI_LPFC_DEBUG_FS

>   	if (unlikely(phba->hdwqstat_on & LPFC_CHECK_SCSI_IO))

>   		this_cpu_inc(phba->sli4_hba.c_stat->xmt_io);

> 

And that's the bit which I don't particular like.

Essentially we'll have to inject additional ELS commands _on each I/O_ 
to get a valid VMID.
Where there are _so_ many things which might get wrong, causing an I/O 
stall.

I would have vastly preferred if we could _avoid_ having to do 
additional ELS commands for VMID registration in the I/O path
(ie only allow for I/O with a valid VMID), and reject the I/O otherwise 
until VMID registration is complete.

IE return 'BUSY' (or even a command retry?) when no valid VMID for this 
particular I/O is found, register the VMID (preferably in another 
thread), and restart the queue once the VMID is registered.

That way we have a clear separation, and the I/O path will always work 
with valid VMIDs.

Hmm?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
James Smart April 10, 2021, 3 p.m. UTC | #7
On 4/8/2021 1:46 AM, Hannes Reinecke wrote:
> On 4/7/21 1:06 AM, Muneendra wrote:

>> From: Gaurav Srivastava <gaurav.srivastava@broadcom.com>

...
>> +        /* while the read lock was released, in case the entry was */

>> +        /* added by other context or is in process of being added */

>> +        if (vmp && vmp->flag & LPFC_VMID_REGISTERED) {

>> +            lpfc_vmid_update_entry(vport, cmd, vmp, tag);

>> +            write_unlock(&vport->vmid_lock);

>> +            return 0;

>> +        } else if (vmp && vmp->flag & LPFC_VMID_REQ_REGISTER) {

>> +            write_unlock(&vport->vmid_lock);

>> +            return -EBUSY;

>> +        }

>> +

>> +        /* else search and allocate a free slot in the hash table */

>> +        if (vport->cur_vmid_cnt < vport->max_vmid) {

>> +            for (i = 0; i < vport->max_vmid; ++i) {

>> +                vmp = vport->vmid + i;

>> +                if (vmp->flag == LPFC_VMID_SLOT_FREE) {

>> +                    vmp = vport->vmid + i;


delete this last line and adjust parens - really odd that it replicates 
the assignment 2 lines earlier.

>> +                    break;

>> +                }

>> +            }


I would prefer that if the table is expended and no slots free, that 
-ENOMEM is returned here. Rather than falling down below and qualifying 
by slot free, then by pending (set only if slot free).  I can't believe 
there is a reason the idle timer has to be started if no slots free as 
all the other fail cases don't bother with it.

This also helps indentation levels below.

>> +        } else {

>> +            write_unlock(&vport->vmid_lock);

>> +            return -ENOMEM;

>> +        }

>> +

>> +        if (vmp && (vmp->flag == LPFC_VMID_SLOT_FREE)) {

>> +            /* Add the vmid and register  */

>> +            lpfc_put_vmid_in_hashtable(vport, hash, vmp);

>> +            vmp->vmid_len = len;

>> +            memcpy(vmp->host_vmid, uuid, vmp->vmid_len);

>> +            vmp->io_rd_cnt = 0;

>> +            vmp->io_wr_cnt = 0;

>> +            vmp->flag = LPFC_VMID_SLOT_USED;

>> +

>> +            vmp->delete_inactive =

>> +                vport->vmid_inactivity_timeout ? 1 : 0;

>> +

>> +            /* if type priority tag, get next available vmid */

>> +            if (lpfc_vmid_is_type_priority_tag(vport))

>> +                lpfc_vmid_assign_cs_ctl(vport, vmp);

>> +

>> +            /* allocate the per cpu variable for holding */

>> +            /* the last access time stamp only if vmid is enabled */

>> +            if (!vmp->last_io_time)

>> +                vmp->last_io_time =

>> +                    __alloc_percpu(sizeof(u64),

>> +                           __alignof__(struct

>> +                                   lpfc_vmid));

>> +

>> +            /* registration pending */

>> +            pending = 1;

>> +        } else {

>> +            rc = -ENOMEM;

>> +        }

>> +        write_unlock(&vport->vmid_lock);

>> +

>> +        /* complete transaction with switch */

>> +        if (pending) {

>> +            if (lpfc_vmid_is_type_priority_tag(vport))

>> +                rc = lpfc_vmid_uvem(vport, vmp, true);

>> +            else

>> +                rc = lpfc_vmid_cmd(vport,

>> +                           SLI_CTAS_RAPP_IDENT,

>> +                           vmp);

>> +            if (!rc) {

>> +                write_lock(&vport->vmid_lock);

>> +                vport->cur_vmid_cnt++;

>> +                vmp->flag |= LPFC_VMID_REQ_REGISTER;

>> +                write_unlock(&vport->vmid_lock);

>> +            }

>> +        }

>> +

>> +        /* finally, enable the idle timer once */

>> +        if (!(vport->phba->pport->vmid_flag & LPFC_VMID_TIMER_ENBLD)) {

>> +            mod_timer(&vport->phba->inactive_vmid_poll,

>> +                  jiffies +

>> +                  msecs_to_jiffies(1000 * LPFC_VMID_TIMER));

>> +            vport->phba->pport->vmid_flag |= LPFC_VMID_TIMER_ENBLD;

>> +        }

>> +    }

>> +    return rc;

>> +}

>> +

>> +/*

>> + * lpfc_is_command_vm_io - get the uuid from blk cgroup

>> + * @cmd:Pointer to scsi_cmnd data structure

>> + * Returns uuid if present if not null

>> + */

>> +static char *lpfc_is_command_vm_io(struct scsi_cmnd *cmd)

>> +{

>> +    char *uuid = NULL;

>> +

>> +    if (cmd->request) {

>> +        if (cmd->request->bio)

>> +            uuid = blkcg_get_fc_appid(cmd->request->bio);

>> +    }

>> +    return uuid;

>> +}

>> +

>>   /**

>>    * lpfc_queuecommand - scsi_host_template queuecommand entry point

>>    * @shost: kernel scsi host pointer.

>> @@ -5288,6 +5437,7 @@ lpfc_queuecommand(struct Scsi_Host *shost, 

>> struct scsi_cmnd *cmnd)

>>       int err, idx;

>>   #ifdef CONFIG_SCSI_LPFC_DEBUG_FS

>>       uint64_t start = 0L;

>> +    u8 *uuid = NULL;

>>       if (phba->ktime_on)

>>           start = ktime_get_ns();

>> @@ -5415,6 +5565,25 @@ lpfc_queuecommand(struct Scsi_Host *shost, 

>> struct scsi_cmnd *cmnd)

>>       }

>> +    /* check the necessary and sufficient condition to support VMID */

>> +    if (lpfc_is_vmid_enabled(phba) &&

>> +        (ndlp->vmid_support ||

>> +         phba->pport->vmid_priority_tagging ==

>> +         LPFC_VMID_PRIO_TAG_ALL_TARGETS)) {

>> +        /* is the IO generated by a VM, get the associated virtual */

>> +        /* entity id */

>> +        uuid = lpfc_is_command_vm_io(cmnd);

>> +

>> +        if (uuid) {

>> +            err = lpfc_vmid_get_appid(vport, uuid, cmnd,

>> +                (union lpfc_vmid_io_tag *)

>> +                    &lpfc_cmd->cur_iocbq.vmid_tag);

>> +            if (!err)

>> +                lpfc_cmd->cur_iocbq.iocb_flag |= LPFC_IO_VMID;

>> +        }

>> +    }

>> +

>> +    atomic_inc(&ndlp->cmd_pending);

>>   #ifdef CONFIG_SCSI_LPFC_DEBUG_FS

>>       if (unlikely(phba->hdwqstat_on & LPFC_CHECK_SCSI_IO))

>>           this_cpu_inc(phba->sli4_hba.c_stat->xmt_io);

>>

> And that's the bit which I don't particular like.

> 

> Essentially we'll have to inject additional ELS commands _on each I/O_ 

> to get a valid VMID.

> Where there are _so_ many things which might get wrong, causing an I/O 

> stall.


I don't follow you - yes ELS's are injected when there isn't an entry 
for the VM, but once there is, there isn't further ELS's. That is the 
cost. as we don't know what uuid's I/O will be sent to before hand, so 
it has to be siphoned off during the I/O flow.  I/O's can be sent 
non-tagged while the ELS's are completing (and there aren't multiple 
sets of ELS's as long as it's the same uuid), which is fine.

so I disagree with "_on each I/O_".

> I would have vastly preferred if we could _avoid_ having to do 

> additional ELS commands for VMID registration in the I/O path

> (ie only allow for I/O with a valid VMID), and reject the I/O otherwise 

> until VMID registration is complete.

> 

> IE return 'BUSY' (or even a command retry?) when no valid VMID for this 

> particular I/O is found, register the VMID (preferably in another 

> thread), and restart the queue once the VMID is registered.


Why does it bother you with the I/O path ?  It's actually happening in 
parallel with no real relation between the two.

I seriously disagree with reject if no vmid tag. Why?  what do you gain 
? This actually introduces more disruption than the parallel flow with 
the ELSs.   Also, after link bounce, where all VMID's have to be done, 
it adds a stall window after link up right when things are trying to 
resume after rports rejoin. Why add the i/o rebouncing ? There no real 
benefit. Issuing a few untagged I/O doesn't hurt.

> 

> That way we have a clear separation, and the I/O path will always work 

> with valid VMIDs.

> 

> Hmm?

> 

> Cheers,

> 

> Hannes


-- james
Hannes Reinecke April 12, 2021, 5:27 a.m. UTC | #8
On 4/10/21 5:00 PM, James Smart wrote:
> On 4/8/2021 1:46 AM, Hannes Reinecke wrote:

>> On 4/7/21 1:06 AM, Muneendra wrote:

>>> From: Gaurav Srivastava <gaurav.srivastava@broadcom.com>

> ...

>>> +        /* while the read lock was released, in case the entry was */

>>> +        /* added by other context or is in process of being added */

>>> +        if (vmp && vmp->flag & LPFC_VMID_REGISTERED) {

>>> +            lpfc_vmid_update_entry(vport, cmd, vmp, tag);

>>> +            write_unlock(&vport->vmid_lock);

>>> +            return 0;

>>> +        } else if (vmp && vmp->flag & LPFC_VMID_REQ_REGISTER) {

>>> +            write_unlock(&vport->vmid_lock);

>>> +            return -EBUSY;

>>> +        }

>>> +

>>> +        /* else search and allocate a free slot in the hash table */

>>> +        if (vport->cur_vmid_cnt < vport->max_vmid) {

>>> +            for (i = 0; i < vport->max_vmid; ++i) {

>>> +                vmp = vport->vmid + i;

>>> +                if (vmp->flag == LPFC_VMID_SLOT_FREE) {

>>> +                    vmp = vport->vmid + i;

> 

> delete this last line and adjust parens - really odd that it replicates 

> the assignment 2 lines earlier.

> 

>>> +                    break;

>>> +                }

>>> +            }

> 

> I would prefer that if the table is expended and no slots free, that 

> -ENOMEM is returned here. Rather than falling down below and qualifying 

> by slot free, then by pending (set only if slot free).  I can't believe 

> there is a reason the idle timer has to be started if no slots free as 

> all the other fail cases don't bother with it.

> 

> This also helps indentation levels below.

> 

>>> +        } else {

>>> +            write_unlock(&vport->vmid_lock);

>>> +            return -ENOMEM;

>>> +        }

>>> +

>>> +        if (vmp && (vmp->flag == LPFC_VMID_SLOT_FREE)) {

>>> +            /* Add the vmid and register  */

>>> +            lpfc_put_vmid_in_hashtable(vport, hash, vmp);

>>> +            vmp->vmid_len = len;

>>> +            memcpy(vmp->host_vmid, uuid, vmp->vmid_len);

>>> +            vmp->io_rd_cnt = 0;

>>> +            vmp->io_wr_cnt = 0;

>>> +            vmp->flag = LPFC_VMID_SLOT_USED;

>>> +

>>> +            vmp->delete_inactive =

>>> +                vport->vmid_inactivity_timeout ? 1 : 0;

>>> +

>>> +            /* if type priority tag, get next available vmid */

>>> +            if (lpfc_vmid_is_type_priority_tag(vport))

>>> +                lpfc_vmid_assign_cs_ctl(vport, vmp);

>>> +

>>> +            /* allocate the per cpu variable for holding */

>>> +            /* the last access time stamp only if vmid is enabled */

>>> +            if (!vmp->last_io_time)

>>> +                vmp->last_io_time =

>>> +                    __alloc_percpu(sizeof(u64),

>>> +                           __alignof__(struct

>>> +                                   lpfc_vmid));

>>> +

>>> +            /* registration pending */

>>> +            pending = 1;

>>> +        } else {

>>> +            rc = -ENOMEM;

>>> +        }

>>> +        write_unlock(&vport->vmid_lock);

>>> +

>>> +        /* complete transaction with switch */

>>> +        if (pending) {

>>> +            if (lpfc_vmid_is_type_priority_tag(vport))

>>> +                rc = lpfc_vmid_uvem(vport, vmp, true);

>>> +            else

>>> +                rc = lpfc_vmid_cmd(vport,

>>> +                           SLI_CTAS_RAPP_IDENT,

>>> +                           vmp);

>>> +            if (!rc) {

>>> +                write_lock(&vport->vmid_lock);

>>> +                vport->cur_vmid_cnt++;

>>> +                vmp->flag |= LPFC_VMID_REQ_REGISTER;

>>> +                write_unlock(&vport->vmid_lock);

>>> +            }

>>> +        }

>>> +

>>> +        /* finally, enable the idle timer once */

>>> +        if (!(vport->phba->pport->vmid_flag & LPFC_VMID_TIMER_ENBLD)) {

>>> +            mod_timer(&vport->phba->inactive_vmid_poll,

>>> +                  jiffies +

>>> +                  msecs_to_jiffies(1000 * LPFC_VMID_TIMER));

>>> +            vport->phba->pport->vmid_flag |= LPFC_VMID_TIMER_ENBLD;

>>> +        }

>>> +    }

>>> +    return rc;

>>> +}

>>> +

>>> +/*

>>> + * lpfc_is_command_vm_io - get the uuid from blk cgroup

>>> + * @cmd:Pointer to scsi_cmnd data structure

>>> + * Returns uuid if present if not null

>>> + */

>>> +static char *lpfc_is_command_vm_io(struct scsi_cmnd *cmd)

>>> +{

>>> +    char *uuid = NULL;

>>> +

>>> +    if (cmd->request) {

>>> +        if (cmd->request->bio)

>>> +            uuid = blkcg_get_fc_appid(cmd->request->bio);

>>> +    }

>>> +    return uuid;

>>> +}

>>> +

>>>   /**

>>>    * lpfc_queuecommand - scsi_host_template queuecommand entry point

>>>    * @shost: kernel scsi host pointer.

>>> @@ -5288,6 +5437,7 @@ lpfc_queuecommand(struct Scsi_Host *shost, 

>>> struct scsi_cmnd *cmnd)

>>>       int err, idx;

>>>   #ifdef CONFIG_SCSI_LPFC_DEBUG_FS

>>>       uint64_t start = 0L;

>>> +    u8 *uuid = NULL;

>>>       if (phba->ktime_on)

>>>           start = ktime_get_ns();

>>> @@ -5415,6 +5565,25 @@ lpfc_queuecommand(struct Scsi_Host *shost, 

>>> struct scsi_cmnd *cmnd)

>>>       }

>>> +    /* check the necessary and sufficient condition to support VMID */

>>> +    if (lpfc_is_vmid_enabled(phba) &&

>>> +        (ndlp->vmid_support ||

>>> +         phba->pport->vmid_priority_tagging ==

>>> +         LPFC_VMID_PRIO_TAG_ALL_TARGETS)) {

>>> +        /* is the IO generated by a VM, get the associated virtual */

>>> +        /* entity id */

>>> +        uuid = lpfc_is_command_vm_io(cmnd);

>>> +

>>> +        if (uuid) {

>>> +            err = lpfc_vmid_get_appid(vport, uuid, cmnd,

>>> +                (union lpfc_vmid_io_tag *)

>>> +                    &lpfc_cmd->cur_iocbq.vmid_tag);

>>> +            if (!err)

>>> +                lpfc_cmd->cur_iocbq.iocb_flag |= LPFC_IO_VMID;

>>> +        }

>>> +    }

>>> +

>>> +    atomic_inc(&ndlp->cmd_pending);

>>>   #ifdef CONFIG_SCSI_LPFC_DEBUG_FS

>>>       if (unlikely(phba->hdwqstat_on & LPFC_CHECK_SCSI_IO))

>>>           this_cpu_inc(phba->sli4_hba.c_stat->xmt_io);

>>>

>> And that's the bit which I don't particular like.

>>

>> Essentially we'll have to inject additional ELS commands _on each I/O_ 

>> to get a valid VMID.

>> Where there are _so_ many things which might get wrong, causing an I/O 

>> stall.

> 

> I don't follow you - yes ELS's are injected when there isn't an entry 

> for the VM, but once there is, there isn't further ELS's. That is the 

> cost. as we don't know what uuid's I/O will be sent to before hand, so 

> it has to be siphoned off during the I/O flow.  I/O's can be sent 

> non-tagged while the ELS's are completing (and there aren't multiple 

> sets of ELS's as long as it's the same uuid), which is fine.

> 

> so I disagree with "_on each I/O_".

> 

Yeah, that was an exaggeration.
Not on each I/O; I should've said 'on each unregistered I/O'.

This really is my unhappiness with the entire VMID specification,
where you can time out VMIDs after a certain period, and hence never
be sure if any particular I/O really _is_ registered.

>> I would have vastly preferred if we could _avoid_ having to do 

>> additional ELS commands for VMID registration in the I/O path

>> (ie only allow for I/O with a valid VMID), and reject the I/O 

>> otherwise until VMID registration is complete.

>>

>> IE return 'BUSY' (or even a command retry?) when no valid VMID for 

>> this particular I/O is found, register the VMID (preferably in another 

>> thread), and restart the queue once the VMID is registered.

> 

> Why does it bother you with the I/O path ?  It's actually happening in 

> parallel with no real relation between the two.

> 

> I seriously disagree with reject if no vmid tag. Why?  what do you gain 

> ? This actually introduces more disruption than the parallel flow with 

> the ELSs.   Also, after link bounce, where all VMID's have to be done, 

> it adds a stall window after link up right when things are trying to 

> resume after rports rejoin. Why add the i/o rebouncing ? There no real 

> benefit. Issuing a few untagged I/O doesn't hurt.

> 

That indeed is a valid point. I retract my objection.

Reviewed-by: Hannes Reinecke <hare@suse.de>


Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer