From patchwork Thu Mar 17 03:27:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Smart <jsmart2021@gmail.com>
X-Patchwork-Id: 552475
Return-Path: <linux-scsi-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id BF81FC433EF
 for <linux-scsi@archiver.kernel.org>; Thu, 17 Mar 2022 03:28:21 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S231487AbiCQD3f (ORCPT <rfc822;linux-scsi@archiver.kernel.org>);
 Wed, 16 Mar 2022 23:29:35 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49790 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S229973AbiCQD30 (ORCPT
 <rfc822;linux-scsi@vger.kernel.org>); Wed, 16 Mar 2022 23:29:26 -0400
Received: from mail-pg1-x531.google.com (mail-pg1-x531.google.com
 [IPv6:2607:f8b0:4864:20::531])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D0A517ABC
 for <linux-scsi@vger.kernel.org>;
 Wed, 16 Mar 2022 20:28:02 -0700 (PDT)
Received: by mail-pg1-x531.google.com with SMTP id o13so1669094pgc.12
 for <linux-scsi@vger.kernel.org>;
 Wed, 16 Mar 2022 20:28:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=from:to:cc:subject:date:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=pSpFlYaCSiaVTrVF8uAcBP0I5mwjunvhvdL9HpVUiU0=;
 b=Y6L47+jyAYU2cMkPHbrbojdTPguV7EaYJcXTmaXCe/aLZnw1NJHQjM82WFTosxYdEr
 yO2sycQ1SE20byXXHK74hzh+BStWinhIXZQP8anmdJrfh2aVhbaTgz4poQtd5XS8fomU
 xvj8oOYOEeO5sYQvgIfUWeq/zXMg1PrE13chA4ktsMinBldc+W4x9foHjlVU/IN1sg0Y
 G5RTVuDBzf2D2qnkB8fELqT8A3SXcvfto48EsbzdLLHpsWD+CFLXGTkvYG+6NKhsB4Ex
 LOXNT2Ke1Lhf5S5cNhrPjfE9nnP5fxK2+oZls01Ng0ph04Sv8cUwAwD09KF8FFxm3sDE
 eoGA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=pSpFlYaCSiaVTrVF8uAcBP0I5mwjunvhvdL9HpVUiU0=;
 b=laTJg2y1eO8ZGciTvOV1t/6ErvqCH42JQmQXv/nwt48v+PkEzJfL8+A02xmPYDm/KP
 MXQswiI7ngAe3xW8UpPfzASxH0bn4o3DdmbRtgcAgVYdQxkROFJdAIwNBR4dzLQuXu81
 Js8sf/LtqSe4AcRYT5g6O2ciMH6rudiRPniojTftHs/vIsuelqSzeL8N2HrM8phGDcbm
 cwQgIGghRYST5AZHfqfDRf7J9UM1RNg2T3iCS2lDbKEdpRu7HNk4/83NJ1neTssP2nrO
 e+uI+S1G0+8Zog+rSTHo415+BhCvjndKsA+10+M6JOTu/SAx4AbKZJYFQKlbq5qxveQk
 Nylw==
X-Gm-Message-State: AOAM532KW5lLJ5hgdYZmNtKwHzb+6a5ZNyROH0FdUujMMqpL81xyhPdd
 CYY4eoPITw+9wCJithyXsGUg3JgtkKo=
X-Google-Smtp-Source: ABdhPJzRTtDAIsg7scHRr/oSDwmSseQ0YsPNXR26AOaLxG+fFeCaIjjEzeMvOmfWF9/Sh8qMJ7jqDA==
X-Received: by 2002:a63:ef4f:0:b0:375:6898:f6b8 with SMTP id
 c15-20020a63ef4f000000b003756898f6b8mr2067871pgk.59.1647487668910;
 Wed, 16 Mar 2022 20:27:48 -0700 (PDT)
Received: from mail-lvn-it-01.broadcom.com (ip174-67-196-173.oc.oc.cox.net.
 [174.67.196.173]) by smtp.gmail.com with ESMTPSA id
 k11-20020a056a00168b00b004f7e1555538sm5017511pfc.190.2022.03.16.20.27.48
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Wed, 16 Mar 2022 20:27:48 -0700 (PDT)
From: James Smart <jsmart2021@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: James Smart <jsmart2021@gmail.com>, Justin Tee <justin.tee@broadcom.com>
Subject: [PATCH 1/4] lpfc: Improve PCI EEH Error and Recovery Handling
Date: Wed, 16 Mar 2022 20:27:34 -0700
Message-Id: <20220317032737.45308-2-jsmart2021@gmail.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20220317032737.45308-1-jsmart2021@gmail.com>
References: <20220317032737.45308-1-jsmart2021@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-scsi.vger.kernel.org>
X-Mailing-List: linux-scsi@vger.kernel.org

Following EEH errors, the driver can crash or hang when deleting the
localport or when attempting to unload.

The EEH handlers in the driver did not notify the nvme-fc transport
before tearing the driver down. This was delayed until the resume steps.
THis worked for scsi because the lpfc_block_scsi would notify the
scsi_fc_transport that the target was not available but it would not
clean up all the references to the ndlp.

The sli3 prep for dev reset handler did the lpfc_offline_prep and
lpfc_offline calls to get the port stopped before restarting. The sli4
version of the prep for dev reset just destroyed the queues and did not
stop nvme from continuing.  Also because the port was not really stopped
the localport destroy would hang because the transport was still waiting
for IO. Additionally, a Devloss tmo can fire and post events to a
stopped worker thread creating another hang condition.

To Fix:
lpfc_sli4_prep_dev_for_reset is modified to call offline_prep and
offline rather than just lpfc_scsi_dev_block to ensure both SCSI and NVME
transports are notified to block I/O to the driver.

Logic is added to devloss handler and worker thread to clean up ndlp
references and quiesce appropriately.

Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc.h         |   7 +-
 drivers/scsi/lpfc/lpfc_crtn.h    |   3 +
 drivers/scsi/lpfc/lpfc_hbadisc.c | 119 +++++++++++++++++++++++++------
 drivers/scsi/lpfc/lpfc_init.c    |  60 ++++++++++------
 drivers/scsi/lpfc/lpfc_nvme.c    |  11 ++-
 drivers/scsi/lpfc/lpfc_sli.c     |  15 ++--
 6 files changed, 157 insertions(+), 58 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 86653aa9b389..8405fd0bbc59 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -896,6 +896,11 @@ enum lpfc_irq_chann_mode {
 	NHT_MODE,
 };
 
+enum lpfc_hba_bit_flags {
+	FABRIC_COMANDS_BLOCKED,
+	HBA_PCI_ERR,
+};
+
 struct lpfc_hba {
 	/* SCSI interface function jump table entries */
 	struct lpfc_io_buf * (*lpfc_get_scsi_buf)
@@ -1042,7 +1047,6 @@ struct lpfc_hba {
 					 * Firmware supports Forced Link Speed
 					 * capability
 					 */
-#define HBA_PCI_ERR		0x80000 /* The PCI slot is offline */
 #define HBA_FLOGI_ISSUED	0x100000 /* FLOGI was issued */
 #define HBA_SHORT_CMF		0x200000 /* shorter CMF timer routine */
 #define HBA_CGN_DAY_WRAP	0x400000 /* HBA Congestion info day wraps */
@@ -1349,7 +1353,6 @@ struct lpfc_hba {
 	atomic_t fabric_iocb_count;
 	struct timer_list fabric_block_timer;
 	unsigned long bit_flags;
-#define	FABRIC_COMANDS_BLOCKED	0
 	atomic_t num_rsrc_err;
 	atomic_t num_cmd_success;
 	unsigned long last_rsrc_error_time;
diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index 96408cd6c4c8..9897a1aa387b 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -670,3 +670,6 @@ struct lpfc_vmid *lpfc_get_vmid_from_hashtable(struct lpfc_vport *vport,
 					      uint32_t hash, uint8_t *buf);
 void lpfc_vmid_vport_cleanup(struct lpfc_vport *vport);
 int lpfc_issue_els_qfpa(struct lpfc_vport *vport);
+
+void lpfc_sli_rpi_release(struct lpfc_vport *vport,
+			  struct lpfc_nodelist *ndlp);
diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 0144da30e3db..6983c70f2fc6 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -109,8 +109,8 @@ lpfc_rport_invalid(struct fc_rport *rport)
 
 	ndlp = rdata->pnode;
 	if (!rdata->pnode) {
-		pr_err("**** %s: NULL ndlp on rport x%px SID x%x\n",
-		       __func__, rport, rport->scsi_target_id);
+		pr_info("**** %s: NULL ndlp on rport x%px SID x%x\n",
+			__func__, rport, rport->scsi_target_id);
 		return -EINVAL;
 	}
 
@@ -169,9 +169,10 @@ lpfc_dev_loss_tmo_callbk(struct fc_rport *rport)
 
 	lpfc_printf_vlog(ndlp->vport, KERN_INFO, LOG_NODE,
 			 "3181 dev_loss_callbk x%06x, rport x%px flg x%x "
-			 "load_flag x%x refcnt %d\n",
+			 "load_flag x%x refcnt %d state %d xpt x%x\n",
 			 ndlp->nlp_DID, ndlp->rport, ndlp->nlp_flag,
-			 vport->load_flag, kref_read(&ndlp->kref));
+			 vport->load_flag, kref_read(&ndlp->kref),
+			 ndlp->nlp_state, ndlp->fc4_xpt_flags);
 
 	/* Don't schedule a worker thread event if the vport is going down.
 	 * The teardown process cleans up the node via lpfc_drop_node.
@@ -181,6 +182,11 @@ lpfc_dev_loss_tmo_callbk(struct fc_rport *rport)
 		ndlp->rport = NULL;
 
 		ndlp->fc4_xpt_flags &= ~SCSI_XPT_REGD;
+		/* clear the NLP_XPT_REGD if the node is not registered
+		 * with nvme-fc
+		 */
+		if (ndlp->fc4_xpt_flags == NLP_XPT_REGD)
+			ndlp->fc4_xpt_flags &= ~NLP_XPT_REGD;
 
 		/* Remove the node reference from remote_port_add now.
 		 * The driver will not call remote_port_delete.
@@ -225,18 +231,36 @@ lpfc_dev_loss_tmo_callbk(struct fc_rport *rport)
 	ndlp->rport = NULL;
 	spin_unlock_irqrestore(&ndlp->lock, iflags);
 
-	/* We need to hold the node by incrementing the reference
-	 * count until this queued work is done
-	 */
-	evtp->evt_arg1 = lpfc_nlp_get(ndlp);
+	if (phba->worker_thread) {
+		/* We need to hold the node by incrementing the reference
+		 * count until this queued work is done
+		 */
+		evtp->evt_arg1 = lpfc_nlp_get(ndlp);
+
+		spin_lock_irqsave(&phba->hbalock, iflags);
+		if (evtp->evt_arg1) {
+			evtp->evt = LPFC_EVT_DEV_LOSS;
+			list_add_tail(&evtp->evt_listp, &phba->work_list);
+			lpfc_worker_wake_up(phba);
+		}
+		spin_unlock_irqrestore(&phba->hbalock, iflags);
+	} else {
+		lpfc_printf_vlog(ndlp->vport, KERN_INFO, LOG_NODE,
+				 "3188 worker thread is stopped %s x%06x, "
+				 " rport x%px flg x%x load_flag x%x refcnt "
+				 "%d\n", __func__, ndlp->nlp_DID,
+				 ndlp->rport, ndlp->nlp_flag,
+				 vport->load_flag, kref_read(&ndlp->kref));
+		if (!(ndlp->fc4_xpt_flags & NVME_XPT_REGD)) {
+			spin_lock_irqsave(&ndlp->lock, iflags);
+			/* Node is in dev loss.  No further transaction. */
+			ndlp->nlp_flag &= ~NLP_IN_DEV_LOSS;
+			spin_unlock_irqrestore(&ndlp->lock, iflags);
+			lpfc_disc_state_machine(vport, ndlp, NULL,
+						NLP_EVT_DEVICE_RM);
+		}
 
-	spin_lock_irqsave(&phba->hbalock, iflags);
-	if (evtp->evt_arg1) {
-		evtp->evt = LPFC_EVT_DEV_LOSS;
-		list_add_tail(&evtp->evt_listp, &phba->work_list);
-		lpfc_worker_wake_up(phba);
 	}
-	spin_unlock_irqrestore(&phba->hbalock, iflags);
 
 	return;
 }
@@ -503,11 +527,12 @@ lpfc_dev_loss_tmo_handler(struct lpfc_nodelist *ndlp)
 		lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
 				 "0203 Devloss timeout on "
 				 "WWPN %02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x "
-				 "NPort x%06x Data: x%x x%x x%x\n",
+				 "NPort x%06x Data: x%x x%x x%x refcnt %d\n",
 				 *name, *(name+1), *(name+2), *(name+3),
 				 *(name+4), *(name+5), *(name+6), *(name+7),
 				 ndlp->nlp_DID, ndlp->nlp_flag,
-				 ndlp->nlp_state, ndlp->nlp_rpi);
+				 ndlp->nlp_state, ndlp->nlp_rpi,
+				 kref_read(&ndlp->kref));
 	} else {
 		lpfc_printf_vlog(vport, KERN_INFO, LOG_TRACE_EVENT,
 				 "0204 Devloss timeout on "
@@ -755,18 +780,22 @@ lpfc_work_list_done(struct lpfc_hba *phba)
 	int free_evt;
 	int fcf_inuse;
 	uint32_t nlp_did;
+	bool hba_pci_err;
 
 	spin_lock_irq(&phba->hbalock);
 	while (!list_empty(&phba->work_list)) {
 		list_remove_head((&phba->work_list), evtp, typeof(*evtp),
 				 evt_listp);
 		spin_unlock_irq(&phba->hbalock);
+		hba_pci_err = test_bit(HBA_PCI_ERR, &phba->bit_flags);
 		free_evt = 1;
 		switch (evtp->evt) {
 		case LPFC_EVT_ELS_RETRY:
 			ndlp = (struct lpfc_nodelist *) (evtp->evt_arg1);
-			lpfc_els_retry_delay_handler(ndlp);
-			free_evt = 0; /* evt is part of ndlp */
+			if (!hba_pci_err) {
+				lpfc_els_retry_delay_handler(ndlp);
+				free_evt = 0; /* evt is part of ndlp */
+			}
 			/* decrement the node reference count held
 			 * for this queued work
 			 */
@@ -788,8 +817,10 @@ lpfc_work_list_done(struct lpfc_hba *phba)
 			break;
 		case LPFC_EVT_RECOVER_PORT:
 			ndlp = (struct lpfc_nodelist *)(evtp->evt_arg1);
-			lpfc_sli_abts_recover_port(ndlp->vport, ndlp);
-			free_evt = 0;
+			if (!hba_pci_err) {
+				lpfc_sli_abts_recover_port(ndlp->vport, ndlp);
+				free_evt = 0;
+			}
 			/* decrement the node reference count held for
 			 * this queued work
 			 */
@@ -859,14 +890,18 @@ lpfc_work_done(struct lpfc_hba *phba)
 	struct lpfc_vport **vports;
 	struct lpfc_vport *vport;
 	int i;
+	bool hba_pci_err;
 
+	hba_pci_err = test_bit(HBA_PCI_ERR, &phba->bit_flags);
 	spin_lock_irq(&phba->hbalock);
 	ha_copy = phba->work_ha;
 	phba->work_ha = 0;
 	spin_unlock_irq(&phba->hbalock);
+	if (hba_pci_err)
+		ha_copy = 0;
 
 	/* First, try to post the next mailbox command to SLI4 device */
-	if (phba->pci_dev_grp == LPFC_PCI_DEV_OC)
+	if (phba->pci_dev_grp == LPFC_PCI_DEV_OC && !hba_pci_err)
 		lpfc_sli4_post_async_mbox(phba);
 
 	if (ha_copy & HA_ERATT) {
@@ -886,7 +921,7 @@ lpfc_work_done(struct lpfc_hba *phba)
 		lpfc_handle_latt(phba);
 
 	/* Handle VMID Events */
-	if (lpfc_is_vmid_enabled(phba)) {
+	if (lpfc_is_vmid_enabled(phba) && !hba_pci_err) {
 		if (phba->pport->work_port_events &
 		    WORKER_CHECK_VMID_ISSUE_QFPA) {
 			lpfc_check_vmid_qfpa_issue(phba);
@@ -936,6 +971,8 @@ lpfc_work_done(struct lpfc_hba *phba)
 			work_port_events = vport->work_port_events;
 			vport->work_port_events &= ~work_port_events;
 			spin_unlock_irq(&vport->work_port_lock);
+			if (hba_pci_err)
+				continue;
 			if (work_port_events & WORKER_DISC_TMO)
 				lpfc_disc_timeout_handler(vport);
 			if (work_port_events & WORKER_ELS_TMO)
@@ -1173,12 +1210,14 @@ lpfc_linkdown(struct lpfc_hba *phba)
 	struct lpfc_vport **vports;
 	LPFC_MBOXQ_t          *mb;
 	int i;
+	int offline;
 
 	if (phba->link_state == LPFC_LINK_DOWN)
 		return 0;
 
 	/* Block all SCSI stack I/Os */
 	lpfc_scsi_dev_block(phba);
+	offline = pci_channel_offline(phba->pcidev);
 
 	phba->defer_flogi_acc_flag = false;
 
@@ -1219,7 +1258,7 @@ lpfc_linkdown(struct lpfc_hba *phba)
 	lpfc_destroy_vport_work_array(phba, vports);
 
 	/* Clean up any SLI3 firmware default rpi's */
-	if (phba->sli_rev > LPFC_SLI_REV3)
+	if (phba->sli_rev > LPFC_SLI_REV3 || offline)
 		goto skip_unreg_did;
 
 	mb = mempool_alloc(phba->mbox_mem_pool, GFP_KERNEL);
@@ -4712,6 +4751,11 @@ lpfc_nlp_unreg_node(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
 	spin_lock_irqsave(&ndlp->lock, iflags);
 	if (!(ndlp->fc4_xpt_flags & NLP_XPT_REGD)) {
 		spin_unlock_irqrestore(&ndlp->lock, iflags);
+		lpfc_printf_vlog(vport, KERN_INFO, LOG_SLI,
+				 "0999 %s Not regd: ndlp x%px rport x%px DID "
+				 "x%x FLG x%x XPT x%x\n",
+				  __func__, ndlp, ndlp->rport, ndlp->nlp_DID,
+				  ndlp->nlp_flag, ndlp->fc4_xpt_flags);
 		return;
 	}
 
@@ -4722,6 +4766,13 @@ lpfc_nlp_unreg_node(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
 	    ndlp->fc4_xpt_flags & SCSI_XPT_REGD) {
 		vport->phba->nport_event_cnt++;
 		lpfc_unregister_remote_port(ndlp);
+	} else if (!ndlp->rport) {
+		lpfc_printf_vlog(vport, KERN_INFO, LOG_SLI,
+				 "1999 %s NDLP in devloss x%px DID x%x FLG x%x"
+				 " XPT x%x refcnt %d\n",
+				 __func__, ndlp, ndlp->nlp_DID, ndlp->nlp_flag,
+				 ndlp->fc4_xpt_flags,
+				 kref_read(&ndlp->kref));
 	}
 
 	if (ndlp->fc4_xpt_flags & NVME_XPT_REGD) {
@@ -6097,12 +6148,34 @@ lpfc_disc_flush_list(struct lpfc_vport *vport)
 	}
 }
 
+/*
+ * lpfc_notify_xport_npr - notifies xport of node disappearance
+ * @vport: Pointer to Virtual Port object.
+ *
+ * Transitions all ndlps to NPR state.  When lpfc_nlp_set_state
+ * calls lpfc_nlp_state_cleanup, the ndlp->rport is unregistered
+ * and transport notified that the node is gone.
+ * Return Code:
+ *	none
+ */
+static void
+lpfc_notify_xport_npr(struct lpfc_vport *vport)
+{
+	struct lpfc_nodelist *ndlp, *next_ndlp;
+
+	list_for_each_entry_safe(ndlp, next_ndlp, &vport->fc_nodes,
+				 nlp_listp) {
+		lpfc_nlp_set_state(vport, ndlp, NLP_STE_NPR_NODE);
+	}
+}
 void
 lpfc_cleanup_discovery_resources(struct lpfc_vport *vport)
 {
 	lpfc_els_flush_rscn(vport);
 	lpfc_els_flush_cmd(vport);
 	lpfc_disc_flush_list(vport);
+	if (pci_channel_offline(vport->phba->pcidev))
+		lpfc_notify_xport_npr(vport);
 }
 
 /*****************************************************************************/
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index eed6464bd880..b8ab6dcbadc5 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -1642,7 +1642,7 @@ lpfc_sli4_offline_eratt(struct lpfc_hba *phba)
 {
 	spin_lock_irq(&phba->hbalock);
 	if (phba->link_state == LPFC_HBA_ERROR &&
-	    phba->hba_flag & HBA_PCI_ERR) {
+		test_bit(HBA_PCI_ERR, &phba->bit_flags)) {
 		spin_unlock_irq(&phba->hbalock);
 		return;
 	}
@@ -3682,7 +3682,8 @@ lpfc_offline_prep(struct lpfc_hba *phba, int mbx_action)
 	struct lpfc_vport **vports;
 	struct Scsi_Host *shost;
 	int i;
-	int offline = 0;
+	int offline;
+	bool hba_pci_err;
 
 	if (vport->fc_flag & FC_OFFLINE_MODE)
 		return;
@@ -3692,6 +3693,7 @@ lpfc_offline_prep(struct lpfc_hba *phba, int mbx_action)
 	lpfc_linkdown(phba);
 
 	offline =  pci_channel_offline(phba->pcidev);
+	hba_pci_err = test_bit(HBA_PCI_ERR, &phba->bit_flags);
 
 	/* Issue an unreg_login to all nodes on all vports */
 	vports = lpfc_create_vport_work_array(phba);
@@ -3715,11 +3717,14 @@ lpfc_offline_prep(struct lpfc_hba *phba, int mbx_action)
 				ndlp->nlp_flag &= ~NLP_NPR_ADISC;
 				spin_unlock_irq(&ndlp->lock);
 
-				if (offline) {
+				if (offline || hba_pci_err) {
 					spin_lock_irq(&ndlp->lock);
 					ndlp->nlp_flag &= ~(NLP_UNREG_INP |
 							    NLP_RPI_REGISTERED);
 					spin_unlock_irq(&ndlp->lock);
+					if (phba->sli_rev == LPFC_SLI_REV4)
+						lpfc_sli_rpi_release(vports[i],
+								     ndlp);
 				} else {
 					lpfc_unreg_rpi(vports[i], ndlp);
 				}
@@ -13374,15 +13379,12 @@ lpfc_sli4_hba_unset(struct lpfc_hba *phba)
 	/* Disable FW logging to host memory */
 	lpfc_ras_stop_fwlog(phba);
 
-	/* Unset the queues shared with the hardware then release all
-	 * allocated resources.
-	 */
-	lpfc_sli4_queue_unset(phba);
-	lpfc_sli4_queue_destroy(phba);
-
 	/* Reset SLI4 HBA FCoE function */
 	lpfc_pci_function_reset(phba);
 
+	/* release all queue allocated resources. */
+	lpfc_sli4_queue_destroy(phba);
+
 	/* Free RAS DMA memory */
 	if (phba->ras_fwlog.ras_enabled)
 		lpfc_sli4_ras_dma_free(phba);
@@ -15057,24 +15059,28 @@ lpfc_sli4_prep_dev_for_recover(struct lpfc_hba *phba)
 static void
 lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba)
 {
-	lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
-			"2826 PCI channel disable preparing for reset\n");
+	int offline =  pci_channel_offline(phba->pcidev);
+
+	lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
+			"2826 PCI channel disable preparing for reset offline"
+			" %d\n", offline);
 
 	/* Block any management I/Os to the device */
 	lpfc_block_mgmt_io(phba, LPFC_MBX_NO_WAIT);
 
-	/* Block all SCSI devices' I/Os on the host */
-	lpfc_scsi_dev_block(phba);
 
+	/* HBA_PCI_ERR was set in io_error_detect */
+	lpfc_offline_prep(phba, LPFC_MBX_NO_WAIT);
 	/* Flush all driver's outstanding I/Os as we are to reset */
 	lpfc_sli_flush_io_rings(phba);
+	lpfc_offline(phba);
 
 	/* stop all timers */
 	lpfc_stop_hba_timers(phba);
 
+	lpfc_sli4_queue_destroy(phba);
 	/* Disable interrupt and pci device */
 	lpfc_sli4_disable_intr(phba);
-	lpfc_sli4_queue_destroy(phba);
 	pci_disable_device(phba->pcidev);
 }
 
@@ -15123,6 +15129,7 @@ lpfc_io_error_detected_s4(struct pci_dev *pdev, pci_channel_state_t state)
 {
 	struct Scsi_Host *shost = pci_get_drvdata(pdev);
 	struct lpfc_hba *phba = ((struct lpfc_vport *)shost->hostdata)->phba;
+	bool hba_pci_err;
 
 	switch (state) {
 	case pci_channel_io_normal:
@@ -15130,17 +15137,24 @@ lpfc_io_error_detected_s4(struct pci_dev *pdev, pci_channel_state_t state)
 		lpfc_sli4_prep_dev_for_recover(phba);
 		return PCI_ERS_RESULT_CAN_RECOVER;
 	case pci_channel_io_frozen:
-		phba->hba_flag |= HBA_PCI_ERR;
+		hba_pci_err = test_and_set_bit(HBA_PCI_ERR, &phba->bit_flags);
 		/* Fatal error, prepare for slot reset */
-		lpfc_sli4_prep_dev_for_reset(phba);
+		if (!hba_pci_err)
+			lpfc_sli4_prep_dev_for_reset(phba);
+		else
+			lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
+					"2832  Already handling PCI error "
+					"state: x%x\n", state);
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
-		phba->hba_flag |= HBA_PCI_ERR;
+		set_bit(HBA_PCI_ERR, &phba->bit_flags);
 		/* Permanent failure, prepare for device down */
 		lpfc_sli4_prep_dev_for_perm_failure(phba);
 		return PCI_ERS_RESULT_DISCONNECT;
 	default:
-		phba->hba_flag |= HBA_PCI_ERR;
+		hba_pci_err = test_and_set_bit(HBA_PCI_ERR, &phba->bit_flags);
+		if (!hba_pci_err)
+			lpfc_sli4_prep_dev_for_reset(phba);
 		/* Unknown state, prepare and request slot reset */
 		lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
 				"2825 Unknown PCI error state: x%x\n", state);
@@ -15174,17 +15188,21 @@ lpfc_io_slot_reset_s4(struct pci_dev *pdev)
 	struct lpfc_hba *phba = ((struct lpfc_vport *)shost->hostdata)->phba;
 	struct lpfc_sli *psli = &phba->sli;
 	uint32_t intr_mode;
+	bool hba_pci_err;
 
 	dev_printk(KERN_INFO, &pdev->dev, "recovering from a slot reset.\n");
 	if (pci_enable_device_mem(pdev)) {
 		printk(KERN_ERR "lpfc: Cannot re-enable "
-			"PCI device after reset.\n");
+		       "PCI device after reset.\n");
 		return PCI_ERS_RESULT_DISCONNECT;
 	}
 
 	pci_restore_state(pdev);
 
-	phba->hba_flag &= ~HBA_PCI_ERR;
+	hba_pci_err = test_and_clear_bit(HBA_PCI_ERR, &phba->bit_flags);
+	if (!hba_pci_err)
+		dev_info(&pdev->dev,
+			 "hba_pci_err was not set, recovering slot reset.\n");
 	/*
 	 * As the new kernel behavior of pci_restore_state() API call clears
 	 * device saved_state flag, need to save the restored state again.
@@ -15239,8 +15257,6 @@ lpfc_io_resume_s4(struct pci_dev *pdev)
 	 */
 	if (!(phba->sli.sli_flag & LPFC_SLI_ACTIVE)) {
 		/* Perform device reset */
-		lpfc_offline_prep(phba, LPFC_MBX_WAIT);
-		lpfc_offline(phba);
 		lpfc_sli_brdrestart(phba);
 		/* Bring the device back online */
 		lpfc_online(phba);
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 1213a299f9aa..e47205e0d3e2 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -2169,8 +2169,7 @@ lpfc_nvme_lport_unreg_wait(struct lpfc_vport *vport,
 			abts_nvme = 0;
 			for (i = 0; i < phba->cfg_hdw_queue; i++) {
 				qp = &phba->sli4_hba.hdwq[i];
-				if (!vport || !vport->localport ||
-				    !qp || !qp->io_wq)
+				if (!vport->localport || !qp || !qp->io_wq)
 					return;
 
 				pring = qp->io_wq->pring;
@@ -2180,8 +2179,9 @@ lpfc_nvme_lport_unreg_wait(struct lpfc_vport *vport,
 				abts_scsi += qp->abts_scsi_io_bufs;
 				abts_nvme += qp->abts_nvme_io_bufs;
 			}
-			if (!vport || !vport->localport ||
-			    vport->phba->hba_flag & HBA_PCI_ERR)
+			if (!vport->localport ||
+			    test_bit(HBA_PCI_ERR, &vport->phba->bit_flags) ||
+			    vport->load_flag & FC_UNLOADING)
 				return;
 
 			lpfc_printf_vlog(vport, KERN_ERR, LOG_TRACE_EVENT,
@@ -2541,8 +2541,7 @@ lpfc_nvme_unregister_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
 		 * return values is ignored.  The upcall is a courtesy to the
 		 * transport.
 		 */
-		if (vport->load_flag & FC_UNLOADING ||
-		    unlikely(vport->phba->hba_flag & HBA_PCI_ERR))
+		if (vport->load_flag & FC_UNLOADING)
 			(void)nvme_fc_set_remoteport_devloss(remoteport, 0);
 
 		ret = nvme_fc_unregister_remoteport(remoteport);
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 20d40957a385..26f6a147b5ae 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -2828,6 +2828,12 @@ __lpfc_sli_rpi_release(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
 	ndlp->nlp_flag &= ~NLP_UNREG_INP;
 }
 
+void
+lpfc_sli_rpi_release(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
+{
+	__lpfc_sli_rpi_release(vport, ndlp);
+}
+
 /**
  * lpfc_sli_def_mbox_cmpl - Default mailbox completion handler
  * @phba: Pointer to HBA context object.
@@ -4624,11 +4630,6 @@ lpfc_sli_flush_io_rings(struct lpfc_hba *phba)
 	struct lpfc_iocbq *piocb, *next_iocb;
 
 	spin_lock_irq(&phba->hbalock);
-	if (phba->hba_flag & HBA_IOQ_FLUSH ||
-	    !phba->sli4_hba.hdwq) {
-		spin_unlock_irq(&phba->hbalock);
-		return;
-	}
 	/* Indicate the I/O queues are flushed */
 	phba->hba_flag |= HBA_IOQ_FLUSH;
 	spin_unlock_irq(&phba->hbalock);
@@ -10997,6 +10998,10 @@ lpfc_sli_issue_iocb(struct lpfc_hba *phba, uint32_t ring_number,
 	unsigned long iflags;
 	int rc;
 
+	/* If the PCI channel is in offline state, do not post iocbs. */
+	if (unlikely(pci_channel_offline(phba->pcidev)))
+		return IOCB_ERROR;
+
 	if (phba->sli_rev == LPFC_SLI_REV4) {
 		lpfc_sli_prep_wqe(phba, piocb);
 

From patchwork Thu Mar 17 03:27:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Smart <jsmart2021@gmail.com>
X-Patchwork-Id: 552474
Return-Path: <linux-scsi-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 5B2C9C433EF
 for <linux-scsi@archiver.kernel.org>; Thu, 17 Mar 2022 03:28:25 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S231322AbiCQD3i (ORCPT <rfc822;linux-scsi@archiver.kernel.org>);
 Wed, 16 Mar 2022 23:29:38 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48320 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S230019AbiCQD32 (ORCPT
 <rfc822;linux-scsi@vger.kernel.org>); Wed, 16 Mar 2022 23:29:28 -0400
Received: from mail-pf1-x429.google.com (mail-pf1-x429.google.com
 [IPv6:2607:f8b0:4864:20::429])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C4F41A822
 for <linux-scsi@vger.kernel.org>;
 Wed, 16 Mar 2022 20:28:05 -0700 (PDT)
Received: by mail-pf1-x429.google.com with SMTP id d19so5746045pfv.7
 for <linux-scsi@vger.kernel.org>;
 Wed, 16 Mar 2022 20:28:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=from:to:cc:subject:date:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=AHh4nQqf7Y/RrT3b7PuiNJBygSME1QGieUa1Ks6mEeA=;
 b=nfvSvZwNMVOpQXAk8ti7odHQSrJdG9Z5MHIq/tfcI9C99kJ0wp+y64lGPhx9fo2f6d
 7t7OOvWalfW3LTXKiUF9B81mksBkRRm+LMgm8TGi6ybv6mnxAjTq1QwMquR0FOGL3B8q
 ceie7W0iA5k/5QnP1TbatfDE6uElXgqyvqvHAQ/h3E6QneQYTgZVNPKBu2v1Nh970Shh
 OVXnJcRcit4rTD04SvxNq8fQPvqzzCcqNnMkLsuOzQs+VhmT+ryMzxkLWxTZrb8ZlEO4
 8I6iq0G4ujpKbMaG4UcXyUD9jcdojcKF+gD8hrquuaZE5Ko7SwHEfSE2ChV8I9M6hfPL
 P5xw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=AHh4nQqf7Y/RrT3b7PuiNJBygSME1QGieUa1Ks6mEeA=;
 b=okyv57DP6QK8loNFI6oScVJtZCufI9H5Kd/jj0Wi98KX8tj/7oQeGD1j94ipiQuNgm
 EcemxqYt7TNAZmt4YyJcCKPYrEZF6dWEgpM44btcOfJR3GbtvlDop+znPhV9katqE6ih
 HMDdi3vg64AAsBxU5p/EWcDxAnCn5Im2joyPBKMHQ7gGo4MGTZOMf9W37uf2h5W6q4Ep
 kpPpy6OkUDhxCqV4eYXhJp8qImekfSyRDUMz7Sk3/imfp8dMlAB15no2Afsf91YUqnid
 OUeI82H2XQYuqxJ72+IrskNXDk/7SpIimBMfTLMKlFX8PtSa0jC+f2H6ag9RdFatZzDo
 xo9g==
X-Gm-Message-State: AOAM5304yrQkZkvq/wAwRWP5sbRCz5IDAy7MUg70AKceNrFxKA8duJMK
 haoBAVdhVZPTP+uiP3DQU5tsXOauqBQ=
X-Google-Smtp-Source: ABdhPJybYkaA2X4xFLspNyHtFsFjm+TkAZu4bxRDExjZ1KIbhLPPpaRXG9O9xUPYSJEJyEcrcfG+Uw==
X-Received: by 2002:a63:20f:0:b0:378:9f08:a0a4 with SMTP id
 15-20020a63020f000000b003789f08a0a4mr2094712pgc.17.1647487669701;
 Wed, 16 Mar 2022 20:27:49 -0700 (PDT)
Received: from mail-lvn-it-01.broadcom.com (ip174-67-196-173.oc.oc.cox.net.
 [174.67.196.173]) by smtp.gmail.com with ESMTPSA id
 k11-20020a056a00168b00b004f7e1555538sm5017511pfc.190.2022.03.16.20.27.49
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Wed, 16 Mar 2022 20:27:49 -0700 (PDT)
From: James Smart <jsmart2021@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: James Smart <jsmart2021@gmail.com>, Justin Tee <justin.tee@broadcom.com>
Subject: [PATCH 2/4] lpfc: Fix unload hang after back to back PCI EEH faults
Date: Wed, 16 Mar 2022 20:27:35 -0700
Message-Id: <20220317032737.45308-3-jsmart2021@gmail.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20220317032737.45308-1-jsmart2021@gmail.com>
References: <20220317032737.45308-1-jsmart2021@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-scsi.vger.kernel.org>
X-Mailing-List: linux-scsi@vger.kernel.org

When injecting eeh errors the port is getting hung up waiting on the node
list to empty, message number 0233. The driver is stuck at this point and
also can't unload. The driver makes transport remoteport delete calls
which try to abort I/O's, but the eeh daemon has already called the
driver to detach and the detachment has set the global FC_UNLOADING flag.
There are several code paths that will avoid I/O cleanup if the
FC_UNLOADING flag is set, resulting in transports waiting for I/O while
the driver is waiting on transports to clean up.

Additionally, During study of the list, a locking issue was found in
lpfc_sli_abort_iocb_ring that could corrupt the list.

To Fix:
A special case was added to the lpfc_cleanup routine to call
  lpfc_sli_flush_rings if the driver is FC_UNLOADING and if the pci-slot
  is offline (e.g. eeh).
Changed the sli4 part of lpfc_sli_abort_iocb_ring to use the ring_lock.
  Also added code to cancel the IOs if the pci-slot is offline.
Added checks and returns for the FC_UNLOADING and HBA_IOQ_FLUSH flags
  to prevent trying to send an IO that we cannot handle.

Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_hbadisc.c |  1 +
 drivers/scsi/lpfc/lpfc_init.c    | 26 +++++++++++++++--
 drivers/scsi/lpfc/lpfc_nvme.c    | 16 ++++++++--
 drivers/scsi/lpfc/lpfc_sli.c     | 50 ++++++++++++++++++++++----------
 4 files changed, 72 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index 6983c70f2fc6..2b877dff5ed4 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -5422,6 +5422,7 @@ lpfc_unreg_rpi(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
 				ndlp->nlp_flag &= ~NLP_UNREG_INP;
 				mempool_free(mbox, phba->mbox_mem_pool);
 				acc_plogi = 1;
+				lpfc_nlp_put(ndlp);
 			}
 		} else {
 			lpfc_printf_vlog(vport, KERN_INFO,
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index b8ab6dcbadc5..cf83bc0e27c0 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -95,6 +95,7 @@ static void lpfc_sli4_oas_verify(struct lpfc_hba *phba);
 static uint16_t lpfc_find_cpu_handle(struct lpfc_hba *, uint16_t, int);
 static void lpfc_setup_bg(struct lpfc_hba *, struct Scsi_Host *);
 static int lpfc_sli4_cgn_parm_chg_evt(struct lpfc_hba *);
+static void lpfc_sli4_prep_dev_for_reset(struct lpfc_hba *phba);
 
 static struct scsi_transport_template *lpfc_transport_template = NULL;
 static struct scsi_transport_template *lpfc_vport_transport_template = NULL;
@@ -1985,6 +1986,7 @@ lpfc_handle_eratt_s4(struct lpfc_hba *phba)
 	if (pci_channel_offline(phba->pcidev)) {
 		lpfc_printf_log(phba, KERN_ERR, LOG_TRACE_EVENT,
 				"3166 pci channel is offline\n");
+		lpfc_sli_flush_io_rings(phba);
 		return;
 	}
 
@@ -2973,6 +2975,22 @@ lpfc_cleanup(struct lpfc_vport *vport)
 					NLP_EVT_DEVICE_RM);
 	}
 
+	/* This is a special case flush to return all
+	 * IOs before entering this loop. There are
+	 * two points in the code where a flush is
+	 * avoided if the FC_UNLOADING flag is set.
+	 * one is in the multipool destroy,
+	 * (this prevents a crash) and the other is
+	 * in the nvme abort handler, ( also prevents
+	 * a crash). Both of these exceptions are
+	 * cases where the slot is still accessible.
+	 * The flush here is only when the pci slot
+	 * is offline.
+	 */
+	if (vport->load_flag & FC_UNLOADING &&
+	    pci_channel_offline(phba->pcidev))
+		lpfc_sli_flush_io_rings(vport->phba);
+
 	/* At this point, ALL ndlp's should be gone
 	 * because of the previous NLP_EVT_DEVICE_RM.
 	 * Lets wait for this to happen, if needed.
@@ -2985,7 +3003,7 @@ lpfc_cleanup(struct lpfc_vport *vport)
 			list_for_each_entry_safe(ndlp, next_ndlp,
 						&vport->fc_nodes, nlp_listp) {
 				lpfc_printf_vlog(ndlp->vport, KERN_ERR,
-						 LOG_TRACE_EVENT,
+						 LOG_DISCOVERY,
 						 "0282 did:x%x ndlp:x%px "
 						 "refcnt:%d xflags x%x nflag x%x\n",
 						 ndlp->nlp_DID, (void *)ndlp,
@@ -13359,8 +13377,9 @@ lpfc_sli4_hba_unset(struct lpfc_hba *phba)
 	/* Abort all iocbs associated with the hba */
 	lpfc_sli_hba_iocb_abort(phba);
 
-	/* Wait for completion of device XRI exchange busy */
-	lpfc_sli4_xri_exchange_busy_wait(phba);
+	if (!pci_channel_offline(phba->pcidev))
+		/* Wait for completion of device XRI exchange busy */
+		lpfc_sli4_xri_exchange_busy_wait(phba);
 
 	/* per-phba callback de-registration for hotplug event */
 	if (phba->pport)
@@ -14264,6 +14283,7 @@ lpfc_sli_prep_dev_for_perm_failure(struct lpfc_hba *phba)
 			"2711 PCI channel permanent disable for failure\n");
 	/* Block all SCSI devices' I/Os on the host */
 	lpfc_scsi_dev_block(phba);
+	lpfc_sli4_prep_dev_for_reset(phba);
 
 	/* stop all timers */
 	lpfc_stop_hba_timers(phba);
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index e47205e0d3e2..8d26f207ebd2 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -93,6 +93,11 @@ lpfc_nvme_create_queue(struct nvme_fc_local_port *pnvme_lport,
 
 	lport = (struct lpfc_nvme_lport *)pnvme_lport->private;
 	vport = lport->vport;
+
+	if (!vport || vport->load_flag & FC_UNLOADING ||
+	    vport->phba->hba_flag & HBA_IOQ_FLUSH)
+		return -ENODEV;
+
 	qhandle = kzalloc(sizeof(struct lpfc_nvme_qhandle), GFP_KERNEL);
 	if (qhandle == NULL)
 		return -ENOMEM;
@@ -267,7 +272,8 @@ lpfc_nvme_handle_lsreq(struct lpfc_hba *phba,
 		return -EINVAL;
 
 	remoteport = lpfc_rport->remoteport;
-	if (!vport->localport)
+	if (!vport->localport ||
+	    vport->phba->hba_flag & HBA_IOQ_FLUSH)
 		return -EINVAL;
 
 	lport = vport->localport->private;
@@ -559,6 +565,8 @@ __lpfc_nvme_ls_req(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 				 ndlp->nlp_DID, ntype, nstate);
 		return -ENODEV;
 	}
+	if (vport->phba->hba_flag & HBA_IOQ_FLUSH)
+		return -ENODEV;
 
 	if (!vport->phba->sli4_hba.nvmels_wq)
 		return -ENOMEM;
@@ -662,7 +670,8 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
 		return -EINVAL;
 
 	vport = lport->vport;
-	if (vport->load_flag & FC_UNLOADING)
+	if (vport->load_flag & FC_UNLOADING ||
+	    vport->phba->hba_flag & HBA_IOQ_FLUSH)
 		return -ENODEV;
 
 	atomic_inc(&lport->fc4NvmeLsRequests);
@@ -1516,7 +1525,8 @@ lpfc_nvme_fcp_io_submit(struct nvme_fc_local_port *pnvme_lport,
 
 	phba = vport->phba;
 
-	if (unlikely(vport->load_flag & FC_UNLOADING)) {
+	if ((unlikely(vport->load_flag & FC_UNLOADING)) ||
+	    phba->hba_flag & HBA_IOQ_FLUSH) {
 		lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_IOERR,
 				 "6124 Fail IO, Driver unload\n");
 		atomic_inc(&lport->xmt_fcp_err);
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 26f6a147b5ae..70c3929a9fb4 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -4542,42 +4542,62 @@ lpfc_sli_handle_slow_ring_event_s4(struct lpfc_hba *phba,
 void
 lpfc_sli_abort_iocb_ring(struct lpfc_hba *phba, struct lpfc_sli_ring *pring)
 {
-	LIST_HEAD(completions);
+	LIST_HEAD(tx_completions);
+	LIST_HEAD(txcmplq_completions);
 	struct lpfc_iocbq *iocb, *next_iocb;
+	int offline;
 
 	if (pring->ringno == LPFC_ELS_RING) {
 		lpfc_fabric_abort_hba(phba);
 	}
+	offline = pci_channel_offline(phba->pcidev);
 
 	/* Error everything on txq and txcmplq
 	 * First do the txq.
 	 */
 	if (phba->sli_rev >= LPFC_SLI_REV4) {
 		spin_lock_irq(&pring->ring_lock);
-		list_splice_init(&pring->txq, &completions);
+		list_splice_init(&pring->txq, &tx_completions);
 		pring->txq_cnt = 0;
-		spin_unlock_irq(&pring->ring_lock);
 
-		spin_lock_irq(&phba->hbalock);
-		/* Next issue ABTS for everything on the txcmplq */
-		list_for_each_entry_safe(iocb, next_iocb, &pring->txcmplq, list)
-			lpfc_sli_issue_abort_iotag(phba, pring, iocb, NULL);
-		spin_unlock_irq(&phba->hbalock);
+		if (offline) {
+			list_splice_init(&pring->txcmplq,
+					 &txcmplq_completions);
+		} else {
+			/* Next issue ABTS for everything on the txcmplq */
+			list_for_each_entry_safe(iocb, next_iocb,
+						 &pring->txcmplq, list)
+				lpfc_sli_issue_abort_iotag(phba, pring,
+							   iocb, NULL);
+		}
+		spin_unlock_irq(&pring->ring_lock);
 	} else {
 		spin_lock_irq(&phba->hbalock);
-		list_splice_init(&pring->txq, &completions);
+		list_splice_init(&pring->txq, &tx_completions);
 		pring->txq_cnt = 0;
 
-		/* Next issue ABTS for everything on the txcmplq */
-		list_for_each_entry_safe(iocb, next_iocb, &pring->txcmplq, list)
-			lpfc_sli_issue_abort_iotag(phba, pring, iocb, NULL);
+		if (offline) {
+			list_splice_init(&pring->txcmplq, &txcmplq_completions);
+		} else {
+			/* Next issue ABTS for everything on the txcmplq */
+			list_for_each_entry_safe(iocb, next_iocb,
+						 &pring->txcmplq, list)
+				lpfc_sli_issue_abort_iotag(phba, pring,
+							   iocb, NULL);
+		}
 		spin_unlock_irq(&phba->hbalock);
 	}
-	/* Make sure HBA is alive */
-	lpfc_issue_hb_tmo(phba);
 
+	if (offline) {
+		/* Cancel all the IOCBs from the completions list */
+		lpfc_sli_cancel_iocbs(phba, &txcmplq_completions,
+				      IOSTAT_LOCAL_REJECT, IOERR_SLI_ABORTED);
+	} else {
+		/* Make sure HBA is alive */
+		lpfc_issue_hb_tmo(phba);
+	}
 	/* Cancel all the IOCBs from the completions list */
-	lpfc_sli_cancel_iocbs(phba, &completions, IOSTAT_LOCAL_REJECT,
+	lpfc_sli_cancel_iocbs(phba, &tx_completions, IOSTAT_LOCAL_REJECT,
 			      IOERR_SLI_ABORTED);
 }
 

From patchwork Thu Mar 17 03:27:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Smart <jsmart2021@gmail.com>
X-Patchwork-Id: 553160
Return-Path: <linux-scsi-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 56FA7C433FE
 for <linux-scsi@archiver.kernel.org>; Thu, 17 Mar 2022 03:28:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S231327AbiCQD3j (ORCPT <rfc822;linux-scsi@archiver.kernel.org>);
 Wed, 16 Mar 2022 23:29:39 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48322 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S229818AbiCQD33 (ORCPT
 <rfc822;linux-scsi@vger.kernel.org>); Wed, 16 Mar 2022 23:29:29 -0400
Received: from mail-pg1-x52d.google.com (mail-pg1-x52d.google.com
 [IPv6:2607:f8b0:4864:20::52d])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 52A1B22B18
 for <linux-scsi@vger.kernel.org>;
 Wed, 16 Mar 2022 20:28:07 -0700 (PDT)
Received: by mail-pg1-x52d.google.com with SMTP id e6so1668530pgn.2
 for <linux-scsi@vger.kernel.org>;
 Wed, 16 Mar 2022 20:28:07 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=from:to:cc:subject:date:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=khuzMAwy7oWcqRIw4zoNf8o7dmNh1+plsVLzDS9L6Qk=;
 b=qv/9/3onbgmJU5nr2j9K7dqbGlsWxSQP+VZHelWHpiLQC6FgmadNGFlnhmYRDHh6PY
 Jcuf04DAHCrgIl9trfrXmJaM16kx6QZbXMmtXvDKhZO0hwNAmpT5MN+i0JKosUUAN4di
 vSXOhZxQE5X5zOH2zXSG81rSf3bnJeoEFuUWmyvTVo9rc+hENmwBcmyvj3d8ElqKPW0R
 R9KPJn74rRWvBzKiuCbcegZPeYFau4/h4EPEucKOeqAN4rZmclYmChGKRCwcPZ3S+/2U
 5uUxKhqtb9C6uipoNIy9BIKp5dA8iwbdhSQcK0QJeWAYYR74zT+Kesm6MKZof0Yg8Sry
 uJ+A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=khuzMAwy7oWcqRIw4zoNf8o7dmNh1+plsVLzDS9L6Qk=;
 b=olEDKGlrrVbdh/bjl1GyaRh0Ww89q3HV153yAaG3Bq3ibeSfBL5zJj9NRoyLh1enFM
 o/Lmn4aJw4ngbYTlsThn/4+r6+/HvNgMdwFilFIGG8hrDyLglObFKmR7o2FcsG9NSGoN
 iTQviJfx1BqgSSJTivuMc5PVFxMxaAcboaw+sOeZ5dir7JugG6Y5itf+j8jzgazOCYnY
 uuvTPJsBV2/Kl4VR9rAUqbORI2REsC68+OnLRazommiMyhAF1krY8u0e9R8S/Ssvqu8A
 vOvl2OUIXv3pqZDCwh8zFTpKFAwN5ShU0KO8K6MIwuss1ZuGqItu57f+GywO+T1qrDtI
 pw7A==
X-Gm-Message-State: AOAM530Hz1bAdGmmzrJexIdKohWL6hp0jF7eDl7foEtGOBP4ITUN2LVp
 XItvhbAqAIxc9itf04phxISJAT0DHd4=
X-Google-Smtp-Source: ABdhPJwP4hoabfLqb9tgDiFvEMrctH1GTXZvu3fb89y0VnHZf6Q4Eawy/3X4YlDS53ooRS43pupjrA==
X-Received: by 2002:a63:114:0:b0:380:29f7:a97a with SMTP id
 20-20020a630114000000b0038029f7a97amr2048259pgb.361.1647487670567;
 Wed, 16 Mar 2022 20:27:50 -0700 (PDT)
Received: from mail-lvn-it-01.broadcom.com (ip174-67-196-173.oc.oc.cox.net.
 [174.67.196.173]) by smtp.gmail.com with ESMTPSA id
 k11-20020a056a00168b00b004f7e1555538sm5017511pfc.190.2022.03.16.20.27.49
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Wed, 16 Mar 2022 20:27:50 -0700 (PDT)
From: James Smart <jsmart2021@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: James Smart <jsmart2021@gmail.com>, Justin Tee <justin.tee@broadcom.com>
Subject: [PATCH 3/4] lpfc: Fix queue failures when recovering from PCI parity
 error
Date: Wed, 16 Mar 2022 20:27:36 -0700
Message-Id: <20220317032737.45308-4-jsmart2021@gmail.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20220317032737.45308-1-jsmart2021@gmail.com>
References: <20220317032737.45308-1-jsmart2021@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-scsi.vger.kernel.org>
X-Mailing-List: linux-scsi@vger.kernel.org

When recovering from a pci-parity error the driver is failing to
re-create queues, causing recovery to fail. Looking deeper, it was
found tha the interrupt vector count allocated on the recovery is
fewer than the vectors originally allocated. This disparity resulted
in cpu map entries with stale information. When the driver tries to
re-create the queues, it attempts to use the stale information which
indicates an eq/interrupt vector that was no longer created.

Fix by clearng the cpup map array before enabling and requesting the
irqs in the lpfc_sli_reset_slot_s4 routine.

Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index cf83bc0e27c0..461d333b1b3a 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -15236,6 +15236,8 @@ lpfc_io_slot_reset_s4(struct pci_dev *pdev)
 	psli->sli_flag &= ~LPFC_SLI_ACTIVE;
 	spin_unlock_irq(&phba->hbalock);
 
+	/* Init cpu_map array */
+	lpfc_cpu_map_array_init(phba);
 	/* Configure and enable interrupt */
 	intr_mode = lpfc_sli4_enable_intr(phba, phba->intr_mode);
 	if (intr_mode == LPFC_INTR_ERROR) {

From patchwork Thu Mar 17 03:27:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Smart <jsmart2021@gmail.com>
X-Patchwork-Id: 552473
Return-Path: <linux-scsi-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 8294FC433F5
 for <linux-scsi@archiver.kernel.org>; Thu, 17 Mar 2022 03:28:29 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S231521AbiCQD3l (ORCPT <rfc822;linux-scsi@archiver.kernel.org>);
 Wed, 16 Mar 2022 23:29:41 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45958 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S229899AbiCQD3c (ORCPT
 <rfc822;linux-scsi@vger.kernel.org>); Wed, 16 Mar 2022 23:29:32 -0400
Received: from mail-pf1-x429.google.com (mail-pf1-x429.google.com
 [IPv6:2607:f8b0:4864:20::429])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 550DC2459F
 for <linux-scsi@vger.kernel.org>;
 Wed, 16 Mar 2022 20:28:09 -0700 (PDT)
Received: by mail-pf1-x429.google.com with SMTP id t2so5735890pfj.10
 for <linux-scsi@vger.kernel.org>;
 Wed, 16 Mar 2022 20:28:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112;
 h=from:to:cc:subject:date:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding;
 bh=iCHyUsk0ZznoID/NdRfLq/nAsK4oYMYrFSCQKyku1/0=;
 b=BxqA0rthWP/bxNRz2x9m9CWLaHKynu87dQBrMQiN+HSF9p5elxAo5pbhbaj4t8R7Pd
 hrbA6AnS1X8QQxzuQ2mrK/2oYYkvK1EriFxCljpOgIrTuta7f66ax/sS9VoTAyGu7eDa
 VT8CQ+SLPLl3vSjfbfYEaHu77PasR7GOOC9D+OTugGSbM/1joTLb2nVuWnrdyQuxTrTU
 wR4m4x2VZpEu3zohTKsOy0b52t9jDHmx43e9KmoX+qJqd5r8oFC43jy+4Gowd84vXWt6
 3MnuobFoG9RLZAZM/6Zd5GmtNR5TyYf2yxKJ1SOVw/Ogc+Xj3x1ZGCu2/lZ+umGWhf6p
 hkCw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=iCHyUsk0ZznoID/NdRfLq/nAsK4oYMYrFSCQKyku1/0=;
 b=06nqC7qJS/rKXEbQtdRxiA6lgrn4uh0aSzmnuT8asMDb/t75TSuIP30r5SYMFJXVVQ
 z3ELLi88WARwbhVYYgQvT3cQiyUfgSPCW3grjcDf3G9IlelcuQ2ZQyaab4dN3OxXylls
 PuQLNp1RMi5/GFNHbmkfhuiWZDCgufC843Ej8J0/VEZ7riJXZ0wBlPVgGm5CDtVtvJfH
 DCP7AnUynTI7PnzYQnBTlBQo8OQk/AK88/0HN1XTcZgJwMHRydrNFzCqJHA7qmr2nJSJ
 LI2mo80hdmNXmSTi1Tsbz9VrksSpc6EKXjfTtodJO38EHmJTCBUg8XQFi3ncbyVCxISD
 6tVQ==
X-Gm-Message-State: AOAM533WmCc6Kprt6EDVzwE0lgafXwdnATKhN8QdjX8TEx5utyT0SyJi
 kVTtOAxrrASiQd1T+7DgN+5hLFO2Eok=
X-Google-Smtp-Source: ABdhPJwKsecya+AOWhZZ+MdTK7NI19HTOYbMmt4l3Ha5YZlPGYeblPUR0xLDkKyhQFb6Dm7bOWpW7A==
X-Received: by 2002:a63:cf09:0:b0:372:d564:8024 with SMTP id
 j9-20020a63cf09000000b00372d5648024mr2079076pgg.251.1647487671689;
 Wed, 16 Mar 2022 20:27:51 -0700 (PDT)
Received: from mail-lvn-it-01.broadcom.com (ip174-67-196-173.oc.oc.cox.net.
 [174.67.196.173]) by smtp.gmail.com with ESMTPSA id
 k11-20020a056a00168b00b004f7e1555538sm5017511pfc.190.2022.03.16.20.27.50
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Wed, 16 Mar 2022 20:27:51 -0700 (PDT)
From: James Smart <jsmart2021@gmail.com>
To: linux-scsi@vger.kernel.org
Cc: James Smart <jsmart2021@gmail.com>, Justin Tee <justin.tee@broadcom.com>
Subject: [PATCH 4/4] lpfc: Update lpfc version to 14.2.0.1
Date: Wed, 16 Mar 2022 20:27:37 -0700
Message-Id: <20220317032737.45308-5-jsmart2021@gmail.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20220317032737.45308-1-jsmart2021@gmail.com>
References: <20220317032737.45308-1-jsmart2021@gmail.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-scsi.vger.kernel.org>
X-Mailing-List: linux-scsi@vger.kernel.org

Update lpfc version to 14.2.0.1

Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
---
 drivers/scsi/lpfc/lpfc_version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_version.h b/drivers/scsi/lpfc/lpfc_version.h
index e52f37e5d896..a4d3259b8c52 100644
--- a/drivers/scsi/lpfc/lpfc_version.h
+++ b/drivers/scsi/lpfc/lpfc_version.h
@@ -20,7 +20,7 @@
  * included with this package.                                     *
  *******************************************************************/
 
-#define LPFC_DRIVER_VERSION "14.2.0.0"
+#define LPFC_DRIVER_VERSION "14.2.0.1"
 #define LPFC_DRIVER_NAME		"lpfc"
 
 /* Used for SLI 2/3 */