[3/9] lpfc: Account for fabric domain ctlr device loss recovery

Message ID 20230523183206.7728-4-justintee8345@gmail.com
State New
Series lpfc: Update lpfc to revision 14.2.0.13

Commit Message

Justin Tee May 23, 2023, 6:32 p.m. UTC
Pre-existing device loss recovery logic via the NLP_IN_RECOV_POST_DEV_LOSS
flag only handled Fabric Port Login, Fabric Controller, Management, and
Name Server addresses.

Fabric domain controllers fall under the same category for usage of the
NLP_IN_RECOV_POST_DEV_LOSS flag.  Add a default case statement to mark
an ndlp for device loss recovery.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
---
 drivers/scsi/lpfc/lpfc_hbadisc.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)
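
For readers without the FC address map in their head: Fabric Port Login (Fabric_DID), Fabric Controller (Fabric_Cntl_DID), Management (FDMI_DID), and Name Server (NameServer_DID) are fixed well-known addresses, so the existing switch matches them with case labels. Fabric domain controllers sit at 0xFFFCxx, where the low byte is the switch domain ID, which is why they can only be caught in the default case with a prefix test. The stand-alone sketch below illustrates the distinction; the well-known DID values follow FC-FS and match the lpfc constant names, but the prefix mask used here is a hypothetical stand-in for the driver's Fabric_DID_MASK, not its actual definition.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Well-known FC address identifiers, named as in lpfc (values per FC-FS). */
#define Fabric_DID      0xfffffe  /* fabric port login (F_Port) server */
#define Fabric_Cntl_DID 0xfffffd  /* fabric controller */
#define NameServer_DID  0xfffffc  /* directory (name) server */
#define FDMI_DID        0xfffffa  /* management server */

/* Fabric domain controllers live at 0xfffcXX, where XX is the switch
 * domain ID, so no single case label can match them; a prefix test is
 * required.  The mask below is a hypothetical stand-in for the driver's
 * Fabric_DID_MASK, chosen only to keep this example self-contained.
 */
static bool recovery_candidate(uint32_t did)
{
	switch (did) {
	case Fabric_DID:
	case Fabric_Cntl_DID:
	case NameServer_DID:
	case FDMI_DID:
		return true;    /* fixed well-known addresses */
	default:
		return (did & 0xffff00) == 0xfffc00; /* domain ctlr prefix */
	}
}

int main(void)
{
	printf("%d\n", recovery_candidate(0xfffc02)); /* 1: domain controller */
	printf("%d\n", recovery_candidate(0x010203)); /* 0: ordinary N_Port */
	return 0;
}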

Comments

Martin Wilck May 31, 2023, 4:47 p.m. UTC | #1
On Tue, 2023-05-23 at 11:32 -0700, Justin Tee wrote:
> Pre-existing device loss recovery logic via the NLP_IN_RECOV_POST_DEV_LOSS
> flag only handled Fabric Port Login, Fabric Controller, Management, and
> Name Server addresses.
> 
> Fabric domain controllers fall under the same category for usage of the
> NLP_IN_RECOV_POST_DEV_LOSS flag.  Add a default case statement to mark
> an ndlp for device loss recovery.
> 
> Signed-off-by: Justin Tee <justin.tee@broadcom.com>

This patch fixed a customer issue for us.

Acked-by: Martin Wilck <mwilck@suse.com>


> ---
>  drivers/scsi/lpfc/lpfc_hbadisc.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
> index f99b5c206cdb..a5c69d4bf2e0 100644
> --- a/drivers/scsi/lpfc/lpfc_hbadisc.c
> +++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
> @@ -458,11 +458,9 @@ lpfc_dev_loss_tmo_handler(struct lpfc_nodelist *ndlp)
>         if (ndlp->nlp_type & NLP_FABRIC) {
>                 spin_lock_irqsave(&ndlp->lock, iflags);
>  
> -               /* In massive vport configuration settings or when the FLOGI
> -                * completes with a sequence timeout, it's possible
> -                * dev_loss_tmo fired during node recovery.  The driver has to
> -                * account for this race to allow for recovery and keep
> -                * the reference counting correct.
> +               /* The driver has to account for a race between any fabric
> +                * node that's in recovery when dev_loss_tmo expires. When this
> +                * happens, the driver has to allow node recovery.
>                  */
>                 switch (ndlp->nlp_DID) {
>                 case Fabric_DID:
> @@ -489,6 +487,17 @@ lpfc_dev_loss_tmo_handler(struct lpfc_nodelist *ndlp)
>                             ndlp->nlp_state <= NLP_STE_REG_LOGIN_ISSUE)
>                                 recovering = true;
>                         break;
> +               default:
> +                       /* Ensure the nlp_DID at least has the correct prefix.
> +                        * The fabric domain controller's last three nibbles
> +                        * vary so we handle it in the default case.
> +                        */
> +                       if (ndlp->nlp_DID & Fabric_DID_MASK) {
> +                               if (ndlp->nlp_state >= NLP_STE_PLOGI_ISSUE &&
> +                                   ndlp->nlp_state <= NLP_STE_REG_LOGIN_ISSUE)
> +                                       recovering = true;
> +                       }
> +                       break;
>                 }
>                 spin_unlock_irqrestore(&ndlp->lock, iflags);
>

Patch

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index f99b5c206cdb..a5c69d4bf2e0 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -458,11 +458,9 @@ lpfc_dev_loss_tmo_handler(struct lpfc_nodelist *ndlp)
 	if (ndlp->nlp_type & NLP_FABRIC) {
 		spin_lock_irqsave(&ndlp->lock, iflags);
 
-		/* In massive vport configuration settings or when the FLOGI
-		 * completes with a sequence timeout, it's possible
-		 * dev_loss_tmo fired during node recovery.  The driver has to
-		 * account for this race to allow for recovery and keep
-		 * the reference counting correct.
+		/* The driver has to account for a race between any fabric
+		 * node that's in recovery when dev_loss_tmo expires. When this
+		 * happens, the driver has to allow node recovery.
 		 */
 		switch (ndlp->nlp_DID) {
 		case Fabric_DID:
@@ -489,6 +487,17 @@ lpfc_dev_loss_tmo_handler(struct lpfc_nodelist *ndlp)
 			    ndlp->nlp_state <= NLP_STE_REG_LOGIN_ISSUE)
 				recovering = true;
 			break;
+		default:
+			/* Ensure the nlp_DID at least has the correct prefix.
+			 * The fabric domain controller's last three nibbles
+			 * vary so we handle it in the default case.
+			 */
+			if (ndlp->nlp_DID & Fabric_DID_MASK) {
+				if (ndlp->nlp_state >= NLP_STE_PLOGI_ISSUE &&
+				    ndlp->nlp_state <= NLP_STE_REG_LOGIN_ISSUE)
+					recovering = true;
+			}
+			break;
 		}
 		spin_unlock_irqrestore(&ndlp->lock, iflags);
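
Both the pre-existing case labels and the new default case gate "recovering" on the same state window: a node only counts as in recovery while a (re)login is actually in flight, from PLOGI issue through REG_LOGIN issue. A minimal sketch of that window test follows; the enum ordering mirrors lpfc_disc.h, but the exact values and the trimmed state list are assumptions for illustration.

#include <stdbool.h>
#include <stdio.h>

/* Partial node-state list in discovery order; the ordering mirrors
 * lpfc_disc.h, but treat the exact values here as an assumption.
 */
enum nlp_state {
	NLP_STE_UNUSED_NODE,		/* 0 */
	NLP_STE_PLOGI_ISSUE,		/* 1: PLOGI sent, completion pending */
	NLP_STE_ADISC_ISSUE,		/* 2: ADISC sent */
	NLP_STE_REG_LOGIN_ISSUE,	/* 3: REG_LOGIN mailbox issued */
	NLP_STE_PRLI_ISSUE,		/* 4: login complete, PRLI sent */
};

/* True only while a (re)login is actually in flight. */
static bool in_login_window(enum nlp_state state)
{
	return state >= NLP_STE_PLOGI_ISSUE &&
	       state <= NLP_STE_REG_LOGIN_ISSUE;
}

int main(void)
{
	printf("%d\n", in_login_window(NLP_STE_ADISC_ISSUE)); /* 1 */
	printf("%d\n", in_login_window(NLP_STE_PRLI_ISSUE));  /* 0 */
	return 0;
}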