mbox series

[00/15] qla2xxx bug fixes

Message ID 20201201082730.24158-1-njavali@marvell.com
Headers show
Series qla2xxx bug fixes | expand

Message

Nilesh Javali Dec. 1, 2020, 8:27 a.m. UTC
Martin,
Please apply the qla2xxx bug fixes to the scsi tree at your
earliest convenience.

Thanks,
Nilesh

Arun Easi (5):
  qla2xxx: Fix compilation issue in PPC systems
  qla2xxx: Fix crash during driver load on big endian machines
  qla2xxx: Fix FW initialization error on big endian machines
  qla2xxx: Fix flash update in 28XX adapters on big endian machines
  qla2xxx: Fix device loss on 4G and older HBAs.

Daniel Wagner (1):
  scsi: qla2xxx: Return EBUSY on fcport deletion

Nilesh Javali (1):
  qla2xxx: Update version to 10.02.00.104-k

Quinn Tran (3):
  qla2xxx: limit interrupt vectors to number of cpu
  qla2xxx: tear down session if FW say its down
  qla2xxx: fix N2N and NVME connect retry failure

Saurav Kashyap (5):
  qla2xxx: Change post del message from debug level to log level
  qla2xxx: Don't check for fw_started while posting nvme command
  qla2xxx: Handle aborts correctly for port undergoing deletion
  qla2xxx: Fix the call trace for flush workqueue
  qla2xxx: If fcport is undergoing deletion return IO with retry

 drivers/scsi/qla2xxx/qla_gs.c      |  8 ++--
 drivers/scsi/qla2xxx/qla_init.c    | 74 ++++++++++++++++++++++--------
 drivers/scsi/qla2xxx/qla_isr.c     | 34 ++++++++++++--
 drivers/scsi/qla2xxx/qla_mbx.c     |  9 ++--
 drivers/scsi/qla2xxx/qla_nvme.c    | 14 +++---
 drivers/scsi/qla2xxx/qla_nx.c      |  2 +-
 drivers/scsi/qla2xxx/qla_nx2.c     |  4 +-
 drivers/scsi/qla2xxx/qla_os.c      | 10 ++--
 drivers/scsi/qla2xxx/qla_sup.c     | 10 ++--
 drivers/scsi/qla2xxx/qla_tmpl.c    |  9 ++--
 drivers/scsi/qla2xxx/qla_tmpl.h    |  2 +-
 drivers/scsi/qla2xxx/qla_version.h |  4 +-
 12 files changed, 120 insertions(+), 60 deletions(-)


base-commit: cf4d4d8ebdb838ee996e09e3ee18deb9a7737dea

Comments

Saurav Kashyap Dec. 1, 2020, 9:39 a.m. UTC | #1
Hi Daniel,
Comments inline..

> -----Original Message-----
> From: Daniel Wagner <dwagner@suse.de>
> Sent: Tuesday, December 1, 2020 2:32 PM
> To: Nilesh Javali <njavali@marvell.com>
> Cc: martin.petersen@oracle.com; linux-scsi@vger.kernel.org; GR-QLogic-
> Storage-Upstream <GR-QLogic-Storage-Upstream@marvell.com>
> Subject: Re: [PATCH 05/15] qla2xxx: Don't check for fw_started while posting
> nvme command
> 
> On Tue, Dec 01, 2020 at 12:27:20AM -0800, Nilesh Javali wrote:
> > From: Saurav Kashyap <skashyap@marvell.com>
> >
> > NVMe commands can come only after successful addition of rport and nvme
> > connect, and rport is only registered after FW started bit is set. Remove the
> > redundant check.
> >
> > Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
> > Signed-off-by: Nilesh Javali <njavali@marvell.com>
> > ---
> >  drivers/scsi/qla2xxx/qla_nvme.c | 12 ++++--------
> >  1 file changed, 4 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/scsi/qla2xxx/qla_nvme.c
> b/drivers/scsi/qla2xxx/qla_nvme.c
> > index b7a1dc24db38..d4159d5a4ffd 100644
> > --- a/drivers/scsi/qla2xxx/qla_nvme.c
> > +++ b/drivers/scsi/qla2xxx/qla_nvme.c
> > @@ -554,19 +554,15 @@ static int qla_nvme_post_cmd(struct
> nvme_fc_local_port *lport,
> >
> >  	fcport = qla_rport->fcport;
> >
> > -	if (!qpair || !fcport)
> > -		return -ENODEV;
> > -
> > -	if (!qpair->fw_started || fcport->deleted)
> > +	if (unlikely(!qpair || !fcport || fcport->deleted))
> >  		return -EBUSY;
> 
> This reverts the fix from patch #1 in this series. What's the reasoning
> that needs to return EBUSY when !qpair || !fcport is true?
<SK> Ideally driver should not hit (!qpair || !fcport) case.  The patch was to remove fw_started flag and consolidate other checks.
We want IO to retry until remote port is deleted and below condition is hit.
----------<condition>--------
        if (!(fcport->nvme_flag & NVME_FLAG_REGISTERED))
                return rval;
----------<condition>---------

Thanks,
~Saurav
Daniel Wagner Dec. 1, 2020, 9:53 a.m. UTC | #2
Hi Saurav

On Tue, Dec 01, 2020 at 09:39:05AM +0000, Saurav Kashyap wrote:
> > > --- a/drivers/scsi/qla2xxx/qla_nvme.c
> > > +++ b/drivers/scsi/qla2xxx/qla_nvme.c
> > > @@ -554,19 +554,15 @@ static int qla_nvme_post_cmd(struct
> > nvme_fc_local_port *lport,
> > >
> > >  	fcport = qla_rport->fcport;
> > >
> > > -	if (!qpair || !fcport)
> > > -		return -ENODEV;
> > > -
> > > -	if (!qpair->fw_started || fcport->deleted)
> > > +	if (unlikely(!qpair || !fcport || fcport->deleted))
> > >  		return -EBUSY;
> > 
> > This reverts the fix from patch #1 in this series. What's the reasoning
> > that needs to return EBUSY when !qpair || !fcport is true?
>
> Ideally driver should not hit (!qpair || !fcport) case.  The patch was
> to remove fw_started flag and consolidate other checks.

Looking again on the patch I think I got confused.

> We want IO to retry until remote port is deleted and below condition is hit.

The result of this patch is that in EBUSY will be returned in all the
cases, not just for the case of fcport->deleted. So all is good from my
point of view. Thanks for explaining.

Sorry for the noise.

Thanks,
Daniel
Himanshu Madhani Dec. 1, 2020, 3:40 p.m. UTC | #3
> On Dec 1, 2020, at 2:27 AM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Daniel Wagner <dwagner@suse.de>
> 
> When the fcport is about to be deleted we should return EBUSY instead of
> ENODEV. Only for EBUSY will the request be requeued in a multipath setup.
> 
> Also return EBUSY when the firmware has not yet started to avoid dropping
> the request.
> 
> Link: https://lore.kernel.org/r/20201014073048.36219-1-dwagner@suse.de
> Reviewed-by: Arun Easi <aeasi@marvell.com>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
> drivers/scsi/qla2xxx/qla_nvme.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_nvme.c b/drivers/scsi/qla2xxx/qla_nvme.c
> index 1f9005125313..b7a1dc24db38 100644
> --- a/drivers/scsi/qla2xxx/qla_nvme.c
> +++ b/drivers/scsi/qla2xxx/qla_nvme.c
> @@ -554,10 +554,12 @@ static int qla_nvme_post_cmd(struct nvme_fc_local_port *lport,
> 
> 	fcport = qla_rport->fcport;
> 
> -	if (!qpair || !fcport || (qpair && !qpair->fw_started) ||
> -	    (fcport && fcport->deleted))
> +	if (!qpair || !fcport)
> 		return -ENODEV;
> 
> +	if (!qpair->fw_started || fcport->deleted)
> +		return -EBUSY;
> +
> 	vha = fcport->vha;
> 
> 	if (!(fcport->nvme_flag & NVME_FLAG_REGISTERED))
> -- 
> 2.19.0.rc0
> 

Looks Good. 

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering
Himanshu Madhani Dec. 1, 2020, 3:46 p.m. UTC | #4
> On Dec 1, 2020, at 2:27 AM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Quinn Tran <qutran@marvell.com>
> 
> Driver created too many QPairs(126) with 28xx adapter.
> Limit the number of CPUs to lower wasted resources.
> 
> Signed-off-by: Quinn Tran <qutran@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_isr.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c
> index a24b82de4aab..77dd7630c3f8 100644
> --- a/drivers/scsi/qla2xxx/qla_isr.c
> +++ b/drivers/scsi/qla2xxx/qla_isr.c
> @@ -3952,10 +3952,12 @@ qla24xx_enable_msix(struct qla_hw_data *ha, struct rsp_que *rsp)
> 	if (USER_CTRL_IRQ(ha) || !ha->mqiobase) {
> 		/* user wants to control IRQ setting for target mode */
> 		ret = pci_alloc_irq_vectors(ha->pdev, min_vecs,
> -		    ha->msix_count, PCI_IRQ_MSIX);
> +		    min((u16)ha->msix_count, (u16)num_online_cpus()),
> +		    PCI_IRQ_MSIX);
> 	} else
> 		ret = pci_alloc_irq_vectors_affinity(ha->pdev, min_vecs,
> -		    ha->msix_count, PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
> +		    min((u16)ha->msix_count, (u16)num_online_cpus()),
> +		    PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
> 		    &desc);
> 
> 	if (ret < 0) {
> -- 
> 2.19.0.rc0
> 

Looks good.

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering
Himanshu Madhani Dec. 1, 2020, 3:50 p.m. UTC | #5
> On Dec 1, 2020, at 2:27 AM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Saurav Kashyap <skashyap@marvell.com>
> 
> NVMe commands can come only after successful addition of rport and nvme
> connect, and rport is only registered after FW started bit is set. Remove the
> redundant check.
> 
> Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_nvme.c | 12 ++++--------
> 1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_nvme.c b/drivers/scsi/qla2xxx/qla_nvme.c
> index b7a1dc24db38..d4159d5a4ffd 100644
> --- a/drivers/scsi/qla2xxx/qla_nvme.c
> +++ b/drivers/scsi/qla2xxx/qla_nvme.c
> @@ -554,19 +554,15 @@ static int qla_nvme_post_cmd(struct nvme_fc_local_port *lport,
> 
> 	fcport = qla_rport->fcport;
> 
> -	if (!qpair || !fcport)
> -		return -ENODEV;
> -
> -	if (!qpair->fw_started || fcport->deleted)
> +	if (unlikely(!qpair || !fcport || fcport->deleted))
> 		return -EBUSY;
> 
> -	vha = fcport->vha;
> -
> 	if (!(fcport->nvme_flag & NVME_FLAG_REGISTERED))
> 		return -ENODEV;
> 
> -	if (test_bit(ABORT_ISP_ACTIVE, &vha->dpc_flags) ||
> -	    (qpair && !qpair->fw_started) || fcport->deleted)
> +	vha = fcport->vha;
> +
> +	if (test_bit(ABORT_ISP_ACTIVE, &vha->dpc_flags))
> 		return -EBUSY;
> 
> 	/*
> -- 
> 2.19.0.rc0
> 

Looks Good

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering
Himanshu Madhani Dec. 1, 2020, 3:54 p.m. UTC | #6
> On Dec 1, 2020, at 2:27 AM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Arun Easi <aeasi@marvell.com>
> 
> Crash stack:
> 	[576544.715489] Unable to handle kernel paging request for data at address 0xd00000000f970000
> 	[576544.715497] Faulting instruction address: 0xd00000000f880f64
> 	[576544.715503] Oops: Kernel access of bad area, sig: 11 [#1]
> 	[576544.715506] SMP NR_CPUS=2048 NUMA pSeries
> 	:
> 	[576544.715703] NIP [d00000000f880f64] .qla27xx_fwdt_template_valid+0x94/0x100 [qla2xxx]
> 	[576544.715722] LR [d00000000f7952dc] .qla24xx_load_risc_flash+0x2fc/0x590 [qla2xxx]
> 	[576544.715726] Call Trace:
> 	[576544.715731] [c0000004d0ffb000] [c0000006fe02c350] 0xc0000006fe02c350 (unreliable)
> 	[576544.715750] [c0000004d0ffb080] [d00000000f7952dc] .qla24xx_load_risc_flash+0x2fc/0x590 [qla2xxx]
> 	[576544.715770] [c0000004d0ffb170] [d00000000f7aa034] .qla81xx_load_risc+0x84/0x1a0 [qla2xxx]
> 	[576544.715789] [c0000004d0ffb210] [d00000000f79f7c8] .qla2x00_setup_chip+0xc8/0x910 [qla2xxx]
> 	[576544.715808] [c0000004d0ffb300] [d00000000f7a631c] .qla2x00_initialize_adapter+0x4dc/0xb00 [qla2xxx]
> 	[576544.715826] [c0000004d0ffb3e0] [d00000000f78ce28] .qla2x00_probe_one+0xf08/0x2200 [qla2xxx]
> 
> Fixes: f73cb695d3ec ("[SCSI] qla2xxx: Add support for ISP2071.")
> Signed-off-by: Arun Easi <aeasi@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>

I think this needs stable tag as well. 

> ---
> drivers/scsi/qla2xxx/qla_tmpl.c | 9 +++++----
> drivers/scsi/qla2xxx/qla_tmpl.h | 2 +-
> 2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_tmpl.c b/drivers/scsi/qla2xxx/qla_tmpl.c
> index 84f4416d366f..a6bb1c0e2245 100644
> --- a/drivers/scsi/qla2xxx/qla_tmpl.c
> +++ b/drivers/scsi/qla2xxx/qla_tmpl.c
> @@ -928,7 +928,8 @@ qla27xx_template_checksum(void *p, ulong size)
> static inline int
> qla27xx_verify_template_checksum(struct qla27xx_fwdt_template *tmp)
> {
> -	return qla27xx_template_checksum(tmp, tmp->template_size) == 0;
> +	return qla27xx_template_checksum(tmp,
> +		le32_to_cpu(tmp->template_size)) == 0;
> }
> 
> static inline int
> @@ -944,7 +945,7 @@ qla27xx_execute_fwdt_template(struct scsi_qla_host *vha,
> 	ulong len = 0;
> 
> 	if (qla27xx_fwdt_template_valid(tmp)) {
> -		len = tmp->template_size;
> +		len = le32_to_cpu(tmp->template_size);
> 		tmp = memcpy(buf, tmp, len);
> 		ql27xx_edit_template(vha, tmp);
> 		qla27xx_walk_template(vha, tmp, buf, &len);
> @@ -960,7 +961,7 @@ qla27xx_fwdt_calculate_dump_size(struct scsi_qla_host *vha, void *p)
> 	ulong len = 0;
> 
> 	if (qla27xx_fwdt_template_valid(tmp)) {
> -		len = tmp->template_size;
> +		len = le32_to_cpu(tmp->template_size);
> 		qla27xx_walk_template(vha, tmp, NULL, &len);
> 	}
> 
> @@ -972,7 +973,7 @@ qla27xx_fwdt_template_size(void *p)
> {
> 	struct qla27xx_fwdt_template *tmp = p;
> 
> -	return tmp->template_size;
> +	return le32_to_cpu(tmp->template_size);
> }
> 
> int
> diff --git a/drivers/scsi/qla2xxx/qla_tmpl.h b/drivers/scsi/qla2xxx/qla_tmpl.h
> index c47184db5081..6e0987edfceb 100644
> --- a/drivers/scsi/qla2xxx/qla_tmpl.h
> +++ b/drivers/scsi/qla2xxx/qla_tmpl.h
> @@ -12,7 +12,7 @@
> struct __packed qla27xx_fwdt_template {
> 	__le32 template_type;
> 	__le32 entry_offset;
> -	uint32_t template_size;
> +	__le32 template_size;
> 	uint32_t count;		/* borrow field for running/residual count */
> 
> 	__le32 entry_count;
> -- 
> 2.19.0.rc0
> 

Otherwise.. Looks good

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering
Himanshu Madhani Dec. 1, 2020, 4:21 p.m. UTC | #7
> On Dec 1, 2020, at 2:27 AM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> From: Saurav Kashyap <skashyap@marvell.com>
> 
> Driver unload with IOs causes server to crash.
> Return IO with retry if fcport undergoing deletion.
> 

Any call stack or panic signature to share in commit message?

> Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_os.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index a75edba2b334..be9d10092dd3 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -884,8 +884,8 @@ qla2xxx_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
> 			goto qc24_fail_command;
> 	}
> 
> -	if (!fcport) {
> -		cmd->result = DID_NO_CONNECT << 16;
> +	if (!fcport || fcport->deleted) {
> +		cmd->result = DID_IMM_RETRY << 16;
> 		goto qc24_fail_command;
> 	}
> 
> @@ -966,8 +966,8 @@ qla2xxx_mqueuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd,
> 		goto qc24_fail_command;
> 	}
> 
> -	if (!fcport) {
> -		cmd->result = DID_NO_CONNECT << 16;
> +	if (!fcport || fcport->deleted) {
> +		cmd->result = DID_IMM_RETRY << 16;
> 		goto qc24_fail_command;
> 	}
> 
> -- 
> 2.19.0.rc0
> 

Patch itself looks good.

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering
Himanshu Madhani Dec. 1, 2020, 4:23 p.m. UTC | #8
> On Dec 1, 2020, at 2:27 AM, Nilesh Javali <njavali@marvell.com> wrote:
> 
> Signed-off-by: Nilesh Javali <njavali@marvell.com>
> ---
> drivers/scsi/qla2xxx/qla_version.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_version.h b/drivers/scsi/qla2xxx/qla_version.h
> index c2d4da52f4a9..ccec858875dd 100644
> --- a/drivers/scsi/qla2xxx/qla_version.h
> +++ b/drivers/scsi/qla2xxx/qla_version.h
> @@ -6,9 +6,9 @@
> /*
>  * Driver version
>  */
> -#define QLA2XXX_VERSION      "10.02.00.103-k"
> +#define QLA2XXX_VERSION      "10.02.00.104-k"
> 
> #define QLA_DRIVER_MAJOR_VER	10
> #define QLA_DRIVER_MINOR_VER	2
> #define QLA_DRIVER_PATCH_VER	0
> -#define QLA_DRIVER_BETA_VER	103
> +#define QLA_DRIVER_BETA_VER	104
> -- 
> 2.19.0.rc0
> 

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>

--
Himanshu Madhani	 Oracle Linux Engineering