mbox series

[PATCHv2,00/51] SCSI EH argument reshuffle part II

Message ID 20210817091456.73342-1-hare@suse.de
Headers show
Series SCSI EH argument reshuffle part II | expand

Message

Hannes Reinecke Aug. 17, 2021, 9:14 a.m. UTC
Hi all,

finally here's the patchset to revamp the SCSI EH callback arguments
which I promised to do (some years ago ...).
(And, hint: do not order the series by date; some patches are _old_)

The overall idea is to match the scope of the eh_XXX callbacks with
the appropriate argument, eg eh_device_reset_handler() should have a
scsi device as argument etc.
Relying on the scsi command has the problem that
a) we're holding a reference on that command for the entire lifetime
of the error handling and
b) it leads to some 'interesting' driver implementations; some
higher-level EH callback implementations go through the list of
outstanding commands and try to abort this particular command only;
makes one wonder what'll happen to the other commands ...

However, this patchset has the nice side-effect that we don't need to
allocate an out-of-order scsi command in scsi_ioctl_reset(), but can
call the function directly.

This is the second part which rearranges the individual EH handler
implementation to not rely on the passed in scsi command and do
the final conversion to the new calling convention.

The entire patchset can be found at

https://git.kernel.org/hare/scsi-devel/h/eh-rework.v2

As usual, comments and reviews are welcome.

Changes to the original submission:
- Updated to 5.15/scsi-queue
- Add patches for mpi3mr driver

Hannes Reinecke (51):
  lpfc: kill lpfc_bus_reset_handler
  lpfc: drop lpfc_no_handler()
  sym53c8xx_2: split off bus reset from host reset
  ips: Do not try to abort command from host reset
  snic: reserve tag for TMF
  qla1280: separate out host reset function from qla1280_error_action()
  megaraid: pass in NULL scb for host reset
  zfcp: open-code fc_block_scsi_eh() for host reset
  mpi3mr: split off bus_reset function from host_reset
  scsi: Use Scsi_Host as argument for eh_host_reset_handler
  mptfc: simplify mpt_fc_block_error_handler()
  mptfusion: correct definitions for mptscsih_dev_reset()
  mptfc: open-code mptfc_block_error_handler() for bus reset
  pmcraid: Select device in pmcraid_eh_bus_reset_handler()
  qla2xxx: open-code qla2xxx_generic_reset()
  qla2xxx: Do not call fc_block_scsi_eh() during bus reset
  visorhba: select first device on the bus for bus_reset()
  ncr53c8xx: remove 'sync_reset' argument from ncr_reset_bus()
  ncr53c8xx: Complete all commands during bus reset
  ncr53c8xx: Remove unused code
  scsi: Use Scsi_Host and channel number as argument for
    eh_bus_reset_handler()
  libiscsi: use cls_session as argument for target and session reset
  bnx2fc: Do not rely on a scsi command when issueing lun or target
    reset
  ibmvfc: open-code reset loop for target reset
  lpfc: use fc_block_rport()
  lpfc: use rport as argument for lpfc_send_taskmgmt()
  lpfc: use rport as argument for lpfc_chk_tgt_mapped()
  csiostor: use fc_block_rport()
  qla2xxx: use fc_block_rport()
  fc_fcp: use fc_block_rport()
  qedf: use fc rport as argument for qedf_initiate_tmf()
  sym53c8xx_2: rework reset handling
  bfa: Do not use scsi command to signal TMF status
  scsi_transport_iscsi: use session as argument for
    iscsi_block_scsi_eh()
  pmcraid: select first available device for target reset
  scsi: Use scsi_target as argument for eh_target_reset_handler()
  aha152x: look for stuck command when resetting device
  fnic: use dedicated device reset command
  a1000u2w: do not rely on the command for inia100_device_reset()
  aic7xxx: use scsi device as argument for BUILD_SCSIID()
  aic79xx: use scsi device as argument for BUILD_SCSIID()
  aic7xxx: do not reference scsi command when resetting device
  aic79xx: do not reference scsi command when resetting device
  xen-scsifront: add scsi device as argument to scsifront_do_request()
  fas216: Rework device reset to not rely on SCSI command pointer
  csiostor: use separate TMF command
  snic: use dedicated device reset command
  snic: Use scsi_host_busy_iter() to traverse commands
  scsi: Move eh_device_reset_handler() to use scsi_device as argument
  scsi: Do not allocate scsi command in scsi_ioctl_reset()
  scsi_error: streamline scsi_eh_bus_device_reset()

 Documentation/scsi/scsi_eh.rst                |  16 +-
 Documentation/scsi/scsi_mid_low_api.rst       |  31 +-
 drivers/infiniband/ulp/srp/ib_srp.c           |  12 +-
 drivers/message/fusion/mptfc.c                |  99 ++++--
 drivers/message/fusion/mptsas.c               |  10 +-
 drivers/message/fusion/mptscsih.c             | 115 ++++--
 drivers/message/fusion/mptscsih.h             |   7 +-
 drivers/message/fusion/mptspi.c               |   8 +-
 drivers/s390/scsi/zfcp_scsi.c                 |  41 ++-
 drivers/scsi/3w-9xxx.c                        |  11 +-
 drivers/scsi/3w-sas.c                         |  11 +-
 drivers/scsi/3w-xxxx.c                        |  11 +-
 drivers/scsi/53c700.c                         |  39 +-
 drivers/scsi/BusLogic.c                       |  14 +-
 drivers/scsi/NCR5380.c                        |   3 +-
 drivers/scsi/a100u2w.c                        |  52 +--
 drivers/scsi/aacraid/linit.c                  |  35 +-
 drivers/scsi/advansys.c                       |  26 +-
 drivers/scsi/aha152x.c                        |  34 +-
 drivers/scsi/aha1542.c                        |  30 +-
 drivers/scsi/aic7xxx/aic79xx_osm.c            |  64 ++--
 drivers/scsi/aic7xxx/aic7xxx_osm.c            | 120 ++++---
 drivers/scsi/arcmsr/arcmsr_hba.c              |   6 +-
 drivers/scsi/arm/acornscsi.c                  |   8 +-
 drivers/scsi/arm/fas216.c                     |  55 ++-
 drivers/scsi/arm/fas216.h                     |  17 +-
 drivers/scsi/atari_scsi.c                     |   4 +-
 drivers/scsi/be2iscsi/be_main.c               |  18 +-
 drivers/scsi/bfa/bfad_im.c                    | 113 +++---
 drivers/scsi/bfa/bfad_im.h                    |   2 +
 drivers/scsi/bnx2fc/bnx2fc.h                  |   5 +-
 drivers/scsi/bnx2fc/bnx2fc_hwi.c              |  14 +-
 drivers/scsi/bnx2fc/bnx2fc_io.c               |  95 +++--
 drivers/scsi/csiostor/csio_hw.h               |   2 +
 drivers/scsi/csiostor/csio_init.c             |   2 +-
 drivers/scsi/csiostor/csio_scsi.c             |  52 ++-
 drivers/scsi/cxlflash/main.c                  |  10 +-
 drivers/scsi/dc395x.c                         |  25 +-
 drivers/scsi/dpt_i2o.c                        |  43 +--
 drivers/scsi/dpti.h                           |   6 +-
 drivers/scsi/esas2r/esas2r.h                  |   8 +-
 drivers/scsi/esas2r/esas2r_main.c             |  55 +--
 drivers/scsi/esp_scsi.c                       |   8 +-
 drivers/scsi/fdomain.c                        |   3 +-
 drivers/scsi/fnic/fnic.h                      |   4 +-
 drivers/scsi/fnic/fnic_scsi.c                 | 149 +++-----
 drivers/scsi/hpsa.c                           |  14 +-
 drivers/scsi/hptiop.c                         |   6 +-
 drivers/scsi/ibmvscsi/ibmvfc.c                |  55 +--
 drivers/scsi/ibmvscsi/ibmvscsi.c              |  23 +-
 drivers/scsi/imm.c                            |   6 +-
 drivers/scsi/initio.c                         |  11 +-
 drivers/scsi/ipr.c                            |  35 +-
 drivers/scsi/ips.c                            |  40 +--
 drivers/scsi/libfc/fc_fcp.c                   |  18 +-
 drivers/scsi/libiscsi.c                       |  38 +-
 drivers/scsi/libsas/sas_scsi_host.c           |  21 +-
 drivers/scsi/lpfc/lpfc_scsi.c                 | 160 ++-------
 drivers/scsi/mac53c94.c                       |   8 +-
 drivers/scsi/megaraid.c                       |  46 +--
 drivers/scsi/megaraid.h                       |   2 +-
 drivers/scsi/megaraid/megaraid_mbox.c         |  14 +-
 drivers/scsi/megaraid/megaraid_sas.h          |   3 +-
 drivers/scsi/megaraid/megaraid_sas_base.c     |  44 +--
 drivers/scsi/megaraid/megaraid_sas_fusion.c   |  56 +--
 drivers/scsi/mesh.c                           |  10 +-
 drivers/scsi/mpi3mr/mpi3mr_os.c               | 144 ++++----
 drivers/scsi/mpt3sas/mpt3sas_scsih.c          |  80 ++---
 drivers/scsi/mvumi.c                          |   7 +-
 drivers/scsi/myrb.c                           |   3 +-
 drivers/scsi/myrs.c                           |   3 +-
 drivers/scsi/ncr53c8xx.c                      | 203 +----------
 drivers/scsi/nsp32.c                          |  12 +-
 drivers/scsi/pcmcia/nsp_cs.c                  |  10 +-
 drivers/scsi/pcmcia/nsp_cs.h                  |   6 +-
 drivers/scsi/pcmcia/qlogic_stub.c             |   4 +-
 drivers/scsi/pcmcia/sym53c500_cs.c            |   8 +-
 drivers/scsi/pmcraid.c                        |  73 +++-
 drivers/scsi/ppa.c                            |   6 +-
 drivers/scsi/qedf/qedf.h                      |   5 +-
 drivers/scsi/qedf/qedf_io.c                   |  80 ++---
 drivers/scsi/qedf/qedf_main.c                 |  24 +-
 drivers/scsi/qedi/qedi_iscsi.c                |   3 +-
 drivers/scsi/qla1280.c                        |  74 ++--
 drivers/scsi/qla2xxx/qla_os.c                 | 155 ++++----
 drivers/scsi/qla4xxx/ql4_os.c                 |  85 +++--
 drivers/scsi/qlogicfas408.c                   |  10 +-
 drivers/scsi/qlogicfas408.h                   |   2 +-
 drivers/scsi/qlogicpti.c                      |   3 +-
 drivers/scsi/scsi_debug.c                     |  78 ++--
 drivers/scsi/scsi_error.c                     | 160 +++++----
 drivers/scsi/scsi_transport_iscsi.c           |   6 +-
 drivers/scsi/smartpqi/smartpqi_init.c         |  11 +-
 drivers/scsi/snic/snic.h                      |   6 +-
 drivers/scsi/snic/snic_main.c                 |   3 +
 drivers/scsi/snic/snic_scsi.c                 | 333 ++++++++----------
 drivers/scsi/stex.c                           |   7 +-
 drivers/scsi/storvsc_drv.c                    |   4 +-
 drivers/scsi/sym53c8xx_2/sym_glue.c           | 190 ++++++----
 drivers/scsi/ufs/ufshcd.c                     |  14 +-
 drivers/scsi/virtio_scsi.c                    |  12 +-
 drivers/scsi/vmw_pvscsi.c                     |  20 +-
 drivers/scsi/wd33c93.c                        |   5 +-
 drivers/scsi/wd33c93.h                        |   2 +-
 drivers/scsi/wd719x.c                         |  17 +-
 drivers/scsi/xen-scsifront.c                  |  53 +--
 drivers/staging/rts5208/rtsx.c                |   4 +-
 .../staging/unisys/visorhba/visorhba_main.c   |  40 +--
 drivers/target/loopback/tcm_loop.c            |  17 +-
 drivers/usb/image/microtek.c                  |   4 +-
 drivers/usb/storage/scsiglue.c                |   8 +-
 drivers/usb/storage/uas.c                     |   3 +-
 include/scsi/libfc.h                          |   4 +-
 include/scsi/libiscsi.h                       |   6 +-
 include/scsi/libsas.h                         |   4 +-
 include/scsi/scsi_host.h                      |   8 +-
 include/scsi/scsi_transport_iscsi.h           |   2 +-
 117 files changed, 1960 insertions(+), 2164 deletions(-)

Comments

Benjamin Block Aug. 17, 2021, 11:53 a.m. UTC | #1
On Tue, Aug 17, 2021 at 11:14:13AM +0200, Hannes Reinecke wrote:
> @@ -383,9 +385,24 @@ static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>  	}
>  	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
>  	zfcp_erp_wait(adapter);
> -	fc_ret = fc_block_scsi_eh(scpnt);
> -	if (fc_ret)
> -		ret = fc_ret;
> +retry_rport_blocked:
> +	spin_lock_irqsave(host->host_lock, flags);
> +	list_for_each_entry(port, &adapter->port_list, list) {

You need to take the `adapter->port_list_lock` to iterate over the `port_list`.

i.e.: read_lock_irqsave(&adapter->port_list_lock, flags);

> +		struct fc_rport *rport = port->rport;
> +
> +		if (rport->port_state == FC_PORTSTATE_BLOCKED) {
> +			if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
> +				ret = FAST_IO_FAIL;
> +			else
> +				ret = NEEDS_RETRY;
> +			break;
> +		}
> +	}
> +	spin_unlock_irqrestore(host->host_lock, flags);
> +	if (ret == NEEDS_RETRY) {
> +		msleep(1000);
> +		goto retry_rport_blocked;
> +	}

I really can't say I like this open coded FC code in the driver at all.

Is there a reason we can't use `fc_block_rport()` for all the rports of
the adapter?

We already do use it for other EH callbacks in the same file, and you
already look up the rports in the adapters rport-list; so using that on
the rports in the loop, instead of open-coding it doesn't seem bad? Or
is there a locking problem? 

We might waste a few cycles with that, but frankly, this is all in EH
and after adapter reset.. all performance concerns went our of the
window with that already.
Christoph Hellwig Aug. 17, 2021, 12:26 p.m. UTC | #2
On Tue, Aug 17, 2021 at 11:14:12AM +0200, Hannes Reinecke wrote:
> When calling a host reset we shouldn't rely on the command triggering
> the reset, so allow megaraid_abort_and_reset() to be called with a
> NULL scb.
> And drop the pointless 'bus_reset' and 'target_reset' handlers, which
> just call the same function as host_reset.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/scsi/megaraid.c | 42 ++++++++++++++++-------------------------
>  1 file changed, 16 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
> index 56910e94dbf2..7c53933fb1b4 100644
> --- a/drivers/scsi/megaraid.c
> +++ b/drivers/scsi/megaraid.c
> @@ -1905,7 +1905,7 @@ megaraid_reset(struct scsi_cmnd *cmd)
>  
>  	spin_lock_irq(&adapter->lock);
>  
> -	rval =  megaraid_abort_and_reset(adapter, cmd, SCB_RESET);
> +	rval =  megaraid_abort_and_reset(adapter, NULL, SCB_RESET);
>  
>  	/*
>  	 * This is required here to complete any completed requests
> @@ -1944,7 +1944,7 @@ megaraid_abort_and_reset(adapter_t *adapter, struct scsi_cmnd *cmd, int aor)
>  
>  		scb = list_entry(pos, scb_t, list);

Ther's a dev_warn before this, which dereferences cmd.

> -		if (scb->cmd == cmd) { /* Found command */
> +		if (!cmd || scb->cmd == cmd) { /* Found command */
>  
>  			scb->state |= aor;

But more importantly, this function doesn't make much sense for the
!cmd case, as it doesn't really do anything when not matched. It
seems like the legacy megaraid driver should just stop calling it
for resets.
Hannes Reinecke Aug. 17, 2021, 12:54 p.m. UTC | #3
On 8/17/21 1:53 PM, Benjamin Block wrote:
> On Tue, Aug 17, 2021 at 11:14:13AM +0200, Hannes Reinecke wrote:
>> @@ -383,9 +385,24 @@ static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
>>  	}
>>  	zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");
>>  	zfcp_erp_wait(adapter);
>> -	fc_ret = fc_block_scsi_eh(scpnt);
>> -	if (fc_ret)
>> -		ret = fc_ret;
>> +retry_rport_blocked:
>> +	spin_lock_irqsave(host->host_lock, flags);
>> +	list_for_each_entry(port, &adapter->port_list, list) {
> 
> You need to take the `adapter->port_list_lock` to iterate over the `port_list`.
> 
> i.e.: read_lock_irqsave(&adapter->port_list_lock, flags);
> 
>> +		struct fc_rport *rport = port->rport;
>> +
>> +		if (rport->port_state == FC_PORTSTATE_BLOCKED) {
>> +			if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)
>> +				ret = FAST_IO_FAIL;
>> +			else
>> +				ret = NEEDS_RETRY;
>> +			break;
>> +		}
>> +	}
>> +	spin_unlock_irqrestore(host->host_lock, flags);
>> +	if (ret == NEEDS_RETRY) {
>> +		msleep(1000);
>> +		goto retry_rport_blocked;
>> +	}
> 
> I really can't say I like this open coded FC code in the driver at all.
> 
> Is there a reason we can't use `fc_block_rport()` for all the rports of
> the adapter?
> 
> We already do use it for other EH callbacks in the same file, and you
> already look up the rports in the adapters rport-list; so using that on
> the rports in the loop, instead of open-coding it doesn't seem bad? Or
> is there a locking problem? 
> 
> We might waste a few cycles with that, but frankly, this is all in EH
> and after adapter reset.. all performance concerns went our of the
> window with that already.
> 

Question would be why we need to call fc_block_rport() at all in host reset.
To my understanding a host reset is expected to do a full resync of the
SAN topology, so the expectation is that after zfcp_erp_wait() the port
list is stable (ie the HBA has finished processing all RSCNs related to
the SAN resync).
So can't we just drop the fc_block_rport() call here?
All the other FC drivers do fine without that ...

Cheers,

Hannes
Hannes Reinecke Aug. 17, 2021, 2:05 p.m. UTC | #4
On 8/17/21 2:22 PM, Christoph Hellwig wrote:
> On Tue, Aug 17, 2021 at 11:14:11AM +0200, Hannes Reinecke wrote:
>> There's not much in common between host reset and all other error
>> handlers, so use a separate function here.
> 
> This loses the search in the internal queueing list.  Which is probably
> fine, but needs to be explained.
> 
Yeah, correct, I misread the comment.
Will be adding a 'qla1280_wait_for_pending_commands()'.

Cheers,

Hannes
Steffen Maier Aug. 17, 2021, 2:55 p.m. UTC | #5
On 8/17/21 11:14 AM, Hannes Reinecke wrote:
> Issuing a host reset should not rely on any commands.
> So use Scsi_Host as argument for eh_host_reset_handler.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>

Reviewed-by: Steffen Maier <maier@linux.ibm.com> # for zfcp and common code

Acked-by: Steffen Maier <maier@linux.ibm.com> # for zfcp

> ---
>   Documentation/scsi/scsi_eh.rst                |  2 +-
>   Documentation/scsi/scsi_mid_low_api.rst       |  4 +-

>   drivers/s390/scsi/zfcp_scsi.c                 |  3 +-

>   drivers/scsi/scsi_error.c                     |  2 +-

>   include/scsi/scsi_host.h                      |  2 +-
>   78 files changed, 271 insertions(+), 334 deletions(-)
> 
> diff --git a/Documentation/scsi/scsi_eh.rst b/Documentation/scsi/scsi_eh.rst
> index 7d78c2475615..1ca451ad57df 100644
> --- a/Documentation/scsi/scsi_eh.rst
> +++ b/Documentation/scsi/scsi_eh.rst
> @@ -216,7 +216,7 @@ considered to fail always.
>       int (* eh_abort_handler)(struct scsi_cmnd *);
>       int (* eh_device_reset_handler)(struct scsi_cmnd *);
>       int (* eh_bus_reset_handler)(struct scsi_cmnd *);
> -    int (* eh_host_reset_handler)(struct scsi_cmnd *);
> +    int (* eh_host_reset_handler)(struct Scsi_Host *);
> 
>   Higher-severity actions are taken only when lower-severity actions
>   cannot recover some of failed scmds.  Also, note that failure of the
> diff --git a/Documentation/scsi/scsi_mid_low_api.rst b/Documentation/scsi/scsi_mid_low_api.rst
> index 63ddea2b9640..784587ea7eee 100644
> --- a/Documentation/scsi/scsi_mid_low_api.rst
> +++ b/Documentation/scsi/scsi_mid_low_api.rst
> @@ -777,7 +777,7 @@ Details::
> 
>       /**
>       *      eh_host_reset_handler - reset host (host bus adapter)
> -    *      @scp: SCSI host that contains this device should be reset
> +    *      @shp: SCSI host that contains this device should be reset
>       *
>       *      Returns SUCCESS if command aborted else FAILED
>       *
> @@ -794,7 +794,7 @@ Details::
>       *
>       *      Optionally defined in: LLD
>       **/
> -	int eh_host_reset_handler(struct scsi_cmnd * scp)
> +	int eh_host_reset_handler(struct Scsi_Host * shp)
> 
> 
>       /**


> diff --git a/drivers/s390/scsi/zfcp_scsi.c b/drivers/s390/scsi/zfcp_scsi.c
> index 9393f1587e8a..8bfa8ffd9ff6 100644
> --- a/drivers/s390/scsi/zfcp_scsi.c
> +++ b/drivers/s390/scsi/zfcp_scsi.c
> @@ -371,9 +371,8 @@ static int zfcp_scsi_eh_target_reset_handler(struct scsi_cmnd *scpnt)
>   	return ret;
>   }
> 
> -static int zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)
> +static int zfcp_scsi_eh_host_reset_handler(struct Scsi_Host *host)
>   {
> -	struct Scsi_Host *host = scpnt->device->host;
>   	struct zfcp_adapter *adapter = (struct zfcp_adapter *)host->hostdata[0];
>   	int ret = SUCCESS;
>   	unsigned long flags;

> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 58a252c38992..8218e2976482 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -811,7 +811,7 @@ static enum scsi_disposition scsi_try_host_reset(struct scsi_cmnd *scmd)
>   	if (!hostt->eh_host_reset_handler)
>   		return FAILED;
> 
> -	rtn = hostt->eh_host_reset_handler(scmd);
> +	rtn = hostt->eh_host_reset_handler(host);
> 
>   	if (rtn == SUCCESS) {
>   		if (!hostt->skip_settle_delay)


> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 75363707b73f..3b1acf91f4d0 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -142,7 +142,7 @@ struct scsi_host_template {
>   	int (* eh_device_reset_handler)(struct scsi_cmnd *);
>   	int (* eh_target_reset_handler)(struct scsi_cmnd *);
>   	int (* eh_bus_reset_handler)(struct scsi_cmnd *);
> -	int (* eh_host_reset_handler)(struct scsi_cmnd *);
> +	int (* eh_host_reset_handler)(struct Scsi_Host *);
> 
>   	/*
>   	 * Before the mid layer attempts to scan for a new device where none
>
James Smart Aug. 17, 2021, 5:37 p.m. UTC | #6
On 8/17/2021 2:14 AM, Hannes Reinecke wrote:
> lpfc_bus_reset_handler is really just a loop calling
> lpfc_target_reset_handler() over all targets, which is what
> the error handler will be doing anyway.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Cc: James Smart <james.smart@broadcom.com>
> ---
>   drivers/scsi/lpfc/lpfc_scsi.c | 91 -----------------------------------
>   1 file changed, 91 deletions(-)
> 

looks fine

Reviewed-by: James Smart <jsmart2021@gmail.com>

-- james
Steffen Maier Aug. 18, 2021, 11 a.m. UTC | #7
On 8/17/21 4:10 PM, Hannes Reinecke wrote:
> On 8/17/21 4:03 PM, Steffen Maier wrote:

>> On 8/17/21 2:54 PM, Hannes Reinecke wrote:

>>> On 8/17/21 1:53 PM, Benjamin Block wrote:

>>>> On Tue, Aug 17, 2021 at 11:14:13AM +0200, Hannes Reinecke wrote:

>>>>> @@ -383,9 +385,24 @@ static int

>>>>> zfcp_scsi_eh_host_reset_handler(struct scsi_cmnd *scpnt)

>>>>>        }

>>>>>        zfcp_erp_adapter_reopen(adapter, 0, "schrh_1");

>>>>>        zfcp_erp_wait(adapter);

>>>>> -    fc_ret = fc_block_scsi_eh(scpnt);

>>>>> -    if (fc_ret)

>>>>> -        ret = fc_ret;

>>>>> +retry_rport_blocked:

>>>>> +    spin_lock_irqsave(host->host_lock, flags);

>>>>> +    list_for_each_entry(port, &adapter->port_list, list) {

>>>>

>>>> You need to take the `adapter->port_list_lock` to iterate over the

>>>> `port_list`.

>>>>

>>>> i.e.: read_lock_irqsave(&adapter->port_list_lock, flags);

>>>>

>>>>> +        struct fc_rport *rport = port->rport;

>>>>> +

>>>>> +        if (rport->port_state == FC_PORTSTATE_BLOCKED) {

>>>>> +            if (rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT)

>>>>> +                ret = FAST_IO_FAIL;

>>>>> +            else

>>>>> +                ret = NEEDS_RETRY;

>>>>> +            break;

>>>>> +        }

>>>>> +    }

>>>>> +    spin_unlock_irqrestore(host->host_lock, flags);

>>>>> +    if (ret == NEEDS_RETRY) {

>>>>> +        msleep(1000);

>>>>> +        goto retry_rport_blocked;

>>>>> +    }

>>>>

>>>> I really can't say I like this open coded FC code in the driver at all.

>>>>

>>>> Is there a reason we can't use `fc_block_rport()` for all the rports of

>>>> the adapter?

>>

>> Waiting for all rports to unblock in host_reset has been on my todo list

>> since we prepared the eh callbacks to get rid of scsi_cmnd with v4.18

>> commits:

>> 674595d8519f ("scsi: zfcp: decouple our scsi_eh callbacks from scsi_cmnd")

>> 42afc6527d43 ("scsi: zfcp: decouple TMFs from scsi_cmnd by using

>> fc_block_rport")

>> 26f5fa9d47c1 ("scsi: zfcp: decouple SCSI setup of TMF from scsi_cmnd")

>> 39abb11aca00 ("scsi: zfcp: decouple FSF request setup of TMF from

>> scsi_cmnd")

>> e0116c91c7d8 ("scsi: zfcp: split FCP_CMND IU setup between SCSI I/O and

>> TMF again")

>> 266883f2f7d5 ("scsi: zfcp: decouple TMF response handler from scsi_cmnd")

>> 822121186375 ("scsi: zfcp: decouple SCSI traces for scsi_eh / TMF from

>> scsi_cmnd")

>>

>> But the synchronization is non-trivial as Benjamin's question shows.

>> There are also considerations about lock order, etc.

>>

>> I'm busy with other things, so don't hold your breath until I can review

>> and test the code; I don't want any regression in that recovery code.

>>

>>>> We already do use it for other EH callbacks in the same file, and you

>>>> already look up the rports in the adapters rport-list; so using that on

>>>> the rports in the loop, instead of open-coding it doesn't seem bad? Or

>>>> is there a locking problem?

>>>>

>>>> We might waste a few cycles with that, but frankly, this is all in EH

>>>> and after adapter reset.. all performance concerns went our of the

>>>> window with that already.

>>>>

>>>

>>> Question would be why we need to call fc_block_rport() at all in host

>>> reset.

>>> To my understanding a host reset is expected to do a full resync of the

>>> SAN topology, so the expectation is that after zfcp_erp_wait() the port

>>> list is stable (ie the HBA has finished processing all RSCNs related to

>>> the SAN resync).

>>

>> There is more to do in zfcp than in other FC HBA drivers, e.g. LUN open

>> recoveries and how they related to rport unblock:

>> v4.10 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN

>> recovery").

>> The rport unblock is async to our internal recovery. zfcp_erp_wait()

>> only waits for the latter by design.

>>

>>> So can't we just drop the fc_block_rport() call here?

>>

>> I don't think so.

>>

>>> All the other FC drivers do fine without that ...

>>

>> It would have been nice to have a common interface for all scsi_eh

>> scopes. I.e. fc_block_host(struct Scsi_Host*) like we already have for

>> fc_block_scsi_eh(struct scsi_cmnd*) and fc_block_rport(struct fc_rport*)

>> [the latter having been introduced at the time of above eh callback

>> preparations].

>> But if zfcp is the only one needing it for host_reset, having the code

>> only in zfcp seems fine to me.

>>

>>

> Right. Just wanted to clarify that.

> If we need to use fc_block_rport() in host reset so be it; just wanted

> to clarify if this _really_ is the case (and not just some copy'n'paste

> stuff).

> I'll be reworking the patch to call fc_block_rport().


On second thought, I might have been wrong.

The argument I used with the old commit was that we must not unblock the rport 
too early with regards to zfcp-internal recovery. This is fixed within zfcp 
recovery (erp) code. So after zfcp_erp_wait() in host_reset, this is still 
ensured; and eventually the rport unblock will occur.

I guess I was rather worried about returning from the host_reset callback with 
the async rport(s) unblock still pending. After all, (some) other reset_handler 
sync with rport unblock. However I cannot remember all details right now.

Before you invest more time into this, maybe just drop this patch from the 
series for now and we solve it later on? I mean it's not necessary for the 
reset_handler function signature change.


-- 
Mit freundlichen Gruessen / Kind regards
Steffen Maier

Linux on IBM Z and LinuxONE

https://www.ibm.com/privacy/us/en/
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
Hannes Reinecke Aug. 18, 2021, 11:58 a.m. UTC | #8
On 8/18/21 1:00 PM, Steffen Maier wrote:
> On 8/17/21 4:10 PM, Hannes Reinecke wrote:

>> On 8/17/21 4:03 PM, Steffen Maier wrote:

[ .. ]
>>> It would have been nice to have a common interface for all scsi_eh

>>> scopes. I.e. fc_block_host(struct Scsi_Host*) like we already have for

>>> fc_block_scsi_eh(struct scsi_cmnd*) and fc_block_rport(struct fc_rport*)

>>> [the latter having been introduced at the time of above eh callback

>>> preparations].

>>> But if zfcp is the only one needing it for host_reset, having the code

>>> only in zfcp seems fine to me.

>>>

>>>

>> Right. Just wanted to clarify that.

>> If we need to use fc_block_rport() in host reset so be it; just wanted

>> to clarify if this _really_ is the case (and not just some copy'n'paste

>> stuff).

>> I'll be reworking the patch to call fc_block_rport().

> 

> On second thought, I might have been wrong.

> 

> The argument I used with the old commit was that we must not unblock the

> rport too early with regards to zfcp-internal recovery. This is fixed

> within zfcp recovery (erp) code. So after zfcp_erp_wait() in host_reset,

> this is still ensured; and eventually the rport unblock will occur.

> 

> I guess I was rather worried about returning from the host_reset

> callback with the async rport(s) unblock still pending. After all,

> (some) other reset_handler sync with rport unblock. However I cannot

> remember all details right now.

> 

> Before you invest more time into this, maybe just drop this patch from

> the series for now and we solve it later on? I mean it's not necessary

> for the reset_handler function signature change.

> 

Well, actually it is.
With the signature change host_reset is being called with a Scsi_Host
argument, so we cannot identify 'the' rport.
But I've modified the patch to cycle through all rports and call
fc_block_rport() on each of them.
That should be good enough for now.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
James Smart Aug. 18, 2021, 4:35 p.m. UTC | #9
On 8/17/2021 2:14 AM, Hannes Reinecke wrote:
> Use fc_block_rport() instead of fc_block_scsi_eh() as the SCSI command

> will be removed as argument for SCSI EH functions.

> 

> Signed-off-by: Hannes Reinecke <hare@suse.de>

> Cc: James Smart <james.smart@broadcom.com>

> ---

>   drivers/scsi/lpfc/lpfc_scsi.c | 13 ++++++++-----

>   1 file changed, 8 insertions(+), 5 deletions(-)

> 


looks good

Reviewed-by: James Smart <jsmart2021@gmail.com>


-- james
James Smart Aug. 18, 2021, 4:35 p.m. UTC | #10
On 8/17/2021 2:14 AM, Hannes Reinecke wrote:
> Instead of passing in a scsi device we should be using the rport;

> we already have the target and lun id as parameters, so there's

> no need to pass the scsi device, too.

> 

> Signed-off-by: Hannes Reinecke <hare@suse.com>

> Cc: James Smart <james.smart@broadcom.com>

> ---

>   drivers/scsi/lpfc/lpfc_scsi.c | 12 ++++++------

>   1 file changed, 6 insertions(+), 6 deletions(-)

> 


looks good

Reviewed-by: James Smart <jsmart2021@gmail.com>


-- james
Christoph Hellwig Feb. 23, 2022, 12:49 p.m. UTC | #11
On Tue, Aug 17, 2021 at 02:55:25PM +0200, Hannes Reinecke wrote:
> On 8/17/21 2:13 PM, Christoph Hellwig wrote:
> > First, thanks for resurrecting the series.
> > 
> > Second, this giant patchbomb is almost impossible to review.  It might
> > make sense to resend what is the prep patches without the prototype
> > changes after the first round of review - maybe we can squeeze those
> > into 5.15 still and do the heavy lifting with another series per
> > actually changes method or so.
> > 
> Sure, can do.

Do you have another chunk of these patches ready?  It would be so nice
to see the EH code finally cleaned up..
Hannes Reinecke Feb. 23, 2022, 1:03 p.m. UTC | #12
On 2/23/22 13:49, Christoph Hellwig wrote:
> On Tue, Aug 17, 2021 at 02:55:25PM +0200, Hannes Reinecke wrote:
>> On 8/17/21 2:13 PM, Christoph Hellwig wrote:
>>> First, thanks for resurrecting the series.
>>>
>>> Second, this giant patchbomb is almost impossible to review.  It might
>>> make sense to resend what is the prep patches without the prototype
>>> changes after the first round of review - maybe we can squeeze those
>>> into 5.15 still and do the heavy lifting with another series per
>>> actually changes method or so.
>>>
>> Sure, can do.
> 
> Do you have another chunk of these patches ready?  It would be so nice
> to see the EH code finally cleaned up..

Sure. Against which tree?
The SCSI pointer and scsi_request removal will have an impact onto these 
patches I guess, so I'd rather code against those patches folded in.

Cheers,

Hannes
Christoph Hellwig Feb. 23, 2022, 1:05 p.m. UTC | #13
On Wed, Feb 23, 2022 at 02:03:01PM +0100, Hannes Reinecke wrote:
> Sure. Against which tree?
> The SCSI pointer and scsi_request removal will have an impact onto these 
> patches I guess, so I'd rather code against those patches folded in.

The SCSI pointer removal already is in Martin's staging tree, so that
is a good baseline.  I don't think the the scsi_request removal should
have much overlap with the various driver patches and just a trivial
overlap with the scsi_ioctl_reset changes.

So maybe just prepare a first batch that is needed for the eh_host_reset
prototype changes against Martin's staging tree?