scsi: qla2xx: wait for stop_phase1 at wwn removal

Message ID 20210415203554.27890-1-d.bogdanov@yadro.com
State New

Commit Message

Dmitry Bogdanov April 15, 2021, 8:35 p.m. UTC
Target de-configuration panics under high CPU load.
TPGT and WWPN can be removed on separate threads.
TPGT removal requests an HBA reset on a separate thread and waits for
the reset to complete (qlt_stop_phase1). Under high CPU load, that HBA
reset can be delayed for some time.
WWPN removal calls qlt_stop_phase2, assuming that phase1 has already
finished, and zeroes tgt.tgt_ops, which is still used by incoming
traffic. This causes several panics:

NIP qlt_reset+0x7c/0x220 [qla2xxx]
LR  qlt_reset+0x68/0x220 [qla2xxx]
Call Trace:
0xc000003ffff63a78 (unreliable)
qlt_handle_imm_notify+0x800/0x10c0 [qla2xxx]
qlt_24xx_atio_pkt+0x208/0x590 [qla2xxx]
qlt_24xx_process_atio_queue+0x33c/0x7a0 [qla2xxx]
qla83xx_msix_atio_q+0x54/0x90 [qla2xxx]

or

NIP qlt_24xx_handle_abts+0xd0/0x2a0 [qla2xxx]
LR  qlt_24xx_handle_abts+0xb4/0x2a0 [qla2xxx]
Call Trace:
qlt_24xx_handle_abts+0x90/0x2a0 [qla2xxx] (unreliable)
qlt_24xx_process_atio_queue+0x500/0x7a0 [qla2xxx]
qla83xx_msix_atio_q+0x54/0x90 [qla2xxx]

or

NIP qlt_create_sess+0x90/0x4e0 [qla2xxx]
LR  qla24xx_do_nack_work+0xa8/0x180 [qla2xxx]
Call Trace:
0xc0000000348fba30 (unreliable)
qla24xx_do_nack_work+0xa8/0x180 [qla2xxx]
qla2x00_do_work+0x674/0xbf0 [qla2xxx]
qla2x00_iocb_work_fn

The patch fixes the issue by serializing the qlt_stop_phase1 and
qlt_stop_phase2 functions so that WWPN removal waits for phase1 to
complete.
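
The serialization described above can be illustrated with a small userspace sketch. This is not driver code: it uses pthreads instead of kernel mutexes, and every name in it (struct demo_tgt, demo_stop_phase1, demo_stop_phase2, demo_run_teardown, outer_mutex) is hypothetical. It also assumes that phase1 holds the outer mutex for its whole duration, including the wait for the delayed reset; the diff only shows the phase2 side, so that part is inferred from the description.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the target state; not the real struct qla_tgt. */
struct demo_tgt {
	pthread_mutex_t outer_mutex;  /* plays the role of ha->optrom_mutex */
	pthread_mutex_t tgt_mutex;    /* plays the role of vha_tgt.tgt_mutex */
	atomic_bool phase1_started;   /* set once phase1 holds outer_mutex */
	bool phase1_done;             /* protected by outer_mutex */
	bool phase2_saw_phase1_done;  /* what phase2 observed on entry */
	int tgt_stop;
	int tgt_stopped;
};

/* Assumption: phase1 keeps outer_mutex held across the whole slow reset. */
static void *demo_stop_phase1(void *arg)
{
	struct demo_tgt *tgt = arg;

	pthread_mutex_lock(&tgt->outer_mutex);
	atomic_store(&tgt->phase1_started, true);

	pthread_mutex_lock(&tgt->tgt_mutex);
	tgt->tgt_stop = 1;
	pthread_mutex_unlock(&tgt->tgt_mutex);

	/* Simulate an HBA reset that is delayed by CPU load. */
	for (int i = 0; i < 10000; i++)
		sched_yield();

	tgt->phase1_done = true;
	pthread_mutex_unlock(&tgt->outer_mutex);
	return NULL;
}

/* Same nesting as the patch: outer mutex first, then tgt_mutex. */
static void *demo_stop_phase2(void *arg)
{
	struct demo_tgt *tgt = arg;

	pthread_mutex_lock(&tgt->outer_mutex);  /* blocks until phase1 unlocks */
	pthread_mutex_lock(&tgt->tgt_mutex);
	tgt->phase2_saw_phase1_done = tgt->phase1_done;
	tgt->tgt_stop = 0;
	tgt->tgt_stopped = 1;
	pthread_mutex_unlock(&tgt->tgt_mutex);
	pthread_mutex_unlock(&tgt->outer_mutex);
	return NULL;
}

/* Returns 1 if phase2 ran only after phase1 had completed, 0 otherwise. */
int demo_run_teardown(void)
{
	struct demo_tgt tgt = { .phase1_done = false };
	pthread_t t1, t2;

	pthread_mutex_init(&tgt.outer_mutex, NULL);
	pthread_mutex_init(&tgt.tgt_mutex, NULL);
	atomic_init(&tgt.phase1_started, false);

	pthread_create(&t1, NULL, demo_stop_phase1, &tgt);
	/* Start phase2 only once phase1 is in flight, as in the reported race. */
	while (!atomic_load(&tgt.phase1_started))
		sched_yield();
	pthread_create(&t2, NULL, demo_stop_phase2, &tgt);

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_mutex_destroy(&tgt.tgt_mutex);
	pthread_mutex_destroy(&tgt.outer_mutex);
	return tgt.phase2_saw_phase1_done ? 1 : 0;
}
```

In this sketch, removing the outer_mutex lock/unlock pair from demo_stop_phase2 lets phase2 mark the target as stopped while phase1 is still waiting, which corresponds to the window the panics above fall into.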

Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
---
The patch is for the scsi-fixes tree.
The issue is very old, but the patch applies to 4.20+ versions.

 drivers/scsi/qla2xxx/qla_target.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Martin K. Petersen May 22, 2021, 4:40 a.m. UTC | #1
On Thu, 15 Apr 2021 23:35:54 +0300, Dmitry Bogdanov wrote:

> Target de-configuration panics at high CPU load.
> TPGT and WWPN can be removed on separate threads.
> TPGT removal requests a reset HBA on a separate thread and waits for
> reset complete (qlt_stop_phase1). Due to high CPU load that HBA reset
> can be delayed for some time.
> WWPN removal does qlt_stop_phase2 where it is thinked that phase1
> has been already finished and zeroed tgt.tgt_ops that is used by
> incoming traffic and causes several panics:
>
> [...]

Applied to 5.13/scsi-fixes, thanks!

[1/1] scsi: qla2xx: wait for stop_phase1 at wwn removal
      https://git.kernel.org/mkp/scsi/c/2ef7665dfd88

-- 
Martin K. Petersen	Oracle Linux Engineering

Patch

diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
index 480e7d2dcf3e..745d6d98c02e 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -1558,10 +1558,12 @@  void qlt_stop_phase2(struct qla_tgt *tgt)
 		return;
 	}
 
+	mutex_lock(&tgt->ha->optrom_mutex);
 	mutex_lock(&vha->vha_tgt.tgt_mutex);
 	tgt->tgt_stop = 0;
 	tgt->tgt_stopped = 1;
 	mutex_unlock(&vha->vha_tgt.tgt_mutex);
+	mutex_unlock(&tgt->ha->optrom_mutex);
 
 	ql_dbg(ql_dbg_tgt_mgt, vha, 0xf00c, "Stop of tgt %p finished\n",
 	    tgt);