[for-5.4,0/2] scsi: qla2xxx: Fix P2P mode

Message ID 20210415195144.91903-1-a.kovaleva@yadro.com

Message

Anastasia Kovaleva April 15, 2021, 7:51 p.m. UTC
Hi Greg and Sasha,

QLogic FC adapters don’t work in P2P mode on the latest stable 5.4 (at least
QLE2692, QLE2694, and QLE2742 are affected).

We’ve tested and bisected from 5.4 up to 5.4.112 and figured out the
following:


1. From 5.4 to 5.4.5 inclusive, direct mode doesn’t work at all.

   Stable commit a82545b62e07 ("scsi: qla2xxx: Change discovery state before
   PLOGI") fixes the issue in 5.4.6.

2. From 5.4.6 to 5.4.68 inclusive, direct mode works, but the FC link cannot
   be recovered after issue_lip and all presented LUNs are lost.

   The broken issue_lip is a result of a82545b62e07 being applied to stable
   without the upstream commit 65e920093805 ("scsi: qla2xxx: Fix device connect
   issues in P2P configuration.").

3. From 5.4.69 up to now (5.4.112), direct mode doesn’t work again.

   The issue was introduced by stable commit 74924e407bf7 ("scsi: qla2xxx:
   Retry PLOGI on FC-NVMe PRLI failure").

   Upstream commit 84ed362ac40c ("scsi: qla2xxx: Dual FCP-NVMe target port
   support") fixes the issue.

So, in order to fix both P2P issues we need to apply upstream commits
65e920093805 and 84ed362ac40c.
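
As a rough illustration of the bisection results above, the following Python
sketch maps a 5.4.y sublevel to the P2P behaviour reported in this message. The
function name `p2p_status` and the status strings are hypothetical, made up for
this example; only the version ranges come from the testing described above.

```python
def p2p_status(sublevel: int) -> str:
    """Return the P2P state reported for stable kernel 5.4.<sublevel>.

    Hypothetical helper: it only encodes the ranges listed in this message
    (5.4 through 5.4.112) and says nothing about later releases.
    """
    if not 0 <= sublevel <= 112:
        raise ValueError("only 5.4 through 5.4.112 were tested")
    if sublevel <= 5:
        # Before stable commit a82545b62e07 landed in 5.4.6.
        return "broken: direct mode does not work"
    if sublevel <= 68:
        # a82545b62e07 is present, upstream 65e920093805 is missing.
        return "partial: direct mode works, link not recovered after issue_lip"
    # Stable commit 74924e407bf7 (5.4.69) is present, upstream 84ed362ac40c is missing.
    return "broken: direct mode does not work"


if __name__ == "__main__":
    for sublevel in (0, 5, 6, 68, 69, 112):
        print(f"5.4.{sublevel}: {p2p_status(sublevel)}")
```

The boundaries at 5.4.6 and 5.4.69 are the releases in which the two stable
commits named above first appeared.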

However, stable commits 0b84591fdd5e ("scsi: qla2xxx: Fix stuck login session
using prli_pend_timer"), introduced in 5.4.19, and 74924e407bf7 ("scsi: qla2xxx:
Retry PLOGI on FC-NVMe PRLI failure"), introduced in 5.4.69, were applied out of
order relative to the required upstream fixes, both in upstream commit order and
chronologically, so the fixes cannot be cherry-picked without a merge conflict.

This series provides the merge conflict resolution and fixes both the P2P
discovery and issue_lip issues. It has been tested on Linux stable 5.4.112 and
on the Ubuntu 20.04 kernel 5.4.0-71.79 (which is based on stable 5.4.101).

Please apply at your earliest convenience to 5.4 stable.

thanks,
Anastasia

Arun Easi (1):
  scsi: qla2xxx: Fix device connect issues in P2P configuration

Michael Hernandez (1):
  scsi: qla2xxx: Dual FCP-NVMe target port support

 drivers/scsi/qla2xxx/qla_def.h    | 26 +++++++++++-
 drivers/scsi/qla2xxx/qla_fw.h     |  2 +
 drivers/scsi/qla2xxx/qla_gbl.h    |  1 +
 drivers/scsi/qla2xxx/qla_gs.c     | 42 +++++++++++--------
 drivers/scsi/qla2xxx/qla_init.c   | 70 +++++++++++++++++++------------
 drivers/scsi/qla2xxx/qla_inline.h | 12 ++++++
 drivers/scsi/qla2xxx/qla_iocb.c   |  5 +--
 drivers/scsi/qla2xxx/qla_mbx.c    | 11 ++---
 drivers/scsi/qla2xxx/qla_os.c     | 17 ++++----
 9 files changed, 124 insertions(+), 62 deletions(-)

Comments

Sasha Levin April 16, 2021, 2:20 p.m. UTC | #1
On Thu, Apr 15, 2021 at 10:51:42PM +0300, Anastasia Kovaleva wrote:
>Hi Greg and Sasha,
>
>QLogic FC adapters don’t work in P2P mode on the latest stable 5.4 (at least
>QLE2692, QLE2694, and QLE2742 are affected).
>
>We’ve tested and bisected from 5.4 up to 5.4.112 and figured out the
>following:
>
>1. From 5.4 to 5.4.5 inclusive, direct mode doesn’t work at all.
>
>   Stable commit a82545b62e07 ("scsi: qla2xxx: Change discovery state before
>   PLOGI") fixes the issue in 5.4.6.
>
>2. From 5.4.6 to 5.4.68 inclusive, direct mode works, but the FC link cannot
>   be recovered after issue_lip and all presented LUNs are lost.
>
>   The broken issue_lip is a result of a82545b62e07 being applied to stable
>   without the upstream commit 65e920093805 ("scsi: qla2xxx: Fix device connect
>   issues in P2P configuration.").
>
>3. From 5.4.69 up to now (5.4.112), direct mode doesn’t work again.
>
>   The issue was introduced by stable commit 74924e407bf7 ("scsi: qla2xxx:
>   Retry PLOGI on FC-NVMe PRLI failure").
>
>   Upstream commit 84ed362ac40c ("scsi: qla2xxx: Dual FCP-NVMe target port
>   support") fixes the issue.
>
>So, in order to fix both P2P issues we need to apply upstream commits
>65e920093805 and 84ed362ac40c.

That's a great analysis, thank you.

>However, stable commits 0b84591fdd5e ("scsi: qla2xxx: Fix stuck login session
>using prli_pend_timer"), introduced in 5.4.19, and 74924e407bf7 ("scsi: qla2xxx:
>Retry PLOGI on FC-NVMe PRLI failure"), introduced in 5.4.69, were applied out of
>order relative to the required upstream fixes, both in upstream commit order and
>chronologically, so the fixes cannot be cherry-picked without a merge conflict.

Right, in particular: 74924e407bf7 ("scsi: qla2xxx: Retry PLOGI on
FC-NVMe PRLI failure") was modified to work around missing 84ed362ac40c
("scsi: qla2xxx: Dual FCP-NVMe target port support"), which is where the
rest of the conflicts are coming from.

>This series provides the merge conflict resolution and fixes both the P2P
>discovery and issue_lip issues. It has been tested on Linux stable 5.4.112 and
>on the Ubuntu 20.04 kernel 5.4.0-71.79 (which is based on stable 5.4.101).
>
>Please apply at your earliest convenience to 5.4 stable.

So instead of applying even more modified patches that'll create similar
issues in the future, I backed out 0b84591fdd5e and 74924e407bf7, and
applied the 4 commits you've pointed out in the "correct" order. I also
grabbed 27258a577144 ("scsi: qla2xxx: Add a shadow variable to hold
disc_state history of fcport") for completeness.

Thanks for diagnosing this issue! Please let me know if something is
still broken.