diff mbox series

[RFT] scsi: pm8001: Fix FW crash for maxcpus=1

Message ID 1641320780-81620-1-git-send-email-john.garry@huawei.com
State New
Headers show
Series [RFT] scsi: pm8001: Fix FW crash for maxcpus=1 | expand

Commit Message

John Garry Jan. 4, 2022, 6:26 p.m. UTC
According to the comment in check_fw_ready() we should not check the
IOP1_READY field in register SCRATCH_PAD_1 for 8008 or 8009 controllers.

However we check this very field in process_oq() for processing the highest
index interrupt vector. Change that function to not check IOP1_READY for
those mentioned controllers, but do check ILA_READY in both cases.

The reason I assume that this was not hit earlier was because we always
allocated 64 MSI(X), and just did not pass the vector index check in
process_oq(), i.e.  the handler never ran for vector index 63.

Signed-off-by: John Garry <john.garry@huawei.com>

Comments

John Garry Jan. 5, 2022, 11:28 a.m. UTC | #1
On 05/01/2022 04:03, Damien Le Moal wrote:
> On 1/5/22 03:26, John Garry wrote:
>> According to the comment in check_fw_ready() we should not check the
>> IOP1_READY field in register SCRATCH_PAD_1 for 8008 or 8009 controllers.
>>
>> However we check this very field in process_oq() for processing the highest
>> index interrupt vector. Change that function to not check IOP1_READY for
>> those mentioned controllers, but do check ILA_READY in both cases.
>>
>> The reason I assume that this was not hit earlier was because we always
>> allocated 64 MSI(X), and just did not pass the vector index check in
>> process_oq(), i.e.  the handler never ran for vector index 63.
>>
>> Signed-off-by: John Garry<john.garry@huawei.com>
>>
>> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
>> index 2101fc5761c3..77b8bb30615b 100644
>> --- a/drivers/scsi/pm8001/pm80xx_hwi.c
>> +++ b/drivers/scsi/pm8001/pm80xx_hwi.c
>> @@ -4162,9 +4162,16 @@ static int process_oq(struct pm8001_hba_info *pm8001_ha, u8 vec)
>>   	u32 regval;
>>   
>>   	if (vec == (pm8001_ha->max_q_num - 1)) {
>> +		u32 mipsall_ready;
>> +
>> +		if ((pm8001_ha->chip_id == chip_8008) ||
>> +		    (pm8001_ha->chip_id == chip_8009))
> nit: no need for the inner brackets here.

ok, I can fix that.

But I would also like opinion from microchip guys/maintainer on why this 
code is here at all. Seems strange in the way we check in this register 
in the interrupt handler for only a specific vector and, also, why we 
check at all in an interrupt handler.
> 
>> +			mipsall_ready = SCRATCH_PAD_MIPSALL_READY_8PORT;
>> +		else
>> +			mipsall_ready = SCRATCH_PAD_MIPSALL_READY_16PORT;
>> +
>>   		regval = pm8001_cr32(pm8001_ha, 0, MSGU_SCRATCH_PAD_1);
>> -		if ((regval & SCRATCH_PAD_MIPSALL_READY) !=
>> -					SCRATCH_PAD_MIPSALL_READY) {
>> +		if ((regval & mipsall_ready) != mipsall_ready) {
>>   			pm8001_ha->controller_fatal_error = true;
>>   			pm8001_dbg(pm8001_ha, FAIL,
>>   				   "Firmware Fatal error! Regval:0x%x\n",
>> diff --git a/drivers/scsi/pm8001/pm80xx_hwi.h b/drivers/scsi/pm8001/pm80xx_hwi.h
>> index c7e5d93bea92..c41ed039c92a 100644
>> --- a/drivers/scsi/pm8001/pm80xx_hwi.h
>> +++ b/drivers/scsi/pm8001/pm80xx_hwi.h
>> @@ -1405,8 +1405,12 @@ typedef struct SASProtocolTimerConfig SASProtocolTimerConfig_t;
>>   #define SCRATCH_PAD_BOOT_LOAD_SUCCESS	0x0
>>   #define SCRATCH_PAD_IOP0_READY		0xC00
>>   #define SCRATCH_PAD_IOP1_READY		0x3000
>> -#define SCRATCH_PAD_MIPSALL_READY	(SCRATCH_PAD_IOP1_READY | \
>> +#define SCRATCH_PAD_MIPSALL_READY_16PORT	(SCRATCH_PAD_IOP1_READY | \
>>   					SCRATCH_PAD_IOP0_READY | \
>> +					SCRATCH_PAD_ILA_READY | \
>> +					SCRATCH_PAD_RAAE_READY)
>> +#define SCRATCH_PAD_MIPSALL_READY_8PORT	(SCRATCH_PAD_IOP0_READY | \
>> +					SCRATCH_PAD_ILA_READY | \
>>   					SCRATCH_PAD_RAAE_READY)
>>   
>>   /* boot loader state */
> Otherwise, looks OK to me.
> I tested with and without max_cpus=1 with a ATTO Technology, Inc.
> ExpressSAS 12Gb/s SAS/SATA HBA (rev 06) adapter and everything is OK.
> That adapter uses chip_8072 though, not 8008 or 8009.
> 
> Feel free to add:
> 
> Reviewed-by: Damien Le Moal<damien.lemoal@opensource.wdc.com>
> Tested-by: Damien Le Moal<damien.lemoal@opensource.wdc.com>

Thanks!

john
Damien Le Moal Jan. 6, 2022, 11:59 p.m. UTC | #2
On 2022/01/07 0:32, John Garry wrote:
> On 06/01/2022 13:03, Ajish.Koshy@microchip.com wrote:
>>>   only a specific vector and, also, why we check at all in
>>> an interrupt handler.
>> Here is my initial understanding so far based on the code
>> and data sheet
>>
>> 1. Controller has the capability to communicate
>> to the host about fatal error condition via configured
>> interrupt vector MSI/MSI-X.
>> 2. This capability is achieved by setting two fields
>>   a. Enable Controller Fatal error notification
>>      Dowrd 0x1C, Bit[0].
>>      1 - Enable; 0 - Disable
>>      Code: pm8001_ha->main_cfg_tbl.pm80xx_tbl.
>>      fatal_err_interrupt = 0x01;
>>   b. Fatal Error Interrupt Vector Dword 0x1C, bit[15:8]
>>      This parameter configures which interrupt vector
>>      is used to notify the host of the fatal error.
>>      Code: /* Update Fatal error interrupt vector */
>>      pm8001_ha->main_cfg_tbl.pm80xx_tbl.
>>      fatal_err_interrupt |=
>>      ((pm8001_ha->max_q_num - 1) << 8);
>>
>> Probably this will be the reason why we check
>> the vector in process_oq() for processing
>> controller fatal error
>>
>> if (vec == (pm8001_ha->max_q_num - 1)) {
>>   
>> Please do let me know if it helped in clarification.
>>
> 
> Sounds reasonable. And we only discover the issue for 8008/8009 now as 
> we have that (pm8001_ha->max_q_num - 1) vector being used for standard IO.
> 
> So let me know of any other issue, otherwise I'll send a v2 with the 
> coding style fixup.

And maybe add comments about the above so that the information does not get lost ?

> 
> Thanks,
> John
Ajish Koshy Jan. 7, 2022, 11:05 a.m. UTC | #3
Hi John,

> On 06/01/2022 13:03, Ajish.Koshy@microchip.com wrote:
> >>   only a specific vector and, also, why we check at all in an
> >> interrupt handler.
> > Here is my initial understanding so far based on the code and data
> > sheet
> >
> > 1. Controller has the capability to communicate to the host about
> > fatal error condition via configured interrupt vector MSI/MSI-X.
> > 2. This capability is achieved by setting two fields
> >   a. Enable Controller Fatal error notification
> >      Dowrd 0x1C, Bit[0].
> >      1 - Enable; 0 - Disable
> >      Code: pm8001_ha->main_cfg_tbl.pm80xx_tbl.
> >      fatal_err_interrupt = 0x01;
> >   b. Fatal Error Interrupt Vector Dword 0x1C, bit[15:8]
> >      This parameter configures which interrupt vector
> >      is used to notify the host of the fatal error.
> >      Code: /* Update Fatal error interrupt vector */
> >      pm8001_ha->main_cfg_tbl.pm80xx_tbl.
> >      fatal_err_interrupt |=
> >      ((pm8001_ha->max_q_num - 1) << 8);
> >
> > Probably this will be the reason why we check the vector in
> > process_oq() for processing controller fatal error
> >
> > if (vec == (pm8001_ha->max_q_num - 1)) {
> >
> > Please do let me know if it helped in clarification.
> >
> 
> Sounds reasonable. And we only discover the issue for 8008/8009 now as we
> have that (pm8001_ha->max_q_num - 1) vector being used for standard IO.
> 
> So let me know of any other issue, otherwise I'll send a v2 with the coding
> style fixup.

After going through check_fw_ready(), the change here looks fine.

> 
> Thanks,
> John
diff mbox series

Patch

diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
index 2101fc5761c3..77b8bb30615b 100644
--- a/drivers/scsi/pm8001/pm80xx_hwi.c
+++ b/drivers/scsi/pm8001/pm80xx_hwi.c
@@ -4162,9 +4162,16 @@  static int process_oq(struct pm8001_hba_info *pm8001_ha, u8 vec)
 	u32 regval;
 
 	if (vec == (pm8001_ha->max_q_num - 1)) {
+		u32 mipsall_ready;
+
+		if ((pm8001_ha->chip_id == chip_8008) ||
+		    (pm8001_ha->chip_id == chip_8009))
+			mipsall_ready = SCRATCH_PAD_MIPSALL_READY_8PORT;
+		else
+			mipsall_ready = SCRATCH_PAD_MIPSALL_READY_16PORT;
+
 		regval = pm8001_cr32(pm8001_ha, 0, MSGU_SCRATCH_PAD_1);
-		if ((regval & SCRATCH_PAD_MIPSALL_READY) !=
-					SCRATCH_PAD_MIPSALL_READY) {
+		if ((regval & mipsall_ready) != mipsall_ready) {
 			pm8001_ha->controller_fatal_error = true;
 			pm8001_dbg(pm8001_ha, FAIL,
 				   "Firmware Fatal error! Regval:0x%x\n",
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.h b/drivers/scsi/pm8001/pm80xx_hwi.h
index c7e5d93bea92..c41ed039c92a 100644
--- a/drivers/scsi/pm8001/pm80xx_hwi.h
+++ b/drivers/scsi/pm8001/pm80xx_hwi.h
@@ -1405,8 +1405,12 @@  typedef struct SASProtocolTimerConfig SASProtocolTimerConfig_t;
 #define SCRATCH_PAD_BOOT_LOAD_SUCCESS	0x0
 #define SCRATCH_PAD_IOP0_READY		0xC00
 #define SCRATCH_PAD_IOP1_READY		0x3000
-#define SCRATCH_PAD_MIPSALL_READY	(SCRATCH_PAD_IOP1_READY | \
+#define SCRATCH_PAD_MIPSALL_READY_16PORT	(SCRATCH_PAD_IOP1_READY | \
 					SCRATCH_PAD_IOP0_READY | \
+					SCRATCH_PAD_ILA_READY | \
+					SCRATCH_PAD_RAAE_READY)
+#define SCRATCH_PAD_MIPSALL_READY_8PORT	(SCRATCH_PAD_IOP0_READY | \
+					SCRATCH_PAD_ILA_READY | \
 					SCRATCH_PAD_RAAE_READY)
 
 /* boot loader state */