mbox series

[00/16] scsi: libsas and users: Factor out LLDD TMF code

Message ID 1643110372-85470-1-git-send-email-john.garry@huawei.com
Headers show
Series scsi: libsas and users: Factor out LLDD TMF code | expand

Message

John Garry Jan. 25, 2022, 11:32 a.m. UTC
The LLDD TMF code is almost identical between hisi_sas, pm8001, and mvsas
drivers.

This series factors out that code into libsas, thus reducing much
duplication and giving a net reduction of ~250 LoC.

There are some subtle differences between the core TMF handler and each
of the LLDDs old implementation, so any review and testing is appreciated.

Some other minor patches are thrown in:
- Delete unused macro in hisi_sas driver
- Delete unused libsas callback
- Add enum for response frame datapres field

I have another follow-up series to factor out the internal abort code,
which is common to hisi_sas and pm8001 drivers.

Based on v5.17-rc1

John Garry (16):
  scsi: libsas: Use enum for response frame DATAPRES field
  scsi: libsas: Delete lldd_clear_aca callback
  scsi: hisi_sas: Delete unused I_T_NEXUS_RESET_PHYUP_TIMEOUT
  scsi: libsas: Move SMP task handlers to core
  scsi: libsas: Add struct sas_tmf_task
  scsi: libsas: Add sas_task.tmf
  scsi: libsas: Add sas_execute_tmf()
  scsi: libsas: Add sas_execute_ssp_tmf()
  scsi: libsas: Add TMF handler exec complete callback
  scsi: libsas: Add TMF handler aborted callback
  scsi: libsas: Add sas_abort_task_set()
  scsi: libsas: Add sas_clear_task_set()
  scsi: libsas: Add sas_lu_reset()
  scsi: libsas: Add sas_query_task()
  scsi: libsas: Add sas_abort_task()
  scsi: libsas: Add sas_execute_ata_cmd()

 Documentation/scsi/libsas.rst          |   2 -
 drivers/scsi/aic94xx/aic94xx.h         |   1 -
 drivers/scsi/aic94xx/aic94xx_init.c    |   1 -
 drivers/scsi/aic94xx/aic94xx_tmf.c     |   9 -
 drivers/scsi/hisi_sas/hisi_sas.h       |   9 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 235 ++++---------------------
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |   2 +-
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |   9 +-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |   2 +-
 drivers/scsi/isci/init.c               |   1 -
 drivers/scsi/isci/task.c               |  18 --
 drivers/scsi/isci/task.h               |   4 -
 drivers/scsi/libsas/sas_ata.c          |   8 +
 drivers/scsi/libsas/sas_expander.c     |  24 +--
 drivers/scsi/libsas/sas_internal.h     |   6 +
 drivers/scsi/libsas/sas_scsi_host.c    | 220 +++++++++++++++++++++++
 drivers/scsi/libsas/sas_task.c         |  12 +-
 drivers/scsi/mvsas/mv_defs.h           |   5 -
 drivers/scsi/mvsas/mv_init.c           |   5 +-
 drivers/scsi/mvsas/mv_sas.c            | 177 +------------------
 drivers/scsi/mvsas/mv_sas.h            |   3 -
 drivers/scsi/pm8001/pm8001_hwi.c       |   4 +-
 drivers/scsi/pm8001/pm8001_init.c      |   4 +-
 drivers/scsi/pm8001/pm8001_sas.c       | 180 +++----------------
 drivers/scsi/pm8001/pm8001_sas.h       |  13 +-
 include/scsi/libsas.h                  |  23 ++-
 26 files changed, 353 insertions(+), 624 deletions(-)

Comments

Damien Le Moal Jan. 27, 2022, 6:37 a.m. UTC | #1
On 1/25/22 20:32, John Garry wrote:
> The LLDD TMF code is almost identical between hisi_sas, pm8001, and mvsas
> drivers.
> 
> This series factors out that code into libsas, thus reducing much
> duplication and giving a net reduction of ~250 LoC.
> 
> There are some subtle differences between the core TMF handler and each
> of the LLDDs old implementation, so any review and testing is appreciated.
> 
> Some other minor patches are thrown in:
> - Delete unused macro in hisi_sas driver
> - Delete unused libsas callback
> - Add enum for response frame datapres field
> 
> I have another follow-up series to factor out the internal abort code,
> which is common to hisi_sas and pm8001 drivers.
> 
> Based on v5.17-rc1
> 
> John Garry (16):
>   scsi: libsas: Use enum for response frame DATAPRES field
>   scsi: libsas: Delete lldd_clear_aca callback
>   scsi: hisi_sas: Delete unused I_T_NEXUS_RESET_PHYUP_TIMEOUT
>   scsi: libsas: Move SMP task handlers to core
>   scsi: libsas: Add struct sas_tmf_task
>   scsi: libsas: Add sas_task.tmf
>   scsi: libsas: Add sas_execute_tmf()
>   scsi: libsas: Add sas_execute_ssp_tmf()
>   scsi: libsas: Add TMF handler exec complete callback
>   scsi: libsas: Add TMF handler aborted callback
>   scsi: libsas: Add sas_abort_task_set()
>   scsi: libsas: Add sas_clear_task_set()
>   scsi: libsas: Add sas_lu_reset()
>   scsi: libsas: Add sas_query_task()
>   scsi: libsas: Add sas_abort_task()
>   scsi: libsas: Add sas_execute_ata_cmd()
> 
>  Documentation/scsi/libsas.rst          |   2 -
>  drivers/scsi/aic94xx/aic94xx.h         |   1 -
>  drivers/scsi/aic94xx/aic94xx_init.c    |   1 -
>  drivers/scsi/aic94xx/aic94xx_tmf.c     |   9 -
>  drivers/scsi/hisi_sas/hisi_sas.h       |   9 +-
>  drivers/scsi/hisi_sas/hisi_sas_main.c  | 235 ++++---------------------
>  drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |   2 +-
>  drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |   9 +-
>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |   2 +-
>  drivers/scsi/isci/init.c               |   1 -
>  drivers/scsi/isci/task.c               |  18 --
>  drivers/scsi/isci/task.h               |   4 -
>  drivers/scsi/libsas/sas_ata.c          |   8 +
>  drivers/scsi/libsas/sas_expander.c     |  24 +--
>  drivers/scsi/libsas/sas_internal.h     |   6 +
>  drivers/scsi/libsas/sas_scsi_host.c    | 220 +++++++++++++++++++++++
>  drivers/scsi/libsas/sas_task.c         |  12 +-
>  drivers/scsi/mvsas/mv_defs.h           |   5 -
>  drivers/scsi/mvsas/mv_init.c           |   5 +-
>  drivers/scsi/mvsas/mv_sas.c            | 177 +------------------
>  drivers/scsi/mvsas/mv_sas.h            |   3 -
>  drivers/scsi/pm8001/pm8001_hwi.c       |   4 +-
>  drivers/scsi/pm8001/pm8001_init.c      |   4 +-
>  drivers/scsi/pm8001/pm8001_sas.c       | 180 +++----------------
>  drivers/scsi/pm8001/pm8001_sas.h       |  13 +-
>  include/scsi/libsas.h                  |  23 ++-
>  26 files changed, 353 insertions(+), 624 deletions(-)
> 

John,

I did some light testing of this series (boot + some fio runs) and
everything looks good using my "ATTO Technology, Inc. ExpressSAS 12Gb/s
SAS/SATA HBA (rev 06)" HBA (x86_64 host).

Of note is that "make W=1 M=drivers/scsi" complains with:

drivers/scsi/pm8001/pm80xx_hwi.c:3938: warning: Function parameter or
member 'circularQ' not described in 'process_one_iomb'

And sparse/make C=1 complains about:

drivers/scsi/libsas/sas_port.c:77:13: warning: context imbalance in
'sas_form_port' - different lock contexts for basic block

But I have not checked if it is something that your series touch.

And there is a ton of complaints about __le32 use in the pm80xx code...
I can try to have a look at these if you want, on top of your series.

Cheers.
John Garry Jan. 27, 2022, 10:17 a.m. UTC | #2
On 27/01/2022 06:37, Damien Le Moal wrote:

Hi Damien,

> I did some light testing of this series (boot + some fio runs) and
> everything looks good using my "ATTO Technology, Inc. ExpressSAS 12Gb/s
> SAS/SATA HBA (rev 06)" HBA (x86_64 host).

Yeah, unfortunately these steps prob won't exercise much of the code 
changes here since I figure error handling would not kick in.

However using this same adapter type on my arm64 system has error 
handling kick in almost straight away - and the handling looks sane. A 
silver lining, I suppose ..

> 
> Of note is that "make W=1 M=drivers/scsi" complains with:
> 
> drivers/scsi/pm8001/pm80xx_hwi.c:3938: warning: Function parameter or
> member 'circularQ' not described in 'process_one_iomb'

That's per-existing. I'll send a patch for that now along with another 
fix I found for that driver. ....

> 
> And sparse/make C=1 complains about:
> 
> drivers/scsi/libsas/sas_port.c:77:13: warning: context imbalance in
> 'sas_form_port' - different lock contexts for basic block

I think it's talking about the port->phy_list_lock usage - it prob 
doesn't like segments where we fall out a loop with the lock held (which 
was grabbed in the loop). Anyway it looks ok. Maybe we can improve this.

> 
> But I have not checked if it is something that your series touch.
> 
> And there is a ton of complaints about __le32 use in the pm80xx code...
> I can try to have a look at these if you want, on top of your series.

I really need to get make C=1 working for me - it segfaults in any env I 
have :(

Thanks,
John
Christoph Hellwig Jan. 27, 2022, 10:21 a.m. UTC | #3
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig Jan. 27, 2022, 10:23 a.m. UTC | #4
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig Jan. 27, 2022, 10:24 a.m. UTC | #5
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig Jan. 27, 2022, 10:26 a.m. UTC | #6
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
John Garry Jan. 28, 2022, 9:09 a.m. UTC | #7
On 28/01/2022 06:37, Damien Le Moal wrote:

Hi Damien,

>> However using this same adapter type on my arm64 system has error
>> handling kick in almost straight away - and the handling looks sane. A
>> silver lining, I suppose ..
> I ran some more tests. In particular, I ran libzbc compliance tests on a
> 20TB SMR drives. All tests pass with 5.17-rc1, but after applying your
> series, I see command timeout that take forever to recover from, with
> the drive revalidation failing after that.
> 
> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
> timeout.
> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is aborted
> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
> aborted
> [  411.192602] ata21.00: qc timeout (cmd 0xec)
> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
> timeout.
> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [  431.685544] ata21.00: revalidation failed (errno=-5)
> [  441.911948] ata21.00: qc timeout (cmd 0xec)
> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
> timeout.
> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [  462.404992] ata21.00: revalidation failed (errno=-5)
> [  492.598769] ata21.00: qc timeout (cmd 0xec)
> ...
> 
> So there is a problem. Need to dig into this. I see this issue only with
> libzbc passthrough tests. fio runs with libaio are fine.

Thanks for the notice. I think that I also saw a hang, but, IIRC, it 
happened on mainline for me - but it's hard to know if I broke something 
if it is already broke in another way. That is why I wanted this card 
working properly...

Anyway, I will investigate more.

> 
>>> And sparse/make C=1 complains about:
>>>
>>> drivers/scsi/libsas/sas_port.c:77:13: warning: context imbalance in
>>> 'sas_form_port' - different lock contexts for basic block
>> I think it's talking about the port->phy_list_lock usage - it prob
>> doesn't like segments where we fall out a loop with the lock held (which
>> was grabbed in the loop). Anyway it looks ok. Maybe we can improve this.
>>
>>> But I have not checked if it is something that your series touch.
>>>
>>> And there is a ton of complaints about __le32 use in the pm80xx code...
>>> I can try to have a look at these if you want, on top of your series.
>> I really need to get make C=1 working for me - it segfaults in any env I
>> have:(
> I now have a 12 patch series that fixes*all*  the sparse warnings. Some
> of the fixes were trivial, but most of them are simply hard bugs with
> the handling of le32 struct field values. There is no way that this
> driver is working as-is on big-endian machines. Some calculations are
> actually done using cpu_to_le32() values !

Great, I'll have a look when you send them.

> 
> But even though these fixes should have essentially no effect on
> little-endian x86_64, with my series applied, I see the same command
> timeout problem as with your libsas update, and both series together
> result in the same timeout issue too.
> 
> So it looks like "fixing" the code actually is revealing some other bug
> that was previously hidden... This will take some time to debug.
> 
> Another problem I noticed: doing "rmmod pm80xx; modprobe pm80xx" result
> in a failure of device scans. I get loops of "link is slow to respond
> ->reset". For the above tests, I had to reboot every time I changed the
> driver module code. Another thing to look at.

Sounds odd, I would expect everything runs from afresh when insmod.

Thanks,
John
John Garry Jan. 31, 2022, 3:58 p.m. UTC | #8
On 28/01/2022 09:09, John Garry wrote:
>> I ran some more tests. In particular, I ran libzbc compliance tests on a
>> 20TB SMR drives. All tests pass with 5.17-rc1, but after applying your
>> series, I see command timeout that take forever to recover from, with
>> the drive revalidation failing after that.
>>
>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>> timeout.
>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is 
>> aborted
>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
>> aborted
>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>> timeout.
>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>> timeout.
>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>> ...
>>
>> So there is a problem. Need to dig into this. I see this issue only with
>> libzbc passthrough tests. fio runs with libaio are fine.
> 
> Thanks for the notice. I think that I also saw a hang, but, IIRC, it 
> happened on mainline for me - but it's hard to know if I broke something 
> if it is already broke in another way. That is why I wanted this card 
> working properly...

Hi Damien,

 From testing mainline, I can see a hang on my arm64 system for SAS 
disks. I think that the reason is the we don't finish some commands in 
EH properly for pm8001:
- In EH, we attempt to abort the task in sas_scsi_find_task() -> 
lldd_abort_task()
The default return from pm8001_exec_internal_tmf_task() is 
-TMF_RESP_FUNC_FAILED, so if the TMF does not execute properly we return 
this value
- sas_scsi_find_task() cannot handle -TMF_RESP_FUNC_FAILED, and returns 
-TMF_RESP_FUNC_FAILED directly to sas_eh_handle_sas_errors(), which, 
again, does not handle -TMF_RESP_FUNC_FAILED. So we don't progress to 
ever finish the comand.

This looks like the correct fix for mainline:

--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -766,7 +766,7 @@ static int pm8001_exec_internal_tmf_task(struct 
domain_device *dev,
pm8001_dev, DS_OPERATIONAL);
wait_for_completion(&completion_setstate);
}
- res = -TMF_RESP_FUNC_FAILED;
+ res = TMF_RESP_FUNC_FAILED;

That's effectively the same as what I have in this series in 
sas_execute_tmf().

However your testing is a SATA device, which I'll check further.

Thanks,
John
Damien Le Moal Feb. 3, 2022, 9:44 a.m. UTC | #9
On 2/1/22 00:58, John Garry wrote:
> On 28/01/2022 09:09, John Garry wrote:
>>> I ran some more tests. In particular, I ran libzbc compliance tests on a
>>> 20TB SMR drives. All tests pass with 5.17-rc1, but after applying your
>>> series, I see command timeout that take forever to recover from, with
>>> the drive revalidation failing after that.
>>>
>>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>> timeout.
>>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is 
>>> aborted
>>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
>>> aborted
>>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>> timeout.
>>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>> timeout.
>>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>>> ...
>>>
>>> So there is a problem. Need to dig into this. I see this issue only with
>>> libzbc passthrough tests. fio runs with libaio are fine.
>>
>> Thanks for the notice. I think that I also saw a hang, but, IIRC, it 
>> happened on mainline for me - but it's hard to know if I broke something 
>> if it is already broke in another way. That is why I wanted this card 
>> working properly...
> 
> Hi Damien,
> 
>  From testing mainline, I can see a hang on my arm64 system for SAS 
> disks. I think that the reason is the we don't finish some commands in 
> EH properly for pm8001:
> - In EH, we attempt to abort the task in sas_scsi_find_task() -> 
> lldd_abort_task()
> The default return from pm8001_exec_internal_tmf_task() is 
> -TMF_RESP_FUNC_FAILED, so if the TMF does not execute properly we return 
> this value
> - sas_scsi_find_task() cannot handle -TMF_RESP_FUNC_FAILED, and returns 
> -TMF_RESP_FUNC_FAILED directly to sas_eh_handle_sas_errors(), which, 
> again, does not handle -TMF_RESP_FUNC_FAILED. So we don't progress to 
> ever finish the comand.
> 
> This looks like the correct fix for mainline:
> 
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -766,7 +766,7 @@ static int pm8001_exec_internal_tmf_task(struct 
> domain_device *dev,
> pm8001_dev, DS_OPERATIONAL);
> wait_for_completion(&completion_setstate);
> }
> - res = -TMF_RESP_FUNC_FAILED;
> + res = TMF_RESP_FUNC_FAILED;
> 
> That's effectively the same as what I have in this series in 
> sas_execute_tmf().
> 
> However your testing is a SATA device, which I'll check further.

This did not help. Still seeing 100% reproducible hangs.

I did a lot of testing/digging today, and the hang cause seems to be
missing task completions. At random, a task times out as its completion
does not come, and subsequent abort trial for the task fail, revalidate
fails and the device is dropped (capacity goes to 0). But at that point,
doing rmmod/modprobe to reset the device does not work. sync cache
command issued at rmmod time never completes. I end up needing to power
cycle the machine every time...

No clue about the root cause yet, but it definitely seem to be related
to NCQ/high QD operation. If I force my tests to use non-NCQ commands,
everything is fine and the tests run to completion without any issue.

I wonder if their is a tag management bug somewhere...

I did stumble on something very ugly in libsas too: sas_ata_qc_issue()
drops and retake the ata port lock. No other ATA driver do that since
the ata completion also take that lock. The ata port lock is taken
before ata_qc_issue() is called with IRQ disabled (spin_lock_irqsave()).
So doing a spin_unlock()/spin_lock() in sas_ata_qc_issue() (called from
ata_qc_issue()) seems like a very bad idea. I removed that and
everything work the same way (the lld execute does not sleep). But that
did not solve the hang problem.

Of note is this is all with your libsas patches applied. Without the
patches, I have KASAN screaming at me about use-after-free in completion
context. With your patches, KASAN is silent.

Another thing: this driver does not allow changing the max qd... Very
annoying.

echo 1 > /sys/block/sdX/device/queue_depth

has no effect. QD stays at 32 for an ATA drive. Need to look into that too.
John Garry Feb. 3, 2022, 3:55 p.m. UTC | #10
On 03/02/2022 09:44, Damien Le Moal wrote:

Hi Damien,

>>>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>>>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>>>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task

Contrary to mentioning TMF in the log, this is not a TMF but rather an 
internal abort timing out. I don't think that this should ever happen. 
This command should just abort pending IO commands in the controller and 
not send anything to the target. So for this to timeout means a HW fault 
or driver bug. And I did not touch this code for pm8001.

>>>> timeout.
>>>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is
>>>> aborted
>>>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
>>>> aborted
>>>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>>>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>> timeout.
>>>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>>>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>>>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>> timeout.
>>>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>>>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>>>> ...
>>>>

Do you have a fuller dmesg with my series?

...

>> }
>> - res = -TMF_RESP_FUNC_FAILED;
>> + res = TMF_RESP_FUNC_FAILED;
>>
>> That's effectively the same as what I have in this series in
>> sas_execute_tmf().
>>
>> However your testing is a SATA device, which I'll check further.
> This did not help. Still seeing 100% reproducible hangs.

OK, but I think that we should also have this change as the mainline 
codes looks broken to be begin with:

--->8 ---

[PATCH] scsi: libsas: Handle all errors in sas_scsi_find_task()

LLDD TMFs callbacks may return linux or other error codes instead of TMF
codes. This may cause problems in sas_scsi_find_task() ->
.lldd_query_task(), as only TMF codes are handled there. As such, we may
not return a task_disposition type. Function sas_eh_handle_sas_errors() 
only handles that type, and may exit error
handling early for unrecognised types.

So use TASK_ABORT_FAILED for non-TMF types returned from
.lldd_query_task(), on the assumption that the command may still be 
alive and error handling should be escalated.

Signed-off-by: John Garry <john.garry@huawei.com>

diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
b/drivers/scsi/libsas/sas_scsi_host.c
index 53d8b7ede0cd..02274f471308 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -316,8 +316,11 @@ static enum task_disposition 
sas_scsi_find_task(struct sas_task *task)
  				pr_notice("%s: task 0x%p failed to abort\n",
  					  __func__, task);
  				return TASK_ABORT_FAILED;
+			default:
+				pr_notice("%s: task 0x%p result code %d not handled, assuming 
failed\n",
+					  __func__, task, res);
+				return TASK_ABORT_FAILED;
  			}
-
  		}
  	}
  	return res;

---8< ----

> 
> I did a lot of testing/digging today, 

Thanks for the effort!

 > and the hang cause seems to be
 > missing task completions.
> At random, a task times out as its completion

That sounds fimilar to my general issue running this driver on an arm64 
host...

> does not come, and subsequent abort trial for the task fail, revalidate
> fails

I assume SMP IOs fail if revalidation fails - if this is the case, then 
the controller seems to be in bad state.

> and the device is dropped (capacity goes to 0). But at that point,
> doing rmmod/modprobe to reset the device does not work. sync cache
> command issued at rmmod time never completes. I end up needing to power
> cycle the machine every time...
> 
> No clue about the root cause yet, but it definitely seem to be related
> to NCQ/high QD operation. If I force my tests to use non-NCQ commands,
> everything is fine and the tests run to completion without any issue.
> 
> I wonder if their is a tag management bug somewhere...

Maybe. Not sure.

On a related point, Hannes' change here could avoid it:

https://lore.kernel.org/linux-scsi/20210222132405.91369-32-hare@suse.de/

> 
> I did stumble on something very ugly in libsas too: sas_ata_qc_issue()
> drops and retake the ata port lock. No other ATA driver do that since
> the ata completion also take that lock. The ata port lock is taken
> before ata_qc_issue() is called with IRQ disabled (spin_lock_irqsave()).
> So doing a spin_unlock()/spin_lock() in sas_ata_qc_issue() (called from
> ata_qc_issue()) seems like a very bad idea. I removed that and
> everything work the same way (the lld execute does not sleep). But that
> did not solve the hang problem.

I would need to check why this is done again. Before my time...

> 
> Of note is this is all with your libsas patches applied. Without the
> patches, I have KASAN screaming at me about use-after-free in completion
> context. With your patches, KASAN is silent.
> 
> Another thing: this driver does not allow changing the max qd... Very
> annoying.
> 
> echo 1 > /sys/block/sdX/device/queue_depth
> 
> has no effect. QD stays at 32 for an ATA drive. Need to look into that too.

I had a look at this. It seems that we fail in 
__ata_change_queue_depth() -> ata_scsi_find_dev() returning NULL.

Thanks again for your effort, I will continue to look.

john
Damien Le Moal Feb. 4, 2022, 12:36 a.m. UTC | #11
On 2/4/22 00:55, John Garry wrote:
> On 03/02/2022 09:44, Damien Le Moal wrote:
> 
> Hi Damien,
> 
>>>>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>>>>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>>>>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
> 
> Contrary to mentioning TMF in the log, this is not a TMF but rather an 
> internal abort timing out. I don't think that this should ever happen. 
> This command should just abort pending IO commands in the controller and 
> not send anything to the target. So for this to timeout means a HW fault 
> or driver bug. And I did not touch this code for pm8001.
> 
>>>>> timeout.
>>>>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is
>>>>> aborted
>>>>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
>>>>> aborted
>>>>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>>>>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>>> timeout.
>>>>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>>>>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>>>>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>>> timeout.
>>>>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>>>>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>>>>> ...
>>>>>
> 
> Do you have a fuller dmesg with my series?

Here it is below. Conditions: I rebased everything on Linus latest
master, applied your series and the patch you sent below, rebooted with
pm80xx module option logging_level=31.

Device scan is all OK. With the system idle, I simply start libzbc tests
on the SATA SMR drive I have on the HBA.

The first command issued is 0x63 == NCQ NON DATA and seems to be OK, but
it seems that there are inconsistencies. And that may be what breaks the
adapter/driver state because everything after that command miserably
fails and does not complete.

The inconsistency is this line says:
[  137.193944] pm80xx0:: pm80xx_chip_sata_req  4581:no data
Which seems to be sensical for NCQ_NON_DATA command, but then, this line
seems wrong:
[  137.228015] pm80xx0:: mpi_sata_completion  2515:FPDMA len = 8

I need to go and check the specs what the FIS reply format is for
NCQ_NON_DATA.


[  137.187184] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
[  137.193944] pm80xx0:: pm80xx_chip_sata_req  4581:no data
[  137.199339] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
command 0x63 inb 4
[  137.207577] pm80xx0:: pm8001_mpi_msg_consume  1446:: CI=46 PI=47
msgHeader=8104200d
[  137.215399] pm80xx0:: mpi_sata_completion  2481:IO_SUCCESS
[  137.220961] pm80xx0:: mpi_sata_completion  2503:SAS_PROTO_RESPONSE
len = 20
[  137.228015] pm80xx0:: mpi_sata_completion  2515:FPDMA len = 8
[  137.233878] pm80xx0:: pm8001_mpi_msg_free_set  1403: CI=47 PI=47
[  137.236696] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
[  137.247102] pm80xx0:: pm80xx_chip_sata_req  4585:DMA
[  137.252186] pm80xx0:: pm80xx_chip_sata_req  4593:FPDMA
[  137.257400] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
command 0x65 inb f
[  167.506280] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
[  167.512363] sas: sas_scsi_find_task: aborting task 0x00000000aa372627
[  167.519049] pm80xx0:: pm8001_chip_abort_task  4607:cmd_tag = 2, abort
task tag = 0x1
[  187.969173] pm80xx0:: pm8001_exec_internal_task_abort  753:TMF task
timeout.
[  187.976450] sas: sas_scsi_find_task: task 0x00000000aa372627 is aborted
[  187.983244] sas: sas_eh_handle_sas_errors: task 0x00000000aa372627 is
aborted
[  188.144734] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
[  188.151555] pm80xx0:: pm80xx_chip_sata_req  4588:PIO
[  188.156648] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
command 0xec inb 8
[  193.600813] ata21.00: qc timeout (cmd 0xec)
[  193.605976] pm80xx0:: pm8001_chip_abort_task  4607:cmd_tag = 4, abort
task tag = 0x3
[  214.080236] pm80xx0:: pm8001_exec_internal_task_abort  753:TMF task
timeout.
[  214.087563] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  214.093929] ata21.00: revalidation failed (errno=-5)
[  214.256233] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
[  214.263043] pm80xx0:: pm80xx_chip_sata_req  4588:PIO
[  214.268128] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
command 0xec inb 8
[  224.319899] ata21.00: qc timeout (cmd 0xec)
[  224.324317] pm80xx0:: pm8001_chip_abort_task  4607:cmd_tag = 6, abort
task tag = 0x5
[  244.799433] pm80xx0:: pm8001_exec_internal_task_abort  753:TMF task
timeout.
[  244.806750] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  244.813110] ata21.00: revalidation failed (errno=-5)
[  244.975500] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
[  244.982314] pm80xx0:: pm80xx_chip_sata_req  4588:PIO
[  244.987400] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
command 0xec inb 8
[  275.006814] ata21.00: qc timeout (cmd 0xec)
[  275.011236] pm80xx0:: pm8001_chip_abort_task  4607:cmd_tag = 8, abort
task tag = 0x7
[  295.486387] pm80xx0:: pm8001_exec_internal_task_abort  753:TMF task
timeout.
[  295.494025] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  295.500390] ata21.00: revalidation failed (errno=-5)
[  295.509179] ata21.00: disable device
[  295.670556] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1
tries: 1
[  295.670570] sd 19:0:2:0: [sdj] REPORT ZONES start lba 0 failed
[  295.689972] sd 19:0:2:0: [sdj] REPORT ZONES: Result: hostbyte=DID_OK
driverbyte=DRIVER_OK
[  295.700621] sd 19:0:2:0: [sdj] Sense Key : Not Ready [current]
[  295.708975] sd 19:0:2:0: [sdj] Add. Sense: Logical unit not ready,
hard reset required
[  295.719331] sd 19:0:2:0: [sdj] 0 4096-byte logical blocks: (0 B/0 B)
[  295.728727] sd 19:0:2:0: [sdj] Write Protect is on
[  295.737928] sdj: detected capacity change from 39063650304 to 0
[  295.826347] sd 19:0:2:0: [sdj] tag#158 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[  295.838761] sd 19:0:2:0: [sdj] tag#158 CDB: Test Unit Ready 00 00 00
00 00 00
[  295.920864] sd 19:0:2:0: [sdj] tag#780 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[  295.933341] sd 19:0:2:0: [sdj] tag#780 CDB: Test Unit Ready 00 00 00
00 00 00
[  296.010417] sd 19:0:2:0: [sdj] tag#248 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[  296.022275] sd 19:0:2:0: [sdj] tag#248 CDB: Test Unit Ready 00 00 00
00 00 00
[  296.101348] sd 19:0:2:0: [sdj] tag#949 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[  296.113924] sd 19:0:2:0: [sdj] tag#949 CDB: Test Unit Ready 00 00 00
00 00 00
[  296.192258] sd 19:0:2:0: [sdj] tag#25 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[  296.204788] sd 19:0:2:0: [sdj] tag#25 CDB: Test Unit Ready 00 00 00
00 00 00
[  296.284546] sd 19:0:2:0: [sdj] tag#273 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[  296.297126] sd 19:0:2:0: [sdj] tag#273 CDB: Test Unit Ready 00 00 00
00 00 00
[  296.376896] sd 19:0:2:0: [sdj] tag#653 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[  296.389475] sd 19:0:2:0: [sdj] tag#653 CDB: Test Unit Ready 00 00 00
00 00 00
[  296.468095] sd 19:0:2:0: [sdj] tag#159 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[  296.479974] sd 19:0:2:0: [sdj] tag#159 CDB: Test Unit Ready 00 00 00
00 00 00
[  296.560861] sd 19:0:2:0: [sdj] tag#274 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[  296.572792] sd 19:0:2:0: [sdj] tag#274 CDB: Test Unit Ready 00 00 00
00 00 00
[  296.654506] sd 19:0:2:0: [sdj] tag#727 FAILED Result:
hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
[  296.667151] sd 19:0:2:0: [sdj] tag#727 CDB: Test Unit Ready 00 00 00
00 00 00

After these messages, the tests exit on failure (drive dropped) and
there are no more messages. Doing rmmod or anything else get stuck too.
I have to reset the machine to get back to a good state.

I am starting to think that NCQ NON DATA command is being very badly
handled... Switching the tests to run with all non-NCQ commands is
working fine, albeit horribly slow (much slower than with other HBAs,
e.g. Broadcom).

Digging...

> 
> ...
> 
>>> }
>>> - res = -TMF_RESP_FUNC_FAILED;
>>> + res = TMF_RESP_FUNC_FAILED;
>>>
>>> That's effectively the same as what I have in this series in
>>> sas_execute_tmf().
>>>
>>> However your testing is a SATA device, which I'll check further.
>> This did not help. Still seeing 100% reproducible hangs.
> 
> OK, but I think that we should also have this change as the mainline 
> codes looks broken to be begin with:
> 
> --->8 ---
> 
> [PATCH] scsi: libsas: Handle all errors in sas_scsi_find_task()
> 
> LLDD TMFs callbacks may return linux or other error codes instead of TMF
> codes. This may cause problems in sas_scsi_find_task() ->
> .lldd_query_task(), as only TMF codes are handled there. As such, we may
> not return a task_disposition type. Function sas_eh_handle_sas_errors() 
> only handles that type, and may exit error
> handling early for unrecognised types.
> 
> So use TASK_ABORT_FAILED for non-TMF types returned from
> .lldd_query_task(), on the assumption that the command may still be 
> alive and error handling should be escalated.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> 
> diff --git a/drivers/scsi/libsas/sas_scsi_host.c 
> b/drivers/scsi/libsas/sas_scsi_host.c
> index 53d8b7ede0cd..02274f471308 100644
> --- a/drivers/scsi/libsas/sas_scsi_host.c
> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> @@ -316,8 +316,11 @@ static enum task_disposition 
> sas_scsi_find_task(struct sas_task *task)
>   				pr_notice("%s: task 0x%p failed to abort\n",
>   					  __func__, task);
>   				return TASK_ABORT_FAILED;
> +			default:
> +				pr_notice("%s: task 0x%p result code %d not handled, assuming 
> failed\n",
> +					  __func__, task, res);
> +				return TASK_ABORT_FAILED;
>   			}
> -
>   		}
>   	}
>   	return res;
> 
> ---8< ----
> 
>>
>> I did a lot of testing/digging today, 
> 
> Thanks for the effort!
> 
>  > and the hang cause seems to be
>  > missing task completions.
>> At random, a task times out as its completion
> 
> That sounds fimilar to my general issue running this driver on an arm64 
> host...
> 
>> does not come, and subsequent abort trial for the task fail, revalidate
>> fails
> 
> I assume SMP IOs fail if revalidation fails - if this is the case, then 
> the controller seems to be in bad state.
> 
>> and the device is dropped (capacity goes to 0). But at that point,
>> doing rmmod/modprobe to reset the device does not work. sync cache
>> command issued at rmmod time never completes. I end up needing to power
>> cycle the machine every time...
>>
>> No clue about the root cause yet, but it definitely seem to be related
>> to NCQ/high QD operation. If I force my tests to use non-NCQ commands,
>> everything is fine and the tests run to completion without any issue.
>>
>> I wonder if their is a tag management bug somewhere...
> 
> Maybe. Not sure.
> 
> On a related point, Hannes' change here could avoid it:
> 
> https://lore.kernel.org/linux-scsi/20210222132405.91369-32-hare@suse.de/
> 
>>
>> I did stumble on something very ugly in libsas too: sas_ata_qc_issue()
>> drops and retake the ata port lock. No other ATA driver do that since
>> the ata completion also take that lock. The ata port lock is taken
>> before ata_qc_issue() is called with IRQ disabled (spin_lock_irqsave()).
>> So doing a spin_unlock()/spin_lock() in sas_ata_qc_issue() (called from
>> ata_qc_issue()) seems like a very bad idea. I removed that and
>> everything work the same way (the lld execute does not sleep). But that
>> did not solve the hang problem.
> 
> I would need to check why this is done again. Before my time...
> 
>>
>> Of note is this is all with your libsas patches applied. Without the
>> patches, I have KASAN screaming at me about use-after-free in completion
>> context. With your patches, KASAN is silent.
>>
>> Another thing: this driver does not allow changing the max qd... Very
>> annoying.
>>
>> echo 1 > /sys/block/sdX/device/queue_depth
>>
>> has no effect. QD stays at 32 for an ATA drive. Need to look into that too.
> 
> I had a look at this. It seems that we fail in 
> __ata_change_queue_depth() -> ata_scsi_find_dev() returning NULL.
> 
> Thanks again for your effort, I will continue to look.
> 
> john
Damien Le Moal Feb. 4, 2022, 3:02 a.m. UTC | #12
On 2/4/22 09:36, Damien Le Moal wrote:
> On 2/4/22 00:55, John Garry wrote:
>> On 03/02/2022 09:44, Damien Le Moal wrote:
>>
>> Hi Damien,
>>
>>>>>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>>>>>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>>>>>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>
>> Contrary to mentioning TMF in the log, this is not a TMF but rather an 
>> internal abort timing out. I don't think that this should ever happen. 
>> This command should just abort pending IO commands in the controller and 
>> not send anything to the target. So for this to timeout means a HW fault 
>> or driver bug. And I did not touch this code for pm8001.
>>
>>>>>> timeout.
>>>>>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is
>>>>>> aborted
>>>>>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is
>>>>>> aborted
>>>>>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>>>>>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>>>> timeout.
>>>>>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>>>>>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>>>>>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task
>>>>>> timeout.
>>>>>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>>>>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>>>>>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>>>>>> ...
>>>>>>
>>
>> Do you have a fuller dmesg with my series?
> 
> Here it is below. Conditions: I rebased everything on Linus latest
> master, applied your series and the patch you sent below, rebooted with
> pm80xx module option logging_level=31.
> 
> Device scan is all OK. With the system idle, I simply start libzbc tests
> on the SATA SMR drive I have on the HBA.
> 
> The first command issued is 0x63 == NCQ NON DATA and seems to be OK, but
> it seems that there are inconsistencies. And that may be what breaks the
> adapter/driver state because everything after that command miserably
> fails and does not complete.
> 
> The inconsistency is this line says:
> [  137.193944] pm80xx0:: pm80xx_chip_sata_req  4581:no data
> Which seems to be sensical for NCQ_NON_DATA command, but then, this line
> seems wrong:
> [  137.228015] pm80xx0:: mpi_sata_completion  2515:FPDMA len = 8
> 
> I need to go and check the specs what the FIS reply format is for
> NCQ_NON_DATA.
> 
> 
> [  137.187184] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
> [  137.193944] pm80xx0:: pm80xx_chip_sata_req  4581:no data
> [  137.199339] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
> command 0x63 inb 4
> [  137.207577] pm80xx0:: pm8001_mpi_msg_consume  1446:: CI=46 PI=47
> msgHeader=8104200d
> [  137.215399] pm80xx0:: mpi_sata_completion  2481:IO_SUCCESS
> [  137.220961] pm80xx0:: mpi_sata_completion  2503:SAS_PROTO_RESPONSE
> len = 20
> [  137.228015] pm80xx0:: mpi_sata_completion  2515:FPDMA len = 8
> [  137.233878] pm80xx0:: pm8001_mpi_msg_free_set  1403: CI=47 PI=47
> [  137.236696] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
> [  137.247102] pm80xx0:: pm80xx_chip_sata_req  4585:DMA
> [  137.252186] pm80xx0:: pm80xx_chip_sata_req  4593:FPDMA
> [  137.257400] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
> command 0x65 inb f
> [  167.506280] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
> [  167.512363] sas: sas_scsi_find_task: aborting task 0x00000000aa372627
> [  167.519049] pm80xx0:: pm8001_chip_abort_task  4607:cmd_tag = 2, abort
> task tag = 0x1
> [  187.969173] pm80xx0:: pm8001_exec_internal_task_abort  753:TMF task
> timeout.
> [  187.976450] sas: sas_scsi_find_task: task 0x00000000aa372627 is aborted
> [  187.983244] sas: sas_eh_handle_sas_errors: task 0x00000000aa372627 is
> aborted
> [  188.144734] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
> [  188.151555] pm80xx0:: pm80xx_chip_sata_req  4588:PIO
> [  188.156648] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
> command 0xec inb 8
> [  193.600813] ata21.00: qc timeout (cmd 0xec)
> [  193.605976] pm80xx0:: pm8001_chip_abort_task  4607:cmd_tag = 4, abort
> task tag = 0x3
> [  214.080236] pm80xx0:: pm8001_exec_internal_task_abort  753:TMF task
> timeout.
> [  214.087563] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [  214.093929] ata21.00: revalidation failed (errno=-5)
> [  214.256233] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
> [  214.263043] pm80xx0:: pm80xx_chip_sata_req  4588:PIO
> [  214.268128] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
> command 0xec inb 8
> [  224.319899] ata21.00: qc timeout (cmd 0xec)
> [  224.324317] pm80xx0:: pm8001_chip_abort_task  4607:cmd_tag = 6, abort
> task tag = 0x5
> [  244.799433] pm80xx0:: pm8001_exec_internal_task_abort  753:TMF task
> timeout.
> [  244.806750] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [  244.813110] ata21.00: revalidation failed (errno=-5)
> [  244.975500] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
> [  244.982314] pm80xx0:: pm80xx_chip_sata_req  4588:PIO
> [  244.987400] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
> command 0xec inb 8
> [  275.006814] ata21.00: qc timeout (cmd 0xec)
> [  275.011236] pm80xx0:: pm8001_chip_abort_task  4607:cmd_tag = 8, abort
> task tag = 0x7
> [  295.486387] pm80xx0:: pm8001_exec_internal_task_abort  753:TMF task
> timeout.
> [  295.494025] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [  295.500390] ata21.00: revalidation failed (errno=-5)
> [  295.509179] ata21.00: disable device
> [  295.670556] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1
> tries: 1
> [  295.670570] sd 19:0:2:0: [sdj] REPORT ZONES start lba 0 failed
> [  295.689972] sd 19:0:2:0: [sdj] REPORT ZONES: Result: hostbyte=DID_OK
> driverbyte=DRIVER_OK
> [  295.700621] sd 19:0:2:0: [sdj] Sense Key : Not Ready [current]
> [  295.708975] sd 19:0:2:0: [sdj] Add. Sense: Logical unit not ready,
> hard reset required
> [  295.719331] sd 19:0:2:0: [sdj] 0 4096-byte logical blocks: (0 B/0 B)
> [  295.728727] sd 19:0:2:0: [sdj] Write Protect is on
> [  295.737928] sdj: detected capacity change from 39063650304 to 0
> [  295.826347] sd 19:0:2:0: [sdj] tag#158 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [  295.838761] sd 19:0:2:0: [sdj] tag#158 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [  295.920864] sd 19:0:2:0: [sdj] tag#780 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [  295.933341] sd 19:0:2:0: [sdj] tag#780 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [  296.010417] sd 19:0:2:0: [sdj] tag#248 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [  296.022275] sd 19:0:2:0: [sdj] tag#248 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [  296.101348] sd 19:0:2:0: [sdj] tag#949 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [  296.113924] sd 19:0:2:0: [sdj] tag#949 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [  296.192258] sd 19:0:2:0: [sdj] tag#25 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [  296.204788] sd 19:0:2:0: [sdj] tag#25 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [  296.284546] sd 19:0:2:0: [sdj] tag#273 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [  296.297126] sd 19:0:2:0: [sdj] tag#273 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [  296.376896] sd 19:0:2:0: [sdj] tag#653 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [  296.389475] sd 19:0:2:0: [sdj] tag#653 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [  296.468095] sd 19:0:2:0: [sdj] tag#159 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [  296.479974] sd 19:0:2:0: [sdj] tag#159 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [  296.560861] sd 19:0:2:0: [sdj] tag#274 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [  296.572792] sd 19:0:2:0: [sdj] tag#274 CDB: Test Unit Ready 00 00 00
> 00 00 00
> [  296.654506] sd 19:0:2:0: [sdj] tag#727 FAILED Result:
> hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK cmd_age=0s
> [  296.667151] sd 19:0:2:0: [sdj] tag#727 CDB: Test Unit Ready 00 00 00
> 00 00 00
> 
> After these messages, the tests exit on failure (drive dropped) and
> there are no more messages. Doing rmmod or anything else get stuck too.
> I have to reset the machine to get back to a good state.
> 
> I am starting to think that NCQ NON DATA command is being very badly
> handled... Switching the tests to run with all non-NCQ commands is
> working fine, albeit horribly slow (much slower than with other HBAs,
> e.g. Broadcom).
> 
> Digging...

I missed a KASAN splat during device scan on boot:

   33.725184]
==================================================================
[   33.746168] BUG: KASAN: use-after-free in __lock_acquire+0x41a5/0x5b50
[   33.764181] Read of size 8 at addr ffff88818a318660 by task
kworker/u64:6/583
[   33.786726]
[   33.802181] CPU: 6 PID: 583 Comm: kworker/u64:6 Not tainted
5.17.0-rc2+ #1425
[   33.823777] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS
2.1 06/02/2021
[   33.845112] Workqueue: events_unbound async_run_entry_fn
[   33.864185] Call Trace:
[   33.880147]  <TASK>
[   33.896183]  dump_stack_lvl+0x45/0x59
[   33.913180]  print_address_description.constprop.0+0x1f/0x120
[   33.932108]  ? __lock_acquire+0x41a5/0x5b50
[   33.949156]  kasan_report.cold+0x83/0xdf
[   33.965184]  ? __lock_acquire+0x41a5/0x5b50
[   33.982178]  __lock_acquire+0x41a5/0x5b50
[   33.998919]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[   34.019831]  ? pm8001_mpi_build_cmd+0x3ad/0x780 [pm80xx]
[   34.039171]  ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
[   34.057941]  ? pm80xx_chip_sata_req+0xa78/0x1cb0 [pm80xx]
[   34.076181]  lock_acquire+0x194/0x490
[   34.092185]  ? pm8001_queue_command+0x842/0x1150 [pm80xx]
[   34.113174]  ? lock_release+0x6b0/0x6b0
[   34.130184]  _raw_spin_lock+0x2c/0x40
[   34.147848]  ? pm8001_queue_command+0x842/0x1150 [pm80xx]
[   34.165964]  pm8001_queue_command+0x842/0x1150 [pm80xx]
[   34.185178]  ? __raw_spin_lock_init+0x3b/0x110
[   34.202748]  sas_ata_qc_issue+0x6d8/0xba0 [libsas]
[   34.220097]  ata_qc_issue+0x4f8/0xcc0 [libata]
[   34.242185]  ata_exec_internal_sg+0xacd/0x1790 [libata]
[   34.261176]  ? ata_qc_issue+0xcc0/0xcc0 [libata]
[   34.279059]  ? memset+0x20/0x40
[   34.296170]  ata_read_log_page+0x361/0x5d0 [libata]
[   34.313999]  ? ata_dev_set_feature+0x330/0x330 [libata]
[   34.332183]  ? ata_dev_set_feature+0x330/0x330 [libata]
[   34.353174]  ? vsprintf+0x10/0x10
[   34.369194]  ata_identify_page_supported+0xb8/0x2e0 [libata]
[   34.388168]  ata_dev_configure+0x161b/0x4b90 [libata]
[   34.407172]  ? _raw_spin_trylock_bh+0x50/0x70
[   34.424188]  ? ata_do_dev_read_id+0xe0/0xe0 [libata]
[   34.442100]  ? ata_hpa_resize+0xce0/0xce0 [libata]
[   34.458927]  ? memcpy+0x39/0x60
[   34.475172]  ? ata_dev_read_id+0xf70/0xf70 [libata]
[   34.492172]  ata_dev_revalidate+0x172/0x2b0 [libata]
[   34.508945]  ata_do_set_mode+0x11f5/0x2410 [libata]
[   34.525762]  ? find_held_lock+0x2c/0x110
[   34.542169]  ? ata_dev_revalidate+0x2b0/0x2b0 [libata]
[   34.558174]  ? ata_eh_recover+0x181e/0x33e0 [libata]
[   34.575698]  ata_set_mode+0x2e8/0x3f0 [libata]
[   34.592016]  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[   34.609166]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
[   34.625180]  ? trace_hardirqs_on+0x1c/0x110
[   34.641190]  ata_eh_recover+0x23be/0x33e0 [libata]
[   34.658188]  ? sas_ata_hard_reset+0x310/0x310 [libsas]
[   34.675144]  ? sas_ata_qc_fill_rtf+0x90/0x90 [libsas]
[   34.691182]  ? ata_link_nr_enabled+0x50/0x50 [libata]
[   34.708173]  ? find_held_lock+0x2c/0x110
[   34.723717]  ? lock_release+0x1dd/0x6b0
[   34.740168]  ? ata_scsi_port_error_handler+0x4d1/0xe60 [libata]
[   34.758180]  ? sas_ata_hard_reset+0x310/0x310 [libsas]
[   34.774868]  ? sas_ata_hard_reset+0x310/0x310 [libsas]
[   34.791740]  ? sas_ata_qc_fill_rtf+0x90/0x90 [libsas]
[   34.808169]  ata_do_eh+0x75/0x150 [libata]
[   34.824178]  ? trace_hardirqs_on+0x1c/0x110
[   34.840176]  ata_scsi_port_error_handler+0x536/0xe60 [libata]
[   34.857110]  ? sas_fail_probe.constprop.0+0xb5/0xb5 [libsas]
[   34.874168]  async_sas_ata_eh+0xcf/0x112 [libsas]
[   34.890978]  async_run_entry_fn+0x93/0x500
[   34.907184]  process_one_work+0x7f0/0x1310
[   34.923188]  ? lock_release+0x6b0/0x6b0
[   34.939169]  ? pwq_dec_nr_in_flight+0x230/0x230
[   34.955754]  ? rwlock_bug.part.0+0x90/0x90
[   34.972232]  worker_thread+0x598/0xf70
[   34.987993]  ? process_one_work+0x1310/0x1310
[   35.004971]  kthread+0x28f/0x330
[   35.020168]  ? kthread_complete_and_exit+0x20/0x20
[   35.037168]  ret_from_fork+0x1f/0x30
[   35.052525]  </TASK>
[   35.066623]
[   35.081181] Allocated by task 583:
[   35.097188]  kasan_save_stack+0x1e/0x40
[   35.113930]  __kasan_slab_alloc+0x64/0x80
[   35.130571]  kmem_cache_alloc+0x1a6/0x450
[   35.148041]  sas_alloc_task+0x1b/0x80 [libsas]
[   35.163187]  sas_ata_qc_issue+0x1a8/0xba0 [libsas]
[   35.180145]  ata_qc_issue+0x4f8/0xcc0 [libata]
[   35.195456]  ata_exec_internal_sg+0xacd/0x1790 [libata]
[   35.213179]  ata_read_log_page+0x361/0x5d0 [libata]
[   35.228915]  ata_identify_page_supported+0xb8/0x2e0 [libata]
[   35.246182]  ata_dev_configure+0x161b/0x4b90 [libata]
[   35.262189]  ata_dev_revalidate+0x172/0x2b0 [libata]
[   35.280168]  ata_do_set_mode+0x11f5/0x2410 [libata]
[   35.296178]  ata_set_mode+0x2e8/0x3f0 [libata]
[   35.311176]  ata_eh_recover+0x23be/0x33e0 [libata]
[   35.327168]  ata_do_eh+0x75/0x150 [libata]
[   35.342188]  ata_scsi_port_error_handler+0x536/0xe60 [libata]
[   35.360166]  async_sas_ata_eh+0xcf/0x112 [libsas]
[   35.376365]  async_run_entry_fn+0x93/0x500
[   35.392175]  process_one_work+0x7f0/0x1310
[   35.408169]  worker_thread+0x598/0xf70
[   35.424172]  kthread+0x28f/0x330
[   35.439850]  ret_from_fork+0x1f/0x30
[   35.456177]
[   35.469109] Freed by task 0:
[   35.482174]  kasan_save_stack+0x1e/0x40
[   35.496889]  kasan_set_track+0x21/0x30
[   35.512165]  kasan_set_free_info+0x20/0x30
[   35.527168]  __kasan_slab_free+0xd8/0x110
[   35.542454]  kmem_cache_free.part.0+0x67/0x170
[   35.559111]  mpi_sata_completion+0x99c/0x2d70 [pm80xx]
[   35.576044]  process_oq+0xbd2/0x7c20 [pm80xx]
[   35.592169]  pm80xx_chip_isr+0x94/0x130 [pm80xx]
[   35.608180]  tasklet_action_common.constprop.0+0x24b/0x2f0
[   35.625171]  __do_softirq+0x1b5/0x82d
[   35.640187]
[   35.653755] The buggy address belongs to the object at ffff88818a318640
[   35.653755]  which belongs to the cache sas_task of size 320
[   35.688176] The buggy address is located 32 bytes inside of
[   35.688176]  320-byte region [ffff88818a318640, ffff88818a318780)
[   35.723167] The buggy address belongs to the page:
[   35.740168] page:000000006ae5e64e refcount:1 mapcount:0
mapping:0000000000000000 index:0xffff88818a3184c0 pfn:0x18a318
[   35.762689] flags: 0x20000000000200(slab|node=0|zone=2)
[   35.780183] raw: 0020000000000200 ffff888100f02440 ffff888100f02440
ffff888100f09e00
[   35.800300] raw: ffff88818a3184c0 ffff88818a318040 0000000100000008
0000000000000000
[   35.821174] page dumped because: kasan: bad access detected
[   35.840167]
[   35.855170] Memory state around the buggy address:
[   35.873169]  ffff88818a318500: fc fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc
[   35.893918]  ffff88818a318580: fc fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc
[   35.913172] >ffff88818a318600: fc fc fc fc fc fc fc fc fa fb fb fb fb
fb fb fb
[   35.932176]                                                        ^
[   35.953171]  ffff88818a318680: fb fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[   35.974171]  ffff88818a318700: fb fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[   35.994168]
==================================================================

This is the submission path, not completion. The code is:

(gdb) list *(pm8001_queue_command+0x842)
0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
486				atomic_dec(&pm8001_dev->running_req);
487				goto err_out_tag;
488			}
489			/* TODO: select normal or high priority */
490			spin_lock(&t->task_state_lock);
491			t->task_state_flags |= SAS_TASK_AT_INITIATOR;
492			spin_unlock(&t->task_state_lock);
493		} while (0);
494		rc = 0;
495		goto out_done;

So the task is already completed when the submission path tries to set
the state flag ? Debugging...
John Garry Feb. 4, 2022, 10:36 a.m. UTC | #13
On 04/02/2022 03:02, Damien Le Moal wrote:
>> The inconsistency is this line says:
>> [  137.193944] pm80xx0:: pm80xx_chip_sata_req  4581:no data
>> Which seems to be sensical for NCQ_NON_DATA command, but then, this line
>> seems wrong:
>> [  137.228015] pm80xx0:: mpi_sata_completion  2515:FPDMA len = 8
>>
>> I need to go and check the specs what the FIS reply format is for
>> NCQ_NON_DATA.
>>
>>
>> [  137.187184] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
>> [  137.193944] pm80xx0:: pm80xx_chip_sata_req  4581:no data
>> [  137.199339] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
>> command 0x63 inb 4
>> [  137.207577] pm80xx0:: pm8001_mpi_msg_consume  1446:: CI=46 PI=47
>> msgHeader=8104200d
>> [  137.215399] pm80xx0:: mpi_sata_completion  2481:IO_SUCCESS
>> [  137.220961] pm80xx0:: mpi_sata_completion  2503:SAS_PROTO_RESPONSE
>> len = 20
>> [  137.228015] pm80xx0:: mpi_sata_completion  2515:FPDMA len = 8
>> [  137.233878] pm80xx0:: pm8001_mpi_msg_free_set  1403: CI=47 PI=47
>> [  137.236696] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
>> [  137.247102] pm80xx0:: pm80xx_chip_sata_req  4585:DMA
>> [  137.252186] pm80xx0:: pm80xx_chip_sata_req  4593:FPDMA
>> [  137.257400] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
>> command 0x65 inb f
>> [  167.506280] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>> [  167.512363] sas: sas_scsi_find_task: aborting task 0x00000000aa372627
>> [  167.519049] pm80xx0:: pm8001_chip_abort_task  4607:cmd_tag = 2, abort
>> task tag = 0x1
>> [  187.969173] pm80xx0:: pm8001_exec_internal_task_abort  753:TMF task
>> timeout.

As I mentioned, having this fail is a red flag. If I was pushed to guess 
what has happened, I'd say the FW is faulting due to some erroneous 
driver behaviour.

>> [  187.976450] sas: sas_scsi_find_task: task 0x00000000aa372627 is aborted
>> [  187.983244] sas: sas_eh_handle_sas_errors: task 0x00000000aa372627 is
>> aborted



>>
>> After these messages, the tests exit on failure (drive dropped) and
>> there are no more messages. Doing rmmod or anything else get stuck too.
>> I have to reset the machine to get back to a good state.
>>
>> I am starting to think that NCQ NON DATA command is being very badly
>> handled... Switching the tests to run with all non-NCQ commands is
>> working fine, albeit horribly slow (much slower than with other HBAs,
>> e.g. Broadcom).
>>
>> Digging...
> I missed a KASAN splat during device scan on boot:
> 
>     33.725184]
> ==================================================================
> [   33.746168] BUG: KASAN: use-after-free in __lock_acquire+0x41a5/0x5b50
> [   33.764181] Read of size 8 at addr ffff88818a318660 by task
> kworker/u64:6/583

...

> ==================================================================
> 
> This is the submission path, not completion. The code is:
> 
> (gdb) list *(pm8001_queue_command+0x842)
> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
> 486				atomic_dec(&pm8001_dev->running_req);
> 487				goto err_out_tag;
> 488			}
> 489			/* TODO: select normal or high priority */
> 490			spin_lock(&t->task_state_lock);
> 491			t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> 492			spin_unlock(&t->task_state_lock);
> 493		} while (0);
> 494		rc = 0;
> 495		goto out_done;
> 
> So the task is already completed when the submission path tries to set
> the state flag ? Debugging...

Yeah, that's how it looks.

I already mentioned this problem here:

https://lore.kernel.org/linux-scsi/0cc0c435-b4f2-9c76-258d-865ba50a29dd@huawei.com/

Maybe we should just fix it now to rule it out of possibly causing other 
issues... I was reluctant to fix it as many places seems to need to be 
touched. Let me check it.

Thanks,
John
Damien Le Moal Feb. 4, 2022, 11:27 a.m. UTC | #14
On 2/4/22 19:36, John Garry wrote:
> On 04/02/2022 03:02, Damien Le Moal wrote:
>>> The inconsistency is this line says:
>>> [  137.193944] pm80xx0:: pm80xx_chip_sata_req  4581:no data
>>> Which seems to be sensical for NCQ_NON_DATA command, but then, this line
>>> seems wrong:
>>> [  137.228015] pm80xx0:: mpi_sata_completion  2515:FPDMA len = 8
>>>
>>> I need to go and check the specs what the FIS reply format is for
>>> NCQ_NON_DATA.
>>>
>>>
>>> [  137.187184] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
>>> [  137.193944] pm80xx0:: pm80xx_chip_sata_req  4581:no data
>>> [  137.199339] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
>>> command 0x63 inb 4
>>> [  137.207577] pm80xx0:: pm8001_mpi_msg_consume  1446:: CI=46 PI=47
>>> msgHeader=8104200d
>>> [  137.215399] pm80xx0:: mpi_sata_completion  2481:IO_SUCCESS
>>> [  137.220961] pm80xx0:: mpi_sata_completion  2503:SAS_PROTO_RESPONSE
>>> len = 20
>>> [  137.228015] pm80xx0:: mpi_sata_completion  2515:FPDMA len = 8
>>> [  137.233878] pm80xx0:: pm8001_mpi_msg_free_set  1403: CI=47 PI=47
>>> [  137.236696] pm80xx0:: pm8001_queue_command  408:pm8001_task_exec device
>>> [  137.247102] pm80xx0:: pm80xx_chip_sata_req  4585:DMA
>>> [  137.252186] pm80xx0:: pm80xx_chip_sata_req  4593:FPDMA
>>> [  137.257400] pm80xx0:: pm80xx_chip_sata_req  4682:Sending Normal SATA
>>> command 0x65 inb f
>>> [  167.506280] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>>> [  167.512363] sas: sas_scsi_find_task: aborting task 0x00000000aa372627
>>> [  167.519049] pm80xx0:: pm8001_chip_abort_task  4607:cmd_tag = 2, abort
>>> task tag = 0x1
>>> [  187.969173] pm80xx0:: pm8001_exec_internal_task_abort  753:TMF task
>>> timeout.
> 
> As I mentioned, having this fail is a red flag. If I was pushed to guess 
> what has happened, I'd say the FW is faulting due to some erroneous 
> driver behaviour.

I am still thinking that there is something wrong with the handling of
NCQ NON DATA command. There are several places where the code determines
non-data vs pio vs dma vs fpdma (ncq), and NCQ NON DATA always falls in
the fpdma bucket, which is wrong.

>> This is the submission path, not completion. The code is:
>>
>> (gdb) list *(pm8001_queue_command+0x842)
>> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
>> 486				atomic_dec(&pm8001_dev->running_req);
>> 487				goto err_out_tag;
>> 488			}
>> 489			/* TODO: select normal or high priority */
>> 490			spin_lock(&t->task_state_lock);
>> 491			t->task_state_flags |= SAS_TASK_AT_INITIATOR;
>> 492			spin_unlock(&t->task_state_lock);
>> 493		} while (0);
>> 494		rc = 0;
>> 495		goto out_done;
>>
>> So the task is already completed when the submission path tries to set
>> the state flag ? Debugging...
> 
> Yeah, that's how it looks.
> 
> I already mentioned this problem here:
> 
> https://lore.kernel.org/linux-scsi/0cc0c435-b4f2-9c76-258d-865ba50a29dd@huawei.com/
> 
> Maybe we should just fix it now to rule it out of possibly causing other 
> issues... I was reluctant to fix it as many places seems to need to be 
> touched. Let me check it.

Here is my current fix:

diff --git a/drivers/scsi/pm8001/pm8001_sas.c
b/drivers/scsi/pm8001/pm8001_sas.c
index 1b95c73d12d1..16c101577dd3 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -453,13 +453,18 @@ int pm8001_queue_command(struct sas_task *task,
gfp_t gfp_flags)
                ccb->ccb_tag = tag;
                ccb->task = t;
                ccb->device = pm8001_dev;
+
+               /* TODO: select normal or high priority */
+               atomic_inc(&pm8001_dev->running_req);
+               spin_lock(&t->task_state_lock);
+               t->task_state_flags |= SAS_TASK_AT_INITIATOR;
+               spin_unlock(&t->task_state_lock);
+
                switch (task_proto) {
                case SAS_PROTOCOL_SMP:
-                       atomic_inc(&pm8001_dev->running_req);
                        rc = pm8001_task_prep_smp(pm8001_ha, ccb);
                        break;
                case SAS_PROTOCOL_SSP:
-                       atomic_inc(&pm8001_dev->running_req);
                        if (is_tmf)
                                rc = pm8001_task_prep_ssp_tm(pm8001_ha,
                                        ccb, tmf);
@@ -468,7 +473,6 @@ int pm8001_queue_command(struct sas_task *task,
gfp_t gfp_flags)
                        break;
                case SAS_PROTOCOL_SATA:
                case SAS_PROTOCOL_STP:
-                       atomic_inc(&pm8001_dev->running_req);
                        rc = pm8001_task_prep_ata(pm8001_ha, ccb);
                        break;
                default:
@@ -480,13 +484,12 @@ int pm8001_queue_command(struct sas_task *task,
gfp_t gfp_flags)

                if (rc) {
                        pm8001_dbg(pm8001_ha, IO, "rc is %x\n", rc);
+                       spin_lock(&t->task_state_lock);
+                       t->task_state_flags &= ~SAS_TASK_AT_INITIATOR;
+                       spin_unlock(&t->task_state_lock);
                        atomic_dec(&pm8001_dev->running_req);
                        goto err_out_tag;
                }
-               /* TODO: select normal or high priority */
-               spin_lock(&t->task_state_lock);
-               t->task_state_flags |= SAS_TASK_AT_INITIATOR;
-               spin_unlock(&t->task_state_lock);
        } while (0);
        rc = 0;
        goto out_done;

With this, No KASAN complaint. I will send a proper patch ASAP.

Of note is that I cannot see what the flag SAS_TASK_AT_INITIATOR is for.
It is set and unset only, never tested anywhere in libsas nor pm8001
driver. This flag seems totally useless to me, unless this is something
that the HW can see ?
John Garry Feb. 4, 2022, 11:50 a.m. UTC | #15
On 04/02/2022 11:27, Damien Le Moal wrote:
>> As I mentioned, having this fail is a red flag. If I was pushed to guess
>> what has happened, I'd say the FW is faulting due to some erroneous
>> driver behaviour.
> I am still thinking that there is something wrong with the handling of
> NCQ NON DATA command. There are several places where the code determines
> non-data vs pio vs dma vs fpdma (ncq), and NCQ NON DATA always falls in
> the fpdma bucket, which is wrong.
> 

Ok, I will have a look at this. We made some libsas changes related to 
this not so long ago to "fix" something, see:

53de092f47f ("scsi: libsas: Set data_dir as DMA_NONE if libata marks qc 
as NODATA")

176ddd89171d ("scsi: libsas: Reset num_scatter if libata marks qc as 
NODATA")

>>> This is the submission path, not completion. The code is:
>>>
>>> (gdb) list *(pm8001_queue_command+0x842)
>>> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
>>> 486				atomic_dec(&pm8001_dev->running_req);
>>> 487				goto err_out_tag;
>>> 488			}
>>> 489			/* TODO: select normal or high priority */
>>> 490			spin_lock(&t->task_state_lock);
>>> 491			t->task_state_flags |= SAS_TASK_AT_INITIATOR;
>>> 492			spin_unlock(&t->task_state_lock);
>>> 493		} while (0);
>>> 494		rc = 0;
>>> 495		goto out_done;
>>>
>>> So the task is already completed when the submission path tries to set
>>> the state flag ? Debugging...
>> Yeah, that's how it looks.
>>
>> I already mentioned this problem here:
>>
>> https://lore.kernel.org/linux-scsi/0cc0c435-b4f2-9c76-258d-865ba50a29dd@huawei.com/
>>
>> Maybe we should just fix it now to rule it out of possibly causing other
>> issues... I was reluctant to fix it as many places seems to need to be
>> touched. Let me check it.
> Here is my current fix:
> 
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c
> b/drivers/scsi/pm8001/pm8001_sas.c
> index 1b95c73d12d1..16c101577dd3 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -453,13 +453,18 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
>                  ccb->ccb_tag = tag;
>                  ccb->task = t;
>                  ccb->device = pm8001_dev;
> +
> +               /* TODO: select normal or high priority */
> +               atomic_inc(&pm8001_dev->running_req);
> +               spin_lock(&t->task_state_lock);
> +               t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> +               spin_unlock(&t->task_state_lock);
> +
>                  switch (task_proto) {
>                  case SAS_PROTOCOL_SMP:
> -                       atomic_inc(&pm8001_dev->running_req);
>                          rc = pm8001_task_prep_smp(pm8001_ha, ccb);
>                          break;
>                  case SAS_PROTOCOL_SSP:
> -                       atomic_inc(&pm8001_dev->running_req);
>                          if (is_tmf)
>                                  rc = pm8001_task_prep_ssp_tm(pm8001_ha,
>                                          ccb, tmf);
> @@ -468,7 +473,6 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
>                          break;
>                  case SAS_PROTOCOL_SATA:
>                  case SAS_PROTOCOL_STP:
> -                       atomic_inc(&pm8001_dev->running_req);
>                          rc = pm8001_task_prep_ata(pm8001_ha, ccb);
>                          break;
>                  default:
> @@ -480,13 +484,12 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
> 
>                  if (rc) {
>                          pm8001_dbg(pm8001_ha, IO, "rc is %x\n", rc);
> +                       spin_lock(&t->task_state_lock);
> +                       t->task_state_flags &= ~SAS_TASK_AT_INITIATOR;
> +                       spin_unlock(&t->task_state_lock);
>                          atomic_dec(&pm8001_dev->running_req);
>                          goto err_out_tag;
>                  }
> -               /* TODO: select normal or high priority */
> -               spin_lock(&t->task_state_lock);
> -               t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> -               spin_unlock(&t->task_state_lock);
>          } while (0);
>          rc = 0;
>          goto out_done;
> 
> With this, No KASAN complaint. I will send a proper patch ASAP.

I had just prepared a patch, but different, attached.

> 
> Of note is that I cannot see what the flag SAS_TASK_AT_INITIATOR is for.
> It is set and unset only, never tested anywhere in libsas nor pm8001
> driver. This flag seems totally useless to me, unless this is something
> that the HW can see ?

Right, it is only checked by isci AFAIC. And it is not something which 
HW can see.

Since libsas does not check it the semantics are ill-defined. However I 
think that it's worth setting it completeness; I thought it was better 
setting it just before the command is delivered to HW, but the 
implementation touches more SW. Simpler approach may be better since it 
aligns with how this flag may be cleared in pm8001_chip_sata_req() under 
asusmption it is already set.

Thanks,
John
From 1aea5d406de8806de227f79c2728899959153400 Mon Sep 17 00:00:00 2001
From: John Garry <john.garry@huawei.com>
Date: Fri, 4 Feb 2022 11:12:12 +0000
Subject: [PATCH] scsi: pm8001: Set SAS_TASK_AT_INITIATOR before delivering
 task

Both Damien and I have seen a KASAN use-after-free warn for accessing a
sas_task after delivering the associated command to HW.

The issue is that once the command is delivered it is completed (and freed)
async, and, as such, it is not safe to touch the sas_task again in the
delivery path.

However the sas_task is touched again in setting the task state
SAS_TASK_AT_INITIATOR flag. The semantics for that flag is not defined,
specifically because it is not checked by libsas. Indeed, it is not even
checked by this driver. For the sake of completeness, set that flag before
delivering the command to the HW.

Reported-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>

diff --git a/drivers/scsi/pm8001/pm8001_hwi.c b/drivers/scsi/pm8001/pm8001_hwi.c
index c814e5071712..c6f5aa27bfbf 100644
--- a/drivers/scsi/pm8001/pm8001_hwi.c
+++ b/drivers/scsi/pm8001/pm8001_hwi.c
@@ -1796,6 +1796,7 @@ static void pm8001_send_abort_all(struct pm8001_hba_info *pm8001_ha,
 	task_abort.device_id = cpu_to_le32(pm8001_ha_dev->device_id);
 	task_abort.tag = cpu_to_le32(ccb_tag);
 
+	pm8001_set_task_at_initiator(task);
 	ret = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &task_abort,
 			sizeof(task_abort), 0);
 	if (ret)
@@ -1868,6 +1869,7 @@ static void pm8001_send_read_log(struct pm8001_hba_info *pm8001_ha,
 	sata_cmd.ncqtag_atap_dir_m |= ((0x1 << 7) | (0x5 << 9));
 	memcpy(&sata_cmd.sata_fis, &fis, sizeof(struct host_to_dev_fis));
 
+	pm8001_set_task_at_initiator(task);
 	res = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &sata_cmd,
 			sizeof(sata_cmd), 0);
 	if (res) {
@@ -4198,6 +4200,8 @@ static int pm8001_chip_smp_req(struct pm8001_hba_info *pm8001_ha,
 	smp_cmd.long_smp_req.long_resp_size =
 		cpu_to_le32((u32)sg_dma_len(&task->smp_task.smp_resp)-4);
 	build_smp_cmd(pm8001_dev->device_id, smp_cmd.tag, &smp_cmd);
+
+	pm8001_set_task_at_initiator(task);
 	rc = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc,
 			&smp_cmd, sizeof(smp_cmd), 0);
 	if (rc)
@@ -4266,6 +4270,7 @@ static int pm8001_chip_ssp_io_req(struct pm8001_hba_info *pm8001_ha,
 		ssp_cmd.len = cpu_to_le32(task->total_xfer_len);
 		ssp_cmd.esgl = 0;
 	}
+	pm8001_set_task_at_initiator(task);
 	ret = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &ssp_cmd,
 			sizeof(ssp_cmd), 0);
 	return ret;
@@ -4375,6 +4380,7 @@ static int pm8001_chip_sata_req(struct pm8001_hba_info *pm8001_ha,
 		}
 	}
 
+	pm8001_set_task_at_initiator(task);
 	ret = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &sata_cmd,
 			sizeof(sata_cmd), 0);
 	return ret;
@@ -4646,6 +4652,7 @@ int pm8001_chip_ssp_tm_req(struct pm8001_hba_info *pm8001_ha,
 	if (pm8001_ha->chip_id != chip_8001)
 		sspTMCmd.ds_ads_m = 0x08;
 	circularQ = &pm8001_ha->inbnd_q_tbl[0];
+	pm8001_set_task_at_initiator(task);
 	ret = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &sspTMCmd,
 			sizeof(sspTMCmd), 0);
 	return ret;
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index cec1ad47bf5b..104fb4fe3526 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -488,9 +488,6 @@ static int pm8001_task_exec(struct sas_task *task,
 		}
 		mdelay(10);
 		/* TODO: select normal or high priority */
-		spin_lock(&t->task_state_lock);
-		t->task_state_flags |= SAS_TASK_AT_INITIATOR;
-		spin_unlock(&t->task_state_lock);
 	} while (0);
 	rc = 0;
 	goto out_done;
diff --git a/drivers/scsi/pm8001/pm8001_sas.h b/drivers/scsi/pm8001/pm8001_sas.h
index a17da1cebce1..a7da0cb35a77 100644
--- a/drivers/scsi/pm8001/pm8001_sas.h
+++ b/drivers/scsi/pm8001/pm8001_sas.h
@@ -748,5 +748,12 @@ pm8001_ccb_task_free_done(struct pm8001_hba_info *pm8001_ha,
 	task->task_done(task);
 }
 
+static inline void pm8001_set_task_at_initiator(struct sas_task *t)
+{
+	spin_lock(&t->task_state_lock);
+	t->task_state_flags |= SAS_TASK_AT_INITIATOR;
+	spin_unlock(&t->task_state_lock);
+}
+
 #endif
 
diff --git a/drivers/scsi/pm8001/pm80xx_hwi.c b/drivers/scsi/pm8001/pm80xx_hwi.c
index 2530d1365556..11b911fac2c8 100644
--- a/drivers/scsi/pm8001/pm80xx_hwi.c
+++ b/drivers/scsi/pm8001/pm80xx_hwi.c
@@ -1809,6 +1809,7 @@ static void pm80xx_send_abort_all(struct pm8001_hba_info *pm8001_ha,
 	task_abort.device_id = cpu_to_le32(pm8001_ha_dev->device_id);
 	task_abort.tag = cpu_to_le32(ccb_tag);
 
+	pm8001_set_task_at_initiator(task);
 	ret = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &task_abort,
 			sizeof(task_abort), 0);
 	pm8001_dbg(pm8001_ha, FAIL, "Executing abort task end\n");
@@ -1885,6 +1886,7 @@ static void pm80xx_send_read_log(struct pm8001_hba_info *pm8001_ha,
 	sata_cmd.ncqtag_atap_dir_m_dad |= ((0x1 << 7) | (0x5 << 9));
 	memcpy(&sata_cmd.sata_fis, &fis, sizeof(struct host_to_dev_fis));
 
+	pm8001_set_task_at_initiator(task);
 	res = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &sata_cmd,
 			sizeof(sata_cmd), 0);
 	pm8001_dbg(pm8001_ha, FAIL, "Executing read log end\n");
@@ -4342,6 +4344,7 @@ static int pm80xx_chip_smp_req(struct pm8001_hba_info *pm8001_ha,
 	kunmap_atomic(to);
 	build_smp_cmd(pm8001_dev->device_id, smp_cmd.tag,
 				&smp_cmd, pm8001_ha->smp_exp_mode, length);
+	pm8001_set_task_at_initiator(task);
 	rc = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc, &smp_cmd,
 			sizeof(smp_cmd), 0);
 	if (rc)
@@ -4538,6 +4541,7 @@ static int pm80xx_chip_ssp_io_req(struct pm8001_hba_info *pm8001_ha,
 			ssp_cmd.esgl = 0;
 		}
 	}
+	pm8001_set_task_at_initiator(task);
 	ret = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc,
 			&ssp_cmd, sizeof(ssp_cmd), q_index);
 	return ret;
@@ -4774,6 +4778,7 @@ static int pm80xx_chip_sata_req(struct pm8001_hba_info *pm8001_ha,
 				ccb->ccb_tag, opc,
 				qc ? qc->tf.command : 0, // ata opcode
 				ccb->device ? atomic_read(&ccb->device->running_req) : 0);
+	pm8001_set_task_at_initiator(task);
 	ret = pm8001_mpi_build_cmd(pm8001_ha, circularQ, opc,
 			&sata_cmd, sizeof(sata_cmd), q_index);
 	return ret;
John Garry Feb. 7, 2022, 1:09 p.m. UTC | #16
Hi Damien,

On 04/02/2022 03:02, Damien Le Moal wrote:
> This is the submission path, not completion. The code is:
> 
> (gdb) list *(pm8001_queue_command+0x842)
> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
> 486				atomic_dec(&pm8001_dev->running_req);
> 487				goto err_out_tag;
> 488			}
> 489			/* TODO: select normal or high priority */
> 490			spin_lock(&t->task_state_lock);
> 491			t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> 492			spin_unlock(&t->task_state_lock);
> 493		} while (0);
> 494		rc = 0;
> 495		goto out_done;
> 
> So the task is already completed when the submission path tries to set
> the state flag ? Debugging...

JFYI, I am putting together a patch to drop SAS_TASK_AT_INITIATOR 
altogether. I just need to ensure that the logic in the isci code is 
correct.

Thanks,
John
Damien Le Moal Feb. 7, 2022, 1:13 p.m. UTC | #17
On 2/7/22 22:09, John Garry wrote:
> Hi Damien,
> 
> On 04/02/2022 03:02, Damien Le Moal wrote:
>> This is the submission path, not completion. The code is:
>>
>> (gdb) list *(pm8001_queue_command+0x842)
>> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
>> 486				atomic_dec(&pm8001_dev->running_req);
>> 487				goto err_out_tag;
>> 488			}
>> 489			/* TODO: select normal or high priority */
>> 490			spin_lock(&t->task_state_lock);
>> 491			t->task_state_flags |= SAS_TASK_AT_INITIATOR;
>> 492			spin_unlock(&t->task_state_lock);
>> 493		} while (0);
>> 494		rc = 0;
>> 495		goto out_done;
>>
>> So the task is already completed when the submission path tries to set
>> the state flag ? Debugging...
> 
> JFYI, I am putting together a patch to drop SAS_TASK_AT_INITIATOR 
> altogether. I just need to ensure that the logic in the isci code is 
> correct.

Great. Less dead code for the pm8001 driver :)

I was busy with a nasty libata bug today so I did not continue my
investigations. Will try tomorrow.


> 
> Thanks,
> John