mbox series

[v3,00/31] libsas and pm8001 fixes

Message ID 20220214021747.4976-1-damien.lemoal@opensource.wdc.com
Headers show
Series libsas and pm8001 fixes | expand

Message

Damien Le Moal Feb. 14, 2022, 2:17 a.m. UTC
This first part of this series (patches 1 to 24) fixes handling of NCQ
NON DATA commands in libsas and many bugs in the pm8001 driver.

The fixes for the pm8001 driver include:
* Suppression of all sparse warnings, fixing along the way many le32
  handling bugs for big-endian architectures
* Fix handling of NCQ NON DATA commands
* Fix of tag values handling (0 *is* a valid tag value)
* Fix many tag iand memory leaks in error path
* Fix NCQ error recovery (abort all task execution) that was causing a
  crash

The second part of the series (patches 25 to 31) iadd a small cleanup of
libsas code and many simplifications of the pm8001 driver code.

With these fixes, libzbc test suite passes all test case. This test
suite was used with an SMR drive for testing because it generates many
NCQ NON DATA commands (for zone management commands) and also generates
many NCQ command errors to check ASC/ASCQ returned by the device. With
the test suite, the error recovery path was extensively exercised. The
same tests were also executed with a SAS SMR drives to exercise the
error path.

The patches are based on the 5.18/scsi-staging tree.

Changes from v2:
* Reorganized the series: fixes first, cleanups second.
* Added more bug/leaks fix patches
* Addressed Gary's comment for the ccb alloc helper (patch 28)
* Rebased (and tested) all patches on 5.18/scsi-staging

Changes from v1:
* Added reviewed-by tags
* Addressed Christoph's comments on patch 4 and 8
* Added patches 21 and 22 to fix 2 additional problems found while
  preparing this v2 series
* Added patch 23 and 24 to cleanup the code further.

Damien Le Moal (31):
  scsi: libsas: Fix sas_ata_qc_issue() handling of NCQ NON DATA commands
  scsi: pm8001: Fix __iomem pointer use in pm8001_phy_control()
  scsi: pm8001: Fix pm8001_update_flash() local variable type
  scsi: pm8001: Fix command initialization in pm80XX_send_read_log()
  scsi: pm8001: Fix pm80xx_pci_mem_copy() interface
  scsi: pm8001: Fix command initialization in pm8001_chip_ssp_tm_req()
  scsi: pm8001: Fix payload initialization in pm80xx_set_thermal_config()
  scsi: pm8001: Fix le32 values handling in pm80xx_set_sas_protocol_timer_config()
  scsi: pm8001: Fix payload initialization in pm80xx_encrypt_update()
  scsi: pm8001: Fix le32 values handling in pm80xx_chip_ssp_io_req()
  scsi: pm8001: Fix le32 values handling in pm80xx_chip_sata_req()
  scsi: pm8001: Fix use of struct set_phy_profile_req fields
  scsi: pm8001: Remove local variable in pm8001_pci_resume()
  scsi: pm8001: Fix NCQ NON DATA command task initialization
  scsi: pm8001: Fix NCQ NON DATA command completion handling
  scsi: pm8001: Fix abort all task initialization
  scsi: pm8001: Fix pm8001_tag_alloc() failures handling
  scsi: pm8001: Fix pm80xx_chip_phy_ctl_req()
  scsi: pm8001: Fix pm8001_mpi_task_abort_resp()
  scsi: pm8001: Fix tag values handling
  scsi: pm8001: Fix task leak in pm8001_send_abort_all()
  scsi: pm8001: Fix tag leaks on error
  scsi: pm8001: fix memory leak in pm8001_chip_fw_flash_update_req()
  scsi: pm8001: Fix process_one_iomb() kdoc comment
  scsi: libsas: Simplify sas_ata_qc_issue() detection of NCQ commands
  scsi: pm8001: Simplify pm8001_get_ncq_tag()
  scsi: pm8001: Cleanup pm8001_queue_command()
  scsi: pm8001: Introduce ccb alloc/free helpers
  scsi: pm8001: Simplify pm8001_mpi_build_cmd() interface
  scsi: pm8001: Simplify pm8001_task_exec()
  scsi: pm8001: Simplify pm8001_ccb_task_free()

 drivers/scsi/libsas/sas_ata.c     |  11 +-
 drivers/scsi/pm8001/pm8001_ctl.c  |   5 +-
 drivers/scsi/pm8001/pm8001_hwi.c  | 450 ++++++++++++-----------------
 drivers/scsi/pm8001/pm8001_init.c |  11 +-
 drivers/scsi/pm8001/pm8001_sas.c  | 297 +++++++++----------
 drivers/scsi/pm8001/pm8001_sas.h  |  64 ++++-
 drivers/scsi/pm8001/pm80xx_hwi.c  | 458 ++++++++++++++----------------
 drivers/scsi/pm8001/pm80xx_hwi.h  |   2 +-
 8 files changed, 605 insertions(+), 693 deletions(-)

Comments

Damien Le Moal Feb. 14, 2022, 10:29 p.m. UTC | #1
On 2/15/22 03:06, John Garry wrote:
> On 14/02/2022 02:17, Damien Le Moal wrote:
>> This first part of this series (patches 1 to 24) fixes handling of NCQ
>> NON DATA commands in libsas and many bugs in the pm8001 driver.
>>
>> The fixes for the pm8001 driver include:
>> * Suppression of all sparse warnings, fixing along the way many le32
>>    handling bugs for big-endian architectures
>> * Fix handling of NCQ NON DATA commands
>> * Fix of tag values handling (0*is*  a valid tag value)
>> * Fix many tag iand memory leaks in error path
>> * Fix NCQ error recovery (abort all task execution) that was causing a
>>    crash
>>
>> The second part of the series (patches 25 to 31) iadd a small cleanup of
>> libsas code and many simplifications of the pm8001 driver code.
>>
>> With these fixes, libzbc test suite passes all test case. This test
>> suite was used with an SMR drive for testing because it generates many
>> NCQ NON DATA commands (for zone management commands) and also generates
>> many NCQ command errors to check ASC/ASCQ returned by the device. With
>> the test suite, the error recovery path was extensively exercised. The
>> same tests were also executed with a SAS SMR drives to exercise the
>> error path.
>>
>> The patches are based on the 5.18/scsi-staging tree.
> 
> Hi Damien,
> 
> jfyi, I still see the hang with this series. I don't think that the tag 
> fixes were relevant unfortunately.

As mentioned above, I did test with a SAS drive too (an SMR one to
heavily test the error path) and it worked perfectly.
Note that using Martin's rc1 based scsi-staging tree, I did see a lot of
KASAN complaints on boot regarding MSI/PCI setup. These warnings are
gone with rc3/4. What kernel version base are you using ?

I could not find the ARM board I have in the lab yesterday. Will try
again to find it and test with it.

> 
> btw, how about add guys from get_maintainers.pl to lighten the review 
> workload (and we should have the official maintainer anyway)? There are 
> quite a few patches now...

I did that... I will check again. I may have made mistake creating the
distribution list.

> 
> Thanks,
> John
Martin K. Petersen Feb. 15, 2022, 3:18 a.m. UTC | #2
Hi Damien!

> Note that there is a conflict between 5.18/scsi-staging and 5.17-rc3/4
> in the pm8001 driver. And I need to touch rc3/rc4 code too. Could you
> rebase scsi-staging ?

I pulled in 5.17/scsi-fixes and fixed up the conflicts.

I suggest you merge your -rc of choice when testing. That's what I do if
the baseline kernel has problems on my lab systems.
Damien Le Moal Feb. 15, 2022, 7:38 a.m. UTC | #3
On 2/15/22 12:18, Martin K. Petersen wrote:
> 
> Hi Damien!
> 
>> Note that there is a conflict between 5.18/scsi-staging and 5.17-rc3/4
>> in the pm8001 driver. And I need to touch rc3/rc4 code too. Could you
>> rebase scsi-staging ?
> 
> I pulled in 5.17/scsi-fixes and fixed up the conflicts.

Thanks !

> I suggest you merge your -rc of choice when testing. That's what I do if
> the baseline kernel has problems on my lab systems.

That is what I normally do, but for some reason, this time, I kept
ending up with a non working kernel...
I must have done something very wrong :)