
[v4,0/4] spi: cadence-qspi: Fix runtime PM and system-wide suspend

Message ID 20240222-cdns-qspi-pm-fix-v4-0-6b6af8bcbf59@bootlin.com

Message

Théo Lebrun Feb. 22, 2024, 10:12 a.m. UTC
Hi,

This fixes runtime PM and system-wide suspend for the cadence-qspi
driver. Seeing how runtime PM and autosuspend are enabled by default, I
believe this affects all users of the driver.

This series has been tested on both Mobileye EyeQ5 hardware and the TI
J7200 EVM board, under s2idle.

Thanks all,
Théo

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
---
Changes in v4:
- Take Reviewed-by Dhruva Gole on patch 1/4.
- Fix struct dev_pm_ops declaration to avoid -Wunused-function warning
  when CONFIG_PM_SLEEP=n. Replace SET_*_PM_OPS() by *_PM_OPS(). See
  kernel test robot warning:
  https://lore.kernel.org/oe-kbuild-all/202402221505.712Q7MSU-lkp@intel.com/
- Link to v3: https://lore.kernel.org/r/20240209-cdns-qspi-pm-fix-v3-0-540ac222f26b@bootlin.com

Changes in v3:
- Move both bugfix patches to the start of the series.
- Remove Fixes: trailer from the function renaming patch.
- Link to v2: https://lore.kernel.org/r/20240205-cdns-qspi-pm-fix-v2-0-2e7bbad49a46@bootlin.com

Changes in v2:
- Split the initial change into three separate commits, to make intents
  clearer.
- Mark controller as suspended during the system-wide suspend.
- Link to v1: https://lore.kernel.org/r/20240202-cdns-qspi-pm-fix-v1-1-3c8feb2bfdd8@bootlin.com
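
As a rough illustration of the v4 change from SET_*_PM_OPS() to
*_PM_OPS(), here is a minimal user-space sketch in C. It is not the
driver's code: MODEL_PM_SLEEP, MODEL_SET_SLEEP_OPS(), MODEL_SLEEP_PTR(),
MODEL_SLEEP_OPS() and struct sleep_ops_model are hypothetical stand-ins,
and the real macros in include/linux/pm.h differ in detail. The
assumption being modelled is that the older style drops the initializer
entirely when sleep support is disabled, leaving static callbacks
unreferenced, whereas the newer style always names them and lets the
compiler discard the dead pointer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model only; these macros are hypothetical stand-ins,
 * not the definitions from the kernel's include/linux/pm.h.
 */
#define MODEL_PM_SLEEP 0	/* plays the role of CONFIG_PM_SLEEP=n */

struct sleep_ops_model {
	const char *name;
	int (*suspend)(void);
	int (*resume)(void);
};

/*
 * Older SET_*-style: the initializer vanishes when sleep support is
 * off, so nothing references the callbacks any more.
 */
#if MODEL_PM_SLEEP
#define MODEL_SET_SLEEP_OPS(s, r) .suspend = (s), .resume = (r),
#else
#define MODEL_SET_SLEEP_OPS(s, r)
#endif

/*
 * Newer style: the callbacks are always named, but the pointer is
 * folded to NULL at compile time when sleep support is off.
 */
#define MODEL_SLEEP_PTR(p) (MODEL_PM_SLEEP ? (p) : NULL)
#define MODEL_SLEEP_OPS(s, r) \
	.suspend = MODEL_SLEEP_PTR(s), .resume = MODEL_SLEEP_PTR(r),

/* Placeholder callbacks; static, like hooks private to a driver. */
static int demo_suspend(void) { return 0; }
static int demo_resume(void) { return 0; }

/* With MODEL_PM_SLEEP == 0 this table never mentions the callbacks. */
const struct sleep_ops_model old_style_ops = {
	.name = "old",
	MODEL_SET_SLEEP_OPS(demo_suspend, demo_resume)
};

/* This table still mentions them, so they count as used. */
const struct sleep_ops_model new_style_ops = {
	.name = "new",
	MODEL_SLEEP_OPS(demo_suspend, demo_resume)
};
```

With MODEL_PM_SLEEP set to 0, both tables end up with NULL hooks. The
difference is that only the new-style table still mentions
demo_suspend() and demo_resume(); if it were removed, a compiler run
with -Wunused-function would flag both callbacks as unused.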

---
Théo Lebrun (4):
      spi: cadence-qspi: fix pointer reference in runtime PM hooks
      spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
      spi: cadence-qspi: put runtime in runtime PM hooks names
      spi: cadence-qspi: add system-wide suspend and resume callbacks

 drivers/spi/spi-cadence-quadspi.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)
---
base-commit: 13acce918af915278e49980a3038df31845dbf39
change-id: 20240202-cdns-qspi-pm-fix-29600cc6d7bf

Best regards,

Comments

Mark Brown Feb. 22, 2024, 7:13 p.m. UTC | #1
On Thu, 22 Feb 2024 11:12:28 +0100, Théo Lebrun wrote:
> This fixes runtime PM and system-wide suspend for the cadence-qspi
> driver. Seeing how runtime PM and autosuspend are enabled by default, I
> believe this affects all users of the driver.
> 
> This series has been tested on both Mobileye EyeQ5 hardware and the TI
> J7200 EVM board, under s2idle.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/4] spi: cadence-qspi: fix pointer reference in runtime PM hooks
      commit: 32ce3bb57b6b402de2aec1012511e7ac4e7449dc
[2/4] spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
      commit: 959043afe53ae80633e810416cee6076da6e91c6
[3/4] spi: cadence-qspi: put runtime in runtime PM hooks names
      commit: 4efa1250b59ebf47ce64a7b6b7c3e2e0a2a9d35a
[4/4] spi: cadence-qspi: add system-wide suspend and resume callbacks
      commit: 078d62de433b4f4556bb676e5dd670f0d4103376

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

Mark Brown Feb. 26, 2024, 1:27 p.m. UTC | #2
On Mon, Feb 26, 2024 at 05:48:03PM +0530, Dhruva Gole wrote:
> On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:

> > [1/4] spi: cadence-qspi: fix pointer reference in runtime PM hooks
> >       commit: 32ce3bb57b6b402de2aec1012511e7ac4e7449dc
> > [2/4] spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
> >       commit: 959043afe53ae80633e810416cee6076da6e91c6
> > [3/4] spi: cadence-qspi: put runtime in runtime PM hooks names
> >       commit: 4efa1250b59ebf47ce64a7b6b7c3e2e0a2a9d35a
> > [4/4] spi: cadence-qspi: add system-wide suspend and resume callbacks
> >       commit: 078d62de433b4f4556bb676e5dd670f0d4103376

> It seems like between 6.8.0-rc5-next-20240220 and
> 6.8.0-rc5-next-20240222 some of TI K3 platform boot have been broken.

Is this with some specific kernel configuration?

> It particularly seemed related to these patches because we can see
> cqspi_probe in the call trace and also cqspi_suspend toward the top.

It would be useful to bisect which patch, there's only 4 of them...

> See logs for kernel crash in [0] and working in [1]

> [0] https://gist.github.com/DhruvaG2000/ed997452b41d6e5301598225fc579800
> [1] https://gist.github.com/DhruvaG2000/d4e73111aeafaca555ba2d5208deb6dd

The relevant section from the failing log is:

[    1.516342] printk: legacy bootconsole [ns16550a0] disabled
[    1.533247] Unable to handle kernel paging request at virtual address 12800000340001b4

...

[    1.709414] Call trace:
[    1.711852]  __mutex_lock.constprop.0+0x84/0x540
[    1.716460]  __mutex_lock_slowpath+0x14/0x20
[    1.720719]  mutex_lock+0x48/0x54
[    1.724026]  spi_controller_suspend+0x30/0x7c
[    1.728377]  cqspi_suspend+0x1c/0x6c
[    1.731944]  pm_generic_runtime_suspend+0x2c/0x44
[    1.736640]  genpd_runtime_suspend+0xa8/0x254

(it's generally helpful to provide the most relevant section directly.)

The issue here appears to be that we've registered for runtime suspend
prior to registering the controller...

Théo Lebrun Feb. 26, 2024, 1:36 p.m. UTC | #3
Hello Dhruva,

On Mon Feb 26, 2024 at 1:18 PM CET, Dhruva Gole wrote:
> Hi Mark, Theo,
>
> + Nishanth, Vignesh (maintainers of TI K3)
>
> On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:
> > On Thu, 22 Feb 2024 11:12:28 +0100, Théo Lebrun wrote:
> > > This fixes runtime PM and system-wide suspend for the cadence-qspi
> > > driver. Seeing how runtime PM and autosuspend are enabled by default, I
> > > believe this affects all users of the driver.
> > > 
> > > This series has been tested on both Mobileye EyeQ5 hardware and the TI
> > > J7200 EVM board, under s2idle.
> > > 
> > > [...]
> > 
> > Applied to
> > 
> >    https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
> > 
> > Thanks!
> > 
> > [1/4] spi: cadence-qspi: fix pointer reference in runtime PM hooks
> >       commit: 32ce3bb57b6b402de2aec1012511e7ac4e7449dc
> > [2/4] spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
> >       commit: 959043afe53ae80633e810416cee6076da6e91c6
> > [3/4] spi: cadence-qspi: put runtime in runtime PM hooks names
> >       commit: 4efa1250b59ebf47ce64a7b6b7c3e2e0a2a9d35a
> > [4/4] spi: cadence-qspi: add system-wide suspend and resume callbacks
> >       commit: 078d62de433b4f4556bb676e5dd670f0d4103376
>
> It seems like between 6.8.0-rc5-next-20240220 and
> 6.8.0-rc5-next-20240222 some of TI K3 platform boot have been broken.
>
> It particularly seemed related to these patches because we can see
> cqspi_probe in the call trace and also cqspi_suspend toward the top.
>
> See logs for kernel crash in [0] and working in [1]

I'm guessing we are talking about tags next-20240220 and next-20240222
on: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/

Neither of those tags include the patches about fixing PM hooks.

   ⟩ # next-20240220
   ⟩ git log --oneline --author theo.lebrun 2d5c7b7eb345 \
      drivers/spi/spi-cadence-quadspi.c

   ⟩ # next-20240222
   ⟩ git log --oneline --author theo.lebrun e31185ce00a9 \
      drivers/spi/spi-cadence-quadspi.c
   0f3841a5e115 spi: cadence-qspi: report correct number of chip-select
   7cc3522aedb5 spi: cadence-qspi: set maximum chip-select to 4
   0d62c64a8e48 spi: cadence-qspi: assert each subnode flash CS is valid
   ⟩ # Those are unrelated patches.

The call trace shows it too: this series renames the runtime
suspend/resume hooks to cqspi_runtime_*, while the call stack you gave
mentions cqspi_suspend. Following this series, cqspi_suspend only gets
called at system-wide suspend.
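
A minimal user-space sketch of that separation, assuming a simplified
ops table with one slot per path. Only the four cqspi_* hook names come
from this thread; struct model_dev, struct model_pm_ops,
model_runtime_suspend() and model_system_suspend() are hypothetical
stand-ins for the PM core, and the hook bodies are placeholders that
merely count calls.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only: none of these types are kernel API. */
struct model_dev;

struct model_pm_ops {
	int (*suspend)(struct model_dev *dev);		/* system-wide sleep */
	int (*resume)(struct model_dev *dev);
	int (*runtime_suspend)(struct model_dev *dev);	/* runtime PM */
	int (*runtime_resume)(struct model_dev *dev);
};

struct model_dev {
	const struct model_pm_ops *pm;
	int system_suspend_calls;
	int runtime_suspend_calls;
};

/* Hook names follow the thread; the bodies are placeholders. */
static int cqspi_suspend(struct model_dev *dev)
{
	dev->system_suspend_calls++;
	return 0;
}

static int cqspi_resume(struct model_dev *dev)
{
	(void)dev;
	return 0;
}

static int cqspi_runtime_suspend(struct model_dev *dev)
{
	dev->runtime_suspend_calls++;
	return 0;
}

static int cqspi_runtime_resume(struct model_dev *dev)
{
	(void)dev;
	return 0;
}

/* One slot per path, so the two kinds of suspend cannot be mixed up. */
const struct model_pm_ops cqspi_model_pm_ops = {
	.suspend = cqspi_suspend,
	.resume = cqspi_resume,
	.runtime_suspend = cqspi_runtime_suspend,
	.runtime_resume = cqspi_runtime_resume,
};

/* Stand-in for the runtime PM path: it only consults runtime_suspend. */
int model_runtime_suspend(struct model_dev *dev)
{
	if (dev->pm && dev->pm->runtime_suspend)
		return dev->pm->runtime_suspend(dev);
	return 0;
}

/* Stand-in for the system-wide sleep path: it only consults suspend. */
int model_system_suspend(struct model_dev *dev)
{
	if (dev->pm && dev->pm->suspend)
		return dev->pm->suspend(dev);
	return 0;
}
```

In this model, calling model_runtime_suspend() on a device that uses
cqspi_model_pm_ops bumps only runtime_suspend_calls. A trace that
reaches cqspi_suspend from a runtime suspend path therefore points at a
table where the hooks were not yet split this way.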

My guess is that this series will rather fix the issue that you are now
facing. :-) Could you try applying them and checking if that fixes your
error?

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Mark Brown Feb. 26, 2024, 1:40 p.m. UTC | #4
On Mon, Feb 26, 2024 at 01:27:57PM +0000, Mark Brown wrote:
> On Mon, Feb 26, 2024 at 05:48:03PM +0530, Dhruva Gole wrote:
> > On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:

> [    1.709414] Call trace:
> [    1.711852]  __mutex_lock.constprop.0+0x84/0x540
> [    1.716460]  __mutex_lock_slowpath+0x14/0x20
> [    1.720719]  mutex_lock+0x48/0x54
> [    1.724026]  spi_controller_suspend+0x30/0x7c
> [    1.728377]  cqspi_suspend+0x1c/0x6c
> [    1.731944]  pm_generic_runtime_suspend+0x2c/0x44
> [    1.736640]  genpd_runtime_suspend+0xa8/0x254

> (it's generally helpful to provide the most relevant section directly.)

> The issue here appears to be that we've registered for runtime suspend
> prior to registering the controller...

Actually, no - after this series cqspi_suspend() is the system not
runtime PM operation and should not be called from runtime suspend.  How
is that happening?

Théo Lebrun Feb. 26, 2024, 1:42 p.m. UTC | #5
Hello,

On Mon Feb 26, 2024 at 2:40 PM CET, Mark Brown wrote:
> On Mon, Feb 26, 2024 at 01:27:57PM +0000, Mark Brown wrote:
> > On Mon, Feb 26, 2024 at 05:48:03PM +0530, Dhruva Gole wrote:
> > > On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:
>
> > [    1.709414] Call trace:
> > [    1.711852]  __mutex_lock.constprop.0+0x84/0x540
> > [    1.716460]  __mutex_lock_slowpath+0x14/0x20
> > [    1.720719]  mutex_lock+0x48/0x54
> > [    1.724026]  spi_controller_suspend+0x30/0x7c
> > [    1.728377]  cqspi_suspend+0x1c/0x6c
> > [    1.731944]  pm_generic_runtime_suspend+0x2c/0x44
> > [    1.736640]  genpd_runtime_suspend+0xa8/0x254
>
> > (it's generally helpful to provide the most relevant section directly.)
>
> > The issue here appears to be that we've registered for runtime suspend
> > prior to registering the controller...
>
> Actually, no - after this series cqspi_suspend() is the system not
> runtime PM operation and should not be called from runtime suspend.  How
> is that happening?

You might have seen my answer by now. This series is not in the tags
quoted. I believe the memory corruption I fixed with this series is
being encountered for the first time on TI hardware. It was probably
only by luck that they did not hit it before.

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

------------------------------------------------------------------------
Dhruva Gole Feb. 27, 2024, 5:03 a.m. UTC | #6
Hi,

On Feb 26, 2024 at 13:40:00 +0000, Mark Brown wrote:
> On Mon, Feb 26, 2024 at 01:27:57PM +0000, Mark Brown wrote:
> > On Mon, Feb 26, 2024 at 05:48:03PM +0530, Dhruva Gole wrote:
> > > On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:
> 
> > [    1.709414] Call trace:
> > [    1.711852]  __mutex_lock.constprop.0+0x84/0x540
> > [    1.716460]  __mutex_lock_slowpath+0x14/0x20
> > [    1.720719]  mutex_lock+0x48/0x54
> > [    1.724026]  spi_controller_suspend+0x30/0x7c
> > [    1.728377]  cqspi_suspend+0x1c/0x6c
> > [    1.731944]  pm_generic_runtime_suspend+0x2c/0x44
> > [    1.736640]  genpd_runtime_suspend+0xa8/0x254
> 
> > (it's generally helpful to provide the most relevant section directly.)
> 
> > The issue here appears to be that we've registered for runtime suspend
> > prior to registering the controller...
> 
> Actually, no - after this series cqspi_suspend() is the system not
> runtime PM operation and should not be called from runtime suspend.  How
> is that happening?

I tried dropping this entire series; it doesn't really solve the kernel
boot issues. Also, this particular stack dump isn't easily reproducible
either. Perhaps this series is not the root cause; I will need some
more time to see what's breaking boot for us.

But for now this series seems to be in the clear. Will keep you posted
if I find anything funny here.

FYI: we're just using the arm64 defconfig and the respective device DTs.

Dhruva Gole Feb. 27, 2024, noon UTC | #7
Hi,

On Feb 26, 2024 at 14:36:17 +0100, Théo Lebrun wrote:
> Hello Dhruva,
> 
> On Mon Feb 26, 2024 at 1:18 PM CET, Dhruva Gole wrote:
> > Hi Mark, Theo,
> >
> > + Nishanth, Vignesh (maintainers of TI K3)
> >
> > On Feb 22, 2024 at 19:13:29 +0000, Mark Brown wrote:
> > > On Thu, 22 Feb 2024 11:12:28 +0100, Théo Lebrun wrote:
> > > > This fixes runtime PM and system-wide suspend for the cadence-qspi
> > > > driver. Seeing how runtime PM and autosuspend are enabled by default, I
> > > > believe this affects all users of the driver.
> > > > 
> > > > This series has been tested on both Mobileye EyeQ5 hardware and the TI
> > > > J7200 EVM board, under s2idle.
> > > > 
> > > > [...]
> > > 
> > > Applied to
> > > 
> > >    https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
> > > 
> > > Thanks!
> > > 
> > > [1/4] spi: cadence-qspi: fix pointer reference in runtime PM hooks
> > >       commit: 32ce3bb57b6b402de2aec1012511e7ac4e7449dc
> > > [2/4] spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks
> > >       commit: 959043afe53ae80633e810416cee6076da6e91c6
> > > [3/4] spi: cadence-qspi: put runtime in runtime PM hooks names
> > >       commit: 4efa1250b59ebf47ce64a7b6b7c3e2e0a2a9d35a
> > > [4/4] spi: cadence-qspi: add system-wide suspend and resume callbacks
> > >       commit: 078d62de433b4f4556bb676e5dd670f0d4103376
> >
> > It seems like between 6.8.0-rc5-next-20240220 and
> > 6.8.0-rc5-next-20240222 some of TI K3 platform boot have been broken.
> >
> > It particularly seemed related to these patches because we can see
> > cqspi_probe in the call trace and also cqspi_suspend toward the top.
> >
> > See logs for kernel crash in [0] and working in [1]
> 
> I'm guessing we are talking about tags next-20240220 and next-20240222
> on: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/
> 
> Neither of those tags include the patches about fixing PM hooks.
> 
>    ⟩ # next-20240220
>    ⟩ git log --oneline --author theo.lebrun 2d5c7b7eb345 \
>       drivers/spi/spi-cadence-quadspi.c
> 
>    ⟩ # next-20240222
>    ⟩ git log --oneline --author theo.lebrun e31185ce00a9 \
>       drivers/spi/spi-cadence-quadspi.c
>    0f3841a5e115 spi: cadence-qspi: report correct number of chip-select
>    7cc3522aedb5 spi: cadence-qspi: set maximum chip-select to 4
>    0d62c64a8e48 spi: cadence-qspi: assert each subnode flash CS is valid
>    ⟩ # Those are unrelated patches.
> 
> Also it shows from the calltrace: this series renames the runtime
> suspend/resume hooks to cqspi_runtime_* while the callstack you gave
> talks about cqspi_suspend. It only gets called at system-wide suspend
> following this series.
> 
> My guess is that this series will rather fix the issue that you are now
> facing. :-) Could you try applying them and checking if that fixes your
> error?

Indeed, it seems kernelci generated builds for Feb 22 and none after
that in our case, hence we were not testing a -next with your patches
applied.

Please pardon the confusion.

The boot logs are here with local linux build from 27 Feb -next:

https://gist.github.com/DhruvaG2000/78ef6f2953b0940ef8ea38797f2ec6cb

It does seem like these patches help us fix the previous regressions.
Thanks for the fixes.