mbox series

[0/7] memory: renesas-rpc-if: Rebind and s2ram fixes

Message ID cover.1656341824.git.geert+renesas@glider.be
Headers show
Series memory: renesas-rpc-if: Rebind and s2ram fixes | expand

Message

Geert Uytterhoeven June 27, 2022, 3:31 p.m. UTC
Hi all,

The Renesas RPC-IF provides either HyperFlash or SPI host access.
To handle this, three drivers are used:
  1. The RPC-IF core diver,
  2. An HyperFlash child driver,
  3. An SPI child driver.

Currently this driver collection has the following issues:
  1. After manually unbinding the child driver, rebinding the child
     driver fails with -EBUSY,
  2. During PSCI system suspend, the SoC may be powered down, losing
     RPC-IF register state, and causing data corruption after resume.

This patch series aims to fix this:
  - Patches 1-4 contain preparatory cleanups and improvements,
  - Patch 5 fixes unbind/rebind,
  - Patch 6 cleans up the internal API between the RPC-IF core diver,
    and the HF and SPI child drivers, and thus touches the MTD/HYPERBUS
    and SPI subsystems, too,
  - Patch 7 adds system suspend/resume support to the RPC-IF core
    driver.

This has been tested on the Salvator-XS (HyperFlash) and Falcon (QSPI
FLASH) development boards.

At least with HyperFlash, successful RPC-IF operation after s2ram is
still not guaranteed (more details below).
I do not have physical access to a board that uses the RPC-IF in SPI
mode, so I could not test s2ram with RPC-SPI.  I am wondering if it
suffers from similar problems, or if these are purely related to
HyperFlash?

Findings:

  - Sometimes RPC-HF still works after resume from s2ram

  - Sometimes RPC-HF read data is corrupted after resume from s2ram:

      - Data read looks like (for each block of 16 bytes at offset i):
          - 8 bytes of data stored at offset (i % 262144) * 256,
	  - 8 bytes duplicate of the above.

      - After that, unbind/rebind fails:

          # echo rpc-if-hyperflash > /sys/bus/platform/drivers/rpc-if-hyperflash/unbind
	  # echo rpc-if-hyperflash > /sys/bus/platform/drivers/rpc-if-hyperflash/bind
	  rpc-if-hyperflash rpc-if-hyperflash: probing of hyperbus device failed

      - After doing s2ram again, rebind (usually) succeeds again, and
	reading from HF returns the expected data again:

	  # echo rpc-if-hyperflash > /sys/bus/platform/drivers/rpc-if-hyperflash/bind
	  rpc-if-hyperflash: Found 1 x16 devices at 0x0 in 16-bit bank. Manufacturer ID 0x000001 Chip ID 0x007000

      - When doing unbind before s2ram, rebind after resume usually
	works (better success rate than without unbind), but not always.

Things I have tried:

  - Always resetting the device in rpcif_hw_init(), like is done on
    RZ/G2L, does not make a difference.

  - Dumping the full RPC register space before/after s2ram, but there
    does not seem to be any relation between register contents (which
    vary) and successful operation.

  - Adding HF calibration like hbmc-am654 (and never setting the
    controller's calibrated flag) does not help: either calibration
    succeeds with 5 passes on 5 tries, or fails with 0 passes on 25
    tries.

  - Browsing the TF/A and U-Boot sources also didn't help.

Thanks for your comments!

Geert Uytterhoeven (7):
  memory: renesas-rpc-if: Always use dev in rpcif_sw_init()
  memory: renesas-rpc-if: Add dev helper to rpcif_probe()
  memory: renesas-rpc-if: Improve Runtime PM handling
  memory: renesas-rpc-if: Split-off private data from struct rpcif
  memory: renesas-rpc-if: Move resource acquisition to .probe()
  memory: renesas-rpc-if: Pass device instead of rpcif to rpcif_*()
  memory: renesas-rpc-if: Reinitialize registers during system resume

 drivers/memory/renesas-rpc-if.c | 167 +++++++++++++++++++++-----------
 drivers/mtd/hyperbus/rpc-if.c   |  18 ++--
 drivers/spi/spi-rpc-if.c        |  14 +--
 include/memory/renesas-rpc-if.h |  32 ++----
 4 files changed, 137 insertions(+), 94 deletions(-)

Comments

Geert Uytterhoeven Nov. 23, 2022, 2:49 p.m. UTC | #1
On Mon, Jun 27, 2022 at 5:31 PM Geert Uytterhoeven
<geert+renesas@glider.be> wrote:
> During PSCI system suspend, R-Car Gen3 SoCs may be powered down, and
> thus the RPC-IF register state may be lost.  Consequently, when using
> the RPC-IF after system resume, data corruption may happen.
>
> Fix this by reinitializing the hardware state during system resume.
> As this requires resuming the RPC-IF core device, this can only be done
> when the device is under active control of the HyperBus or SPI child
> driver.
>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

For v2, I dropped this patch from the series.

Apparently this patch is not needed, nor does it have any impact on
HyperFLASH read operation on Salvator-XS with R-Car M3-N ES1.0 and
Ebisu-4D with R-Car E3 ES1.0.

On Salvator-X with R-Car M3-W ES1.0, there is a different issue causing
random bit flips (which is not solved by the Strobe Timing Adjustment
bit (STRTIM) fix for R-Car M3-W ES1.x in the BSP).

On Salvator-XS with R-Car H3-ES2.0, corruption is seen after s2ram.
TL;DR: while this patch does have impact on that, RPC operation after
s2ram is still not guaranteed, and the core issue is still not
understood.

---
For testing, I use the following command to read /dev/mtdblock1 (which
contains the BL2 bootloader) and check its checksum 100 times:

    time sha256sum -c <(yes $(cat mtdblock1.sha256sum) | head -100)

After boot and s2idle, the success rate is 100%.

1. Without this patch, the failure rate after s2ram is ca. 65%.

When splitting and comparing the data read back, some blocks of 64 KiB
(but not always the same on different runs) have been replaced by bad
data, containing either data from elsewhere, or all ones.  The latter is
probably the same symptom as the former, as the HyperFLASH does contain
blocks with all-ones data.

The data from elsewhere looks like e.g.:

    00046f10  75 75 69 64 5f 64 69 73  75 75 69 64 5f 64 69 73
|uuid_disuuid_dis|
    00046f20  64 72 27 0a 20 20 20 20  64 72 27 0a 20 20 20 20  |dr'.
  dr'.    |
    00046f30  6c 69 67 6e 65 64 20 74  6c 69 67 6e 65 64 20 74
|ligned tligned t|
    00046f40  61 64 64 72 32 20 63 6f  61 64 64 72 32 20 63 6f  |addr2
coaddr2 co|
    00046f50  0a 6d 6d 63 20 72 70 6d  0a 6d 6d 63 20 72 70 6d  |.mmc
rpm.mmc rpm|
    00046f60  20 6c 61 72 67 65 72 0a  20 6c 61 72 67 65 72 0a  |
larger. larger.|
    00046f70  6d 32 00 63 6d 6d 31 00  6d 32 00 63 6d 6d 31 00
|m2.cmm1.m2.cmm1.|
    00046f80  32 5f 64 61 74 61 34 00  32 5f 64 61 74 61 34 00
|2_data4.2_data4.|
    00046f90  31 00 47 50 5f 35 5f 31  31 00 47 50 5f 35 5f 31
|1.GP_5_11.GP_5_1|
    00046fa0  65 78 63 65 65 64 65 64  65 78 63 65 65 64 65 64
|exceededexceeded|
    00046fb0  30 34 2d 72 63 34 2d 30  30 34 2d 72 63 34 2d 30
|04-rc4-004-rc4-0|

which seems to originate from two copies of the first 8 bytes of each of
the following lines, found elsewhere in the HyperFLASH:

    006f1000  75 75 69 64 5f 64 69 73  6b 3d 00 6e 61 6d 65 3d
|uuid_disk=.name=|
    ...
    006f2000  64 72 27 0a 20 20 20 20  20 20 70 61 73 73 69 6e  |dr'.
    passin|
    ...
    006f3000  6c 69 67 6e 65 64 20 74  6f 0a 20 20 20 20 20 20
|ligned to.      |
    ...
    006f4000  61 64 64 72 32 20 63 6f  75 6e 74 00 6d 65 6d 6f  |addr2
count.memo|
    ...
    006f5000  0a 6d 6d 63 20 72 70 6d  62 20 6b 65 79 20 3c 61  |.mmc
rpmb key <a|
    ...
    006f6000  20 6c 61 72 67 65 72 0a  00 75 6e 7a 69 70 00 75  |
larger..unzip.u|
    ...
    006f7000  6d 32 00 63 6d 6d 31 00  63 6d 6d 30 00 63 73 69
|m2.cmm1.cmm0.csi|
    ...
    006f8000  32 5f 64 61 74 61 34 00  73 64 68 69 32 5f 64 61
|2_data4.sdhi2_da|
    ...
    006f9000  31 00 47 50 5f 35 5f 31  32 00 47 50 5f 35 5f 31
|1.GP_5_12.GP_5_1|
    ...
    006fa000  65 78 63 65 65 64 65 64  00 0a 25 73 3b 20 73 74
|exceeded..%s; st|
    ...
    006fb000  30 34 2d 72 63 34 2d 30  30 30 38 32 2d 67 35 34
|04-rc4-00082-g54|

For both hexdumps above, offsets are absolute FLASH offsets, not relative
partition offsets.  Still, there is some similarity (e.g. 0x46f10 vs.
0x6f1000).

2. With this patch, the failure rate of 100 reads after s2ram is either
0%, or 100%.  The same is true after subsequent s2ram operations.

Contrary to before this patch, in case of corruption, the data read back
is always the same: i.e. either the good data is read back, or the exact
same bad data is read back.  In case of bad data, some blocks of 64 KiB
have been replaced by blocks containing either data from elsewhere,
or all ones again....

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds