mbox series

[v4,0/5] PCI: brcmstb: Configure appropriate HW CLKREQ# mode

Message ID 20230428223500.23337-1-jim2101024@gmail.com
Headers show
Series PCI: brcmstb: Configure appropriate HW CLKREQ# mode | expand

Message

Jim Quinlan April 28, 2023, 10:34 p.m. UTC
Note: (a) With this series, all downstream devices should work w/o DT changes.
          Only if the user desires L1SS savings and has an L1SS-capable
          device is a DT change required (brcm,enable-l1ss).
      (b) No code changes between V2->V3 except to remove a dev_info()
          and change the string of another dev_info().

v4 -- New commit that asserts PERST# for 2711/RPi SOCs at PCIe RC
      driver probe() time.  This is done in Raspian Linux and its
      absence may be the cause of a failing test case.
   -- New commit that removes stale comment.

v3 -- Rewrote commit msgs and comments refering panics if L1SS
      is enabled/disabled; the code snippet that unadvertises L1SS
      eliminates the panic scenario. (Bjorn)
   -- Add reference for "400ns of CLKREQ# assertion" blurb (Bjorn)
   -- Put binding names in DT commit Subject (Bjorn)
   -- Add a verb to a commit's subject line (Bjorn)
   -- s/accomodat(\w+)/accommodat$1/g (Bjorn)
   -- Rewrote commit msgs and comments refering panics if L1SS
      is enabled/disabled; the code snippet that unadvertises L1SS
      eliminates the panic scenario. (Bjorn)

v2 -- Changed binding property 'brcm,completion-timeout-msec' to
      'brcm,completion-timeout-us'.  (StefanW for standard suffix).
   -- Warn when clamping timeout value, and include clamped
      region in message. Also add min and max in YAML. (StefanW)
   -- Qualify description of "brcm,completion-timeout-us" so that
      it refers to PCIe transactions. (StefanW)
   -- Remvove mention of Linux specifics in binding description. (StefanW)
   -- s/clkreq#/CLKREQ#/g (Bjorn)
   -- Refactor completion-timeout-us code to compare max and min to
      value given by the property (as opposed to the computed value).

v1 -- The current driver assumes the downstream devices can
      provide CLKREQ# for ASPM.  These commits accomodate devices
      w/ or w/o clkreq# and also handle L1SS-capable devices.

   -- The Raspian Linux folks have already been using a PCIe RC
      property "brcm,enable-l1ss".  These commits use the same
      property, in a backward-compatible manner, and the implementaion
      adds more detail and also automatically identifies devices w/o
      a clkreq# signal, i.e. most devices plugged into an RPi CM4
      IO board.


Jim Quinlan (5):
  dt-bindings: PCI: brcmstb: brcm,{enable-l1ss,completion-timeout-us}
    props
  PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream
    device
  PCI: brcmstb: Set PCIe transaction completion timeout
  PCI: brcmstb: Don't assume 2711 bootloader leaves PERST# asserted
  PCI: brcmstb: Remove stale comment

 .../bindings/pci/brcm,stb-pcie.yaml           |  16 +++
 drivers/pci/controller/pcie-brcmstb.c         | 105 ++++++++++++++++--
 2 files changed, 110 insertions(+), 11 deletions(-)


base-commit: 76f598ba7d8e2bfb4855b5298caedd5af0c374a8
prerequisite-patch-id: f38de8681d8746126d60b3430eaf218d2dd169cd
prerequisite-patch-id: 23e13189f200358976abf5bf3600973a20cf386c
prerequisite-patch-id: edfbe6ea39ed6a4937e2cec3bb8ee0e60091546d
prerequisite-patch-id: c87dd155e8506a2a277726c47d85bf3fa83727d5
prerequisite-patch-id: 579841e1dc179517506a7a7c42e0e651b3bc3649
prerequisite-patch-id: b5b079998ea451821edffd7c52cd3d89d06046a1
prerequisite-patch-id: b51b3918e554e279b2ace1f68ed6b4176f8ccc24
prerequisite-patch-id: 333e5188fb27d0ed010f5359e83e539172a67690
prerequisite-patch-id: bb107ee7b4811a9719508ea667cad2466933dec0

Comments

Cyril Brulebois May 2, 2023, 11:15 p.m. UTC | #1
Hi,

Jim Quinlan <jim2101024@gmail.com> (2023-04-28):
> Note: (a) With this series, all downstream devices should work w/o DT changes.
>           Only if the user desires L1SS savings and has an L1SS-capable
>           device is a DT change required (brcm,enable-l1ss).

I'm still seeing some problems, but tweaking two things can lead to
massive improvements:
 - setting brcm,enable-l1ss;
 - upgrading the CM4's EEPROM.

Seeing how patch #4 was about the bootloader, I've prepared an updated
test image the following way:
 - Kernel: 76f598ba7d8e2bfb4855b5298caedd5af0c374a8 + this series.
 - Userland: Debian testing as of 2023-05-01.
 - Serial console set as previously.
 - Bootloader: latest release upstream, 1.20230405
   (That is: bootcode.bin, *.dat, *.elf)

Then, seeing how setting brcm,enable-l1ss was helping some test cases,
I've extended testing to be quite systematic, using those components:
 - CM4 IO Board (always the same).
 - 1 CM4 among:
    - CM4 Lite Rev 1.0 (extra test, storage much quicker to deploy)
    - CM4 8/32 Rev 1.0 (as before)
    - CM4 4/32 Rev 1.1 (as before)
 - 1 PCIe card among:
    - 006 = SupaHub PCIe->USB adapter, reference PCE6U1C-R02, VER 006
       → based on Renesas UPD720201/UPD720202
       → CONFIG_USB_XHCI_PCI_RENESAS=m
       → /lib/firmware/renesas_usb_fw.mem
    - 006S = SupaHub PCIe->USB adapter, reference PCE6U1C-R02, VER 006S
       → based on Renesas UPD720201/UPD720202
       → CONFIG_USB_XHCI_PCI_RENESAS=m
       → /lib/firmware/renesas_usb_fw.mem
    - VIA = Waveshare PCIe-to-multiple-USB adapter (no obvious reference)
       → based on VIA VL805/806
 - 1 Kingston DataTraveler G4 32G (always the same), plugged on one of the
   USB port of the PCIe card being tested.

I've tested a cold boot with each combination, first without touching the
DTB at all (pristine), then after enabling L1SS. The results are as
follows, legend is below.

                           +----------+----------+----------+
                           |   006    |   006S   |   VIA    |
  +------------------------+----------+----------+----------+
  | 1. CM4 Lite Rev 1.0    |    KP*   |    KP*   |  OK, 72  |
  |    pristine            |          |          |          |
  +------------------------+----------+----------+----------+
  | 2. CM4 Lite Rev 1.0    |  boot +  |  OK, 72  |  OK, 72  |
  |    + brcm,enable-l1ss  | timeouts |          |          |
  +------------------------+----------+----------+----------+
  | 3. CM4 8/32 Rev 1.0    |    KP    |    KP    |    KP    |
  |    pristine            |          |          |          |
  +------------------------+----------+----------+----------+
  | 4. CM4 8/32 Rev 1.0    |  OK, 69  |  OK, 69  |  OK, 69  |
  |    + brcm,enable-l1ss  |          |          |          |
  +------------------------+----------+----------+----------+
  | 5. CM4 4/32 Rev 1.1    |  boot +  |  OK, 69  |  OK, 69  |
  |    pristine            | timeouts |          |          |
  +------------------------+----------+----------+----------+
  | 6. CM4 4/32 Rev 1.1    |  OK, 82  |  OK, 69  |  OK, 69  |
  |    + brcm,enable-l1ss  |          |          |          |
  +------------------------+----------+----------+----------+

Legend:
 - OK, XXX = boots fine, memory stick visible, and reading from it using
   `dd if=/dev/sda of=/dev/null bs=8M status=progress` for a few seconds
   gives an XXX MB/s transfer rate.
 - KP = kernel panic involving brcm_pcie_probe().
 - KP* = probably the same kernel panic; unfortunately, the serial console
   hardly works, but booting for 1 minute, shutting down for 10 seconds,
   in a loop… ends up showing excerpts from a trace, one word or sometimes
   several lines at a time. Since brcm_pcie_driver_init() or SError
   appeared, getting the same trace looks probable to me. [See also the
   very end of the mail.]
 - boot + timeouts = the system boots, the memory stick is not visible
   though, as XHCI timeouts show up, e.g.:

   [   34.144748] xhci_hcd 0000:01:00.0: Timeout while waiting for setup device command
   [   34.357273] usb 3-1.4: Device not responding to setup address.
   [   34.568429] usb 3-1.4: device not accepting address 6, error -71
   [   34.575730] usb 3-1-port4: unable to enumerate USB device

So it looks like *for these combinations* setting brcm,enable-l1ss is only
helping, even if one particular thing remains not fully fonctional (but
at least boots now): CM4 Lite Rev 1.0 + 006 card.


And since you mentioned the EEPROM topic off-list, I've investigated that
part as well. It turns out that what *seemed* (at least to my non-expert
eyes) sort of related to the hardware revisions… could have actually be
directly linked to the EEPROM version shipped with each Compute Module.

After deploying the relevant tooling, and based on the reported
timestamps, here are the relevant EEPROM filenames in the rpi-eeprom
repository (https://github.com/raspberrypi/rpi-eeprom):
 - CM4 Lite Rev 1.0 [lines 1-2]
    → firmware/stable/pieeprom-2021-02-16.bin
 - CM4 8/32 Rev 1.0 [lines 3-4]
    → firmware/stable/pieeprom-2021-02-16.bin
 - CM4 4/32 Rev 1.1 [lines 5-6]
    → firmware/stable/pieeprom-2021-12-02.bin

Try to upgrade a first CM4 Lite to the latest version (2023-01-11) gave
solid results (which I'm not including in this report as I was only in
exploratory mode, with a slightly different Kingston DataTraveler anyway;
for reference its EEPROM dated back to 2020, and it seemed to have ever
been in some beta state…), so I decided to replicate all the tests above with
the very same 3 CM4, upgraded to 2023-01-11.

In passing: That might explain why it always felt like later revisions
were working “better” than the old ones: being designed + manufactured
later, they just ended up being shipped with a newer/better EEPROM?

Upgrade: via usbboot (https://github.com/raspberrypi/usbboot) and the
recovery procedure (which by default deploys the latest stable version).

Results with everyone at 2023-01-11.

                           +----------+----------+----------+
                           |   006    |   006S   |   VIA    |
  +------------------------+----------+----------+----------+
  | 1. CM4 Lite Rev 1.0    |  OK, 83  |  OK, 72  |  OK, 72  |
  |    pristine            |          |          |          |
  +------------------------+----------+----------+----------+
  | 2. CM4 Lite Rev 1.0    |  OK, 82  |  OK, 72  |  OK, 72  |
  |    + brcm,enable-l1ss  |          |          |          |
  +------------------------+----------+----------+----------+
  | 3. CM4 8/32 Rev 1.0    |  OK, 82  |  OK, 69  |  OK, 69  |
  |    pristine            |          |          |          |
  +------------------------+----------+----------+----------+
  | 4. CM4 8/32 Rev 1.0    |  OK, 82  |  OK, 69  |  OK, 69  |
  |    + brcm,enable-l1ss  |          |          |          |
  +------------------------+----------+----------+----------+
  | 5. CM4 4/32 Rev 1.1    |  OK, 82  |  OK, 69  |  OK, 69  |
  |    pristine            |          |          |          |
  +------------------------+----------+----------+----------+
  | 6. CM4 4/32 Rev 1.1    |  OK, 82  |  OK, 69  |  OK, 69  |
  |    + brcm,enable-l1ss  |          |          |          |
  +------------------------+----------+----------+----------+

Takeaways:
 - Upgrading the EEPROM solved all problems;
 - brcm,enable-l1ss (which used to help) is not needed, as mentioned in
   your cover letter.

Now that I'm a little more familiar with the EEPROM tooling:
 - It looks like I'm able to downgrade the EEPROM to an earlier version.
   But I cannot guarantee I can recover exactly the previous state as
   there are two different things at least: the EEPROM itself and what's
   called “bootloader config” in vcgencmd). I've seen at least the LED
   change behaviour (via POWER_OFF_ON_HALT).
 - Upon downgrading, without brcm,enable-l1ss, the CM4 Lite is indeed
   showing me a black screen/no logs in the serial console again with
   either one of the 006/006S cards.
 - It's possible to specify a boot config file when deploying the EEPROM,
   and I've tried enabling BOOT_UART on the CM4 Lite. Now I'm getting the
   kernel panic on the console!

Where should I go from here?
 - Does it make sense to gather a trace for the kernel panic on say two
   combinations, without brcm,enable-l1ss set:
    + CM4 Lite Rev 1.0 (old EEPROM) + 006 [first KP* in first table]
    + CM4 8/32 Rev 1.0 (old EEPROM) + 006 [first KP in first table]
   then get a trace without your patches, and attach all four resulting
   files?
 - Or should one just consider that the very first thing that each and
   every CM4 user is supposed to do is upgrade their EEPROM?

On a personal side, I'm very fine with being told to just upgrade the
EEPROM already (and that seems to cover any use case I could think of,
and test). But if getting and comparing traces before/after your patches
is helpful to you and the wider community, I'm happy to spend some more
time testing and gathering details.
 

Cheers,