diff mbox series

wifi breakage due to commit "wifi: brcmfmac: add support for vendor-specific firmware api"

Message ID CANnVG6=a3etRagG+RaSEH-b4_nfzxpEKffQtuMWrttrbgjunZQ@mail.gmail.com
State New
Headers show
Series wifi breakage due to commit "wifi: brcmfmac: add support for vendor-specific firmware api" | expand

Commit Message

Michael Stapelberg March 14, 2023, 7:38 p.m. UTC
Hey Arend

I recently bisected a user-reported WiFi breakage on the Raspberry Pi
3B+ to your commit
https://github.com/torvalds/linux/commit/d6a5c562214f26e442c8ec3ff1e28e16675d1bcf
https://lore.kernel.org/r/20221129135446.151065-4-arend.vanspriel@broadcom.com

On our end, the issue was reported here:
https://github.com/gokrazy/wifi/issues/3

As of Linux 6.2.6, this seems to still be unfixed, so I wanted to
check in and see if you could take a look please?

Thank you.

Here’s a dmesg diff between the previous commit (da6d9c8ecd00) and the
broken commit (d6a5c562214f):

SMP PREEMPT Wed Mar  1 20:57:29 UTC 2017
  random: crng init done
  Machine model: Raspberry Pi 3 Model B Plus Rev 1.3
  efi: UEFI not found.
@@ -29,7 +29,7 @@
  Fallback order for Node 0: 0
  Built 1 zonelists, mobility grouping on.  Total pages: 238896
  Policy zone: DMA
- Kernel command line:
video=Composite-1:720x480@60i,margin_left=32,margin_right=32,margin_top=32,margin_bottom=32
dma.dmachans=0x7ff5 bcm2709.boardrev=0xa020d3
bcm2709.serial=0xe33b7b2a bcm2709.uart_clock=48000000
bcm2709.disk_led_gpio=29 bcm2709.disk_led_active_low=0
smsc95xx.macaddr=B8:27:EB:3B:7B:2A vc_mem.mem_base=0x3ec00000
vc_mem.mem_size=0x40000000  console=tty1 console=ttyS0,115200
root=PARTUUID=60c24cc1-f3f9-427a-8199-2e18c40c0001/PARTNROFF=2
init=/gokrazy/init rootwait panic=10 oops=panic
+ Kernel command line:
video=Composite-1:720x480@60i,margin_left=32,margin_right=32,margin_top=32,margin_bottom=32
dma.dmachans=0x7ff5 bcm2709.boardrev=0xa020d3
bcm2709.serial=0xe33b7b2a bcm2709.uart_clock=48000000
bcm2709.disk_led_gpio=29 bcm2709.disk_led_active_low=0
smsc95xx.macaddr=B8:27:EB:3B:7B:2A vc_mem.mem_base=0x3ec00000
vc_mem.mem_size=0x40000000  console=tty1 console=ttyS0,115200
root=PARTUUID=60c24cc1-f3f9-427a-8199-2e18c40c0001/PARTNROFF=1
init=/gokrazy/init rootwait panic=10 oops=panic
  Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
  Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
  mem auto-init: stack:all(zero), heap alloc:off, heap free:off
@@ -245,13 +245,13 @@
  Console: switching to colour dummy device 80x25
  vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops)
  mmc0: host does not support reading read-only switch, assuming write-enable
- vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
  mmc0: new high speed SDHC card at address aaaa
- vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
+ vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops)
  mmcblk0: mmc0:aaaa SC16G 14.8 GiB
+ vc4-drm soc:gpu: bound 3f806000.vec (ops vc4_vec_ops)
  vc4-drm soc:gpu: bound 3f004000.txp (ops vc4_txp_ops)
- vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
   mmcblk0: p1 p2 p3 p4
+ vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops)
  vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops)
  vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops)
  vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops)
@@ -269,11 +269,11 @@
  cfg80211: Loading compiled-in X.509 certificates for regulatory database
  cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
  platform regulatory.0: Direct firmware load for regulatory.db failed
with error -2
- ALSA device list:
  cfg80211: failed to load regulatory.db
-   #0: vc4-hdmi
  mmc1: SDHCI controller on 3f300000.mmc [3f300000.mmc] using PIO
- VFS: Mounted root (squashfs filesystem) readonly on device 179:3.
+ ALSA device list:
+   #0: vc4-hdmi
+ VFS: Mounted root (squashfs filesystem) readonly on device 179:2.
  devtmpfs: mounted
  Freeing unused kernel memory: 9216K
  Run /gokrazy/init as init process
@@ -284,6 +284,7 @@
      TERM=linux
  mmc1: new high speed SDIO card at address 0001
  usb 1-1: new high-speed USB device number 2 using dwc2
+ EXT4-fs (mmcblk0p4): recovery complete
  EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode.
Quota mode: none.
  hub 1-1:1.0: USB hub found
  hub 1-1:1.0: 4 ports detected
@@ -300,12 +301,14 @@
  Bluetooth: HCI socket layer initialized
  Bluetooth: L2CAP socket layer initialized
  Bluetooth: SCO socket layer initialized
+ brcmfmac: brcmf_fwvid_request_module: mod=wcc: failed -2
+ ieee80211 phy0: brcmf_attach: brcmf_fwvid_attach failed
  Bluetooth: HCI UART driver ver 2.3
+ brcmfmac: brcmf_sdio_firmware_callback: brcmf_attach failed
  Bluetooth: HCI UART protocol H4 registered
  Bluetooth: HCI UART protocol Broadcom registered
  hci_uart_bcm serial0-0: supply vbat not found, using dummy regulator
  hci_uart_bcm serial0-0: supply vddio not found, using dummy regulator
- brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6 wl0: Mar  2
2020 23:30:41 version 7.45.202 (r724630 CY) FWID 01-72f6ece2
  uart-pl011 3f201000.serial: no DMA platform data
  Bluetooth: hci0: BCM: chip id 107
  Bluetooth: hci0: BCM: features 0x2f
@@ -316,7 +319,4 @@
  Bluetooth: hci0: BCM: 'brcm/BCM4345C0.hcd'
  Bluetooth: hci0: BCM: 'brcm/BCM.raspberrypi,3-model-b-plus.hcd'
  Bluetooth: hci0: BCM: 'brcm/BCM.hcd'
- Bluetooth: hci0: unexpected cc 0x1003 length: 1 < 9
- Bluetooth: hci0: Opcode 0x1003 failed: -38
  IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
- IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

Comments

Michael Stapelberg March 16, 2023, 7:54 a.m. UTC | #1
Hmm, I’m not sure either.

My guess would be that the module loading happens only when actually
initializing the device,
not when merely loading the kernel module? At least I don’t see any
module-related errors in dmesg:

[    5.603264] brcmfmac: brcmf_fw_alloc_request: using
brcm/brcmfmac43455-sdio for chip BCM4345/6
[    5.612966] brcmfmac mmc1:0001:1: Direct firmware load for
brcm/brcmfmac43455-sdio.raspberrypi,3-model-b-plus.bin failed with
error -2
[    5.704359] Bluetooth: Core ver 2.22
[    5.708263] NET: Registered PF_BLUETOOTH protocol family
[    5.715884] Bluetooth: HCI device and connection manager initialized
[    5.722720] Bluetooth: HCI socket layer initialized
[    5.728316] Bluetooth: L2CAP socket layer initialized
[    5.733655] Bluetooth: SCO socket layer initialized
[    5.851351] Bluetooth: HCI UART driver ver 2.3
[    5.856653] Bluetooth: HCI UART protocol H4 registered
[    5.862449] Bluetooth: HCI UART protocol Broadcom registered
[    5.863571] hci_uart_bcm serial0-0: supply vbat not found, using
dummy regulator
[    5.878107] hci_uart_bcm serial0-0: supply vddio not found, using
dummy regulator
[    5.888944] brcmfmac_wcc: brcmf_wcc_attach: executing
[    5.912344] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6
wl0: Mar  2 2020 23:30:41 version 7.45.202 (r724630 CY) FWID
01-72f6ece2
[    6.006077] uart-pl011 3f201000.serial: no DMA platform data
[    6.246580] Bluetooth: hci0: BCM: chip id 107
[    6.251455] Bluetooth: hci0: BCM: features 0x2f
[    6.257631] Bluetooth: hci0: BCM4345C0
[    6.261481] Bluetooth: hci0: BCM4345C0 (003.001.025) build 0000
[    6.267670] Bluetooth: hci0: BCM: firmware Patch file not found, tried:
[    6.274395] Bluetooth: hci0: BCM:
'brcm/BCM4345C0.raspberrypi,3-model-b-plus.hcd'
[    6.282062] Bluetooth: hci0: BCM: 'brcm/BCM4345C0.hcd'
[    6.287369] Bluetooth: hci0: BCM: 'brcm/BCM.raspberrypi,3-model-b-plus.hcd'
[    6.294496] Bluetooth: hci0: BCM: 'brcm/BCM.hcd'
[    7.903878] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    9.926759] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready


On Thu, 16 Mar 2023 at 07:17, Arend Van Spriel
<arend.vanspriel@broadcom.com> wrote:
>
> On March 15, 2023 8:53:50 PM Michael Stapelberg
> <michael+lkml@stapelberg.ch> wrote:
>
> > Hey Arend
> >
> > Thanks for looking into it.
> >
> > Your suspicion was spot on: we don’t have /sbin/modprobe (we don’t
> > even have /sbin!)
> >
> > I addressed this issue by loading the brcmfmac-wcc module where we
> > also load the others:
> > https://github.com/gokrazy/wifi/commit/f611656338b68faa5f42bc2c8bbf1e0231a74a50
> >
> > I’m not entirely sure why I concluded in my earlier testing that
> > loading the module doesn’t help.
> > Perhaps the issue was that I was manually loading the module after the
> > wifi program
> > tried to use the interface already and had it fail its firmware load —
> > in either case,
> > loading brcmfmac-wcc from the wifi program seems to work fine.
>
> I was going to say it seems racy, but I am not sure why that works at all.
> In brcmf_fwvid_request_module it says:
>
>     ret = request_module("brcmfmac-%s", fwvid_list[fwvid].name);
>         if (ret)
>             goto fail;
>
> and the missing modprobe would always cause request_module() to fail (see
> [1]) hence the device probe would fail. What am I missing here?
>
> Regards,
> Arend
>
> [1] https://elixir.bootlin.com/linux/latest/source/kernel/kmod.c#L138
>
> >
> > I also realize that the error message gives a number of pointers once
> > you know how to interpret it:
> >
> > + brcmfmac: brcmf_fwvid_request_module: mod=wcc: failed -2
> >
> > I now understand this was trying to tell me that
> > request_module("brcmfmac-wcc") failed
> > with error code -2 (ENOENT), which means that the kernel’s
> > request_module() function
> > could not find CONFIG_MODPROBE_PATH.
> >
> > For comparison, when the user space tool exists but fails, including
> > when it can’t find
> > the requested module (ENOENT), it always returns exit code 1:
> > https://salsa.debian.org/md/kmod/-/blob/7d761268ebb68245daf47167e418823c68c3eb98/tools/modprobe.c#L1025
> >
> > I think it would be nice if the Linux kernel could print a more
> > descriptive error message in this case.
> > I’m not sure what the kernel conventions are: should the caller of
> > request_module() produce a good error message?
> > Ideally, the message would translate -2 into ENOENT, and mention the
> > name “brcmfmac-wcc”.
> >
> > Or maybe we could have a kconfig option to indicate there is no
> > CONFIG_MODPROBE_PATH,
> > and instead make any request_module() call fail loudly with a
> > descriptive error in dmesg?
> >
> > Thanks for considering
> > Best regards
> > Michael
> >
> > On Wed, 15 Mar 2023 at 11:11, Arend van Spriel
> > <arend.vanspriel@broadcom.com> wrote:
> >>
> >> On 3/15/2023 10:57 AM, Johannes Berg wrote:
> >>> On Wed, 2023-03-15 at 10:12 +0100, Arend van Spriel wrote:
> >>>>
> >>>> I think it works pretty similar to firmware loading. These days firmware
> >>>> loading does not rely on a user-space helper and I assumed the same is
> >>>> true for module loading. Will look into it.
> >>>
> >>> As far as I know you still need the modprobe helper in
> >>>
> >>> /proc/sys/kernel/modprobe
> >>
> >> Hi Michael,
> >>
> >> With hint from Johannes I found this Kconfig option:
> >>
> >> config MODPROBE_PATH
> >> string "Path to modprobe binary"
> >> default "/sbin/modprobe"
> >> help
> >> When kernel code requests a module, it does so by calling
> >> the "modprobe" userspace utility. This option allows you to
> >> set the path where that binary is found. This can be changed
> >> at runtime via the sysctl file
> >> /proc/sys/kernel/modprobe. Setting this to the empty string
> >> removes the kernel's ability to request modules (but
> >> userspace can still load modules explicitly).
> >>
> >> Now checking the .config that you sent it says:
> >>
> >> CONFIG_MODPROBE_PATH="/sbin/modprobe"
> >>
> >> So my suspicion is that modprobe is not at that given location?
> >>
> >> Regards,
> >> Arend
>
>
>
Arend Van Spriel March 16, 2023, 9:40 a.m. UTC | #2
On 3/16/2023 8:54 AM, Michael Stapelberg wrote:
> Hmm, I’m not sure either.

Actually, the check I referred to only bails out when 
CONFIG_MODPROBE_PATH is unset. In your case it is set but it is not 
existing. I dived into request_module() and it ends up scheduling a work 
to call the modprobe which will fail. However, brcmfmac will wait for 
brcmfmac-wcc to register itself so doing the insmod of brcmfmac-wcc will 
do the trick. However, now I am back at saying this can be racy. It 
would be better to come up with a modprobe.go for your user-space 
environment and put that in kernel config.

Regards,
Arend
Michael Stapelberg March 17, 2023, 8:04 a.m. UTC | #3
Can you outline under what conditions the race can happen?

I’ll take care of implementing a userspace modprobe helper.

Thanks


On Thu, 16 Mar 2023 at 10:40, Arend van Spriel
<arend.vanspriel@broadcom.com> wrote:
>
> On 3/16/2023 8:54 AM, Michael Stapelberg wrote:
> > Hmm, I’m not sure either.
>
> Actually, the check I referred to only bails out when
> CONFIG_MODPROBE_PATH is unset. In your case it is set but it is not
> existing. I dived into request_module() and it ends up scheduling a work
> to call the modprobe which will fail. However, brcmfmac will wait for
> brcmfmac-wcc to register itself so doing the insmod of brcmfmac-wcc will
> do the trick. However, now I am back at saying this can be racy. It
> would be better to come up with a modprobe.go for your user-space
> environment and put that in kernel config.
>
> Regards,
> Arend
>
Arend Van Spriel March 17, 2023, 9:13 a.m. UTC | #4
On 3/17/2023 9:04 AM, Michael Stapelberg wrote:
> Can you outline under what conditions the race can happen?

I was concerned about brcmfmac-wcc doing register before brcmfmac is 
waiting for that. Looking more into it I think this is not a concern. We 
are using a completion construct which will synchronize properly.

Regards,
Arend
diff mbox series

Patch

--- /proc/self/fd/11 2023-03-14 20:28:20.167241380 +0100
+++ /proc/self/fd/12 2023-03-14 20:28:20.167241380 +0100
@@ -1,5 +1,5 @@ 
  Booting Linux on physical CPU 0x0000000000 [0x410fd034]
- Linux version 6.1.0-rc7-01766-gda6d9c8ecd00 (gokrazy@docker)
(aarch64-linux-gnu-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #81
SMP PREEMPT Wed Mar  1 20:57:29 UTC 2017
+ Linux version 6.1.0-rc7-01767-gd6a5c562214f (gokrazy@docker)
(aarch64-linux-gnu-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #82