diff mbox series

ath9k: fix calibration data endianness

Message ID 20230415150542.2368179-1-noltari@gmail.com
State New
Headers show
Series ath9k: fix calibration data endianness | expand

Commit Message

Álvaro Fernández Rojas April 15, 2023, 3:05 p.m. UTC
BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
partitions but it needs to be swapped in order to work, otherwise it fails:
ath9k 0000:00:01.0: enabling device (0000 -> 0002)
ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
ath: phy0: Unable to initialize hardware; initialization status: -22
ath9k 0000:00:01.0: Failed to initialize device
ath9k: probe of 0000:00:01.0 failed with error -22

Fixes: eb3a97a69be8 ("ath9k: fetch calibration data via nvmem subsystem")
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
---
 drivers/net/wireless/ath/ath9k/init.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Toke Høiland-Jørgensen April 15, 2023, 3:25 p.m. UTC | #1
Álvaro Fernández Rojas <noltari@gmail.com> writes:

> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
> partitions but it needs to be swapped in order to work, otherwise it fails:
> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
> ath: phy0: Unable to initialize hardware; initialization status: -22
> ath9k 0000:00:01.0: Failed to initialize device
> ath9k: probe of 0000:00:01.0 failed with error -22

How does this affect other platforms? Why was the NO_EEP_SWAP flag set
in the first place? Christian, care to comment on this?

-Toke
Christian Lamparter April 15, 2023, 4:02 p.m. UTC | #2
Hi,

On 4/15/23 17:25, Toke Høiland-Jørgensen wrote:
> Álvaro Fernández Rojas <noltari@gmail.com> writes:
> 
>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
>> partitions but it needs to be swapped in order to work, otherwise it fails:
>> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
>> ath: phy0: Unable to initialize hardware; initialization status: -22
>> ath9k 0000:00:01.0: Failed to initialize device
>> ath9k: probe of 0000:00:01.0 failed with error -22
> 
> How does this affect other platforms? Why was the NO_EEP_SWAP flag set
> in the first place? Christian, care to comment on this?

I knew this would come up. I've written what I know and remember in the
pull-request/buglink.

Maybe this can be added to the commit?
Link: https://github.com/openwrt/openwrt/pull/12365

| From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom.

Since the existing request_firmware eeprom fetcher code set the flag,
the nvmem code had to do it too.

In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc.
I don't know if there are devices out there, which have a swapped magic (which is
used to detect the endianess), but the caldata is in the correct endiannes (or
vice versa - Magic is correct, but data needs swapping).

I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220)
and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine.
So I don't expect there to be a new issue there).

The older platform_data code has a "endian_check" flag in
the ath9k_platform_data struct:
<https://github.com/torvalds/linux/blob/master/include/linux/ath9k_platform.h#L38>
Which enables the check and the endian-swapping (though it's disabled by default).

Cheers,
Christian
Christian Lamparter April 15, 2023, 7:32 p.m. UTC | #3
On 4/15/23 18:02, Christian Lamparter wrote:
> Hi,
> 
> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote:
>> Álvaro Fernández Rojas <noltari@gmail.com> writes:
>>
>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
>>> partitions but it needs to be swapped in order to work, otherwise it fails:
>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
>>> ath: phy0: Unable to initialize hardware; initialization status: -22
>>> ath9k 0000:00:01.0: Failed to initialize device
>>> ath9k: probe of 0000:00:01.0 failed with error -22
>>
>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set
>> in the first place? Christian, care to comment on this?
> 
> I knew this would come up. I've written what I know and remember in the
> pull-request/buglink.
> 
> Maybe this can be added to the commit?
> Link: https://github.com/openwrt/openwrt/pull/12365
> 
> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom.
> 
> Since the existing request_firmware eeprom fetcher code set the flag,
> the nvmem code had to do it too.
> 
> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc.
> I don't know if there are devices out there, which have a swapped magic (which is
> used to detect the endianess), but the caldata is in the correct endiannes (or
> vice versa - Magic is correct, but data needs swapping).
> 
> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220)
> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine.
> So I don't expect there to be a new issue there).

Nope! This is a classic self-own!... Well at least, this now gets documented!

Here are my findings. Please excuse the overlong lines.

## The good news / AVM FritzBox 7360v2 ##

The good news: The AVM FritzBox 7360v2 worked the same as before.

Bootlog with the patch applied - cold start:

|[   14.738647] ath9k_pci_owl_loader 0000:01:00.0: enabling device (0000 -> 0002)
|[   14.747344] ath9k_pci_owl_loader 0000:01:00.0: fixup device configuration
|[   15.116403] pci 0000:01:00.0: [168c:002e] type 00 class 0x028000
|[   15.121083] pci 0000:01:00.0: reg 0x10: [mem 0x1c000000-0x1c00ffff 64bit]
|[   15.128162] pci 0000:01:00.0: supports D1
|[   15.131841] pci 0000:01:00.0: PME# supported from D0 D1 D3hot
|[   15.144186] pci 0000:01:00.0: BAR 0: assigned [mem 0x1c000000-0x1c00ffff 64bit]
|[...] Wireguard loading
|[   15.367461] ifx_pcie_bios_map_irq port 0 dev 0000:01:00.0 slot 0 pin 1
|[   15.372676] ifx_pcie_bios_map_irq dev 0000:01:00.0 irq 144 assigned
|[   15.379031] ath9k 0000:01:00.0: enabling device (0140 -> 0142)
|[   15.392181] ath: EEPROM regdomain: 0x8114
|[   15.392219] ath: EEPROM indicates we should expect a country code
|[   15.392232] ath: doing EEPROM country->regdmn map search
|[   15.392242] ath: country maps to regdmn code: 0x37
|[   15.392254] ath: Country alpha2 being used: DE
|[   15.392264] ath: Regpair used: 0x37
|[   15.406435] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
|[   15.411722] ieee80211 phy0: Atheros AR9287 Rev:2 mem=0xbc000000, irq=144

## The not so good news / Netgear WNDR3700v2 ##

But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 -
doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with:
"phy1: Bad EEPROM VER 0x0001 or REV 0x06e0"

|[   23.260755] ath9k_pci_owl_loader 0000:00:11.0: enabling device (0000 -> 0002)
|[   23.268319] ath9k_pci_owl_loader 0000:00:12.0: enabling device (0000 -> 0002)
|[   23.285197] ath9k_pci_owl_loader 0000:00:11.0: fixup device configuration
|[   23.297232] ath9k_pci_owl_loader 0000:00:12.0: fixup device configuration
|[   23.320031] pci 0000:00:11.0: [168c:0029] type 00 class 0x028000
|[   23.326070] pci 0000:00:11.0: reg 0x10: [mem 0x10000000-0x1000ffff]
|[   23.332430] pci 0000:00:11.0: PME# supported from D0 D3hot
|[   23.338570] pci 0000:00:11.0: BAR 0: assigned [mem 0x10000000-0x1000ffff]
|[   23.347404] pci 0000:00:12.0: [168c:0029] type 00 class 0x028000
|[   23.353441] pci 0000:00:12.0: reg 0x10: [mem 0x10010000-0x1001ffff]
|[   23.359799] pci 0000:00:12.0: PME# supported from D0 D3hot
|[   23.365906] pci 0000:00:12.0: BAR 0: assigned [mem 0x10010000-0x1001ffff]
|[...] Wireguard loading
|[   24.333503] ath9k 0000:00:11.0: enabling device (0000 -> 0002)
|[   24.363290] ath: EEPROM regdomain: 0x0
|[   24.363319] ath: EEPROM indicates default country code should be used
|[   24.363325] ath: doing EEPROM country->regdmn map search
|[   24.363338] ath: country maps to regdmn code: 0x3a
|[   24.363346] ath: Country alpha2 being used: US
|[   24.363353] ath: Regpair used: 0x3a
|[   24.381709] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
|[   24.383599] gpio-508 (fixed antenna group 1): hogged as output/high
|[   24.389941] gpio-509 (fixed antenna group 1): hogged as output/high
|[   24.396204] gpio-510 (fixed antenna group 1): hogged as output/high
|[   24.402480] gpio-511 (fixed antenna group 1): hogged as output/high
|[   24.409007] ieee80211 phy0: Atheros AR9280 Rev:2 mem=0xb0000000, irq=18
|[   24.415793] ath9k 0000:00:12.0: enabling device (0000 -> 0002)
|[   24.505496] ath: phy1: Bad EEPROM VER 0x0001 or REV 0x06e0
|[   24.511027] ath: phy1: Unable to initialize hardware; initialization status: -22
|[   24.518420] ath9k 0000:00:12.0: Failed to initialize device
|[   24.524004] ath9k: probe of 0000:00:12.0 failed with error -22

without the patch (rebuild mac80211/backports and uploaded the newly
created ath9k*.ko files and loaded them manually):

[ 1205.670387] ath: phy10: Ignoring endianness difference in EEPROM magic bytes. <----
[ 1205.679077] ath: EEPROM regdomain: 0x0
[ 1205.679086] ath: EEPROM indicates default country code should be used
[ 1205.679092] ath: doing EEPROM country->regdmn map search
[ 1205.679105] ath: country maps to regdmn code: 0x3a
[ 1205.679113] ath: Country alpha2 being used: US
[ 1205.679120] ath: Regpair used: 0x3a
[ 1205.692894] ieee80211 phy10: Selected rate control algorithm 'minstrel_ht'
[ 1205.694757] gpio-508 (fixed antenna group 1): hogged as output/high
[ 1205.701115] gpio-509 (fixed antenna group 1): hogged as output/high
[ 1205.707396] gpio-510 (fixed antenna group 1): hogged as output/high
[ 1205.713656] gpio-511 (fixed antenna group 1): hogged as output/high
[ 1205.720192] ieee80211 phy10: Atheros AR9280 Rev:2 mem=0xb0000000, irq=18
[ 1205.816842] ath: phy11: Ignoring endianness difference in EEPROM magic bytes. <----
[ 1205.825521] ath: EEPROM regdomain: 0x0
[ 1205.825530] ath: EEPROM indicates default country code should be used
[ 1205.825537] ath: doing EEPROM country->regdmn map search
[ 1205.825549] ath: country maps to regdmn code: 0x3a
[ 1205.825557] ath: Country alpha2 being used: US
[ 1205.825564] ath: Regpair used: 0x3a
[ 1205.841571] ieee80211 phy11: Selected rate control algorithm 'minstrel_ht'
[ 1205.843795] ieee80211 phy11: Atheros AR9280 Rev:2 mem=0xb0010000, irq=19

Everything was fine again.

### Hexdump extract of relevant ART data (/dev/mtd6)  ###
|00001000  a5 5a 00 00 00 03 60 00  16 8c 00 29 60 08 00 01  |.Z....`....)`...|   <- 2.4GHz WiFi OwlLoader boot code
|00001010  02 80 60 2c 16 8c a0 95  50 00 16 8c 00 2a 50 08  |..`,....P....*P.|   Apart from the Magic 0x5AA5 (or here in LE 0xA55A)
|00001020  00 01 02 80 50 2c 16 8c  a0 95 50 64 0c c0 05 04  |....P,....Pd....|   this is irrelevant for ath9k (it skips it,
|00001030  50 6c 38 11 00 03 40 04  07 3b 00 40 40 74 00 03  |Pl8...@..;.@@t..|   see ar5416_eep_start_loc in ath9k's eeprom_def.c)
|00001040  00 00 40 00 00 00 01 c2  60 34 00 44 00 00 ff ff  |..@.....`4.D....|
|00001050  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
|*
|00001200  0c b8 d5 cc e0 16 02 01  00 00 00 1f 00 21 b7 9b  |.............!..|   <- 2.4GHz ath9k's base_eep_header
|00001210  81 c3 03 03 00 00 00 00  00 00 00 09 07 00 04 01  |................|
|00001220  00 ff 02 00 00 01 00 00  00 fb 00 00 00 00 00 00  |................|
|00001230  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |.............
|[...]
|00005000  a5 5a 00 00 00 03 60 00  16 8c 00 29 60 08 00 01  |.Z....`....)`...|    <- 5GHz WiFi OwlLoader boot code
|00005010  02 80 60 2c 16 8c a0 94  50 00 16 8c 00 2a 50 08  |..`,....P....*P.|
|00005020  00 01 02 80 50 2c 16 8c  a0 94 50 64 0c c0 05 04  |....P,....Pd....|
|00005030  50 6c 38 11 00 03 40 04  07 3b 00 40 40 74 00 03  |Pl8...@..;.@@t..|
|00005040  00 00 40 00 00 00 01 c2  60 34 00 44 00 00 ff ff  |..@.....`4.D....|
|00005050  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
|*
|00005200  0c b8 27 40 e0 16 01 01  00 00 00 1f 00 21 b7 9b  |..'@.........!..| <- 5GHz ath9k's base_eep_header
|00005210  81 c3 03 03 00 00 00 00  00 00 00 09 07 00 04 00  |................|
|00005220  01 ff 02 00 00 01 00 00  00 fe 00 00 00 00 00 00  |................|
|00005230  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

### ath9k debug dumps of ath9k w/o patch (working) ###
# cat /sys/kernel/debug/ieee80211/phy10/ath9k/base_eeprom // 2.4GHz
        Major Version :         14
        Minor Version :         22
             Checksum :      54732
               Length :       3256
           RegDomain1 :          0
           RegDomain2 :         31
              TX Mask :          3
              RX Mask :          3
           Allow 5GHz :          0
           Allow 2GHz :          1
    Disable 2GHz HT20 :          0
    Disable 2GHz HT40 :          0
    Disable 5Ghz HT20 :          0
    Disable 5Ghz HT40 :          0
           Big Endian :          1
    Cal Bin Major Ver :          0
    Cal Bin Minor Ver :          7
        Cal Bin Build :          9
  OpenLoop Power Ctrl :          0
           MacAddress : 00:21:b7:9b:81:c3 <--- This looks like a valid MAC
(Note: This MAC is not used. Netgear stores the MAC at a different location
in the ART - not included in the extract above)

# cat /sys/kernel/debug/ieee80211/phy11/ath9k // 5Ghz
       Major Version :         14
        Minor Version :         22
             Checksum :      10048
               Length :       3256
           RegDomain1 :          0
           RegDomain2 :         31
              TX Mask :          3
              RX Mask :          3
           Allow 5GHz :          1
           Allow 2GHz :          0
    Disable 2GHz HT20 :          0
    Disable 2GHz HT40 :          0
    Disable 5Ghz HT20 :          0
    Disable 5Ghz HT40 :          0
           Big Endian :          1
    Cal Bin Major Ver :          0
    Cal Bin Minor Ver :          7
        Cal Bin Build :          9
  OpenLoop Power Ctrl :          0
           MacAddress : 00:21:b7:9b:81:c3 <--- This one too (it's the same)!

# cat /sys/kernel/debug/ieee80211/phy0/ath9k/modal_eeprom
[...]
  Ant. Common Control :        288  // That's 0x0120
[...]

### ath9k debug dumps of the ath9k with patch (Broken) ###

# cat /sys/kernel/debug/ieee80211/phy0/ath9k/base_eeprom  // This is the 2.4GHz Wifi, But now it says 5 GHz
        Major Version :         14
        Minor Version :         22
             Checksum :      54732 <--- checksum matches 2.4GHz @ 0x1200 (0xd5cc)
               Length :       3256 <--- this looks OK? Ok!
           RegDomain1 :          0
           RegDomain2 :         31
              TX Mask :          3
              RX Mask :          3
           Allow 5GHz :          1 <-- Says it's 5G! (0x1)
           Allow 2GHz :          0
    Disable 2GHz HT20 :          0
    Disable 2GHz HT40 :          0
    Disable 5Ghz HT20 :          0
    Disable 5Ghz HT40 :          0
           Big Endian :          0 <-- Oh? what happend here? This should have been 1 (Is this why the 5GHz PHY worked?)
    Cal Bin Major Ver :          7   | My best guess is that the u16 swap ath9k does, swapped opCapFlags and eepmisc.
                                     | So opCapFlags's value of 0x2/AR5416_OPFLAGS_11G in eepMisc is not publicy defined/known.
				    | while the eepMisc value of 0x1 (AR5416_OPFLAGS_11G) became "AR5416_OPFLAGS_11A"
				    | (=aka 5GHz capable) .
    Cal Bin Minor Ver :          0
        Cal Bin Build :          0
  OpenLoop Power Ctrl :          1
           MacAddress : 21:00:9b:b7:c3:81  <--- Multicast Bit set? This is bogus!

# cat /sys/kernel/debug/ieee80211/phy0/ath9k/modal_eeprom
[...]
  Ant. Common Control :   18874368  // That's 0x01 20 00 00
[...]


Cheers,
Christian
Toke Høiland-Jørgensen April 16, 2023, 10:50 a.m. UTC | #4
Christian Lamparter <chunkeey@gmail.com> writes:

> On 4/15/23 18:02, Christian Lamparter wrote:
>> Hi,
>> 
>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote:
>>> Álvaro Fernández Rojas <noltari@gmail.com> writes:
>>>
>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
>>>> partitions but it needs to be swapped in order to work, otherwise it fails:
>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
>>>> ath: phy0: Unable to initialize hardware; initialization status: -22
>>>> ath9k 0000:00:01.0: Failed to initialize device
>>>> ath9k: probe of 0000:00:01.0 failed with error -22
>>>
>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set
>>> in the first place? Christian, care to comment on this?
>> 
>> I knew this would come up. I've written what I know and remember in the
>> pull-request/buglink.
>> 
>> Maybe this can be added to the commit?
>> Link: https://github.com/openwrt/openwrt/pull/12365
>> 
>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom.
>> 
>> Since the existing request_firmware eeprom fetcher code set the flag,
>> the nvmem code had to do it too.
>> 
>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc.
>> I don't know if there are devices out there, which have a swapped magic (which is
>> used to detect the endianess), but the caldata is in the correct endiannes (or
>> vice versa - Magic is correct, but data needs swapping).
>> 
>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220)
>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine.
>> So I don't expect there to be a new issue there).
>
> Nope! This is a classic self-own!... Well at least, this now gets documented!
>
> Here are my findings. Please excuse the overlong lines.
>
> ## The good news / AVM FritzBox 7360v2 ##
>
> The good news: The AVM FritzBox 7360v2 worked the same as before.

[...]

> ## The not so good news / Netgear WNDR3700v2 ##
>
> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 -
> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with:
> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0"

[...]

Alright, so IIUC, we have a situation where some devices only work
*with* the flag, and some devices only work *without* the flag? So we'll
need some kind of platform-specific setting? Could we put this in the
device trees, or is there a better solution?

-Toke
Christian Lamparter April 16, 2023, 1:37 p.m. UTC | #5
On 4/16/23 12:50, Toke Høiland-Jørgensen wrote:
> Christian Lamparter <chunkeey@gmail.com> writes:
> 
>> On 4/15/23 18:02, Christian Lamparter wrote:
>>> Hi,
>>>
>>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote:
>>>> Álvaro Fernández Rojas <noltari@gmail.com> writes:
>>>>
>>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
>>>>> partitions but it needs to be swapped in order to work, otherwise it fails:
>>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
>>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
>>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
>>>>> ath: phy0: Unable to initialize hardware; initialization status: -22
>>>>> ath9k 0000:00:01.0: Failed to initialize device
>>>>> ath9k: probe of 0000:00:01.0 failed with error -22
>>>>
>>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set
>>>> in the first place? Christian, care to comment on this?
>>>
>>> I knew this would come up. I've written what I know and remember in the
>>> pull-request/buglink.
>>>
>>> Maybe this can be added to the commit?
>>> Link: https://github.com/openwrt/openwrt/pull/12365
>>>
>>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom.
>>>
>>> Since the existing request_firmware eeprom fetcher code set the flag,
>>> the nvmem code had to do it too.
>>>
>>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc.
>>> I don't know if there are devices out there, which have a swapped magic (which is
>>> used to detect the endianess), but the caldata is in the correct endiannes (or
>>> vice versa - Magic is correct, but data needs swapping).
>>>
>>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220)
>>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine.
>>> So I don't expect there to be a new issue there).
>>
>> Nope! This is a classic self-own!... Well at least, this now gets documented!
>>
>> Here are my findings. Please excuse the overlong lines.
>>
>> ## The good news / AVM FritzBox 7360v2 ##
>>
>> The good news: The AVM FritzBox 7360v2 worked the same as before.
> 
> [...]
> 
>> ## The not so good news / Netgear WNDR3700v2 ##
>>
>> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 -
>> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with:
>> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0"
> 
> [...]
> 
> Alright, so IIUC, we have a situation where some devices only work
> *with* the flag, and some devices only work *without* the flag? So we'll
> need some kind of platform-specific setting? Could we put this in the
> device trees, or is there a better solution?

Depends. From what I gather, ath9k calls this "need_swap". Thing is,
the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the
official documentation about the AR9170 Base EEPROM (has the same base
structure as AR5008 up to AR92xx) this is specified as:

"Only bit 0 is defined as Big Endian. This bit should be written as 1
when the structure is interpreted in big Endian byte ordering. This bit
must be reviewed before any larger than byte parameters can be interpreted."

It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2),
the  caldata should be in "Big-Endian" too... so no swapping is necessary.

Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals
with this eepmisc flag:

|       if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) {
|               *swap_needed = true;
|               ath_dbg(common, EEPROM,
|                       "Big Endian EEPROM detected according to EEPMISC register.\n");
|       } else {
|               *swap_needed = false;
|       }

This doesn't take into consideration that swapping is not needed if
the data is in big endian format on a big endian device. So, this
could be changed so that the *swap_needed is only true if the flag and
device endiannes disagrees?

That said, Martin and Felix have written their reasons in the cover letter
and patches for why the code is what it is:
<https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements>

Toke, What's your take on this? Having something similar like the
check_endian bool... but for OF? Or more logic that can somehow
figure out if it's big or little endian.

Cheers,
Christian
Álvaro Fernández Rojas April 16, 2023, 2:28 p.m. UTC | #6
El dom, 16 abr 2023 a las 15:37, Christian Lamparter
(<chunkeey@gmail.com>) escribió:
>
> On 4/16/23 12:50, Toke Høiland-Jørgensen wrote:
> > Christian Lamparter <chunkeey@gmail.com> writes:
> >
> >> On 4/15/23 18:02, Christian Lamparter wrote:
> >>> Hi,
> >>>
> >>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote:
> >>>> Álvaro Fernández Rojas <noltari@gmail.com> writes:
> >>>>
> >>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
> >>>>> partitions but it needs to be swapped in order to work, otherwise it fails:
> >>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
> >>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
> >>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
> >>>>> ath: phy0: Unable to initialize hardware; initialization status: -22
> >>>>> ath9k 0000:00:01.0: Failed to initialize device
> >>>>> ath9k: probe of 0000:00:01.0 failed with error -22
> >>>>
> >>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set
> >>>> in the first place? Christian, care to comment on this?
> >>>
> >>> I knew this would come up. I've written what I know and remember in the
> >>> pull-request/buglink.
> >>>
> >>> Maybe this can be added to the commit?
> >>> Link: https://github.com/openwrt/openwrt/pull/12365
> >>>
> >>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom.
> >>>
> >>> Since the existing request_firmware eeprom fetcher code set the flag,
> >>> the nvmem code had to do it too.
> >>>
> >>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc.
> >>> I don't know if there are devices out there, which have a swapped magic (which is
> >>> used to detect the endianess), but the caldata is in the correct endiannes (or
> >>> vice versa - Magic is correct, but data needs swapping).
> >>>
> >>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220)
> >>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine.
> >>> So I don't expect there to be a new issue there).
> >>
> >> Nope! This is a classic self-own!... Well at least, this now gets documented!
> >>
> >> Here are my findings. Please excuse the overlong lines.
> >>
> >> ## The good news / AVM FritzBox 7360v2 ##
> >>
> >> The good news: The AVM FritzBox 7360v2 worked the same as before.
> >
> > [...]
> >
> >> ## The not so good news / Netgear WNDR3700v2 ##
> >>
> >> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 -
> >> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with:
> >> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0"
> >
> > [...]
> >
> > Alright, so IIUC, we have a situation where some devices only work
> > *with* the flag, and some devices only work *without* the flag? So we'll
> > need some kind of platform-specific setting? Could we put this in the
> > device trees, or is there a better solution?
>
> Depends. From what I gather, ath9k calls this "need_swap". Thing is,
> the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the
> official documentation about the AR9170 Base EEPROM (has the same base
> structure as AR5008 up to AR92xx) this is specified as:
>
> "Only bit 0 is defined as Big Endian. This bit should be written as 1
> when the structure is interpreted in big Endian byte ordering. This bit
> must be reviewed before any larger than byte parameters can be interpreted."
>
> It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2),
> the  caldata should be in "Big-Endian" too... so no swapping is necessary.
>
> Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals
> with this eepmisc flag:
>
> |       if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) {
> |               *swap_needed = true;
> |               ath_dbg(common, EEPROM,
> |                       "Big Endian EEPROM detected according to EEPMISC register.\n");
> |       } else {
> |               *swap_needed = false;
> |       }
>
> This doesn't take into consideration that swapping is not needed if
> the data is in big endian format on a big endian device. So, this
> could be changed so that the *swap_needed is only true if the flag and
> device endiannes disagrees?
>
> That said, Martin and Felix have written their reasons in the cover letter
> and patches for why the code is what it is:
> <https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements>
>
> Toke, What's your take on this? Having something similar like the
> check_endian bool... but for OF? Or more logic that can somehow
> figure out if it's big or little endian.

I already have v2 patches adding a new device tree flag, but I'll wait
for Toke's answer before sending them.
In my patches I added a new DT flag "qca,endian-check", which follows
the one from ath9k_platform_data (endian_check).

>
> Cheers,
> Christian

Best regards,
Álvaro.
Andrew Lunn April 16, 2023, 2:44 p.m. UTC | #7
> |       if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) {
> |               *swap_needed = true;
> |               ath_dbg(common, EEPROM,
> |                       "Big Endian EEPROM detected according to EEPMISC register.\n");
> |       } else {
> |               *swap_needed = false;
> |       }
> 
> This doesn't take into consideration that swapping is not needed if
> the data is in big endian format on a big endian device. So, this
> could be changed so that the *swap_needed is only true if the flag and
> device endiannes disagrees?

There are versions of the macro which performs the swap which
understands the CPU endianness and become a NOP when it is not
required. htons()/ntohs() are the classic examples. So you need to
consider:

Despite swap_needed being true, it is possible no swap it actually
happening, because such a macro is being used.

and

Maybe using these variant can make the code simpler, by just doing the
NOP swap when the CPU endianess does not require it.

    Andrew
Toke Høiland-Jørgensen April 16, 2023, 9:49 p.m. UTC | #8
Christian Lamparter <chunkeey@gmail.com> writes:

> On 4/16/23 12:50, Toke Høiland-Jørgensen wrote:
>> Christian Lamparter <chunkeey@gmail.com> writes:
>> 
>>> On 4/15/23 18:02, Christian Lamparter wrote:
>>>> Hi,
>>>>
>>>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote:
>>>>> Álvaro Fernández Rojas <noltari@gmail.com> writes:
>>>>>
>>>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
>>>>>> partitions but it needs to be swapped in order to work, otherwise it fails:
>>>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
>>>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
>>>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
>>>>>> ath: phy0: Unable to initialize hardware; initialization status: -22
>>>>>> ath9k 0000:00:01.0: Failed to initialize device
>>>>>> ath9k: probe of 0000:00:01.0 failed with error -22
>>>>>
>>>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set
>>>>> in the first place? Christian, care to comment on this?
>>>>
>>>> I knew this would come up. I've written what I know and remember in the
>>>> pull-request/buglink.
>>>>
>>>> Maybe this can be added to the commit?
>>>> Link: https://github.com/openwrt/openwrt/pull/12365
>>>>
>>>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom.
>>>>
>>>> Since the existing request_firmware eeprom fetcher code set the flag,
>>>> the nvmem code had to do it too.
>>>>
>>>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc.
>>>> I don't know if there are devices out there, which have a swapped magic (which is
>>>> used to detect the endianess), but the caldata is in the correct endiannes (or
>>>> vice versa - Magic is correct, but data needs swapping).
>>>>
>>>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220)
>>>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine.
>>>> So I don't expect there to be a new issue there).
>>>
>>> Nope! This is a classic self-own!... Well at least, this now gets documented!
>>>
>>> Here are my findings. Please excuse the overlong lines.
>>>
>>> ## The good news / AVM FritzBox 7360v2 ##
>>>
>>> The good news: The AVM FritzBox 7360v2 worked the same as before.
>> 
>> [...]
>> 
>>> ## The not so good news / Netgear WNDR3700v2 ##
>>>
>>> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 -
>>> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with:
>>> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0"
>> 
>> [...]
>> 
>> Alright, so IIUC, we have a situation where some devices only work
>> *with* the flag, and some devices only work *without* the flag? So we'll
>> need some kind of platform-specific setting? Could we put this in the
>> device trees, or is there a better solution?
>
> Depends. From what I gather, ath9k calls this "need_swap". Thing is,
> the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the
> official documentation about the AR9170 Base EEPROM (has the same base
> structure as AR5008 up to AR92xx) this is specified as:
>
> "Only bit 0 is defined as Big Endian. This bit should be written as 1
> when the structure is interpreted in big Endian byte ordering. This bit
> must be reviewed before any larger than byte parameters can be interpreted."
>
> It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2),
> the  caldata should be in "Big-Endian" too... so no swapping is necessary.
>
> Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals
> with this eepmisc flag:
>
> |       if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) {
> |               *swap_needed = true;
> |               ath_dbg(common, EEPROM,
> |                       "Big Endian EEPROM detected according to EEPMISC register.\n");
> |       } else {
> |               *swap_needed = false;
> |       }
>
> This doesn't take into consideration that swapping is not needed if
> the data is in big endian format on a big endian device. So, this
> could be changed so that the *swap_needed is only true if the flag and
> device endiannes disagrees?
>
> That said, Martin and Felix have written their reasons in the cover letter
> and patches for why the code is what it is:
> <https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements>
>
> Toke, What's your take on this? Having something similar like the
> check_endian bool... but for OF? Or more logic that can somehow
> figure out if it's big or little endian.

Digging into that old thread, it seems we are re-hashing a lot of the
old discussion when those patches went in. Basically, the code you
quoted above is correct because the commit that introduced it sets all
fields to be __le16 and __le32 types and reads them using the
leXX_to_cpu() macros.

The code *further up* in that function is what is enabled by Alvaro's
patch. Which is a different type of swapping (where the whole eeprom is
swab16()'ed, not just the actual multi-byte data fields in them).
However, in OpenWrt the in-driver code to do this is not used; instead,
a hotplug script applies the swapping before the device is seen by the
driver, as described in this commit[0]. Martin indeed mentions that this
is a device-specific thing, so the driver can't actually do the right
thing without some outside feature flag[1]. The commit[0] also indicates
that there used used to exist a device-tree binding in the out-of-tree
device trees used in OpenWrt to do the unconditional swab16().

The code in [0] still exists in OpenWrt today, albeit in a somewhat
modified form[2]. I guess the question then boils down to, Álvaro, can
your issue be resolved by a pre-processing step similar to that which is
done in [2]? Or do we need the device tree flag after all?

-Toke

[0] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=afa37092663d00aa0abf8c61943d9a1b5558b144
[1] https://narkive.com/2q5A6nu0.34
[2] https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/lantiq/xway/base-files/etc/hotplug.d/firmware/12-ath9k-eeprom;h=98bb9af6947a298775ff7fa26ac6501c57df8378;hb=HEAD
Álvaro Fernández Rojas April 17, 2023, 5:33 a.m. UTC | #9
Hi Toke,

El dom, 16 abr 2023 a las 23:49, Toke Høiland-Jørgensen
(<toke@toke.dk>) escribió:
>
> Christian Lamparter <chunkeey@gmail.com> writes:
>
> > On 4/16/23 12:50, Toke Høiland-Jørgensen wrote:
> >> Christian Lamparter <chunkeey@gmail.com> writes:
> >>
> >>> On 4/15/23 18:02, Christian Lamparter wrote:
> >>>> Hi,
> >>>>
> >>>> On 4/15/23 17:25, Toke Høiland-Jørgensen wrote:
> >>>>> Álvaro Fernández Rojas <noltari@gmail.com> writes:
> >>>>>
> >>>>>> BCM63xx (Big Endian MIPS) devices store the calibration data in MTD
> >>>>>> partitions but it needs to be swapped in order to work, otherwise it fails:
> >>>>>> ath9k 0000:00:01.0: enabling device (0000 -> 0002)
> >>>>>> ath: phy0: Ignoring endianness difference in EEPROM magic bytes.
> >>>>>> ath: phy0: Bad EEPROM VER 0x0001 or REV 0x00e0
> >>>>>> ath: phy0: Unable to initialize hardware; initialization status: -22
> >>>>>> ath9k 0000:00:01.0: Failed to initialize device
> >>>>>> ath9k: probe of 0000:00:01.0 failed with error -22
> >>>>>
> >>>>> How does this affect other platforms? Why was the NO_EEP_SWAP flag set
> >>>>> in the first place? Christian, care to comment on this?
> >>>>
> >>>> I knew this would come up. I've written what I know and remember in the
> >>>> pull-request/buglink.
> >>>>
> >>>> Maybe this can be added to the commit?
> >>>> Link: https://github.com/openwrt/openwrt/pull/12365
> >>>>
> >>>> | From what I remember, the ah->ah_flags |= AH_NO_EEP_SWAP; was copied verbatim from ath9k_of_init's request_eeprom.
> >>>>
> >>>> Since the existing request_firmware eeprom fetcher code set the flag,
> >>>> the nvmem code had to do it too.
> >>>>
> >>>> In theory, I don't think that not setting the AH_NO_EEP_SWAP flag will cause havoc.
> >>>> I don't know if there are devices out there, which have a swapped magic (which is
> >>>> used to detect the endianess), but the caldata is in the correct endiannes (or
> >>>> vice versa - Magic is correct, but data needs swapping).
> >>>>
> >>>> I can run tests with it on a Netzgear WNDR3700v2 (AR7161+2xAR9220)
> >>>> and FritzBox 7360v2 (Lantiq XWAY+AR9220). (But these worked fine.
> >>>> So I don't expect there to be a new issue there).
> >>>
> >>> Nope! This is a classic self-own!... Well at least, this now gets documented!
> >>>
> >>> Here are my findings. Please excuse the overlong lines.
> >>>
> >>> ## The good news / AVM FritzBox 7360v2 ##
> >>>
> >>> The good news: The AVM FritzBox 7360v2 worked the same as before.
> >>
> >> [...]
> >>
> >>> ## The not so good news / Netgear WNDR3700v2 ##
> >>>
> >>> But not the Netgar WNDR3700v2. One WiFi (The 2.4G, reported itself now as the 5G @0000:00:11.0 -
> >>> doesn't really work now), and the real 5G WiFi (@0000:00:12.0) failed with:
> >>> "phy1: Bad EEPROM VER 0x0001 or REV 0x06e0"
> >>
> >> [...]
> >>
> >> Alright, so IIUC, we have a situation where some devices only work
> >> *with* the flag, and some devices only work *without* the flag? So we'll
> >> need some kind of platform-specific setting? Could we put this in the
> >> device trees, or is there a better solution?
> >
> > Depends. From what I gather, ath9k calls this "need_swap". Thing is,
> > the flag in the EEPROM is called "AR5416_EEPMISC_BIG_ENDIAN". In the
> > official documentation about the AR9170 Base EEPROM (has the same base
> > structure as AR5008 up to AR92xx) this is specified as:
> >
> > "Only bit 0 is defined as Big Endian. This bit should be written as 1
> > when the structure is interpreted in big Endian byte ordering. This bit
> > must be reviewed before any larger than byte parameters can be interpreted."
> >
> > It makes sense that on a Big-Endian MIPS device (like the Netgear WNDR3700v2),
> > the  caldata should be in "Big-Endian" too... so no swapping is necessary.
> >
> > Looking in ath9k's eeprom.c function ath9k_hw_nvram_swap_data() that deals
> > with this eepmisc flag:
> >
> > |       if (ah->eep_ops->get_eepmisc(ah) & AR5416_EEPMISC_BIG_ENDIAN) {
> > |               *swap_needed = true;
> > |               ath_dbg(common, EEPROM,
> > |                       "Big Endian EEPROM detected according to EEPMISC register.\n");
> > |       } else {
> > |               *swap_needed = false;
> > |       }
> >
> > This doesn't take into consideration that swapping is not needed if
> > the data is in big endian format on a big endian device. So, this
> > could be changed so that the *swap_needed is only true if the flag and
> > device endiannes disagrees?
> >
> > That said, Martin and Felix have written their reasons in the cover letter
> > and patches for why the code is what it is:
> > <https://ath9k-devel.ath9k.narkive.com/2q5A6nu0/patch-0-5-ath9k-eeprom-swapping-improvements>
> >
> > Toke, What's your take on this? Having something similar like the
> > check_endian bool... but for OF? Or more logic that can somehow
> > figure out if it's big or little endian.
>
> Digging into that old thread, it seems we are re-hashing a lot of the
> old discussion when those patches went in. Basically, the code you
> quoted above is correct because the commit that introduced it sets all
> fields to be __le16 and __le32 types and reads them using the
> leXX_to_cpu() macros.
>
> The code *further up* in that function is what is enabled by Alvaro's
> patch. Which is a different type of swapping (where the whole eeprom is
> swab16()'ed, not just the actual multi-byte data fields in them).
> However, in OpenWrt the in-driver code to do this is not used; instead,
> a hotplug script applies the swapping before the device is seen by the
> driver, as described in this commit[0]. Martin indeed mentions that this
> is a device-specific thing, so the driver can't actually do the right
> thing without some outside feature flag[1]. The commit[0] also indicates
> that there used used to exist a device-tree binding in the out-of-tree
> device trees used in OpenWrt to do the unconditional swab16().
>
> The code in [0] still exists in OpenWrt today, albeit in a somewhat
> modified form[2]. I guess the question then boils down to, Álvaro, can
> your issue be resolved by a pre-processing step similar to that which is
> done in [2]? Or do we need the device tree flag after all?

TBH, yes, it can be solved by a pre-processing step similar to what's
done in [2], but then having added nvmem support would make no sense
at all for those devices that need swapping, since it's unusable
without the flag.
So, in my opinion the flag should be added in order to be able to use
it without pre-processing the calibration data and to take advantage
of nvmem support.
I will send the v2 patch and even if it's not accepted I think I will
add it as a downstream patch on OpenWrt...

>
> -Toke
>
> [0] https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=afa37092663d00aa0abf8c61943d9a1b5558b144
> [1] https://narkive.com/2q5A6nu0.34
> [2] https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/lantiq/xway/base-files/etc/hotplug.d/firmware/12-ath9k-eeprom;h=98bb9af6947a298775ff7fa26ac6501c57df8378;hb=HEAD

Best regards,
Álvaro.
diff mbox series

Patch

diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 4f00400c7ffb..1314fbc55c0b 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -615,7 +615,6 @@  static int ath9k_nvmem_request_eeprom(struct ath_softc *sc)
 
 	ah->nvmem_blob_len = len;
 	ah->ah_flags &= ~AH_USE_EEPROM;
-	ah->ah_flags |= AH_NO_EEP_SWAP;
 
 	return 0;
 }