diff mbox series

[RFC] Revert "mmc: core: Fixup support for writeback-cache for eMMC and SD"

Message ID 20230531002740.1235620-1-marex@denx.de
State New
Headers show
Series [RFC] Revert "mmc: core: Fixup support for writeback-cache for eMMC and SD" | expand

Commit Message

Marek Vasut May 31, 2023, 12:27 a.m. UTC
This reverts commit 08ebf903af57cda6d773f3dd1671b64f73b432b8.

On STM32MP153C DHCOR DRC Compact with microSD card
Kingston Canvas Go! Plus MicroSDXC I 64 GiB U3/V30/A2 SDCG3/64GB
it is no longer possible to mount ext4 filesystem.

Git bisect points to this commit which is being reverted here. A bit of
investigation shows that for non-working microSD cards
  cache_enabled=TRUE fua_enabled=false
For working microSD card (Kingston Canvas React Plus, MicroSDXC II
64 GiB U3/V90/A1 SDCR2/64GB)
  cache_enabled=FALSE fua_enabled=false

It seems enabling the cache for SD cards causes the issue. Before the
reverted commit, the cache was not reported enabled for SD cards. I am
not sure whether such blunt revert is the correct solution however, any
ideas ?

Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: "Christian Löhle" <CLoehle@hyperstone.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marek Vasut <marex@denx.de>
Cc: Michael Wu <michael@allwinnertech.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Seunghui Lee <sh043.lee@samsung.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: linux-mmc@vger.kernel.org
---
 drivers/mmc/core/block.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

Comments

Avri Altman May 31, 2023, 6:01 a.m. UTC | #1
> This reverts commit 08ebf903af57cda6d773f3dd1671b64f73b432b8.
> 
> On STM32MP153C DHCOR DRC Compact with microSD card Kingston Canvas
> Go! Plus MicroSDXC I 64 GiB U3/V30/A2 SDCG3/64GB it is no longer possible
> to mount ext4 filesystem.
> 
> Git bisect points to this commit which is being reverted here. A bit of
> investigation shows that for non-working microSD cards
>   cache_enabled=TRUE fua_enabled=false
> For working microSD card (Kingston Canvas React Plus, MicroSDXC II
> 64 GiB U3/V90/A1 SDCR2/64GB)
>   cache_enabled=FALSE fua_enabled=false
> 
> It seems enabling the cache for SD cards causes the issue. Before the reverted
> commit, the cache was not reported enabled for SD cards. I am not sure
> whether such blunt revert is the correct solution however, any ideas ?
The commit you are reverting has a couple of fixes tags - so reverting it will re-introduce those issues it aims to fix.
If I recall correctly, I asked our sd system engineering to follow up on this,
And they had no problem with the logic.

I think that we need more data to establish the root cause of this issue,
Provided that it is not a specific-vendor issue.

Thanks,
Avri

> 
> Signed-off-by: Marek Vasut <marex@denx.de>
Marek Vasut May 31, 2023, 11:34 a.m. UTC | #2
On 5/31/23 08:01, Avri Altman wrote:
>> This reverts commit 08ebf903af57cda6d773f3dd1671b64f73b432b8.
>>
>> On STM32MP153C DHCOR DRC Compact with microSD card Kingston Canvas
>> Go! Plus MicroSDXC I 64 GiB U3/V30/A2 SDCG3/64GB it is no longer possible
>> to mount ext4 filesystem.
>>
>> Git bisect points to this commit which is being reverted here. A bit of
>> investigation shows that for non-working microSD cards
>>    cache_enabled=TRUE fua_enabled=false
>> For working microSD card (Kingston Canvas React Plus, MicroSDXC II
>> 64 GiB U3/V90/A1 SDCR2/64GB)
>>    cache_enabled=FALSE fua_enabled=false
>>
>> It seems enabling the cache for SD cards causes the issue. Before the reverted
>> commit, the cache was not reported enabled for SD cards. I am not sure
>> whether such blunt revert is the correct solution however, any ideas ?
> The commit you are reverting has a couple of fixes tags - so reverting it will re-introduce those issues it aims to fix.
> If I recall correctly, I asked our sd system engineering to follow up on this,
> And they had no problem with the logic.
> 
> I think that we need more data to establish the root cause of this issue,
> Provided that it is not a specific-vendor issue.

I just sent Adrian the command log, but I don't see anything obvious in it.
Adrian Hunter May 31, 2023, 1:13 p.m. UTC | #3
On 31/05/23 14:34, Marek Vasut wrote:
> On 5/31/23 07:46, Adrian Hunter wrote:
>> On 31/05/23 03:27, Marek Vasut wrote:
>>> This reverts commit 08ebf903af57cda6d773f3dd1671b64f73b432b8.
>>>
>>> On STM32MP153C DHCOR DRC Compact with microSD card
>>> Kingston Canvas Go! Plus MicroSDXC I 64 GiB U3/V30/A2 SDCG3/64GB
>>> it is no longer possible to mount ext4 filesystem.
>>>
>>> Git bisect points to this commit which is being reverted here. A bit of
>>> investigation shows that for non-working microSD cards
>>>    cache_enabled=TRUE fua_enabled=false
>>> For working microSD card (Kingston Canvas React Plus, MicroSDXC II
>>> 64 GiB U3/V90/A1 SDCR2/64GB)
>>>    cache_enabled=FALSE fua_enabled=false
>>>
>>> It seems enabling the cache for SD cards causes the issue. Before the
>>> reverted commit, the cache was not reported enabled for SD cards. I am
>>> not sure whether such blunt revert is the correct solution however, any
>>> ideas ?
>>
>> This does not disable the cache, it disables the block layer from issuing
>> cache flushes (by telling it there is no cache being used).
>>
>> It would be good to know what commands are failing, which should be visible
>> in the kernel messages if dynamic debug is enabled:
>>
>> Dynamic debug for mmc
>> ---------------------
>>
>>      Kernel must be configured:
>>
>>          CONFIG_DYNAMIC_DEBUG=y
>>
>>      To enable mmc debug via sysfs:
>>
>>          echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control
>>          echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control
>>
>>      To enable mmc debug via kernel command line:
>>
>>          dyndbg="file drivers/mmc/core/* +p;file drivers/mmc/host/* +p"
>>
>>      To disable mmc debug:
>>
>>          echo 'file drivers/mmc/core/* -p' > /sys/kernel/debug/dynamic_debug/control
>>          echo 'file drivers/mmc/host/* -p' > /sys/kernel/debug/dynamic_debug/control
>>
>>      More general information in kernel documentation in kernel tree:
>>
>>          Documentation/admin-guide/dynamic-debug-howto.rst
> 
> Log is below, I added newlines around the mount failure.
> Do you see anything problematic there ?
> 
> $ echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control ; echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control ; echo 8 > /proc/sys/kernel/printk ; mount /dev/mmcblk1p4 /mnt
> [   61.061029] <mmc1: starting CMD23 arg 00000002 flags 00000015>
> [   61.065637] mmc1: starting CMD18 arg 00002002 flags 000000b5
> [   61.071144] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
> [   61.078096] mmc1:     CMD12 arg 00000000 flags 00000095
> [   61.083302] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
> [   61.090012] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
> [   61.096531] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.102332] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
> [   61.108836] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.114575] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
> [   61.120371] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   61.127178] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [   61.134084] mmc1:     1024 bytes transferred: 0
> [   61.138658] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   61.145533] <mmc1: starting CMD23 arg 00000008 flags 00000015>
> [   61.150946] mmc1: starting CMD18 arg 00002000 flags 000000b5
> [   61.156609] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
> [   61.163426] mmc1:     CMD12 arg 00000000 flags 00000095
> [   61.168759] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
> [   61.175368] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
> [   61.181985] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.187682] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
> [   61.194185] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.199979] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
> [   61.205769] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   61.212574] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [   61.219478] mmc1:     4096 bytes transferred: 0
> [   61.224050] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   61.231166] <mmc1: starting CMD23 arg 00000002 flags 00000015>
> [   61.236378] mmc1: starting CMD18 arg 00002002 flags 000000b5
> [   61.241983] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
> [   61.248940] mmc1:     CMD12 arg 00000000 flags 00000095
> [   61.254169] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
> [   61.260779] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
> [   61.267405] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.273101] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
> [   61.279603] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.285431] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
> [   61.291125] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   61.298032] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [   61.304937] mmc1:     1024 bytes transferred: 0
> [   61.309409] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   61.316256] <mmc1: starting CMD23 arg 00000008 flags 00000015>
> [   61.321768] mmc1: starting CMD18 arg 00002000 flags 000000b5
> [   61.327459] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
> [   61.334360] mmc1:     CMD12 arg 00000000 flags 00000095
> [   61.339459] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
> [   61.346167] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
> [   61.352782] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.358478] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
> [   61.365082] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.370776] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
> [   61.376565] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   61.383371] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [   61.390275] mmc1:     4096 bytes transferred: 0
> [   61.394848] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   61.401885] <mmc1: starting CMD23 arg 00000002 flags 00000015>
> [   61.407193] mmc1: starting CMD18 arg 00002002 flags 000000b5
> [   61.412794] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
> [   61.419748] mmc1:     CMD12 arg 00000000 flags 00000095
> [   61.424973] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
> [   61.431582] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
> [   61.438204] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.443901] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
> [   61.450404] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.456232] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
> [   61.461927] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   61.468834] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [   61.475739] mmc1:     1024 bytes transferred: 0
> [   61.480210] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   61.487054] <mmc1: starting CMD23 arg 00000008 flags 00000015>
> [   61.492462] mmc1: starting CMD18 arg 00002000 flags 000000b5
> [   61.498251] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
> [   61.505160] mmc1:     CMD12 arg 00000000 flags 00000095
> [   61.510260] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
> [   61.516969] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
> [   61.523592] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.529288] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
> [   61.535789] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.541583] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
> [   61.547271] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   61.554177] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [   61.561082] mmc1:     4096 bytes transferred: 0
> [   61.565654] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   61.572440] <mmc1: starting CMD23 arg 00000008 flags 00000015>
> [   61.578017] mmc1: starting CMD18 arg 00002008 flags 000000b5
> [   61.583653] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
> [   61.590465] mmc1:     CMD12 arg 00000000 flags 00000095
> [   61.595686] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
> [   61.602396] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
> [   61.608915] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.614712] mmci-pl18x 58005000.mmc: op 12 arg 00002008 flags 000000b5
> [   61.621215] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.627034] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
> [   61.632727] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   61.639633] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [   61.646437] mmc1:     4096 bytes transferred: 0
> [   61.651010] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   61.658908] <mmc1: starting CMD23 arg 00000108 flags 00000015>
> [   61.663421] mmc1: starting CMD18 arg 00003be8 flags 000000b5
> [   61.669102] mmc1:     blksz 512 blocks 264 flags 00000200 tsac 100 ms nsac 0
> [   61.676155] mmc1:     CMD12 arg 00000000 flags 00000095
> [   61.681355] mmci-pl18x 58005000.mmc: blksz 0200 blks 0108 flags 00000200
> [   61.688064] mmci-pl18x 58005000.mmc: op 17 arg 00000108 flags 00000015
> [   61.694589] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.700284] mmci-pl18x 58005000.mmc: op 12 arg 00003be8 flags 000000b5
> [   61.706887] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.712581] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.718372] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
> [   61.724060] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   61.730965] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [   61.737870] mmc1:     135168 bytes transferred: 0
> [   61.742544] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   61.749064] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.755336] <mmc1: starting CMD23 arg 00000008 flags 00000015>
> [   61.760649] mmc1: starting CMD18 arg 00442000 flags 000000b5
> [   61.766327] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
> [   61.773144] mmc1:     CMD12 arg 00000000 flags 00000095
> [   61.778489] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
> [   61.785100] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
> [   61.791616] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.797414] mmci-pl18x 58005000.mmc: op 12 arg 00442000 flags 000000b5
> [   61.803916] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.809710] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
> [   61.815399] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   61.822306] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [   61.829211] mmc1:     4096 bytes transferred: 0
> [   61.833683] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   61.833952] mmc0: starting CMD7 arg fffd0000 flags 00000015
> [   61.840205] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.845820] mmci-pl18x 48004000.mmc: op 07 arg fffd0000 flags 00000015
> [   61.853036] <mmc1: starting CMD23 arg 00000008 flags 00000015>
> [   61.858074] mmci-pl18x 48004000.mmc: irq0 (data+cmd) 00000040
> [   61.858098] mmc0: req done (CMD7): 0: 00001ed7 00000000 00000000 00000000
> [   61.876506] mmc1: starting CMD25 arg 00002000 flags 000000b5
> [   61.882106] mmc1:     blksz 512 blocks 8 flags 00000100 tsac 3000 ms nsac 0
> [   61.889107] mmc1:     CMD12 arg 00000000 flags 0000049d
> [   61.894346] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000100
> [   61.900957] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
> [   61.907588] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.913283] mmci-pl18x 58005000.mmc: op 19 arg 00002000 flags 000000b5
> [   61.919786] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.925612] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
> [   61.931309] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   61.938219] mmc1: req done (CMD25): 0: 00000900 00000000 00000000 00000000
> [   61.945125] mmc1:     4096 bytes transferred: 0
> [   61.949598] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   61.956107] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.961888] mmc1: starting CMD13 arg 50480000 flags 00000195
> [   61.967581] mmci-pl18x 58005000.mmc: op 0d arg 50480000 flags 00000195
> [   61.974026] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   61.979718] mmc1: req done (CMD13): 0: 00000900 00000000 00000000 00000000
> [   61.986630] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   61.992588] mmc1: starting CMD49 arg 10020a00 flags 00000035
> [   61.998152] mmc1:     blksz 512 blocks 1 flags 00000100 tsac 3000 ms nsac 0
> [   62.005158] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000100
> [   62.011778] mmci-pl18x 58005000.mmc: op 31 arg 10020a00 flags 00000035
> [   62.018317] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   62.024011] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
> [   62.029801] mmc1: req done (CMD49): 0: 00000900 00000000 00000000 00000000
> [   62.036708] mmc1:     512 bytes transferred: 0
> [   62.041083] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   62.046988] mmc1: starting CMD48 arg 10020a00 flags 000000b5
> [   62.052487] mmc1:     blksz 512 blocks 1 flags 00000200 tsac 100 ms nsac 0
> [   62.059443] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000200
> [   62.066154] mmci-pl18x 58005000.mmc: op 30 arg 10020a00 flags 000000b5
> [   62.072688] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   62.078382] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
> [   62.084173] mmc1: req done (CMD48): 0: 00000900 00000000 00000000 00000000
> [   62.091081] mmc1:     512 bytes transferred: 0
> [   62.095455] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000

CMD48 was successful but it still became an error, which must have
been because it indicated the flush had not completed.  We do not
allow for mmc_poll_for_busy() being too quick i.e. card has not
had a chance to indicate busy yet.

Please try this:

diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
index 72b664ed90cf..9c3123867a99 100644
--- a/drivers/mmc/core/sd.c
+++ b/drivers/mmc/core/sd.c
@@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
 {
 	struct mmc_card *card = host->card;
 	u8 *reg_buf, fno, page;
+	unsigned long timeout;
+	bool expired;
 	u16 offset;
 	int err;
 
@@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
 		goto out;
 	}
 
+	timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
+again:
 	err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
 				MMC_BUSY_EXTR_SINGLE);
 	if (err)
 		goto out;
 
+	expired = time_after(jiffies, timeout);
+
 	/*
 	 * Read the Flush Cache bit. The card shall reset it, to confirm that
 	 * it's has completed the flushing of the cache.
@@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
 		goto out;
 	}
 
-	if (reg_buf[0] & BIT(0))
-		err = -ETIMEDOUT;
+	if (reg_buf[0] & BIT(0)) {
+		if (expired)
+			err = -ETIMEDOUT;
+		else
+			goto again;
+	}
 out:
 	kfree(reg_buf);
 	return err;


> 
> 
> [   62.101381] I/O error, dev mmcblk1, sector 8192 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 2
> [   62.110223] Buffer I/O error on dev mmcblk1p4, logical block 0, lost sync page write
> [   62.117996] EXT4-fs (mmcblk1p4): I/O error while writing superblock
> [   62.124320] EXT4-fs (mmcblk1p4): mount failed
> 
> 
> [   62.129676] <mmc1: starting CMD23 arg 00000002 flags 00000015>
> [   62.134452] mmc1: starting CMD18 arg 00002000 flags 000000b5
> [   62.140055] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
> [   62.147003] mmc1:     CMD12 arg 00000000 flags 00000095
> [   62.152205] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
> [   62.158915] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
> [   62.165431] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   62.171229] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
> [   62.177734] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   62.183566] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
> [   62.189262] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
> [   62.196169] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
> [   62.202973] mmc1:     1024 bytes transferred: 0
> [   62.207546] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
> [   62.213966] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   62.220406] mmc1: starting CMD17 arg 00002000 flags 000000b5
> [   62.225497] mmc1:     blksz 512 blocks 1 flags 00000200 tsac 100 ms nsac 0
> [   62.232324] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000200
> [   62.239033] mmci-pl18x 58005000.mmc: op 11 arg 00002000 flags 000000b5
> [   62.245571] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   62.251267] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
> [   62.257058] mmc1: req done (CMD17): 0: 00000900 00000000 00000000 00000000
> [   62.263966] mmc1:     512 bytes transferred: 0
> [   62.268347] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   62.274791] mmc1: starting CMD17 arg 00002000 flags 000000b5
> [   62.279800] mmc1:     blksz 512 blocks 1 flags 00000200 tsac 100 ms nsac 0
> [   62.286783] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000200
> [   62.293396] mmci-pl18x 58005000.mmc: op 11 arg 00002000 flags 000000b5
> [   62.299912] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   62.305612] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
> [   62.311403] mmc1: req done (CMD17): 0: 00000900 00000000 00000000 00000000
> [   62.318309] mmc1:     512 bytes transferred: 0
> [   62.322801] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> 
> mount: mounting /dev/mmcblk1p4 on /mnt failed: Invalid argument
> ~ #
> 
> [   62.873917] mmc0: starting CMD7 arg fffd0000 flags 00000015
> [   62.878152] mmci-pl18x 48004000.mmc: op 07 arg fffd0000 flags 00000015
> [   62.884681] mmci-pl18x 48004000.mmc: irq0 (data+cmd) 00000040
> [   62.890381] mmc0: req done (CMD7): 0: 00001ed7 00000000 00000000 00000000
>
Marek Vasut May 31, 2023, 9:31 p.m. UTC | #4
On 5/31/23 15:13, Adrian Hunter wrote:
> On 31/05/23 14:34, Marek Vasut wrote:
>> On 5/31/23 07:46, Adrian Hunter wrote:
>>> On 31/05/23 03:27, Marek Vasut wrote:
>>>> This reverts commit 08ebf903af57cda6d773f3dd1671b64f73b432b8.
>>>>
>>>> On STM32MP153C DHCOR DRC Compact with microSD card
>>>> Kingston Canvas Go! Plus MicroSDXC I 64 GiB U3/V30/A2 SDCG3/64GB
>>>> it is no longer possible to mount ext4 filesystem.
>>>>
>>>> Git bisect points to this commit which is being reverted here. A bit of
>>>> investigation shows that for non-working microSD cards
>>>>     cache_enabled=TRUE fua_enabled=false
>>>> For working microSD card (Kingston Canvas React Plus, MicroSDXC II
>>>> 64 GiB U3/V90/A1 SDCR2/64GB)
>>>>     cache_enabled=FALSE fua_enabled=false
>>>>
>>>> It seems enabling the cache for SD cards causes the issue. Before the
>>>> reverted commit, the cache was not reported enabled for SD cards. I am
>>>> not sure whether such blunt revert is the correct solution however, any
>>>> ideas ?
>>>
>>> This does not disable the cache, it disables the block layer from issuing
>>> cache flushes (by telling it there is no cache being used).
>>>
>>> It would be good to know what commands are failing, which should be visible
>>> in the kernel messages if dynamic debug is enabled:
>>>
>>> Dynamic debug for mmc
>>> ---------------------
>>>
>>>       Kernel must be configured:
>>>
>>>           CONFIG_DYNAMIC_DEBUG=y
>>>
>>>       To enable mmc debug via sysfs:
>>>
>>>           echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control
>>>           echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control
>>>
>>>       To enable mmc debug via kernel command line:
>>>
>>>           dyndbg="file drivers/mmc/core/* +p;file drivers/mmc/host/* +p"
>>>
>>>       To disable mmc debug:
>>>
>>>           echo 'file drivers/mmc/core/* -p' > /sys/kernel/debug/dynamic_debug/control
>>>           echo 'file drivers/mmc/host/* -p' > /sys/kernel/debug/dynamic_debug/control
>>>
>>>       More general information in kernel documentation in kernel tree:
>>>
>>>           Documentation/admin-guide/dynamic-debug-howto.rst
>>
>> Log is below, I added newlines around the mount failure.
>> Do you see anything problematic there ?
>>
>> $ echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control ; echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control ; echo 8 > /proc/sys/kernel/printk ; mount /dev/mmcblk1p4 /mnt
>> [   61.061029] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>> [   61.065637] mmc1: starting CMD18 arg 00002002 flags 000000b5
>> [   61.071144] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>> [   61.078096] mmc1:     CMD12 arg 00000000 flags 00000095
>> [   61.083302] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>> [   61.090012] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>> [   61.096531] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.102332] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>> [   61.108836] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.114575] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>> [   61.120371] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>> [   61.127178] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>> [   61.134084] mmc1:     1024 bytes transferred: 0
>> [   61.138658] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>> [   61.145533] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>> [   61.150946] mmc1: starting CMD18 arg 00002000 flags 000000b5
>> [   61.156609] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>> [   61.163426] mmc1:     CMD12 arg 00000000 flags 00000095
>> [   61.168759] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>> [   61.175368] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>> [   61.181985] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.187682] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>> [   61.194185] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.199979] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>> [   61.205769] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>> [   61.212574] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>> [   61.219478] mmc1:     4096 bytes transferred: 0
>> [   61.224050] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>> [   61.231166] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>> [   61.236378] mmc1: starting CMD18 arg 00002002 flags 000000b5
>> [   61.241983] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>> [   61.248940] mmc1:     CMD12 arg 00000000 flags 00000095
>> [   61.254169] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>> [   61.260779] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>> [   61.267405] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.273101] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>> [   61.279603] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.285431] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>> [   61.291125] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>> [   61.298032] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>> [   61.304937] mmc1:     1024 bytes transferred: 0
>> [   61.309409] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>> [   61.316256] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>> [   61.321768] mmc1: starting CMD18 arg 00002000 flags 000000b5
>> [   61.327459] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>> [   61.334360] mmc1:     CMD12 arg 00000000 flags 00000095
>> [   61.339459] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>> [   61.346167] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>> [   61.352782] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.358478] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>> [   61.365082] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.370776] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>> [   61.376565] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>> [   61.383371] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>> [   61.390275] mmc1:     4096 bytes transferred: 0
>> [   61.394848] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>> [   61.401885] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>> [   61.407193] mmc1: starting CMD18 arg 00002002 flags 000000b5
>> [   61.412794] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>> [   61.419748] mmc1:     CMD12 arg 00000000 flags 00000095
>> [   61.424973] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>> [   61.431582] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>> [   61.438204] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.443901] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>> [   61.450404] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.456232] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>> [   61.461927] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>> [   61.468834] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>> [   61.475739] mmc1:     1024 bytes transferred: 0
>> [   61.480210] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>> [   61.487054] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>> [   61.492462] mmc1: starting CMD18 arg 00002000 flags 000000b5
>> [   61.498251] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>> [   61.505160] mmc1:     CMD12 arg 00000000 flags 00000095
>> [   61.510260] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>> [   61.516969] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>> [   61.523592] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.529288] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>> [   61.535789] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.541583] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>> [   61.547271] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>> [   61.554177] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>> [   61.561082] mmc1:     4096 bytes transferred: 0
>> [   61.565654] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>> [   61.572440] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>> [   61.578017] mmc1: starting CMD18 arg 00002008 flags 000000b5
>> [   61.583653] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>> [   61.590465] mmc1:     CMD12 arg 00000000 flags 00000095
>> [   61.595686] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>> [   61.602396] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>> [   61.608915] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.614712] mmci-pl18x 58005000.mmc: op 12 arg 00002008 flags 000000b5
>> [   61.621215] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.627034] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>> [   61.632727] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>> [   61.639633] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>> [   61.646437] mmc1:     4096 bytes transferred: 0
>> [   61.651010] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>> [   61.658908] <mmc1: starting CMD23 arg 00000108 flags 00000015>
>> [   61.663421] mmc1: starting CMD18 arg 00003be8 flags 000000b5
>> [   61.669102] mmc1:     blksz 512 blocks 264 flags 00000200 tsac 100 ms nsac 0
>> [   61.676155] mmc1:     CMD12 arg 00000000 flags 00000095
>> [   61.681355] mmci-pl18x 58005000.mmc: blksz 0200 blks 0108 flags 00000200
>> [   61.688064] mmci-pl18x 58005000.mmc: op 17 arg 00000108 flags 00000015
>> [   61.694589] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.700284] mmci-pl18x 58005000.mmc: op 12 arg 00003be8 flags 000000b5
>> [   61.706887] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.712581] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.718372] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>> [   61.724060] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>> [   61.730965] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>> [   61.737870] mmc1:     135168 bytes transferred: 0
>> [   61.742544] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>> [   61.749064] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.755336] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>> [   61.760649] mmc1: starting CMD18 arg 00442000 flags 000000b5
>> [   61.766327] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>> [   61.773144] mmc1:     CMD12 arg 00000000 flags 00000095
>> [   61.778489] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>> [   61.785100] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>> [   61.791616] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.797414] mmci-pl18x 58005000.mmc: op 12 arg 00442000 flags 000000b5
>> [   61.803916] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.809710] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>> [   61.815399] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>> [   61.822306] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>> [   61.829211] mmc1:     4096 bytes transferred: 0
>> [   61.833683] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>> [   61.833952] mmc0: starting CMD7 arg fffd0000 flags 00000015
>> [   61.840205] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.845820] mmci-pl18x 48004000.mmc: op 07 arg fffd0000 flags 00000015
>> [   61.853036] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>> [   61.858074] mmci-pl18x 48004000.mmc: irq0 (data+cmd) 00000040
>> [   61.858098] mmc0: req done (CMD7): 0: 00001ed7 00000000 00000000 00000000
>> [   61.876506] mmc1: starting CMD25 arg 00002000 flags 000000b5
>> [   61.882106] mmc1:     blksz 512 blocks 8 flags 00000100 tsac 3000 ms nsac 0
>> [   61.889107] mmc1:     CMD12 arg 00000000 flags 0000049d
>> [   61.894346] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000100
>> [   61.900957] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>> [   61.907588] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.913283] mmci-pl18x 58005000.mmc: op 19 arg 00002000 flags 000000b5
>> [   61.919786] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.925612] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>> [   61.931309] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>> [   61.938219] mmc1: req done (CMD25): 0: 00000900 00000000 00000000 00000000
>> [   61.945125] mmc1:     4096 bytes transferred: 0
>> [   61.949598] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>> [   61.956107] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.961888] mmc1: starting CMD13 arg 50480000 flags 00000195
>> [   61.967581] mmci-pl18x 58005000.mmc: op 0d arg 50480000 flags 00000195
>> [   61.974026] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   61.979718] mmc1: req done (CMD13): 0: 00000900 00000000 00000000 00000000
>> [   61.986630] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   61.992588] mmc1: starting CMD49 arg 10020a00 flags 00000035
>> [   61.998152] mmc1:     blksz 512 blocks 1 flags 00000100 tsac 3000 ms nsac 0
>> [   62.005158] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000100
>> [   62.011778] mmci-pl18x 58005000.mmc: op 31 arg 10020a00 flags 00000035
>> [   62.018317] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   62.024011] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>> [   62.029801] mmc1: req done (CMD49): 0: 00000900 00000000 00000000 00000000
>> [   62.036708] mmc1:     512 bytes transferred: 0
>> [   62.041083] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>> [   62.046988] mmc1: starting CMD48 arg 10020a00 flags 000000b5
>> [   62.052487] mmc1:     blksz 512 blocks 1 flags 00000200 tsac 100 ms nsac 0
>> [   62.059443] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000200
>> [   62.066154] mmci-pl18x 58005000.mmc: op 30 arg 10020a00 flags 000000b5
>> [   62.072688] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>> [   62.078382] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>> [   62.084173] mmc1: req done (CMD48): 0: 00000900 00000000 00000000 00000000
>> [   62.091081] mmc1:     512 bytes transferred: 0
>> [   62.095455] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> 
> CMD48 was successful but it still became an error, which must have
> been because it indicated the flush had not completed.  We do not
> allow for mmc_poll_for_busy() being too quick i.e. card has not
> had a chance to indicate busy yet.
> 
> Please try this:
> 
> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
> index 72b664ed90cf..9c3123867a99 100644
> --- a/drivers/mmc/core/sd.c
> +++ b/drivers/mmc/core/sd.c
> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>   {
>   	struct mmc_card *card = host->card;
>   	u8 *reg_buf, fno, page;
> +	unsigned long timeout;
> +	bool expired;
>   	u16 offset;
>   	int err;
>   
> @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>   		goto out;
>   	}
>   
> +	timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
> +again:
>   	err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>   				MMC_BUSY_EXTR_SINGLE);
>   	if (err)
>   		goto out;
>   
> +	expired = time_after(jiffies, timeout);
> +
>   	/*
>   	 * Read the Flush Cache bit. The card shall reset it, to confirm that
>   	 * it's has completed the flushing of the cache.
> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>   		goto out;
>   	}
>   
> -	if (reg_buf[0] & BIT(0))
> -		err = -ETIMEDOUT;
> +	if (reg_buf[0] & BIT(0)) {

I am getting here, multiple times, with expired=0 .
After that, once, I get here with expired=1 and then the ext4 mount 
failure occurs.

> +		if (expired)
> +			err = -ETIMEDOUT;
> +		else
> +			goto again;
> +	}
>   out:
>   	kfree(reg_buf);
>   	return err;

Also, I made another observation -- if I reduce the SD bus clock 
frequency to 10 MHz (current is 50 MHz), then I don't observe this problem.
Marek Vasut May 31, 2023, 9:33 p.m. UTC | #5
On 5/31/23 15:08, Christian Loehle wrote:
> One thing that comes to my mind that would explain the behavior, since all commands itself complete fine is:
> Issuing sd cache flush:
> [   61.992588] mmc1: starting CMD49 arg 10020a00 flags 00000035
> [   61.998152] mmc1:     blksz 512 blocks 1 flags 00000100 tsac 3000 ms
> nsac 0
> [   62.005158] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000100
> [   62.011778] mmci-pl18x 58005000.mmc: op 31 arg 10020a00 flags 00000035
> [   62.018317] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   62.024011] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
> [   62.029801] mmc1: req done (CMD49): 0: 00000900 00000000 00000000
> 00000000
> [   62.036708] mmc1:     512 bytes transferred: 0
> [   62.041083] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> [   62.046988] mmc1: starting CMD48 arg 10020a00 flags 000000b5
> [   62.052487] mmc1:     blksz 512 blocks 1 flags 00000200 tsac 100 ms
> nsac 0
> [   62.059443] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000200
> [   62.066154] mmci-pl18x 58005000.mmc: op 30 arg 10020a00 flags 000000b5
> [   62.072688] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
> [   62.078382] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
> 
> The kernel reads back the page to ensure it has completed.
> 
> [   62.084173] mmc1: req done (CMD48): 0: 00000900 00000000 00000000
> 00000000
> [   62.091081] mmc1:     512 bytes transferred: 0
> [   62.095455] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
> 
> 
> [   62.101381] I/O error, dev mmcblk1, sector 8192 op 0x1:(WRITE) flags
> 0x800 phys_seg 1 prio class 2
> 
> Fail here implies that it did indeed fail right after CMD48.
> 
> I would expect a card that does not hold down busy the entire cache flush time would produce the observed behavior.
> The CMD48 is issued ~100ms after the CMD49, so not nearly close to the timeout of 1s.
> 
> That is just one possibility of course, it would need a full trace to confirm.

What kind of trace ?

> For you easier probably something like:
> 
> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
> index 72b664ed90cf..c9d3cae0678d 100644
> --- a/drivers/mmc/core/sd.c
> +++ b/drivers/mmc/core/sd.c
> @@ -1354,8 +1354,11 @@ static int sd_flush_cache(struct mmc_host *host)
>                  goto out;
>          }
> 
> -       if (reg_buf[0] & BIT(0))
> +       if (reg_buf[0] & BIT(0)) {
> +               pr_warn("%s: Cache Flush bit still active after busy phase\n",
> +                       mmc_hostname(host));
>                  err = -ETIMEDOUT;
> +       }

Adrian provided a bit more complete patch, I tested that, see my reply 
there.
Adrian Hunter June 1, 2023, 6:20 a.m. UTC | #6
On 1/06/23 00:31, Marek Vasut wrote:
> On 5/31/23 15:13, Adrian Hunter wrote:
>> On 31/05/23 14:34, Marek Vasut wrote:
>>> On 5/31/23 07:46, Adrian Hunter wrote:
>>>> On 31/05/23 03:27, Marek Vasut wrote:
>>>>> This reverts commit 08ebf903af57cda6d773f3dd1671b64f73b432b8.
>>>>>
>>>>> On STM32MP153C DHCOR DRC Compact with microSD card
>>>>> Kingston Canvas Go! Plus MicroSDXC I 64 GiB U3/V30/A2 SDCG3/64GB
>>>>> it is no longer possible to mount ext4 filesystem.
>>>>>
>>>>> Git bisect points to this commit which is being reverted here. A bit of
>>>>> investigation shows that for non-working microSD cards
>>>>>     cache_enabled=TRUE fua_enabled=false
>>>>> For working microSD card (Kingston Canvas React Plus, MicroSDXC II
>>>>> 64 GiB U3/V90/A1 SDCR2/64GB)
>>>>>     cache_enabled=FALSE fua_enabled=false
>>>>>
>>>>> It seems enabling the cache for SD cards causes the issue. Before the
>>>>> reverted commit, the cache was not reported enabled for SD cards. I am
>>>>> not sure whether such blunt revert is the correct solution however, any
>>>>> ideas ?
>>>>
>>>> This does not disable the cache, it disables the block layer from issuing
>>>> cache flushes (by telling it there is no cache being used).
>>>>
>>>> It would be good to know what commands are failing, which should be visible
>>>> in the kernel messages if dynamic debug is enabled:
>>>>
>>>> Dynamic debug for mmc
>>>> ---------------------
>>>>
>>>>       Kernel must be configured:
>>>>
>>>>           CONFIG_DYNAMIC_DEBUG=y
>>>>
>>>>       To enable mmc debug via sysfs:
>>>>
>>>>           echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control
>>>>           echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control
>>>>
>>>>       To enable mmc debug via kernel command line:
>>>>
>>>>           dyndbg="file drivers/mmc/core/* +p;file drivers/mmc/host/* +p"
>>>>
>>>>       To disable mmc debug:
>>>>
>>>>           echo 'file drivers/mmc/core/* -p' > /sys/kernel/debug/dynamic_debug/control
>>>>           echo 'file drivers/mmc/host/* -p' > /sys/kernel/debug/dynamic_debug/control
>>>>
>>>>       More general information in kernel documentation in kernel tree:
>>>>
>>>>           Documentation/admin-guide/dynamic-debug-howto.rst
>>>
>>> Log is below, I added newlines around the mount failure.
>>> Do you see anything problematic there ?
>>>
>>> $ echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control ; echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control ; echo 8 > /proc/sys/kernel/printk ; mount /dev/mmcblk1p4 /mnt
>>> [   61.061029] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>>> [   61.065637] mmc1: starting CMD18 arg 00002002 flags 000000b5
>>> [   61.071144] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>>> [   61.078096] mmc1:     CMD12 arg 00000000 flags 00000095
>>> [   61.083302] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>>> [   61.090012] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>>> [   61.096531] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.102332] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>>> [   61.108836] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.114575] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>> [   61.120371] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>> [   61.127178] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>> [   61.134084] mmc1:     1024 bytes transferred: 0
>>> [   61.138658] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>> [   61.145533] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>> [   61.150946] mmc1: starting CMD18 arg 00002000 flags 000000b5
>>> [   61.156609] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>> [   61.163426] mmc1:     CMD12 arg 00000000 flags 00000095
>>> [   61.168759] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>> [   61.175368] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>> [   61.181985] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.187682] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>>> [   61.194185] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.199979] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>> [   61.205769] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>> [   61.212574] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>> [   61.219478] mmc1:     4096 bytes transferred: 0
>>> [   61.224050] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>> [   61.231166] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>>> [   61.236378] mmc1: starting CMD18 arg 00002002 flags 000000b5
>>> [   61.241983] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>>> [   61.248940] mmc1:     CMD12 arg 00000000 flags 00000095
>>> [   61.254169] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>>> [   61.260779] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>>> [   61.267405] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.273101] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>>> [   61.279603] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.285431] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>> [   61.291125] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>> [   61.298032] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>> [   61.304937] mmc1:     1024 bytes transferred: 0
>>> [   61.309409] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>> [   61.316256] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>> [   61.321768] mmc1: starting CMD18 arg 00002000 flags 000000b5
>>> [   61.327459] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>> [   61.334360] mmc1:     CMD12 arg 00000000 flags 00000095
>>> [   61.339459] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>> [   61.346167] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>> [   61.352782] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.358478] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>>> [   61.365082] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.370776] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>> [   61.376565] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>> [   61.383371] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>> [   61.390275] mmc1:     4096 bytes transferred: 0
>>> [   61.394848] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>> [   61.401885] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>>> [   61.407193] mmc1: starting CMD18 arg 00002002 flags 000000b5
>>> [   61.412794] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>>> [   61.419748] mmc1:     CMD12 arg 00000000 flags 00000095
>>> [   61.424973] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>>> [   61.431582] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>>> [   61.438204] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.443901] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>>> [   61.450404] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.456232] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>> [   61.461927] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>> [   61.468834] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>> [   61.475739] mmc1:     1024 bytes transferred: 0
>>> [   61.480210] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>> [   61.487054] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>> [   61.492462] mmc1: starting CMD18 arg 00002000 flags 000000b5
>>> [   61.498251] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>> [   61.505160] mmc1:     CMD12 arg 00000000 flags 00000095
>>> [   61.510260] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>> [   61.516969] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>> [   61.523592] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.529288] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>>> [   61.535789] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.541583] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>> [   61.547271] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>> [   61.554177] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>> [   61.561082] mmc1:     4096 bytes transferred: 0
>>> [   61.565654] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>> [   61.572440] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>> [   61.578017] mmc1: starting CMD18 arg 00002008 flags 000000b5
>>> [   61.583653] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>> [   61.590465] mmc1:     CMD12 arg 00000000 flags 00000095
>>> [   61.595686] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>> [   61.602396] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>> [   61.608915] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.614712] mmci-pl18x 58005000.mmc: op 12 arg 00002008 flags 000000b5
>>> [   61.621215] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.627034] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>> [   61.632727] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>> [   61.639633] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>> [   61.646437] mmc1:     4096 bytes transferred: 0
>>> [   61.651010] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>> [   61.658908] <mmc1: starting CMD23 arg 00000108 flags 00000015>
>>> [   61.663421] mmc1: starting CMD18 arg 00003be8 flags 000000b5
>>> [   61.669102] mmc1:     blksz 512 blocks 264 flags 00000200 tsac 100 ms nsac 0
>>> [   61.676155] mmc1:     CMD12 arg 00000000 flags 00000095
>>> [   61.681355] mmci-pl18x 58005000.mmc: blksz 0200 blks 0108 flags 00000200
>>> [   61.688064] mmci-pl18x 58005000.mmc: op 17 arg 00000108 flags 00000015
>>> [   61.694589] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.700284] mmci-pl18x 58005000.mmc: op 12 arg 00003be8 flags 000000b5
>>> [   61.706887] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.712581] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.718372] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>>> [   61.724060] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>> [   61.730965] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>> [   61.737870] mmc1:     135168 bytes transferred: 0
>>> [   61.742544] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>> [   61.749064] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.755336] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>> [   61.760649] mmc1: starting CMD18 arg 00442000 flags 000000b5
>>> [   61.766327] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>> [   61.773144] mmc1:     CMD12 arg 00000000 flags 00000095
>>> [   61.778489] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>> [   61.785100] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>> [   61.791616] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.797414] mmci-pl18x 58005000.mmc: op 12 arg 00442000 flags 000000b5
>>> [   61.803916] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.809710] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>> [   61.815399] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>> [   61.822306] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>> [   61.829211] mmc1:     4096 bytes transferred: 0
>>> [   61.833683] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>> [   61.833952] mmc0: starting CMD7 arg fffd0000 flags 00000015
>>> [   61.840205] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.845820] mmci-pl18x 48004000.mmc: op 07 arg fffd0000 flags 00000015
>>> [   61.853036] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>> [   61.858074] mmci-pl18x 48004000.mmc: irq0 (data+cmd) 00000040
>>> [   61.858098] mmc0: req done (CMD7): 0: 00001ed7 00000000 00000000 00000000
>>> [   61.876506] mmc1: starting CMD25 arg 00002000 flags 000000b5
>>> [   61.882106] mmc1:     blksz 512 blocks 8 flags 00000100 tsac 3000 ms nsac 0
>>> [   61.889107] mmc1:     CMD12 arg 00000000 flags 0000049d
>>> [   61.894346] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000100
>>> [   61.900957] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>> [   61.907588] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.913283] mmci-pl18x 58005000.mmc: op 19 arg 00002000 flags 000000b5
>>> [   61.919786] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.925612] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>> [   61.931309] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>> [   61.938219] mmc1: req done (CMD25): 0: 00000900 00000000 00000000 00000000
>>> [   61.945125] mmc1:     4096 bytes transferred: 0
>>> [   61.949598] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>> [   61.956107] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.961888] mmc1: starting CMD13 arg 50480000 flags 00000195
>>> [   61.967581] mmci-pl18x 58005000.mmc: op 0d arg 50480000 flags 00000195
>>> [   61.974026] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   61.979718] mmc1: req done (CMD13): 0: 00000900 00000000 00000000 00000000
>>> [   61.986630] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   61.992588] mmc1: starting CMD49 arg 10020a00 flags 00000035
>>> [   61.998152] mmc1:     blksz 512 blocks 1 flags 00000100 tsac 3000 ms nsac 0
>>> [   62.005158] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000100
>>> [   62.011778] mmci-pl18x 58005000.mmc: op 31 arg 10020a00 flags 00000035
>>> [   62.018317] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   62.024011] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>>> [   62.029801] mmc1: req done (CMD49): 0: 00000900 00000000 00000000 00000000
>>> [   62.036708] mmc1:     512 bytes transferred: 0
>>> [   62.041083] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>> [   62.046988] mmc1: starting CMD48 arg 10020a00 flags 000000b5
>>> [   62.052487] mmc1:     blksz 512 blocks 1 flags 00000200 tsac 100 ms nsac 0
>>> [   62.059443] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000200
>>> [   62.066154] mmci-pl18x 58005000.mmc: op 30 arg 10020a00 flags 000000b5
>>> [   62.072688] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>> [   62.078382] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>>> [   62.084173] mmc1: req done (CMD48): 0: 00000900 00000000 00000000 00000000
>>> [   62.091081] mmc1:     512 bytes transferred: 0
>>> [   62.095455] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>
>> CMD48 was successful but it still became an error, which must have
>> been because it indicated the flush had not completed.  We do not
>> allow for mmc_poll_for_busy() being too quick i.e. card has not
>> had a chance to indicate busy yet.
>>
>> Please try this:
>>
>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>> index 72b664ed90cf..9c3123867a99 100644
>> --- a/drivers/mmc/core/sd.c
>> +++ b/drivers/mmc/core/sd.c
>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>   {
>>       struct mmc_card *card = host->card;
>>       u8 *reg_buf, fno, page;
>> +    unsigned long timeout;
>> +    bool expired;
>>       u16 offset;
>>       int err;
>>   @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>           goto out;
>>       }
>>   +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>> +again:
>>       err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>                   MMC_BUSY_EXTR_SINGLE);
>>       if (err)
>>           goto out;
>>   +    expired = time_after(jiffies, timeout);
>> +
>>       /*
>>        * Read the Flush Cache bit. The card shall reset it, to confirm that
>>        * it's has completed the flushing of the cache.
>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>           goto out;
>>       }
>>   -    if (reg_buf[0] & BIT(0))
>> -        err = -ETIMEDOUT;
>> +    if (reg_buf[0] & BIT(0)) {
> 
> I am getting here, multiple times, with expired=0 .

So either the host controller's busy detection does not work, or the
card is not indicating busy by pulling down DAT0.

Can you try to figure out which it is?

> After that, once, I get here with expired=1 and then the ext4 mount failure occurs.
> 
>> +        if (expired)
>> +            err = -ETIMEDOUT;
>> +        else
>> +            goto again;
>> +    }
>>   out:
>>       kfree(reg_buf);
>>       return err;
> 
> Also, I made another observation -- if I reduce the SD bus clock frequency to 10 MHz (current is 50 MHz), then I don't observe this problem.

The lower frequency might just be making the commands take a bit longer
so the flush just completes before the timeout.

It can also be that the card has less time to work on flushing if it is
being hammered by commands.

I imagine you can hack some debugging prints as well as I can,
but here is an example (untested) to try to find out how long
the flush takes and decrease the rate at which commands are sent.

diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
index 72b664ed90cf..7bf5386d6e72 100644
--- a/drivers/mmc/core/sd.c
+++ b/drivers/mmc/core/sd.c
@@ -15,6 +15,7 @@
 #include <linux/random.h>
 #include <linux/scatterlist.h>
 #include <linux/sysfs.h>
+#include <linux/ktime.h>
 
 #include <linux/mmc/host.h>
 #include <linux/mmc/card.h>
@@ -70,7 +71,7 @@ static const unsigned int sd_au_size[] = {
 	})
 
 #define SD_POWEROFF_NOTIFY_TIMEOUT_MS 1000
-#define SD_WRITE_EXTR_SINGLE_TIMEOUT_MS 1000
+#define SD_WRITE_EXTR_SINGLE_TIMEOUT_MS 10000
 
 struct sd_busy_data {
 	struct mmc_card *card;
@@ -1313,8 +1314,12 @@ static int sd_flush_cache(struct mmc_host *host)
 {
 	struct mmc_card *card = host->card;
 	u8 *reg_buf, fno, page;
+	unsigned long timeout;
+	ktime_t start_time;
+	bool expired;
 	u16 offset;
 	int err;
+	int cnt;
 
 	if (!sd_cache_enabled(host))
 		return 0;
@@ -1332,32 +1337,43 @@ static int sd_flush_cache(struct mmc_host *host)
 	offset = card->ext_perf.offset + 261;
 
 	err = sd_write_ext_reg(card, fno, page, offset, BIT(0));
-	if (err) {
-		pr_warn("%s: error %d writing Cache Flush bit\n",
-			mmc_hostname(host), err);
+	if (err)
 		goto out;
-	}
 
+	timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
+	cnt = 0;
+	start_time = ktime_get();
+again:
+	cnt += 1;
 	err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
 				MMC_BUSY_EXTR_SINGLE);
 	if (err)
 		goto out;
 
+	expired = time_after(jiffies, timeout);
+
 	/*
 	 * Read the Flush Cache bit. The card shall reset it, to confirm that
 	 * it's has completed the flushing of the cache.
 	 */
 	err = sd_read_ext_reg(card, fno, page, offset, 1, reg_buf);
-	if (err) {
-		pr_warn("%s: error %d reading Cache Flush bit\n",
-			mmc_hostname(host), err);
+	if (err)
 		goto out;
-	}
 
-	if (reg_buf[0] & BIT(0))
+	if (reg_buf[0] & BIT(0)) {
+		if (!expired) {
+			msleep(10);
+			goto again;
+		}
 		err = -ETIMEDOUT;
+	}
 out:
+	pr_info("%s: flushing cache took %ld ms, %d iterations, error %d\n",
+		mmc_hostname(host), (long)ktime_ms_delta(ktime_get(), start_time), cnt, err);
 	kfree(reg_buf);
+	if (err)
+		pr_err("%s: error %d flushing cache\n",
+		       mmc_hostname(host), err);
 	return err;
 }
Marek Vasut June 2, 2023, 9:46 p.m. UTC | #7
On 6/1/23 08:20, Adrian Hunter wrote:
> On 1/06/23 00:31, Marek Vasut wrote:
>> On 5/31/23 15:13, Adrian Hunter wrote:
>>> On 31/05/23 14:34, Marek Vasut wrote:
>>>> On 5/31/23 07:46, Adrian Hunter wrote:
>>>>> On 31/05/23 03:27, Marek Vasut wrote:
>>>>>> This reverts commit 08ebf903af57cda6d773f3dd1671b64f73b432b8.
>>>>>>
>>>>>> On STM32MP153C DHCOR DRC Compact with microSD card
>>>>>> Kingston Canvas Go! Plus MicroSDXC I 64 GiB U3/V30/A2 SDCG3/64GB
>>>>>> it is no longer possible to mount ext4 filesystem.
>>>>>>
>>>>>> Git bisect points to this commit which is being reverted here. A bit of
>>>>>> investigation shows that for non-working microSD cards
>>>>>>      cache_enabled=TRUE fua_enabled=false
>>>>>> For working microSD card (Kingston Canvas React Plus, MicroSDXC II
>>>>>> 64 GiB U3/V90/A1 SDCR2/64GB)
>>>>>>      cache_enabled=FALSE fua_enabled=false
>>>>>>
>>>>>> It seems enabling the cache for SD cards causes the issue. Before the
>>>>>> reverted commit, the cache was not reported enabled for SD cards. I am
>>>>>> not sure whether such blunt revert is the correct solution however, any
>>>>>> ideas ?
>>>>>
>>>>> This does not disable the cache, it disables the block layer from issuing
>>>>> cache flushes (by telling it there is no cache being used).
>>>>>
>>>>> It would be good to know what commands are failing, which should be visible
>>>>> in the kernel messages if dynamic debug is enabled:
>>>>>
>>>>> Dynamic debug for mmc
>>>>> ---------------------
>>>>>
>>>>>        Kernel must be configured:
>>>>>
>>>>>            CONFIG_DYNAMIC_DEBUG=y
>>>>>
>>>>>        To enable mmc debug via sysfs:
>>>>>
>>>>>            echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control
>>>>>            echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control
>>>>>
>>>>>        To enable mmc debug via kernel command line:
>>>>>
>>>>>            dyndbg="file drivers/mmc/core/* +p;file drivers/mmc/host/* +p"
>>>>>
>>>>>        To disable mmc debug:
>>>>>
>>>>>            echo 'file drivers/mmc/core/* -p' > /sys/kernel/debug/dynamic_debug/control
>>>>>            echo 'file drivers/mmc/host/* -p' > /sys/kernel/debug/dynamic_debug/control
>>>>>
>>>>>        More general information in kernel documentation in kernel tree:
>>>>>
>>>>>            Documentation/admin-guide/dynamic-debug-howto.rst
>>>>
>>>> Log is below, I added newlines around the mount failure.
>>>> Do you see anything problematic there ?
>>>>
>>>> $ echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control ; echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control ; echo 8 > /proc/sys/kernel/printk ; mount /dev/mmcblk1p4 /mnt
>>>> [   61.061029] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>>>> [   61.065637] mmc1: starting CMD18 arg 00002002 flags 000000b5
>>>> [   61.071144] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>>>> [   61.078096] mmc1:     CMD12 arg 00000000 flags 00000095
>>>> [   61.083302] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>>>> [   61.090012] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>>>> [   61.096531] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.102332] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>>>> [   61.108836] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.114575] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>> [   61.120371] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>> [   61.127178] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>> [   61.134084] mmc1:     1024 bytes transferred: 0
>>>> [   61.138658] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>> [   61.145533] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>> [   61.150946] mmc1: starting CMD18 arg 00002000 flags 000000b5
>>>> [   61.156609] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>>> [   61.163426] mmc1:     CMD12 arg 00000000 flags 00000095
>>>> [   61.168759] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>>> [   61.175368] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>> [   61.181985] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.187682] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>>>> [   61.194185] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.199979] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>> [   61.205769] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>> [   61.212574] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>> [   61.219478] mmc1:     4096 bytes transferred: 0
>>>> [   61.224050] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>> [   61.231166] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>>>> [   61.236378] mmc1: starting CMD18 arg 00002002 flags 000000b5
>>>> [   61.241983] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>>>> [   61.248940] mmc1:     CMD12 arg 00000000 flags 00000095
>>>> [   61.254169] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>>>> [   61.260779] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>>>> [   61.267405] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.273101] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>>>> [   61.279603] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.285431] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>> [   61.291125] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>> [   61.298032] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>> [   61.304937] mmc1:     1024 bytes transferred: 0
>>>> [   61.309409] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>> [   61.316256] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>> [   61.321768] mmc1: starting CMD18 arg 00002000 flags 000000b5
>>>> [   61.327459] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>>> [   61.334360] mmc1:     CMD12 arg 00000000 flags 00000095
>>>> [   61.339459] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>>> [   61.346167] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>> [   61.352782] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.358478] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>>>> [   61.365082] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.370776] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>> [   61.376565] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>> [   61.383371] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>> [   61.390275] mmc1:     4096 bytes transferred: 0
>>>> [   61.394848] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>> [   61.401885] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>>>> [   61.407193] mmc1: starting CMD18 arg 00002002 flags 000000b5
>>>> [   61.412794] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>>>> [   61.419748] mmc1:     CMD12 arg 00000000 flags 00000095
>>>> [   61.424973] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>>>> [   61.431582] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>>>> [   61.438204] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.443901] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>>>> [   61.450404] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.456232] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>> [   61.461927] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>> [   61.468834] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>> [   61.475739] mmc1:     1024 bytes transferred: 0
>>>> [   61.480210] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>> [   61.487054] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>> [   61.492462] mmc1: starting CMD18 arg 00002000 flags 000000b5
>>>> [   61.498251] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>>> [   61.505160] mmc1:     CMD12 arg 00000000 flags 00000095
>>>> [   61.510260] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>>> [   61.516969] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>> [   61.523592] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.529288] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>>>> [   61.535789] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.541583] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>> [   61.547271] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>> [   61.554177] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>> [   61.561082] mmc1:     4096 bytes transferred: 0
>>>> [   61.565654] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>> [   61.572440] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>> [   61.578017] mmc1: starting CMD18 arg 00002008 flags 000000b5
>>>> [   61.583653] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>>> [   61.590465] mmc1:     CMD12 arg 00000000 flags 00000095
>>>> [   61.595686] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>>> [   61.602396] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>> [   61.608915] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.614712] mmci-pl18x 58005000.mmc: op 12 arg 00002008 flags 000000b5
>>>> [   61.621215] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.627034] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>> [   61.632727] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>> [   61.639633] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>> [   61.646437] mmc1:     4096 bytes transferred: 0
>>>> [   61.651010] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>> [   61.658908] <mmc1: starting CMD23 arg 00000108 flags 00000015>
>>>> [   61.663421] mmc1: starting CMD18 arg 00003be8 flags 000000b5
>>>> [   61.669102] mmc1:     blksz 512 blocks 264 flags 00000200 tsac 100 ms nsac 0
>>>> [   61.676155] mmc1:     CMD12 arg 00000000 flags 00000095
>>>> [   61.681355] mmci-pl18x 58005000.mmc: blksz 0200 blks 0108 flags 00000200
>>>> [   61.688064] mmci-pl18x 58005000.mmc: op 17 arg 00000108 flags 00000015
>>>> [   61.694589] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.700284] mmci-pl18x 58005000.mmc: op 12 arg 00003be8 flags 000000b5
>>>> [   61.706887] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.712581] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.718372] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>>>> [   61.724060] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>> [   61.730965] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>> [   61.737870] mmc1:     135168 bytes transferred: 0
>>>> [   61.742544] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>> [   61.749064] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.755336] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>> [   61.760649] mmc1: starting CMD18 arg 00442000 flags 000000b5
>>>> [   61.766327] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>>> [   61.773144] mmc1:     CMD12 arg 00000000 flags 00000095
>>>> [   61.778489] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>>> [   61.785100] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>> [   61.791616] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.797414] mmci-pl18x 58005000.mmc: op 12 arg 00442000 flags 000000b5
>>>> [   61.803916] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.809710] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>> [   61.815399] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>> [   61.822306] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>> [   61.829211] mmc1:     4096 bytes transferred: 0
>>>> [   61.833683] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>> [   61.833952] mmc0: starting CMD7 arg fffd0000 flags 00000015
>>>> [   61.840205] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.845820] mmci-pl18x 48004000.mmc: op 07 arg fffd0000 flags 00000015
>>>> [   61.853036] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>> [   61.858074] mmci-pl18x 48004000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.858098] mmc0: req done (CMD7): 0: 00001ed7 00000000 00000000 00000000
>>>> [   61.876506] mmc1: starting CMD25 arg 00002000 flags 000000b5
>>>> [   61.882106] mmc1:     blksz 512 blocks 8 flags 00000100 tsac 3000 ms nsac 0
>>>> [   61.889107] mmc1:     CMD12 arg 00000000 flags 0000049d
>>>> [   61.894346] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000100
>>>> [   61.900957] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>> [   61.907588] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.913283] mmci-pl18x 58005000.mmc: op 19 arg 00002000 flags 000000b5
>>>> [   61.919786] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.925612] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>> [   61.931309] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>> [   61.938219] mmc1: req done (CMD25): 0: 00000900 00000000 00000000 00000000
>>>> [   61.945125] mmc1:     4096 bytes transferred: 0
>>>> [   61.949598] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>> [   61.956107] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.961888] mmc1: starting CMD13 arg 50480000 flags 00000195
>>>> [   61.967581] mmci-pl18x 58005000.mmc: op 0d arg 50480000 flags 00000195
>>>> [   61.974026] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   61.979718] mmc1: req done (CMD13): 0: 00000900 00000000 00000000 00000000
>>>> [   61.986630] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   61.992588] mmc1: starting CMD49 arg 10020a00 flags 00000035
>>>> [   61.998152] mmc1:     blksz 512 blocks 1 flags 00000100 tsac 3000 ms nsac 0
>>>> [   62.005158] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000100
>>>> [   62.011778] mmci-pl18x 58005000.mmc: op 31 arg 10020a00 flags 00000035
>>>> [   62.018317] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   62.024011] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>>>> [   62.029801] mmc1: req done (CMD49): 0: 00000900 00000000 00000000 00000000
>>>> [   62.036708] mmc1:     512 bytes transferred: 0
>>>> [   62.041083] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>> [   62.046988] mmc1: starting CMD48 arg 10020a00 flags 000000b5
>>>> [   62.052487] mmc1:     blksz 512 blocks 1 flags 00000200 tsac 100 ms nsac 0
>>>> [   62.059443] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000200
>>>> [   62.066154] mmci-pl18x 58005000.mmc: op 30 arg 10020a00 flags 000000b5
>>>> [   62.072688] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>> [   62.078382] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>>>> [   62.084173] mmc1: req done (CMD48): 0: 00000900 00000000 00000000 00000000
>>>> [   62.091081] mmc1:     512 bytes transferred: 0
>>>> [   62.095455] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>
>>> CMD48 was successful but it still became an error, which must have
>>> been because it indicated the flush had not completed.  We do not
>>> allow for mmc_poll_for_busy() being too quick i.e. card has not
>>> had a chance to indicate busy yet.
>>>
>>> Please try this:
>>>
>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>> index 72b664ed90cf..9c3123867a99 100644
>>> --- a/drivers/mmc/core/sd.c
>>> +++ b/drivers/mmc/core/sd.c
>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>>    {
>>>        struct mmc_card *card = host->card;
>>>        u8 *reg_buf, fno, page;
>>> +    unsigned long timeout;
>>> +    bool expired;
>>>        u16 offset;
>>>        int err;
>>>    @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>>            goto out;
>>>        }
>>>    +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>>> +again:
>>>        err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>>                    MMC_BUSY_EXTR_SINGLE);
>>>        if (err)
>>>            goto out;
>>>    +    expired = time_after(jiffies, timeout);
>>> +
>>>        /*
>>>         * Read the Flush Cache bit. The card shall reset it, to confirm that
>>>         * it's has completed the flushing of the cache.
>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>>            goto out;
>>>        }
>>>    -    if (reg_buf[0] & BIT(0))
>>> -        err = -ETIMEDOUT;
>>> +    if (reg_buf[0] & BIT(0)) {
>>
>> I am getting here, multiple times, with expired=0 .
> 
> So either the host controller's busy detection does not work, or the
> card is not indicating busy by pulling down DAT0.
> 
> Can you try to figure out which it is?

The byte 261 bit 0 is never cleared, I had this looping for an hour and 
the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 
both indicate the bit should be cleared by the card once cache flush is 
completed.

I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q 
uSDHC, laptop rtsx_pci_sdmmc , they all fail.

I tried to find another card which also has cache, I cannot find any 
other card, all the rest report no cache. The kingston card SSR (see the 
2ff in 6th field, the last f bit 2 is cache supported indication, SSR 
bit 330):

00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:

So either this card is weird, or the cards with cache are so rare that 
nobody noticed the problem yet.

>> After that, once, I get here with expired=1 and then the ext4 mount failure occurs.
>>
>>> +        if (expired)
>>> +            err = -ETIMEDOUT;
>>> +        else
>>> +            goto again;
>>> +    }
>>>    out:
>>>        kfree(reg_buf);
>>>        return err;
>>
>> Also, I made another observation -- if I reduce the SD bus clock frequency to 10 MHz (current is 50 MHz), then I don't observe this problem.
> 
> The lower frequency might just be making the commands take a bit longer
> so the flush just completes before the timeout.
> 
> It can also be that the card has less time to work on flushing if it is
> being hammered by commands.
> 
> I imagine you can hack some debugging prints as well as I can,
> but here is an example (untested) to try to find out how long
> the flush takes and decrease the rate at which commands are sent.

Discard this information about the 10 MHz, this must've been a fluke. I 
can reproduce the error even at 10 MHz now.

[...]
Adrian Hunter June 4, 2023, 4:30 p.m. UTC | #8
On 3/06/23 00:46, Marek Vasut wrote:
> On 6/1/23 08:20, Adrian Hunter wrote:
>> On 1/06/23 00:31, Marek Vasut wrote:
>>> On 5/31/23 15:13, Adrian Hunter wrote:
>>>> On 31/05/23 14:34, Marek Vasut wrote:
>>>>> On 5/31/23 07:46, Adrian Hunter wrote:
>>>>>> On 31/05/23 03:27, Marek Vasut wrote:
>>>>>>> This reverts commit 08ebf903af57cda6d773f3dd1671b64f73b432b8.
>>>>>>>
>>>>>>> On STM32MP153C DHCOR DRC Compact with microSD card
>>>>>>> Kingston Canvas Go! Plus MicroSDXC I 64 GiB U3/V30/A2 SDCG3/64GB
>>>>>>> it is no longer possible to mount ext4 filesystem.
>>>>>>>
>>>>>>> Git bisect points to this commit which is being reverted here. A bit of
>>>>>>> investigation shows that for non-working microSD cards
>>>>>>>      cache_enabled=TRUE fua_enabled=false
>>>>>>> For working microSD card (Kingston Canvas React Plus, MicroSDXC II
>>>>>>> 64 GiB U3/V90/A1 SDCR2/64GB)
>>>>>>>      cache_enabled=FALSE fua_enabled=false
>>>>>>>
>>>>>>> It seems enabling the cache for SD cards causes the issue. Before the
>>>>>>> reverted commit, the cache was not reported enabled for SD cards. I am
>>>>>>> not sure whether such blunt revert is the correct solution however, any
>>>>>>> ideas ?
>>>>>>
>>>>>> This does not disable the cache, it disables the block layer from issuing
>>>>>> cache flushes (by telling it there is no cache being used).
>>>>>>
>>>>>> It would be good to know what commands are failing, which should be visible
>>>>>> in the kernel messages if dynamic debug is enabled:
>>>>>>
>>>>>> Dynamic debug for mmc
>>>>>> ---------------------
>>>>>>
>>>>>>        Kernel must be configured:
>>>>>>
>>>>>>            CONFIG_DYNAMIC_DEBUG=y
>>>>>>
>>>>>>        To enable mmc debug via sysfs:
>>>>>>
>>>>>>            echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control
>>>>>>            echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control
>>>>>>
>>>>>>        To enable mmc debug via kernel command line:
>>>>>>
>>>>>>            dyndbg="file drivers/mmc/core/* +p;file drivers/mmc/host/* +p"
>>>>>>
>>>>>>        To disable mmc debug:
>>>>>>
>>>>>>            echo 'file drivers/mmc/core/* -p' > /sys/kernel/debug/dynamic_debug/control
>>>>>>            echo 'file drivers/mmc/host/* -p' > /sys/kernel/debug/dynamic_debug/control
>>>>>>
>>>>>>        More general information in kernel documentation in kernel tree:
>>>>>>
>>>>>>            Documentation/admin-guide/dynamic-debug-howto.rst
>>>>>
>>>>> Log is below, I added newlines around the mount failure.
>>>>> Do you see anything problematic there ?
>>>>>
>>>>> $ echo 'file drivers/mmc/core/* +p' > /sys/kernel/debug/dynamic_debug/control ; echo 'file drivers/mmc/host/* +p' > /sys/kernel/debug/dynamic_debug/control ; echo 8 > /proc/sys/kernel/printk ; mount /dev/mmcblk1p4 /mnt
>>>>> [   61.061029] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>>>>> [   61.065637] mmc1: starting CMD18 arg 00002002 flags 000000b5
>>>>> [   61.071144] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>>>>> [   61.078096] mmc1:     CMD12 arg 00000000 flags 00000095
>>>>> [   61.083302] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>>>>> [   61.090012] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>>>>> [   61.096531] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.102332] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>>>>> [   61.108836] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.114575] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>>> [   61.120371] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>>> [   61.127178] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.134084] mmc1:     1024 bytes transferred: 0
>>>>> [   61.138658] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>>> [   61.145533] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>>> [   61.150946] mmc1: starting CMD18 arg 00002000 flags 000000b5
>>>>> [   61.156609] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>>>> [   61.163426] mmc1:     CMD12 arg 00000000 flags 00000095
>>>>> [   61.168759] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>>>> [   61.175368] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>>> [   61.181985] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.187682] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>>>>> [   61.194185] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.199979] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>>> [   61.205769] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>>> [   61.212574] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.219478] mmc1:     4096 bytes transferred: 0
>>>>> [   61.224050] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>>> [   61.231166] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>>>>> [   61.236378] mmc1: starting CMD18 arg 00002002 flags 000000b5
>>>>> [   61.241983] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>>>>> [   61.248940] mmc1:     CMD12 arg 00000000 flags 00000095
>>>>> [   61.254169] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>>>>> [   61.260779] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>>>>> [   61.267405] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.273101] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>>>>> [   61.279603] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.285431] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>>> [   61.291125] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>>> [   61.298032] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.304937] mmc1:     1024 bytes transferred: 0
>>>>> [   61.309409] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>>> [   61.316256] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>>> [   61.321768] mmc1: starting CMD18 arg 00002000 flags 000000b5
>>>>> [   61.327459] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>>>> [   61.334360] mmc1:     CMD12 arg 00000000 flags 00000095
>>>>> [   61.339459] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>>>> [   61.346167] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>>> [   61.352782] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.358478] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>>>>> [   61.365082] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.370776] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>>> [   61.376565] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>>> [   61.383371] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.390275] mmc1:     4096 bytes transferred: 0
>>>>> [   61.394848] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>>> [   61.401885] <mmc1: starting CMD23 arg 00000002 flags 00000015>
>>>>> [   61.407193] mmc1: starting CMD18 arg 00002002 flags 000000b5
>>>>> [   61.412794] mmc1:     blksz 512 blocks 2 flags 00000200 tsac 100 ms nsac 0
>>>>> [   61.419748] mmc1:     CMD12 arg 00000000 flags 00000095
>>>>> [   61.424973] mmci-pl18x 58005000.mmc: blksz 0200 blks 0002 flags 00000200
>>>>> [   61.431582] mmci-pl18x 58005000.mmc: op 17 arg 00000002 flags 00000015
>>>>> [   61.438204] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.443901] mmci-pl18x 58005000.mmc: op 12 arg 00002002 flags 000000b5
>>>>> [   61.450404] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.456232] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>>> [   61.461927] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>>> [   61.468834] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.475739] mmc1:     1024 bytes transferred: 0
>>>>> [   61.480210] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>>> [   61.487054] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>>> [   61.492462] mmc1: starting CMD18 arg 00002000 flags 000000b5
>>>>> [   61.498251] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>>>> [   61.505160] mmc1:     CMD12 arg 00000000 flags 00000095
>>>>> [   61.510260] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>>>> [   61.516969] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>>> [   61.523592] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.529288] mmci-pl18x 58005000.mmc: op 12 arg 00002000 flags 000000b5
>>>>> [   61.535789] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.541583] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>>> [   61.547271] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>>> [   61.554177] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.561082] mmc1:     4096 bytes transferred: 0
>>>>> [   61.565654] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>>> [   61.572440] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>>> [   61.578017] mmc1: starting CMD18 arg 00002008 flags 000000b5
>>>>> [   61.583653] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>>>> [   61.590465] mmc1:     CMD12 arg 00000000 flags 00000095
>>>>> [   61.595686] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>>>> [   61.602396] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>>> [   61.608915] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.614712] mmci-pl18x 58005000.mmc: op 12 arg 00002008 flags 000000b5
>>>>> [   61.621215] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.627034] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>>> [   61.632727] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>>> [   61.639633] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.646437] mmc1:     4096 bytes transferred: 0
>>>>> [   61.651010] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>>> [   61.658908] <mmc1: starting CMD23 arg 00000108 flags 00000015>
>>>>> [   61.663421] mmc1: starting CMD18 arg 00003be8 flags 000000b5
>>>>> [   61.669102] mmc1:     blksz 512 blocks 264 flags 00000200 tsac 100 ms nsac 0
>>>>> [   61.676155] mmc1:     CMD12 arg 00000000 flags 00000095
>>>>> [   61.681355] mmci-pl18x 58005000.mmc: blksz 0200 blks 0108 flags 00000200
>>>>> [   61.688064] mmci-pl18x 58005000.mmc: op 17 arg 00000108 flags 00000015
>>>>> [   61.694589] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.700284] mmci-pl18x 58005000.mmc: op 12 arg 00003be8 flags 000000b5
>>>>> [   61.706887] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.712581] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.718372] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>>>>> [   61.724060] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>>> [   61.730965] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.737870] mmc1:     135168 bytes transferred: 0
>>>>> [   61.742544] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>>> [   61.749064] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.755336] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>>> [   61.760649] mmc1: starting CMD18 arg 00442000 flags 000000b5
>>>>> [   61.766327] mmc1:     blksz 512 blocks 8 flags 00000200 tsac 100 ms nsac 0
>>>>> [   61.773144] mmc1:     CMD12 arg 00000000 flags 00000095
>>>>> [   61.778489] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000200
>>>>> [   61.785100] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>>> [   61.791616] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.797414] mmci-pl18x 58005000.mmc: op 12 arg 00442000 flags 000000b5
>>>>> [   61.803916] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.809710] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>>> [   61.815399] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>>> [   61.822306] mmc1: req done (CMD18): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.829211] mmc1:     4096 bytes transferred: 0
>>>>> [   61.833683] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>>> [   61.833952] mmc0: starting CMD7 arg fffd0000 flags 00000015
>>>>> [   61.840205] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.845820] mmci-pl18x 48004000.mmc: op 07 arg fffd0000 flags 00000015
>>>>> [   61.853036] <mmc1: starting CMD23 arg 00000008 flags 00000015>
>>>>> [   61.858074] mmci-pl18x 48004000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.858098] mmc0: req done (CMD7): 0: 00001ed7 00000000 00000000 00000000
>>>>> [   61.876506] mmc1: starting CMD25 arg 00002000 flags 000000b5
>>>>> [   61.882106] mmc1:     blksz 512 blocks 8 flags 00000100 tsac 3000 ms nsac 0
>>>>> [   61.889107] mmc1:     CMD12 arg 00000000 flags 0000049d
>>>>> [   61.894346] mmci-pl18x 58005000.mmc: blksz 0200 blks 0008 flags 00000100
>>>>> [   61.900957] mmci-pl18x 58005000.mmc: op 17 arg 00000008 flags 00000015
>>>>> [   61.907588] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.913283] mmci-pl18x 58005000.mmc: op 19 arg 00002000 flags 000000b5
>>>>> [   61.919786] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.925612] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000140
>>>>> [   61.931309] mmc1: req done <CMD23>: 0: 00000900 00000000 00000000 00000000
>>>>> [   61.938219] mmc1: req done (CMD25): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.945125] mmc1:     4096 bytes transferred: 0
>>>>> [   61.949598] mmc1:     (CMD12): 0: 00000000 00000000 00000000 00000000
>>>>> [   61.956107] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.961888] mmc1: starting CMD13 arg 50480000 flags 00000195
>>>>> [   61.967581] mmci-pl18x 58005000.mmc: op 0d arg 50480000 flags 00000195
>>>>> [   61.974026] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   61.979718] mmc1: req done (CMD13): 0: 00000900 00000000 00000000 00000000
>>>>> [   61.986630] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   61.992588] mmc1: starting CMD49 arg 10020a00 flags 00000035
>>>>> [   61.998152] mmc1:     blksz 512 blocks 1 flags 00000100 tsac 3000 ms nsac 0
>>>>> [   62.005158] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000100
>>>>> [   62.011778] mmci-pl18x 58005000.mmc: op 31 arg 10020a00 flags 00000035
>>>>> [   62.018317] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   62.024011] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>>>>> [   62.029801] mmc1: req done (CMD49): 0: 00000900 00000000 00000000 00000000
>>>>> [   62.036708] mmc1:     512 bytes transferred: 0
>>>>> [   62.041083] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>> [   62.046988] mmc1: starting CMD48 arg 10020a00 flags 000000b5
>>>>> [   62.052487] mmc1:     blksz 512 blocks 1 flags 00000200 tsac 100 ms nsac 0
>>>>> [   62.059443] mmci-pl18x 58005000.mmc: blksz 0200 blks 0001 flags 00000200
>>>>> [   62.066154] mmci-pl18x 58005000.mmc: op 30 arg 10020a00 flags 000000b5
>>>>> [   62.072688] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000040
>>>>> [   62.078382] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000100
>>>>> [   62.084173] mmc1: req done (CMD48): 0: 00000900 00000000 00000000 00000000
>>>>> [   62.091081] mmc1:     512 bytes transferred: 0
>>>>> [   62.095455] mmci-pl18x 58005000.mmc: irq0 (data+cmd) 00000000
>>>>
>>>> CMD48 was successful but it still became an error, which must have
>>>> been because it indicated the flush had not completed.  We do not
>>>> allow for mmc_poll_for_busy() being too quick i.e. card has not
>>>> had a chance to indicate busy yet.
>>>>
>>>> Please try this:
>>>>
>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>>> index 72b664ed90cf..9c3123867a99 100644
>>>> --- a/drivers/mmc/core/sd.c
>>>> +++ b/drivers/mmc/core/sd.c
>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>    {
>>>>        struct mmc_card *card = host->card;
>>>>        u8 *reg_buf, fno, page;
>>>> +    unsigned long timeout;
>>>> +    bool expired;
>>>>        u16 offset;
>>>>        int err;
>>>>    @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>            goto out;
>>>>        }
>>>>    +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>>>> +again:
>>>>        err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>>>                    MMC_BUSY_EXTR_SINGLE);
>>>>        if (err)
>>>>            goto out;
>>>>    +    expired = time_after(jiffies, timeout);
>>>> +
>>>>        /*
>>>>         * Read the Flush Cache bit. The card shall reset it, to confirm that
>>>>         * it's has completed the flushing of the cache.
>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>            goto out;
>>>>        }
>>>>    -    if (reg_buf[0] & BIT(0))
>>>> -        err = -ETIMEDOUT;
>>>> +    if (reg_buf[0] & BIT(0)) {
>>>
>>> I am getting here, multiple times, with expired=0 .
>>
>> So either the host controller's busy detection does not work, or the
>> card is not indicating busy by pulling down DAT0.
>>
>> Can you try to figure out which it is?
> 
> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
> 
> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
> 
> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
> 
> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> 
> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.

The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card

	https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/

I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.

Currently I am thinking we could do a cache flush after enabling the cache, just
to see if it works.  If not, then disable the cache.

It would also be interesting to read back the extended register to see if the
enable bit has actually been set.

> 
>>> After that, once, I get here with expired=1 and then the ext4 mount failure occurs.
>>>
>>>> +        if (expired)
>>>> +            err = -ETIMEDOUT;
>>>> +        else
>>>> +            goto again;
>>>> +    }
>>>>    out:
>>>>        kfree(reg_buf);
>>>>        return err;
>>>
>>> Also, I made another observation -- if I reduce the SD bus clock frequency to 10 MHz (current is 50 MHz), then I don't observe this problem.
>>
>> The lower frequency might just be making the commands take a bit longer
>> so the flush just completes before the timeout.
>>
>> It can also be that the card has less time to work on flushing if it is
>> being hammered by commands.
>>
>> I imagine you can hack some debugging prints as well as I can,
>> but here is an example (untested) to try to find out how long
>> the flush takes and decrease the rate at which commands are sent.
> 
> Discard this information about the 10 MHz, this must've been a fluke. I can reproduce the error even at 10 MHz now.
> 
> [...]
Marek Vasut June 7, 2023, 8:43 p.m. UTC | #9
On 6/4/23 18:30, Adrian Hunter wrote:

[...]

>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>>>> index 72b664ed90cf..9c3123867a99 100644
>>>>> --- a/drivers/mmc/core/sd.c
>>>>> +++ b/drivers/mmc/core/sd.c
>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>     {
>>>>>         struct mmc_card *card = host->card;
>>>>>         u8 *reg_buf, fno, page;
>>>>> +    unsigned long timeout;
>>>>> +    bool expired;
>>>>>         u16 offset;
>>>>>         int err;
>>>>>     @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>             goto out;
>>>>>         }
>>>>>     +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>>>>> +again:
>>>>>         err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>>>>                     MMC_BUSY_EXTR_SINGLE);
>>>>>         if (err)
>>>>>             goto out;
>>>>>     +    expired = time_after(jiffies, timeout);
>>>>> +
>>>>>         /*
>>>>>          * Read the Flush Cache bit. The card shall reset it, to confirm that
>>>>>          * it's has completed the flushing of the cache.
>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>             goto out;
>>>>>         }
>>>>>     -    if (reg_buf[0] & BIT(0))
>>>>> -        err = -ETIMEDOUT;
>>>>> +    if (reg_buf[0] & BIT(0)) {
>>>>
>>>> I am getting here, multiple times, with expired=0 .
>>>
>>> So either the host controller's busy detection does not work, or the
>>> card is not indicating busy by pulling down DAT0.
>>>
>>> Can you try to figure out which it is?
>>
>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
>>
>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
>>
>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
>>
>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>
>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
> 
> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
> 
> 	https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/

I got that one now, tested it, the cache bit is being cleared correctly. 
I also tested a few more cards and dumped their SSR too:

Kingston Canvas Go! Plus:
80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
Flush never finishes

Sandisk Extreme PRO A2 64GiB:
80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
mmc0: flushing cache took 5 ms, 1 iterations, error 0

Goodram IRDM V30 A2 64GiB:
80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
mmc0: flushing cache took 5 ms, 1 iterations, error 0

Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, 
cache not enabled):
80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:

> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.

I'm really interested in what you would find with that one.

Maybe the card I have here is defective, although I would expect not 
just the cache functionality to fail in such a case.

Is there anyone at kingston we could ask about the cache specifics on 
that card ?

> Currently I am thinking we could do a cache flush after enabling the cache, just
> to see if it works.  If not, then disable the cache.
> 
> It would also be interesting to read back the extended register to see if the
> enable bit has actually been set.

Both offset 260 and 261 read back as 0x01 during the endless flush.
Adrian Hunter June 12, 2023, 4:59 a.m. UTC | #10
On 7/06/23 23:43, Marek Vasut wrote:
> On 6/4/23 18:30, Adrian Hunter wrote:
> 
> [...]
> 
>>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>>>>> index 72b664ed90cf..9c3123867a99 100644
>>>>>> --- a/drivers/mmc/core/sd.c
>>>>>> +++ b/drivers/mmc/core/sd.c
>>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>     {
>>>>>>         struct mmc_card *card = host->card;
>>>>>>         u8 *reg_buf, fno, page;
>>>>>> +    unsigned long timeout;
>>>>>> +    bool expired;
>>>>>>         u16 offset;
>>>>>>         int err;
>>>>>>     @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>             goto out;
>>>>>>         }
>>>>>>     +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>>>>>> +again:
>>>>>>         err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>>>>>                     MMC_BUSY_EXTR_SINGLE);
>>>>>>         if (err)
>>>>>>             goto out;
>>>>>>     +    expired = time_after(jiffies, timeout);
>>>>>> +
>>>>>>         /*
>>>>>>          * Read the Flush Cache bit. The card shall reset it, to confirm that
>>>>>>          * it's has completed the flushing of the cache.
>>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>             goto out;
>>>>>>         }
>>>>>>     -    if (reg_buf[0] & BIT(0))
>>>>>> -        err = -ETIMEDOUT;
>>>>>> +    if (reg_buf[0] & BIT(0)) {
>>>>>
>>>>> I am getting here, multiple times, with expired=0 .
>>>>
>>>> So either the host controller's busy detection does not work, or the
>>>> card is not indicating busy by pulling down DAT0.
>>>>
>>>> Can you try to figure out which it is?
>>>
>>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
>>>
>>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
>>>
>>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
>>>
>>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>
>>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
>>
>> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
>>
>>     https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/
> 
> I got that one now, tested it, the cache bit is being cleared correctly. I also tested a few more cards and dumped their SSR too:
> 
> Kingston Canvas Go! Plus:
> 80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> Flush never finishes
> 
> Sandisk Extreme PRO A2 64GiB:
> 80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> mmc0: flushing cache took 5 ms, 1 iterations, error 0
> 
> Goodram IRDM V30 A2 64GiB:
> 80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> mmc0: flushing cache took 5 ms, 1 iterations, error 0
> 
> Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, cache not enabled):
> 80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> 
>> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.
> 
> I'm really interested in what you would find with that one.

It worked just fine, but maybe it is a newer version of hw / firmware - the date is 04/2023

$ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
grep: /sys/class/mmc_host/mmc0/mmc0:5048/block: Is a directory
/sys/class/mmc_host/mmc0/mmc0:5048/cid:9f54495344363447614b1004af017400
/sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cf9f7f800a400000
/sys/class/mmc_host/mmc0/mmc0:5048/date:04/2023
grep: /sys/class/mmc_host/mmc0/mmc0:5048/driver: Is a directory
/sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
/sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
/sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
/sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
/sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
/sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
/sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00200000
/sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
grep: /sys/class/mmc_host/mmc0/mmc0:5048/power: Is a directory
/sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
/sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
/sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
/sys/class/mmc_host/mmc0/mmc0:5048/serial:0x4b1004af
/sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
grep: /sys/class/mmc_host/mmc0/mmc0:5048/subsystem: Is a directory
/sys/class/mmc_host/mmc0/mmc0:5048/type:SD
/sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
/sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
/sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
/sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block

> 
> Maybe the card I have here is defective, although I would expect not just the cache functionality to fail in such a case.
> 
> Is there anyone at kingston we could ask about the cache specifics on that card ?
> 
>> Currently I am thinking we could do a cache flush after enabling the cache, just
>> to see if it works.  If not, then disable the cache.
>>
>> It would also be interesting to read back the extended register to see if the
>> enable bit has actually been set.
> 
> Both offset 260 and 261 read back as 0x01 during the endless flush.
Marek Vasut June 12, 2023, 8:59 a.m. UTC | #11
On 6/12/23 06:59, Adrian Hunter wrote:
> On 7/06/23 23:43, Marek Vasut wrote:
>> On 6/4/23 18:30, Adrian Hunter wrote:
>>
>> [...]
>>
>>>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>>>>>> index 72b664ed90cf..9c3123867a99 100644
>>>>>>> --- a/drivers/mmc/core/sd.c
>>>>>>> +++ b/drivers/mmc/core/sd.c
>>>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>      {
>>>>>>>          struct mmc_card *card = host->card;
>>>>>>>          u8 *reg_buf, fno, page;
>>>>>>> +    unsigned long timeout;
>>>>>>> +    bool expired;
>>>>>>>          u16 offset;
>>>>>>>          int err;
>>>>>>>      @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>              goto out;
>>>>>>>          }
>>>>>>>      +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>>>>>>> +again:
>>>>>>>          err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>>>>>>                      MMC_BUSY_EXTR_SINGLE);
>>>>>>>          if (err)
>>>>>>>              goto out;
>>>>>>>      +    expired = time_after(jiffies, timeout);
>>>>>>> +
>>>>>>>          /*
>>>>>>>           * Read the Flush Cache bit. The card shall reset it, to confirm that
>>>>>>>           * it's has completed the flushing of the cache.
>>>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>              goto out;
>>>>>>>          }
>>>>>>>      -    if (reg_buf[0] & BIT(0))
>>>>>>> -        err = -ETIMEDOUT;
>>>>>>> +    if (reg_buf[0] & BIT(0)) {
>>>>>>
>>>>>> I am getting here, multiple times, with expired=0 .
>>>>>
>>>>> So either the host controller's busy detection does not work, or the
>>>>> card is not indicating busy by pulling down DAT0.
>>>>>
>>>>> Can you try to figure out which it is?
>>>>
>>>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
>>>>
>>>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
>>>>
>>>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
>>>>
>>>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>
>>>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
>>>
>>> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
>>>
>>>      https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/
>>
>> I got that one now, tested it, the cache bit is being cleared correctly. I also tested a few more cards and dumped their SSR too:
>>
>> Kingston Canvas Go! Plus:
>> 80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>> Flush never finishes
>>
>> Sandisk Extreme PRO A2 64GiB:
>> 80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>
>> Goodram IRDM V30 A2 64GiB:
>> 80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>
>> Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, cache not enabled):
>> 80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>
>>> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.
>>
>> I'm really interested in what you would find with that one.
> 
> It worked just fine, but maybe it is a newer version of hw / firmware - the date is 04/2023
> 
> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
> grep: /sys/class/mmc_host/mmc0/mmc0:5048/block: Is a directory
> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f54495344363447614b1004af017400
> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cf9f7f800a400000
> /sys/class/mmc_host/mmc0/mmc0:5048/date:04/2023
> grep: /sys/class/mmc_host/mmc0/mmc0:5048/driver: Is a directory
> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00200000
> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
> grep: /sys/class/mmc_host/mmc0/mmc0:5048/power: Is a directory
> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x4b1004af
> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
> grep: /sys/class/mmc_host/mmc0/mmc0:5048/subsystem: Is a directory
> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block

This one I have here is certainly older (this time tested on STM32MP135F):

$ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
/sys/class/mmc_host/mmc0/mmc0:5048/cid:9f544953443634476136980065013b7e
/sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cfff7f800a4000fa
/sys/class/mmc_host/mmc0/mmc0:5048/date:11/2019
/sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
/sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
/sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
/sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
/sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
/sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
/sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00300000
/sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
/sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
/sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
/sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
/sys/class/mmc_host/mmc0/mmc0:5048/serial:0x36980065
/sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
/sys/class/mmc_host/mmc0/mmc0:5048/type:SD
/sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
/sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
/sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
/sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block

cid, csr, date, ocr, serial differ.

Which SD host controller did you use ?
Adrian Hunter June 12, 2023, 9:29 a.m. UTC | #12
On 12/06/23 11:59, Marek Vasut wrote:
> On 6/12/23 06:59, Adrian Hunter wrote:
>> On 7/06/23 23:43, Marek Vasut wrote:
>>> On 6/4/23 18:30, Adrian Hunter wrote:
>>>
>>> [...]
>>>
>>>>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>>>>>>> index 72b664ed90cf..9c3123867a99 100644
>>>>>>>> --- a/drivers/mmc/core/sd.c
>>>>>>>> +++ b/drivers/mmc/core/sd.c
>>>>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>      {
>>>>>>>>          struct mmc_card *card = host->card;
>>>>>>>>          u8 *reg_buf, fno, page;
>>>>>>>> +    unsigned long timeout;
>>>>>>>> +    bool expired;
>>>>>>>>          u16 offset;
>>>>>>>>          int err;
>>>>>>>>      @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>              goto out;
>>>>>>>>          }
>>>>>>>>      +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>>>>>>>> +again:
>>>>>>>>          err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>>>>>>>                      MMC_BUSY_EXTR_SINGLE);
>>>>>>>>          if (err)
>>>>>>>>              goto out;
>>>>>>>>      +    expired = time_after(jiffies, timeout);
>>>>>>>> +
>>>>>>>>          /*
>>>>>>>>           * Read the Flush Cache bit. The card shall reset it, to confirm that
>>>>>>>>           * it's has completed the flushing of the cache.
>>>>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>              goto out;
>>>>>>>>          }
>>>>>>>>      -    if (reg_buf[0] & BIT(0))
>>>>>>>> -        err = -ETIMEDOUT;
>>>>>>>> +    if (reg_buf[0] & BIT(0)) {
>>>>>>>
>>>>>>> I am getting here, multiple times, with expired=0 .
>>>>>>
>>>>>> So either the host controller's busy detection does not work, or the
>>>>>> card is not indicating busy by pulling down DAT0.
>>>>>>
>>>>>> Can you try to figure out which it is?
>>>>>
>>>>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
>>>>>
>>>>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
>>>>>
>>>>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
>>>>>
>>>>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>>
>>>>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
>>>>
>>>> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
>>>>
>>>>      https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/
>>>
>>> I got that one now, tested it, the cache bit is being cleared correctly. I also tested a few more cards and dumped their SSR too:
>>>
>>> Kingston Canvas Go! Plus:
>>> 80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>> Flush never finishes
>>>
>>> Sandisk Extreme PRO A2 64GiB:
>>> 80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>>
>>> Goodram IRDM V30 A2 64GiB:
>>> 80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>>
>>> Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, cache not enabled):
>>> 80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>
>>>> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.
>>>
>>> I'm really interested in what you would find with that one.
>>
>> It worked just fine, but maybe it is a newer version of hw / firmware - the date is 04/2023
>>
>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/block: Is a directory
>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f54495344363447614b1004af017400
>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cf9f7f800a400000
>> /sys/class/mmc_host/mmc0/mmc0:5048/date:04/2023
>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/driver: Is a directory
>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00200000
>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/power: Is a directory
>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x4b1004af
>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/subsystem: Is a directory
>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
> 
> This one I have here is certainly older (this time tested on STM32MP135F):
> 
> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f544953443634476136980065013b7e
> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cfff7f800a4000fa
> /sys/class/mmc_host/mmc0/mmc0:5048/date:11/2019
> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00300000
> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x36980065
> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
> 
> cid, csr, date, ocr, serial differ.
> 
> Which SD host controller did you use ?
Ulf Hansson June 15, 2023, 3:14 p.m. UTC | #13
On Mon, 12 Jun 2023 at 10:59, Marek Vasut <marex@denx.de> wrote:
>
> On 6/12/23 06:59, Adrian Hunter wrote:
> > On 7/06/23 23:43, Marek Vasut wrote:
> >> On 6/4/23 18:30, Adrian Hunter wrote:
> >>
> >> [...]
> >>
> >>>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
> >>>>>>> index 72b664ed90cf..9c3123867a99 100644
> >>>>>>> --- a/drivers/mmc/core/sd.c
> >>>>>>> +++ b/drivers/mmc/core/sd.c
> >>>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
> >>>>>>>      {
> >>>>>>>          struct mmc_card *card = host->card;
> >>>>>>>          u8 *reg_buf, fno, page;
> >>>>>>> +    unsigned long timeout;
> >>>>>>> +    bool expired;
> >>>>>>>          u16 offset;
> >>>>>>>          int err;
> >>>>>>>      @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
> >>>>>>>              goto out;
> >>>>>>>          }
> >>>>>>>      +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
> >>>>>>> +again:
> >>>>>>>          err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
> >>>>>>>                      MMC_BUSY_EXTR_SINGLE);
> >>>>>>>          if (err)
> >>>>>>>              goto out;
> >>>>>>>      +    expired = time_after(jiffies, timeout);
> >>>>>>> +
> >>>>>>>          /*
> >>>>>>>           * Read the Flush Cache bit. The card shall reset it, to confirm that
> >>>>>>>           * it's has completed the flushing of the cache.
> >>>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
> >>>>>>>              goto out;
> >>>>>>>          }
> >>>>>>>      -    if (reg_buf[0] & BIT(0))
> >>>>>>> -        err = -ETIMEDOUT;
> >>>>>>> +    if (reg_buf[0] & BIT(0)) {
> >>>>>>
> >>>>>> I am getting here, multiple times, with expired=0 .
> >>>>>
> >>>>> So either the host controller's busy detection does not work, or the
> >>>>> card is not indicating busy by pulling down DAT0.
> >>>>>
> >>>>> Can you try to figure out which it is?
> >>>>
> >>>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
> >>>>
> >>>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
> >>>>
> >>>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
> >>>>
> >>>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>
> >>>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
> >>>
> >>> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
> >>>
> >>>      https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/
> >>
> >> I got that one now, tested it, the cache bit is being cleared correctly. I also tested a few more cards and dumped their SSR too:
> >>
> >> Kingston Canvas Go! Plus:
> >> 80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >> Flush never finishes
> >>
> >> Sandisk Extreme PRO A2 64GiB:
> >> 80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >> mmc0: flushing cache took 5 ms, 1 iterations, error 0
> >>
> >> Goodram IRDM V30 A2 64GiB:
> >> 80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >> mmc0: flushing cache took 5 ms, 1 iterations, error 0
> >>
> >> Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, cache not enabled):
> >> 80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>
> >>> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.
> >>
> >> I'm really interested in what you would find with that one.
> >
> > It worked just fine, but maybe it is a newer version of hw / firmware - the date is 04/2023
> >
> > $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
> > grep: /sys/class/mmc_host/mmc0/mmc0:5048/block: Is a directory
> > /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f54495344363447614b1004af017400
> > /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cf9f7f800a400000
> > /sys/class/mmc_host/mmc0/mmc0:5048/date:04/2023
> > grep: /sys/class/mmc_host/mmc0/mmc0:5048/driver: Is a directory
> > /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
> > /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
> > /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
> > /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
> > /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
> > /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
> > /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00200000
> > /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
> > grep: /sys/class/mmc_host/mmc0/mmc0:5048/power: Is a directory
> > /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
> > /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
> > /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
> > /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x4b1004af
> > /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
> > grep: /sys/class/mmc_host/mmc0/mmc0:5048/subsystem: Is a directory
> > /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
> > /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
> > /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
> > /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
> > /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
>
> This one I have here is certainly older (this time tested on STM32MP135F):
>
> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f544953443634476136980065013b7e
> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cfff7f800a4000fa
> /sys/class/mmc_host/mmc0/mmc0:5048/date:11/2019
> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00300000
> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x36980065
> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
>
> cid, csr, date, ocr, serial differ.

I have been trying to follow the progress around this matter. If I
understand correctly we are leaning towards making a card quirk for
this particular SD, to avoid us from turning on and using a broken
cache feature.

Or what are you thinking?

Kind regards
Uffe
Marek Vasut June 15, 2023, 3:32 p.m. UTC | #14
On 6/15/23 17:14, Ulf Hansson wrote:
> On Mon, 12 Jun 2023 at 10:59, Marek Vasut <marex@denx.de> wrote:
>>
>> On 6/12/23 06:59, Adrian Hunter wrote:
>>> On 7/06/23 23:43, Marek Vasut wrote:
>>>> On 6/4/23 18:30, Adrian Hunter wrote:
>>>>
>>>> [...]
>>>>
>>>>>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>>>>>>>> index 72b664ed90cf..9c3123867a99 100644
>>>>>>>>> --- a/drivers/mmc/core/sd.c
>>>>>>>>> +++ b/drivers/mmc/core/sd.c
>>>>>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>       {
>>>>>>>>>           struct mmc_card *card = host->card;
>>>>>>>>>           u8 *reg_buf, fno, page;
>>>>>>>>> +    unsigned long timeout;
>>>>>>>>> +    bool expired;
>>>>>>>>>           u16 offset;
>>>>>>>>>           int err;
>>>>>>>>>       @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>               goto out;
>>>>>>>>>           }
>>>>>>>>>       +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>>>>>>>>> +again:
>>>>>>>>>           err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>>>>>>>>                       MMC_BUSY_EXTR_SINGLE);
>>>>>>>>>           if (err)
>>>>>>>>>               goto out;
>>>>>>>>>       +    expired = time_after(jiffies, timeout);
>>>>>>>>> +
>>>>>>>>>           /*
>>>>>>>>>            * Read the Flush Cache bit. The card shall reset it, to confirm that
>>>>>>>>>            * it's has completed the flushing of the cache.
>>>>>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>               goto out;
>>>>>>>>>           }
>>>>>>>>>       -    if (reg_buf[0] & BIT(0))
>>>>>>>>> -        err = -ETIMEDOUT;
>>>>>>>>> +    if (reg_buf[0] & BIT(0)) {
>>>>>>>>
>>>>>>>> I am getting here, multiple times, with expired=0 .
>>>>>>>
>>>>>>> So either the host controller's busy detection does not work, or the
>>>>>>> card is not indicating busy by pulling down DAT0.
>>>>>>>
>>>>>>> Can you try to figure out which it is?
>>>>>>
>>>>>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
>>>>>>
>>>>>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
>>>>>>
>>>>>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
>>>>>>
>>>>>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>>>
>>>>>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
>>>>>
>>>>> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
>>>>>
>>>>>       https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/
>>>>
>>>> I got that one now, tested it, the cache bit is being cleared correctly. I also tested a few more cards and dumped their SSR too:
>>>>
>>>> Kingston Canvas Go! Plus:
>>>> 80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>> Flush never finishes
>>>>
>>>> Sandisk Extreme PRO A2 64GiB:
>>>> 80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>>>
>>>> Goodram IRDM V30 A2 64GiB:
>>>> 80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>>>
>>>> Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, cache not enabled):
>>>> 80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>
>>>>> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.
>>>>
>>>> I'm really interested in what you would find with that one.
>>>
>>> It worked just fine, but maybe it is a newer version of hw / firmware - the date is 04/2023
>>>
>>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/block: Is a directory
>>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f54495344363447614b1004af017400
>>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cf9f7f800a400000
>>> /sys/class/mmc_host/mmc0/mmc0:5048/date:04/2023
>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/driver: Is a directory
>>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
>>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
>>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
>>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
>>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
>>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
>>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00200000
>>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/power: Is a directory
>>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
>>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
>>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
>>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x4b1004af
>>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/subsystem: Is a directory
>>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
>>
>> This one I have here is certainly older (this time tested on STM32MP135F):
>>
>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f544953443634476136980065013b7e
>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cfff7f800a4000fa
>> /sys/class/mmc_host/mmc0/mmc0:5048/date:11/2019
>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00300000
>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x36980065
>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
>>
>> cid, csr, date, ocr, serial differ.
> 
> I have been trying to follow the progress around this matter. If I
> understand correctly we are leaning towards making a card quirk for
> this particular SD, to avoid us from turning on and using a broken
> cache feature.
> 
> Or what are you thinking?

Today I received another of those Canvas Go Plus cards, I'll test it and 
see whether that one is newer and behaves right. I'll keep you posted.

Maybe the old card is indeed buggy and they fixed it in newer revision ?
Adrian Hunter June 15, 2023, 3:35 p.m. UTC | #15
On 15/06/23 18:14, Ulf Hansson wrote:
> On Mon, 12 Jun 2023 at 10:59, Marek Vasut <marex@denx.de> wrote:
>>
>> On 6/12/23 06:59, Adrian Hunter wrote:
>>> On 7/06/23 23:43, Marek Vasut wrote:
>>>> On 6/4/23 18:30, Adrian Hunter wrote:
>>>>
>>>> [...]
>>>>
>>>>>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>>>>>>>> index 72b664ed90cf..9c3123867a99 100644
>>>>>>>>> --- a/drivers/mmc/core/sd.c
>>>>>>>>> +++ b/drivers/mmc/core/sd.c
>>>>>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>      {
>>>>>>>>>          struct mmc_card *card = host->card;
>>>>>>>>>          u8 *reg_buf, fno, page;
>>>>>>>>> +    unsigned long timeout;
>>>>>>>>> +    bool expired;
>>>>>>>>>          u16 offset;
>>>>>>>>>          int err;
>>>>>>>>>      @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>              goto out;
>>>>>>>>>          }
>>>>>>>>>      +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>>>>>>>>> +again:
>>>>>>>>>          err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>>>>>>>>                      MMC_BUSY_EXTR_SINGLE);
>>>>>>>>>          if (err)
>>>>>>>>>              goto out;
>>>>>>>>>      +    expired = time_after(jiffies, timeout);
>>>>>>>>> +
>>>>>>>>>          /*
>>>>>>>>>           * Read the Flush Cache bit. The card shall reset it, to confirm that
>>>>>>>>>           * it's has completed the flushing of the cache.
>>>>>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>              goto out;
>>>>>>>>>          }
>>>>>>>>>      -    if (reg_buf[0] & BIT(0))
>>>>>>>>> -        err = -ETIMEDOUT;
>>>>>>>>> +    if (reg_buf[0] & BIT(0)) {
>>>>>>>>
>>>>>>>> I am getting here, multiple times, with expired=0 .
>>>>>>>
>>>>>>> So either the host controller's busy detection does not work, or the
>>>>>>> card is not indicating busy by pulling down DAT0.
>>>>>>>
>>>>>>> Can you try to figure out which it is?
>>>>>>
>>>>>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
>>>>>>
>>>>>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
>>>>>>
>>>>>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
>>>>>>
>>>>>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>>>
>>>>>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
>>>>>
>>>>> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
>>>>>
>>>>>      https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/
>>>>
>>>> I got that one now, tested it, the cache bit is being cleared correctly. I also tested a few more cards and dumped their SSR too:
>>>>
>>>> Kingston Canvas Go! Plus:
>>>> 80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>> Flush never finishes
>>>>
>>>> Sandisk Extreme PRO A2 64GiB:
>>>> 80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>>>
>>>> Goodram IRDM V30 A2 64GiB:
>>>> 80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>>>
>>>> Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, cache not enabled):
>>>> 80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>
>>>>> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.
>>>>
>>>> I'm really interested in what you would find with that one.
>>>
>>> It worked just fine, but maybe it is a newer version of hw / firmware - the date is 04/2023
>>>
>>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/block: Is a directory
>>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f54495344363447614b1004af017400
>>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cf9f7f800a400000
>>> /sys/class/mmc_host/mmc0/mmc0:5048/date:04/2023
>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/driver: Is a directory
>>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
>>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
>>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
>>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
>>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
>>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
>>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00200000
>>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/power: Is a directory
>>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
>>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
>>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
>>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x4b1004af
>>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/subsystem: Is a directory
>>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
>>
>> This one I have here is certainly older (this time tested on STM32MP135F):
>>
>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f544953443634476136980065013b7e
>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cfff7f800a4000fa
>> /sys/class/mmc_host/mmc0/mmc0:5048/date:11/2019
>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00300000
>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x36980065
>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
>>
>> cid, csr, date, ocr, serial differ.
> 
> I have been trying to follow the progress around this matter. If I
> understand correctly we are leaning towards making a card quirk for
> this particular SD, to avoid us from turning on and using a broken
> cache feature.
> 
> Or what are you thinking?

That is probably the simplest option.
Marek Vasut June 15, 2023, 3:36 p.m. UTC | #16
On 6/15/23 17:35, Adrian Hunter wrote:
> On 15/06/23 18:14, Ulf Hansson wrote:
>> On Mon, 12 Jun 2023 at 10:59, Marek Vasut <marex@denx.de> wrote:
>>>
>>> On 6/12/23 06:59, Adrian Hunter wrote:
>>>> On 7/06/23 23:43, Marek Vasut wrote:
>>>>> On 6/4/23 18:30, Adrian Hunter wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>>>>>>>>> index 72b664ed90cf..9c3123867a99 100644
>>>>>>>>>> --- a/drivers/mmc/core/sd.c
>>>>>>>>>> +++ b/drivers/mmc/core/sd.c
>>>>>>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>>       {
>>>>>>>>>>           struct mmc_card *card = host->card;
>>>>>>>>>>           u8 *reg_buf, fno, page;
>>>>>>>>>> +    unsigned long timeout;
>>>>>>>>>> +    bool expired;
>>>>>>>>>>           u16 offset;
>>>>>>>>>>           int err;
>>>>>>>>>>       @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>>               goto out;
>>>>>>>>>>           }
>>>>>>>>>>       +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>>>>>>>>>> +again:
>>>>>>>>>>           err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>>>>>>>>>                       MMC_BUSY_EXTR_SINGLE);
>>>>>>>>>>           if (err)
>>>>>>>>>>               goto out;
>>>>>>>>>>       +    expired = time_after(jiffies, timeout);
>>>>>>>>>> +
>>>>>>>>>>           /*
>>>>>>>>>>            * Read the Flush Cache bit. The card shall reset it, to confirm that
>>>>>>>>>>            * it's has completed the flushing of the cache.
>>>>>>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>>               goto out;
>>>>>>>>>>           }
>>>>>>>>>>       -    if (reg_buf[0] & BIT(0))
>>>>>>>>>> -        err = -ETIMEDOUT;
>>>>>>>>>> +    if (reg_buf[0] & BIT(0)) {
>>>>>>>>>
>>>>>>>>> I am getting here, multiple times, with expired=0 .
>>>>>>>>
>>>>>>>> So either the host controller's busy detection does not work, or the
>>>>>>>> card is not indicating busy by pulling down DAT0.
>>>>>>>>
>>>>>>>> Can you try to figure out which it is?
>>>>>>>
>>>>>>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
>>>>>>>
>>>>>>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
>>>>>>>
>>>>>>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
>>>>>>>
>>>>>>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>>>>
>>>>>>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
>>>>>>
>>>>>> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
>>>>>>
>>>>>>       https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/
>>>>>
>>>>> I got that one now, tested it, the cache bit is being cleared correctly. I also tested a few more cards and dumped their SSR too:
>>>>>
>>>>> Kingston Canvas Go! Plus:
>>>>> 80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>> Flush never finishes
>>>>>
>>>>> Sandisk Extreme PRO A2 64GiB:
>>>>> 80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>>>>
>>>>> Goodram IRDM V30 A2 64GiB:
>>>>> 80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>>>>
>>>>> Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, cache not enabled):
>>>>> 80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>>
>>>>>> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.
>>>>>
>>>>> I'm really interested in what you would find with that one.
>>>>
>>>> It worked just fine, but maybe it is a newer version of hw / firmware - the date is 04/2023
>>>>
>>>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/block: Is a directory
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f54495344363447614b1004af017400
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cf9f7f800a400000
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/date:04/2023
>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/driver: Is a directory
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00200000
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/power: Is a directory
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x4b1004af
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/subsystem: Is a directory
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
>>>
>>> This one I have here is certainly older (this time tested on STM32MP135F):
>>>
>>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
>>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f544953443634476136980065013b7e
>>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cfff7f800a4000fa
>>> /sys/class/mmc_host/mmc0/mmc0:5048/date:11/2019
>>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
>>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
>>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
>>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
>>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
>>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
>>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00300000
>>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
>>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
>>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
>>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
>>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x36980065
>>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
>>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
>>>
>>> cid, csr, date, ocr, serial differ.
>>
>> I have been trying to follow the progress around this matter. If I
>> understand correctly we are leaning towards making a card quirk for
>> this particular SD, to avoid us from turning on and using a broken
>> cache feature.
>>
>> Or what are you thinking?
> 
> That is probably the simplest option.

Just give me a day or two to test the other newer card.

What would you base that quirk off of ? Date ? We don't know when the 
"fixed" cards started to be produced .
Ulf Hansson June 16, 2023, 10:02 a.m. UTC | #17
On Thu, 15 Jun 2023 at 17:37, Marek Vasut <marex@denx.de> wrote:
>
> On 6/15/23 17:35, Adrian Hunter wrote:
> > On 15/06/23 18:14, Ulf Hansson wrote:
> >> On Mon, 12 Jun 2023 at 10:59, Marek Vasut <marex@denx.de> wrote:
> >>>
> >>> On 6/12/23 06:59, Adrian Hunter wrote:
> >>>> On 7/06/23 23:43, Marek Vasut wrote:
> >>>>> On 6/4/23 18:30, Adrian Hunter wrote:
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
> >>>>>>>>>> index 72b664ed90cf..9c3123867a99 100644
> >>>>>>>>>> --- a/drivers/mmc/core/sd.c
> >>>>>>>>>> +++ b/drivers/mmc/core/sd.c
> >>>>>>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
> >>>>>>>>>>       {
> >>>>>>>>>>           struct mmc_card *card = host->card;
> >>>>>>>>>>           u8 *reg_buf, fno, page;
> >>>>>>>>>> +    unsigned long timeout;
> >>>>>>>>>> +    bool expired;
> >>>>>>>>>>           u16 offset;
> >>>>>>>>>>           int err;
> >>>>>>>>>>       @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
> >>>>>>>>>>               goto out;
> >>>>>>>>>>           }
> >>>>>>>>>>       +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
> >>>>>>>>>> +again:
> >>>>>>>>>>           err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
> >>>>>>>>>>                       MMC_BUSY_EXTR_SINGLE);
> >>>>>>>>>>           if (err)
> >>>>>>>>>>               goto out;
> >>>>>>>>>>       +    expired = time_after(jiffies, timeout);
> >>>>>>>>>> +
> >>>>>>>>>>           /*
> >>>>>>>>>>            * Read the Flush Cache bit. The card shall reset it, to confirm that
> >>>>>>>>>>            * it's has completed the flushing of the cache.
> >>>>>>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
> >>>>>>>>>>               goto out;
> >>>>>>>>>>           }
> >>>>>>>>>>       -    if (reg_buf[0] & BIT(0))
> >>>>>>>>>> -        err = -ETIMEDOUT;
> >>>>>>>>>> +    if (reg_buf[0] & BIT(0)) {
> >>>>>>>>>
> >>>>>>>>> I am getting here, multiple times, with expired=0 .
> >>>>>>>>
> >>>>>>>> So either the host controller's busy detection does not work, or the
> >>>>>>>> card is not indicating busy by pulling down DAT0.
> >>>>>>>>
> >>>>>>>> Can you try to figure out which it is?
> >>>>>>>
> >>>>>>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
> >>>>>>>
> >>>>>>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
> >>>>>>>
> >>>>>>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
> >>>>>>>
> >>>>>>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>>>>
> >>>>>>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
> >>>>>>
> >>>>>> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
> >>>>>>
> >>>>>>       https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/
> >>>>>
> >>>>> I got that one now, tested it, the cache bit is being cleared correctly. I also tested a few more cards and dumped their SSR too:
> >>>>>
> >>>>> Kingston Canvas Go! Plus:
> >>>>> 80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>> Flush never finishes
> >>>>>
> >>>>> Sandisk Extreme PRO A2 64GiB:
> >>>>> 80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
> >>>>>
> >>>>> Goodram IRDM V30 A2 64GiB:
> >>>>> 80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
> >>>>>
> >>>>> Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, cache not enabled):
> >>>>> 80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>>
> >>>>>> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.
> >>>>>
> >>>>> I'm really interested in what you would find with that one.
> >>>>
> >>>> It worked just fine, but maybe it is a newer version of hw / firmware - the date is 04/2023
> >>>>
> >>>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
> >>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/block: Is a directory
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f54495344363447614b1004af017400
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cf9f7f800a400000
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/date:04/2023
> >>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/driver: Is a directory
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00200000
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
> >>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/power: Is a directory
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x4b1004af
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
> >>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/subsystem: Is a directory
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
> >>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
> >>>
> >>> This one I have here is certainly older (this time tested on STM32MP135F):
> >>>
> >>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f544953443634476136980065013b7e
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cfff7f800a4000fa
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/date:11/2019
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00300000
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x36980065
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
> >>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
> >>>
> >>> cid, csr, date, ocr, serial differ.
> >>
> >> I have been trying to follow the progress around this matter. If I
> >> understand correctly we are leaning towards making a card quirk for
> >> this particular SD, to avoid us from turning on and using a broken
> >> cache feature.
> >>
> >> Or what are you thinking?
> >
> > That is probably the simplest option.
>
> Just give me a day or two to test the other newer card.
>
> What would you base that quirk off of ? Date ? We don't know when the
> "fixed" cards started to be produced .

Right. It seems like the best we can do is to make a quirk for that
particular version of card that you proved to have failed.

Kind regards
Uffe
Marek Vasut June 16, 2023, 10:20 a.m. UTC | #18
On 6/15/23 17:14, Ulf Hansson wrote:

I finally got around testing the other card, there are changes in CSD 
and SSR which indicate there was a HW change, but nothing significant. 
Considering we still have a non-working card sample size 1, I wonder 
whether I should just consider this card defective and archive it. Or 
would you prefer a quirk based on manufacturing date ? I am also CCing 
Xander, maybe they can check internally at Kingston .

Old 2019 card:
--------------
mmc1: host does not support reading read-only switch, assuming write-enable
mmc1: new ultra high speed SDR104 SDXC card at address 5048
mmcblk1: mmc1:5048 SD64G 58.0 GiB
  mmcblk1: p1

~ # grep -H . /sys/class/mmc_host/mmc1/mmc1\:5048/*
/sys/class/mmc_host/mmc1/mmc1:5048/cid:9f544953443634476136980065013b00
/sys/class/mmc_host/mmc1/mmc1:5048/csd:400e00325b590001cfff7f800a400000
/sys/class/mmc_host/mmc1/mmc1:5048/date:11/2019
/sys/class/mmc_host/mmc1/mmc1:5048/dsr:0x404
/sys/class/mmc_host/mmc1/mmc1:5048/erase_size:512
/sys/class/mmc_host/mmc1/mmc1:5048/fwrev:0x1
/sys/class/mmc_host/mmc1/mmc1:5048/hwrev:0x6
/sys/class/mmc_host/mmc1/mmc1:5048/manfid:0x00009f
/sys/class/mmc_host/mmc1/mmc1:5048/name:SD64G
/sys/class/mmc_host/mmc1/mmc1:5048/ocr:0x00300000
/sys/class/mmc_host/mmc1/mmc1:5048/oemid:0x5449
/sys/class/mmc_host/mmc1/mmc1:5048/preferred_erase_size:4194304
/sys/class/mmc_host/mmc1/mmc1:5048/rca:0x5048
/sys/class/mmc_host/mmc1/mmc1:5048/scr:0205848701006432
/sys/class/mmc_host/mmc1/mmc1:5048/serial:0x36980065
/sys/class/mmc_host/mmc1/mmc1:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
/sys/class/mmc_host/mmc1/mmc1:5048/type:SD
/sys/class/mmc_host/mmc1/mmc1:5048/uevent:DRIVER=mmcblk
/sys/class/mmc_host/mmc1/mmc1:5048/uevent:MMC_TYPE=SD
/sys/class/mmc_host/mmc1/mmc1:5048/uevent:MMC_NAME=SD64G
/sys/class/mmc_host/mmc1/mmc1:5048/uevent:MODALIAS=mmc:block

New 2023 card:
--------------
mmc1: host does not support reading read-only switch, assuming write-enable
mmc1: new ultra high speed SDR104 SDXC card at address 5048
mmcblk1: mmc1:5048 SD64G 58.0 GiB
  mmcblk1: p1

~ # grep -H . /sys/class/mmc_host/mmc1/mmc1\:5048/*
/sys/class/mmc_host/mmc1/mmc1:5048/cid:9f54495344363447614890060a017500
/sys/class/mmc_host/mmc1/mmc1:5048/csd:400e00325b590001cf9f7f800a400000
/sys/class/mmc_host/mmc1/mmc1:5048/date:05/2023
/sys/class/mmc_host/mmc1/mmc1:5048/dsr:0x404
/sys/class/mmc_host/mmc1/mmc1:5048/erase_size:512
/sys/class/mmc_host/mmc1/mmc1:5048/fwrev:0x1
/sys/class/mmc_host/mmc1/mmc1:5048/hwrev:0x6
/sys/class/mmc_host/mmc1/mmc1:5048/manfid:0x00009f
/sys/class/mmc_host/mmc1/mmc1:5048/name:SD64G
/sys/class/mmc_host/mmc1/mmc1:5048/ocr:0x00300000
/sys/class/mmc_host/mmc1/mmc1:5048/oemid:0x5449
/sys/class/mmc_host/mmc1/mmc1:5048/preferred_erase_size:4194304
/sys/class/mmc_host/mmc1/mmc1:5048/rca:0x5048
/sys/class/mmc_host/mmc1/mmc1:5048/scr:0205848701006432
/sys/class/mmc_host/mmc1/mmc1:5048/serial:0x4890060a
/sys/class/mmc_host/mmc1/mmc1:5048/ssr:000000000800000004009000010f391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
/sys/class/mmc_host/mmc1/mmc1:5048/type:SD
/sys/class/mmc_host/mmc1/mmc1:5048/uevent:DRIVER=mmcblk
/sys/class/mmc_host/mmc1/mmc1:5048/uevent:MMC_TYPE=SD
/sys/class/mmc_host/mmc1/mmc1:5048/uevent:MMC_NAME=SD64G
/sys/class/mmc_host/mmc1/mmc1:5048/uevent:MODALIAS=mmc:block

diff (-old 2019, +new 2023):
----------------------------
-/sys/class/mmc_host/mmc1/mmc1:5048/cid:9f544953443634476136980065013b00
-/sys/class/mmc_host/mmc1/mmc1:5048/csd:400e00325b590001cfff7f800a400000
-/sys/class/mmc_host/mmc1/mmc1:5048/date:11/2019
+/sys/class/mmc_host/mmc1/mmc1:5048/cid:9f54495344363447614890060a017500
+/sys/class/mmc_host/mmc1/mmc1:5048/csd:400e00325b590001cf9f7f800a400000
+/sys/class/mmc_host/mmc1/mmc1:5048/date:05/2023
  /sys/class/mmc_host/mmc1/mmc1:5048/dsr:0x404
  /sys/class/mmc_host/mmc1/mmc1:5048/erase_size:512
  /sys/class/mmc_host/mmc1/mmc1:5048/fwrev:0x1
@@ -18,10 +18,15 @@
  /sys/class/mmc_host/mmc1/mmc1:5048/preferred_erase_size:4194304
  /sys/class/mmc_host/mmc1/mmc1:5048/rca:0x5048
  /sys/class/mmc_host/mmc1/mmc1:5048/scr:0205848701006432
-/sys/class/mmc_host/mmc1/mmc1:5048/serial:0x36980065
-/sys/class/mmc_host/mmc1/mmc1:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
+/sys/class/mmc_host/mmc1/mmc1:5048/serial:0x4890060a
+/sys/class/mmc_host/mmc1/mmc1:5048/ssr:000000000800000004009000010f391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
  /sys/class/mmc_host/mmc1/mmc1:5048/type:SD
  /sys/class/mmc_host/mmc1/mmc1:5048/uevent:DRIVER=mmcblk
  /sys/class/mmc_host/mmc1/mmc1:5048/uevent:MMC_TYPE=SD
  /sys/class/mmc_host/mmc1/mmc1:5048/uevent:MMC_NAME=SD64G
  /sys/class/mmc_host/mmc1/mmc1:5048/uevent:MODALIAS=mmc:block

Changes:
--------
CID: PSN (serial number), MDT (manufacturing date)
CSD: C_SIZE: 0x1cfff -> 0x1cf9f
SSR: ERASE_TIMEOUT: 6 -> 3
Marek Vasut June 16, 2023, 10:24 a.m. UTC | #19
On 6/16/23 12:02, Ulf Hansson wrote:
> On Thu, 15 Jun 2023 at 17:37, Marek Vasut <marex@denx.de> wrote:
>>
>> On 6/15/23 17:35, Adrian Hunter wrote:
>>> On 15/06/23 18:14, Ulf Hansson wrote:
>>>> On Mon, 12 Jun 2023 at 10:59, Marek Vasut <marex@denx.de> wrote:
>>>>>
>>>>> On 6/12/23 06:59, Adrian Hunter wrote:
>>>>>> On 7/06/23 23:43, Marek Vasut wrote:
>>>>>>> On 6/4/23 18:30, Adrian Hunter wrote:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
>>>>>>>>>>>> index 72b664ed90cf..9c3123867a99 100644
>>>>>>>>>>>> --- a/drivers/mmc/core/sd.c
>>>>>>>>>>>> +++ b/drivers/mmc/core/sd.c
>>>>>>>>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>>>>        {
>>>>>>>>>>>>            struct mmc_card *card = host->card;
>>>>>>>>>>>>            u8 *reg_buf, fno, page;
>>>>>>>>>>>> +    unsigned long timeout;
>>>>>>>>>>>> +    bool expired;
>>>>>>>>>>>>            u16 offset;
>>>>>>>>>>>>            int err;
>>>>>>>>>>>>        @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>>>>                goto out;
>>>>>>>>>>>>            }
>>>>>>>>>>>>        +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
>>>>>>>>>>>> +again:
>>>>>>>>>>>>            err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
>>>>>>>>>>>>                        MMC_BUSY_EXTR_SINGLE);
>>>>>>>>>>>>            if (err)
>>>>>>>>>>>>                goto out;
>>>>>>>>>>>>        +    expired = time_after(jiffies, timeout);
>>>>>>>>>>>> +
>>>>>>>>>>>>            /*
>>>>>>>>>>>>             * Read the Flush Cache bit. The card shall reset it, to confirm that
>>>>>>>>>>>>             * it's has completed the flushing of the cache.
>>>>>>>>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
>>>>>>>>>>>>                goto out;
>>>>>>>>>>>>            }
>>>>>>>>>>>>        -    if (reg_buf[0] & BIT(0))
>>>>>>>>>>>> -        err = -ETIMEDOUT;
>>>>>>>>>>>> +    if (reg_buf[0] & BIT(0)) {
>>>>>>>>>>>
>>>>>>>>>>> I am getting here, multiple times, with expired=0 .
>>>>>>>>>>
>>>>>>>>>> So either the host controller's busy detection does not work, or the
>>>>>>>>>> card is not indicating busy by pulling down DAT0.
>>>>>>>>>>
>>>>>>>>>> Can you try to figure out which it is?
>>>>>>>>>
>>>>>>>>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
>>>>>>>>>
>>>>>>>>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
>>>>>>>>>
>>>>>>>>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
>>>>>>>>>
>>>>>>>>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>>>>>>
>>>>>>>>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
>>>>>>>>
>>>>>>>> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
>>>>>>>>
>>>>>>>>        https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/
>>>>>>>
>>>>>>> I got that one now, tested it, the cache bit is being cleared correctly. I also tested a few more cards and dumped their SSR too:
>>>>>>>
>>>>>>> Kingston Canvas Go! Plus:
>>>>>>> 80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>>>> Flush never finishes
>>>>>>>
>>>>>>> Sandisk Extreme PRO A2 64GiB:
>>>>>>> 80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>>>>>>
>>>>>>> Goodram IRDM V30 A2 64GiB:
>>>>>>> 80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
>>>>>>>
>>>>>>> Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, cache not enabled):
>>>>>>> 80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
>>>>>>>
>>>>>>>> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.
>>>>>>>
>>>>>>> I'm really interested in what you would find with that one.
>>>>>>
>>>>>> It worked just fine, but maybe it is a newer version of hw / firmware - the date is 04/2023
>>>>>>
>>>>>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
>>>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/block: Is a directory
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f54495344363447614b1004af017400
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cf9f7f800a400000
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/date:04/2023
>>>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/driver: Is a directory
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00200000
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
>>>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/power: Is a directory
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x4b1004af
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
>>>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/subsystem: Is a directory
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
>>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
>>>>>
>>>>> This one I have here is certainly older (this time tested on STM32MP135F):
>>>>>
>>>>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f544953443634476136980065013b7e
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cfff7f800a4000fa
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/date:11/2019
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00300000
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x36980065
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
>>>>>
>>>>> cid, csr, date, ocr, serial differ.
>>>>
>>>> I have been trying to follow the progress around this matter. If I
>>>> understand correctly we are leaning towards making a card quirk for
>>>> this particular SD, to avoid us from turning on and using a broken
>>>> cache feature.
>>>>
>>>> Or what are you thinking?
>>>
>>> That is probably the simplest option.
>>
>> Just give me a day or two to test the other newer card.
>>
>> What would you base that quirk off of ? Date ? We don't know when the
>> "fixed" cards started to be produced .
> 
> Right. It seems like the best we can do is to make a quirk for that
> particular version of card that you proved to have failed.

I just sent a few more data points . It is either date, or C_SIZE, or 
ERASE_TIMEOUT (if I decode it right). I can also just archive the card, 
since we have sample size of the defective card equal 1 . It could just 
be a defective card after all, although the fact that only cache would 
be defective is unusual.
Ulf Hansson June 16, 2023, 10:33 a.m. UTC | #20
On Fri, 16 Jun 2023 at 12:24, Marek Vasut <marex@denx.de> wrote:
>
> On 6/16/23 12:02, Ulf Hansson wrote:
> > On Thu, 15 Jun 2023 at 17:37, Marek Vasut <marex@denx.de> wrote:
> >>
> >> On 6/15/23 17:35, Adrian Hunter wrote:
> >>> On 15/06/23 18:14, Ulf Hansson wrote:
> >>>> On Mon, 12 Jun 2023 at 10:59, Marek Vasut <marex@denx.de> wrote:
> >>>>>
> >>>>> On 6/12/23 06:59, Adrian Hunter wrote:
> >>>>>> On 7/06/23 23:43, Marek Vasut wrote:
> >>>>>>> On 6/4/23 18:30, Adrian Hunter wrote:
> >>>>>>>
> >>>>>>> [...]
> >>>>>>>
> >>>>>>>>>>>> diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
> >>>>>>>>>>>> index 72b664ed90cf..9c3123867a99 100644
> >>>>>>>>>>>> --- a/drivers/mmc/core/sd.c
> >>>>>>>>>>>> +++ b/drivers/mmc/core/sd.c
> >>>>>>>>>>>> @@ -1313,6 +1313,8 @@ static int sd_flush_cache(struct mmc_host *host)
> >>>>>>>>>>>>        {
> >>>>>>>>>>>>            struct mmc_card *card = host->card;
> >>>>>>>>>>>>            u8 *reg_buf, fno, page;
> >>>>>>>>>>>> +    unsigned long timeout;
> >>>>>>>>>>>> +    bool expired;
> >>>>>>>>>>>>            u16 offset;
> >>>>>>>>>>>>            int err;
> >>>>>>>>>>>>        @@ -1338,11 +1340,15 @@ static int sd_flush_cache(struct mmc_host *host)
> >>>>>>>>>>>>                goto out;
> >>>>>>>>>>>>            }
> >>>>>>>>>>>>        +    timeout = jiffies + msecs_to_jiffies(SD_WRITE_EXTR_SINGLE_TIMEOUT_MS) + 1;
> >>>>>>>>>>>> +again:
> >>>>>>>>>>>>            err = mmc_poll_for_busy(card, SD_WRITE_EXTR_SINGLE_TIMEOUT_MS, false,
> >>>>>>>>>>>>                        MMC_BUSY_EXTR_SINGLE);
> >>>>>>>>>>>>            if (err)
> >>>>>>>>>>>>                goto out;
> >>>>>>>>>>>>        +    expired = time_after(jiffies, timeout);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>            /*
> >>>>>>>>>>>>             * Read the Flush Cache bit. The card shall reset it, to confirm that
> >>>>>>>>>>>>             * it's has completed the flushing of the cache.
> >>>>>>>>>>>> @@ -1354,8 +1360,12 @@ static int sd_flush_cache(struct mmc_host *host)
> >>>>>>>>>>>>                goto out;
> >>>>>>>>>>>>            }
> >>>>>>>>>>>>        -    if (reg_buf[0] & BIT(0))
> >>>>>>>>>>>> -        err = -ETIMEDOUT;
> >>>>>>>>>>>> +    if (reg_buf[0] & BIT(0)) {
> >>>>>>>>>>>
> >>>>>>>>>>> I am getting here, multiple times, with expired=0 .
> >>>>>>>>>>
> >>>>>>>>>> So either the host controller's busy detection does not work, or the
> >>>>>>>>>> card is not indicating busy by pulling down DAT0.
> >>>>>>>>>>
> >>>>>>>>>> Can you try to figure out which it is?
> >>>>>>>>>
> >>>>>>>>> The byte 261 bit 0 is never cleared, I had this looping for an hour and the 'Flush Cache' bit just never got cleared. The SD spec 6.00 and 9.00 both indicate the bit should be cleared by the card once cache flush is completed.
> >>>>>>>>>
> >>>>>>>>> I tried three different controllers now -- STM32MP15xx ARM MMCI, i.MX6Q uSDHC, laptop rtsx_pci_sdmmc , they all fail.
> >>>>>>>>>
> >>>>>>>>> I tried to find another card which also has cache, I cannot find any other card, all the rest report no cache. The kingston card SSR (see the 2ff in 6th field, the last f bit 2 is cache supported indication, SSR bit 330):
> >>>>>>>>>
> >>>>>>>>> 00000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>>>>>>
> >>>>>>>>> So either this card is weird, or the cards with cache are so rare that nobody noticed the problem yet.
> >>>>>>>>
> >>>>>>>> The patch set cover letter says it was tested with 64GB Sandisk Extreme PRO UHS-I A2 card
> >>>>>>>>
> >>>>>>>>        https://lore.kernel.org/linux-mmc/20210506145829.198823-1-ulf.hansson@linaro.org/
> >>>>>>>
> >>>>>>> I got that one now, tested it, the cache bit is being cleared correctly. I also tested a few more cards and dumped their SSR too:
> >>>>>>>
> >>>>>>> Kingston Canvas Go! Plus:
> >>>>>>> 80000000:08000000:04009000:011b391e:00080000:0002ff00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>>>> Flush never finishes
> >>>>>>>
> >>>>>>> Sandisk Extreme PRO A2 64GiB:
> >>>>>>> 80000000:08000000:04009000:0f05391e:00080000:0002fc00:03000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
> >>>>>>>
> >>>>>>> Goodram IRDM V30 A2 64GiB:
> >>>>>>> 80000000:08000000:0400a001:00fd3a1e:00080000:00023c00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>>>> mmc0: flushing cache took 5 ms, 1 iterations, error 0
> >>>>>>>
> >>>>>>> Samsung Pro Plus 512GiB V30 A2 (ext reg general info is all zeroes, cache not enabled):
> >>>>>>> 80000000:08000000:04009000:0811391e:00080000:0002fc00:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:
> >>>>>>>
> >>>>>>>> I ordered a Kingston Canvas Go Plus card as you described but it won't arrive for a week.
> >>>>>>>
> >>>>>>> I'm really interested in what you would find with that one.
> >>>>>>
> >>>>>> It worked just fine, but maybe it is a newer version of hw / firmware - the date is 04/2023
> >>>>>>
> >>>>>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
> >>>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/block: Is a directory
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f54495344363447614b1004af017400
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cf9f7f800a400000
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/date:04/2023
> >>>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/driver: Is a directory
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00200000
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
> >>>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/power: Is a directory
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x4b1004af
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
> >>>>>> grep: /sys/class/mmc_host/mmc0/mmc0:5048/subsystem: Is a directory
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
> >>>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
> >>>>>
> >>>>> This one I have here is certainly older (this time tested on STM32MP135F):
> >>>>>
> >>>>> $ grep -H . /sys/class/mmc_host/mmc0/mmc0\:5048/*
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/cid:9f544953443634476136980065013b7e
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/csd:400e00325b590001cfff7f800a4000fa
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/date:11/2019
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/dsr:0x404
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/erase_size:512
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/fwrev:0x1
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/hwrev:0x6
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/manfid:0x00009f
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/name:SD64G
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/ocr:0x00300000
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/oemid:0x5449
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/preferred_erase_size:4194304
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/rca:0x5048
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/scr:0205848701006432
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/serial:0x36980065
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/ssr:000000000800000004009000011b391e000800000002ff0003000000000000000000000000000000000000000000000000000000000000000000000000000000
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/type:SD
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:DRIVER=mmcblk
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_TYPE=SD
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MMC_NAME=SD64G
> >>>>> /sys/class/mmc_host/mmc0/mmc0:5048/uevent:MODALIAS=mmc:block
> >>>>>
> >>>>> cid, csr, date, ocr, serial differ.
> >>>>
> >>>> I have been trying to follow the progress around this matter. If I
> >>>> understand correctly we are leaning towards making a card quirk for
> >>>> this particular SD, to avoid us from turning on and using a broken
> >>>> cache feature.
> >>>>
> >>>> Or what are you thinking?
> >>>
> >>> That is probably the simplest option.
> >>
> >> Just give me a day or two to test the other newer card.
> >>
> >> What would you base that quirk off of ? Date ? We don't know when the
> >> "fixed" cards started to be produced .
> >
> > Right. It seems like the best we can do is to make a quirk for that
> > particular version of card that you proved to have failed.
>
> I just sent a few more data points . It is either date, or C_SIZE, or
> ERASE_TIMEOUT (if I decode it right). I can also just archive the card,
> since we have sample size of the defective card equal 1 . It could just
> be a defective card after all, although the fact that only cache would
> be defective is unusual.

I wouldn't mind having a card quirk and to use "date". But it looks
like it's your call to make. :-)

Kind regards
Uffe
Marek Vasut June 20, 2023, 1:44 a.m. UTC | #21
On 6/16/23 12:33, Ulf Hansson wrote:

Hi,

[...]

>>>>>> I have been trying to follow the progress around this matter. If I
>>>>>> understand correctly we are leaning towards making a card quirk for
>>>>>> this particular SD, to avoid us from turning on and using a broken
>>>>>> cache feature.
>>>>>>
>>>>>> Or what are you thinking?
>>>>>
>>>>> That is probably the simplest option.
>>>>
>>>> Just give me a day or two to test the other newer card.
>>>>
>>>> What would you base that quirk off of ? Date ? We don't know when the
>>>> "fixed" cards started to be produced .
>>>
>>> Right. It seems like the best we can do is to make a quirk for that
>>> particular version of card that you proved to have failed.
>>
>> I just sent a few more data points . It is either date, or C_SIZE, or
>> ERASE_TIMEOUT (if I decode it right). I can also just archive the card,
>> since we have sample size of the defective card equal 1 . It could just
>> be a defective card after all, although the fact that only cache would
>> be defective is unusual.
> 
> I wouldn't mind having a card quirk and to use "date". But it looks
> like it's your call to make. :-)

I sent the quirk patch:

[PATCH] mmc: Add MMC_QUIRK_BROKEN_SD_CACHE for Kingston Canvas Go Plus 
from 11/2019

It seems it is a bit bigger than I though it would originally be.
diff mbox series

Patch

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index db6d8a0999100..72aa47af11d37 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2413,8 +2413,6 @@  static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
 	struct mmc_blk_data *md;
 	int devidx, ret;
 	char cap_str[10];
-	bool cache_enabled = false;
-	bool fua_enabled = false;
 
 	devidx = ida_simple_get(&mmc_blk_ida, 0, max_devices, GFP_KERNEL);
 	if (devidx < 0) {
@@ -2494,17 +2492,13 @@  static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
 			md->flags |= MMC_BLK_CMD23;
 	}
 
-	if (md->flags & MMC_BLK_CMD23 &&
+	if (mmc_card_mmc(card) &&
+	    md->flags & MMC_BLK_CMD23 &&
 	    ((card->ext_csd.rel_param & EXT_CSD_WR_REL_PARAM_EN) ||
 	     card->ext_csd.rel_sectors)) {
 		md->flags |= MMC_BLK_REL_WR;
-		fua_enabled = true;
-		cache_enabled = true;
+		blk_queue_write_cache(md->queue.queue, true, true);
 	}
-	if (mmc_cache_enabled(card->host))
-		cache_enabled  = true;
-
-	blk_queue_write_cache(md->queue.queue, cache_enabled, fua_enabled);
 
 	string_get_size((u64)size, 512, STRING_UNITS_2,
 			cap_str, sizeof(cap_str));