
[v3] mmc: renesas_sdhi: Fix change point of data handling

Message ID 20240205112702.213050-1-claudiu.beznea.uj@bp.renesas.com
State New
Series [v3] mmc: renesas_sdhi: Fix change point of data handling

Commit Message

Claudiu Feb. 5, 2024, 11:27 a.m. UTC
From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>

On recent kernel revisions it has been noticed (on an RZ/G3S system) that
when booting Linux with the root file system on eMMC, at some point in
the boot process, when the systemd applications are started, the
"mmc0: tuning execution failed: -5" message is displayed on the console.
On kernel v6.7-rc5 this is reproducible in 90% of the boots. It did not
occur on the same system with kernel v6.5.0-rc1. It was also noticed on
kernel revisions v6.6-rcX on an RZ/G2UL based system but not on the kernel
this fix is based on (v6.7-rc5).

Investigating it on RZ/G3S led to the conclusion that every time the issue
is reproduced all the probed TAPs are OK. According to the datasheet, when
this happens the change point of data needs to be considered for tuning.

Previous code detected the change point of data by checking whether the
content of the SMPCMP register is zero. According to the RZ/V2M hardware
manual, chapter "Change Point of the Input Data" (as this is the clearest
description that I've found about the change point of the input data, and
all RZ hardware manuals are similar in this chapter), at the time of tuning,
data is captured by the previous and next TAPs and the result is stored in
the SMPCMP register (previous TAP in bits 23..16, next TAP in bits 7..0).
If there is a mismatch between the previous and the next TAPs, it indicates
that there is a change point of the input data.

To comply with this, the patch checks whether this mismatch is present and
updates the priv->smpcmp mask only if it is not. Previous code checked
whether the value of the SMPCMP register was zero. However, on RZ/G3S, this
leads to failures because values such as the following may be read:
CMPNGU=0x0e, CMPNGD=0x0e, SMPCMP=0x000e000e.
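
For illustration only (not part of the patch): a minimal user-space sketch
of the two checks discussed above, applied to a raw SMPCMP value. The helper
names are hypothetical, and the field positions are assumed to match the
CMPNGU_DATA/CMPNGD_DATA masks added by this patch (bits 23..16 and 7..0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; field positions follow the masks added by the patch. */
static uint32_t smpcmp_cmpngu(uint32_t smpcmp)
{
	return (smpcmp >> 16) & 0xffu;	/* CMPNGU_DATA, bits 23..16 */
}

static uint32_t smpcmp_cmpngd(uint32_t smpcmp)
{
	return smpcmp & 0xffu;		/* CMPNGD_DATA, bits 7..0 */
}

/* Check used before this patch: the whole register must read zero. */
static bool tap_ok_zero_check(uint32_t smpcmp)
{
	return smpcmp == 0;
}

/* Check introduced by this patch: both data fields must be equal. */
static bool tap_ok_equal_check(uint32_t smpcmp)
{
	return smpcmp_cmpngu(smpcmp) == smpcmp_cmpngd(smpcmp);
}
```

With SMPCMP=0x000e000e both fields decode to 0x0e, so the equality check
accepts the TAP while the zero check rejects it.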

Along with it, as mmc_send_tuning() may return with an error even before
the MMC command reaches the controller (and because at that point
cmd_error = 0), the update of the priv->smpcmp mask is now done only if the
return value of mmc_send_tuning(mmc, opcode, &cmd_error) is 0 (success).

This change has been checked on the devices with the following DTSes by
doing 100 consecutive boots and checking for the tuning failure message:
- r9a08g045s33-smarc.dts
- r8a7742-iwg21d-q7.dts
- r8a7743-iwg20d-q7.dts
- r8a7744-iwg20d-q7.dts
- r8a7745-iwg22d-sodimm.dts
- r8a77470-iwg23s-sbc.dts
- r8a774a1-hihope-rzg2m-ex.dts
- r8a774b1-hihope-rzg2n-ex.dts
- r8a774c0-ek874.dts
- r8a774e1-hihope-rzg2h-ex.dts
- r9a07g043u11-smarc-rzg2ul.dts
- r9a07g044c2-smarc-rzg2lc.dts
- r9a07g044l2-smarc-rzg2l.dts
- r9a07g054l2-smarc-rzv2l.dts

Fixes: 5fb6bf51f6d1 ("mmc: renesas_sdhi: improve TAP selection if all TAPs are good")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
---

Changes in v3:
- set priv->smpcmp if cmpngu_data == cmpngd_data and return code of
  mmc_send_tuning() is zero
- removed workaround introduced previously in
  renesas_sdhi_select_tuning() as it is not needed with the code from v3
- update patch description

Changes in v2:
- read the SH_MOBILE_SDHI_SCC_SMPCMP register only on success path of
  mmc_send_tuning()

 drivers/mmc/host/renesas_sdhi_core.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

Comments

Wolfram Sang Feb. 8, 2024, 12:56 a.m. UTC | #1
Hi Claudiu,

I got more information about SMPCMP now. I had a misunderstanding there.
According to your patch description, you might have the same
misunderstanding? Let me quote again:

===
RZ hardware manual are similar on this chapter), at the time of tuning,
data is captured by the previous and next TAPs and the result is stored in
the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
===

It is not the previous and next TAP but the previous and next clock
cycle using the *same* TAP. And the bits in the register describe if
there was a mismatch in the data bits across these clock cycles.

So, we really want SMPCMP to be 0 because the data should be stable
across all three clock cycles of the same TAP.

> As of my understanding the TAP where cmpngu = 0x0e and cmpngd=0x0e is not
> considered change point of the input data. For that to happen it would mean
> that cmpngu != cmpngd.

I am not sure you can assume that cmpngu != cmpngd is always true for a
change point. I'd think it is likely often the case. But always? I am
not convinced. But I am convinced that if SMPCMP is 0, this is a good
TAP because it was stable over these clock cycles.

> From this snapshot, datasheet and our discussions:
> 
> i=0, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=1, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=2, cmpngu=0000000e, cmpngd=0000000e, smpcmp=000e000e
> i=3, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> *i=4, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002*
> *i=5, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff*
> *i=6, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000*
> i=7, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=8, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=9, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=10, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=11, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> *i=12, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002*
> *i=13, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff*
> *i=14, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000*
> i=15, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> 
> I understand that TAP4,5,6 are change point of the input data and
> TAP8,0,1,2,3 are candidates for being selected, TAP 1,2 being the best
> (please correct me if I'm wrong).

I agree that TAP4-6 are the change point. TAP2 could be a candidate. I
dunno why SMPCMP is non-zero at i == 2, maybe some glitch due to noise
on the board?

I do really wonder why probing failed, though? TAP1 sounds like a good
choice as well. I mean we consider SMPCMP only if all TAPs are good. So,
if probing fails, that means that SMPCMP was non-zero all the time?

That being said, our code to select the best TAP from SMPCMP is really
not considering the change point :( It just picks the first one where
SMPCMP is 0. We are not checking where the change point is and try to be
as far away as possible.
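
A rough sketch of what "as far away as possible" could look like. This is
only an illustration and not what the driver does: pick_farthest_tap() is a
hypothetical name, the stable mask is assumed to hold one bit per TAP
(bit set = SMPCMP read as 0), and tap_num is assumed to be below 32.

```c
#include <stdint.h>

/*
 * Return the stable TAP whose circular distance to the nearest unstable
 * TAP is largest, or -1 if no TAP is stable. Ties go to the lowest TAP.
 */
static int pick_farthest_tap(uint32_t stable, unsigned int tap_num)
{
	int best_tap = -1;
	unsigned int best_dist = 0;

	for (unsigned int t = 0; t < tap_num; t++) {
		unsigned int dist = tap_num;	/* sentinel: no unstable TAP */

		if (!((stable >> t) & 1u))
			continue;

		/* Walk outwards in both directions around the TAP ring. */
		for (unsigned int d = 1; d < tap_num; d++) {
			unsigned int up = (t + d) % tap_num;
			unsigned int down = (t + tap_num - d) % tap_num;

			if (!((stable >> up) & 1u) || !((stable >> down) & 1u)) {
				dist = d;
				break;
			}
		}

		if (best_tap < 0 || dist > best_dist) {
			best_tap = (int)t;
			best_dist = dist;
		}
	}

	return best_tap;
}
```

For the snapshot above with 8 TAPs, treating TAP2 as unstable (mask 0x8b)
yields TAP0, while treating TAP2 as stable (mask 0x8f) yields TAP1.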

> root@smarc-rzg3s:~# md5sum out test
> b053723af63801e665959d48cb7bd8e6  out
> b053723af63801e665959d48cb7bd8e6  test
> 
> Do yo consider this enough?

Yes, if done 100 times ;)

I hope this mail was helpful?

Thanks and happy hacking,

   Wolfram
Claudiu Feb. 8, 2024, 4:38 p.m. UTC | #2
Hi, Wolfram,

On 08.02.2024 02:56, Wolfram Sang wrote:
> Hi Claudiu,
> 
> I got more information about SMPCMP now. I had a misunderstanding there.
> According to your patch description, you might have the same
> misunderstanding? Let me quote again:
> 
> ===
> RZ hardware manual are similar on this chapter), at the time of tuning,
> data is captured by the previous and next TAPs and the result is stored in
> the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
> ===
> 
> It is not the previous and next TAP but the previous and next clock
> cycle using the *same* TAP. And the bits in the register describe if
> there was a mismatch in the data bits across these clock cycles.

That's something new for me, it's not described in HW manual (or at least I
haven't found it).

> 
> So, we really want SMPCMP to be 0 because the data should be stable
> across all three clock cycles of the same TAP.

So, it means the issue must be somewhere else in my setup.

> 
>> As of my understanding the TAP where cmpngu = 0x0e and cmpngd=0x0e is not
>> considered change point of the input data. For that to happen it would mean
>> that cmpngu != cmpngd.
> 
> I am not sure you can assume that cmpngu != cmpngd is always true for a
> change point. I'd think it is likely often the case. But always? I am
> not convinced. 

That was my understanding from the HW manual, and since it fixed my issue I
considered it valid at the point I wrote this statement. Maybe we need to
understand this?

> But I am convinced that if SMPCMP is 0, this is a good
> TAP because it was stable over these clock cycles.
> 
>> From this snapshot, datasheet and our discussions:
>>
>> i=0, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=1, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=2, cmpngu=0000000e, cmpngd=0000000e, smpcmp=000e000e
>> i=3, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> *i=4, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002*
>> *i=5, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff*
>> *i=6, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000*
>> i=7, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=8, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=9, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=10, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=11, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> *i=12, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002*
>> *i=13, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff*
>> *i=14, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000*
>> i=15, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>>
>> I understand that TAP4,5,6 are change point of the input data and
>> TAP8,0,1,2,3 are candidates for being selected, TAP 1,2 being the best
>> (please correct me if I'm wrong).
> 
> I agree that TAP4-6 are the change point. TAP2 could be a candidate. I
> dunno why SMPCMP is non-zero at i == 2, maybe some glitch due to noise
> on the board?

Hm... it's worth considering...

> 
> I do really wonder why probing failed, though? TAP1 sounds like a good
> choice as well. I mean we consider SMPCMP only if all TAPs are good. So,
> if probing fails, that means that SMPCMP was non-zero all the time?

Yes, that was my finding as well on my setup, which led to this patch.

If we take as an example the snapshot I posted in a previous email and
leave this patch aside, the code at [1] should clear the bit for TAP2 in
the smpcmp mask because in the 1st round SMPCMP was not zero (but
0x000e000e) and in the 2nd round it was zero.
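
Roughly, the merging of the two rounds can be sketched as below. This is a
simplified user-space illustration and not the driver code: merge_rounds()
is a hypothetical name, and the bitmap (one bit per tuning iteration,
2 * tap_num bits) is assumed to fit in 32 bits.

```c
#include <stdint.h>

/*
 * Keep a TAP set only if it was set in both rounds, then mirror the
 * result into both halves of the 2 * tap_num bitmap.
 * Assumes tap_num <= 16 so that both halves fit in 32 bits.
 */
static uint32_t merge_rounds(uint32_t bits, unsigned int tap_num)
{
	uint32_t low_mask = (1u << tap_num) - 1;
	uint32_t both = bits & (bits >> tap_num) & low_mask;

	return both | (both << tap_num);
}
```

With the zero check, the snapshot gives bits 0,1,3,7 in the 1st round and
bits 8,9,10,11,15 in the 2nd round (0x8f8b); after merging, the TAP2 bit is
cleared in both halves (0x8b8b).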

[1]
https://elixir.bootlin.com/linux/latest/source/drivers/mmc/host/renesas_sdhi_core.c#L629

> 
> That being said, our code to select the best TAP from SMPCMP is really
> not considering the change point :( It just picks the first one where
> SMPCMP is 0.

Hm... I thought the code at [2] selects the TAP in the middle (in the
snapshot I pointed to, TAP1).

[2]
https://elixir.bootlin.com/linux/latest/source/drivers/mmc/host/renesas_sdhi_core.c#L656
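
What I mean by "the TAP in the middle" is sketched below. It is a
simplified user-space illustration under my own assumptions, not the driver
code: pick_middle_tap() is a hypothetical name, the bitmap holds
2 * tap_num bits (both rounds already merged), and tap_num is at most 16.

```c
#include <stdint.h>

/*
 * Find the longest run of set bits in a 2 * tap_num bitmap and return
 * the TAP in the middle of that run, or -1 if no bit is set.
 */
static int pick_middle_tap(uint32_t bits, unsigned int tap_num)
{
	unsigned int nbits = 2 * tap_num;
	int best_start = -1, best_len = 0;
	int run_start = -1;

	/* The extra iteration at i == nbits closes a run ending at the top. */
	for (unsigned int i = 0; i <= nbits; i++) {
		int set = i < nbits && ((bits >> i) & 1u);

		if (set) {
			if (run_start < 0)
				run_start = (int)i;
		} else if (run_start >= 0) {
			int len = (int)i - run_start;

			if (len > best_len) {
				best_len = len;
				best_start = run_start;
			}
			run_start = -1;
		}
	}

	if (best_len == 0)
		return -1;

	/* Middle of the run, folded back onto the real TAP range. */
	return ((best_start + (best_start + best_len - 1)) / 2) % (int)tap_num;
}
```

With TAP2 counted as good (bitmap 0x8f8f) the longest run is bits 7..11 and
the result is TAP1; with TAP2 cleared (bitmap 0x8b8b) the longest run is
bits 7..9 and the result is TAP0.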


> We are not checking where the change point is and try to be
> as far away as possible.
> 
>> root@smarc-rzg3s:~# md5sum out test
>> b053723af63801e665959d48cb7bd8e6  out
>> b053723af63801e665959d48cb7bd8e6  test
>>
>> Do yo consider this enough?
> 
> Yes, if done 100 times ;)

This may take a while...

> 
> I hope this mail was helpful?

The tuning procedure is better understood now. But I'm not sure in which
direction I should dig further... :)

Thank you for details and patience,
Claudiu Beznea

> 
> Thanks and happy hacking,
> 
>    Wolfram
>

Patch

diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index c675dec587ef..8871521e1274 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -18,6 +18,7 @@ 
  *
  */
 
+#include <linux/bitfield.h>
 #include <linux/clk.h>
 #include <linux/delay.h>
 #include <linux/iopoll.h>
@@ -312,6 +313,8 @@  static int renesas_sdhi_start_signal_voltage_switch(struct mmc_host *mmc,
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_REQDOWN	BIT(8)
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_REQUP	BIT(24)
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_ERR	(BIT(8) | BIT(24))
+#define SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGU_DATA	GENMASK(23, 16)
+#define SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGD_DATA	GENMASK(7, 0)
 
 #define SH_MOBILE_SDHI_SCC_TMPPORT2_HS400OSEL	BIT(4)
 #define SH_MOBILE_SDHI_SCC_TMPPORT2_HS400EN	BIT(31)
@@ -703,11 +706,18 @@  static int renesas_sdhi_execute_tuning(struct mmc_host *mmc, u32 opcode)
 		/* Set sampling clock position */
 		sd_scc_write32(host, priv, SH_MOBILE_SDHI_SCC_TAPSET, i % priv->tap_num);
 
-		if (mmc_send_tuning(mmc, opcode, &cmd_error) == 0)
-			set_bit(i, priv->taps);
+		if (mmc_send_tuning(mmc, opcode, &cmd_error) == 0) {
+			u32 val, cmpngu_data, cmpngd_data;
+
+			val = sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP);
+			cmpngu_data = FIELD_GET(SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGU_DATA, val);
+			cmpngd_data = FIELD_GET(SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGD_DATA, val);
+
+			if (cmpngu_data == cmpngd_data)
+				set_bit(i, priv->smpcmp);
 
-		if (sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP) == 0)
-			set_bit(i, priv->smpcmp);
+			set_bit(i, priv->taps);
+		}
 
 		if (cmd_error)
 			mmc_send_abort_tuning(mmc, opcode);