diff mbox series

[v1,1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for BlueField-3 SoC

Message ID 6082b74cbc681e8c24354828941361f4f4294242.1700315051.git.limings@nvidia.com
State New

Commit Message

Liming Sun Nov. 18, 2023, 1:46 p.m. UTC
This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
intermittent eMMC timeout issue reported on some cards under eMMC
stress test.

Reported error message:
  dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110

Signed-off-by: Liming Sun <limings@nvidia.com>
---
 drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Adrian Hunter Nov. 20, 2023, 6:49 a.m. UTC | #1
On 18/11/23 15:46, Liming Sun wrote:
> This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
> intermittent eMMC timeout issue reported on some cards under eMMC
> stress test.
> 
> Reported error message:
>   dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110

Were you able to determine the root cause?  For example,
is the host controller timeout correct, is the eMMC
providing correct timeout values, is the mmc subsystem
calculating a correct value, is sdhci programming a correct
value?

If there are problems outside the host controller then we
need to address them also.

> 
> Signed-off-by: Liming Sun <limings@nvidia.com>

Fixes tag?

> ---
>  drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
> index 3a3bae6948a8..3c8fe8aec558 100644
> --- a/drivers/mmc/host/sdhci-of-dwcmshc.c
> +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
> @@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
>  #ifdef CONFIG_ACPI
>  static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
>  	.ops = &sdhci_dwcmshc_ops,
> -	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
> +	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
> +		  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
>  	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
>  		   SDHCI_QUIRK2_ACMD23_BROKEN,
>  };
Liming Sun Nov. 20, 2023, 3:18 p.m. UTC | #2
> -----Original Message-----
> From: Adrian Hunter <adrian.hunter@intel.com>
> Sent: Monday, November 20, 2023 1:49 AM
> To: Liming Sun <limings@nvidia.com>; Ulf Hansson <ulf.hansson@linaro.org>;
> David Thompson <davthompson@nvidia.com>
> Cc: linux-mmc@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
> BlueField-3 SoC
> 
> On 18/11/23 15:46, Liming Sun wrote:
> > This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
> > intermittent eMMC timeout issue reported on some cards under eMMC
> > stress test.
> >
> > Reported error message:
> >   dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110
> 
> Were you able to determine the root cause?  For example,
> is the host controller timeout correct, is the eMMC
> providing correct timeout values, is the mmc subsystem
> calculating a correct value, is sdhci programming a correct
> value?
> 
> If there are problems outside the host controller then we
> need to address them also.

It is caused by the host controller timeout, but it is hard to tell whether the
configuration provided by the card is good enough, since the failure is
intermittent under stress test and the SoC needs to work with eMMC parts from
different vendors. The UEFI eMMC driver uses a similar maximum timeout (0xe) to
avoid this issue. This commit tries to use the existing quirk, which I think would
work if there is another way to adjust the TOUT_CNT register. Any concerns or suggestions?

> 
> >
> > Signed-off-by: Liming Sun <limings@nvidia.com>
> 
> Fixes tag?

Will update it in v2.

> 
> > ---
> >  drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
> > index 3a3bae6948a8..3c8fe8aec558 100644
> > --- a/drivers/mmc/host/sdhci-of-dwcmshc.c
> > +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
> > @@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
> >  #ifdef CONFIG_ACPI
> >  static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
> >  	.ops = &sdhci_dwcmshc_ops,
> > -	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
> > +	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
> > +		  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
> >  	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
> >  		   SDHCI_QUIRK2_ACMD23_BROKEN,
> >  };
Adrian Hunter Nov. 21, 2023, 8:08 a.m. UTC | #3
On 20/11/23 17:18, Liming Sun wrote:
> 
> 
>> -----Original Message-----
>> From: Adrian Hunter <adrian.hunter@intel.com>
>> Sent: Monday, November 20, 2023 1:49 AM
>> To: Liming Sun <limings@nvidia.com>; Ulf Hansson <ulf.hansson@linaro.org>;
>> David Thompson <davthompson@nvidia.com>
>> Cc: linux-mmc@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
>> BlueField-3 SoC
>>
>> On 18/11/23 15:46, Liming Sun wrote:
>>> This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
>>> intermittent eMMC timeout issue reported on some cards under eMMC
>>> stress test.
>>>
>>> Reported error message:
>>>   dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110
>>
>> Were you able to determine the root cause?  For example,
>> is the host controller timeout correct, is the eMMC
>> providing correct timeout values, is the mmc subsystem
>> calculating a correct value, is sdhci programming a correct
>> value?
>>
>> If there are problems outside the host controller then we
>> need to address them also.
> 
> It is caused by the host controller timeout, but is hard to tell whether the
> configuration provided by the card is good enough since it's
> intermittent under stress test the SoC needs to work with different eMMC vendors. 
> In UEFI eMMC driver similar max timeout (0xe) is used to avoid such
> issue. This commit tries to use existing quirk, which I think that it would work 
> if there is another way to adjust the TOUT_CNT register. Any concern or suggestions?

If cards are providing timeout values that are too low under stress,
it would be better to fix it in the mmc subsystem so that all host
controllers can benefit.

> 
>>
>>>
>>> Signed-off-by: Liming Sun <limings@nvidia.com>
>>
>> Fixes tag?
> 
> Will update it in v2.
> 
>>
>>> ---
>>>  drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
>>> index 3a3bae6948a8..3c8fe8aec558 100644
>>> --- a/drivers/mmc/host/sdhci-of-dwcmshc.c
>>> +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
>>> @@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
>>>  #ifdef CONFIG_ACPI
>>>  static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
>>>  	.ops = &sdhci_dwcmshc_ops,
>>> -	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
>>> +	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
>>> +		  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
>>>  	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
>>>  		   SDHCI_QUIRK2_ACMD23_BROKEN,
>>>  };
>
Christian Loehle Nov. 27, 2023, 1:36 p.m. UTC | #4
On 18/11/2023 13:46, Liming Sun wrote:
> This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
> intermittent eMMC timeout issue reported on some cards under eMMC
> stress test.
> 
> Reported error message:
>   dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110
> 
> Signed-off-by: Liming Sun <limings@nvidia.com>
> ---
>  drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
> index 3a3bae6948a8..3c8fe8aec558 100644
> --- a/drivers/mmc/host/sdhci-of-dwcmshc.c
> +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
> @@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
>  #ifdef CONFIG_ACPI
>  static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
>  	.ops = &sdhci_dwcmshc_ops,
> -	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
> +	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
> +		  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
>  	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
>  		   SDHCI_QUIRK2_ACMD23_BROKEN,
>  };

__mmc_blk_ioctl_cmd: data error ?
What stresstest are you running that issues ioctl commands?
On which commands does the timeout occur?
Anyway you should be able to increase the timeout in ioctl structure
directly, i.e. in userspace, or does that not work?
Liming Sun Nov. 30, 2023, 1:19 p.m. UTC | #5
> -----Original Message-----
> From: Christian Loehle <christian.loehle@arm.com>
> Sent: Monday, November 27, 2023 8:36 AM
> To: Liming Sun <limings@nvidia.com>; Adrian Hunter
> <adrian.hunter@intel.com>; Ulf Hansson <ulf.hansson@linaro.org>; David
> Thompson <davthompson@nvidia.com>
> Cc: linux-mmc@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
> BlueField-3 SoC
> 
> On 18/11/2023 13:46, Liming Sun wrote:
> > This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
> > intermittent eMMC timeout issue reported on some cards under eMMC
> > stress test.
> >
> > Reported error message:
> >   dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110
> >
> > Signed-off-by: Liming Sun <limings@nvidia.com>
> > ---
> >  drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
> > index 3a3bae6948a8..3c8fe8aec558 100644
> > --- a/drivers/mmc/host/sdhci-of-dwcmshc.c
> > +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
> > @@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
> >  #ifdef CONFIG_ACPI
> >  static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
> >  	.ops = &sdhci_dwcmshc_ops,
> > -	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
> > +	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
> > +		  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
> >  	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
> >  		   SDHCI_QUIRK2_ACMD23_BROKEN,
> >  };
> 
> __mmc_blk_ioctl_cmd: data error ?
> What stresstest are you running that issues ioctl commands?
> On which commands does the timeout occur?
> Anyway you should be able to increase the timeout in ioctl structure
> directly, i.e. in userspace, or does that not work?

It's running a stress test with a tool like "fio --name=randrw_stress_round_1 --ioengine=libaio --direct=1 --time_based=1 --end_fsync=1 --ramp_time=5 --norandommap=1 --randrepeat=0 --group_reporting=1 --numjobs=4 --iodepth=128 --rw=randrw --overwrite=1 --runtime=36000 --bssplit=4K/44:8K/1:12K/1:16K/1:24K/1:28K/1:32K/1:40K/32:64K/5:68K/7:72K/3:76K/3 --filename=/dev/mmcblk0"
The tool (application) is owned by the user, or it is some standard tool.
Adrian Hunter Dec. 11, 2023, 11:38 a.m. UTC | #6
On 30/11/23 15:19, Liming Sun wrote:
> 
> 
>> -----Original Message-----
>> From: Christian Loehle <christian.loehle@arm.com>
>> Sent: Monday, November 27, 2023 8:36 AM
>> To: Liming Sun <limings@nvidia.com>; Adrian Hunter
>> <adrian.hunter@intel.com>; Ulf Hansson <ulf.hansson@linaro.org>; David
>> Thompson <davthompson@nvidia.com>
>> Cc: linux-mmc@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
>> BlueField-3 SoC
>>
>> On 18/11/2023 13:46, Liming Sun wrote:
>>> This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
>>> intermittent eMMC timeout issue reported on some cards under eMMC
>>> stress test.
>>>
>>> Reported error message:
>>>   dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110
>>>
>>> Signed-off-by: Liming Sun <limings@nvidia.com>
>>> ---
>>>  drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
>>> index 3a3bae6948a8..3c8fe8aec558 100644
>>> --- a/drivers/mmc/host/sdhci-of-dwcmshc.c
>>> +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
>>> @@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
>>>  #ifdef CONFIG_ACPI
>>>  static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
>>>  	.ops = &sdhci_dwcmshc_ops,
>>> -	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
>>> +	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
>>> +		  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
>>>  	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
>>>  		   SDHCI_QUIRK2_ACMD23_BROKEN,
>>>  };
>>
>> __mmc_blk_ioctl_cmd: data error ?
>> What stresstest are you running that issues ioctl commands?
>> On which commands does the timeout occur?
>> Anyway you should be able to increase the timeout in ioctl structure
>> directly, i.e. in userspace, or does that not work?
> 
> It's running stress test with tool like "fio --name=randrw_stress_round_1 --ioengine=libaio --direct=1 --time_based=1 --end_fsync=1 --ramp_time=5 --norandommap=1 --randrepeat=0 --group_reporting=1 --numjobs=4 --iodepth=128 --rw=randrw --overwrite=1 --runtime=36000 --bssplit=4K/44:8K/1:12K/1:16K/1:24K/1:28K/1:32K/1:40K/32:64K/5:68K/7:72K/3:76K/3 --filename=/dev/mmcblk0"
> The tool(application) is owned by user or with some standard tool.

fio does not send mmc ioctls, so I am also a bit confused about
how you get "__mmc_blk_ioctl_cmd: data error -110" ?
Liming Sun Dec. 19, 2023, 9:18 p.m. UTC | #7
> -----Original Message-----
> From: Adrian Hunter <adrian.hunter@intel.com>
> Sent: Monday, December 11, 2023 6:39 AM
> To: Liming Sun <limings@nvidia.com>; Christian Loehle
> <christian.loehle@arm.com>; Ulf Hansson <ulf.hansson@linaro.org>; David
> Thompson <davthompson@nvidia.com>
> Cc: linux-mmc@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
> BlueField-3 SoC
> 
> On 30/11/23 15:19, Liming Sun wrote:
> >
> >
> >> -----Original Message-----
> >> From: Christian Loehle <christian.loehle@arm.com>
> >> Sent: Monday, November 27, 2023 8:36 AM
> >> To: Liming Sun <limings@nvidia.com>; Adrian Hunter
> >> <adrian.hunter@intel.com>; Ulf Hansson <ulf.hansson@linaro.org>; David
> >> Thompson <davthompson@nvidia.com>
> >> Cc: linux-mmc@vger.kernel.org; linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
> >> BlueField-3 SoC
> >>
> >> On 18/11/2023 13:46, Liming Sun wrote:
> >>> This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
> >>> intermittent eMMC timeout issue reported on some cards under eMMC
> >>> stress test.
> >>>
> >>> Reported error message:
> >>>   dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110
> >>>
> >>> Signed-off-by: Liming Sun <limings@nvidia.com>
> >>> ---
> >>>  drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
> >>>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
> >>> index 3a3bae6948a8..3c8fe8aec558 100644
> >>> --- a/drivers/mmc/host/sdhci-of-dwcmshc.c
> >>> +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
> >>> @@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
> >>>  #ifdef CONFIG_ACPI
> >>>  static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
> >>>  	.ops = &sdhci_dwcmshc_ops,
> >>> -	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
> >>> +	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
> >>> +		  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
> >>>  	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
> >>>  		   SDHCI_QUIRK2_ACMD23_BROKEN,
> >>>  };
> >>
> >> __mmc_blk_ioctl_cmd: data error ?
> >> What stresstest are you running that issues ioctl commands?
> >> On which commands does the timeout occur?
> >> Anyway you should be able to increase the timeout in ioctl structure
> >> directly, i.e. in userspace, or does that not work?
> >
> > It's running stress test with tool like "fio --name=randrw_stress_round_1 --ioengine=libaio --direct=1 --time_based=1 --end_fsync=1 --ramp_time=5 --norandommap=1 --randrepeat=0 --group_reporting=1 --numjobs=4 --iodepth=128 --rw=randrw --overwrite=1 --runtime=36000 --bssplit=4K/44:8K/1:12K/1:16K/1:24K/1:28K/1:32K/1:40K/32:64K/5:68K/7:72K/3:76K/3 --filename=/dev/mmcblk0"
> > The tool(application) is owned by user or with some standard tool.
> 
> fio does not send mmc ioctls, so I am also a bit confused about
> how you get "__mmc_blk_ioctl_cmd: data error -110" ?

There are other activities or background tasks going on. I assume it's other
MMC accesses which are affected by the fio stress test and time out. Would that make sense?
Adrian Hunter Jan. 4, 2024, 9:24 a.m. UTC | #8
On 19/12/23 23:18, Liming Sun wrote:
> 
> 
>> -----Original Message-----
>> From: Adrian Hunter <adrian.hunter@intel.com>
>> Sent: Monday, December 11, 2023 6:39 AM
>> To: Liming Sun <limings@nvidia.com>; Christian Loehle
>> <christian.loehle@arm.com>; Ulf Hansson <ulf.hansson@linaro.org>; David
>> Thompson <davthompson@nvidia.com>
>> Cc: linux-mmc@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
>> BlueField-3 SoC
>>
>> On 30/11/23 15:19, Liming Sun wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Christian Loehle <christian.loehle@arm.com>
>>>> Sent: Monday, November 27, 2023 8:36 AM
>>>> To: Liming Sun <limings@nvidia.com>; Adrian Hunter
>>>> <adrian.hunter@intel.com>; Ulf Hansson <ulf.hansson@linaro.org>; David
>>>> Thompson <davthompson@nvidia.com>
>>>> Cc: linux-mmc@vger.kernel.org; linux-kernel@vger.kernel.org
>>>> Subject: Re: [PATCH v1 1/1] mmc: sdhci-of-dwcmshc: Enable timeout quirk for
>>>> BlueField-3 SoC
>>>>
>>>> On 18/11/2023 13:46, Liming Sun wrote:
>>>>> This commit enables SDHCI_QUIRK_BROKEN_TIMEOUT_VAL to solve the
>>>>> intermittent eMMC timeout issue reported on some cards under eMMC
>>>>> stress test.
>>>>>
>>>>> Reported error message:
>>>>>   dwcmshc MLNXBF30:00: __mmc_blk_ioctl_cmd: data error -110
>>>>>
>>>>> Signed-off-by: Liming Sun <limings@nvidia.com>
>>>>> ---
>>>>>  drivers/mmc/host/sdhci-of-dwcmshc.c | 3 ++-
>>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
>>>>> index 3a3bae6948a8..3c8fe8aec558 100644
>>>>> --- a/drivers/mmc/host/sdhci-of-dwcmshc.c
>>>>> +++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
>>>>> @@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
>>>>>  #ifdef CONFIG_ACPI
>>>>>  static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
>>>>>  	.ops = &sdhci_dwcmshc_ops,
>>>>> -	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
>>>>> +	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
>>>>> +		  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
>>>>>  	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
>>>>>  		   SDHCI_QUIRK2_ACMD23_BROKEN,
>>>>>  };
>>>>
>>>> __mmc_blk_ioctl_cmd: data error ?
>>>> What stresstest are you running that issues ioctl commands?
>>>> On which commands does the timeout occur?
>>>> Anyway you should be able to increase the timeout in ioctl structure
>>>> directly, i.e. in userspace, or does that not work?
>>>
>>> It's running stress test with tool like "fio --name=randrw_stress_round_1 --ioengine=libaio --direct=1 --time_based=1 --end_fsync=1 --ramp_time=5 --norandommap=1 --randrepeat=0 --group_reporting=1 --numjobs=4 --iodepth=128 --rw=randrw --overwrite=1 --runtime=36000 --bssplit=4K/44:8K/1:12K/1:16K/1:24K/1:28K/1:32K/1:40K/32:64K/5:68K/7:72K/3:76K/3 --filename=/dev/mmcblk0"
>>> The tool(application) is owned by user or with some standard tool.
>>
>> fio does not send mmc ioctls, so I am also a bit confused about
>> how you get "__mmc_blk_ioctl_cmd: data error -110" ?
> 
> There are other activities or background task going on. I assume it's other
> MMC access which are affected by the stress FIO and got timeout. Would it make sense?
> 

It depends on whether the IOCTL is overriding the timeout.  In
struct mmc_ioc_cmd there is data_timeout_ns which overrides the
mmc core data timeout calculated by mmc_set_data_timeout().  There
is also cmd_timeout_ms for commands.  You need to check whether
"__mmc_blk_ioctl_cmd: data error -110" is because data_timeout_ns
was set too low (but non-zero) by the caller of the IOCTL.
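
The override rule described in this last reply can be sketched in a few lines
of C. This is only an illustrative model, not the kernel's code: it assumes the
Linux UAPI header <linux/mmc/ioctl.h> is available, and the helper names
effective_data_timeout_ns() and caller_shortened_timeout() as well as the
core_default_ns parameter (standing in for whatever mmc_set_data_timeout()
computed) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/mmc/ioctl.h>	/* struct mmc_ioc_cmd (UAPI header) */

/*
 * Hypothetical helper: models "a non-zero data_timeout_ns from the ioctl
 * caller replaces the value calculated by the mmc core".
 * core_default_ns stands in for the result of mmc_set_data_timeout().
 */
static unsigned int effective_data_timeout_ns(const struct mmc_ioc_cmd *ic,
					      unsigned int core_default_ns)
{
	assert(ic != NULL);
	return ic->data_timeout_ns ? ic->data_timeout_ns : core_default_ns;
}

/*
 * Hypothetical check: returns 1 when the caller supplied a non-zero
 * override that is shorter than the core-calculated timeout, which is
 * the case worth ruling out for the reported -110 error.
 */
static int caller_shortened_timeout(const struct mmc_ioc_cmd *ic,
				    unsigned int core_default_ns)
{
	assert(ic != NULL);
	return ic->data_timeout_ns != 0 &&
	       ic->data_timeout_ns < core_default_ns;
}
```

If the tool issuing the ioctl leaves data_timeout_ns at zero, the core value
applies and the caller is not the cause; a small non-zero value would point at
the userspace caller rather than the host controller.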

Patch

diff --git a/drivers/mmc/host/sdhci-of-dwcmshc.c b/drivers/mmc/host/sdhci-of-dwcmshc.c
index 3a3bae6948a8..3c8fe8aec558 100644
--- a/drivers/mmc/host/sdhci-of-dwcmshc.c
+++ b/drivers/mmc/host/sdhci-of-dwcmshc.c
@@ -365,7 +365,8 @@ static const struct sdhci_pltfm_data sdhci_dwcmshc_pdata = {
 #ifdef CONFIG_ACPI
 static const struct sdhci_pltfm_data sdhci_dwcmshc_bf3_pdata = {
 	.ops = &sdhci_dwcmshc_ops,
-	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
+	.quirks = SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN |
+		  SDHCI_QUIRK_BROKEN_TIMEOUT_VAL,
 	.quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
 		   SDHCI_QUIRK2_ACMD23_BROKEN,
 };
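
For context on what the added quirk changes, here is a rough, simplified model
of how an SDHCI data timeout count maps to time. It assumes the standard SDHCI
encoding, where count N means 2^(13 + N) timeout-clock cycles and 0xE is the
largest count (the value mentioned in the thread). The function names and the
1 MHz timeout clock in the usage note are illustrative assumptions, not values
taken from the BlueField-3 hardware or from the sdhci driver source.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TIMEOUT_COUNT 0xEu	/* largest count; 0xe as named in the thread */

/* Timeout in microseconds for a given count: 2^(13 + count) clock cycles. */
static uint64_t timeout_us_for_count(unsigned int count,
				     uint32_t timeout_clk_khz)
{
	assert(timeout_clk_khz != 0 && count <= MAX_TIMEOUT_COUNT);
	return ((uint64_t)1 << (13 + count)) * 1000 / timeout_clk_khz;
}

/*
 * Pick the smallest count whose timeout covers target_us, capped at the
 * maximum. When broken_timeout_val is set (modelling
 * SDHCI_QUIRK_BROKEN_TIMEOUT_VAL), skip the calculation and always use
 * the maximum count.
 */
static unsigned int pick_timeout_count(uint64_t target_us,
				       uint32_t timeout_clk_khz,
				       int broken_timeout_val)
{
	unsigned int count = 0;

	if (broken_timeout_val)
		return MAX_TIMEOUT_COUNT;

	while (count < MAX_TIMEOUT_COUNT &&
	       timeout_us_for_count(count, timeout_clk_khz) < target_us)
		count++;

	return count;
}
```

With an assumed 1 MHz timeout clock, a 100 ms target selects count 4 (about
131 ms) without the quirk, while the quirk selects 0xE (about 134 s). That gap
is why the review asks whether the requested timeout itself is too low before
the controller is told to ignore it.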