mmc: block: Dont report successful writes with errors

Message ID	ca06b94aa48a484d965744e64e17a4ef@hyperstone.com
State	New
Headers	show Return-Path: <linux-mmc-owner@kernel.org> From: Christian Loehle <CLoehle@hyperstone.com> To: "ulf.hansson@linaro.org" <ulf.hansson@linaro.org>, Avri Altman <Avri.Altman@wdc.com>, Adrian Hunter <adrian.hunter@intel.com>, "andriy.shevchenko@linux.intel.com" <andriy.shevchenko@linux.intel.com>, Linux MMC List <linux-mmc@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org> Subject: [PATCH] mmc: block: Dont report successful writes with errors Thread-Topic: [PATCH] mmc: block: Dont report successful writes with errors Thread-Index: Adibg9VERWQQSIdlTuivPV3k9XDqxw== Date: Tue, 19 Jul 2022 15:34:07 +0000 Message-ID: <ca06b94aa48a484d965744e64e17a4ef@hyperstone.com> Accept-Language: en-US, de-DE Content-Language: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Precedence: bulk
Series	mmc: block: Dont report successful writes with errors \| expand mmc: block: Dont report successful writes with errors

Christian Loehle July 19, 2022, 3:34 p.m. UTC

Be as conservative about successful write reporting to the
block layer for SPI as with normal SD and MMC.
That means on any errors bytes_xfered is ignored and the
whole write must be repeated.

Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
---
 drivers/mmc/core/block.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Adrian Hunter July 23, 2022, 7:42 a.m. UTC | #1

On 19/07/22 18:34, Christian Loehle wrote:
> Be as conservative about successful write reporting to the
> block layer for SPI as with normal SD and MMC.
> That means on any errors bytes_xfered is ignored and the
> whole write must be repeated.
> 
> Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
> ---
>  drivers/mmc/core/block.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index f4a1281658db..63d1c05582a9 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -1765,8 +1765,12 @@ static bool mmc_blk_status_error(struct request *req, u32 status)
>  	struct mmc_queue *mq = req->q->queuedata;
>  	u32 stop_err_bits;
>  
> +	/*
> +	 * Either write timed out during busy and data->error is set
> +	 * or we actually received a valid R2 and check for error bits.
> +	 */
>  	if (mmc_host_is_spi(mq->card->host))
> -		return false;
> +		return brq->data.error || !!status;

This function is for checking status, so brq->data.error does not
belong here.  Also it would be more readable to use a define e.g.

		return status & SPI_R2_ERRORS;

I think clearing bytes_xfered for SPI brq->data.error should be a
separate patch, and you would need to explain a bit more for that
case too.

>  
>  	stop_err_bits = mmc_blk_stop_err_bits(brq);
>

Christian Loehle July 23, 2022, 3:08 p.m. UTC | #2

>On 19/07/22 18:34, Christian Loehle wrote:
>> Be as conservative about successful write reporting to the block layer 
>> for SPI as with normal SD and MMC.
>> That means on any errors bytes_xfered is ignored and the whole write 
>> must be repeated.
>> 
>> Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
>> ---
>>  drivers/mmc/core/block.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 
>> f4a1281658db..63d1c05582a9 100644
>> --- a/drivers/mmc/core/block.c
>> +++ b/drivers/mmc/core/block.c
>> @@ -1765,8 +1765,12 @@ static bool mmc_blk_status_error(struct request *req, u32 status)
>>  	struct mmc_queue *mq = req->q->queuedata;
>>  	u32 stop_err_bits;
>>  
>> +	/*
>> +	 * Either write timed out during busy and data->error is set
>> +	 * or we actually received a valid R2 and check for error bits.
>> +	 */
>>  	if (mmc_host_is_spi(mq->card->host))
>> -		return false;
>> +		return brq->data.error || !!status;
>
>This function is for checking status, so brq->data.error does not belong here.  Also it would be more readable to use a define e.g.
>
>		return status & SPI_R2_ERRORS;
>
>I think clearing bytes_xfered for SPI brq->data.error should be a separate patch, and you would need to explain a bit more for that case too.

I understand that, but there is no way of checking status in SPI mode.
The behavior of mmc block is only changed in a minor way here anyway, that is, checking for status is done one more time than before.
If brq->data.error is set directly after a write e.g. then bytes_xfered is already 0.
My intention was mostly to improve on the flow of the recovery and get the mmc_is_host_spi out of there for the most part with future patches.
After all it feels weird to do a single step read retry before ensuring a fix_state, and I ran into that quite often.
Unfortunately, I now realized that fix_state cannot properly be implemented within the spec or even real-world card's behavior and I won't be taking this further.
The best attempt I came up with is doing a loop of CMD12 and CMD13 in SPI and if CMD12 was ILLEGAL and CMD12 has no bits set, state is fixed.
But CMD12 is only defined for multiple block transfers in SPI and cards treat it differently on e.g. CMD17 transfers.
Instead I would just do a soft reset for SPI and retry and maybe increase the read timeout of 100ms which larger SD cards can fail sometimes.
Anyway since SPI initialization is quite fast, especially for soft resets there is likely no recovery to beat that performance-wise.
I will send an RFC for the soft reset in the coming days.
If not I would at least add the !mmc_is_host_spi condition for calling mmc_blk_status_error to make it a bit more clear that this function does do what is intended for SPI cards.

Regards,
Christian

Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz
Managing Director: Dr. Jan Peter Berns.
Commercial register of local courts: Freiburg HRB381782

Adrian Hunter July 29, 2022, 10:17 a.m. UTC | #3

On 23/07/22 18:08, Christian Loehle wrote:
>> On 19/07/22 18:34, Christian Loehle wrote:
>>> Be as conservative about successful write reporting to the block layer 
>>> for SPI as with normal SD and MMC.
>>> That means on any errors bytes_xfered is ignored and the whole write 
>>> must be repeated.
>>>
>>> Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
>>> ---
>>>  drivers/mmc/core/block.c | 6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 
>>> f4a1281658db..63d1c05582a9 100644
>>> --- a/drivers/mmc/core/block.c
>>> +++ b/drivers/mmc/core/block.c
>>> @@ -1765,8 +1765,12 @@ static bool mmc_blk_status_error(struct request *req, u32 status)
>>>  	struct mmc_queue *mq = req->q->queuedata;
>>>  	u32 stop_err_bits;
>>>  
>>> +	/*
>>> +	 * Either write timed out during busy and data->error is set
>>> +	 * or we actually received a valid R2 and check for error bits.
>>> +	 */
>>>  	if (mmc_host_is_spi(mq->card->host))
>>> -		return false;
>>> +		return brq->data.error || !!status;
>>
>> This function is for checking status, so brq->data.error does not belong here.  Also it would be more readable to use a define e.g.
>>
>> 		return status & SPI_R2_ERRORS;
>>
>> I think clearing bytes_xfered for SPI brq->data.error should be a separate patch, and you would need to explain a bit more for that case too.
> 
> I understand that, but there is no way of checking status in SPI mode.
> The behavior of mmc block is only changed in a minor way here anyway, that is, checking for status is done one more time than before.
> If brq->data.error is set directly after a write e.g. then bytes_xfered is already 0.

The expectation is that the driver sets bytes_xfered correctly,
based controller errors.  The driver is not expected to check for
status errors, hence in that case the bytes_xfered is set to 0 by
error recovery.

> My intention was mostly to improve on the flow of the recovery and get the mmc_is_host_spi out of there for the most part with future patches.
> After all it feels weird to do a single step read retry before ensuring a fix_state, and I ran into that quite often.
> Unfortunately, I now realized that fix_state cannot properly be implemented within the spec or even real-world card's behavior and I won't be taking this further.
> The best attempt I came up with is doing a loop of CMD12 and CMD13 in SPI and if CMD12 was ILLEGAL and CMD12 has no bits set, state is fixed.
> But CMD12 is only defined for multiple block transfers in SPI and cards treat it differently on e.g. CMD17 transfers.
> Instead I would just do a soft reset for SPI and retry and maybe increase the read timeout of 100ms which larger SD cards can fail sometimes.
> Anyway since SPI initialization is quite fast, especially for soft resets there is likely no recovery to beat that performance-wise.
> I will send an RFC for the soft reset in the coming days.

That sounds like it would be a good improvement to have.

> If not I would at least add the !mmc_is_host_spi condition for calling mmc_blk_status_error to make it a bit more clear that this function does do what is intended for SPI cards.

I am not sure what you mean.  Isn't it OK to check CMD13 response for SPI?

Christian Loehle July 29, 2022, 2:20 p.m. UTC | #4

>> If not I would at least add the !mmc_is_host_spi condition for calling mmc_blk_status_error to make it a bit more clear that this function does do what is intended for SPI cards.
>
>I am not sure what you mean.  Isn't it OK to check CMD13 response for SPI?

You can do that for sure, but e.g. without some knowledge about what state you're in it doesn't tell you a lot.
If you get all zeroes after a write e.g., you cannot always tell if the SPI card is holding the line LOW because of busy[*], or you actually got an SPI R2 with no error bits set. (The CMD12 = ILLEGAL assert would fix it, but now all cards behave this way and the spec doesn't mandate it.)
It cannot really be dealt with in a nice manner.
Furthermore cards are, according to spec, free to treat cmd13 as ILLEGAL during data state.
If so, that's nice for us, we get a 0x4 back and know we have to fix state, some cards also
accept CMD13 (no error bits set), perfectly legal, but we don't know if we should fix state or not.
(Furthermore, how to fix state is then dependent on the issued (e.g. timedout command)

*The SD SPI spec e.g. says in 7.2.8 that a CMD13 must ALWAYS be responded to, but that is clearly not the intention of the spec, a card always listening for CMD13 in RCV state simply doesn't make sense.

Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz
Managing Director: Dr. Jan Peter Berns.
Commercial register of local courts: Freiburg HRB381782

Adrian Hunter Aug. 8, 2022, 12:10 p.m. UTC | #5

On 29/07/22 17:20, Christian Loehle wrote:
> 
>>> If not I would at least add the !mmc_is_host_spi condition for calling mmc_blk_status_error to make it a bit more clear that this function does do what is intended for SPI cards.
>>
>> I am not sure what you mean.  Isn't it OK to check CMD13 response for SPI?
> 
> You can do that for sure, but e.g. without some knowledge about what state you're in it doesn't tell you a lot.
> If you get all zeroes after a write e.g., you cannot always tell if the SPI card is holding the line LOW because of busy[*], or you actually got an SPI R2 with no error bits set. (The CMD12 = ILLEGAL assert would fix it, but now all cards behave this way and the spec doesn't mandate it.)
> It cannot really be dealt with in a nice manner.
> Furthermore cards are, according to spec, free to treat cmd13 as ILLEGAL during data state.
> If so, that's nice for us, we get a 0x4 back and know we have to fix state, some cards also
> accept CMD13 (no error bits set), perfectly legal, but we don't know if we should fix state or not.
> (Furthermore, how to fix state is then dependent on the issued (e.g. timedout command)
> 
> *The SD SPI spec e.g. says in 7.2.8 that a CMD13 must ALWAYS be responded to, but that is clearly not the intention of the spec, a card always listening for CMD13 in RCV state simply doesn't make sense.

Note, my comments were structural, not functional:
	- use mmc_blk_status_error() to check the status only, since that is what that function is for,
	- make the brq->data.error check a separate patch since it needs its own explanation

WRT getting SPI error handling to work the way you need, that is up to you,
so long as there is a reasonable explanation.

mmc: block: Dont report successful writes with errors

Commit Message

Comments

Patch