diff mbox series

[2/2] mmc: core: Add no single read retries

Message ID 20210217052239.13780-3-dh0421.hwang@samsung.com
State New
Headers show
Series [1/2] dt-bindings: mmc: Add no-single-read-retry property | expand

Commit Message

DooHyun Hwang Feb. 17, 2021, 5:22 a.m. UTC
This makes to handle read errors faster by not retrying
multiple block read(CMD18) errors with single block reads(CMD17).

On some bad SD Cards that have problem with read operations,
it is not helpful to retry multiple block read errors with
several single block reads, and it is delayed to treat read
operations as I/O error as much as retrying single block reads.

Signed-off-by: DooHyun Hwang <dh0421.hwang@samsung.com>
---
 drivers/mmc/core/block.c | 3 ++-
 drivers/mmc/core/host.c  | 6 ++++++
 include/linux/mmc/host.h | 3 +++
 3 files changed, 11 insertions(+), 1 deletion(-)

Comments

Adrian Hunter Feb. 17, 2021, 6 a.m. UTC | #1
On 17/02/21 7:46 am, Adrian Hunter wrote:
> On 17/02/21 7:22 am, DooHyun Hwang wrote:
>> This makes to handle read errors faster by not retrying
>> multiple block read(CMD18) errors with single block reads(CMD17).
>>
>> On some bad SD Cards that have problem with read operations,
>> it is not helpful to retry multiple block read errors with
>> several single block reads, and it is delayed to treat read
>> operations as I/O error as much as retrying single block reads.
> 
> If the issue is that it takes too long, then maybe it would be better to get
> mmc_blk_read_single() to give up after a certain amount of time.
> 

So that a device property would not be needed I mean.  Then everyone would
benefit.
Ulf Hansson March 1, 2021, 8:23 a.m. UTC | #2
On Wed, 17 Feb 2021 at 09:46, DooHyun Hwang <dh0421.hwang@samsung.com> wrote:
>

>

> On 17/02/21 8:00 am, Adrian Hunter wrote:

> >On 17/02/21 7:46 am, Adrian Hunter wrote:

> >> On 17/02/21 7:22 am, DooHyun Hwang wrote:

> >>> This makes to handle read errors faster by not retrying multiple

> >>> block read(CMD18) errors with single block reads(CMD17).

> >>>

> >>> On some bad SD Cards that have problem with read operations, it is

> >>> not helpful to retry multiple block read errors with several single

> >>> block reads, and it is delayed to treat read operations as I/O error

> >>> as much as retrying single block reads.

> >>

> >> If the issue is that it takes too long, then maybe it would be better

> >> to get

> >> mmc_blk_read_single() to give up after a certain amount of time.

> >>

> >

> >So that a device property would not be needed I mean.  Then everyone would

> >benefit.


Just wanted to confirm with Adrian's points, that we don't want a
device property for this.

In fact, the DT maintainers would reject it because it would be
considered as a software configuration, which doesn't belong in DT.

>

> Thank you for reviewing this.

>

> mmc_blk_read_single() takes a different time depending on the number of

> sectors to read and the timeout value for each CMD.

>

> I think it's difficult to set the criteria for "a certain amount of time"

> you talked about.

> And it's harder to proceed with any errors caused by giving up in

> mmc_blk_read_single() than no retrying.

>

> So, I would like to add a configurable property to skip the single block

> read retrying because if multiple block read error occurs, single block

> read retrying doesn't help for some bad SD cards.


I certainly agree that falling back to single block reads is
questionable, at least for some cases. Moreover, I am pretty sure it's
not always the SD card that should be blamed, but a broken mmc host
driver or broken HW/controller.

That said, I assume that the main reason why we fall back to retry
with single block reads, is to try to recover as much data as possible
from a broken SD card. The intent is good, but to recover data from a
broken card, we should also consider to move to a lower/legacy speed
mode and to decrease the clock rate of the interface.

For the clock rate, we already have a debugfs node, allowing us to
change the rate per mmc host. I suggest we add a few more debugfs
nodes, allowing us to restrict the speed mode and to enable/disable
single/multi block read.

If we can get these things in place to help with recovery, I wouldn't
mind us changing the default behaviour to skip the single block read
in the recovery path.

>

> This is the log to check for this patch.

> #0. time difference is about 2.37s for 8 sectors between with(#1) and without(#2)

>      single block read retrying

>      This is a test for just one CMD18.

>      When there are many I/O requests, it takes too long to handle the errors.

>

> #1. retry multiple block read (8 sectors) error with single block reads

> // It takes about 3.585671s for the I/O error.

> // issue CMD23 (+ arg 0x8)

> // issue CMD18 (+ arg 0x000320e0) and error occurs

> <7>[  316.657115]  [5:   kworker/5:1H:  324] <mmc0: starting CMD23 arg 00000008 flags 00000015>

> <7>[  316.657124]  [5:   kworker/5:1H:  324] mmc0: starting CMD18 arg 000320e0 flags 000000b5

> <7>[  316.826302] I[0:      swapper/0:    0] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000

> <7>[  316.826327] I[0:      swapper/0:    0] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000

> <7>[  316.826362] I[0:      swapper/0:    0] mmc0:     0 bytes transferred: -110

> <7>[  316.826389] I[0:      swapper/0:    0] mmc0:     (CMD12): 0: 00000b00 00000000 00000000 00000000

> <7>[  316.826516]  [0:   kworker/0:1H:  338] mmc0: starting CMD13 arg 00010000 flags 00000195

> <7>[  316.826621] I[0:   kworker/0:1H:  338] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000

> // retry CMD18 (+ arg 0x000320e0) and error occurs again. Same as above.

> <7>[  316.829224]  [5:   kworker/5:1H:  324] <mmc0: starting CMD23 arg 00000008 flags 00000015>

> <7>[  316.829237]  [5:   kworker/5:1H:  324] mmc0: starting CMD18 arg 000320e0 flags 000000b5

> <7>[  316.999588] I[0:      swapper/0:    0] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000

> <7>[  316.999653] I[0:      swapper/0:    0] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000

> <7>[  316.999725] I[0:      swapper/0:    0] mmc0:     0 bytes transferred: -110

> <7>[  316.999789] I[0:      swapper/0:    0] mmc0:     (CMD12): 0: 00000b00 00000000 00000000 00000000

> <7>[  317.000034]  [0:   kworker/0:1H:  338] mmc0: starting CMD13 arg 00010000 flags 00000195

> <7>[  317.000370] I[0:   kworker/0:1H:  338] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000

> // mmc_blk_reset() and it's completed

> <7>[  317.000523]  [0:   kworker/0:1H:  338] mmc0: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0

> ...

> // mmc_blk_read_single() : CMD17, CMD13 and CMD12 repeats 8 times (for retrying multiple block read with 8 sectors)

> // CMD17 (+ arg 0x000320e0 ~ 0x000320e7) and timeout errors occur

> // It takes about 1.351s

> <7>[  317.200351]  [0:   kworker/0:1H:  338] mmc0: starting CMD17 arg 000320e0 flags 000000b5

> <7>[  317.368748] I[0:      swapper/0:    0] mmc0: req done (CMD17): 0: 00000900 00000000 00000000 00000000

> <7>[  317.368776] I[0:      swapper/0:    0] mmc0:     0 bytes transferred: -110

> <7>[  317.368871]  [0:   kworker/0:1H:  338] mmc0: starting CMD13 arg 00010000 flags 00000195

> <7>[  317.368932] I[0:   kworker/0:1H:  338] mmc0: sdhci: IRQ status 0x00000001

> <7>[  317.368970] I[0:   kworker/0:1H:  338] mmc0: req done (CMD13): 0: 00000b00 00000000 00000000 00000000

> <7>[  317.369020]  [0:   kworker/0:1H:  338] mmc0: starting CMD12 arg 00000000 flags 00000095

> <7>[  317.369070] I[0:   kworker/0:1H:  338] mmc0: sdhci: IRQ status 0x00000001

> <7>[  317.369108] I[0:   kworker/0:1H:  338] mmc0: req done (CMD12): 0: 00000b00 00000000 00000000 00000000

> <7>[  317.369155]  [0:   kworker/0:1H:  338] mmc0: starting CMD13 arg 00010000 flags 00000195

> <7>[  317.369204] I[0:   kworker/0:1H:  338] mmc0: sdhci: IRQ status 0x00000001

> <7>[  317.369245] I[0:   kworker/0:1H:  338] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000

> <3>[  317.369298]  [0:   kworker/0:1H:  338] print_req_error: I/O error, dev mmcblk0, sector 205024

> <7>[  317.369342]  [0:   kworker/0:1H:  338] mmc0: starting CMD17 arg 000320e1 flags 000000b5

> ...

> <7>[  318.382668]  [0:   kworker/0:1H:  338] mmc0: starting CMD17 arg 000320e7 flags 000000b5

> <3>[  318.551568]  [0:   kworker/0:1H:  338] print_req_error: I/O error, dev mmcblk0, sector 205031

> // retry CMD18 (+ arg 0x000320e0) and error occurs again.

> <7>[  318.551850]  [5:   kworker/5:1H:  324] <mmc0: starting CMD23 arg 00000008 flags 00000015>

> <7>[  318.551867]  [5:   kworker/5:1H:  324] mmc0: starting CMD18 arg 000320e0 flags 000000b5

> ...

> // retry CMD18 (+ arg 0x000320e0) and error occurs again.

> <7>[  318.721767]  [5:   kworker/5:1H:  324] mmc0: starting CMD18 arg 000320e0 flags 000000b5

> // CMD17 (+ arg 0x000320e0 ~ 0x000320e7)

> <7>[  318.891054]  [0:   kworker/0:1H:  338] mmc0: starting CMD17 arg 000320e0 flags 000000b5

> ...

> <7>[  320.073861]  [0:   kworker/0:1H:  338] mmc0: starting CMD17 arg 000320e7 flags 000000b5

> // Return I/O error for read operation finally

> <3>[  320.242786]  [0:   kworker/0:1H:  338] Buffer I/O error on dev mmcblk0, logical block 25628, async page read

>

> #2. retry multiple block read (8 sectors) error without single block reads

> // It takes about 1.205941s for the I/O error.

> // issue CMD23 (+ arg 0x8)

> // issue CMD18 (+ arg 0x000320e0) and error occurs

> <7>[  126.467114]  [7:   kworker/7:2H: 8887] <mmc0: starting CMD23 arg 00000008 flags 00000015>

> <7>[  126.467125]  [7:   kworker/7:2H: 8887] mmc0: starting CMD18 arg 000320e0 flags 000000b5

> <7>[  126.636188] I[0:Measurement Wor: 9074] mmc0: req done <CMD23>: 0: 00000000 00000000 00000000 00000000

> <7>[  126.636213] I[0:Measurement Wor: 9074] mmc0: req done (CMD18): 0: 00000900 00000000 00000000 00000000

> <7>[  126.636241] I[0:Measurement Wor: 9074] mmc0:     0 bytes transferred: -110

> <7>[  126.636265] I[0:Measurement Wor: 9074] mmc0:     (CMD12): 0: 00000b00 00000000 00000000 00000000

> <7>[  126.636379]  [0:   kworker/0:1H:  336] mmc0: starting CMD13 arg 00010000 flags 00000195

> <7>[  126.636495] I[0:   kworker/0:1H:  336] mmc0: req done (CMD13): 0: 00000900 00000000 00000000 00000000

> // retry CMD18 (+ arg 0x000320e0) and error occurs again. Same as above.

> <7>[  126.638284]  [7:   kworker/7:2H: 8887] <mmc0: starting CMD23 arg 00000008 flags 00000015>

> <7>[  126.638298]  [7:   kworker/7:2H: 8887] mmc0: starting CMD18 arg 000320e0 flags 000000b5

> // mmc_blk_reset() and it's completed

> <7>[  126.807645]  [0:   kworker/0:1H:  336] mmc0: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0

> ...

> // no mmc_blk_read_single() calling

> // retry CMD18 (+ arg 0x000320e0) and error occurs again.

> <7>[  126.993628]  [7:   kworker/7:2H: 8887] <mmc0: starting CMD23 arg 00000008 flags 00000015>

> <7>[  126.993643]  [7:   kworker/7:2H: 8887] mmc0: starting CMD18 arg 000320e0 flags 000000b5

> // retry CMD18 (+ arg 0x000320e0) and error occurs again.

> <7>[  127.164836]  [7:   kworker/7:2H: 8887] <mmc0: starting CMD23 arg 00000008 flags 00000015>

> <7>[  127.164848]  [7:   kworker/7:2H: 8887] mmc0: starting CMD18 arg 000320e0 flags 000000b5

> ...

> // Return I/O error for read operation finally

> <3>[  127.673055] I[7:      swapper/7:    0] Buffer I/O error on dev mmcblk0, logical block 25628, async page read

>


Kind regards
Uffe
diff mbox series

Patch

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index d666e24fbe0e..e25aaf8fca34 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1809,7 +1809,8 @@  static void mmc_blk_mq_rw_recovery(struct mmc_queue *mq, struct request *req)
 
 	/* FIXME: Missing single sector read for large sector size */
 	if (!mmc_large_sector(card) && rq_data_dir(req) == READ &&
-	    brq->data.blocks > 1) {
+	    brq->data.blocks > 1 &&
+	    !card->host->no_single_read_retry) {
 		/* Read one sector at a time */
 		mmc_blk_read_single(mq, req);
 		return;
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 9b89a91b6b47..3bf5b2fc111b 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -352,6 +352,12 @@  int mmc_of_parse(struct mmc_host *host)
 	if (device_property_read_bool(dev, "no-mmc"))
 		host->caps2 |= MMC_CAP2_NO_MMC;
 
+	if (device_property_read_bool(dev, "no-single-read-retry")) {
+		dev_info(host->parent,
+			"Single block read retrying is not supported\n");
+		host->no_single_read_retry = true;
+	}
+
 	/* Must be after "non-removable" check */
 	if (device_property_read_u32(dev, "fixed-emmc-driver-type", &drv_type) == 0) {
 		if (host->caps & MMC_CAP_NONREMOVABLE)
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 26a3c7bc29ae..faec55035a63 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -502,6 +502,9 @@  struct mmc_host {
 	/* Host Software Queue support */
 	bool			hsq_enabled;
 
+	/* Do not retry multi block read as single block read */
+	bool			no_single_read_retry;
+
 	unsigned long		private[] ____cacheline_aligned;
 };