mbox series

[v2,0/3] block: scsi: introduce and use BLK_STS_OFFLINE

Message ID 20220203192827.1370270-1-song@kernel.org
Headers show
Series block: scsi: introduce and use BLK_STS_OFFLINE | expand

Message

Song Liu Feb. 3, 2022, 7:28 p.m. UTC
Changes v1 => v2:
1. Add patch 2/3 to change user visible return value to -ENODEV. (Hannes)
2. In the commit log, explain the reason to keep EIO in 1/3.

We have a use case where HDDs are regularly power on/off to perserve power.
When a drive is being removed, we often see errors like

   [  172.803279] I/O error, dev sda, sector 3137184

These messages are confusing for automations that grep dmesg, as they look
very similar to real HDD error.

Solve this issue with a new block state BLK_STS_OFFLINE. After the change,
the error message looks like

   [  172.803279] device offline error, dev sda, sector 3137184

so that the automations won't confuse them with real I/O error.

Song Liu (3):
  block: introduce BLK_STS_OFFLINE
  block: return -ENODEV for BLK_STS_OFFLINE
  scsi: use BLK_STS_OFFLINE for not fully online devices

 block/blk-core.c          | 1 +
 drivers/scsi/scsi_lib.c   | 2 +-
 include/linux/blk_types.h | 7 +++++++
 3 files changed, 9 insertions(+), 1 deletion(-)

--
2.30.2

Comments

Martin K. Petersen Feb. 4, 2022, 3:16 a.m. UTC | #1
Song,

> We have a use case where HDDs are regularly power on/off to perserve power.
> When a drive is being removed, we often see errors like
>
>    [  172.803279] I/O error, dev sda, sector 3137184
>
> These messages are confusing for automations that grep dmesg, as they look
> very similar to real HDD error.
>
> Solve this issue with a new block state BLK_STS_OFFLINE. After the change,
> the error message looks like
>
>    [  172.803279] device offline error, dev sda, sector 3137184
>
> so that the automations won't confuse them with real I/O error.

Looks OK to me.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Jens Axboe Feb. 4, 2022, 4:10 a.m. UTC | #2
On Thu, 3 Feb 2022 11:28:24 -0800, Song Liu wrote:
> Changes v1 => v2:
> 1. Add patch 2/3 to change user visible return value to -ENODEV. (Hannes)
> 2. In the commit log, explain the reason to keep EIO in 1/3.
> 
> We have a use case where HDDs are regularly power on/off to perserve power.
> When a drive is being removed, we often see errors like
> 
> [...]

Applied, thanks!

[1/3] block: introduce BLK_STS_OFFLINE
      commit: 2651bf680bc2ad9a078b7222b0873145ab4ece07
[2/3] block: return -ENODEV for BLK_STS_OFFLINE
      commit: 7d32c027a21ef7aa0a400763397644d44b3576a9
[3/3] scsi: use BLK_STS_OFFLINE for not fully online devices
      commit: 9574d43479e16352e75bc875c9952ed8e129c9b2

Best regards,
Hannes Reinecke Feb. 4, 2022, 7:25 a.m. UTC | #3
On 2/3/22 20:28, Song Liu wrote:
> Currently, drivers reports BLK_STS_IOERR for devices that are not full
> online or being removed. This behavior could cause confusion for users,
> as they are not really I/O errors from the device.
> 
> Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
> offline error" in dmesg instead of "I/O error".
> 
> EIO is intentionally kept to not change user visible return value.
> 
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>   block/blk-core.c          | 1 +
>   include/linux/blk_types.h | 7 +++++++
>   2 files changed, 8 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
Hannes Reinecke Feb. 4, 2022, 7:26 a.m. UTC | #4
On 2/3/22 20:28, Song Liu wrote:
> Change the user visible return value for BLK_STS_OFFLINE to -ENODEV, which
> is more descriptive than existing -EIO.
> 
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>   block/blk-core.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes