Message ID | 20200930080256.90964-3-hare@suse.de |
---|---|
State | New |
Headers | show |
Series | scsi: remove devices in ALUA transitioning status | expand |
On 2020-09-30 01:02, Hannes Reinecke wrote: > When the ALUA state indicates transitioning we should not retry > the command immediately, but rather complete the command with > BLK_STS_AGAIN to signal the completion handler that it might > be retried. > This allows multipathing to redirect the command to another path > if possible, and avoid stalls during lengthy transitioning times. > > Signed-off-by: Hannes Reinecke <hare@suse.de> > --- > drivers/scsi/device_handler/scsi_dh_alua.c | 2 +- > drivers/scsi/scsi_lib.c | 5 +++++ > 2 files changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c > index 308bda2e9c00..a68222e324e9 100644 > --- a/drivers/scsi/device_handler/scsi_dh_alua.c > +++ b/drivers/scsi/device_handler/scsi_dh_alua.c > @@ -1092,7 +1092,7 @@ static blk_status_t alua_prep_fn(struct scsi_device *sdev, struct request *req) > case SCSI_ACCESS_STATE_LBA: > return BLK_STS_OK; > case SCSI_ACCESS_STATE_TRANSITIONING: > - return BLK_STS_RESOURCE; > + return BLK_STS_AGAIN; > default: > req->rq_flags |= RQF_QUIET; > return BLK_STS_IOERR; > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index f0ee11dc07e4..b628aa0d824c 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1726,6 +1726,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, > scsi_device_blocked(sdev)) > ret = BLK_STS_DEV_RESOURCE; > break; > + case BLK_STS_AGAIN: > + scsi_req(req)->result = DID_BUS_BUSY << 16; > + if (req->rq_flags & RQF_DONTPREP) > + scsi_mq_uninit_cmd(cmd); > + break; > default: > if (unlikely(!scsi_device_online(sdev))) > scsi_req(req)->result = DID_NO_CONNECT << 16; Hi Hannes, What will happen if all remote ports have the state "transitioning"? Does the above code resubmit a request immediately in that case? Can this cause spinning with 100% CPU usage if the ALUA device handler notices the transitioning state before multipathd does? Thanks, Bart.
On 10/1/20 4:44 AM, Bart Van Assche wrote: > On 2020-09-30 01:02, Hannes Reinecke wrote: >> When the ALUA state indicates transitioning we should not retry >> the command immediately, but rather complete the command with >> BLK_STS_AGAIN to signal the completion handler that it might >> be retried. >> This allows multipathing to redirect the command to another path >> if possible, and avoid stalls during lengthy transitioning times. >> >> Signed-off-by: Hannes Reinecke <hare@suse.de> >> --- >> drivers/scsi/device_handler/scsi_dh_alua.c | 2 +- >> drivers/scsi/scsi_lib.c | 5 +++++ >> 2 files changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c >> index 308bda2e9c00..a68222e324e9 100644 >> --- a/drivers/scsi/device_handler/scsi_dh_alua.c >> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c >> @@ -1092,7 +1092,7 @@ static blk_status_t alua_prep_fn(struct scsi_device *sdev, struct request *req) >> case SCSI_ACCESS_STATE_LBA: >> return BLK_STS_OK; >> case SCSI_ACCESS_STATE_TRANSITIONING: >> - return BLK_STS_RESOURCE; >> + return BLK_STS_AGAIN; >> default: >> req->rq_flags |= RQF_QUIET; >> return BLK_STS_IOERR; >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c >> index f0ee11dc07e4..b628aa0d824c 100644 >> --- a/drivers/scsi/scsi_lib.c >> +++ b/drivers/scsi/scsi_lib.c >> @@ -1726,6 +1726,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, >> scsi_device_blocked(sdev)) >> ret = BLK_STS_DEV_RESOURCE; >> break; >> + case BLK_STS_AGAIN: >> + scsi_req(req)->result = DID_BUS_BUSY << 16; >> + if (req->rq_flags & RQF_DONTPREP) >> + scsi_mq_uninit_cmd(cmd); >> + break; >> default: >> if (unlikely(!scsi_device_online(sdev))) >> scsi_req(req)->result = DID_NO_CONNECT << 16; > > Hi Hannes, > > What will happen if all remote ports have the state "transitioning"? > Does the above code resubmit a request immediately in that case? Can > this cause spinning with 100% CPU usage if the ALUA device handler > notices the transitioning state before multipathd does? > Curiously this patch only improves the current situation :-) With the current behaviour we will return BLK_STS_DEV_RESOURCE/BLK_STS_RESOURCE, causing an _immediate_ retry on the same queue. This patch will cause the I/O to be _completed_, to be requeued via the end_request() path. So yes, we will requeue (that's kinda the point of 'transitioning' ...), but with a lower frequency as originally. _And_ with a chance of multipath intercepting the I/O, which didn't happen previously. Cheers, Hannes
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c index 308bda2e9c00..a68222e324e9 100644 --- a/drivers/scsi/device_handler/scsi_dh_alua.c +++ b/drivers/scsi/device_handler/scsi_dh_alua.c @@ -1092,7 +1092,7 @@ static blk_status_t alua_prep_fn(struct scsi_device *sdev, struct request *req) case SCSI_ACCESS_STATE_LBA: return BLK_STS_OK; case SCSI_ACCESS_STATE_TRANSITIONING: - return BLK_STS_RESOURCE; + return BLK_STS_AGAIN; default: req->rq_flags |= RQF_QUIET; return BLK_STS_IOERR; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index f0ee11dc07e4..b628aa0d824c 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1726,6 +1726,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx, scsi_device_blocked(sdev)) ret = BLK_STS_DEV_RESOURCE; break; + case BLK_STS_AGAIN: + scsi_req(req)->result = DID_BUS_BUSY << 16; + if (req->rq_flags & RQF_DONTPREP) + scsi_mq_uninit_cmd(cmd); + break; default: if (unlikely(!scsi_device_online(sdev))) scsi_req(req)->result = DID_NO_CONNECT << 16;
When the ALUA state indicates transitioning we should not retry the command immediately, but rather complete the command with BLK_STS_AGAIN to signal the completion handler that it might be retried. This allows multipathing to redirect the command to another path if possible, and avoid stalls during lengthy transitioning times. Signed-off-by: Hannes Reinecke <hare@suse.de> --- drivers/scsi/device_handler/scsi_dh_alua.c | 2 +- drivers/scsi/scsi_lib.c | 5 +++++ 2 files changed, 6 insertions(+), 1 deletion(-)