From patchwork Wed Sep 30 09:45:59 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: cenjiahui X-Patchwork-Id: 304063 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6B0B4C47420 for ; Wed, 30 Sep 2020 09:48:58 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 172742074A for ; Wed, 30 Sep 2020 09:48:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 172742074A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:49532 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kNYjJ-0008In-74 for qemu-devel@archiver.kernel.org; Wed, 30 Sep 2020 05:48:57 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:37440) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhM-00071P-ON for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:46:56 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:38500 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhK-0003nw-Sb for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:46:56 -0400 Received: from 
DGGEMS412-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id 34EF048E18F327774F91; Wed, 30 Sep 2020 17:46:48 +0800 (CST) Received: from localhost (10.174.186.107) by DGGEMS412-HUB.china.huawei.com (10.3.19.212) with Microsoft SMTP Server id 14.3.487.0; Wed, 30 Sep 2020 17:46:41 +0800 From: Jiahui Cen To: Subject: [RFC PATCH v2 1/8] block-backend: introduce I/O rehandle info Date: Wed, 30 Sep 2020 17:45:59 +0800 Message-ID: <20200930094606.5323-2-cenjiahui@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200930094606.5323-1-cenjiahui@huawei.com> References: <20200930094606.5323-1-cenjiahui@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.186.107] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.32; envelope-from=cenjiahui@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/30 05:46:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, fangying1@huawei.com, cenjiahui@huawei.com, zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

The I/O hang feature is built on a rehandle mechanism. Each block backend has a list that stores hanging block AIOs and a timer that periodically resends them. To issue an AIO again, each block AIO also needs to store its coroutine entry.
Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index ce78d30794..b8367d82cc 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -35,6 +35,18 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb); +/* block backend rehandle timer interval 5s */ +#define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000 + +typedef struct BlockBackendRehandleInfo { + bool enable; + QEMUTimer *ts; + unsigned timer_interval_ms; + + unsigned int in_flight; + QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios; +} BlockBackendRehandleInfo; + typedef struct BlockBackendAioNotifier { void (*attached_aio_context)(AioContext *new_context, void *opaque); void (*detach_aio_context)(void *opaque); @@ -95,6 +107,8 @@ struct BlockBackend { * Accessed with atomic ops. */ unsigned int in_flight; + + BlockBackendRehandleInfo reinfo; }; typedef struct BlockBackendAIOCB { @@ -350,6 +364,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm) qemu_co_queue_init(&blk->queued_requests); notifier_list_init(&blk->remove_bs_notifiers); notifier_list_init(&blk->insert_bs_notifiers); + QLIST_INIT(&blk->aio_notifiers); QTAILQ_INSERT_TAIL(&block_backends, blk, link); @@ -1392,6 +1407,10 @@ typedef struct BlkAioEmAIOCB { BlkRwCo rwco; int bytes; bool has_returned; + + /* for rehandle */ + CoroutineEntry *co_entry; + QTAILQ_ENTRY(BlkAioEmAIOCB) list; } BlkAioEmAIOCB; static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_) From patchwork Wed Sep 30 09:46:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: cenjiahui X-Patchwork-Id: 304062 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, 
INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C286BC4727E for ; Wed, 30 Sep 2020 09:50:46 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4D57C2074A for ; Wed, 30 Sep 2020 09:50:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4D57C2074A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:55460 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kNYl3-0002K5-Ao for qemu-devel@archiver.kernel.org; Wed, 30 Sep 2020 05:50:45 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:37532) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhQ-00072p-M1 for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:01 -0400 Received: from szxga04-in.huawei.com ([45.249.212.190]:5155 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhM-0003oH-Tl for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:00 -0400 Received: from DGGEMS408-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id A8B59516CA08BE7B74C5; Wed, 30 Sep 2020 17:46:52 +0800 (CST) Received: from localhost (10.174.186.107) by DGGEMS408-HUB.china.huawei.com (10.3.19.208) with Microsoft SMTP Server id 14.3.487.0; Wed, 30 Sep 2020 17:46:42 +0800 From: Jiahui Cen To: Subject: [RFC PATCH v2 2/8] block-backend: rehandle block aios when EIO Date: 
Wed, 30 Sep 2020 17:46:00 +0800 Message-ID: <20200930094606.5323-3-cenjiahui@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200930094606.5323-1-cenjiahui@huawei.com> References: <20200930094606.5323-1-cenjiahui@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.186.107] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.190; envelope-from=cenjiahui@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/30 05:46:53 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, fangying1@huawei.com, cenjiahui@huawei.com, zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

When a backend device temporarily does not respond, for example a network disk that goes down because of a network fault, any I/O to the corresponding virtual block device in the VM returns an I/O error. If the hypervisor returns the error to the VM, the filesystem on this block device may stop working as usual. In many situations the returned error is EIO. To avoid this unavailability, we can store the failed AIOs and resend them later. If the error is temporary, the retries can succeed and the AIOs can complete successfully.
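
For readers who want the idea without the QEMU plumbing, the store-and-resend
scheme described above can be sketched in plain C. This is only an
illustration under stated assumptions: Request, RetryQueue, backend_submit()
and the other names are hypothetical and are not part of this series, and the
"backend" is simulated by a counter of remaining failures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request type for illustration; not a QEMU structure. */
typedef struct Request {
    int failures_left;      /* simulated backend: fail this many times */
    bool completed;         /* set once the result is reported */
    int ret;                /* result reported to the submitter */
    struct Request *next;   /* link in the retry queue */
} Request;

/* Hypothetical FIFO of requests that failed with EIO. */
typedef struct RetryQueue {
    Request *head;
    Request **tail;
    unsigned in_flight;     /* number of parked requests */
} RetryQueue;

static void retry_queue_init(RetryQueue *q)
{
    q->head = NULL;
    q->tail = &q->head;
    q->in_flight = 0;
}

/* Simulated backend: returns -EIO until failures_left reaches zero. */
static int backend_submit(Request *req)
{
    if (req->failures_left > 0) {
        req->failures_left--;
        return -EIO;
    }
    return 0;
}

/* Issue a request once; park it on EIO, otherwise report the result. */
static void request_submit(RetryQueue *q, Request *req)
{
    int ret;

    assert(q != NULL && req != NULL);
    ret = backend_submit(req);
    if (ret == -EIO) {
        req->next = NULL;
        *q->tail = req;
        q->tail = &req->next;
        q->in_flight++;
        return;
    }
    req->ret = ret;
    req->completed = true;
}

/* What a periodic timer would call: reissue everything parked so far. */
static void retry_queue_run(RetryQueue *q)
{
    Request *pending = q->head;

    /* Detach the list first so requests that fail again are re-parked. */
    retry_queue_init(q);
    while (pending) {
        Request *next = pending->next;

        pending->next = NULL;
        request_submit(q, pending);
        pending = next;
    }
}
```

In this sketch the submitter never sees the EIO: the request stays parked
until a retry succeeds, which is why a timeout (added later in the series) is
needed for errors that never clear.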
Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 89 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index b8367d82cc..8050669d23 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -365,6 +365,12 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm) notifier_list_init(&blk->remove_bs_notifiers); notifier_list_init(&blk->insert_bs_notifiers); + /* for rehandle */ + blk->reinfo.enable = false; + blk->reinfo.ts = NULL; + qatomic_set(&blk->reinfo.in_flight, 0); + QTAILQ_INIT(&blk->reinfo.re_aios); + QLIST_INIT(&blk->aio_notifiers); QTAILQ_INSERT_TAIL(&block_backends, blk, link); @@ -1425,8 +1431,16 @@ static const AIOCBInfo blk_aio_em_aiocb_info = { .get_aio_context = blk_aio_em_aiocb_get_aio_context, }; +static void blk_rehandle_timer_cb(void *opaque); +static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb); + static void blk_aio_complete(BlkAioEmAIOCB *acb) { + if (acb->rwco.blk->reinfo.enable) { + blk_rehandle_aio_complete(acb); + return; + } + if (acb->has_returned) { acb->common.cb(acb->common.opaque, acb->rwco.ret); blk_dec_in_flight(acb->rwco.blk); @@ -1459,6 +1473,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes, .ret = NOT_DONE, }; acb->bytes = bytes; + acb->co_entry = co_entry; acb->has_returned = false; co = qemu_coroutine_create(co_entry, acb); @@ -2054,6 +2069,20 @@ static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context, throttle_group_attach_aio_context(tgm, new_context); bdrv_drained_end(bs); } + + if (blk->reinfo.enable) { + if (blk->reinfo.ts) { + timer_del(blk->reinfo.ts); + timer_free(blk->reinfo.ts); + } + blk->reinfo.ts = aio_timer_new(new_context, QEMU_CLOCK_REALTIME, + SCALE_MS, blk_rehandle_timer_cb, + blk); + if (qatomic_read(&blk->reinfo.in_flight)) { + timer_mod(blk->reinfo.ts, + qemu_clock_get_ms(QEMU_CLOCK_REALTIME)); + } + } } 
blk->ctx = new_context; @@ -2406,6 +2435,66 @@ static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter) } } +static void blk_rehandle_insert_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb) +{ + assert(blk->reinfo.enable); + + qatomic_inc(&blk->reinfo.in_flight); + QTAILQ_INSERT_TAIL(&blk->reinfo.re_aios, acb, list); + timer_mod(blk->reinfo.ts, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + + blk->reinfo.timer_interval_ms); +} + +static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb) +{ + QTAILQ_REMOVE(&blk->reinfo.re_aios, acb, list); + qatomic_dec(&blk->reinfo.in_flight); +} + +static void blk_rehandle_timer_cb(void *opaque) +{ + BlockBackend *blk = opaque; + BlockBackendRehandleInfo *reinfo = &blk->reinfo; + BlkAioEmAIOCB *acb, *tmp; + Coroutine *co; + + aio_context_acquire(blk_get_aio_context(blk)); + QTAILQ_FOREACH_SAFE(acb, &reinfo->re_aios, list, tmp) { + if (acb->rwco.ret == NOT_DONE) { + continue; + } + + blk_inc_in_flight(acb->rwco.blk); + acb->rwco.ret = NOT_DONE; + acb->has_returned = false; + blk_rehandle_remove_aiocb(acb->rwco.blk, acb); + + co = qemu_coroutine_create(acb->co_entry, acb); + qemu_coroutine_enter(co); + + acb->has_returned = true; + if (acb->rwco.ret != NOT_DONE) { + replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); + } + } + aio_context_release(blk_get_aio_context(blk)); +} + +static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) +{ + if (acb->has_returned) { + blk_dec_in_flight(acb->rwco.blk); + if (acb->rwco.ret == -EIO) { + blk_rehandle_insert_aiocb(acb->rwco.blk, acb); + return; + } + + acb->common.cb(acb->common.opaque, acb->rwco.ret); + qemu_aio_unref(acb); + } +} + void blk_register_buf(BlockBackend *blk, void *host, size_t size) { bdrv_register_buf(blk_bs(blk), host, size); From patchwork Wed Sep 30 09:46:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: cenjiahui X-Patchwork-Id: 
304061 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 176C8C4727E for ; Wed, 30 Sep 2020 09:52:31 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8CB2E2075F for ; Wed, 30 Sep 2020 09:52:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8CB2E2075F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:32978 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kNYmj-0004gU-Mp for qemu-devel@archiver.kernel.org; Wed, 30 Sep 2020 05:52:29 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:37554) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhR-00072z-W0 for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:02 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:38748 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhM-0003oQ-TJ for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:01 -0400 Received: from DGGEMS412-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id 1AF423D944E5AF00A8BE; Wed, 30 Sep 2020 17:46:54 +0800 (CST) Received: from localhost 
(10.174.186.107) by DGGEMS412-HUB.china.huawei.com (10.3.19.212) with Microsoft SMTP Server id 14.3.487.0; Wed, 30 Sep 2020 17:46:43 +0800 From: Jiahui Cen To: Subject: [RFC PATCH v2 3/8] block-backend: add I/O hang timeout Date: Wed, 30 Sep 2020 17:46:01 +0800 Message-ID: <20200930094606.5323-4-cenjiahui@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200930094606.5323-1-cenjiahui@huawei.com> References: <20200930094606.5323-1-cenjiahui@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.186.107] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.32; envelope-from=cenjiahui@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/30 05:46:49 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, fangying1@huawei.com, cenjiahui@huawei.com, zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

Not every error will eventually be fixed, so it is better to add a rehandle timeout for I/O hang.
Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 99 +++++++++++++++++++++++++++++++++- include/sysemu/block-backend.h | 2 + 2 files changed, 100 insertions(+), 1 deletion(-) diff --git a/block/block-backend.c b/block/block-backend.c index 8050669d23..90fcc678b5 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -38,6 +38,11 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb); /* block backend rehandle timer interval 5s */ #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000 +enum BlockIOHangStatus { + BLOCK_IO_HANG_STATUS_NORMAL = 0, + BLOCK_IO_HANG_STATUS_HANG, +}; + typedef struct BlockBackendRehandleInfo { bool enable; QEMUTimer *ts; @@ -109,6 +114,11 @@ struct BlockBackend { unsigned int in_flight; BlockBackendRehandleInfo reinfo; + + int64_t iohang_timeout; /* The I/O hang timeout value in sec. */ + int64_t iohang_time; /* The I/O hang start time */ + bool is_iohang_timeout; + int iohang_status; }; typedef struct BlockBackendAIOCB { @@ -2481,20 +2491,107 @@ static void blk_rehandle_timer_cb(void *opaque) aio_context_release(blk_get_aio_context(blk)); } +static bool blk_iohang_handle(BlockBackend *blk, int new_status) +{ + int64_t now; + int old_status = blk->iohang_status; + bool need_rehandle = false; + + switch (new_status) { + case BLOCK_IO_HANG_STATUS_NORMAL: + if (old_status == BLOCK_IO_HANG_STATUS_HANG) { + /* Case when I/O Hang is recovered */ + blk->is_iohang_timeout = false; + blk->iohang_time = 0; + } + break; + case BLOCK_IO_HANG_STATUS_HANG: + if (old_status != BLOCK_IO_HANG_STATUS_HANG) { + /* Case when I/O hang is first triggered */ + blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000; + need_rehandle = true; + } else { + if (!blk->is_iohang_timeout) { + now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000; + if (now >= (blk->iohang_time + blk->iohang_timeout)) { + /* Case when I/O hang is timeout */ + blk->is_iohang_timeout = true; + } else { + /* Case when I/O hang is 
continued */ + need_rehandle = true; + } + } + } + break; + default: + break; + } + + blk->iohang_status = new_status; + return need_rehandle; +} + +static bool blk_rehandle_aio(BlkAioEmAIOCB *acb, bool *has_timeout) +{ + bool need_rehandle = false; + + /* Rehandle aio which returns EIO before hang timeout */ + if (acb->rwco.ret == -EIO) { + if (acb->rwco.blk->is_iohang_timeout) { + /* I/O hang has timeout and not recovered */ + *has_timeout = true; + } else { + need_rehandle = blk_iohang_handle(acb->rwco.blk, + BLOCK_IO_HANG_STATUS_HANG); + /* I/O hang timeout first trigger */ + if (acb->rwco.blk->is_iohang_timeout) { + *has_timeout = true; + } + } + } + + return need_rehandle; +} + static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) { + bool has_timeout = false; + bool need_rehandle = false; + if (acb->has_returned) { blk_dec_in_flight(acb->rwco.blk); - if (acb->rwco.ret == -EIO) { + need_rehandle = blk_rehandle_aio(acb, &has_timeout); + if (need_rehandle) { blk_rehandle_insert_aiocb(acb->rwco.blk, acb); return; } acb->common.cb(acb->common.opaque, acb->rwco.ret); + + /* I/O hang return to normal status */ + if (!has_timeout) { + blk_iohang_handle(acb->rwco.blk, BLOCK_IO_HANG_STATUS_NORMAL); + } + qemu_aio_unref(acb); } } +void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout) +{ + if (!blk) { + return; + } + + blk->is_iohang_timeout = false; + blk->iohang_time = 0; + blk->iohang_timeout = 0; + blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL; + if (iohang_timeout > 0) { + blk->iohang_timeout = iohang_timeout; + } +} + void blk_register_buf(BlockBackend *blk, void *host, size_t size) { bdrv_register_buf(blk_bs(blk), host, size); diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index 8203d7f6f9..bfebe3a960 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -268,4 +268,6 @@ const BdrvChild *blk_root(BlockBackend *blk); int blk_make_empty(BlockBackend *blk, Error **errp); +void 
blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout); + #endif From patchwork Wed Sep 30 09:46:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: cenjiahui X-Patchwork-Id: 272388 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E750CC4727E for ; Wed, 30 Sep 2020 09:52:28 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 637B62075F for ; Wed, 30 Sep 2020 09:52:28 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 637B62075F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:32882 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kNYmh-0004e9-K4 for qemu-devel@archiver.kernel.org; Wed, 30 Sep 2020 05:52:27 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:37564) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhS-00073N-5I for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:02 -0400 Received: from szxga04-in.huawei.com ([45.249.212.190]:5156 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhM-0003oG-Sn for 
qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:01 -0400 Received: from DGGEMS406-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id 36DBC4306B37C9C4DB62; Wed, 30 Sep 2020 17:46:53 +0800 (CST) Received: from localhost (10.174.186.107) by DGGEMS406-HUB.china.huawei.com (10.3.19.206) with Microsoft SMTP Server id 14.3.487.0; Wed, 30 Sep 2020 17:46:43 +0800 From: Jiahui Cen To: Subject: [RFC PATCH v2 4/8] block-backend: add I/O rehandle pause/unpause Date: Wed, 30 Sep 2020 17:46:02 +0800 Message-ID: <20200930094606.5323-5-cenjiahui@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200930094606.5323-1-cenjiahui@huawei.com> References: <20200930094606.5323-1-cenjiahui@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.186.107] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.190; envelope-from=cenjiahui@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/30 05:46:53 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, fangying1@huawei.com, cenjiahui@huawei.com, zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

Sometimes there is no need to rehandle AIOs even though I/O hang is enabled. For example, when deleting a block backend, we have to wait for the AIOs to complete by calling blk_drain(), but we do not care about the results. A pause interface for I/O hang is therefore helpful to bypass the rehandle mechanism.
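
As a rough illustration of why pausing lets a drain finish, here is a
standalone sketch of the status gate. It is an assumption-laden model rather
than the code in this patch: the Rehandle struct and its functions are
hypothetical, the retry timer is replaced by an explicit
rehandle_retry_one() call, and the backend is assumed to keep failing with
EIO while the drain runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef enum {
    REHANDLE_NORMAL = 1,
    REHANDLE_DRAIN_REQUESTED,
    REHANDLE_DRAINED,
} RehandleStatus;

/* Hypothetical rehandle state; not the structure used in the patch. */
typedef struct Rehandle {
    RehandleStatus status;
    unsigned parked;        /* requests waiting for a retry */
} Rehandle;

static bool rehandle_is_paused(const Rehandle *r)
{
    return r->status == REHANDLE_DRAIN_REQUESTED ||
           r->status == REHANDLE_DRAINED;
}

/*
 * Completion path. Returns true if the request was parked for a retry,
 * false if its result is delivered to the caller as-is.
 */
static bool rehandle_on_complete(Rehandle *r, int ret)
{
    if (!rehandle_is_paused(r) && ret == -EIO) {
        r->parked++;
        return true;
    }
    return false;
}

/* Stand-in for one retry-timer tick: reissue one parked request. */
static void rehandle_retry_one(Rehandle *r, int ret)
{
    assert(r->parked > 0);
    r->parked--;
    rehandle_on_complete(r, ret);
}

/* Pause: let every parked request complete without being re-parked. */
static void rehandle_pause(Rehandle *r)
{
    r->status = REHANDLE_DRAIN_REQUESTED;
    while (r->parked > 0) {
        /* Assumed worst case: the backend is still failing. */
        rehandle_retry_one(r, -EIO);
    }
    r->status = REHANDLE_DRAINED;
}

static void rehandle_unpause(Rehandle *r)
{
    r->status = REHANDLE_NORMAL;
}
```

Without the paused check in rehandle_on_complete(), the loop in
rehandle_pause() would never terminate while the backend keeps returning EIO,
which is the situation the pause interface is meant to avoid.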
Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 60 +++++++++++++++++++++++++++++++--- include/sysemu/block-backend.h | 2 ++ 2 files changed, 58 insertions(+), 4 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index 90fcc678b5..c16d95a2c9 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -37,6 +37,9 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb); /* block backend rehandle timer interval 5s */ #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000 +#define BLOCK_BACKEND_REHANDLE_NORMAL 1 +#define BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED 2 +#define BLOCK_BACKEND_REHANDLE_DRAINED 3 enum BlockIOHangStatus { BLOCK_IO_HANG_STATUS_NORMAL = 0, @@ -50,6 +53,8 @@ typedef struct BlockBackendRehandleInfo { unsigned int in_flight; QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios; + + int status; } BlockBackendRehandleInfo; typedef struct BlockBackendAioNotifier { @@ -2461,6 +2466,51 @@ static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb) qatomic_dec(&blk->reinfo.in_flight); } +static void blk_rehandle_drain(BlockBackend *blk) +{ + if (blk_bs(blk)) { + bdrv_drained_begin(blk_bs(blk)); + BDRV_POLL_WHILE(blk_bs(blk), qatomic_read(&blk->reinfo.in_flight) > 0); + bdrv_drained_end(blk_bs(blk)); + } +} + +static bool blk_rehandle_is_paused(BlockBackend *blk) +{ + return blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED || + blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAINED; +} + +void blk_rehandle_pause(BlockBackend *blk) +{ + BlockBackendRehandleInfo *reinfo = &blk->reinfo; + + aio_context_acquire(blk_get_aio_context(blk)); + if (!reinfo->enable || reinfo->status == BLOCK_BACKEND_REHANDLE_DRAINED) { + aio_context_release(blk_get_aio_context(blk)); + return; + } + + reinfo->status = BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED; + blk_rehandle_drain(blk); + reinfo->status = BLOCK_BACKEND_REHANDLE_DRAINED; + aio_context_release(blk_get_aio_context(blk)); +} + +void 
blk_rehandle_unpause(BlockBackend *blk) +{ + BlockBackendRehandleInfo *reinfo = &blk->reinfo; + + aio_context_acquire(blk_get_aio_context(blk)); + if (!reinfo->enable || reinfo->status == BLOCK_BACKEND_REHANDLE_NORMAL) { + aio_context_release(blk_get_aio_context(blk)); + return; + } + + reinfo->status = BLOCK_BACKEND_REHANDLE_NORMAL; + aio_context_release(blk_get_aio_context(blk)); +} + static void blk_rehandle_timer_cb(void *opaque) { BlockBackend *blk = opaque; @@ -2560,10 +2610,12 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) if (acb->has_returned) { blk_dec_in_flight(acb->rwco.blk); - need_rehandle = blk_rehandle_aio(acb, &has_timeout); - if (need_rehandle) { - blk_rehandle_insert_aiocb(acb->rwco.blk, acb); - return; + if (!blk_rehandle_is_paused(acb->rwco.blk)) { + need_rehandle = blk_rehandle_aio(acb, &has_timeout); + if (need_rehandle) { + blk_rehandle_insert_aiocb(acb->rwco.blk, acb); + return; + } } acb->common.cb(acb->common.opaque, acb->rwco.ret); diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index bfebe3a960..391a047444 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -268,6 +268,8 @@ const BdrvChild *blk_root(BlockBackend *blk); int blk_make_empty(BlockBackend *blk, Error **errp); +void blk_rehandle_pause(BlockBackend *blk); +void blk_rehandle_unpause(BlockBackend *blk); void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout); #endif From patchwork Wed Sep 30 09:46:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: cenjiahui X-Patchwork-Id: 272387 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham 
autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 14E77C4727E for ; Wed, 30 Sep 2020 09:54:11 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A7E7F2075F for ; Wed, 30 Sep 2020 09:54:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A7E7F2075F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:37542 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kNYoL-0006Wh-Lf for qemu-devel@archiver.kernel.org; Wed, 30 Sep 2020 05:54:09 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:37580) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhS-000743-SL for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:03 -0400 Received: from szxga04-in.huawei.com ([45.249.212.190]:5157 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhN-0003oL-Ca for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:02 -0400 Received: from DGGEMS414-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id 0352F2C7253BBC742785; Wed, 30 Sep 2020 17:46:54 +0800 (CST) Received: from localhost (10.174.186.107) by DGGEMS414-HUB.china.huawei.com (10.3.19.214) with Microsoft SMTP Server id 14.3.487.0; Wed, 30 Sep 2020 17:46:44 +0800 From: Jiahui Cen To: Subject: [RFC PATCH v2 5/8] block-backend: enable I/O hang when timeout is set Date: Wed, 30 Sep 2020 17:46:03 +0800 Message-ID: <20200930094606.5323-6-cenjiahui@huawei.com> X-Mailer: 
git-send-email 2.25.1 In-Reply-To: <20200930094606.5323-1-cenjiahui@huawei.com> References: <20200930094606.5323-1-cenjiahui@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.186.107] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.190; envelope-from=cenjiahui@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/30 05:46:53 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, fangying1@huawei.com, cenjiahui@huawei.com, zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel"

Setting a non-zero I/O hang timeout indicates that I/O hang is enabled for the block backend. When the block backend is about to be deleted, we should disable I/O hang.
Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 40 ++++++++++++++++++++++++++++++++++ include/sysemu/block-backend.h | 1 + 2 files changed, 41 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index c16d95a2c9..c812b3a9c7 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -34,6 +34,7 @@ #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb); +static void blk_rehandle_disable(BlockBackend *blk); /* block backend rehandle timer interval 5s */ #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000 @@ -476,6 +477,8 @@ static void blk_delete(BlockBackend *blk) assert(!blk->refcnt); assert(!blk->name); assert(!blk->dev); + assert(qatomic_read(&blk->reinfo.in_flight) == 0); + blk_rehandle_disable(blk); if (blk->public.throttle_group_member.throttle_state) { blk_io_limits_disable(blk); } @@ -2629,6 +2632,42 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) } } +static void blk_rehandle_enable(BlockBackend *blk) +{ + BlockBackendRehandleInfo *reinfo = &blk->reinfo; + + aio_context_acquire(blk_get_aio_context(blk)); + if (reinfo->enable) { + aio_context_release(blk_get_aio_context(blk)); + return; + } + + reinfo->ts = aio_timer_new(blk_get_aio_context(blk), QEMU_CLOCK_REALTIME, + SCALE_MS, blk_rehandle_timer_cb, blk); + reinfo->timer_interval_ms = BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL; + reinfo->status = BLOCK_BACKEND_REHANDLE_NORMAL; + reinfo->enable = true; + aio_context_release(blk_get_aio_context(blk)); +} + +static void blk_rehandle_disable(BlockBackend *blk) +{ + if (!blk->reinfo.enable) { + return; + } + + blk_rehandle_pause(blk); + timer_del(blk->reinfo.ts); + timer_free(blk->reinfo.ts); + blk->reinfo.ts = NULL; + blk->reinfo.enable = false; +} + +bool blk_iohang_is_enabled(BlockBackend *blk) +{ + return blk->iohang_timeout != 0; +} + void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout) { if 
(!blk) { @@ -2641,6 +2680,7 @@ void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout) blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL; if (iohang_timeout > 0) { blk->iohang_timeout = iohang_timeout; + blk_rehandle_enable(blk); } } diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index 391a047444..851b90b99b 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -270,6 +270,7 @@ int blk_make_empty(BlockBackend *blk, Error **errp); void blk_rehandle_pause(BlockBackend *blk); void blk_rehandle_unpause(BlockBackend *blk); +bool blk_iohang_is_enabled(BlockBackend *blk); void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout); #endif From patchwork Wed Sep 30 09:46:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: cenjiahui X-Patchwork-Id: 272390 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20289C4727F for ; Wed, 30 Sep 2020 09:48:59 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A79C52074A for ; Wed, 30 Sep 2020 09:48:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A79C52074A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: 
from localhost ([::1]:49610 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kNYjJ-0008Ku-Rz for qemu-devel@archiver.kernel.org; Wed, 30 Sep 2020 05:48:57 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:37520) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhQ-00072O-BO for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:00 -0400 Received: from szxga04-in.huawei.com ([45.249.212.190]:5154 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhM-0003oJ-T7 for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:00 -0400 Received: from DGGEMS408-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id A4B1E7493571594515EC; Wed, 30 Sep 2020 17:46:52 +0800 (CST) Received: from localhost (10.174.186.107) by DGGEMS408-HUB.china.huawei.com (10.3.19.208) with Microsoft SMTP Server id 14.3.487.0; Wed, 30 Sep 2020 17:46:44 +0800 From: Jiahui Cen To: Subject: [RFC PATCH v2 6/8] virtio-blk: pause I/O hang when resetting Date: Wed, 30 Sep 2020 17:46:04 +0800 Message-ID: <20200930094606.5323-7-cenjiahui@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200930094606.5323-1-cenjiahui@huawei.com> References: <20200930094606.5323-1-cenjiahui@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.186.107] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.190; envelope-from=cenjiahui@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/30 05:46:53 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 
2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, fangying1@huawei.com, cenjiahui@huawei.com, zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" When resetting virtio-blk, we have to drain all AIOs but do not care about the results. So I/O hang must be paused before resetting virtio-blk and unpaused afterwards. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- hw/block/virtio-blk.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index bac2d6fa2b..f96e4ac274 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -899,6 +899,10 @@ static void virtio_blk_reset(VirtIODevice *vdev) AioContext *ctx; VirtIOBlockReq *req; + if (blk_iohang_is_enabled(s->blk)) { + blk_rehandle_pause(s->blk); + } + ctx = blk_get_aio_context(s->blk); aio_context_acquire(ctx); blk_drain(s->blk); @@ -916,6 +920,10 @@ static void virtio_blk_reset(VirtIODevice *vdev) assert(!s->dataplane_started); blk_set_enable_write_cache(s->blk, s->original_wce); + + if (blk_iohang_is_enabled(s->blk)) { + blk_rehandle_unpause(s->blk); + } } /* coalesce internal state, copy to pci i/o region 0 From patchwork Wed Sep 30 09:46:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: cenjiahui X-Patchwork-Id: 304059 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id
C7FF8C4727E for ; Wed, 30 Sep 2020 09:56:49 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 469C32075F for ; Wed, 30 Sep 2020 09:56:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 469C32075F Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:45670 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kNYqu-0001Ru-D6 for qemu-devel@archiver.kernel.org; Wed, 30 Sep 2020 05:56:48 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:37586) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhT-000751-RZ for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:04 -0400 Received: from szxga07-in.huawei.com ([45.249.212.35]:40786 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhM-0003oK-U6 for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:03 -0400 Received: from DGGEMS403-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id B31CF6D48898F760592F; Wed, 30 Sep 2020 17:46:52 +0800 (CST) Received: from localhost (10.174.186.107) by DGGEMS403-HUB.china.huawei.com (10.3.19.203) with Microsoft SMTP Server id 14.3.487.0; Wed, 30 Sep 2020 17:46:45 +0800 From: Jiahui Cen To: Subject: [RFC PATCH v2 7/8] qemu-option: add I/O hang timeout option Date: Wed, 30 Sep 2020 17:46:05 +0800 Message-ID: <20200930094606.5323-8-cenjiahui@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200930094606.5323-1-cenjiahui@huawei.com> References: <20200930094606.5323-1-cenjiahui@huawei.com> MIME-Version: 1.0 
X-Originating-IP: [10.174.186.107] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.35; envelope-from=cenjiahui@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/30 05:46:53 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -25 X-Spam_score: -2.6 X-Spam_bar: -- X-Spam_report: (-2.6 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, fangying1@huawei.com, cenjiahui@huawei.com, zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" I/O hang timeout should be different under different situations. So it is better to provide an option for user to determine I/O hang timeout for each block device. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- blockdev.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/blockdev.c b/blockdev.c index bebd3ba1c3..127a7ea894 100644 --- a/blockdev.c +++ b/blockdev.c @@ -500,6 +500,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, BlockdevDetectZeroesOptions detect_zeroes = BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF; const char *throttling_group = NULL; + int64_t iohang_timeout = 0; /* Check common options by copying from bs_opts to opts, all other options * stay in bs_opts for processing by bdrv_open(). 
*/ @@ -622,6 +623,12 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts, bs->detect_zeroes = detect_zeroes; + /* init timeout value for I/O Hang */ + iohang_timeout = qemu_opt_get_number(opts, "iohang-timeout", 0); + if (iohang_timeout > 0) { + blk_iohang_init(blk, iohang_timeout); + } + block_acct_setup(blk_get_stats(blk), account_invalid, account_failed); if (!parse_stats_intervals(blk_get_stats(blk), interval_list, errp)) { @@ -3786,6 +3793,10 @@ QemuOptsList qemu_common_drive_opts = { .type = QEMU_OPT_BOOL, .help = "whether to account for failed I/O operations " "in the statistics", + },{ + .name = "iohang-timeout", + .type = QEMU_OPT_NUMBER, + .help = "timeout value for I/O Hang", }, { /* end of list */ } }, From patchwork Wed Sep 30 09:46:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: cenjiahui X-Patchwork-Id: 272389 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.0 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8BDAC4727E for ; Wed, 30 Sep 2020 09:50:51 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6589D2074A for ; Wed, 30 Sep 2020 09:50:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6589D2074A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=huawei.com Authentication-Results: mail.kernel.org; spf=pass 
smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:55626 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kNYl8-0002OF-Kh for qemu-devel@archiver.kernel.org; Wed, 30 Sep 2020 05:50:50 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:37550) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhR-00072w-Kl for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:01 -0400 Received: from szxga07-in.huawei.com ([45.249.212.35]:40788 helo=huawei.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kNYhM-0003oI-TM for qemu-devel@nongnu.org; Wed, 30 Sep 2020 05:47:01 -0400 Received: from DGGEMS403-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id AEF0E67C36DC74AF360E; Wed, 30 Sep 2020 17:46:52 +0800 (CST) Received: from localhost (10.174.186.107) by DGGEMS403-HUB.china.huawei.com (10.3.19.203) with Microsoft SMTP Server id 14.3.487.0; Wed, 30 Sep 2020 17:46:46 +0800 From: Jiahui Cen To: Subject: [RFC PATCH v2 8/8] qapi: add I/O hang and I/O hang timeout qapi event Date: Wed, 30 Sep 2020 17:46:06 +0800 Message-ID: <20200930094606.5323-9-cenjiahui@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20200930094606.5323-1-cenjiahui@huawei.com> References: <20200930094606.5323-1-cenjiahui@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.174.186.107] X-CFilter-Loop: Reflected Received-SPF: pass client-ip=45.249.212.35; envelope-from=cenjiahui@huawei.com; helo=huawei.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/30 05:46:53 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -25 X-Spam_score: -2.6 X-Spam_bar: -- X-Spam_report: (-2.6 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham 
autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, fangying1@huawei.com, cenjiahui@huawei.com, zhang.zhanghailiang@huawei.com, mreitz@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Sometimes hypervisor management tools like libvirt may need to monitor I/O hang events. Let's report I/O hang and I/O hang timeout event via qapi. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 3 +++ qapi/block-core.json | 26 ++++++++++++++++++++++++++ 2 files changed, 29 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index c812b3a9c7..42337ceb04 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -2556,6 +2556,7 @@ static bool blk_iohang_handle(BlockBackend *blk, int new_status) /* Case when I/O Hang is recovered */ blk->is_iohang_timeout = false; blk->iohang_time = 0; + qapi_event_send_block_io_hang(false); } break; case BLOCK_IO_HANG_STATUS_HANG: @@ -2563,12 +2564,14 @@ static bool blk_iohang_handle(BlockBackend *blk, int new_status) /* Case when I/O hang is first triggered */ blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000; need_rehandle = true; + qapi_event_send_block_io_hang(true); } else { if (!blk->is_iohang_timeout) { now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000; if (now >= (blk->iohang_time + blk->iohang_timeout)) { /* Case when I/O hang is timeout */ blk->is_iohang_timeout = true; + qapi_event_send_block_io_hang_timeout(true); } else { /* Case when I/O hang is continued */ need_rehandle = true; diff --git a/qapi/block-core.json b/qapi/block-core.json index 3c16f1e11d..7bdf75c6d7 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -5535,3 +5535,29 @@ { 'command': 'blockdev-snapshot-delete-internal-sync', 'data': { 'device': 'str', '*id': 
'str', '*name': 'str'}, 'returns': 'SnapshotInfo' } + +## +# @BLOCK_IO_HANG: +# +# Emitted when a device I/O hang begins or ends +# +# @set: true if the I/O hang begins; false if it ends. +# +# Since: 5.2 +# +## +{ 'event': 'BLOCK_IO_HANG', + 'data': { 'set': 'bool' }} + +## +# @BLOCK_IO_HANG_TIMEOUT: +# +# Emitted when the device I/O hang timeout is set or cleared +# +# @set: true if set; false if cleared. +# +# Since: 5.2 +# +## +{ 'event': 'BLOCK_IO_HANG_TIMEOUT', + 'data': { 'set': 'bool' }}