From patchwork Fri May 4 17:17:01 2018
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 134987
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    ulf.hansson@linaro.org, broonie@kernel.org, linus.walleij@linaro.org,
    bfq-iosched@googlegroups.com, oleksandr@natalenko.name, Paolo Valente
Subject: [PATCH BUGFIX] block, bfq: postpone rq preparation to insert or merge
Date: Fri, 4 May 2018 19:17:01 +0200
Message-Id: <20180504171701.6876-1-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.16.1

When invoked for an I/O request rq, the prepare_request hook of bfq
increments reference counters in the destination bfq_queue for rq.
However, after this hook has been invoked, rq may still be transformed
into a request with no icq attached, i.e., for bfq, a request not
associated with any bfq_queue. No further hook is invoked to signal
this transformation to bfq (in general, to the destination elevator
for rq).

This leads bfq into an inconsistent state, because bfq has no chance
to correctly lower these counters back. This inconsistency may in turn
cause incorrect scheduling and hangs. It certainly causes memory leaks,
by making it impossible for bfq to free the involved bfq_queue.

On the bright side, no transformation can happen any longer once rq
has been inserted into bfq, or merged with another, already inserted,
request. Exploiting this fact, this commit addresses the above issue
by delaying the preparation of an I/O request to when the request is
inserted or merged.

This change also gives a performance bonus: a lock-contention point
gets removed. To prepare a request, bfq needs to hold its scheduler
lock. After postponing request preparation to insertion or merging,
no lock needs to be grabbed any longer in the prepare_request hook,
while the lock already taken to perform insertion or merging is used
to prepare the request as well.

Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 86 +++++++++++++++++++++++++++++++++++------------------
 1 file changed, 57 insertions(+), 29 deletions(-)

--
2.16.1

Tested-by: Oleksandr Natalenko
Tested-by: Bart Van Assche
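The mechanics of the leak, and of the fix, condense into a few lines of
ordinary C. The sketch below uses invented stand-in types (toy_queue for
bfq_queue, toy_rq for struct request, lazy_init for the role bfq_init_rq
plays in this patch); it illustrates the pattern only and is not the
kernel API:

	#include <stdbool.h>
	#include <stddef.h>

	/* Invented stand-ins: toy_queue ~ bfq_queue, toy_rq ~ struct request. */
	struct toy_queue {
		int ref;			/* stands in for bfqq's reference counters */
	};

	struct toy_rq {
		struct toy_queue *queue;	/* stands in for rq->elv.priv[1] */
		bool has_icq;			/* stands in for rq->elv.icq != NULL */
	};

	/* Old scheme: a counted reference is taken as soon as rq is prepared. */
	static void eager_prepare(struct toy_rq *rq, struct toy_queue *q)
	{
		rq->queue = q;
		q->ref++;
	}

	/*
	 * The problem: the block layer may later turn rq into a request
	 * with no icq, and no elevator hook runs at that point, so q->ref
	 * stays incremented forever and q can never be freed.
	 */
	static void transform_without_hook(struct toy_rq *rq)
	{
		rq->has_icq = false;
		rq->queue = NULL;		/* the reference is leaked here */
	}

	/*
	 * The fix: take the reference only at insertion/merge time, past
	 * the last point at which rq can be transformed.  Idempotent, so
	 * every insertion and merge path can call it safely.
	 */
	static struct toy_queue *lazy_init(struct toy_rq *rq, struct toy_queue *q)
	{
		if (!rq->has_icq)
			return NULL;		/* transformed rq: never counted */
		if (rq->queue)
			return rq->queue;	/* already initialized earlier */
		rq->queue = q;
		q->ref++;
		return q;
	}

The key property is that lazy_init() runs past the last point at which a
request can lose its icq, so a counted reference is taken only for
requests that bfq is guaranteed to see again before they leave the
scheduler.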
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 771ae9730ac6..ea02162df6c7 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1858,6 +1858,8 @@ static int bfq_request_merge(struct request_queue *q, struct request **req,
 	return ELEVATOR_NO_MERGE;
 }
 
+static struct bfq_queue *bfq_init_rq(struct request *rq);
+
 static void bfq_request_merged(struct request_queue *q, struct request *req,
 			       enum elv_merge type)
 {
@@ -1866,7 +1868,7 @@ static void bfq_request_merged(struct request_queue *q, struct request *req,
 	    blk_rq_pos(req) <
 	    blk_rq_pos(container_of(rb_prev(&req->rb_node),
 				    struct request, rb_node))) {
-		struct bfq_queue *bfqq = RQ_BFQQ(req);
+		struct bfq_queue *bfqq = bfq_init_rq(req);
 		struct bfq_data *bfqd = bfqq->bfqd;
 		struct request *prev, *next_rq;
 
@@ -1894,7 +1896,8 @@ static void bfq_request_merged(struct request_queue *q, struct request *req,
 static void bfq_requests_merged(struct request_queue *q, struct request *rq,
 				struct request *next)
 {
-	struct bfq_queue *bfqq = RQ_BFQQ(rq), *next_bfqq = RQ_BFQQ(next);
+	struct bfq_queue *bfqq = bfq_init_rq(rq),
+		*next_bfqq = bfq_init_rq(next);
 
 	if (!RB_EMPTY_NODE(&rq->rb_node))
 		goto end;
@@ -4540,14 +4543,12 @@ static inline void bfq_update_insert_stats(struct request_queue *q,
 					   unsigned int cmd_flags) {}
 #endif
 
-static void bfq_prepare_request(struct request *rq, struct bio *bio);
-
 static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 			       bool at_head)
 {
 	struct request_queue *q = hctx->queue;
 	struct bfq_data *bfqd = q->elevator->elevator_data;
-	struct bfq_queue *bfqq = RQ_BFQQ(rq);
+	struct bfq_queue *bfqq;
 	bool idle_timer_disabled = false;
 	unsigned int cmd_flags;
 
@@ -4562,24 +4563,13 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	blk_mq_sched_request_inserted(rq);
 
 	spin_lock_irq(&bfqd->lock);
+	bfqq = bfq_init_rq(rq);
 	if (at_head || blk_rq_is_passthrough(rq)) {
 		if (at_head)
 			list_add(&rq->queuelist, &bfqd->dispatch);
 		else
 			list_add_tail(&rq->queuelist, &bfqd->dispatch);
-	} else {
-		if (WARN_ON_ONCE(!bfqq)) {
-			/*
-			 * This should never happen. Most likely rq is
-			 * a requeued regular request, being
-			 * re-inserted without being first
-			 * re-prepared. Do a prepare, to avoid
-			 * failure.
-			 */
-			bfq_prepare_request(rq, rq->bio);
-			bfqq = RQ_BFQQ(rq);
-		}
-
+	} else { /* bfqq is assumed to be non null here */
 		idle_timer_disabled = __bfq_insert_request(bfqd, rq);
 		/*
 		 * Update bfqq, because, if a queue merge has occurred
@@ -4922,11 +4912,48 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
 }
 
 /*
- * Allocate bfq data structures associated with this request.
+ * Only reset private fields. The actual request preparation will be
+ * performed by bfq_init_rq, when rq is either inserted or merged. See
+ * comments on bfq_init_rq for the reason behind this delayed
+ * preparation.
  */
 static void bfq_prepare_request(struct request *rq, struct bio *bio)
+{
+	/*
+	 * Regardless of whether we have an icq attached, we have to
+	 * clear the scheduler pointers, as they might point to
+	 * previously allocated bic/bfqq structs.
+	 */
+	rq->elv.priv[0] = rq->elv.priv[1] = NULL;
+}
+
+/*
+ * If needed, init rq, allocate bfq data structures associated with
+ * rq, and increment reference counters in the destination bfq_queue
+ * for rq. Return the destination bfq_queue for rq, or NULL if rq is
+ * not associated with any bfq_queue.
+ *
+ * This function is invoked by the functions that perform rq insertion
+ * or merging. One may have expected the above preparation operations
+ * to be performed in bfq_prepare_request, and not delayed to when rq
+ * is inserted or merged. The rationale behind this delayed
+ * preparation is that, after the prepare_request hook is invoked for
+ * rq, rq may still be transformed into a request with no icq, i.e., a
+ * request not associated with any queue. No bfq hook is invoked to
+ * signal this transformation. As a consequence, should these
+ * preparation operations be performed when the prepare_request hook
+ * is invoked, and should rq be transformed one moment later, bfq
+ * would end up in an inconsistent state, because it would have
+ * incremented some queue counters for an rq destined to
+ * transformation, without any chance to correctly lower these
+ * counters back. In contrast, no transformation can happen any longer
+ * once rq has been inserted or merged. So, it is safe to execute
+ * these preparation operations when rq is finally inserted or merged.
+ */
+static struct bfq_queue *bfq_init_rq(struct request *rq)
 {
 	struct request_queue *q = rq->q;
+	struct bio *bio = rq->bio;
 	struct bfq_data *bfqd = q->elevator->elevator_data;
 	struct bfq_io_cq *bic;
 	const int is_sync = rq_is_sync(rq);
@@ -4934,20 +4961,21 @@ static void bfq_prepare_request(struct request *rq, struct bio *bio)
 	bool new_queue = false;
 	bool bfqq_already_existing = false, split = false;
 
+	if (unlikely(!rq->elv.icq))
+		return NULL;
+
 	/*
-	 * Even if we don't have an icq attached, we should still clear
-	 * the scheduler pointers, as they might point to previously
-	 * allocated bic/bfqq structs.
+	 * Assuming that elv.priv[1] is set only if everything is set
+	 * for this rq. This holds true, because this function is
+	 * invoked only for insertion or merging, and, after such
+	 * events, a request cannot be manipulated any longer before
+	 * being removed from bfq.
 	 */
-	if (!rq->elv.icq) {
-		rq->elv.priv[0] = rq->elv.priv[1] = NULL;
-		return;
-	}
+	if (rq->elv.priv[1])
+		return rq->elv.priv[1];
 
 	bic = icq_to_bic(rq->elv.icq);
 
-	spin_lock_irq(&bfqd->lock);
-
 	bfq_check_ioprio_change(bic, bio);
 
 	bfq_bic_update_cgroup(bic, bio);
@@ -5006,7 +5034,7 @@ static void bfq_prepare_request(struct request *rq, struct bio *bio)
 	if (unlikely(bfq_bfqq_just_created(bfqq)))
 		bfq_handle_burst(bfqd, bfqq);
 
-	spin_unlock_irq(&bfqd->lock);
+	return bfqq;
 }
 
 static void bfq_idle_slice_timer_body(struct bfq_queue *bfqq)
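A closing note on the performance bonus claimed in the commit message:
before the patch, every regular request paid for two acquisitions of the
scheduler lock, one inside prepare_request and one around insertion;
after the patch, preparation rides on the lock that insertion takes
anyway. The sketch below contrasts the two locking patterns, with a
pthread mutex standing in for bfqd->lock (again an invented illustration,
not the kernel code):

	#include <pthread.h>

	struct toy_rq {
		int prepared;
		int queued;
	};

	/* Stand-in for bfqd->lock. */
	static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;

	/* Old scheme: prepare and insert each take the scheduler lock. */
	static void old_prepare(struct toy_rq *rq)
	{
		pthread_mutex_lock(&sched_lock);	/* contention point #1 */
		rq->prepared = 1;
		pthread_mutex_unlock(&sched_lock);
	}

	static void old_insert(struct toy_rq *rq)
	{
		pthread_mutex_lock(&sched_lock);	/* contention point #2 */
		rq->queued = 1;
		pthread_mutex_unlock(&sched_lock);
	}

	/*
	 * New scheme: preparation happens under the lock that insertion
	 * must take anyway, so the prepare-side acquisition disappears.
	 */
	static void new_insert(struct toy_rq *rq)
	{
		pthread_mutex_lock(&sched_lock);	/* the only contention point */
		if (!rq->prepared)			/* role of bfq_init_rq */
			rq->prepared = 1;
		rq->queued = 1;
		pthread_mutex_unlock(&sched_lock);
	}

Halving the number of acquisitions of a heavily contended lock on the
submission path is where the contention win comes from.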