From patchwork Mon Jun 24 19:40:36 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 167642
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name,
 bottura.nicola95@gmail.com, srivatsa@csail.mit.edu,
 Paolo Valente
Subject: [PATCH BUGFIX IMPROVEMENT 1/7] block, bfq: reset inject limit when
 think-time state changes
Date: Mon, 24 Jun 2019 21:40:36 +0200
Message-Id: <20190624194042.38747-2-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190624194042.38747-1-paolo.valente@linaro.org>
References: <20190624194042.38747-1-paolo.valente@linaro.org>

Until the base value of the request service times gets finally
computed for a bfq_queue, the inject limit does depend on the
think-time state (short|long). The limit must be 0 or 1 if the think
time is deemed, respectively, as short or long. However, such a check
and the possible limit update are performed only periodically, once
per second. So, to make the injection mechanism much more reactive,
this commit performs the update also every time the think-time state
changes.

In addition, in the following special case, this commit lets the
inject limit of a bfq_queue bfqq remain equal to 1 even if bfqq's
think time is short: bfqq's I/O is synchronized with that of some
other queue, i.e., bfqq may receive new I/O only after the I/O of the
other queue is completed. Keeping the inject limit at 1 allows the
blocking I/O to be served while bfqq is in service. And this is very
convenient both for bfqq and for the total throughput, as explained
in detail in the comments in bfq_update_has_short_ttime().

Tested-by: Srivatsa S. Bhat
Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 219 ++++++++++++++++++++++++++++++--------------
 1 file changed, 151 insertions(+), 68 deletions(-)

--
2.20.1

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 44c6bbcd7720..9bc10198ddff 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1735,6 +1735,72 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
 				     false, BFQQE_PREEMPTED);
 }
 
+static void bfq_reset_inject_limit(struct bfq_data *bfqd,
+				   struct bfq_queue *bfqq)
+{
+	/* invalidate baseline total service time */
+	bfqq->last_serv_time_ns = 0;
+
+	/*
+	 * Reset pointer in case we are waiting for
+	 * some request completion.
+	 */
+	bfqd->waited_rq = NULL;
+
+	/*
+	 * If bfqq has a short think time, then start by setting the
+	 * inject limit to 0 prudentially, because the service time of
+	 * an injected I/O request may be higher than the think time
+	 * of bfqq, and therefore, if one request was injected when
+	 * bfqq remains empty, this injected request might delay the
+	 * service of the next I/O request for bfqq significantly. In
+	 * case bfqq can actually tolerate some injection, then the
+	 * adaptive update will however raise the limit soon. This
+	 * lucky circumstance holds exactly because bfqq has a short
+	 * think time, and thus, after remaining empty, is likely to
+	 * get new I/O enqueued---and then completed---before being
+	 * expired.
+	 * This is the very pattern that gives the
+	 * limit-update algorithm the chance to measure the effect of
+	 * injection on request service times, and then to update the
+	 * limit accordingly.
+	 *
+	 * However, in the following special case, the inject limit is
+	 * left to 1 even if the think time is short: bfqq's I/O is
+	 * synchronized with that of some other queue, i.e., bfqq may
+	 * receive new I/O only after the I/O of the other queue is
+	 * completed. Keeping the inject limit to 1 allows the
+	 * blocking I/O to be served while bfqq is in service. And
+	 * this is very convenient both for bfqq and for overall
+	 * throughput, as explained in detail in the comments in
+	 * bfq_update_has_short_ttime().
+	 *
+	 * On the opposite end, if bfqq has a long think time, then
+	 * start directly by 1, because:
+	 * a) on the bright side, keeping at most one request in
+	 * service in the drive is unlikely to cause any harm to the
+	 * latency of bfqq's requests, as the service time of a single
+	 * request is likely to be lower than the think time of bfqq;
+	 * b) on the downside, after becoming empty, bfqq is likely to
+	 * expire before getting its next request. With this request
+	 * arrival pattern, it is very hard to sample total service
+	 * times and update the inject limit accordingly (see comments
+	 * on bfq_update_inject_limit()). So the limit is likely to be
+	 * never, or at least seldom, updated. As a consequence, by
+	 * setting the limit to 1, we avoid that no injection ever
+	 * occurs with bfqq. On the downside, this proactive step
+	 * further reduces chances to actually compute the baseline
+	 * total service time. Thus it reduces chances to execute the
+	 * limit-update algorithm and possibly raise the limit to more
+	 * than 1.
+	 */
+	if (bfq_bfqq_has_short_ttime(bfqq))
+		bfqq->inject_limit = 0;
+	else
+		bfqq->inject_limit = 1;
+
+	bfqq->decrease_time_jif = jiffies;
+}
+
 static void bfq_add_request(struct request *rq)
 {
 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
@@ -1755,71 +1821,8 @@ static void bfq_add_request(struct request *rq)
 	 * bfq_update_inject_limit().
 	 */
 	if (time_is_before_eq_jiffies(bfqq->decrease_time_jif +
-				     msecs_to_jiffies(1000))) {
-		/* invalidate baseline total service time */
-		bfqq->last_serv_time_ns = 0;
-
-		/*
-		 * Reset pointer in case we are waiting for
-		 * some request completion.
-		 */
-		bfqd->waited_rq = NULL;
-
-		/*
-		 * If bfqq has a short think time, then start
-		 * by setting the inject limit to 0
-		 * prudentially, because the service time of
-		 * an injected I/O request may be higher than
-		 * the think time of bfqq, and therefore, if
-		 * one request was injected when bfqq remains
-		 * empty, this injected request might delay
-		 * the service of the next I/O request for
-		 * bfqq significantly. In case bfqq can
-		 * actually tolerate some injection, then the
-		 * adaptive update will however raise the
-		 * limit soon. This lucky circumstance holds
-		 * exactly because bfqq has a short think
-		 * time, and thus, after remaining empty, is
-		 * likely to get new I/O enqueued---and then
-		 * completed---before being expired. This is
-		 * the very pattern that gives the
-		 * limit-update algorithm the chance to
-		 * measure the effect of injection on request
-		 * service times, and then to update the limit
-		 * accordingly.
-		 *
-		 * On the opposite end, if bfqq has a long
-		 * think time, then start directly by 1,
-		 * because:
-		 * a) on the bright side, keeping at most one
-		 * request in service in the drive is unlikely
-		 * to cause any harm to the latency of bfqq's
-		 * requests, as the service time of a single
-		 * request is likely to be lower than the
-		 * think time of bfqq;
-		 * b) on the downside, after becoming empty,
-		 * bfqq is likely to expire before getting its
-		 * next request. With this request arrival
-		 * pattern, it is very hard to sample total
-		 * service times and update the inject limit
-		 * accordingly (see comments on
-		 * bfq_update_inject_limit()). So the limit is
-		 * likely to be never, or at least seldom,
-		 * updated. As a consequence, by setting the
-		 * limit to 1, we avoid that no injection ever
-		 * occurs with bfqq. On the downside, this
-		 * proactive step further reduces chances to
-		 * actually compute the baseline total service
-		 * time. Thus it reduces chances to execute the
-		 * limit-update algorithm and possibly raise the
-		 * limit to more than 1.
-		 */
-		if (bfq_bfqq_has_short_ttime(bfqq))
-			bfqq->inject_limit = 0;
-		else
-			bfqq->inject_limit = 1;
-		bfqq->decrease_time_jif = jiffies;
-	}
+				     msecs_to_jiffies(1000)))
+		bfq_reset_inject_limit(bfqd, bfqq);
 
 	/*
 	 * The following conditions must hold to setup a new
@@ -4855,7 +4858,7 @@ static void bfq_update_has_short_ttime(struct bfq_data *bfqd,
 				       struct bfq_queue *bfqq,
 				       struct bfq_io_cq *bic)
 {
-	bool has_short_ttime = true;
+	bool has_short_ttime = true, state_changed;
 
 	/*
 	 * No need to update has_short_ttime if bfqq is async or in
@@ -4880,13 +4883,93 @@ static void bfq_update_has_short_ttime(struct bfq_data *bfqd,
 	     bfqq->ttime.ttime_mean > bfqd->bfq_slice_idle))
 		has_short_ttime = false;
 
-	bfq_log_bfqq(bfqd, bfqq, "update_has_short_ttime: has_short_ttime %d",
-		     has_short_ttime);
+	state_changed = has_short_ttime != bfq_bfqq_has_short_ttime(bfqq);
 
 	if (has_short_ttime)
 		bfq_mark_bfqq_has_short_ttime(bfqq);
 	else
 		bfq_clear_bfqq_has_short_ttime(bfqq);
+
+	/*
+	 * Until the base value for the total service time gets
+	 * finally computed for bfqq, the inject limit does depend on
+	 * the think-time state (short|long). In particular, the limit
+	 * is 0 or 1 if the think time is deemed, respectively, as
+	 * short or long (details in the comments in
+	 * bfq_update_inject_limit()). Accordingly, the next
+	 * instructions reset the inject limit if the think-time state
+	 * has changed and the above base value is still to be
+	 * computed.
+	 *
+	 * However, the reset is performed only if more than 100 ms
+	 * have elapsed since the last update of the inject limit, or
+	 * (inclusive) if the change is from short to long think
+	 * time. The reason for this waiting is as follows.
+	 *
+	 * bfqq may have a long think time because of a
+	 * synchronization with some other queue, i.e., because the
+	 * I/O of some other queue may need to be completed for bfqq
+	 * to receive new I/O. This happens, e.g., if bfqq is
+	 * associated with a process that does some sync. A sync
+	 * generates extra blocking I/O, which must be completed
+	 * before the process associated with bfqq can go on with its
+	 * I/O.
+	 *
+	 * If such a synchronization is actually in place, then,
+	 * without injection on bfqq, the blocking I/O cannot happen
+	 * to be served while bfqq is in service. As a consequence, if
+	 * bfqq is granted I/O-dispatch-plugging, then bfqq remains
+	 * empty, and no I/O is dispatched, until the idle timeout
+	 * fires.
+	 * This is likely to result in lower bandwidth and
+	 * higher latencies for bfqq, and in a severe loss of total
+	 * throughput.
+	 *
+	 * On the opposite end, a non-zero inject limit may allow the
+	 * I/O that blocks bfqq to be executed soon, and therefore
+	 * bfqq to receive new I/O soon. But, if this actually
+	 * happens, then the next think-time sample for bfqq may be
+	 * very low. This in turn may cause bfqq's think time to be
+	 * deemed short. Without the 100 ms barrier, this new state
+	 * change would cause the body of the next if to be executed
+	 * immediately. But this would set to 0 the inject
+	 * limit. Without injection, the blocking I/O would cause the
+	 * think time of bfqq to become long again, and therefore the
+	 * inject limit to be raised again, and so on. The only effect
+	 * of such a steady oscillation between the two think-time
+	 * states would be to prevent effective injection on bfqq.
+	 *
+	 * In contrast, if the inject limit is not reset during such a
+	 * long time interval as 100 ms, then the number of short
+	 * think time samples can grow significantly before the reset
+	 * is allowed. As a consequence, the think time state can
+	 * become stable before the reset. There will be no state
+	 * change when the 100 ms elapse, and therefore no reset of
+	 * the inject limit. The inject limit remains steadily equal
+	 * to 1 both during and after the 100 ms. So injection can be
+	 * performed at all times, and throughput gets boosted.
+	 *
+	 * An inject limit equal to 1 is however in conflict, in
+	 * general, with the fact that the think time of bfqq is
+	 * short, because injection may be likely to delay bfqq's I/O
+	 * (as explained in the comments in
+	 * bfq_update_inject_limit()). But this does not happen in
+	 * this special case, because bfqq's low think time is due to
+	 * an effective handling of a synchronization, through
+	 * injection. In this special case, bfqq's I/O does not get
+	 * delayed by injection; on the contrary, bfqq's I/O is
+	 * brought forward, because it is not blocked for
+	 * milliseconds.
+	 *
+	 * In addition, during the 100 ms, the base value for the
+	 * total service time is likely to get finally computed,
+	 * freeing the inject limit from its relation with the think
+	 * time.
+	 */
+	if (state_changed && bfqq->last_serv_time_ns == 0 &&
+	    (time_is_before_eq_jiffies(bfqq->decrease_time_jif +
+				       msecs_to_jiffies(100)) ||
+	     !has_short_ttime))
+		bfq_reset_inject_limit(bfqd, bfqq);
 }
 
 /*
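As a quick aside, the policy implemented by the two hunks above can be
restated in isolation. The following is a minimal user-space sketch,
not code from the patch: the toy_queue type, the toy_* helpers and the
plain millisecond clock are illustrative stand-ins for bfq_queue, the
bfq helpers and jiffies.

#include <stdbool.h>
#include <stdint.h>

struct toy_queue {
	bool has_short_ttime;	    /* think-time state (short|long) */
	uint64_t last_serv_time_ns; /* base service time; 0 = unknown */
	unsigned int inject_limit;
	uint64_t decrease_time_ms;  /* time of last inject-limit update */
};

static void toy_reset_inject_limit(struct toy_queue *q, uint64_t now_ms)
{
	q->last_serv_time_ns = 0; /* invalidate baseline service time */
	/* short think time -> cautious 0; long think time -> 1 */
	q->inject_limit = q->has_short_ttime ? 0 : 1;
	q->decrease_time_ms = now_ms;
}

/* Called on every new think-time sample, as in bfq_update_has_short_ttime(). */
static void toy_update_think_time(struct toy_queue *q, bool now_short,
				  uint64_t now_ms)
{
	bool state_changed = now_short != q->has_short_ttime;

	q->has_short_ttime = now_short;

	/*
	 * Reset only while the baseline is still unknown, and only if
	 * more than 100 ms have elapsed or the state went from short
	 * to long, to avoid the oscillation described in the comments.
	 */
	if (state_changed && q->last_serv_time_ns == 0 &&
	    (now_ms - q->decrease_time_ms > 100 || !now_short))
		toy_reset_inject_limit(q, now_ms);
}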
From patchwork Mon Jun 24 19:40:37 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 167636
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name,
 bottura.nicola95@gmail.com, srivatsa@csail.mit.edu,
 Paolo Valente
Subject: [PATCH BUGFIX IMPROVEMENT 2/7] block, bfq: fix rq_in_driver check in
 bfq_update_inject_limit
Date: Mon, 24 Jun 2019 21:40:37 +0200
Message-Id: <20190624194042.38747-3-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190624194042.38747-1-paolo.valente@linaro.org>
References: <20190624194042.38747-1-paolo.valente@linaro.org>

One of the cases where the parameters for injection may be updated is
when there are no more in-flight I/O requests. The number of in-flight
requests is stored in the field bfqd->rq_in_driver of the descriptor
bfqd of the device. So, the controlled condition is
bfqd->rq_in_driver == 0.

Unfortunately, this is wrong, because the instruction that checks this
condition is in the code path that handles the completion of a
request, and, in particular, the instruction is executed before
bfqd->rq_in_driver is decremented in such a code path.

This commit fixes this issue by just replacing 0 with 1 in the
comparison.

Tested-by: Srivatsa S. Bhat
Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--
2.20.1

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 9bc10198ddff..05041f84b8da 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5481,8 +5481,14 @@ static void bfq_update_inject_limit(struct bfq_data *bfqd,
 	 * total service time, and there seem to be the right
 	 * conditions to do it, or we can lower the last base value
 	 * computed.
+	 *
+	 * NOTE: (bfqd->rq_in_driver == 1) means that there is no I/O
+	 * request in flight, because this function is in the code
+	 * path that handles the completion of a request of bfqq, and,
+	 * in particular, this function is executed before
+	 * bfqd->rq_in_driver is decremented in such a code path.
	 */
-	if ((bfqq->last_serv_time_ns == 0 && bfqd->rq_in_driver == 0) ||
+	if ((bfqq->last_serv_time_ns == 0 && bfqd->rq_in_driver == 1) ||
 	    tot_time_ns < bfqq->last_serv_time_ns) {
 		bfqq->last_serv_time_ns = tot_time_ns;
 
 		/*
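The off-by-one fixed here is purely a matter of call ordering, which
the following stripped-down model makes explicit; rq_in_driver,
complete_request() and update_inject_limit() are hypothetical
stand-ins for the corresponding bfq code, not the real functions.

#include <assert.h>

static unsigned int rq_in_driver; /* requests currently in the drive */

static void update_inject_limit(void)
{
	/*
	 * The request whose completion triggered this call is still
	 * counted in rq_in_driver, so "no other request in flight"
	 * reads as 1 here, not 0.
	 */
	assert(rq_in_driver >= 1);
	if (rq_in_driver == 1) {
		/* safe to sample the baseline service time */
	}
}

static void complete_request(void)
{
	update_inject_limit(); /* runs first, like bfq_update_inject_limit() */
	rq_in_driver--;	       /* decremented only afterwards */
}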
From patchwork Mon Jun 24 19:40:38 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 167637
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name,
 bottura.nicola95@gmail.com, srivatsa@csail.mit.edu,
 Paolo Valente
Subject: [PATCH BUGFIX IMPROVEMENT 3/7] block, bfq: update base request
 service times when possible
Date: Mon, 24 Jun 2019 21:40:38 +0200
Message-Id: <20190624194042.38747-4-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190624194042.38747-1-paolo.valente@linaro.org>
References: <20190624194042.38747-1-paolo.valente@linaro.org>

I/O injection gets reduced if it increases the request service times
of the victim queue beyond a certain threshold. The threshold, in its
turn, is computed as a function of the base service time enjoyed by
the queue when it undergoes no injection.

As a consequence, for injection to work properly, the above base value
has to be accurate. In this respect, such a value may vary over
time. For example, it varies if the size or the spatial locality of
the I/O requests in the queue change. It is then important to update
this value whenever possible. This commit performs this update.

Tested-by: Srivatsa S. Bhat
Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--
2.20.1

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 05041f84b8da..62442083b147 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5496,7 +5496,18 @@ static void bfq_update_inject_limit(struct bfq_data *bfqd,
 		 * start trying injection.
 		 */
 		bfqq->inject_limit = max_t(unsigned int, 1, old_limit);
-	}
+	} else if (!bfqd->rqs_injected && bfqd->rq_in_driver == 1)
+		/*
+		 * No I/O injected and no request still in service in
+		 * the drive: these are the exact conditions for
+		 * computing the base value of the total service time
+		 * for bfqq. So let's update this value, because it is
+		 * rather variable. For example, it varies if the size
+		 * or the spatial locality of the I/O requests in bfqq
+		 * change.
+		 */
+		bfqq->last_serv_time_ns = tot_time_ns;
+
 	/* update complete, not waiting for any request completion any longer */
 	bfqd->waited_rq = NULL;
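Taken together with the previous patch, the baseline-sampling policy
at request completion can be sketched as follows; the struct and field
names are simplified stand-ins for the bfq_queue/bfq_data fields used
above, and the sketch models only the baseline update, not the other
limit adjustments done by bfq_update_inject_limit().

#include <stdbool.h>
#include <stdint.h>

struct toy_sample_ctx {
	uint64_t last_serv_time_ns; /* current baseline; 0 = unknown */
	bool rqs_injected;	    /* was any I/O injected meanwhile? */
	unsigned int rq_in_driver;  /* completing request still counted */
};

static void toy_update_base_service_time(struct toy_sample_ctx *c,
					 uint64_t tot_time_ns)
{
	if ((c->last_serv_time_ns == 0 && c->rq_in_driver == 1) ||
	    tot_time_ns < c->last_serv_time_ns) {
		/* first usable sample, or a lower (better) baseline */
		c->last_serv_time_ns = tot_time_ns;
	} else if (!c->rqs_injected && c->rq_in_driver == 1) {
		/*
		 * No injection and nothing else in the drive: the
		 * exact conditions for an accurate baseline, so
		 * refresh it even upward, since request size or
		 * spatial locality may have changed.
		 */
		c->last_serv_time_ns = tot_time_ns;
	}
}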
From patchwork Mon Jun 24 19:40:39 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 167641
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name,
 bottura.nicola95@gmail.com, srivatsa@csail.mit.edu,
 Paolo Valente
Subject: [PATCH BUGFIX IMPROVEMENT 4/7] block, bfq: bring forward seek&think
 time update
Date: Mon, 24 Jun 2019 21:40:39 +0200
Message-Id: <20190624194042.38747-5-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190624194042.38747-1-paolo.valente@linaro.org>
References: <20190624194042.38747-1-paolo.valente@linaro.org>

Until the base value for request service times gets finally computed
for a bfq_queue, the inject limit for that queue does depend on the
think-time state (short|long) of the queue. A timely update of the
think time then guarantees a quicker activation or deactivation of the
injection. Fortunately, the think time of a bfq_queue is updated in
the same code path as the inject limit; but after the inject limit.

This commit moves the update of the think time before the update of
the inject limit. For coherence, it moves the update of the seek time
too.

Tested-by: Srivatsa S. Bhat
Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

--
2.20.1

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 62442083b147..d5bc32371ace 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4979,19 +4979,9 @@ static void bfq_update_has_short_ttime(struct bfq_data *bfqd,
 static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 			    struct request *rq)
 {
-	struct bfq_io_cq *bic = RQ_BIC(rq);
-
 	if (rq->cmd_flags & REQ_META)
 		bfqq->meta_pending++;
 
-	bfq_update_io_thinktime(bfqd, bfqq);
-	bfq_update_has_short_ttime(bfqd, bfqq, bic);
-	bfq_update_io_seektime(bfqd, bfqq, rq);
-
-	bfq_log_bfqq(bfqd, bfqq,
-		     "rq_enqueued: has_short_ttime=%d (seeky %d)",
-		     bfq_bfqq_has_short_ttime(bfqq), BFQQ_SEEKY(bfqq));
-
 	bfqq->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq);
 
 	if (bfqq == bfqd->in_service_queue && bfq_bfqq_wait_request(bfqq)) {
@@ -5079,6 +5069,10 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
 		bfqq = new_bfqq;
 	}
 
+	bfq_update_io_thinktime(bfqd, bfqq);
+	bfq_update_has_short_ttime(bfqd, bfqq, RQ_BIC(rq));
+	bfq_update_io_seektime(bfqd, bfqq, rq);
+
 	waiting = bfqq && bfq_bfqq_wait_request(bfqq);
 	bfq_add_request(rq);
 	idle_timer_disabled = waiting && !bfq_bfqq_wait_request(bfqq);
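The change is purely about ordering: the statistics must be fresh
before bfq_add_request() reads them. In scheme, reusing the toy_queue
sketch given after patch 1/7 (toy_add_request() is a hypothetical
placeholder for the enqueue step that consults inject_limit):

static void toy_insert_request(struct toy_queue *q, bool sample_is_short,
			       uint64_t now_ms)
{
	/* 1) refresh think/seek time, possibly resetting inject_limit */
	toy_update_think_time(q, sample_is_short, now_ms);

	/* 2) only then run the enqueue logic that reads inject_limit */
	/* toy_add_request(q); */
}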
From patchwork Mon Jun 24 19:40:40 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 167639
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name,
 bottura.nicola95@gmail.com, srivatsa@csail.mit.edu,
 Paolo Valente
Subject: [PATCH BUGFIX IMPROVEMENT 5/7] block, bfq: detect wakers and
 unconditionally inject their I/O
Date: Mon, 24 Jun 2019 21:40:40 +0200
Message-Id: <20190624194042.38747-6-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190624194042.38747-1-paolo.valente@linaro.org>
References: <20190624194042.38747-1-paolo.valente@linaro.org>

A bfq_queue Q may happen to be synchronized with another bfq_queue Q2,
i.e., the I/O of Q2 may need to be completed for Q to receive new
I/O. We call Q2 "waker queue".

If I/O plugging is being performed for Q, and Q is not receiving any
more I/O because of the above synchronization, then, thanks to BFQ's
injection mechanism, the waker queue is likely to get served before
the I/O-plugging timeout fires.

Unfortunately, this fact may not be sufficient to guarantee a high
throughput during the I/O plugging, because the inject limit for Q may
be too low to guarantee a lot of injected I/O. In addition, the
duration of the plugging, i.e., the time before Q finally receives new
I/O, may not be minimized, because the waker queue may happen to be
served only after other queues.

To address these issues, this commit introduces the explicit detection
of the waker queue, and the unconditional injection of a pending I/O
request of the waker queue on each invocation of
bfq_dispatch_request().

One may be concerned that this systematic injection of I/O from the
waker queue delays the service of Q's I/O. Fortunately, it doesn't. On
the contrary, Q's next I/O is brought forward dramatically, for it is
not blocked for milliseconds.

Tested-by: Srivatsa S. Bhat
Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 270 ++++++++++++++++++++++++++++++++++++++------
 block/bfq-iosched.h |  25 +++-
 2 files changed, 261 insertions(+), 34 deletions(-)

--
2.20.1

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index d5bc32371ace..9e2fbb7d1fb6 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -157,6 +157,7 @@ BFQ_BFQQ_FNS(in_large_burst);
 BFQ_BFQQ_FNS(coop);
 BFQ_BFQQ_FNS(split_coop);
 BFQ_BFQQ_FNS(softrt_update);
+BFQ_BFQQ_FNS(has_waker);
 #undef BFQ_BFQQ_FNS						\
 
 /* Expiration time of sync (0) and async (1) requests, in ns. */
@@ -1814,6 +1815,111 @@ static void bfq_add_request(struct request *rq)
 	bfqd->queued++;
 
 	if (RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_bfqq_sync(bfqq)) {
+		/*
+		 * Detect whether bfqq's I/O seems synchronized with
+		 * that of some other queue, i.e., whether bfqq, after
+		 * remaining empty, happens to receive new I/O only
+		 * right after some I/O request of the other queue has
+		 * been completed. We call waker queue the other
+		 * queue, and we assume, for simplicity, that bfqq may
+		 * have at most one waker queue.
+		 *
+		 * A remarkable throughput boost can be reached by
+		 * unconditionally injecting the I/O of the waker
+		 * queue, every time a new bfq_dispatch_request
+		 * happens to be invoked while I/O is being plugged
+		 * for bfqq. In addition to boosting throughput, this
+		 * unblocks bfqq's I/O, thereby improving bandwidth
+		 * and latency for bfqq. Note that these same results
+		 * may be achieved with the general injection
+		 * mechanism, but less effectively. For details on
+		 * this aspect, see the comments on the choice of the
+		 * queue for injection in bfq_select_queue().
+		 *
+		 * Turning back to the detection of a waker queue, a
+		 * queue Q is deemed as a waker queue for bfqq if, for
+		 * two consecutive times, bfqq happens to become non
+		 * empty right after a request of Q has been
+		 * completed. In particular, on the first time, Q is
+		 * tentatively set as a candidate waker queue, while
+		 * on the second time, the flag
+		 * bfq_bfqq_has_waker(bfqq) is set to confirm that Q
+		 * is a waker queue for bfqq. These detection steps
+		 * are performed only if bfqq has a long think time,
+		 * so as to make it more likely that bfqq's I/O is
+		 * actually being blocked by a synchronization. This
+		 * last filter, plus the above two-times requirement,
+		 * make false positives less likely.
+		 *
+		 * NOTE
+		 *
+		 * The sooner a waker queue is detected, the sooner
+		 * throughput can be boosted by injecting I/O from the
+		 * waker queue. Fortunately, detection is likely to be
+		 * actually fast, for the following reasons. While
+		 * blocked by synchronization, bfqq has a long think
+		 * time. This implies that bfqq's inject limit is at
+		 * least equal to 1 (see the comments in
+		 * bfq_update_inject_limit()). So, thanks to
+		 * injection, the waker queue is likely to be served
+		 * during the very first I/O-plugging time interval
+		 * for bfqq. This triggers the first step of the
+		 * detection mechanism. Thanks again to injection, the
+		 * candidate waker queue is then likely to be
+		 * confirmed no later than during the next
+		 * I/O-plugging interval for bfqq.
+		 */
+		if (!bfq_bfqq_has_short_ttime(bfqq) &&
+		    ktime_get_ns() - bfqd->last_completion <
+		    200 * NSEC_PER_USEC) {
+			if (bfqd->last_completed_rq_bfqq != bfqq &&
+			    bfqd->last_completed_rq_bfqq !=
+			    bfqq->waker_bfqq) {
+				/*
+				 * First synchronization detected with
+				 * a candidate waker queue, or with a
+				 * different candidate waker queue
+				 * from the current one.
+				 */
+				bfqq->waker_bfqq = bfqd->last_completed_rq_bfqq;
+
+				/*
+				 * If the waker queue disappears, then
+				 * bfqq->waker_bfqq must be reset. To
+				 * this goal, we maintain in each
+				 * waker queue a list, woken_list, of
+				 * all the queues that reference the
+				 * waker queue through their
+				 * waker_bfqq pointer. When the waker
+				 * queue exits, the waker_bfqq pointer
+				 * of all the queues in the woken_list
+				 * is reset.
+				 *
+				 * In addition, if bfqq is already in
+				 * the woken_list of a waker queue,
+				 * then, before being inserted into
+				 * the woken_list of a new waker
+				 * queue, bfqq must be removed from
+				 * the woken_list of the old waker
+				 * queue.
+				 */
+				if (!hlist_unhashed(&bfqq->woken_list_node))
+					hlist_del_init(&bfqq->woken_list_node);
+				hlist_add_head(&bfqq->woken_list_node,
+					       &bfqd->last_completed_rq_bfqq->woken_list);
+
+				bfq_clear_bfqq_has_waker(bfqq);
+			} else if (bfqd->last_completed_rq_bfqq ==
+				   bfqq->waker_bfqq &&
+				   !bfq_bfqq_has_waker(bfqq)) {
+				/*
+				 * synchronization with waker_bfqq
+				 * seen for the second time
+				 */
+				bfq_mark_bfqq_has_waker(bfqq);
+			}
+		}
+
 		/*
 		 * Periodically reset inject limit, to make sure that
 		 * the latter eventually drops in case workload
@@ -4164,18 +4270,89 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
 			bfqq->bic->bfqq[0] : NULL;
 
 		/*
-		 * If the process associated with bfqq has also async
-		 * I/O pending, then inject it
-		 * unconditionally. Injecting I/O from the same
-		 * process can cause no harm to the process. On the
-		 * contrary, it can only increase bandwidth and reduce
-		 * latency for the process.
+		 * The next three mutually-exclusive ifs decide
+		 * whether to try injection, and choose the queue to
+		 * pick an I/O request from.
+		 *
+		 * The first if checks whether the process associated
+		 * with bfqq has also async I/O pending. If so, it
+		 * injects such I/O unconditionally. Injecting async
+		 * I/O from the same process can cause no harm to the
+		 * process. On the contrary, it can only increase
+		 * bandwidth and reduce latency for the process.
+		 *
+		 * The second if checks whether there happens to be a
+		 * non-empty waker queue for bfqq, i.e., a queue whose
+		 * I/O needs to be completed for bfqq to receive new
+		 * I/O. This happens, e.g., if bfqq is associated with
+		 * a process that does some sync. A sync generates
+		 * extra blocking I/O, which must be completed before
+		 * the process associated with bfqq can go on with its
+		 * I/O. If the I/O of the waker queue is not served,
+		 * then bfqq remains empty, and no I/O is dispatched,
+		 * until the idle timeout fires for bfqq. This is
+		 * likely to result in lower bandwidth and higher
+		 * latencies for bfqq, and in a severe loss of total
+		 * throughput. The best action to take is therefore to
+		 * serve the waker queue as soon as possible. So do it
+		 * (without relying on the third alternative below for
+		 * eventually serving waker_bfqq's I/O; see the last
+		 * paragraph for further details). This systematic
+		 * injection of I/O from the waker queue does not
+		 * cause any delay to bfqq's I/O. On the contrary,
+		 * next bfqq's I/O is brought forward dramatically,
+		 * for it is not blocked for milliseconds.
+		 *
+		 * The third if checks whether bfqq is a queue for
+		 * which it is better to avoid injection. It is so if
+		 * bfqq delivers more throughput when served without
+		 * any further I/O from other queues in the middle, or
+		 * if the service times of bfqq's I/O requests both
+		 * count more than overall throughput, and may be
+		 * easily increased by injection (this happens if bfqq
+		 * has a short think time). If none of these
+		 * conditions holds, then a candidate queue for
+		 * injection is looked for through
+		 * bfq_choose_bfqq_for_injection(). Note that the
+		 * latter may return NULL (for example if the inject
+		 * limit for bfqq is currently 0).
+		 *
+		 * NOTE: motivation for the second alternative
+		 *
+		 * Thanks to the way the inject limit is updated in
+		 * bfq_update_has_short_ttime(), it is rather likely
+		 * that, if I/O is being plugged for bfqq and the
+		 * waker queue has pending I/O requests that are
+		 * blocking bfqq's I/O, then the third alternative
+		 * above lets the waker queue get served before the
+		 * I/O-plugging timeout fires. So one may deem the
+		 * second alternative superfluous. It is not, because
+		 * the third alternative may be way less effective in
+		 * case of a synchronization. For two main
+		 * reasons. First, throughput may be low because the
+		 * inject limit may be too low to guarantee the same
+		 * amount of injected I/O, from the waker queue or
+		 * other queues, that the second alternative
+		 * guarantees (the second alternative unconditionally
+		 * injects a pending I/O request of the waker queue
+		 * for each bfq_dispatch_request()). Second, with the
+		 * third alternative, the duration of the plugging,
+		 * i.e., the time before bfqq finally receives new I/O,
+		 * may not be minimized, because the waker queue may
+		 * happen to be served only after other queues.
 		 */
 		if (async_bfqq &&
 		    icq_to_bic(async_bfqq->next_rq->elv.icq) == bfqq->bic &&
 		    bfq_serv_to_charge(async_bfqq->next_rq, async_bfqq) <=
 		    bfq_bfqq_budget_left(async_bfqq))
 			bfqq = bfqq->bic->bfqq[0];
+		else if (bfq_bfqq_has_waker(bfqq) &&
+			 bfq_bfqq_busy(bfqq->waker_bfqq) &&
+			 bfq_serv_to_charge(bfqq->waker_bfqq->next_rq,
+					    bfqq->waker_bfqq) <=
+			 bfq_bfqq_budget_left(bfqq->waker_bfqq)
+			)
+			bfqq = bfqq->waker_bfqq;
 		else if (!idling_boosts_thr_without_issues(bfqd, bfqq) &&
 			 (bfqq->wr_coeff == 1 || bfqd->wr_busy_queues > 1 ||
 			  !bfq_bfqq_has_short_ttime(bfqq)))
@@ -4564,6 +4741,9 @@ static void bfq_put_cooperator(struct bfq_queue *bfqq)
 
 static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 {
+	struct bfq_queue *item;
+	struct hlist_node *n;
+
 	if (bfqq == bfqd->in_service_queue) {
 		__bfq_bfqq_expire(bfqd, bfqq);
 		bfq_schedule_dispatch(bfqd);
@@ -4573,6 +4753,18 @@ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 
 	bfq_put_cooperator(bfqq);
 
+	/* remove bfqq from woken list */
+	if (!hlist_unhashed(&bfqq->woken_list_node))
+		hlist_del_init(&bfqq->woken_list_node);
+
+	/* reset waker for all queues in woken list */
+	hlist_for_each_entry_safe(item, n, &bfqq->woken_list,
+				  woken_list_node) {
+		item->waker_bfqq = NULL;
+		bfq_clear_bfqq_has_waker(item);
+		hlist_del_init(&item->woken_list_node);
+	}
+
 	bfq_put_queue(bfqq); /* release process reference */
 }
 
@@ -4691,6 +4883,8 @@ static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	RB_CLEAR_NODE(&bfqq->entity.rb_node);
 	INIT_LIST_HEAD(&bfqq->fifo);
 	INIT_HLIST_NODE(&bfqq->burst_list_node);
+	INIT_HLIST_NODE(&bfqq->woken_list_node);
+	INIT_HLIST_HEAD(&bfqq->woken_list);
 
 	bfqq->ref = 0;
 	bfqq->bfqd = bfqd;
@@ -4909,28 +5103,27 @@ static void bfq_update_has_short_ttime(struct bfq_data *bfqd,
 	 * bfqq may have a long think time because of a
 	 * synchronization with some other queue, i.e., because the
 	 * I/O of some other queue may need to be completed for bfqq
-	 * to receive new I/O. This happens, e.g., if bfqq is
-	 * associated with a process that does some sync. A sync
-	 * generates extra blocking I/O, which must be completed
-	 * before the process associated with bfqq can go on with its
-	 * I/O.
+	 * to receive new I/O. Details in the comments on the choice
+	 * of the queue for injection in bfq_select_queue().
 	 *
-	 * If such a synchronization is actually in place, then,
-	 * without injection on bfqq, the blocking I/O cannot happen
-	 * to be served while bfqq is in service. As a consequence, if
-	 * bfqq is granted I/O-dispatch-plugging, then bfqq remains
-	 * empty, and no I/O is dispatched, until the idle timeout
-	 * fires. This is likely to result in lower bandwidth and
-	 * higher latencies for bfqq, and in a severe loss of total
-	 * throughput.
+	 * As stressed in those comments, if such a synchronization is
+	 * actually in place, then, without injection on bfqq, the
+	 * blocking I/O cannot happen to be served while bfqq is in
+	 * service. As a consequence, if bfqq is granted
+	 * I/O-dispatch-plugging, then bfqq remains empty, and no I/O
+	 * is dispatched, until the idle timeout fires. This is likely
+	 * to result in lower bandwidth and higher latencies for bfqq,
+	 * and in a severe loss of total throughput.
 	 *
 	 * On the opposite end, a non-zero inject limit may allow the
 	 * I/O that blocks bfqq to be executed soon, and therefore
-	 * bfqq to receive new I/O soon. But, if this actually
-	 * happens, then the next think-time sample for bfqq may be
-	 * very low. This in turn may cause bfqq's think time to be
-	 * deemed short. Without the 100 ms barrier, this new state
-	 * change would cause the body of the next if to be executed
+	 * bfqq to receive new I/O soon.
+	 *
+	 * But, if the blocking gets actually eliminated, then the
+	 * next think-time sample for bfqq may be very low. This in
+	 * turn may cause bfqq's think time to be deemed
+	 * short. Without the 100 ms barrier, this new state change
+	 * would cause the body of the next if to be executed
 	 * immediately. But this would set to 0 the inject
 	 * limit. Without injection, the blocking I/O would cause the
 	 * think time of bfqq to become long again, and therefore the
@@ -4941,11 +5134,11 @@ static void bfq_update_has_short_ttime(struct bfq_data *bfqd,
 	 * In contrast, if the inject limit is not reset during such a
 	 * long time interval as 100 ms, then the number of short
 	 * think time samples can grow significantly before the reset
-	 * is allowed. As a consequence, the think time state can
-	 * become stable before the reset. There will be no state
-	 * change when the 100 ms elapse, and therefore no reset of
-	 * the inject limit. The inject limit remains steadily equal
-	 * to 1 both during and after the 100 ms. So injection can be
+	 * is performed. As a consequence, the think time state can
+	 * become stable before the reset. Therefore there will be no
+	 * state change when the 100 ms elapse, and no reset of the
+	 * inject limit. The inject limit remains steadily equal to 1
+	 * both during and after the 100 ms. So injection can be
 	 * performed at all times, and throughput gets boosted.
 	 *
 	 * An inject limit equal to 1 is however in conflict, in
@@ -4960,10 +5153,20 @@ static void bfq_update_has_short_ttime(struct bfq_data *bfqd,
 	 * brought forward, because it is not blocked for
 	 * milliseconds.
 	 *
-	 * In addition, during the 100 ms, the base value for the
-	 * total service time is likely to get finally computed,
-	 * freeing the inject limit from its relation with the think
-	 * time.
+	 * In addition, serving the blocking I/O much sooner, and much
+	 * more frequently than once per I/O-plugging timeout, makes
+	 * it much quicker to detect a waker queue (the concept of
+	 * waker queue is defined in the comments in
+	 * bfq_add_request()).
+	 * This makes it possible to start boosting throughput sooner
+	 * and more effectively, by injecting the I/O of the waker
+	 * queue unconditionally on every bfq_dispatch_request().
+	 *
+	 * One last, important benefit of not resetting the inject
+	 * limit before 100 ms is that, during this time interval, the
+	 * base value for the total service time is likely to get
+	 * finally computed for bfqq, freeing the inject limit from
+	 * its relation with the think time.
 	 */
 	if (state_changed && bfqq->last_serv_time_ns == 0 &&
 	    (time_is_before_eq_jiffies(bfqq->decrease_time_jif +
@@ -5278,6 +5481,7 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
 		1UL<<(BFQ_RATE_SHIFT - 10))
 		bfq_update_rate_reset(bfqd, NULL);
 	bfqd->last_completion = now_ns;
+	bfqd->last_completed_rq_bfqq = bfqq;
 
 	/*
 	 * If we are waiting to discover whether the request pattern
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 584d3c9ed8ba..e80adf822bbe 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -357,6 +357,24 @@ struct bfq_queue {
 	/* max service rate measured so far */
 	u32 max_service_rate;
+
+	/*
+	 * Pointer to the waker queue for this queue, i.e., to the
+	 * queue Q such that this queue happens to get new I/O right
+	 * after some I/O request of Q is completed. For details, see
+	 * the comments on the choice of the queue for injection in
+	 * bfq_select_queue().
+	 */
+	struct bfq_queue *waker_bfqq;
+	/* node for woken_list, see below */
+	struct hlist_node woken_list_node;
+	/*
+	 * Head of the list of the woken queues for this queue, i.e.,
+	 * of the list of the queues for which this queue is a waker
+	 * queue. This list is used to reset the waker_bfqq pointer in
+	 * the woken queues when this queue exits.
+	 */
+	struct hlist_head woken_list;
 };
 
 /**
@@ -533,6 +551,9 @@ struct bfq_data {
 	/* time of last request completion (ns) */
 	u64 last_completion;
 
+	/* bfqq owning the last completed rq */
+	struct bfq_queue *last_completed_rq_bfqq;
+
 	/* time of last transition from empty to non-empty (ns) */
 	u64 last_empty_occupied_ns;
 
@@ -743,7 +764,8 @@ enum bfqq_state_flags {
 				 * update
 				 */
 	BFQQF_coop,		/* bfqq is shared */
-	BFQQF_split_coop	/* shared bfqq will be split */
+	BFQQF_split_coop,	/* shared bfqq will be split */
+	BFQQF_has_waker		/* bfqq has a waker queue */
 };
 
 #define BFQ_BFQQ_FNS(name)						\
@@ -763,6 +785,7 @@ BFQ_BFQQ_FNS(in_large_burst);
 BFQ_BFQQ_FNS(coop);
 BFQ_BFQQ_FNS(split_coop);
 BFQ_BFQQ_FNS(softrt_update);
+BFQ_BFQQ_FNS(has_waker);
 #undef BFQ_BFQQ_FNS
 
 /* Expiration reasons. */
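[Editorial aside, not part of the patch: the new last_completed_rq_bfqq field is what makes waker detection cheap. Below is a simplified user-space sketch of the idea; the real detection in bfq_add_request() applies stricter conditions than shown here, and all names below are hypothetical.]

    #include <stdio.h>

    struct bfqq_lite {
    	struct bfqq_lite *waker;	/* like bfqq->waker_bfqq */
    	int plugged;			/* plugged, waiting for new I/O */
    	int queued;			/* queued requests */
    };

    struct bfqd_lite {
    	struct bfqq_lite *last_completed; /* like last_completed_rq_bfqq */
    };

    /* On completion: remember which queue's request completed last. */
    static void on_completion(struct bfqd_lite *d, struct bfqq_lite *q)
    {
    	d->last_completed = q;
    }

    /* On arrival: an empty, plugged queue that gets new I/O right after
     * another queue's completion treats that queue as a waker candidate. */
    static void on_arrival(struct bfqd_lite *d, struct bfqq_lite *q)
    {
    	if (q->queued == 0 && q->plugged &&
    	    d->last_completed && d->last_completed != q)
    		q->waker = d->last_completed;
    	q->queued++;
    }

    int main(void)
    {
    	struct bfqq_lite a = { .plugged = 0 };
    	struct bfqq_lite b = { .plugged = 1 };
    	struct bfqd_lite d = { 0 };

    	on_completion(&d, &a);	/* a's request completes... */
    	on_arrival(&d, &b);	/* ...and b gets new I/O while plugged */
    	printf("b's waker candidate is a: %d\n", b.waker == &a);
    	return 0;
    }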
From patchwork Mon Jun 24 19:40:41 2019
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 167638
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name,
 bottura.nicola95@gmail.com, srivatsa@csail.mit.edu,
 Paolo Valente
Subject: [PATCH BUGFIX IMPROVEMENT 6/7] block, bfq: preempt lower-weight or
 lower-priority queues
Date: Mon, 24 Jun 2019 21:40:41 +0200
Message-Id: <20190624194042.38747-7-paolo.valente@linaro.org>
In-Reply-To: <20190624194042.38747-1-paolo.valente@linaro.org>
References: <20190624194042.38747-1-paolo.valente@linaro.org>

BFQ enqueues the I/O coming from each process into a separate
bfq_queue, and serves bfq_queues one at a time. Each bfq_queue may be
served for at most timeout_sync milliseconds (default: 125 ms). This
service scheme is prone to the following inaccuracy.

While a bfq_queue Q1 is in service, some empty bfq_queue Q2 may
receive I/O, and, according to BFQ's scheduling policy, may become the
right bfq_queue to serve, in place of the currently in-service
bfq_queue. In this respect, postponing the service of Q2 until after
the service of Q1 finishes may delay the completion of Q2's I/O,
compared with an ideal service in which all non-empty bfq_queues are
served in parallel, and every non-empty bfq_queue is served at a rate
proportional to the bfq_queue's weight. This additional delay is at
most equal to the time Q1 may unjustly remain in service before
switching to Q2.

If Q1 and Q2 have the same weight, then this time is most likely
negligible compared with the completion time to be guaranteed to Q2's
I/O. In addition, first, one of the reasons why BFQ may want to serve
Q1 for a while is that this boosts throughput and, second, serving Q1
longer reduces BFQ's overhead. In conclusion, it is usually better not
to preempt Q1 if both Q1 and Q2 have the same weight.

In contrast, as Q2's weight or priority becomes higher and higher
compared with that of Q1, the above delay becomes larger and larger,
compared with the I/O completion times that have to be guaranteed to
Q2 according to Q2's weight. So reducing this delay may be more
important than avoiding the costs of preempting Q1.

Accordingly, this commit preempts Q1 if Q2 has a higher weight or a
higher priority than Q1. Preemption causes Q1 to be re-scheduled, and
triggers a new choice of the next bfq_queue to serve. If Q2 really is
the next bfq_queue to serve, then Q2 will be set in service
immediately.

This change reduces the component of the I/O latency caused by the
above delay by about 80%. For example, on an (old) PLEXTOR PX-256M5
SSD, the maximum latency reported by fio drops from 15.1 to 3.2 ms for
a process doing sporadic random reads while another process is doing
continuous sequential reads.
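[Editorial aside, not part of the patch: the rule described above reduces to a two-part check, which the diff below implements as bfq_bfqq_higher_class_or_weight(), including the group-weight handling omitted here. A standalone sketch over simplified, hypothetical types:]

    #include <stdio.h>

    struct q_lite {
    	int ioprio_class;	/* lower value = higher priority class */
    	int weight;
    };

    /* Higher priority class wins outright; otherwise compare weights. */
    static int higher_class_or_weight(const struct q_lite *q,
    				  const struct q_lite *in_serv)
    {
    	if (q->ioprio_class < in_serv->ioprio_class)
    		return 1;
    	return q->weight > in_serv->weight;
    }

    int main(void)
    {
    	struct q_lite heavy = { .ioprio_class = 2, .weight = 300 };
    	struct q_lite light = { .ioprio_class = 2, .weight = 100 };

    	printf("heavy preempts light: %d\n",
    	       higher_class_or_weight(&heavy, &light));	/* 1 */
    	printf("light preempts heavy: %d\n",
    	       higher_class_or_weight(&light, &heavy));	/* 0 */
    	return 0;
    }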
Signed-off-by: Nicola Bottura Signed-off-by: Paolo Valente --- block/bfq-iosched.c | 95 +++++++++++++++++++++++++++++++++++---------- 1 file changed, 75 insertions(+), 20 deletions(-) -- 2.20.1 diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 9e2fbb7d1fb6..6a3d05023300 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -1428,17 +1428,19 @@ static int bfq_min_budget(struct bfq_data *bfqd) * mechanism may be re-designed in such a way to make it possible to * know whether preemption is needed without needing to update service * trees). In addition, queue preemptions almost always cause random - * I/O, and thus loss of throughput. Because of these facts, the next - * function adopts the following simple scheme to avoid both costly - * operations and too frequent preemptions: it requests the expiration - * of the in-service queue (unconditionally) only for queues that need - * to recover a hole, or that either are weight-raised or deserve to - * be weight-raised. + * I/O, which may in turn cause loss of throughput. Finally, there may + * even be no in-service queue when the next function is invoked (so, + * no queue to compare timestamps with). Because of these facts, the + * next function adopts the following simple scheme to avoid costly + * operations, too frequent preemptions and too many dependencies on + * the state of the scheduler: it requests the expiration of the + * in-service queue (unconditionally) only for queues that need to + * recover a hole. Then it delegates to other parts of the code the + * responsibility of handling the above case 2. */ static bool bfq_bfqq_update_budg_for_activation(struct bfq_data *bfqd, struct bfq_queue *bfqq, - bool arrived_in_time, - bool wr_or_deserves_wr) + bool arrived_in_time) { struct bfq_entity *entity = &bfqq->entity; @@ -1493,7 +1495,7 @@ static bool bfq_bfqq_update_budg_for_activation(struct bfq_data *bfqd, entity->budget = max_t(unsigned long, bfqq->max_budget, bfq_serv_to_charge(bfqq->next_rq, bfqq)); bfq_clear_bfqq_non_blocking_wait_rq(bfqq); - return wr_or_deserves_wr; + return false; } /* @@ -1611,6 +1613,36 @@ static bool bfq_bfqq_idle_for_long_time(struct bfq_data *bfqd, bfqd->bfq_wr_min_idle_time); } + +/* + * Return true if bfqq is in a higher priority class, or has a higher + * weight than the in-service queue. 
+ */ +static bool bfq_bfqq_higher_class_or_weight(struct bfq_queue *bfqq, + struct bfq_queue *in_serv_bfqq) +{ + int bfqq_weight, in_serv_weight; + + if (bfqq->ioprio_class < in_serv_bfqq->ioprio_class) + return true; + + if (in_serv_bfqq->entity.parent == bfqq->entity.parent) { + bfqq_weight = bfqq->entity.weight; + in_serv_weight = in_serv_bfqq->entity.weight; + } else { + if (bfqq->entity.parent) + bfqq_weight = bfqq->entity.parent->weight; + else + bfqq_weight = bfqq->entity.weight; + if (in_serv_bfqq->entity.parent) + in_serv_weight = in_serv_bfqq->entity.parent->weight; + else + in_serv_weight = in_serv_bfqq->entity.weight; + } + + return bfqq_weight > in_serv_weight; +} + static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd, struct bfq_queue *bfqq, int old_wr_coeff, @@ -1655,8 +1687,7 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd, */ bfqq_wants_to_preempt = bfq_bfqq_update_budg_for_activation(bfqd, bfqq, - arrived_in_time, - wr_or_deserves_wr); + arrived_in_time); /* * If bfqq happened to be activated in a burst, but has been @@ -1721,16 +1752,40 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd, /* * Expire in-service queue only if preemption may be needed - * for guarantees. In this respect, the function - * next_queue_may_preempt just checks a simple, necessary - * condition, and not a sufficient condition based on - * timestamps. In fact, for the latter condition to be - * evaluated, timestamps would need first to be updated, and - * this operation is quite costly (see the comments on the - * function bfq_bfqq_update_budg_for_activation). + * for guarantees. In particular, we care only about two + * cases. The first is that bfqq has to recover a service + * hole, as explained in the comments on + * bfq_bfqq_update_budg_for_activation(), i.e., that + * bfqq_wants_to_preempt is true. However, if bfqq does not + * carry time-critical I/O, then bfqq's bandwidth is less + * important than that of queues that carry time-critical I/O. + * So, as a further constraint, we consider this case only if + * bfqq is at least as weight-raised, i.e., at least as time + * critical, as the in-service queue. + * + * The second case is that bfqq is in a higher priority class, + * or has a higher weight than the in-service queue. If this + * condition does not hold, we don't care because, even if + * bfqq does not start to be served immediately, the resulting + * delay for bfqq's I/O is however lower or much lower than + * the ideal completion time to be guaranteed to bfqq's I/O. + * + * In both cases, preemption is needed only if, according to + * the timestamps of both bfqq and of the in-service queue, + * bfqq actually is the next queue to serve. So, to reduce + * useless preemptions, the return value of + * next_queue_may_preempt() is considered in the next compound + * condition too. Yet next_queue_may_preempt() just checks a + * simple, necessary condition for bfqq to be the next queue + * to serve. In fact, to evaluate a sufficient condition, the + * timestamps of the in-service queue would need to be + * updated, and this operation is quite costly (see the + * comments on bfq_bfqq_update_budg_for_activation()). 
 	 */
-	if (bfqd->in_service_queue && bfqq_wants_to_preempt &&
-	    bfqd->in_service_queue->wr_coeff < bfqq->wr_coeff &&
+	if (bfqd->in_service_queue &&
+	    ((bfqq_wants_to_preempt &&
+	      bfqq->wr_coeff >= bfqd->in_service_queue->wr_coeff) ||
+	     bfq_bfqq_higher_class_or_weight(bfqq, bfqd->in_service_queue)) &&
 	    next_queue_may_preempt(bfqd))
 		bfq_bfqq_expire(bfqd, bfqd->in_service_queue,
 				false, BFQQE_PREEMPTED);
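[Editorial aside, not part of the patch: the compound condition just added can be read as two preemption cases gated by one common necessary condition. A condensed, hypothetical restatement, with plain ints standing in for the real BFQ state and helpers:]

    /*
     * Shape of the compound condition above. All parameters are
     * hypothetical stand-ins, not BFQ's real types.
     */
    static int should_expire_in_service(int have_in_service,
    				    int wants_to_preempt,
    				    int new_wr_coeff,
    				    int in_serv_wr_coeff,
    				    int higher_class_or_weight,
    				    int next_queue_may_preempt)
    {
    	if (!have_in_service)
    		return 0;	/* no queue to preempt */

    	/* Case 1: recover a service hole, but only if the new queue
    	 * is at least as weight-raised as the in-service one. */
    	if (wants_to_preempt && new_wr_coeff >= in_serv_wr_coeff)
    		return next_queue_may_preempt;

    	/* Case 2: higher priority class or higher weight. */
    	if (higher_class_or_weight)
    		return next_queue_may_preempt;

    	return 0;
    }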
From patchwork Mon Jun 24 19:40:42 2019
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 167640

From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name,
 bottura.nicola95@gmail.com, srivatsa@csail.mit.edu,
 Paolo Valente
Subject: [PATCH BUGFIX IMPROVEMENT 7/7] block, bfq: re-schedule empty queues
 if they deserve I/O plugging
Date: Mon, 24 Jun 2019 21:40:42 +0200
Message-Id: <20190624194042.38747-8-paolo.valente@linaro.org>
In-Reply-To: <20190624194042.38747-1-paolo.valente@linaro.org>
References: <20190624194042.38747-1-paolo.valente@linaro.org>

Consider, on one side, a bfq_queue Q that remains empty while in
service, and, on the other side, the pending I/O of bfq_queues that,
according to their timestamps, have to be served after Q. If an
uncontrolled amount of I/O from the latter bfq_queues were dispatched
while Q is waiting for its new I/O to arrive, then Q's bandwidth
guarantees would be violated. To prevent this, I/O dispatch is plugged
until Q receives new I/O (except for a properly controlled amount of
injected I/O).

Unfortunately, preemption breaks I/O-dispatch plugging, for the
following reason. Preemption is performed in two steps. First, Q is
expired and re-scheduled. Second, the new bfq_queue to serve is
chosen. The first step is needed by the second, as the second can be
performed only after Q's timestamps have been properly updated (done
in the expiration step), and Q has been re-queued for service. This
dependency is a consequence of the way BFQ's scheduling algorithm is
currently implemented.

But Q is not re-scheduled at all in the first step, because Q is
empty. As a consequence, an uncontrolled amount of I/O may be
dispatched until Q becomes non-empty again. This breaks Q's service
guarantees.

This commit addresses this issue by re-scheduling Q even if it is
empty. This in turn breaks the assumption that all scheduled queues
are non-empty, so a few extra checks are now needed.

Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 387 +++++++++++++++++++++++---------------------
 1 file changed, 203 insertions(+), 184 deletions(-)

--
2.20.1
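[Editorial aside, not part of the patch: the fix boils down to one extra clause in the expiration path, implemented in __bfq_bfqq_expire() in the diff below. A condensed sketch, with function pointers standing in for the real service-tree operations and all names hypothetical:]

    /*
     * Condensed sketch of the expiration flow this patch changes.
     */
    static void expire_queue(int queue_is_empty, int preempted,
    			 int needs_dispatch_plugging,
    			 void (*del_from_service_tree)(void),
    			 void (*requeue)(void))
    {
    	/*
    	 * Before this patch, an empty queue was always removed from
    	 * the service tree, so an empty preempted queue lost its
    	 * place, and I/O of lower-eligibility queues could be
    	 * dispatched without control while it waited for new I/O.
    	 */
    	if (queue_is_empty &&
    	    !(preempted && needs_dispatch_plugging))
    		del_from_service_tree();
    	else
    		requeue();	/* keep the queue scheduled, even if empty */
    }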
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 6a3d05023300..72840ebf953e 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -3210,7 +3210,186 @@ static void bfq_dispatch_remove(struct request_queue *q, struct request *rq)
 	bfq_remove_request(q, rq);
 }
 
-static bool __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+/*
+ * There is a case where idling does not have to be performed for
+ * throughput concerns, but to preserve the throughput share of
+ * the process associated with bfqq.
+ *
+ * To introduce this case, we can note that allowing the drive
+ * to enqueue more than one request at a time, and hence
+ * delegating de facto final scheduling decisions to the
+ * drive's internal scheduler, entails loss of control on the
+ * actual request service order. In particular, the critical
+ * situation is when requests from different processes happen
+ * to be present, at the same time, in the internal queue(s)
+ * of the drive. In such a situation, the drive, by deciding
+ * the service order of the internally-queued requests, does
+ * determine also the actual throughput distribution among
+ * these processes. But the drive typically has no notion or
+ * concern about per-process throughput distribution, and
+ * makes its decisions only on a per-request basis. Therefore,
+ * the service distribution enforced by the drive's internal
+ * scheduler is likely to coincide with the desired throughput
+ * distribution only in a completely symmetric, or favorably
+ * skewed scenario where:
+ * (i-a) each of these processes must get the same throughput as
+ *	 the others,
+ * (i-b) in case (i-a) does not hold, it holds that the process
+ *	 associated with bfqq must receive a lower or equal
+ *	 throughput than any of the other processes;
+ * (ii) the I/O of each process has the same properties, in
+ *	terms of locality (sequential or random), direction
+ *	(reads or writes), request sizes, greediness
+ *	(from I/O-bound to sporadic), and so on;
+ *
+ * In fact, in such a scenario, the drive tends to treat the requests
+ * of each process in about the same way as the requests of the
+ * others, and thus to provide each of these processes with about the
+ * same throughput. This is exactly the desired throughput
+ * distribution if (i-a) holds, or, if (i-b) holds instead, this is an
+ * even more convenient distribution for (the process associated with)
+ * bfqq.
+ *
+ * In contrast, in any asymmetric or unfavorable scenario, device
+ * idling (I/O-dispatch plugging) is certainly needed to guarantee
+ * that bfqq receives its assigned fraction of the device throughput
+ * (see [1] for details).
+ *
+ * The problem is that idling may significantly reduce throughput with
+ * certain combinations of types of I/O and devices. An important
+ * example is sync random I/O on flash storage with command
+ * queueing. So, unless bfqq falls in cases where idling also boosts
+ * throughput, it is important to check conditions (i-a), (i-b) and
+ * (ii) accurately, so as to avoid idling when not strictly needed for
+ * service guarantees.
+ *
+ * Unfortunately, it is extremely difficult to thoroughly check
+ * condition (ii). And, in case there are active groups, it becomes
+ * very difficult to check conditions (i-a) and (i-b) too. In fact,
+ * if there are active groups, then, for conditions (i-a) or (i-b) to
+ * become false 'indirectly', it is enough that an active group
+ * contains more active processes or sub-groups than some other active
+ * group. More precisely, for conditions (i-a) or (i-b) to become
+ * false because of such a group, it is not even necessary that the
+ * group is (still) active: it is sufficient that, even if the group
+ * has become inactive, some of its descendant processes still have
+ * some request already dispatched but still waiting for
+ * completion. In fact, requests have still to be guaranteed their
+ * share of the throughput even after being dispatched. In this
+ * respect, it is easy to show that, if a group frequently becomes
+ * inactive while still having in-flight requests, and if, when this
+ * happens, the group is not considered in the calculation of whether
+ * the scenario is asymmetric, then the group may fail to be
+ * guaranteed its fair share of the throughput (basically because
+ * idling may not be performed for the descendant processes of the
+ * group, but it had to be).
+ * We address this issue with the following bi-modal behavior,
+ * implemented in the function bfq_asymmetric_scenario().
+ *
+ * If there are groups with requests waiting for completion
+ * (as commented above, some of these groups may even be
+ * already inactive), then the scenario is tagged as
+ * asymmetric, conservatively, without checking any of the
+ * conditions (i-a), (i-b) or (ii). So the device is idled for bfqq.
+ * This behavior matches also the fact that groups are created
+ * exactly if controlling I/O is a primary concern (to
+ * preserve bandwidth and latency guarantees).
+ *
+ * On the opposite end, if there are no groups with requests waiting
+ * for completion, then only conditions (i-a) and (i-b) are actually
+ * controlled, i.e., provided that conditions (i-a) or (i-b) holds,
+ * idling is not performed, regardless of whether condition (ii)
+ * holds. In other words, only if conditions (i-a) and (i-b) do not
+ * hold, then idling is allowed, and the device tends to be prevented
+ * from queueing many requests, possibly of several processes. Since
+ * there are no groups with requests waiting for completion, then, to
+ * control conditions (i-a) and (i-b) it is enough to check just
+ * whether all the queues with requests waiting for completion also
+ * have the same weight.
+ *
+ * Not checking condition (ii) evidently exposes bfqq to the
+ * risk of getting less throughput than its fair share.
+ * However, for queues with the same weight, a further
+ * mechanism, preemption, mitigates or even eliminates this
+ * problem. And it does so without consequences on overall
+ * throughput. This mechanism and its benefits are explained
+ * in the next three paragraphs.
+ *
+ * Even if a queue, say Q, is expired when it remains idle, Q
+ * can still preempt the new in-service queue if the next
+ * request of Q arrives soon (see the comments on
+ * bfq_bfqq_update_budg_for_activation). If all queues and
+ * groups have the same weight, this form of preemption,
+ * combined with the hole-recovery heuristic described in the
+ * comments on function bfq_bfqq_update_budg_for_activation,
+ * is enough to preserve a correct bandwidth distribution in
+ * the mid term, even without idling. In fact, even if not
+ * idling allows the internal queues of the device to contain
+ * many requests, and thus to reorder requests, we can rather
+ * safely assume that the internal scheduler still preserves a
+ * minimum of mid-term fairness.
+ *
+ * More precisely, this preemption-based, idleless approach
+ * provides fairness in terms of IOPS, and not sectors per
+ * second. This can be seen with a simple example. Suppose
+ * that there are two queues with the same weight, but that
+ * the first queue receives requests of 8 sectors, while the
+ * second queue receives requests of 1024 sectors. In
+ * addition, suppose that each of the two queues contains at
+ * most one request at a time, which implies that each queue
+ * always remains idle after it is served. Finally, after
+ * remaining idle, each queue receives very quickly a new
+ * request. It follows that the two queues are served
+ * alternatively, preempting each other if needed. This
+ * implies that, although both queues have the same weight,
+ * the queue with large requests receives a service that is
+ * 1024/8 times as high as the service received by the other
+ * queue.
+ *
+ * The motivation for using preemption instead of idling (for
+ * queues with the same weight) is that, by not idling,
+ * service guarantees are preserved (completely or at least in
+ * part) while sacrificing little or no throughput. And, if
+ * there is no active group, then the primary expectation for
+ * this device is probably a high throughput.
+ *
+ * We are now left only with explaining the additional
+ * compound condition that is checked below for deciding
+ * whether the scenario is asymmetric. To explain this
+ * compound condition, we need to add that the function
+ * bfq_asymmetric_scenario checks the weights of only
+ * non-weight-raised queues, for efficiency reasons (see
+ * comments on bfq_weights_tree_add()). Then the fact that
+ * bfqq is weight-raised is checked explicitly here. More
+ * precisely, the compound condition below takes into account
+ * also the fact that, even if bfqq is being weight-raised,
+ * the scenario is still symmetric if all queues with requests
+ * waiting for completion happen to be
+ * weight-raised. Actually, we should be even more precise
+ * here, and differentiate between interactive weight raising
+ * and soft real-time weight raising.
+ *
+ * As a side note, it is worth considering that the above
+ * device-idling countermeasures may however fail in the
+ * following unlucky scenario: if idling is (correctly)
+ * disabled in a time period during which all symmetry
+ * sub-conditions hold, and hence the device is allowed to
+ * enqueue many requests, but at some later point in time some
+ * sub-condition ceases to hold, then it may become impossible
+ * to let requests be served in the desired order until all
+ * the requests already queued in the device have been served.
+ */
+static bool idling_needed_for_service_guarantees(struct bfq_data *bfqd,
+						 struct bfq_queue *bfqq)
+{
+	return (bfqq->wr_coeff > 1 &&
+		bfqd->wr_busy_queues <
+		bfq_tot_busy_queues(bfqd)) ||
+		bfq_asymmetric_scenario(bfqd, bfqq);
+}
+
+static bool __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+			      enum bfqq_expiration reason)
 {
 	/*
 	 * If this bfqq is shared between multiple processes, check
@@ -3221,7 +3400,22 @@ static bool __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 	if (bfq_bfqq_coop(bfqq) && BFQQ_SEEKY(bfqq))
 		bfq_mark_bfqq_split_coop(bfqq);
 
-	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
+	/*
+	 * Consider queues with a higher finish virtual time than
+	 * bfqq. If idling_needed_for_service_guarantees(bfqq) returns
+	 * true, then bfqq's bandwidth would be violated if an
+	 * uncontrolled amount of I/O from these queues were
+	 * dispatched while bfqq is waiting for its new I/O to
+	 * arrive. This is exactly what may happen if this is a forced
+	 * expiration caused by a preemption attempt, and if bfqq is
+	 * not re-scheduled. To prevent this from happening, re-queue
+	 * bfqq if it needs I/O-dispatch plugging, even if it is
+	 * empty. By doing so, bfqq is guaranteed to be served before
+	 * the above queues (provided that bfqq is of course
+	 * eligible).
+	 */
+	if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
+	    !(reason == BFQQE_PREEMPTED &&
+	      idling_needed_for_service_guarantees(bfqd, bfqq))) {
 		if (bfqq->dispatched == 0)
 			/*
 			 * Overloading budget_timeout field to store
@@ -3238,7 +3432,8 @@ static bool __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 	 * Resort priority tree of potential close cooperators.
 	 * See comments on bfq_pos_tree_add_move() for the unlikely().
*/ - if (unlikely(!bfqd->nonrot_with_queueing)) + if (unlikely(!bfqd->nonrot_with_queueing && + !RB_EMPTY_ROOT(&bfqq->sort_list))) bfq_pos_tree_add_move(bfqd, bfqq); } @@ -3739,7 +3934,7 @@ void bfq_bfqq_expire(struct bfq_data *bfqd, * reason. */ __bfq_bfqq_recalc_budget(bfqd, bfqq, reason); - if (__bfq_bfqq_expire(bfqd, bfqq)) + if (__bfq_bfqq_expire(bfqd, bfqq, reason)) /* bfqq is gone, no more actions on it */ return; @@ -3885,184 +4080,6 @@ static bool idling_boosts_thr_without_issues(struct bfq_data *bfqd, bfqd->wr_busy_queues == 0; } -/* - * There is a case where idling does not have to be performed for - * throughput concerns, but to preserve the throughput share of - * the process associated with bfqq. - * - * To introduce this case, we can note that allowing the drive - * to enqueue more than one request at a time, and hence - * delegating de facto final scheduling decisions to the - * drive's internal scheduler, entails loss of control on the - * actual request service order. In particular, the critical - * situation is when requests from different processes happen - * to be present, at the same time, in the internal queue(s) - * of the drive. In such a situation, the drive, by deciding - * the service order of the internally-queued requests, does - * determine also the actual throughput distribution among - * these processes. But the drive typically has no notion or - * concern about per-process throughput distribution, and - * makes its decisions only on a per-request basis. Therefore, - * the service distribution enforced by the drive's internal - * scheduler is likely to coincide with the desired throughput - * distribution only in a completely symmetric, or favorably - * skewed scenario where: - * (i-a) each of these processes must get the same throughput as - * the others, - * (i-b) in case (i-a) does not hold, it holds that the process - * associated with bfqq must receive a lower or equal - * throughput than any of the other processes; - * (ii) the I/O of each process has the same properties, in - * terms of locality (sequential or random), direction - * (reads or writes), request sizes, greediness - * (from I/O-bound to sporadic), and so on; - - * In fact, in such a scenario, the drive tends to treat the requests - * of each process in about the same way as the requests of the - * others, and thus to provide each of these processes with about the - * same throughput. This is exactly the desired throughput - * distribution if (i-a) holds, or, if (i-b) holds instead, this is an - * even more convenient distribution for (the process associated with) - * bfqq. - * - * In contrast, in any asymmetric or unfavorable scenario, device - * idling (I/O-dispatch plugging) is certainly needed to guarantee - * that bfqq receives its assigned fraction of the device throughput - * (see [1] for details). - * - * The problem is that idling may significantly reduce throughput with - * certain combinations of types of I/O and devices. An important - * example is sync random I/O on flash storage with command - * queueing. So, unless bfqq falls in cases where idling also boosts - * throughput, it is important to check conditions (i-a), i(-b) and - * (ii) accurately, so as to avoid idling when not strictly needed for - * service guarantees. - * - * Unfortunately, it is extremely difficult to thoroughly check - * condition (ii). And, in case there are active groups, it becomes - * very difficult to check conditions (i-a) and (i-b) too. 
In fact, - * if there are active groups, then, for conditions (i-a) or (i-b) to - * become false 'indirectly', it is enough that an active group - * contains more active processes or sub-groups than some other active - * group. More precisely, for conditions (i-a) or (i-b) to become - * false because of such a group, it is not even necessary that the - * group is (still) active: it is sufficient that, even if the group - * has become inactive, some of its descendant processes still have - * some request already dispatched but still waiting for - * completion. In fact, requests have still to be guaranteed their - * share of the throughput even after being dispatched. In this - * respect, it is easy to show that, if a group frequently becomes - * inactive while still having in-flight requests, and if, when this - * happens, the group is not considered in the calculation of whether - * the scenario is asymmetric, then the group may fail to be - * guaranteed its fair share of the throughput (basically because - * idling may not be performed for the descendant processes of the - * group, but it had to be). We address this issue with the following - * bi-modal behavior, implemented in the function - * bfq_asymmetric_scenario(). - * - * If there are groups with requests waiting for completion - * (as commented above, some of these groups may even be - * already inactive), then the scenario is tagged as - * asymmetric, conservatively, without checking any of the - * conditions (i-a), (i-b) or (ii). So the device is idled for bfqq. - * This behavior matches also the fact that groups are created - * exactly if controlling I/O is a primary concern (to - * preserve bandwidth and latency guarantees). - * - * On the opposite end, if there are no groups with requests waiting - * for completion, then only conditions (i-a) and (i-b) are actually - * controlled, i.e., provided that conditions (i-a) or (i-b) holds, - * idling is not performed, regardless of whether condition (ii) - * holds. In other words, only if conditions (i-a) and (i-b) do not - * hold, then idling is allowed, and the device tends to be prevented - * from queueing many requests, possibly of several processes. Since - * there are no groups with requests waiting for completion, then, to - * control conditions (i-a) and (i-b) it is enough to check just - * whether all the queues with requests waiting for completion also - * have the same weight. - * - * Not checking condition (ii) evidently exposes bfqq to the - * risk of getting less throughput than its fair share. - * However, for queues with the same weight, a further - * mechanism, preemption, mitigates or even eliminates this - * problem. And it does so without consequences on overall - * throughput. This mechanism and its benefits are explained - * in the next three paragraphs. - * - * Even if a queue, say Q, is expired when it remains idle, Q - * can still preempt the new in-service queue if the next - * request of Q arrives soon (see the comments on - * bfq_bfqq_update_budg_for_activation). If all queues and - * groups have the same weight, this form of preemption, - * combined with the hole-recovery heuristic described in the - * comments on function bfq_bfqq_update_budg_for_activation, - * are enough to preserve a correct bandwidth distribution in - * the mid term, even without idling. 
In fact, even if not - * idling allows the internal queues of the device to contain - * many requests, and thus to reorder requests, we can rather - * safely assume that the internal scheduler still preserves a - * minimum of mid-term fairness. - * - * More precisely, this preemption-based, idleless approach - * provides fairness in terms of IOPS, and not sectors per - * second. This can be seen with a simple example. Suppose - * that there are two queues with the same weight, but that - * the first queue receives requests of 8 sectors, while the - * second queue receives requests of 1024 sectors. In - * addition, suppose that each of the two queues contains at - * most one request at a time, which implies that each queue - * always remains idle after it is served. Finally, after - * remaining idle, each queue receives very quickly a new - * request. It follows that the two queues are served - * alternatively, preempting each other if needed. This - * implies that, although both queues have the same weight, - * the queue with large requests receives a service that is - * 1024/8 times as high as the service received by the other - * queue. - * - * The motivation for using preemption instead of idling (for - * queues with the same weight) is that, by not idling, - * service guarantees are preserved (completely or at least in - * part) without minimally sacrificing throughput. And, if - * there is no active group, then the primary expectation for - * this device is probably a high throughput. - * - * We are now left only with explaining the additional - * compound condition that is checked below for deciding - * whether the scenario is asymmetric. To explain this - * compound condition, we need to add that the function - * bfq_asymmetric_scenario checks the weights of only - * non-weight-raised queues, for efficiency reasons (see - * comments on bfq_weights_tree_add()). Then the fact that - * bfqq is weight-raised is checked explicitly here. More - * precisely, the compound condition below takes into account - * also the fact that, even if bfqq is being weight-raised, - * the scenario is still symmetric if all queues with requests - * waiting for completion happen to be - * weight-raised. Actually, we should be even more precise - * here, and differentiate between interactive weight raising - * and soft real-time weight raising. - * - * As a side note, it is worth considering that the above - * device-idling countermeasures may however fail in the - * following unlucky scenario: if idling is (correctly) - * disabled in a time period during which all symmetry - * sub-conditions hold, and hence the device is allowed to - * enqueue many requests, but at some later point in time some - * sub-condition stops to hold, then it may become impossible - * to let requests be served in the desired order until all - * the requests already queued in the device have been served. - */ -static bool idling_needed_for_service_guarantees(struct bfq_data *bfqd, - struct bfq_queue *bfqq) -{ - return (bfqq->wr_coeff > 1 && - bfqd->wr_busy_queues < - bfq_tot_busy_queues(bfqd)) || - bfq_asymmetric_scenario(bfqd, bfqq); -} - /* * For a queue that becomes empty, device idling is allowed only if * this function returns true for that queue. As a consequence, since @@ -4321,7 +4338,8 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd) (bfqq->dispatched != 0 && bfq_better_to_idle(bfqq))) { struct bfq_queue *async_bfqq = bfqq->bic && bfqq->bic->bfqq[0] && - bfq_bfqq_busy(bfqq->bic->bfqq[0]) ? 
+ bfq_bfqq_busy(bfqq->bic->bfqq[0]) && + bfqq->bic->bfqq[0]->next_rq ? bfqq->bic->bfqq[0] : NULL; /* @@ -4403,6 +4421,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd) bfqq = bfqq->bic->bfqq[0]; else if (bfq_bfqq_has_waker(bfqq) && bfq_bfqq_busy(bfqq->waker_bfqq) && + bfqq->next_rq && bfq_serv_to_charge(bfqq->waker_bfqq->next_rq, bfqq->waker_bfqq) <= bfq_bfqq_budget_left(bfqq->waker_bfqq) @@ -4800,7 +4819,7 @@ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq) struct hlist_node *n; if (bfqq == bfqd->in_service_queue) { - __bfq_bfqq_expire(bfqd, bfqq); + __bfq_bfqq_expire(bfqd, bfqq, BFQQE_BUDGET_TIMEOUT); bfq_schedule_dispatch(bfqd); }
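[Editorial aside, not part of the patch: the next_rq checks added in these last hunks exist because, after this patch, a scheduled queue may legitimately be empty. A minimal sketch of the resulting guard pattern, over hypothetical simplified types:]

    #include <stddef.h>

    struct req;

    struct q_lite {
    	int busy;
    	struct req *next_rq;	/* NULL when the queue is empty */
    };

    /* Any injection candidate must be re-checked for actually holding a
     * request before its next_rq is dereferenced. */
    static struct req *pick_injection_rq(struct q_lite *candidate)
    {
    	if (candidate && candidate->busy && candidate->next_rq)
    		return candidate->next_rq;
    	return NULL;	/* no safe injection available */
    }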