From patchwork Thu Dec 28 11:19:17 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 122835
Delivered-To: patch@linaro.org
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, broonie@kernel.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name, Paolo Valente
Subject: [PATCH IMPROVEMENT] block, bfq: limit sectors served with
 interactive weight raising
Date: Thu, 28 Dec 2017 12:19:17 +0100
Message-Id: <20171228111917.2767-1-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.15.1
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

To maximise responsiveness, BFQ raises the weight, and performs device
idling, for bfq_queues associated with processes deemed interactive.
In particular, weight raising has a maximum duration, equal to the
time needed to start a large application. If a weight-raised process
goes on doing I/O beyond this maximum duration, it loses
weight-raising.

This mechanism is evidently vulnerable to the following false
positives: I/O-bound applications that will go on doing I/O for much
longer than the duration of weight-raising. These applications get
basically no benefit from being weight-raised at the beginning of
their I/O.
On the contrary, while being weight-raised, these applications
a) unjustly steal throughput from applications that may truly need
   low latency;
b) make BFQ uselessly perform device idling; device idling results
   in loss of device throughput with most flash-based storage, and may
   increase latencies when used to no purpose.

This commit adds a countermeasure to reduce both of the above
problems. To introduce this countermeasure, we provide the following
extra piece of information (full details in the comments added by
this commit). During the start-up of the large application used as a
reference to set the duration of weight-raising, the processes
involved transfer at most ~110K sectors each. Accordingly, a process
initially deemed interactive has no right to be weight-raised any
longer once it has transferred 110K sectors or more.

Based on this consideration, this commit ends weight-raising early
for a bfq_queue if the latter happens to have received an amount of
service at least equal to 110K sectors (actually, a little bit more,
to keep a safety margin). I/O-bound applications that reach a high
throughput, such as file copy, reach this threshold well before the
allowed weight-raising period finishes. Thus this early ending of
weight-raising reduces the amount of time during which these
applications cause the problems described above.

Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++------
 block/bfq-iosched.h |  5 ++++
 block/bfq-wf2q.c    |  3 ++
 3 files changed, 80 insertions(+), 9 deletions(-)

--
2.15.1

Tested-by: Oleksandr Natalenko

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 6f75015d18c0..ea48b5c8f088 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -209,15 +209,17 @@ static struct kmem_cache *bfq_pool;
  * interactive applications automatically, using the following formula:
  * duration = (R / r) * T, where r is the peak rate of the device, and
  * R and T are two reference parameters.
- * In particular, R is the peak rate of the reference device (see below),
- * and T is a reference time: given the systems that are likely to be
- * installed on the reference device according to its speed class, T is
- * about the maximum time needed, under BFQ and while reading two files in
- * parallel, to load typical large applications on these systems.
- * In practice, the slower/faster the device at hand is, the more/less it
- * takes to load applications with respect to the reference device.
- * Accordingly, the longer/shorter BFQ grants weight raising to interactive
- * applications.
+ * In particular, R is the peak rate of the reference device (see
+ * below), and T is a reference time: given the systems that are
+ * likely to be installed on the reference device according to its
+ * speed class, T is about the maximum time needed, under BFQ and
+ * while reading two files in parallel, to load typical large
+ * applications on these systems (see the comments on
+ * max_service_from_wr below, for more details on how T is obtained).
+ * In practice, the slower/faster the device at hand is, the more/less
+ * it takes to load applications with respect to the reference device.
+ * Accordingly, the longer/shorter BFQ grants weight raising to
+ * interactive applications.
  *
  * BFQ uses four different reference pairs (R, T), depending on:
  * . whether the device is rotational or non-rotational;
@@ -254,6 +256,60 @@ static int T_slow[2];
 static int T_fast[2];
 static int device_speed_thresh[2];
 
+/*
+ * BFQ uses the above-detailed, time-based weight-raising mechanism to
+ * privilege interactive tasks. This mechanism is vulnerable to the
+ * following false positives: I/O-bound applications that will go on
+ * doing I/O for much longer than the duration of weight
+ * raising. These applications get basically no benefit from being
+ * weight-raised at the beginning of their I/O. On the contrary,
+ * while being weight-raised, these applications
+ * a) unjustly steal throughput from applications that may actually need
+ *    low latency;
+ * b) make BFQ uselessly perform device idling; device idling results
+ *    in loss of device throughput with most flash-based storage, and may
+ *    increase latencies when used to no purpose.
+ *
+ * BFQ tries to reduce these problems, by adopting the following
+ * countermeasure. To introduce this countermeasure, we first need to
+ * finish explaining how the duration of weight-raising for
+ * interactive tasks is computed.
+ *
+ * For a bfq_queue deemed interactive, the duration of weight
+ * raising is dynamically adjusted, as a function of the estimated
+ * peak rate of the device, so as to be equal to the time needed to
+ * execute the 'largest' interactive task we benchmarked so far. By
+ * largest task, we mean the task for which each involved process has
+ * to do more I/O than for any of the other tasks we benchmarked. This
+ * reference interactive task is the start-up of LibreOffice Writer,
+ * and in this task each process/bfq_queue needs to have at most ~110K
+ * sectors transferred.
+ *
+ * This last piece of information enables BFQ to reduce the actual
+ * duration of weight-raising for at least one class of I/O-bound
+ * applications: those doing sequential or quasi-sequential I/O. An
+ * example is file copy. In fact, once started, the main I/O-bound
+ * processes of these applications usually consume the above 110K
+ * sectors in much less time than the processes of an application that
+ * is starting, because these I/O-bound processes will greedily devote
+ * almost all their CPU cycles only to their target,
+ * throughput-friendly I/O operations. This is even more true if BFQ
+ * happens to be underestimating the device peak rate, and thus
+ * overestimating the duration of weight raising. But, according to
+ * our measurements, once they have transferred 110K sectors, these
+ * processes have no right to be weight-raised any longer.
+ *
+ * Based on the last consideration, BFQ ends weight-raising for a
+ * bfq_queue if the latter happens to have received an amount of
+ * service at least equal to the following constant. The constant is
+ * set to slightly more than 110K, to have a minimum safety margin.
+ *
+ * This early ending of weight-raising reduces the amount of time
+ * during which interactive false positives cause the two problems
+ * described at the beginning of these comments.
+ */
+static const unsigned long max_service_from_wr = 120000;
+
 #define RQ_BIC(rq)		icq_to_bic((rq)->elv.priv[0])
 #define RQ_BFQQ(rq)		((rq)->elv.priv[1])
 
@@ -1352,6 +1408,7 @@ static void bfq_update_bfqq_wr_on_rq_arrival(struct bfq_data *bfqd,
 	if (old_wr_coeff == 1 && wr_or_deserves_wr) {
 		/* start a weight-raising period */
 		if (interactive) {
+			bfqq->service_from_wr = 0;
 			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
 			bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
 		} else {
@@ -3665,6 +3722,12 @@ static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 				bfqq->entity.prio_changed = 1;
 			}
 		}
+		if (bfqq->wr_coeff > 1 &&
+		    bfqq->wr_cur_max_time != bfqd->bfq_wr_rt_max_time &&
+		    bfqq->service_from_wr > max_service_from_wr) {
+			/* see comments on max_service_from_wr */
+			bfq_bfqq_end_wr(bfqq);
+		}
 	}
 	/*
 	 * To improve latency (for this or other queues), immediately
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index fcd941008127..350c39ae2896 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -337,6 +337,11 @@ struct bfq_queue {
 	 * last transition from idle to backlogged.
 	 */
 	unsigned long service_from_backlogged;
+	/*
+	 * Cumulative service received from the @bfq_queue since its
+	 * last transition to weight-raised state.
+	 */
+	unsigned long service_from_wr;
 
 	/*
 	 * Value of wr start time when switching to soft rt
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 4456eda34e48..4498c43245e2 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -838,6 +838,9 @@ void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
 	if (!bfqq->service_from_backlogged)
 		bfqq->first_IO_time = jiffies;
 
+	if (bfqq->wr_coeff > 1)
+		bfqq->service_from_wr += served;
+
 	bfqq->service_from_backlogged += served;
 	for_each_entity(entity) {
 		st = bfq_entity_service_tree(entity);
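
For readers who want to see the accounting in isolation, the following is a minimal user-space sketch of the mechanism the patch adds. It is not kernel code. The names struct toy_queue, toy_served(), toy_should_end_wr() and toy_chunks_until_wr_ends() are hypothetical and exist only for this illustration, the weight-raising coefficient of 30 is an assumed value, and the soft real-time exclusion (the wr_cur_max_time != bfq_wr_rt_max_time test) is left out for brevity. Only the constant 120000 and the two conditions are taken from the patch above.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct bfq_queue. */
struct toy_queue {
	unsigned int wr_coeff;         /* > 1 while the queue is weight-raised */
	unsigned long service_from_wr; /* sectors served since weight raising began */
};

/* Same value as the constant introduced by the patch. */
static const unsigned long max_service_from_wr = 120000;

/* Mirrors the accounting added to bfq_bfqq_served(). */
static void toy_served(struct toy_queue *q, unsigned long served)
{
	if (q->wr_coeff > 1)
		q->service_from_wr += served;
}

/*
 * Mirrors the check added to bfq_update_wr_data(), without the
 * soft real-time exclusion (an intentional simplification).
 */
static bool toy_should_end_wr(const struct toy_queue *q)
{
	return q->wr_coeff > 1 && q->service_from_wr > max_service_from_wr;
}

/*
 * Serve fixed-size chunks of 'chunk' sectors to a freshly weight-raised
 * queue and return how many chunks are served before the cap is exceeded.
 * Returns 0 for chunk == 0, since the cap would never be reached.
 */
unsigned long toy_chunks_until_wr_ends(unsigned long chunk)
{
	/* wr_coeff = 30 is an assumed value, used only for illustration. */
	struct toy_queue q = { .wr_coeff = 30, .service_from_wr = 0 };
	unsigned long chunks = 0;

	if (chunk == 0)
		return 0;

	while (!toy_should_end_wr(&q)) {
		toy_served(&q, chunk);
		chunks++;
	}
	return chunks;
}
```

With 1024-sector chunks, toy_chunks_until_wr_ends(1024) returns 118: after 117 chunks the queue has received 119808 sectors, which is still below the cap, and the 118th chunk brings it to 120832. Assuming 512-byte sectors, the cap corresponds to roughly 61 MB of service, which a sequential reader or writer can plausibly consume well within the time-based weight-raising period.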