From patchwork Sat Jan 13 11:05:17 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 124423
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, broonie@kernel.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name, Paolo Valente
Subject: [PATCH BUGFIX/IMPROVEMENT 1/2] block, bfq: limit tags for writes and async I/O
Date: Sat, 13 Jan 2018 12:05:17 +0100
Message-Id: <20180113110518.2519-2-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180113110518.2519-1-paolo.valente@linaro.org>
References: <20180113110518.2519-1-paolo.valente@linaro.org>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Asynchronous I/O can easily starve synchronous I/O (both sync reads
and sync writes), by consuming all request tags. Similarly, storms of
synchronous writes, such as those that sync(2) may trigger, can starve
synchronous reads. In turn, these two problems may also cause BFQ to
lose control of latency for interactive and soft real-time
applications. For example, on a PLEXTOR PX-256M5S SSD, LibreOffice
Writer takes 0.6 seconds to start if the device is idle, but it takes
more than 45 seconds (!) if there are sequential writes in the
background.

This commit addresses this issue by limiting the maximum percentage of
tags that asynchronous I/O requests and synchronous write requests can
consume. In particular, this commit grants a higher threshold to
synchronous writes, to prevent the latter from being starved by
asynchronous I/O.

According to the above test, LibreOffice Writer now starts in about
1.2 seconds on average, regardless of the background workload, and
apart from rare outliers. To check this improvement, run, e.g.,
sudo ./comm_startup_lat.sh bfq 5 5 seq 10 "lowriter --terminate_after_init"
for the comm_startup_lat benchmark in the S suite [1].

[1] https://github.com/Algodev-github/S

Tested-by: Oleksandr Natalenko
Tested-by: Holger Hoffstätte
Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/bfq-iosched.h | 12 +++++++++
 2 files changed, 89 insertions(+)

-- 
2.15.1

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 1caeecad7af1..527bd2ccda51 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -417,6 +417,82 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
 	}
 }
 
+/*
+ * See the comments on bfq_limit_depth for the purpose of
+ * the depths set in the function.
+ */
+static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
+{
+	bfqd->sb_shift = bt->sb.shift;
+
+	/*
+	 * In-word depths if no bfq_queue is being weight-raised:
+	 * leaving 25% of tags only for sync reads.
+	 *
+	 * In next formulas, right-shift the value
+	 * (1U<<bfqd->sb_shift), instead of computing directly
+	 * (1U<<(bfqd->sb_shift - something)), to be robust against
+	 * any possible value of bfqd->sb_shift, without having to
+	 * limit 'something'.
+	 */
+	/* no more than 50% of tags for async I/O */
+	bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
+	/*
+	 * no more than 75% of tags for sync writes (25% extra tags
+	 * w.r.t. async I/O, to prevent async I/O from starving sync
+	 * writes)
+	 */
+	bfqd->word_depths[0][1] = max(((1U<<bfqd->sb_shift) * 3)>>2, 1U);
+
+	/*
+	 * In-word depths in case some bfq_queue is being weight-
+	 * raised: leaving ~63% of tags for sync reads. This is the
+	 * highest percentage for which, in our tests, application
+	 * start-up times didn't suffer from any regression due to tag
+	 * shortage.
+	 */
+	/* no more than ~18% of tags for async I/O */
+	bfqd->word_depths[1][0] = max(((1U<<bfqd->sb_shift) * 3)>>4, 1U);
+	/* no more than ~37% of tags for sync writes (~20% extra tags) */
+	bfqd->word_depths[1][1] = max(((1U<<bfqd->sb_shift) * 6)>>4, 1U);
+}
+
+/*
+ * Async I/O can easily starve sync I/O (both sync reads and sync
+ * writes), by consuming all tags. Similarly, storms of sync writes,
+ * such as those that sync(2) may trigger, can starve sync reads.
+ * Limit depths of async I/O and sync writes so as to counter both
+ * problems.
+ */
+static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
+{
+	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
+	struct bfq_data *bfqd = data->q->elevator->elevator_data;
+	struct sbitmap_queue *bt;
+
+	if (op_is_sync(op) && !op_is_write(op))
+		return;
+
+	if (data->flags & BLK_MQ_REQ_RESERVED) {
+		if (unlikely(!tags->nr_reserved_tags)) {
+			WARN_ON_ONCE(1);
+			return;
+		}
+		bt = &tags->breserved_tags;
+	} else
+		bt = &tags->bitmap_tags;
+
+	if (unlikely(bfqd->sb_shift != bt->sb.shift))
+		bfq_update_depths(bfqd, bt);
+
+	data->shallow_depth =
+		bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
+
+	bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
+			__func__, bfqd->wr_busy_queues, op_is_sync(op),
+			data->shallow_depth);
+}
+
 static struct bfq_queue *
 bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
 		     sector_t sector, struct rb_node **ret_parent,
@@ -5267,6 +5343,7 @@ static struct elv_fs_entry bfq_attrs[] = {
 
 static struct elevator_type iosched_bfq_mq = {
 	.ops.mq = {
+		.limit_depth		= bfq_limit_depth,
 		.prepare_request	= bfq_prepare_request,
 		.finish_request		= bfq_finish_request,
 		.exit_icq		= bfq_exit_icq,
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 5d47b58d5fc8..fcd941008127 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -629,6 +629,18 @@ struct bfq_data {
 	struct bfq_io_cq *bio_bic;
 	/* bfqq associated with the task issuing current bio for merging */
 	struct bfq_queue *bio_bfqq;
+
+	/*
+	 * Cached sbitmap shift, used to compute depth limits in
+	 * bfq_update_depths.
+	 */
+	unsigned int sb_shift;
+
+	/*
+	 * Depth limits used in bfq_limit_depth (see comments on the
+	 * function)
+	 */
+	unsigned int word_depths[2][2];
 };
 
 enum bfqq_state_flags {

From patchwork Sat Jan 13 11:05:18 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Paolo Valente
X-Patchwork-Id: 124422
From: Paolo Valente
To: Jens Axboe
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, broonie@kernel.org, linus.walleij@linaro.org,
 bfq-iosched@googlegroups.com, oleksandr@natalenko.name, Paolo Valente
Subject: [PATCH BUGFIX/IMPROVEMENT 2/2] block, bfq: limit sectors served with interactive weight raising
Date: Sat, 13 Jan 2018 12:05:18 +0100
Message-Id: <20180113110518.2519-3-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180113110518.2519-1-paolo.valente@linaro.org>
References: <20180113110518.2519-1-paolo.valente@linaro.org>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

To maximise responsiveness, BFQ raises the weight, and performs device
idling, for bfq_queues associated with processes deemed interactive.
In particular, weight raising has a maximum duration, equal to the
time needed to start a large application. If a weight-raised process
goes on doing I/O beyond this maximum duration, it loses
weight-raising.

This mechanism is evidently vulnerable to the following false
positives: I/O-bound applications that will go on doing I/O for much
longer than the duration of weight-raising. These applications gain
basically no benefit from being weight-raised at the beginning of
their I/O. On the other hand, while being weight-raised, these
applications
a) unjustly steal throughput from applications that may truly need
low latency;
b) make BFQ perform device idling uselessly; device idling results
in loss of device throughput with most flash-based storage, and may
increase latencies when used to no purpose.

This commit adds a countermeasure to reduce both the above problems.
To introduce this countermeasure, we provide the following extra piece
of information (full details in the comments added by this commit).
During the start-up of the large application used as a reference to
set the duration of weight-raising, the processes involved transfer at
most ~110K sectors each. Accordingly, a process initially deemed
interactive has no right to be weight-raised any longer once it has
transferred 110K sectors or more.

Based on this consideration, this commit ends weight-raising early
for a bfq_queue if the latter happens to have received an amount of
service at least equal to 110K sectors (actually, a little bit more,
to keep a safety margin). I/O-bound applications that reach a high
throughput, such as file copy, reach this threshold well before the
allowed weight-raising period finishes. Thus this early ending of
weight-raising reduces the amount of time during which these
applications cause the problems described above.

Tested-by: Oleksandr Natalenko
Tested-by: Holger Hoffstätte
Signed-off-by: Paolo Valente
---
 block/bfq-iosched.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++------
 block/bfq-iosched.h |  5 ++++
 block/bfq-wf2q.c    |  3 ++
 3 files changed, 80 insertions(+), 9 deletions(-)

-- 
2.15.1

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 527bd2ccda51..93a97a7fe519 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -209,15 +209,17 @@ static struct kmem_cache *bfq_pool;
  * interactive applications automatically, using the following formula:
  * duration = (R / r) * T, where r is the peak rate of the device, and
  * R and T are two reference parameters.
- * In particular, R is the peak rate of the reference device (see below),
- * and T is a reference time: given the systems that are likely to be
- * installed on the reference device according to its speed class, T is
- * about the maximum time needed, under BFQ and while reading two files in
- * parallel, to load typical large applications on these systems.
- * In practice, the slower/faster the device at hand is, the more/less it
- * takes to load applications with respect to the reference device.
- * Accordingly, the longer/shorter BFQ grants weight raising to interactive
- * applications.
+ * In particular, R is the peak rate of the reference device (see
+ * below), and T is a reference time: given the systems that are
+ * likely to be installed on the reference device according to its
+ * speed class, T is about the maximum time needed, under BFQ and
+ * while reading two files in parallel, to load typical large
+ * applications on these systems (see the comments on
+ * max_service_from_wr below, for more details on how T is obtained).
+ * In practice, the slower/faster the device at hand is, the more/less
+ * it takes to load applications with respect to the reference device.
+ * Accordingly, the longer/shorter BFQ grants weight raising to
+ * interactive applications.
  *
  * BFQ uses four different reference pairs (R, T), depending on:
  * . whether the device is rotational or non-rotational;
@@ -254,6 +256,60 @@ static int T_slow[2];
 static int T_fast[2];
 static int device_speed_thresh[2];
 
+/*
+ * BFQ uses the above-detailed, time-based weight-raising mechanism to
+ * privilege interactive tasks. This mechanism is vulnerable to the
+ * following false positives: I/O-bound applications that will go on
+ * doing I/O for much longer than the duration of weight
+ * raising. These applications have basically no benefit from being
+ * weight-raised at the beginning of their I/O. On the opposite end,
+ * while being weight-raised, these applications
+ * a) unjustly steal throughput to applications that may actually need
+ * low latency;
+ * b) make BFQ uselessly perform device idling; device idling results
+ * in loss of device throughput with most flash-based storage, and may
+ * increase latencies when used purposelessly.
+ *
+ * BFQ tries to reduce these problems, by adopting the following
+ * countermeasure. To introduce this countermeasure, we need first to
+ * finish explaining how the duration of weight-raising for
+ * interactive tasks is computed.
+ *
+ * For a bfq_queue deemed as interactive, the duration of weight
+ * raising is dynamically adjusted, as a function of the estimated
+ * peak rate of the device, so as to be equal to the time needed to
+ * execute the 'largest' interactive task we benchmarked so far. By
+ * largest task, we mean the task for which each involved process has
+ * to do more I/O than for any of the other tasks we benchmarked. This
+ * reference interactive task is the start-up of LibreOffice Writer,
+ * and in this task each process/bfq_queue needs to have at most ~110K
+ * sectors transferred.
+ *
+ * This last piece of information enables BFQ to reduce the actual
+ * duration of weight-raising for at least one class of I/O-bound
+ * applications: those doing sequential or quasi-sequential I/O. An
+ * example is file copy. In fact, once started, the main I/O-bound
+ * processes of these applications usually consume the above 110K
+ * sectors in much less time than the processes of an application that
+ * is starting, because these I/O-bound processes will greedily devote
+ * almost all their CPU cycles only to their target,
+ * throughput-friendly I/O operations. This is even more true if BFQ
+ * happens to be underestimating the device peak rate, and thus
+ * overestimating the duration of weight raising. But, according to
+ * our measurements, once transferred 110K sectors, these processes
+ * have no right to be weight-raised any longer.
+ *
+ * Basing on the last consideration, BFQ ends weight-raising for a
+ * bfq_queue if the latter happens to have received an amount of
+ * service at least equal to the following constant. The constant is
+ * set to slightly more than 110K, to have a minimum safety margin.
+ *
+ * This early ending of weight-raising reduces the amount of time
+ * during which interactive false positives cause the two problems
+ * described at the beginning of these comments.
+ */
+static const unsigned long max_service_from_wr = 120000;
+
 #define RQ_BIC(rq)		icq_to_bic((rq)->elv.priv[0])
 #define RQ_BFQQ(rq)		((rq)->elv.priv[1])
 
@@ -1352,6 +1408,7 @@ static void bfq_update_bfqq_wr_on_rq_arrival(struct bfq_data *bfqd,
 	if (old_wr_coeff == 1 && wr_or_deserves_wr) {
 		/* start a weight-raising period */
 		if (interactive) {
+			bfqq->service_from_wr = 0;
 			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
 			bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
 		} else {
@@ -3665,6 +3722,12 @@ static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 				bfqq->entity.prio_changed = 1;
 			}
 		}
+		if (bfqq->wr_coeff > 1 &&
+		    bfqq->wr_cur_max_time != bfqd->bfq_wr_rt_max_time &&
+		    bfqq->service_from_wr > max_service_from_wr) {
+			/* see comments on max_service_from_wr */
+			bfq_bfqq_end_wr(bfqq);
+		}
 	}
 	/*
 	 * To improve latency (for this or other queues), immediately
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index fcd941008127..350c39ae2896 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -337,6 +337,11 @@ struct bfq_queue {
 	 * last transition from idle to backlogged.
 	 */
 	unsigned long service_from_backlogged;
+	/*
+	 * Cumulative service received from the @bfq_queue since its
+	 * last transition to weight-raised state.
+	 */
+	unsigned long service_from_wr;
 
 	/*
 	 * Value of wr start time when switching to soft rt
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 4456eda34e48..4498c43245e2 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -838,6 +838,9 @@ void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
 	if (!bfqq->service_from_backlogged)
 		bfqq->first_IO_time = jiffies;
 
+	if (bfqq->wr_coeff > 1)
+		bfqq->service_from_wr += served;
+
 	bfqq->service_from_backlogged += served;
 	for_each_entity(entity) {
 		st = bfq_entity_service_tree(entity);
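
For readers who want to see how the three hunks of patch 2/2 fit together, the following is a minimal user-space sketch, not kernel code. Only max_service_from_wr and its value are taken from the patch. struct toy_queue, the toy_* functions and the soft_rt flag are hypothetical stand-ins: soft_rt replaces the wr_cur_max_time != bfq_wr_rt_max_time comparison, and ending weight raising is reduced to resetting wr_coeff to 1, which is an assumption about the minimum that bfq_bfqq_end_wr has to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Value taken from the patch: slightly above the ~110K measured sectors. */
static const unsigned long max_service_from_wr = 120000;

/* Hypothetical, stripped-down stand-in for struct bfq_queue. */
struct toy_queue {
	unsigned int wr_coeff;         /* > 1 while the queue is weight-raised */
	bool soft_rt;                  /* assumption: true for soft real-time raising */
	unsigned long service_from_wr; /* sectors served since raising started */
};

/* Mirrors the hunk in bfq_update_bfqq_wr_on_rq_arrival(): reset the counter. */
void toy_start_interactive_wr(struct toy_queue *q, unsigned int coeff)
{
	q->service_from_wr = 0;
	q->wr_coeff = coeff;
	q->soft_rt = false;
}

/* Mirrors the hunk in bfq_bfqq_served(): count service only while raised. */
void toy_served(struct toy_queue *q, int served)
{
	assert(served >= 0);
	if (q->wr_coeff > 1)
		q->service_from_wr += (unsigned long)served;
}

/*
 * Mirrors the check added to bfq_update_wr_data(). Returns true if weight
 * raising was ended by this call. Soft real-time raising is left untouched.
 */
bool toy_update_wr(struct toy_queue *q)
{
	if (q->wr_coeff > 1 && !q->soft_rt &&
	    q->service_from_wr > max_service_from_wr) {
		q->wr_coeff = 1; /* simplified stand-in for bfq_bfqq_end_wr() */
		return true;
	}
	return false;
}
```

Note that the comparison is strict, so a queue that has received exactly 120000 sectors keeps its raised weight until the next sector is served. With the kernel's 512-byte sectors, the threshold corresponds to roughly 61 MB of I/O per queue.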