From patchwork Thu Aug 3 16:48:18 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valente X-Patchwork-Id: 109357 Delivered-To: patch@linaro.org Received: by 10.182.109.195 with SMTP id hu3csp229044obb; Thu, 3 Aug 2017 09:50:15 -0700 (PDT) X-Received: by 10.98.216.70 with SMTP id e67mr2219268pfg.299.1501779015322; Thu, 03 Aug 2017 09:50:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1501779015; cv=none; d=google.com; s=arc-20160816; b=GWwnf5Py8X7rly8qKjqivsmXLMrViOPdlGctMuXu+HsIzB2eT/zBfT4QwP4ppN53VN 6AMUsq7NKqObsGhiDd9PymWGAFbiJcPhvHzGptww99XaMi+X4LVqi8FDSl0h9I9fwFmP ZbB0OHY9g7REEAqhm0/8x4he58DdfZBCrYWon3ga/VP2njKaH3TrrFhYFd/8GIhHTQ0v W5hSsph4VIWJkAiTpjqXk37zeSR8MatpGZy03QqknEXtper5lCIksMs81Et72HjXzcbT qVt/z/yHfzpSjBvARdZhKlkk8ln2iQpbjcolyyrEofcRKxjKI04Q0hdXfyG7TYvSnWku S1Pg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:date:subject:cc:to:from :dkim-signature:arc-authentication-results; bh=VUVxQEeDdUulaQCWQj7ss8OVpGyagit/BwtrhUKvGeo=; b=saM7LXMTEvjIa0gTU3PduAunFES4I780nPrYRX2dEYkMPZv38/9e39VIt51tiOlugZ vpP9mZCV6Opv4P4Sipt202W7Yn0F2O6uNODVc2AZEFohsn31J0PG29ICNPTf1BlQyqxD mh7feV1jDE1g5wrXbNqryeVaKeNtbL3FCNtboy1mZXf6UF+i0gpKM6s1bI0/jf3g3gE/ roBmWgCc1bscOHd92fjtRvbZa8OJN44CTzVJ+j9uNYhlVFd2cbQbmYYV5msKp+EqcYSD VbyM/WQvBT8KziGNBMgZ34+WfXcFz9SqE0U2jrUyLJXIXn4c6l1eIz6VhQXXU3tyjMf9 aUWg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.b=DLSBGlG4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f16si18968039pga.949.2017.08.03.09.50.14; Thu, 03 Aug 2017 09:50:15 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.b=DLSBGlG4; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751978AbdHCQuM (ORCPT + 25 others); Thu, 3 Aug 2017 12:50:12 -0400 Received: from mail-wr0-f181.google.com ([209.85.128.181]:33473 "EHLO mail-wr0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751940AbdHCQtB (ORCPT ); Thu, 3 Aug 2017 12:49:01 -0400 Received: by mail-wr0-f181.google.com with SMTP id v105so7968560wrb.0 for ; Thu, 03 Aug 2017 09:49:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id; bh=VUVxQEeDdUulaQCWQj7ss8OVpGyagit/BwtrhUKvGeo=; b=DLSBGlG4xOW6Ia6P8LpM/SE3d9s1mkNcLM/O1Kv3yS5myPvfbgN7e2tFZToDTgZJ9s rTNCvPf5QktNFTlip/NQUPz8SXdY+jrTLII2aYxij+JwIyCs1/pTZljn5j7tnWN0c7pZ 5fG6p7K/wr/X+OMlKvvdZTzkBSi2gjpNZ8QME= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=VUVxQEeDdUulaQCWQj7ss8OVpGyagit/BwtrhUKvGeo=; b=Gz+rAOp1KQckfbPL/0xlLJcp6GXih4V2w416b5PRLKHKZUe2p8z+rtAT8jl2Rq/8m8 bkykCzETEXGrH8cqa9Vu6T9rIcy5ux3nw/cL5m2S3M6AhsEoOhkBx+etAs4mwGzesBFS geQeBs4gPGuT1oqwbfFS1Y6BTjXYYc7SCZMBxdNx9EzF8ASSod5a+2t8w4PhJxNiUVpd Va9IOf/0lswT9n9264CBIxQBIfhqed5qXfh4MtkR5nq72tCOPqv7X9teb3IFmPacbFB4 fR+mvvWvECVcBmp/RCdbUwAjCHSy5mWhYcx8FvM0JKmbLTD6hqS6N+I2CMCZV404h7Ow cGwQ== X-Gm-Message-State: AIVw110Ua64Zkv6utL485Bw6FslfIa4Y3GKnPnPXGleRbGt0F+oN7O7n NwohMgnHkrGIqujz X-Received: by 10.223.163.88 with SMTP id d24mr1738387wrb.33.1501778940145; Thu, 03 Aug 2017 09:49:00 -0700 (PDT) Received: from localhost.localdomain ([185.14.11.2]) by smtp.gmail.com with ESMTPSA id n184sm2135840wme.33.2017.08.03.09.48.57 (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 03 Aug 2017 09:48:59 -0700 (PDT) From: Paolo Valente To: Jens Axboe Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, ulf.hansson@linaro.org, broonie@kernel.org, lucmiccio@gmail.com, Paolo Valente Subject: [PATCH BUGFIX/IMPROVEMENT] block, bfq: improve and refactor throughput-boosting logic Date: Thu, 3 Aug 2017 18:48:18 +0200 Message-Id: <20170803164818.10562-1-paolo.valente@linaro.org> X-Mailer: git-send-email 2.10.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When a queue associated with a process remains empty, there are cases where throughput gets boosted if the device is idled to await the arrival of a new I/O request for that queue. Currently, BFQ assumes that one of these cases is when the device has no internal queueing (regardless of the properties of the I/O being served). Unfortunately, this condition has proved to be too general. So, this commit refines it as "the device has no internal queueing and is rotational". This refinement provides a significant throughput boost with random I/O, on flash-based storage without internal queueing. For example, on a HiKey board, throughput increases by up to 125%, growing, e.g., from 6.9MB/s to 15.6MB/s with two or three random readers in parallel. This commit also refactors the code related to device idling, for the following reason. Finding the change that provides the above large improvement has been slightly more difficult than it had to be, because the logic that decides whether to idle the device is still scattered across three functions. Almost all of the logic is in the function bfq_bfqq_may_idle, but (1) part of the decision is made in bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may switch off idling regardless of the output of bfq_bfqq_may_idle. In addition, both bfq_update_idle_window and bfq_bfqq_must_idle make their decisions as a function of parameters that are used, for similar purposes, also in bfq_bfqq_may_idle. This commit addresses this issue by moving all the logic into bfq_bfqq_may_idle. Signed-off-by: Paolo Valente Signed-off-by: Luca Miccio --- block/bfq-iosched.c | 144 ++++++++++++++++++++++++++++------------------------ block/bfq-iosched.h | 12 ++--- 2 files changed, 85 insertions(+), 71 deletions(-) -- 2.10.0 diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 436b6ca..509f399 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -128,7 +128,7 @@ BFQ_BFQQ_FNS(busy); BFQ_BFQQ_FNS(wait_request); BFQ_BFQQ_FNS(non_blocking_wait_rq); BFQ_BFQQ_FNS(fifo_expire); -BFQ_BFQQ_FNS(idle_window); +BFQ_BFQQ_FNS(has_short_ttime); BFQ_BFQQ_FNS(sync); BFQ_BFQQ_FNS(IO_bound); BFQ_BFQQ_FNS(in_large_burst); @@ -731,10 +731,10 @@ bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd, unsigned int old_wr_coeff = bfqq->wr_coeff; bool busy = bfq_already_existing && bfq_bfqq_busy(bfqq); - if (bic->saved_idle_window) - bfq_mark_bfqq_idle_window(bfqq); + if (bic->saved_has_short_ttime) + bfq_mark_bfqq_has_short_ttime(bfqq); else - bfq_clear_bfqq_idle_window(bfqq); + bfq_clear_bfqq_has_short_ttime(bfqq); if (bic->saved_IO_bound) bfq_mark_bfqq_IO_bound(bfqq); @@ -2012,7 +2012,7 @@ static void bfq_bfqq_save_state(struct bfq_queue *bfqq) return; bic->saved_ttime = bfqq->ttime; - bic->saved_idle_window = bfq_bfqq_idle_window(bfqq); + bic->saved_has_short_ttime = bfq_bfqq_has_short_ttime(bfqq); bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq); bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq); bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node); @@ -3038,8 +3038,8 @@ void bfq_bfqq_expire(struct bfq_data *bfqd, } bfq_log_bfqq(bfqd, bfqq, - "expire (%d, slow %d, num_disp %d, idle_win %d)", reason, - slow, bfqq->dispatched, bfq_bfqq_idle_window(bfqq)); + "expire (%d, slow %d, num_disp %d, short_ttime %d)", reason, + slow, bfqq->dispatched, bfq_bfqq_has_short_ttime(bfqq)); /* * Increase, decrease or leave budget unchanged according to @@ -3114,7 +3114,10 @@ static bool bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq) static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq) { struct bfq_data *bfqd = bfqq->bfqd; - bool idling_boosts_thr, idling_boosts_thr_without_issues, + bool rot_without_queueing = + !blk_queue_nonrot(bfqd->queue) && !bfqd->hw_tag, + bfqq_sequential_and_IO_bound, + idling_boosts_thr, idling_boosts_thr_without_issues, idling_needed_for_service_guarantees, asymmetric_scenario; @@ -3122,27 +3125,45 @@ static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq) return true; /* + * Idling is performed only if slice_idle > 0. In addition, we + * do not idle if + * (a) bfqq is async + * (b) bfqq is in the idle io prio class: in this case we do + * not idle because we want to minimize the bandwidth that + * queues in this class can steal to higher-priority queues + */ + if (bfqd->bfq_slice_idle == 0 || !bfq_bfqq_sync(bfqq) || + bfq_class_idle(bfqq)) + return false; + + bfqq_sequential_and_IO_bound = !BFQQ_SEEKY(bfqq) && + bfq_bfqq_IO_bound(bfqq) && bfq_bfqq_has_short_ttime(bfqq); + + /* * The next variable takes into account the cases where idling * boosts the throughput. * * The value of the variable is computed considering, first, that * idling is virtually always beneficial for the throughput if: - * (a) the device is not NCQ-capable, or - * (b) regardless of the presence of NCQ, the device is rotational - * and the request pattern for bfqq is I/O-bound and sequential. + * (a) the device is not NCQ-capable and rotational, or + * (b) regardless of the presence of NCQ, the device is rotational and + * the request pattern for bfqq is I/O-bound and sequential, or + * (c) regardless of whether it is rotational, the device is + * not NCQ-capable and the request pattern for bfqq is + * I/O-bound and sequential. * * Secondly, and in contrast to the above item (b), idling an * NCQ-capable flash-based device would not boost the * throughput even with sequential I/O; rather it would lower * the throughput in proportion to how fast the device * is. Accordingly, the next variable is true if any of the - * above conditions (a) and (b) is true, and, in particular, - * happens to be false if bfqd is an NCQ-capable flash-based - * device. + * above conditions (a), (b) or (c) is true, and, in + * particular, happens to be false if bfqd is an NCQ-capable + * flash-based device. */ - idling_boosts_thr = !bfqd->hw_tag || - (!blk_queue_nonrot(bfqd->queue) && bfq_bfqq_IO_bound(bfqq) && - bfq_bfqq_idle_window(bfqq)); + idling_boosts_thr = rot_without_queueing || + ((!blk_queue_nonrot(bfqd->queue) || !bfqd->hw_tag) && + bfqq_sequential_and_IO_bound); /* * The value of the next variable, @@ -3313,16 +3334,13 @@ static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq) asymmetric_scenario && !bfq_bfqq_in_large_burst(bfqq); /* - * We have now all the components we need to compute the return - * value of the function, which is true only if both the following - * conditions hold: - * 1) bfqq is sync, because idling make sense only for sync queues; - * 2) idling either boosts the throughput (without issues), or - * is necessary to preserve service guarantees. + * We have now all the components we need to compute the + * return value of the function, which is true only if idling + * either boosts the throughput (without issues), or is + * necessary to preserve service guarantees. */ - return bfq_bfqq_sync(bfqq) && - (idling_boosts_thr_without_issues || - idling_needed_for_service_guarantees); + return idling_boosts_thr_without_issues || + idling_needed_for_service_guarantees; } /* @@ -3338,10 +3356,7 @@ static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq) */ static bool bfq_bfqq_must_idle(struct bfq_queue *bfqq) { - struct bfq_data *bfqd = bfqq->bfqd; - - return RB_EMPTY_ROOT(&bfqq->sort_list) && bfqd->bfq_slice_idle != 0 && - bfq_bfqq_may_idle(bfqq); + return RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_bfqq_may_idle(bfqq); } /* @@ -3783,7 +3798,6 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic) case IOPRIO_CLASS_IDLE: bfqq->new_ioprio_class = IOPRIO_CLASS_IDLE; bfqq->new_ioprio = 7; - bfq_clear_bfqq_idle_window(bfqq); break; } @@ -3843,8 +3857,14 @@ static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq, bfq_set_next_ioprio_data(bfqq, bic); if (is_sync) { + /* + * No need to mark as has_short_ttime if in + * idle_class, because no device idling is performed + * for queues in idle class + */ if (!bfq_class_idle(bfqq)) - bfq_mark_bfqq_idle_window(bfqq); + /* tentatively mark as has_short_ttime */ + bfq_mark_bfqq_has_short_ttime(bfqq); bfq_mark_bfqq_sync(bfqq); bfq_mark_bfqq_just_created(bfqq); } else @@ -3985,18 +4005,19 @@ bfq_update_io_seektime(struct bfq_data *bfqd, struct bfq_queue *bfqq, blk_rq_sectors(rq) < BFQQ_SECT_THR_NONROT); } -/* - * Disable idle window if the process thinks too long or seeks so much that - * it doesn't matter. - */ -static void bfq_update_idle_window(struct bfq_data *bfqd, - struct bfq_queue *bfqq, - struct bfq_io_cq *bic) +static void bfq_update_has_short_ttime(struct bfq_data *bfqd, + struct bfq_queue *bfqq, + struct bfq_io_cq *bic) { - int enable_idle; + bool has_short_ttime = true; - /* Don't idle for async or idle io prio class. */ - if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq)) + /* + * No need to update has_short_ttime if bfqq is async or in + * idle io prio class, or if bfq_slice_idle is zero, because + * no device idling is performed for bfqq in this case. + */ + if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq) || + bfqd->bfq_slice_idle == 0) return; /* Idle window just restored, statistics are meaningless. */ @@ -4004,27 +4025,22 @@ static void bfq_update_idle_window(struct bfq_data *bfqd, bfqd->bfq_wr_min_idle_time)) return; - enable_idle = bfq_bfqq_idle_window(bfqq); - + /* Think time is infinite if no process is linked to + * bfqq. Otherwise check average think time to + * decide whether to mark as has_short_ttime + */ if (atomic_read(&bic->icq.ioc->active_ref) == 0 || - bfqd->bfq_slice_idle == 0 || - (bfqd->hw_tag && BFQQ_SEEKY(bfqq) && - bfqq->wr_coeff == 1)) - enable_idle = 0; - else if (bfq_sample_valid(bfqq->ttime.ttime_samples)) { - if (bfqq->ttime.ttime_mean > bfqd->bfq_slice_idle && - bfqq->wr_coeff == 1) - enable_idle = 0; - else - enable_idle = 1; - } - bfq_log_bfqq(bfqd, bfqq, "update_idle_window: enable_idle %d", - enable_idle); + (bfq_sample_valid(bfqq->ttime.ttime_samples) && + bfqq->ttime.ttime_mean > bfqd->bfq_slice_idle)) + has_short_ttime = false; + + bfq_log_bfqq(bfqd, bfqq, "update_has_short_ttime: has_short_ttime %d", + has_short_ttime); - if (enable_idle) - bfq_mark_bfqq_idle_window(bfqq); + if (has_short_ttime) + bfq_mark_bfqq_has_short_ttime(bfqq); else - bfq_clear_bfqq_idle_window(bfqq); + bfq_clear_bfqq_has_short_ttime(bfqq); } /* @@ -4040,14 +4056,12 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq, bfqq->meta_pending++; bfq_update_io_thinktime(bfqd, bfqq); + bfq_update_has_short_ttime(bfqd, bfqq, bic); bfq_update_io_seektime(bfqd, bfqq, rq); - if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 || - !BFQQ_SEEKY(bfqq)) - bfq_update_idle_window(bfqd, bfqq, bic); bfq_log_bfqq(bfqd, bfqq, - "rq_enqueued: idle_window=%d (seeky %d)", - bfq_bfqq_idle_window(bfqq), BFQQ_SEEKY(bfqq)); + "rq_enqueued: has_short_ttime=%d (seeky %d)", + bfq_bfqq_has_short_ttime(bfqq), BFQQ_SEEKY(bfqq)); bfqq->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq); diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h index 63e771a..53e2907 100644 --- a/block/bfq-iosched.h +++ b/block/bfq-iosched.h @@ -348,11 +348,11 @@ struct bfq_io_cq { uint64_t blkcg_serial_nr; /* the current blkcg serial */ #endif /* - * Snapshot of the idle window before merging; taken to - * remember this value while the queue is merged, so as to be - * able to restore it in case of split. + * Snapshot of the has_short_time flag before merging; taken + * to remember its value while the queue is merged, so as to + * be able to restore it in case of split. */ - bool saved_idle_window; + bool saved_has_short_ttime; /* * Same purpose as the previous two fields for the I/O bound * classification of a queue. @@ -626,7 +626,7 @@ enum bfqq_state_flags { * without idling the device */ BFQQF_fifo_expire, /* FIFO checked in this slice */ - BFQQF_idle_window, /* slice idling enabled */ + BFQQF_has_short_ttime, /* queue has a short think time */ BFQQF_sync, /* synchronous queue */ BFQQF_IO_bound, /* * bfqq has timed-out at least once @@ -655,7 +655,7 @@ BFQ_BFQQ_FNS(busy); BFQ_BFQQ_FNS(wait_request); BFQ_BFQQ_FNS(non_blocking_wait_rq); BFQ_BFQQ_FNS(fifo_expire); -BFQ_BFQQ_FNS(idle_window); +BFQ_BFQQ_FNS(has_short_ttime); BFQ_BFQQ_FNS(sync); BFQ_BFQQ_FNS(IO_bound); BFQ_BFQQ_FNS(in_large_burst);