From patchwork Mon Oct 9 11:11:23 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Paolo Valente X-Patchwork-Id: 115200 Delivered-To: patch@linaro.org Received: by 10.140.22.163 with SMTP id 32csp2423717qgn; Mon, 9 Oct 2017 04:11:43 -0700 (PDT) X-Received: by 10.98.52.130 with SMTP id b124mr9881885pfa.264.1507547503241; Mon, 09 Oct 2017 04:11:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1507547503; cv=none; d=google.com; s=arc-20160816; b=HWv8pPCDjkaj3063ejIHB4KsTucpTpOUtLJj2D0V4+xTB+Wc3OdZd9UnJyHz3eJmqM /RITTRWWIyVq3lW93lr5NibhQOimlfmGOkFjvDpzuLkFsHxz9QN424j808yp71+5YePR 8bRRHiPaTh0chO2vLHc0lqRHb1LpgMOEiFR9TrDq+lQS8xtYjyjHBstz+u8HQfVhNSCV HOIqiKX7nIWEr+Xvva6b0ILIoTBVfKy2dty2ebHxrFlRA2QIwQUbyGhx5+P28mydLFY2 owku85eoc5hVaUko1TZWs8A3ua2/1CVA5Cvvhh06SkJQmblXZWv0pvvPmnVs+EfPVPbf E/aA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature:arc-authentication-results; bh=PQHw+EhzpxTLsXahNCcF7SXxTt/sXTMVQNapbLaWIJg=; b=kAjk4n/wel5LhKdECJbGd7geAjR+mxADtGpFIzITcdaO6cFXrlceItueh4mTTu6Ygw eZ5FaIi0vkyuRnh7zlAbjxQe7DTXWyVgeI21MQ5ctH++ZilWJ8SDLz6zSoA97ZWfXvKn bV4cj3J8WrmK2AOC96byD6NsmBHfCx4WHW73iYA5XgEDFyD966jkI5e3e2XkzGOxkE7L qOfIr4n38h3LC6g59Vjix1VSL2HdwNmi+e42Hv5/GQxKvIqPFC6VkSuwHnoML7bcMilW b/AOEQQ7u9eAzG4PcfIF4vIcQMEZm6E9ux9F8QhrKChXrmU6qvVdTL6Yg/daPIwx9+Aq Nbdw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=Xyta2d9o; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id l62si6094058pge.274.2017.10.09.04.11.42; Mon, 09 Oct 2017 04:11:43 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=Xyta2d9o; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754117AbdJILLl (ORCPT + 26 others); Mon, 9 Oct 2017 07:11:41 -0400 Received: from mail-wm0-f49.google.com ([74.125.82.49]:51488 "EHLO mail-wm0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754083AbdJILLi (ORCPT ); Mon, 9 Oct 2017 07:11:38 -0400 Received: by mail-wm0-f49.google.com with SMTP id f4so22381277wme.0 for ; Mon, 09 Oct 2017 04:11:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=PQHw+EhzpxTLsXahNCcF7SXxTt/sXTMVQNapbLaWIJg=; b=Xyta2d9oQ6mvTFXyVw45pu0VN2aITOxFx05yCLXkNqX0wsG17NTiqmhczsu9XE21ui +G14ZdeiAzuAJJgxOjQmxtIC5nhe5Oa1CcHdw13BDDkbWfNTrGo/WA6AzRAgoLCFICCh WsB+dAZZ9axXNR4XZ5oIFgUAPEt8jqDN3DoX0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=PQHw+EhzpxTLsXahNCcF7SXxTt/sXTMVQNapbLaWIJg=; b=GRoiy/LIwdWtzsLAcvpuXIHUF18HyEn0XgZ4p24Un4pjGGsHIAGqZdDbBBkTMaCDPJ hFXwGYZlTWBd8U8B3UW9XOY2jYFLtZkgeEQ720ATjhZfOwLAbrycsYeRNgotXXtTZ0kG M3MikxEWW1OxG0/E2/loRTj37axHZ3ppCmZ1/IfBSUGGZNw6YLe5ManMyx/6oGrSW0aD //eorPDzYy3FEH8tov4doOZyCxt0l7nHRnBeh9DF6T6Iua8gvepeeMw0aszwabXGfn0S hvPwD4CgkO9uHalivYwQhFbZU49C5ORWqcjEt3OEB1XPeaEdwl0aDbakxvlwgvwdqnfg lSGQ== X-Gm-Message-State: AMCzsaUJge+W8iezUtWUi5PXDh0L8EjfW8HPOEPtN+Fx3FHytOpIHqAZ /f86nN9YftWwMmcIbZQt35Y4PQ== X-Google-Smtp-Source: AOwi7QACv95bUMjjY0l+fUFYe2yZCrNIRh0SuiZiKrkOKHZAcKHateBdQpd66Zi2ugcx1+sWW93s1Q== X-Received: by 10.28.230.77 with SMTP id d74mr7320799wmh.75.1507547497491; Mon, 09 Oct 2017 04:11:37 -0700 (PDT) Received: from localhost.localdomain ([185.14.9.224]) by smtp.gmail.com with ESMTPSA id g136sm9119206wmd.40.2017.10.09.04.11.35 (version=TLS1 cipher=AES128-SHA bits=128/128); Mon, 09 Oct 2017 04:11:36 -0700 (PDT) From: Paolo Valente To: Jens Axboe Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, ulf.hansson@linaro.org, broonie@kernel.org, lee.tibbert@gmail.com, oleksandr@natalenko.name, angeloruocco90@gmail.com, philm@manjaro.org, Paolo Valente Subject: [PATCH BUGFIX] block, bfq: fix unbalanced decrements of burst size Date: Mon, 9 Oct 2017 13:11:23 +0200 Message-Id: <20171009111123.2961-2-paolo.valente@linaro.org> X-Mailer: git-send-email 2.10.0 In-Reply-To: <20171009111123.2961-1-paolo.valente@linaro.org> References: <20171009111123.2961-1-paolo.valente@linaro.org> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The commit "block, bfq: decrease burst size when queues in burst exit" introduced the decrement of burst_size on the removal of a bfq_queue from the burst list. Unfortunately, this decrement can happen to be performed even when burst size is already equal to 0, because of unbalanced decrements. A description follows of the cause of these unbalanced decrements, namely a wrong assumption, and of the way how this wrong assumption leads to unbalanced decrements. The wrong assumption is that a bfq_queue can exit only if the process associated with the bfq_queue has exited. This is false, because a bfq_queue, say Q, may exit also as a consequence of a merge with another bfq_queue. In this case, Q exits because the I/O of its associated process has been redirected to another bfq_queue. The decrement unbalance occurs because Q may then be re-created after a split, and added back to the current burst list, *without* incrementing burst_size. burst_size is not incremented because Q is not a new bfq_queue added to the burst list, but a bfq_queue only temporarily removed from the list, and, before the commit "bfq-sq, bfq-mq: decrease burst size when queues in burst exit", burst_size was not decremented when Q was removed. This commit addresses this issue by just checking whether the exiting bfq_queue is a merged bfq_queue, and, in that case, not decrementing burst_size. Unfortunately, this still leaves room for unbalanced decrements, in the following rarer case: on a split, the bfq_queue happens to be inserted into a different burst list than that it was removed from when merged. If this happens, the number of elements in the new burst list becomes higher than burst_size (by one). When the bfq_queue then exits, it is of course not in a merged state any longer, thus burst_size is decremented, which results in an unbalanced decrement. To handle this sporadic, unlucky case in a simple way, this commit also checks that burst_size is larger than 0 before decrementing it. Finally, this commit removes an useless, extra check: the check that the bfq_queue is sync, performed before checking whether the bfq_queue is in the burst list. This extra check is redundant, because only sync bfq_queues can be inserted into the burst list. Reported-by: Philip Müller Signed-off-by: Paolo Valente Signed-off-by: Angelo Ruocco Tested-by: Philip Müller Tested-by: Oleksandr Natalenko Tested-by: Lee Tibbert --- block/bfq-iosched.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 57 insertions(+), 2 deletions(-) -- 2.10.0 diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 0d7272a..8689b24 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -3685,9 +3685,36 @@ void bfq_put_queue(struct bfq_queue *bfqq) if (bfqq->ref) return; - if (bfq_bfqq_sync(bfqq) && !hlist_unhashed(&bfqq->burst_list_node)) { + if (!hlist_unhashed(&bfqq->burst_list_node)) { hlist_del_init(&bfqq->burst_list_node); - bfqq->bfqd->burst_size--; + /* + * Decrement also burst size after the removal, if the + * process associated with bfqq is exiting, and thus + * does not contribute to the burst any longer. This + * decrement helps filter out false positives of large + * bursts, when some short-lived process (often due to + * the execution of commands by some service) happens + * to start and exit while a complex application is + * starting, and thus spawning several processes that + * do I/O (and that *must not* be treated as a large + * burst, see comments on bfq_handle_burst). + * + * In particular, the decrement is performed only if: + * 1) bfqq is not a merged queue, because, if it is, + * then this free of bfqq is not triggered by the exit + * of the process bfqq is associated with, but exactly + * by the fact that bfqq has just been merged. + * 2) burst_size is greater than 0, to handle + * unbalanced decrements. Unbalanced decrements may + * happen in te following case: bfqq is inserted into + * the current burst list--without incrementing + * bust_size--because of a split, but the current + * burst list is not the burst list bfqq belonged to + * (see comments on the case of a split in + * bfq_set_request). + */ + if (bfqq->bic && bfqq->bfqd->burst_size > 0) + bfqq->bfqd->burst_size--; } kmem_cache_free(bfq_pool, bfqq); @@ -4418,6 +4445,34 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd, else { bfq_clear_bfqq_in_large_burst(bfqq); if (bic->was_in_burst_list) + /* + * If bfqq was in the current + * burst list before being + * merged, then we have to add + * it back. And we do not need + * to increase burst_size, as + * we did not decrement + * burst_size when we removed + * bfqq from the burst list as + * a consequence of a merge + * (see comments in + * bfq_put_queue). In this + * respect, it would be rather + * costly to know whether the + * current burst list is still + * the same burst list from + * which bfqq was removed on + * the merge. To avoid this + * cost, if bfqq was in a + * burst list, then we add + * bfqq to the current burst + * list without any further + * check. This can cause + * inappropriate insertions, + * but rarely enough to not + * harm the detection of large + * bursts significantly. + */ hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list); }