From patchwork Tue Jan 29 17:18:52 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 157000
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: tj@kernel.org, sargun@sargun.me, Vincent Guittot
Subject: [PATCH] sched/fair: Fix insertion in rq->leaf_cfs_rq_list
Date: Tue, 29 Jan 2019 18:18:52 +0100
Message-Id: <1548782332-18591-1-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.7.4
X-Mailing-List: linux-kernel@vger.kernel.org

Sargun reported a crash:
  "I picked up c40f7d74c741a907cfaeb73a7697081881c497d0 sched/fair: Fix
   infinite loop in update_blocked_averages() by reverting a9e7f6544b9c
   and put it on top of 4.19.13. In addition to this, I uninlined
   list_add_leaf_cfs_rq for debugging.

   This revealed a new bug that we didn't get to because we kept getting
   crashes from the previous issue. When we are running with cgroups that
   are rapidly changing, with CFS bandwidth control, and in addition
   using the cpusets cgroup, we see this crash. Specifically, it seems to
   occur with cgroups that are throttled and we change the allowed
   cpuset."

The algorithm used to order cfs_rq in rq->leaf_cfs_rq_list assumes that,
the first time a cfs_rq is used, the walk goes all the way to the root, so
that we finish by adding either a cfs_rq without a parent or a cfs_rq whose
parent is already on the list. This is not always true in the presence of
throttling.
Because a cfs_rq can be throttled even if it has never been used, when other
CPUs of the cgroup have already used all the bandwidth, we are not guaranteed
to reach the root and add every cfs_rq to the list.

Ensure that all cfs_rq will be added to the list even if they are throttled.

Reported-by: Sargun Dhillon
Fixes: 9c2791f936ef ("Fix hierarchical order in rq->leaf_cfs_rq_list")
Signed-off-by: Vincent Guittot
---

This patch doesn't fix:
a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path")
which has been reverted in v5.0-rc1. I'm working on an additional patch that
should be similar to this one to fix a9e7f6544b9c.

 kernel/sched/fair.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

-- 
2.7.4

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e2ff4b6..bf6b6c1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -352,6 +352,20 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
 	}
 }
 
+static inline void list_add_branch_cfs_rq(struct sched_entity *se, struct rq *rq)
+{
+	struct cfs_rq *cfs_rq;
+
+	for_each_sched_entity(se) {
+		cfs_rq = cfs_rq_of(se);
+		list_add_leaf_cfs_rq(cfs_rq);
+
+		/* If parent is already in the list, we can stop */
+		if (rq->tmp_alone_branch == &rq->leaf_cfs_rq_list)
+			break;
+	}
+}
+
 /* Iterate through all leaf cfs_rq's on a runqueue: */
 #define for_each_leaf_cfs_rq(rq, cfs_rq) \
 	list_for_each_entry_rcu(cfs_rq, &rq->leaf_cfs_rq_list, leaf_cfs_rq_list)
@@ -5179,6 +5193,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 
 	}
 
+	/* Ensure that all cfs_rq have been added to the list */
+	list_add_branch_cfs_rq(se, rq);
+
 	hrtick_update(rq);
 }
 
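
To see the failure mode in isolation, here is a minimal user-space sketch of the
situation described in the changelog. It is not the kernel code: struct
toy_cfs_rq and the toy_* helpers are hypothetical names invented for this
illustration. The model only tracks whether each level is on the list; it does
not reproduce the ordering that rq->tmp_alone_branch maintains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-CPU cfs_rq (not the kernel struct). */
struct toy_cfs_rq {
	struct toy_cfs_rq *parent;	/* NULL for the root */
	bool on_list;			/* models membership in rq->leaf_cfs_rq_list */
	bool throttled;			/* models CFS bandwidth throttling */
};

/*
 * Models the enqueue walk: put each level on the list, but stop as soon
 * as a throttled level is met. Returns the level where the walk stopped,
 * or NULL if it ran past the root.
 */
static struct toy_cfs_rq *toy_enqueue_walk(struct toy_cfs_rq *cfs_rq)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent) {
		cfs_rq->on_list = true;
		if (cfs_rq->throttled)
			break;
	}
	return cfs_rq;
}

/*
 * Models the extra pass: keep adding levels, starting where the first
 * walk stopped, until the parent is already on the list or the root
 * has been added.
 */
static void toy_add_branch(struct toy_cfs_rq *cfs_rq)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent) {
		cfs_rq->on_list = true;
		/* If parent is already on the list, the branch is connected */
		if (cfs_rq->parent && cfs_rq->parent->on_list)
			break;
	}
}

/* True when every level from cfs_rq up to the root is on the list. */
static bool toy_branch_complete(const struct toy_cfs_rq *cfs_rq)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent)
		if (!cfs_rq->on_list)
			return false;
	return true;
}

/* Scenario: root <- mid (throttled, never used before) <- leaf. */
static int toy_demo(void)
{
	struct toy_cfs_rq root = { .parent = NULL };
	struct toy_cfs_rq mid = { .parent = &root, .throttled = true };
	struct toy_cfs_rq leaf = { .parent = &mid };
	struct toy_cfs_rq *stopped = toy_enqueue_walk(&leaf);

	assert(stopped == &mid);		/* walk ended at the throttled level */
	assert(!toy_branch_complete(&leaf));	/* root was never added */

	toy_add_branch(stopped);
	assert(toy_branch_complete(&leaf));	/* branch now reaches the root */
	return 0;
}
```

Calling toy_demo() from a test harness exercises both halves: the first walk
leaves the root off the list because it stops at the throttled level, and the
second pass completes the branch. In this toy model the stop condition is a
per-node flag, whereas the patch checks rq->tmp_alone_branch.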