From patchwork Tue Feb 28 14:38:40 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Patrick Bellasi <patrick.bellasi@arm.com>
X-Patchwork-Id: 94623
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>,
 Tejun Heo <tj@kernel.org>
Subject: [RFC v3 3/5] sched/core: sync capacity_{min,
 max} between slow and fast paths
Date: Tue, 28 Feb 2017 14:38:40 +0000
Message-Id: <1488292722-19410-4-git-send-email-patrick.bellasi@arm.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1488292722-19410-1-git-send-email-patrick.bellasi@arm.com>
References: <1488292722-19410-1-git-send-email-patrick.bellasi@arm.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

At enqueue/dequeue time a task needs to be placed in the CPU's rb_tree,
depending on the current capacity_{min,max} value of the cgroup it
belongs to. Thus, we need to guarantee that these values cannot be
changed while the task is in these critical sections.

To this end, this patch uses the same locking scheme already used by
__set_cpus_allowed_ptr(). We might uselessly lock the (previous) RQ of
a !RUNNABLE task, but that's the price to pay to safely serialize
capacity_{min,max} updates with enqueues, dequeues and migrations.

This patch adds the synchronization calls required to guarantee that
each RUNNABLE task is always in the correct relative position within
the rb_tree. Specifically, when a group's capacity_{min,max} value is
updated, each task in that group is re-positioned within the rb_tree if
it is currently RUNNABLE and its relative position has changed.
This operation is serialized against the task being {en,de}queued or
migrated by holding task_rq_lock().

It's worth noting that moving a task from one cgroup to another,
perhaps with different capacity_{min,max} values, is already covered by
the current locking scheme. Indeed, this operation requires a dequeue
from the original cgroup's RQ followed by an enqueue in the new one.
The same argument holds for task migrations between CPUs. Thus,
migrations between both CPUs and cgroups are ultimately managed like
task wakeups/sleeps.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/core.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)
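
For reviewers, the "relative position has changed" test described in the
changelog can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the kernel code: the per-CPU
rbtree is replaced by a sorted array, and sketch_needs_reposition() and
SKETCH_CAPACITY_SCALE are hypothetical names that exist only in this
note. The defaults for a missing neighbour (0 and the capacity scale,
1024) mirror what cap_clamp_update_capacity() does in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Upper bound of the capacity scale (the kernel value is 1024) */
#define SKETCH_CAPACITY_SCALE 1024U

/*
 * Hypothetical stand-in for the per-CPU rbtree: the clamp values of the
 * queued tasks, kept in ascending order in a plain array.
 *
 * Returns true if the entry at @idx, once its clamp becomes @new_cap,
 * no longer sits between its neighbours and must be removed and
 * re-inserted to keep the ordering.
 */
static bool sketch_needs_reposition(const unsigned int *caps, size_t n,
				    size_t idx, unsigned int new_cap)
{
	/* Same defaults as the patch when a neighbour is missing */
	unsigned int prev_cap = 0;
	unsigned int next_cap = SKETCH_CAPACITY_SCALE;

	assert(idx < n);

	if (idx > 0)
		prev_cap = caps[idx - 1];
	if (idx + 1 < n)
		next_cap = caps[idx + 1];

	/* Still between its neighbours: relative position is unchanged */
	return !(prev_cap <= new_cap && new_cap <= next_cap);
}
```

With queued clamps {100, 400, 800}, moving the middle entry to 500 keeps
it between its neighbours, so nothing has to be done. Moving it to 900
or to 50 breaks the ordering, which is the case where the patch does a
remove followed by an insert while holding task_rq_lock().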

-- 
2.7.4

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8f509be..d620bc4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -846,9 +846,68 @@ cap_clamp_remove_capacity(struct rq *rq, struct task_struct *p,
 	RB_CLEAR_NODE(node);
 }
 
+static void
+cap_clamp_update_capacity(struct task_struct *p, unsigned int cap_idx)
+{
+	struct task_group *tg = task_group(p);
+	unsigned int next_cap = SCHED_CAPACITY_SCALE;
+	unsigned int prev_cap = 0;
+	struct task_struct *entry;
+	struct rb_node *node;
+	struct rq_flags rf;
+	struct rq *rq;
+
+	/*
+	 * Lock the CPU's RBTree where the task is (possibly) queued.
+	 *
+	 * We might uselessly lock the (previous) RQ of a !RUNNABLE task, but
+	 * that's the price to pay to safely serialize capacity_{min,max}
+	 * updates with enqueues, dequeues and migration operations, which is
+	 * the same locking scheme already in use by __set_cpus_allowed_ptr().
+	 */
+	rq = task_rq_lock(p, &rf);
+
+	/*
+	 * If the task has no node in the rbtree, it's either not yet
+	 * RUNNABLE or it's going to be enqueued with the proper value.
+	 * The setting of the cap_clamp_node is serialized by task_rq_lock().
+	 */
+	if (RB_EMPTY_NODE(&p->cap_clamp_node[cap_idx]))
+		goto done;
+
+	/* Check current position in the capacity rbtree */
+	node = rb_next(&p->cap_clamp_node[cap_idx]);
+	if (node) {
+		entry = rb_entry(node, struct task_struct,
+				 cap_clamp_node[cap_idx]);
+		next_cap = task_group(entry)->cap_clamp[cap_idx];
+	}
+	node = rb_prev(&p->cap_clamp_node[cap_idx]);
+	if (node) {
+		entry = rb_entry(node, struct task_struct,
+				 cap_clamp_node[cap_idx]);
+		prev_cap = task_group(entry)->cap_clamp[cap_idx];
+	}
+
+	/* If relative position has not changed: nothing to do */
+	if (prev_cap <= tg->cap_clamp[cap_idx] &&
+	    next_cap >= tg->cap_clamp[cap_idx])
+		goto done;
+
+	/* Reposition this node within the rbtree */
+	cap_clamp_remove_capacity(rq, p, cap_idx);
+	cap_clamp_insert_capacity(rq, p, cap_idx);
+
+done:
+	task_rq_unlock(rq, p, &rf);
+}
+
 static inline void
 cap_clamp_enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 {
+	lockdep_assert_held(&p->pi_lock);
+	lockdep_assert_held(&rq->lock);
+
 	/* Track task's min/max capacities */
 	cap_clamp_insert_capacity(rq, p, CAP_CLAMP_MIN);
 	cap_clamp_insert_capacity(rq, p, CAP_CLAMP_MAX);
@@ -857,6 +916,9 @@ cap_clamp_enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 static inline void
 cap_clamp_dequeue_task(struct rq *rq, struct task_struct *p, int flags)
 {
+	lockdep_assert_held(&p->pi_lock);
+	lockdep_assert_held(&rq->lock);
+
 	/* Track task's min/max capacities */
 	cap_clamp_remove_capacity(rq, p, CAP_CLAMP_MIN);
 	cap_clamp_remove_capacity(rq, p, CAP_CLAMP_MAX);
@@ -7046,8 +7108,10 @@ static int cpu_capacity_min_write_u64(struct cgroup_subsys_state *css,
 				      struct cftype *cftype, u64 value)
 {
 	struct cgroup_subsys_state *pos;
+	struct css_task_iter it;
 	unsigned int min_value;
 	struct task_group *tg;
+	struct task_struct *p;
 	int ret = -EINVAL;
 
 	min_value = min_t(unsigned int, value, SCHED_CAPACITY_SCALE);
@@ -7078,6 +7142,12 @@ static int cpu_capacity_min_write_u64(struct cgroup_subsys_state *css,
 
 	tg->cap_clamp[CAP_CLAMP_MIN] = min_value;
 
+	/* Update the capacity_min of RUNNABLE tasks */
+	css_task_iter_start(css, &it);
+	while ((p = css_task_iter_next(&it)))
+		cap_clamp_update_capacity(p, CAP_CLAMP_MIN);
+	css_task_iter_end(&it);
+
 done:
 	ret = 0;
 out:
@@ -7091,8 +7161,10 @@ static int cpu_capacity_max_write_u64(struct cgroup_subsys_state *css,
 				      struct cftype *cftype, u64 value)
 {
 	struct cgroup_subsys_state *pos;
+	struct css_task_iter it;
 	unsigned int max_value;
 	struct task_group *tg;
+	struct task_struct *p;
 	int ret = -EINVAL;
 
 	max_value = min_t(unsigned int, value, SCHED_CAPACITY_SCALE);
@@ -7123,6 +7195,12 @@ static int cpu_capacity_max_write_u64(struct cgroup_subsys_state *css,
 
 	tg->cap_clamp[CAP_CLAMP_MAX] = max_value;
 
+	/* Update the capacity_max of RUNNABLE tasks */
+	css_task_iter_start(css, &it);
+	while ((p = css_task_iter_next(&it)))
+		cap_clamp_update_capacity(p, CAP_CLAMP_MAX);
+	css_task_iter_end(&it);
+
 done:
 	ret = 0;
 out: