From patchwork Thu Apr 25 17:23:19 2013
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 16398
From: Vincent Guittot <vincent.guittot@linaro.org>
To: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linaro-kernel@lists.linaro.org, peterz@infradead.org, mingo@kernel.org,
	linux@arm.linux.org.uk, pjt@google.com, santosh.shilimkar@ti.com,
	Morten.Rasmussen@arm.com, chander.kashyap@linaro.org, cmetcalf@tilera.com,
	tony.luck@intel.com, alex.shi@intel.com, preeti@linux.vnet.ibm.com
Cc: paulmck@linux.vnet.ibm.com, tglx@linutronix.de, len.brown@intel.com,
	arjan@linux.intel.com, amit.kucheria@linaro.org, corbet@lwn.net,
	l.majewski@samsung.com, Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH 03/14] sched: pack small tasks
Date: Thu, 25 Apr 2013 19:23:19 +0200
Message-Id: <1366910611-20048-4-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 1.7.9.5
In-Reply-To: <1366910611-20048-1-git-send-email-vincent.guittot@linaro.org>
References: <1366910611-20048-1-git-send-email-vincent.guittot@linaro.org>

During the creation of sched_domain, we define a pack buddy CPU for each CPU
when one is available. We want to pack at all levels where a group of CPUs
can be power gated independently from the others.

On a system that can't power gate a group of CPUs independently, the flag is
set at all sched_domain levels and the buddy is set to -1. This is the
default behavior.

On a dual-cluster / dual-core system which can power gate each core and
cluster independently, the buddy configuration will be:

      | Cluster 0   | Cluster 1   |
      | CPU0 | CPU1 | CPU2 | CPU3 |
-----------------------------------
buddy | CPU0 | CPU0 | CPU0 | CPU2 |

If the cores in a cluster can't be power gated independently, the buddy
configuration becomes:

      | Cluster 0   | Cluster 1   |
      | CPU0 | CPU1 | CPU2 | CPU3 |
-----------------------------------
buddy | CPU0 | CPU1 | CPU0 | CPU0 |

Small tasks tend to slip out of the periodic load balance, so the best place
to migrate them is during their wake-up. The decision is made in O(1), as we
only check against one buddy CPU.
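To make the wake-up path concrete, here is a minimal user-space sketch of
the decision (not part of the patch; the pack_buddy table, the helper names
and the load figures are invented for illustration). It shows why the
decision is O(1): one array lookup plus two cheap checks.

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

/* Mirrors the first buddy table above: cores and clusters gate independently */
static const int pack_buddy[NR_CPUS] = { 0, 0, 0, 2 };

/* Stand-ins for is_light_task() / is_buddy_busy() in the patch below */
static bool task_is_light(int load_pct) { return load_pct < 20; }
static bool buddy_is_busy(int buddy)    { (void)buddy; return false; }

static int pick_wakeup_cpu(int prev_cpu, int task_load_pct)
{
	int buddy = pack_buddy[prev_cpu];

	/* No buddy, task too heavy, or buddy overloaded: keep prev_cpu */
	if (buddy == -1 || !task_is_light(task_load_pct) ||
	    buddy_is_busy(buddy))
		return prev_cpu;

	return buddy;
}

int main(void)
{
	/* A 10% task waking on CPU3 is packed on CPU2, its buddy */
	printf("CPU3 -> CPU%d\n", pick_wakeup_cpu(3, 10));
	return 0;
}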
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
---
 kernel/sched/core.c  |    1 +
 kernel/sched/fair.c  |  132 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |    5 ++
 3 files changed, 138 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3b9861f..c5ef170 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5664,6 +5664,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 	rcu_assign_pointer(rq->sd, sd);
 	destroy_sched_domains(tmp, cpu);
 
+	update_packing_domain(cpu);
 	update_top_cache_domain(cpu);
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9c2f726..6adc57c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -160,6 +160,76 @@ void sched_init_granularity(void)
 	update_sysctl();
 }
 
+
+#ifdef CONFIG_SMP
+/*
+ * Save the id of the optimal CPU that should be used to pack small tasks.
+ * The value -1 is used when no buddy has been found.
+ */
+DEFINE_PER_CPU(int, sd_pack_buddy);
+
+/*
+ * Look for the best buddy CPU that can be used to pack small tasks.
+ * We make the assumption that it is not worth packing on CPUs that share
+ * the same power domain, so we look for the 1st sched_domain without the
+ * SD_SHARE_POWERDOMAIN flag. Then we look for the sched_group with the
+ * lowest power per core, based on the assumption that its power efficiency
+ * is better.
+ */
+void update_packing_domain(int cpu)
+{
+	struct sched_domain *sd;
+	int id = -1;
+
+	sd = highest_flag_domain(cpu, SD_SHARE_POWERDOMAIN);
+	if (!sd)
+		sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
+	else
+		sd = sd->parent;
+
+	while (sd && (sd->flags & SD_LOAD_BALANCE)
+		&& !(sd->flags & SD_SHARE_POWERDOMAIN)) {
+		struct sched_group *sg = sd->groups;
+		struct sched_group *pack = sg;
+		struct sched_group *tmp;
+
+		/*
+		 * The sched_domain of a CPU points to the local sched_group,
+		 * and this CPU of the local group is a good candidate.
+		 */
+		id = cpu;
+
+		/* loop over the sched_groups to find the best one */
+		for (tmp = sg->next; tmp != sg; tmp = tmp->next) {
+			if (tmp->sgp->power * pack->group_weight >
+					pack->sgp->power * tmp->group_weight)
+				continue;
+
+			if ((tmp->sgp->power * pack->group_weight ==
+					pack->sgp->power * tmp->group_weight)
+			 && (cpumask_first(sched_group_cpus(tmp)) >= id))
+				continue;
+
+			/* we have found a better group */
+			pack = tmp;
+
+			/* Take the 1st CPU of the new group */
+			id = cpumask_first(sched_group_cpus(pack));
+		}
+
+		/* Look for a CPU other than itself */
+		if (id != cpu)
+			break;
+
+		sd = sd->parent;
+	}
+
+	pr_debug("CPU%d packing on CPU%d\n", cpu, id);
+	per_cpu(sd_pack_buddy, cpu) = id;
+}
+
+#endif /* CONFIG_SMP */
+
 #if BITS_PER_LONG == 32
 # define WMULT_CONST	(~0UL)
 #else
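A note on the group comparison in the loop above: "lowest power per core"
means comparing the ratios power/group_weight, which the code does by
cross-multiplication to avoid an integer division (and its rounding). A
stand-alone illustration follows; the struct and the numbers are invented
for the example and are not the kernel's types.

#include <stdbool.h>
#include <stdio.h>

struct group {
	unsigned long power;	/* cf. sg->sgp->power */
	unsigned long weight;	/* cf. sg->group_weight */
};

/* true if 'a' has strictly lower power per core than 'b' */
static bool lower_power_per_core(const struct group *a, const struct group *b)
{
	/* a->power / a->weight < b->power / b->weight, without dividing */
	return a->power * b->weight < b->power * a->weight;
}

int main(void)
{
	struct group big    = { .power = 2048, .weight = 2 };	/* 1024/core */
	struct group little = { .power = 1024, .weight = 2 };	/*  512/core */

	/* The lower-power cluster wins and would become the pack group */
	printf("%s\n", lower_power_per_core(&little, &big) ? "little" : "big");
	return 0;
}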
@@ -3291,6 +3361,64 @@ done:
 	return target;
 }
 
+static bool is_buddy_busy(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	u32 sum = rq->avg.runnable_avg_sum;
+	u32 period = rq->avg.runnable_avg_period;
+
+	/*
+	 * If a CPU accesses the runnable_avg_sum and runnable_avg_period
+	 * fields of its buddy CPU while the latter updates them, it can get
+	 * the new version of one field and the old version of the other one.
+	 * This can generate erroneous decisions. We don't want to use a lock
+	 * mechanism to ensure coherency because of the overhead in this
+	 * critical path.
+	 * The runnable_avg_period of a runqueue tends to the max value in
+	 * less than 345ms after plugging a CPU, which implies that we could
+	 * use the max value instead of reading runnable_avg_period after
+	 * 345ms. During the starting phase, we must ensure a minimum of
+	 * coherency between the fields. A simple rule is
+	 * runnable_avg_sum <= runnable_avg_period.
+	 */
+	sum = min(sum, period);
+
+	/*
+	 * A busy buddy is a CPU with a high load or a small load with a lot
+	 * of running tasks.
+	 */
+	return (sum > (period / (rq->nr_running + 2)));
+}
+
+static bool is_light_task(struct task_struct *p)
+{
+	/* A light task runs less than 20% of the time on average */
+	return ((p->se.avg.runnable_avg_sum * 5) <
+			(p->se.avg.runnable_avg_period));
+}
+
+static int check_pack_buddy(int cpu, struct task_struct *p)
+{
+	int buddy = per_cpu(sd_pack_buddy, cpu);
+
+	/* No pack buddy for this CPU */
+	if (buddy == -1)
+		return false;
+
+	/* buddy is not an allowed CPU */
+	if (!cpumask_test_cpu(buddy, tsk_cpus_allowed(p)))
+		return false;
+
+	/*
+	 * If the task is a small one and the buddy is not overloaded,
+	 * we use the buddy CPU
+	 */
+	if (!is_light_task(p) || is_buddy_busy(buddy))
+		return false;
+
+	return true;
+}
+
 /*
  * sched_balance_self: balance the current task (running on cpu) in domains
  * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and
@@ -3319,6 +3447,10 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 		if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
 			want_affine = 1;
 		new_cpu = prev_cpu;
+
+		/* We pack only at wake-up, not for new tasks */
+		if (check_pack_buddy(new_cpu, p))
+			return per_cpu(sd_pack_buddy, new_cpu);
 	}
 
 	rcu_read_lock();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7f36024f..96b164d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -872,6 +872,7 @@ extern const struct sched_class idle_sched_class;
 
 extern void trigger_load_balance(struct rq *rq, int cpu);
 extern void idle_balance(int this_cpu, struct rq *this_rq);
+extern void update_packing_domain(int cpu);
 
 #else	/* CONFIG_SMP */
 
@@ -879,6 +880,10 @@ static inline void idle_balance(int cpu, struct rq *rq)
 {
 }
 
+static inline void update_packing_domain(int cpu)
+{
+}
+
 #endif
 
 extern void sysrq_sched_debug_show(void);
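To see how the two thresholds in is_buddy_busy() and is_light_task() behave,
here is a worked user-space example (illustrative only; the percentages use
period = 100 for readability, whereas the real runnable_avg fields are raw
PELT sums). A buddy is busy when sum > period / (nr_running + 2), so an idle
buddy counts as busy above 50% utilization and a buddy already running two
tasks above 25%; a task is light when sum * 5 < period, i.e. below 20%.

#include <stdbool.h>
#include <stdio.h>

static bool buddy_busy(unsigned int sum, unsigned int period,
		       unsigned int nr_running)
{
	if (sum > period)	/* cf. sum = min(sum, period) in the patch */
		sum = period;
	return sum > period / (nr_running + 2);
}

static bool light_task(unsigned int sum, unsigned int period)
{
	return sum * 5 < period;
}

int main(void)
{
	/* A 40% loaded buddy: not busy when idle, busy with 2 running tasks */
	printf("%d %d\n", buddy_busy(40, 100, 0), buddy_busy(40, 100, 2));
	/* A 10% task is light, a 30% task is not */
	printf("%d %d\n", light_task(10, 100), light_task(30, 100));
	return 0;
}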