From patchwork Fri Mar 22 12:25:52 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 15536
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linaro-kernel@lists.linaro.org, peterz@infradead.org, mingo@kernel.org,
 linux@arm.linux.org.uk, pjt@google.com, santosh.shilimkar@ti.com,
 morten.rasmussen@arm.com, chander.kashyap@linaro.org, cmetcalf@tilera.com,
 tony.luck@intel.com
Cc: alex.shi@intel.com, preeti@linux.vnet.ibm.com,
 paulmck@linux.vnet.ibm.com, tglx@linutronix.de, len.brown@intel.com,
 arjan@linux.intel.com, amit.kucheria@linaro.org, corbet@lwn.net,
 Vincent Guittot
Subject: [RFC PATCH v3 3/6] sched: pack small tasks
Date: Fri, 22 Mar 2013 13:25:52 +0100
Message-Id: <1363955155-18382-4-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 1.7.9.5
In-Reply-To: <1363955155-18382-1-git-send-email-vincent.guittot@linaro.org>
References: <1363955155-18382-1-git-send-email-vincent.guittot@linaro.org>
X-Gm-Message-State:
 ALoCoQlWQ+apsG42YbLJSQsoVzXofvye2bf/DBDSyPRiEVukGAe2IN5zxELxGuSJDvuiv9rwMtgY

During the creation of sched_domain, we define a pack buddy CPU for each CPU
when one is available. We want to pack at all levels where a group of CPUs
can be power gated independently from others.
On a system that can't power gate a group of CPUs independently, the
SD_SHARE_POWERDOMAIN flag is set at all sched_domain levels and the buddy is
set to -1. This is the default behavior.
On a dual-cluster / dual-core system which can power gate each core and
cluster independently, the buddy configuration will be:

      | Cluster 0   | Cluster 1   |
      | CPU0 | CPU1 | CPU2 | CPU3 |
-----------------------------------
buddy | CPU0 | CPU0 | CPU0 | CPU2 |

Small tasks tend to slip out of the periodic load balance, so the best place
to choose to migrate them is during their wake up. The decision is in O(1) as
we only check against one buddy CPU.

Signed-off-by: Vincent Guittot
Reviewed-by: Morten Rasmussen
---
 kernel/sched/core.c  |    1 +
 kernel/sched/fair.c  |  115 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |    5 +++
 3 files changed, 121 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b827e0c..21c35ce 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5662,6 +5662,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 	rcu_assign_pointer(rq->sd, sd);
 	destroy_sched_domains(tmp, cpu);
 
+	update_packing_domain(cpu);
 	update_top_cache_domain(cpu);
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9c2f726..021c7b7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -160,6 +160,76 @@ void sched_init_granularity(void)
 	update_sysctl();
 }
 
+
+#ifdef CONFIG_SMP
+/*
+ * Save the id of the optimal CPU that should be used to pack small tasks.
+ * The value -1 is used when no buddy has been found.
+ */
+DEFINE_PER_CPU(int, sd_pack_buddy);
+
+/*
+ * Look for the best buddy CPU that can be used to pack small tasks.
+ * We assume that it is not worth packing on CPUs that share the
+ * same powerline. We look for the 1st sched_domain without the
+ * SD_SHARE_POWERDOMAIN flag. Then we look for the sched_group with the lowest
+ * power per core, based on the assumption that its power efficiency is
+ * better.
+ */
+void update_packing_domain(int cpu)
+{
+	struct sched_domain *sd;
+	int id = -1;
+
+	sd = highest_flag_domain(cpu, SD_SHARE_POWERDOMAIN);
+	if (!sd)
+		sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
+	else
+		sd = sd->parent;
+
+	while (sd && (sd->flags & SD_LOAD_BALANCE)
+		&& !(sd->flags & SD_SHARE_POWERDOMAIN)) {
+		struct sched_group *sg = sd->groups;
+		struct sched_group *pack = sg;
+		struct sched_group *tmp;
+
+		/*
+		 * The sched_domain of a CPU points to the local sched_group
+		 * and the 1st CPU of this local group is a good candidate
+		 */
+		id = cpumask_first(sched_group_cpus(pack));
+
+		/* loop over the sched groups to find the best one */
+		for (tmp = sg->next; tmp != sg; tmp = tmp->next) {
+			if (tmp->sgp->power * pack->group_weight >
+					pack->sgp->power * tmp->group_weight)
+				continue;
+
+			if ((tmp->sgp->power * pack->group_weight ==
+					pack->sgp->power * tmp->group_weight)
+			 && (cpumask_first(sched_group_cpus(tmp)) >= id))
+				continue;
+
+			/* we have found a better group */
+			pack = tmp;
+
+			/* Take the 1st CPU of the new group */
+			id = cpumask_first(sched_group_cpus(pack));
+		}
+
+		/* Look for a CPU other than itself */
+		if (id != cpu)
+			break;
+
+		sd = sd->parent;
+	}
+
+	pr_debug("CPU%d packing on CPU%d\n", cpu, id);
+	per_cpu(sd_pack_buddy, cpu) = id;
+}
+
+#endif /* CONFIG_SMP */
+
 #if BITS_PER_LONG == 32
 # define WMULT_CONST	(~0UL)
 #else
@@ -3291,6 +3361,47 @@ done:
 	return target;
 }
 
+static bool is_buddy_busy(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	/*
+	 * A busy buddy is a CPU with a high load or a small load with a lot of
+	 * running tasks.
+	 */
+	return (rq->avg.runnable_avg_sum >
+			(rq->avg.runnable_avg_period / (rq->nr_running + 2)));
+}
+
+static bool is_light_task(struct task_struct *p)
+{
+	/* A light task runs less than 20% of the time on average */
+	return ((p->se.avg.runnable_avg_sum * 5) <
+			(p->se.avg.runnable_avg_period));
+}
+
+static bool check_pack_buddy(int cpu, struct task_struct *p)
+{
+	int buddy = per_cpu(sd_pack_buddy, cpu);
+
+	/* No pack buddy for this CPU */
+	if (buddy == -1)
+		return false;
+
+	/* buddy is not an allowed CPU */
+	if (!cpumask_test_cpu(buddy, tsk_cpus_allowed(p)))
+		return false;
+
+	/*
+	 * If the task is a small one and the buddy is not overloaded,
+	 * we use the buddy CPU
+	 */
+	if (!is_light_task(p) || is_buddy_busy(buddy))
+		return false;
+
+	return true;
+}
+
 /*
  * sched_balance_self: balance the current task (running on cpu) in domains
  * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and
@@ -3319,6 +3430,10 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 		if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
 			want_affine = 1;
 		new_cpu = prev_cpu;
+
+		/* We pack only at wake up, not for a new task */
+		if (check_pack_buddy(new_cpu, p))
+			return per_cpu(sd_pack_buddy, new_cpu);
 	}
 
 	rcu_read_lock();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7f36024f..96b164d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -872,6 +872,7 @@ extern const struct sched_class idle_sched_class;
 
 extern void trigger_load_balance(struct rq *rq, int cpu);
 extern void idle_balance(int this_cpu, struct rq *this_rq);
+extern void update_packing_domain(int cpu);
 
 #else	/* CONFIG_SMP */
 
@@ -879,6 +880,10 @@ static inline void idle_balance(int cpu, struct rq *rq)
 {
 }
 
+static inline void update_packing_domain(int cpu)
+{
+}
+
 #endif
 
 extern void sysrq_sched_debug_show(void);
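
Note on the wake-up heuristic: the two thresholds used by check_pack_buddy() can be tried out in userspace with the minimal sketch below. It is not kernel code. The struct avg_sketch type and the *_sketch function names are hypothetical stand-ins for the rq and task averages, and the 64-bit cast is an addition to keep the standalone arithmetic overflow-safe. Only the two comparisons and the buddy == -1 test mirror the patch; the tsk_cpus_allowed() check is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the runnable averages tracked by the scheduler. */
struct avg_sketch {
	uint32_t runnable_avg_sum;
	uint32_t runnable_avg_period;
};

/* Mirrors is_light_task(): runnable for less than 20% of the period. */
static inline bool light_task_sketch(const struct avg_sketch *task)
{
	return (uint64_t)task->runnable_avg_sum * 5 < task->runnable_avg_period;
}

/*
 * Mirrors is_buddy_busy(): the allowed load shrinks as nr_running grows,
 * from 1/2 of the period with no running task to 1/3 with one, and so on.
 */
static inline bool buddy_busy_sketch(const struct avg_sketch *rq,
				     unsigned int nr_running)
{
	return rq->runnable_avg_sum > rq->runnable_avg_period / (nr_running + 2);
}

/* Mirrors check_pack_buddy() without the allowed-CPUs check (assumption). */
static inline bool pack_on_buddy_sketch(int buddy,
					const struct avg_sketch *task,
					const struct avg_sketch *buddy_rq,
					unsigned int nr_running)
{
	if (buddy == -1)
		return false;

	return light_task_sketch(task) &&
	       !buddy_busy_sketch(buddy_rq, nr_running);
}
```

With a period of 1000, a task whose sum is 100 counts as light while one whose sum is 200 does not, because the comparison is strict. A buddy whose sum is 400 is accepted when it has no running task (threshold 500) and rejected once it has one (threshold 333).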