From patchwork Fri Oct 18 11:52:22 2013
X-Patchwork-Submitter: Vincent Guittot <vincent.guittot@linaro.org>
X-Patchwork-Id: 21122
From: Vincent Guittot <vincent.guittot@linaro.org>
To: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org,
	pjt@google.com, Morten.Rasmussen@arm.com, cmetcalf@tilera.com,
	tony.luck@intel.com, alex.shi@intel.com, preeti@linux.vnet.ibm.com,
	linaro-kernel@lists.linaro.org
Cc: rjw@sisk.pl, paulmck@linux.vnet.ibm.com, corbet@lwn.net,
	tglx@linutronix.de, len.brown@intel.com, arjan@linux.intel.com,
	amit.kucheria@linaro.org, l.majewski@samsung.com,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [RFC][PATCH v5 09/14] sched: update the packing cpu list
Date: Fri, 18 Oct 2013 13:52:22 +0200
Message-Id: <1382097147-30088-9-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1382097147-30088-1-git-send-email-vincent.guittot@linaro.org>
References: <1382097147-30088-1-git-send-email-vincent.guittot@linaro.org>

Use the activity statistics to update the list of CPUs that should be used
to handle the current system activity.

The cpu_power is updated for CPUs that don't take part in the packing
effort. We consider that their cpu_power is allocated to idleness, in the
same way it could be allocated to rt, so the cpu_power that remains
available for cfs is set to the minimum value (i.e. 1). cpu_power is used
for a task that wakes up, because a waking task is already taken into
account in the current activity, whereas power_available is used for fork
and exec, because such a task is not yet part of the current activity.

In order to quickly find the packing starting point, we save information
that lets us start directly with the right sched_group at the right
sched_domain level, instead of running the complete update_packing_domain
algorithm each time the packing cpu list is needed.

The sd_power_leader defines the leader of a group of CPUs that can't be
power gated independently. As soon as this CPU is used, all the CPUs in the
same group will be used, on the grounds that it is not worth keeping some
cores idle if they can't be power gated while one core of the group is
running.

The sd_pack_group and sd_pack_domain are used to quickly check whether a
power leader should be used in the packing effort.
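To make the new bookkeeping easier to follow, here is a minimal userspace
sketch of how the per-cpu state below is meant to be consulted. The struct
fields and the is_packing_cpu() test mirror the patch; the fixed-size array
and the main() scaffolding merely stand in for the kernel's per-CPU
machinery and are illustrative only:

#include <stdio.h>

#define NR_CPUS 4

/* Mirrors the sd_pack structure added below (domain/group pointers omitted) */
struct sd_pack {
	int my_buddy;	/* cpu on which tasks should be packed, -1 if none */
	int my_leader;	/* 1st cpu of a group that can't be power gated independently */
};

/* Stand-in for DEFINE_PER_CPU(struct sd_pack, sd_pack_buddy) */
static struct sd_pack sd_pack_buddy[NR_CPUS];

/* A CPU takes part in packing if it has no buddy or is its own buddy */
static int is_packing_cpu(int cpu)
{
	int my_buddy = sd_pack_buddy[cpu].my_buddy;
	return (my_buddy == -1) || (cpu == my_buddy);
}

int main(void)
{
	int cpu;

	/* Example state: CPU0 leads its power group and absorbs the load,
	 * CPU1-3 point at CPU0 and therefore opt out of packing. */
	sd_pack_buddy[0].my_buddy = -1;
	sd_pack_buddy[0].my_leader = 0;
	for (cpu = 1; cpu < NR_CPUS; cpu++) {
		sd_pack_buddy[cpu].my_buddy = 0;
		sd_pack_buddy[cpu].my_leader = 0;
	}

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		printf("CPU%d: packing=%d buddy=%d leader=%d\n",
		       cpu, is_packing_cpu(cpu),
		       sd_pack_buddy[cpu].my_buddy,
		       sd_pack_buddy[cpu].my_leader);
	return 0;
}

With per-cpu state like this, the update_cpu_power() hunk below leaves
cpu_power at 1 for CPU1-3, so periodic cfs load balancing naturally drains
their tasks toward CPU0.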
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 162 ++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 149 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c258c38..f9b03c1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -185,11 +185,20 @@ static unsigned long available_of(int cpu)
 }
 
 #ifdef CONFIG_SCHED_PACKING_TASKS
+struct sd_pack {
+	int my_buddy;	/* cpu on which tasks should be packed */
+	int my_leader;	/* cpu which leads the packing state of a group */
+	struct sched_domain *domain;	/* domain at which the check is done */
+	struct sched_group *group;	/* starting group for checking */
+};
+
 /*
- * Save the id of the optimal CPU that should be used to pack small tasks
- * The value -1 is used when no buddy has been found
+ * Save per_cpu information about the optimal CPUs that should be used to pack
+ * tasks.
  */
-DEFINE_PER_CPU(int, sd_pack_buddy);
+DEFINE_PER_CPU(struct sd_pack, sd_pack_buddy) = {
+	.my_buddy = -1,
+};
 
 /*
  * The packing level of the scheduler
@@ -202,6 +211,15 @@ int __read_mostly sysctl_sched_packing_level = DEFAULT_PACKING_LEVEL;
 
 unsigned int sd_pack_threshold = (100 * 1024) / DEFAULT_PACKING_LEVEL;
 
+static inline int get_buddy(int cpu)
+{
+	return per_cpu(sd_pack_buddy, cpu).my_buddy;
+}
+
+static inline int get_leader(int cpu)
+{
+	return per_cpu(sd_pack_buddy, cpu).my_leader;
+}
 
 int sched_proc_update_packing(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
@@ -219,13 +237,19 @@ int sched_proc_update_packing(struct ctl_table *table, int write,
 
 static inline bool is_packing_cpu(int cpu)
 {
-	int my_buddy = per_cpu(sd_pack_buddy, cpu);
+	int my_buddy = get_buddy(cpu);
 	return (my_buddy == -1) || (cpu == my_buddy);
 }
 
-static inline int get_buddy(int cpu)
+static inline bool is_leader_cpu(int cpu, struct sched_domain *sd)
 {
-	return per_cpu(sd_pack_buddy, cpu);
+	if (sd != per_cpu(sd_pack_buddy, cpu).domain)
+		return 0;
+
+	if (cpu != get_leader(cpu))
+		return 0;
+
+	return 1;
 }
 
 /*
@@ -239,7 +263,9 @@ static inline int get_buddy(int cpu)
 void update_packing_domain(int cpu)
 {
 	struct sched_domain *sd;
-	int id = -1;
+	struct sched_group *target = NULL;
+	struct sd_pack *pack = &per_cpu(sd_pack_buddy, cpu);
+	int id = cpu, pcpu = cpu;
 
 	sd = highest_flag_domain(cpu, SD_SHARE_POWERDOMAIN);
 	if (!sd)
@@ -247,6 +273,12 @@ void update_packing_domain(int cpu)
 	else
 		sd = sd->parent;
 
+	if (sd) {
+		pcpu = cpumask_first(sched_group_cpus(sd->groups));
+		if (pcpu != cpu)
+			goto end;
+	}
+
 	while (sd && (sd->flags & SD_LOAD_BALANCE)
 		&& !(sd->flags & SD_SHARE_POWERDOMAIN)) {
 		struct sched_group *sg = sd->groups;
@@ -258,15 +290,16 @@ void update_packing_domain(int cpu)
 		 * and this CPU of this local group is a good candidate
 		 */
 		id = cpu;
+		target = pack;
 
 		/* loop the sched groups to find the best one */
 		for (tmp = sg->next; tmp != sg; tmp = tmp->next) {
-			if (tmp->sgp->power * pack->group_weight >
-					pack->sgp->power * tmp->group_weight)
+			if (tmp->sgp->power_available * pack->group_weight >
+				pack->sgp->power_available * tmp->group_weight)
 				continue;
 
-			if ((tmp->sgp->power * pack->group_weight ==
-					pack->sgp->power * tmp->group_weight)
+			if ((tmp->sgp->power_available * pack->group_weight ==
+				pack->sgp->power_available * tmp->group_weight)
 			 && (cpumask_first(sched_group_cpus(tmp)) >= id))
 				continue;
 
@@ -275,6 +308,7 @@ void update_packing_domain(int cpu)
 
 			/* Take the 1st CPU of the new group */
 			id = cpumask_first(sched_group_cpus(pack));
+			target = pack;
 		}
 
 		/* Look for another CPU than itself */
@@ -284,15 +318,75 @@ void update_packing_domain(int cpu)
 		sd = sd->parent;
 	}
 
+end:
 	pr_debug("CPU%d packing on CPU%d\n", cpu, id);
-	per_cpu(sd_pack_buddy, cpu) = id;
+
+	pack->my_leader = pcpu;
+	pack->my_buddy = id;
+	pack->domain = sd;
+	pack->group = target;
 }
+
+void update_packing_buddy(int cpu, int activity)
+{
+	struct sched_group *tmp;
+	int id = cpu, pcpu = get_leader(cpu);
+
+	/* Get the state of 1st CPU of the power group */
+	if (!is_packing_cpu(pcpu))
+		id = get_buddy(pcpu);
+
+	if (cpu != pcpu)
+		goto end;
+
+	/* Set the activity level */
+	if (sysctl_sched_packing_level == 0)
+		activity = INT_MAX;
+	else
+		activity = (activity * sd_pack_threshold) / 1024;
+
+	tmp = per_cpu(sd_pack_buddy, cpu).group;
+	id = cpumask_first(sched_group_cpus(tmp));
+
+	/* Take the best group at this sd level to pack activity */
+	for (; activity > 0; tmp = tmp->next) {
+		int next;
+
+		if (tmp->sgp->power_available > activity) {
+			next = cpumask_first(sched_group_cpus(tmp));
+			while ((activity > 0) && (id < nr_cpu_ids)) {
+				activity -= available_of(id);
+				id = next;
+				if (pcpu == id) {
+					activity = 0;
+					id = cpu;
+				} else
+					next = cpumask_next(id,
+						sched_group_cpus(tmp));
+			}
+		} else if (cpumask_test_cpu(cpu, sched_group_cpus(tmp))) {
+			id = cpu;
+			activity = 0;
+		} else {
+			activity -= tmp->sgp->power_available;
+		}
+	}
+
+end:
+	per_cpu(sd_pack_buddy, cpu).my_buddy = id;
+}
+
+static int get_cpu_activity(int cpu);
+
 static int check_nohz_packing(int cpu)
 {
 	if (!is_packing_cpu(cpu))
 		return true;
 
+	if ((get_cpu_activity(cpu) * 100) >=
+			(available_of(cpu) * sysctl_sched_packing_level))
+		return true;
+
 	return false;
 }
 #else /* CONFIG_SCHED_PACKING_TASKS */
@@ -302,6 +396,11 @@ static inline bool is_packing_cpu(int cpu)
 	return 1;
 }
 
+static inline bool is_leader_cpu(int cpu, struct sched_domain *sd)
+{
+	return 1;
+}
+
 static inline int get_buddy(int cpu)
 {
 	return -1;
@@ -3443,6 +3542,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 	do {
 		unsigned long load, avg_load;
 		int local_group, packing_cpus = 0;
+		unsigned int power;
 		int i;
 
 		/* Skip over this group if it has no CPUs allowed */
@@ -3472,8 +3572,13 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		if (!packing_cpus)
 			continue;
 
+		if (sd_flag & SD_BALANCE_WAKE)
+			power = group->sgp->power;
+		else
+			power = group->sgp->power_available;
+
 		/* Adjust by relative CPU power of the group */
-		avg_load = (avg_load * SCHED_POWER_SCALE) / group->sgp->power;
+		avg_load = (avg_load * SCHED_POWER_SCALE) / power;
 
 		if (local_group) {
 			this_load = avg_load;
@@ -4611,6 +4716,9 @@ static void update_cpu_power(struct sched_domain *sd, int cpu)
 	cpu_rq(cpu)->cpu_available = power;
 	sdg->sgp->power_available = power;
 
+	if (!is_packing_cpu(cpu))
+		power = 1;
+
 	cpu_rq(cpu)->cpu_power = power;
 	sdg->sgp->power = power;
 
@@ -4931,6 +5039,25 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 	} while (sg != env->sd->groups);
 }
 
+#ifdef CONFIG_SCHED_PACKING_TASKS
+static void update_sd_lb_packing(int cpu, struct sd_lb_stats *sds,
+		struct sched_domain *sd)
+{
+	/* Update the list of packing CPUs */
+	if (sd == per_cpu(sd_pack_buddy, cpu).domain)
+		update_packing_buddy(cpu, sds->total_activity);
+
+	/* This CPU doesn't take part in aggressive packing */
+	if (!is_packing_cpu(cpu))
+		sds->busiest = NULL;
+}
+
+#else /* CONFIG_SCHED_PACKING_TASKS */
+static void update_sd_lb_packing(int cpu, struct sd_lb_stats *sds,
+		struct sched_domain *sd) {}
+
+#endif /* CONFIG_SCHED_PACKING_TASKS */
+
 /**
  * check_asym_packing - Check to see if the group is packed into the
  *			sched doman.
@@ -5153,6 +5280,11 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	local = &sds.local_stat;
 	busiest = &sds.busiest_stat;
 
+	/*
+	 * Update the involvement of the CPU in the packing effort
+	 */
+	update_sd_lb_packing(env->dst_cpu, &sds, env->sd);
+
 	if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
 	    check_asym_packing(env, &sds))
 		return sds.busiest;
@@ -5312,6 +5444,10 @@ static int should_we_balance(struct lb_env *env)
 	if (env->idle == CPU_NEWLY_IDLE)
 		return 1;
 
+	/* Leader CPU must be used to update packing CPUs list */
+	if (is_leader_cpu(env->dst_cpu, env->sd))
+		return 1;
+
 	sg_cpus = sched_group_cpus(sg);
 	sg_mask = sched_group_mask(sg);
 
 	/* Try to find first idle cpu */
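As a numeric illustration of the threshold arithmetic in
update_packing_buddy() above: sd_pack_threshold is
(100 * 1024) / DEFAULT_PACKING_LEVEL, and the measured activity is
multiplied by sd_pack_threshold / 1024, i.e. by 100 / packing_level, before
groups are filled, so a group is considered full once its real activity
reaches packing_level percent of its available power. A standalone sketch
follows; the helper names and the packing level of 80 are example values
for illustration, not necessarily the series' defaults:

#include <limits.h>
#include <stdio.h>

/* Example value only; the kernel default comes from DEFAULT_PACKING_LEVEL */
static int sysctl_sched_packing_level = 80;

/* Mirrors: sd_pack_threshold = (100 * 1024) / DEFAULT_PACKING_LEVEL */
static unsigned int sd_pack_threshold(void)
{
	return (100 * 1024) / sysctl_sched_packing_level;
}

/* Mirrors the scaling at the top of update_packing_buddy() */
static int scaled_activity(int activity)
{
	/* packing disabled: infinite activity keeps every CPU in the list */
	if (sysctl_sched_packing_level == 0)
		return INT_MAX;
	return (activity * sd_pack_threshold()) / 1024;
}

int main(void)
{
	/* 800 units of activity are inflated to 1000, so a group whose
	 * power_available is 1024 is treated as full at ~80% real load. */
	printf("activity 800 -> %d\n", scaled_activity(800));
	return 0;
}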