
[v2,01/11] sched: fix imbalance flag reset

Message ID 1400860385-14555-2-git-send-email-vincent.guittot@linaro.org
State New

Commit Message

Vincent Guittot May 23, 2014, 3:52 p.m. UTC
The imbalance flag can stay set even though there is no imbalance.

Let's assume that we have 3 tasks that run on a dual-core / dual-cluster system.
Some idle load balances will be triggered during the tick.
Unfortunately, the tick is also used to queue background work, so we can reach
a situation where a short work item has been queued on a CPU which already runs
a task. The load balance will detect this imbalance (2 tasks on 1 CPU and an
idle CPU) and will try to pull the waiting task onto the idle CPU. The waiting
task is a worker thread that is pinned to a CPU, so an imbalance due to pinned
tasks is detected and the imbalance flag is set.
Then, we will not be able to clear the flag because we have at most 1 task on
each CPU, but the imbalance flag will trigger useless active load balances
between the idle CPU and the busy CPU.

We need to reset the imbalance flag as soon as we have reached a balanced
state.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
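
For reference, here is roughly how the flag set by the hunk below is consumed
once the parent domain is balanced (a condensed sketch of the fair.c of that
era, keeping only the lines that touch the flag; not verbatim kernel code):

    /* the flag lives in the sched_group_power of the parent's groups */
    static inline int sg_imbalanced(struct sched_group *group)
    {
            return group->sgp->imbalance;
    }

    /* update_sg_lb_stats() copies it into the group statistics ... */
    sgs->group_imb = sg_imbalanced(group);

    /* ... and find_busiest_group() then forces a balance attempt, bypassing
     * the usual "are we actually imbalanced" checks */
    if (busiest->group_imb)
            goto force_balance;

So a flag that is never cleared keeps forcing balance attempts at the parent
level even when every CPU already runs at most one task.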

Comments

Preeti U Murthy May 25, 2014, 10:33 a.m. UTC | #1
Hi Vincent,

On 05/23/2014 09:22 PM, Vincent Guittot wrote:
> The imbalance flag can stay set even though there is no imbalance.
> 
> Let's assume that we have 3 tasks that run on a dual-core / dual-cluster system.
> Some idle load balances will be triggered during the tick.
> Unfortunately, the tick is also used to queue background work, so we can reach
> a situation where a short work item has been queued on a CPU which already runs
> a task. The load balance will detect this imbalance (2 tasks on 1 CPU and an
> idle CPU) and will try to pull the waiting task onto the idle CPU. The waiting
> task is a worker thread that is pinned to a CPU, so an imbalance due to pinned
> tasks is detected and the imbalance flag is set.
> Then, we will not be able to clear the flag because we have at most 1 task on
> each CPU, but the imbalance flag will trigger useless active load balances
> between the idle CPU and the busy CPU.

Why do we do active balancing today when there is at most 1 task on the
busiest CPU? Shouldn't we be skipping load balancing altogether? If we
do active balancing when the number of tasks = 1, it will lead to a
ping-pong, right?

Regards
Preeti U Murthy

Vincent Guittot May 26, 2014, 7:49 a.m. UTC | #2
On 25 May 2014 12:33, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> Hi Vincent,
>
> On 05/23/2014 09:22 PM, Vincent Guittot wrote:
>> The imbalance flag can stay set even though there is no imbalance.
>>
>> Let's assume that we have 3 tasks that run on a dual-core / dual-cluster system.
>> Some idle load balances will be triggered during the tick.
>> Unfortunately, the tick is also used to queue background work, so we can reach
>> a situation where a short work item has been queued on a CPU which already runs
>> a task. The load balance will detect this imbalance (2 tasks on 1 CPU and an
>> idle CPU) and will try to pull the waiting task onto the idle CPU. The waiting
>> task is a worker thread that is pinned to a CPU, so an imbalance due to pinned
>> tasks is detected and the imbalance flag is set.
>> Then, we will not be able to clear the flag because we have at most 1 task on
>> each CPU, but the imbalance flag will trigger useless active load balances
>> between the idle CPU and the busy CPU.
>
> Why do we do active balancing today when there is at most 1 task on the
> busiest CPU? Shouldn't we be skipping load balancing altogether? If we
> do active balancing when the number of tasks = 1, it will lead to a
> ping-pong, right?

That's the purpose of the patch: to prevent this useless active load
balance. When the imbalance flag is set, an active load balance is
triggered regardless of the actual load, because pinned tasks prevent us
from reaching a balanced state.

Vincent
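
To make the ping-pong explicit: once the forced balance attempt moves nothing,
the failure counter grows until an active migration is started. A rough sketch
of that escalation path in the load_balance() of that era, heavily condensed
and not verbatim:

    /* nothing could be pulled, although the flag said we were imbalanced */
    if (!ld_moved) {
            schedstat_inc(sd, lb_failed[idle]);
            if (idle != CPU_NEWLY_IDLE)
                    sd->nr_balance_failed++;

            if (need_active_balance(&env)) {
                    /* wake the stopper thread on the busiest CPU to push
                     * its single running task over to this CPU */
            }
    }

    /* need_active_balance() gives in after enough consecutive failures: */
    return unlikely(sd->nr_balance_failed > sd->cache_nice_tries + 2);

With a stale flag this cycle repeats and the lone task bounces between the two
CPUs for no benefit; clearing the flag on the out_balanced path stops it.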

>
> Regards
> Preeti U Murthy
>
Vincent Guittot May 26, 2014, 10:14 a.m. UTC | #3
On 26 May 2014 11:16, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> On 05/26/2014 01:19 PM, Vincent Guittot wrote:
>> On 25 May 2014 12:33, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
>>> Hi Vincent,
>>>
>>> On 05/23/2014 09:22 PM, Vincent Guittot wrote:
>>>> The imbalance flag can stay set even though there is no imbalance.
>>>>
>>>> Let's assume that we have 3 tasks that run on a dual-core / dual-cluster system.
>>>> Some idle load balances will be triggered during the tick.
>>>> Unfortunately, the tick is also used to queue background work, so we can reach
>>>> a situation where a short work item has been queued on a CPU which already runs
>>>> a task. The load balance will detect this imbalance (2 tasks on 1 CPU and an
>>>> idle CPU) and will try to pull the waiting task onto the idle CPU. The waiting
>>>> task is a worker thread that is pinned to a CPU, so an imbalance due to pinned
>>>> tasks is detected and the imbalance flag is set.
>>>> Then, we will not be able to clear the flag because we have at most 1 task on
>>>> each CPU, but the imbalance flag will trigger useless active load balances
>>>> between the idle CPU and the busy CPU.
>>>
>>> Why do we do active balancing today when there is at most 1 task on the
>>> busiest CPU? Shouldn't we be skipping load balancing altogether? If we
>>> do active balancing when the number of tasks = 1, it will lead to a
>>> ping-pong, right?
>>
>> That's the purpose of the patch: to prevent this useless active load
>> balance. When the imbalance flag is set, an active load balance is
>> triggered regardless of the actual load, because pinned tasks prevent us
>> from reaching a balanced state.
>
> No, I mean this:
>
> sched: Do not continue load balancing when the busiest CPU has one
> running task

But you can have situations where you have to migrate the task even if
the busiest CPU has only 1 task. The use of the imbalance flag is one
example. You can also be in a situation where the busiest group has
too much load compared to the local group, and the busiest CPU, even
with 1 task (at this instant), has been selected to move tasks away
from the busiest group.

Vincent

>
> From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
>
>
> ---
>  kernel/sched/fair.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c9617b7..b175333 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6626,6 +6626,8 @@ more_balance:
>                         }
>                         goto out_balanced;
>                 }
> +       } else {
> +               goto out;
>         }
>
>         if (!ld_moved) {
>
>
> }
>
> Regards
> Preeti U Murthy
>>
>> Vincent
>>
>>>
>>> Regards
>>> Preeti U Murthy
>>>
>>
>

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c9617b7..9587ed1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6610,10 +6610,8 @@  more_balance:
 		if (sd_parent) {
 			int *group_imbalance = &sd_parent->groups->sgp->imbalance;
 
-			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {
+			if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0)
 				*group_imbalance = 1;
-			} else if (*group_imbalance)
-				*group_imbalance = 0;
 		}
 
 		/* All tasks on this runqueue were pinned by CPU affinity */
@@ -6698,6 +6696,16 @@  more_balance:
 	goto out;
 
 out_balanced:
+	/*
+	 * We reach balance although we may have faced some affinity
+	 * constraints. Clear the imbalance flag if it was set.
+	 */
+	if (sd_parent) {
+		int *group_imbalance = &sd_parent->groups->sgp->imbalance;
+		if (*group_imbalance)
+			*group_imbalance = 0;
+	}
+
 	schedstat_inc(sd, lb_balanced[idle]);
 
 	sd->nr_balance_failed = 0;