[RFC,3/4] sched/topology: remove smt_gain

Message ID 1535548752-4434-4-git-send-email-vincent.guittot@linaro.org
State New
Series
  • sched/numa: remove unused code

Commit Message

Vincent Guittot Aug. 29, 2018, 1:19 p.m.
smt_gain is used to compute the capacity of CPUs of an SMT core with the
constraint 1 < smt_gain < 2 in order to be able to compute the number of CPUs
per core. The field has_free_capacity of struct numa_stat, which was the
last user of this computation of the number of CPUs per core, has been removed
by:

  commit 2d4056fafa19 ("sched/numa: Remove numa_has_capacity()")

We can now remove this constraint on core capacity and use the default value
SCHED_CAPACITY_SCALE for SMT CPUs. With this removal, SCHED_CAPACITY_SCALE
becomes the maximum compute capacity of CPUs on every system. This should
help to simplify some code and remove fields like rd->max_cpu_capacity.

Furthermore, arch_scale_cpu_capacity() is already called with a NULL sd in
several other places in the code when it wants the capacity of a CPU to scale
some metrics, like in pelt, deadline or schedutil. In the case of SMT, the
value returned there is not the capacity of the SMT CPU but the default
SCHED_CAPACITY_SCALE.

Remove the smt_gain field from the sched_domain struct.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org (open list)
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---
 include/linux/sched/topology.h | 1 -
 kernel/sched/sched.h           | 3 ---
 kernel/sched/topology.c        | 2 --
 3 files changed, 6 deletions(-)

-- 
2.7.4

Comments

Qais Yousef Aug. 29, 2018, 2:08 p.m. | #1
On 29/08/18 14:19, Vincent Guittot wrote:
> smt_gain is used to compute the capacity of CPUs of an SMT core with the
> constraint 1 < smt_gain < 2 in order to be able to compute the number of CPUs
> per core. The field has_free_capacity of struct numa_stat, which was the
> last user of this computation of the number of CPUs per core, has been removed
> by:
>
>    commit 2d4056fafa19 ("sched/numa: Remove numa_has_capacity()")
>
> We can now remove this constraint on core capacity and use the default value
> SCHED_CAPACITY_SCALE for SMT CPUs. With this removal, SCHED_CAPACITY_SCALE
> becomes the maximum compute capacity of CPUs on every system. This should
> help to simplify some code and remove fields like rd->max_cpu_capacity.
>
> Furthermore, arch_scale_cpu_capacity() is already called with a NULL sd in
> several other places in the code when it wants the capacity of a CPU to scale
> some metrics, like in pelt, deadline or schedutil. In the case of SMT, the
> value returned there is not the capacity of the SMT CPU but the default
> SCHED_CAPACITY_SCALE.
>
> Remove the smt_gain field from the sched_domain struct.

You beat me to it. I got confused by smt_gain recently when I stumbled
on it, as I found out on ARM it's not used, and had to spend some time
convincing myself it's not really necessary to use it.

It was hard to track the history of this and *why* it's needed.

The only 'theoretical' case I found where smt_gain can be useful is when you
have an asymmetric system, for example:

Node_A: 1 Core 2 Threads
Node_B: 1 Core 4 Threads

Then with smt_gain the group_capacity at the core level will be limited
to smt_gain. But without smt_gain, Node_B will look twice as powerful as
Node_A - which will affect balancing AFAICT, causing Node_B's single core
to be oversubscribed, as the 4 threads will still have to share the same
underlying hardware resources. I don't think such systems exist in
practice, or even make sense, though.

So +1 from my side for the removal.

> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: linux-kernel@vger.kernel.org (open list)
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>   include/linux/sched/topology.h | 1 -
>   kernel/sched/sched.h           | 3 ---
>   kernel/sched/topology.c        | 2 --
>   3 files changed, 6 deletions(-)

>
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 2634774..212792b 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -89,7 +89,6 @@ struct sched_domain {
>   	unsigned int newidle_idx;
>   	unsigned int wake_idx;
>   	unsigned int forkexec_idx;
> -	unsigned int smt_gain;
>   
>   	int nohz_idle;			/* NOHZ IDLE status */
>   	int flags;			/* See SD_* */
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 4a2e8ca..b1715b8 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1758,9 +1758,6 @@ unsigned long arch_scale_freq_capacity(int cpu)
>   static __always_inline
>   unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
>   {
> -	if (sd && (sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
> -		return sd->smt_gain / sd->span_weight;
> -
>   	return SCHED_CAPACITY_SCALE;
>   }
>   #endif
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 56a0fed..069c924 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1129,7 +1129,6 @@ sd_init(struct sched_domain_topology_level *tl,
>   
>   		.last_balance		= jiffies,
>   		.balance_interval	= sd_weight,
> -		.smt_gain		= 0,
>   		.max_newidle_lb_cost	= 0,
>   		.next_decay_max_lb_cost	= jiffies,
>   		.child			= child,
> @@ -1155,7 +1154,6 @@ sd_init(struct sched_domain_topology_level *tl,
>   	if (sd->flags & SD_SHARE_CPUCAPACITY) {
>   		sd->flags |= SD_PREFER_SIBLING;
>   		sd->imbalance_pct = 110;
> -		sd->smt_gain = 1178; /* ~15% */
>   
>   	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
>   		sd->flags |= SD_PREFER_SIBLING;


-- 
Qais Yousef
Vincent Guittot Aug. 29, 2018, 2:42 p.m. | #2
On Wed, 29 Aug 2018 at 16:08, Qais Yousef <qais.yousef@arm.com> wrote:
>
> You beat me to it. I got confused by smt_gain recently when I stumbled
> on it, as I found out on ARM it's not used, and had to spend some time
> convincing myself it's not really necessary to use it.
>
> It was hard to track the history of this and *why* it's needed.
>
> The only 'theoretical' case I found where smt_gain can be useful is when you
> have an asymmetric system, for example:
>
> Node_A: 1 Core 2 Threads
> Node_B: 1 Core 4 Threads
>
> Then with smt_gain the group_capacity at the core level will be limited
> to smt_gain. But without smt_gain, Node_B will look twice as powerful as
> Node_A - which will affect balancing AFAICT, causing Node_B's single core
> to be oversubscribed, as the 4 threads will still have to share the same
> underlying hardware resources. I don't think such systems exist in
> practice, or even make sense, though.

Yes, I came to the same conclusion.

> So +1 from my side for the removal.

Thanks

> --
> Qais Yousef
Srikar Dronamraju Sept. 4, 2018, 8:24 a.m. | #3
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: linux-kernel@vger.kernel.org (open list)
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 4a2e8ca..b1715b8 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1758,9 +1758,6 @@ unsigned long arch_scale_freq_capacity(int cpu)
>  static __always_inline
>  unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
>  {
> -	if (sd && (sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
> -		return sd->smt_gain / sd->span_weight;
> -
>  	return SCHED_CAPACITY_SCALE;

Without this change, the capacity_orig of an SMT thread would have been based
on the number of threads. For example, on SMT 2 capacity_orig would have been
589, and on SMT 8 capacity_orig would have been 147.

However, after this change the capacity_orig of each SMT thread will be
1024. For example, an SMT 8 core's capacity_orig would now be 8192.

smt_gain was supposed to make a multi-threaded core look slightly more
powerful than a single-threaded core. I suspect this sometimes hurts us
when doing load balance between 2 cores, i.e. at the MC or DIE sched
domain. Even with 2 threads running on a core, the core might look
lightly loaded (2048/8192) and hence might dissuade movement to an idle core.

I always wonder why arch_scale_cpu_capacity() is called with a NULL
sched_domain in scale_rt_capacity(). This way capacity might actually
be more than capacity_orig. I am always under the impression that
capacity_orig > capacity. Or am I misunderstanding that?

-- 
Thanks and Regards
Srikar Dronamraju
Vincent Guittot Sept. 4, 2018, 9:36 a.m. | #4
Hi Srikar,

Le Tuesday 04 Sep 2018 à 01:24:24 (-0700), Srikar Dronamraju a écrit :
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: linux-kernel@vger.kernel.org (open list)
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> >
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 4a2e8ca..b1715b8 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1758,9 +1758,6 @@ unsigned long arch_scale_freq_capacity(int cpu)
> >  static __always_inline
> >  unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
> >  {
> > -	if (sd && (sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
> > -		return sd->smt_gain / sd->span_weight;
> > -
> >  	return SCHED_CAPACITY_SCALE;
>
> Without this change, the capacity_orig of an SMT thread would have been
> based on the number of threads. For example, on SMT 2 capacity_orig would
> have been 589, and on SMT 8 capacity_orig would have been 147.
>
> However, after this change the capacity_orig of each SMT thread will be
> 1024. For example, an SMT 8 core's capacity_orig would now be 8192.
>
> smt_gain was supposed to make a multi-threaded core look slightly more
> powerful than a single-threaded core. I suspect this sometimes hurts us

Is there a system with both single threaded and multi threaded cores?
That was the main open point for me (and for Qais too).

> when doing load balance between 2 cores, i.e. at the MC or DIE sched
> domain. Even with 2 threads running on a core, the core might look
> lightly loaded (2048/8192) and hence might dissuade movement to an idle core.

Then there is the sibling flag at SMT level that normally ensures 1 task per
core for such a use case.

> I always wonder why arch_scale_cpu_capacity() is called with a NULL
> sched_domain in scale_rt_capacity(). This way capacity might actually

Probably because until this v4.19-rcX version, the rt scaling was done
relative to the local cpu capacity:
  capacity = arch_scale_cpu() * scale_rt_capacity / SCHED_CAPACITY_SCALE

whereas now it directly returns the remaining capacity.

> be more than capacity_orig. I am always under the impression that
> capacity_orig > capacity. Or am I misunderstanding that?

You are right, there is a bug for SMT and the patch below should fix it.
Nevertheless, we still have the problem in some other places in the code.

Subject: [PATCH] sched/fair: fix scale_rt_capacity() for SMT

Since commit:
  commit 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()")
scale_rt_capacity() returns the remaining capacity and no longer a scale
factor to apply to cpu_capacity_orig. arch_scale_cpu_capacity() is called
directly by scale_rt_capacity(), so we must pass it the sched_domain
argument.
Fixes: 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()")
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---
 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 309c93f..c73e1fa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7481,10 +7481,10 @@ static inline int get_sd_load_idx(struct sched_domain *sd,
 	return load_idx;
 }
 
-static unsigned long scale_rt_capacity(int cpu)
+static unsigned long scale_rt_capacity(struct sched_domain *sd, int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
+	unsigned long max = arch_scale_cpu_capacity(sd, cpu);
 	unsigned long used, free;
 	unsigned long irq;
 
@@ -7506,7 +7506,7 @@ static unsigned long scale_rt_capacity(int cpu)
 
 static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 {
-	unsigned long capacity = scale_rt_capacity(cpu);
+	unsigned long capacity = scale_rt_capacity(sd, cpu);
 	struct sched_group *sdg = sd->groups;
 
 	cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(sd, cpu);
-- 
2.7.4

>
> --
> Thanks and Regards
> Srikar Dronamraju
Srikar Dronamraju Sept. 4, 2018, 10:37 a.m. | #5
* Vincent Guittot <vincent.guittot@linaro.org> [2018-09-04 11:36:26]:

> Hi Srikar,
>
> Le Tuesday 04 Sep 2018 à 01:24:24 (-0700), Srikar Dronamraju a écrit :
> > However, after this change the capacity_orig of each SMT thread will be
> > 1024. For example, an SMT 8 core's capacity_orig would now be 8192.
> >
> > smt_gain was supposed to make a multi-threaded core look slightly more
> > powerful than a single-threaded core. I suspect this sometimes hurts us
>
> Is there a system with both single threaded and multi threaded cores?
> That was the main open point for me (and for Qais too).

I don't know of any systems that have come with both single threaded and
multithreaded cores. However, some user can still offline a few threads in
a core while leaving other cores untouched. I don't really know why
somebody would want to do that. For example, some customer was toying with
SMT 3 mode in an SMT 8 power8 box.

> > when doing load balance between 2 cores, i.e. at the MC or DIE sched
> > domain. Even with 2 threads running on a core, the core might look
> > lightly loaded (2048/8192) and hence might dissuade movement to an idle core.
>
> Then there is the sibling flag at SMT level that normally ensures 1 task per
> core for such a use case.

Agree.

> > I always wonder why arch_scale_cpu_capacity() is called with a NULL
> > sched_domain in scale_rt_capacity(). This way capacity might actually
>
> Probably because until this v4.19-rcX version, the rt scaling was done
> relative to the local cpu capacity:
>   capacity = arch_scale_cpu() * scale_rt_capacity / SCHED_CAPACITY_SCALE
>
> whereas now it directly returns the remaining capacity.
>
> > be more than capacity_orig. I am always under the impression that
> > capacity_orig > capacity. Or am I misunderstanding that?
>
> You are right, there is a bug for SMT and the patch below should fix it.
> Nevertheless, we still have the problem in some other places in the code.
>
> Subject: [PATCH] sched/fair: fix scale_rt_capacity() for SMT
>
> Since commit:
>   commit 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()")
> scale_rt_capacity() returns the remaining capacity and no longer a scale
> factor to apply to cpu_capacity_orig. arch_scale_cpu_capacity() is called
> directly by scale_rt_capacity(), so we must pass it the sched_domain
> argument.
>
> Fixes: 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()")
> Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


-- 
Thanks and Regards
Srikar Dronamraju
Vincent Guittot Sept. 5, 2018, 7:36 a.m. | #6
On Tue, 4 Sep 2018 at 12:37, Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
>
> * Vincent Guittot <vincent.guittot@linaro.org> [2018-09-04 11:36:26]:
>
> > Hi Srikar,
> >
> > Le Tuesday 04 Sep 2018 à 01:24:24 (-0700), Srikar Dronamraju a écrit :
> > > However, after this change the capacity_orig of each SMT thread will be
> > > 1024. For example, an SMT 8 core's capacity_orig would now be 8192.
> > >
> > > smt_gain was supposed to make a multi-threaded core look slightly more
> > > powerful than a single-threaded core. I suspect this sometimes hurts us
> >
> > Is there a system with both single threaded and multi threaded cores?
> > That was the main open point for me (and for Qais too).
>
> I don't know of any systems that have come with both single threaded and
> multithreaded cores. However, some user can still offline a few threads in
> a core while leaving other cores untouched. I don't really know why
> somebody would want to do that. For example, some customer was toying with
> SMT 3 mode in an SMT 8 power8 box.

In this case, it means that we have the same core capacity whatever the
number of CPUs, and a core with SMT 3 will be set with the same compute
capacity as the core with SMT 8. Does that still make sense?

> > > when doing load balance between 2 cores, i.e. at the MC or DIE sched
> > > domain. Even with 2 threads running on a core, the core might look
> > > lightly loaded (2048/8192) and hence might dissuade movement to an idle core.
> >
> > Then there is the sibling flag at SMT level that normally ensures 1 task per
> > core for such a use case.
>
> Agree.
>
> > > I always wonder why arch_scale_cpu_capacity() is called with a NULL
> > > sched_domain in scale_rt_capacity(). This way capacity might actually
> >
> > Probably because until this v4.19-rcX version, the rt scaling was done
> > relative to the local cpu capacity:
> >   capacity = arch_scale_cpu() * scale_rt_capacity / SCHED_CAPACITY_SCALE
> >
> > whereas now it directly returns the remaining capacity.
> >
> > > be more than capacity_orig. I am always under the impression that
> > > capacity_orig > capacity. Or am I misunderstanding that?
> >
> > You are right, there is a bug for SMT and the patch below should fix it.
> > Nevertheless, we still have the problem in some other places in the code.
> >
> > Subject: [PATCH] sched/fair: fix scale_rt_capacity() for SMT
> >
> > Since commit:
> >   commit 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()")
> > scale_rt_capacity() returns the remaining capacity and no longer a scale
> > factor to apply to cpu_capacity_orig. arch_scale_cpu_capacity() is called
> > directly by scale_rt_capacity(), so we must pass it the sched_domain
> > argument.
> >
> > Fixes: 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()")
> > Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>
> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>
> --
> Thanks and Regards
> Srikar Dronamraju
>
Srikar Dronamraju Sept. 5, 2018, 8:50 a.m. | #7
* Vincent Guittot <vincent.guittot@linaro.org> [2018-09-05 09:36:42]:

> > I don't know of any systems that have come with both single threaded and
> > multithreaded cores. However, some user can still offline a few threads in
> > a core while leaving other cores untouched. I don't really know why
> > somebody would want to do that. For example, some customer was toying with
> > SMT 3 mode in an SMT 8 power8 box.
>
> In this case, it means that we have the same core capacity whatever the
> number of CPUs, and a core with SMT 3 will be set with the same compute
> capacity as the core with SMT 8. Does that still make sense?

To me it makes sense, at least from a power 8 perspective, because
SMT 1 > SMT 2 > SMT 4 > SMT 8. So if one core is configured for SMT 2 and
another core is configured for SMT 4, all threads being busy, the
individual threads running on the SMT 2 core will complete more work than
the SMT 4 core's threads.

-- 
Thanks and Regards
Srikar Dronamraju
Vincent Guittot Sept. 5, 2018, 9:11 a.m. | #8
On Wed, 5 Sep 2018 at 10:50, Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
>
> * Vincent Guittot <vincent.guittot@linaro.org> [2018-09-05 09:36:42]:
>
> > > I don't know of any systems that have come with both single threaded and
> > > multithreaded cores. However, some user can still offline a few threads
> > > in a core while leaving other cores untouched. I don't really know why
> > > somebody would want to do that. For example, some customer was toying
> > > with SMT 3 mode in an SMT 8 power8 box.
> >
> > In this case, it means that we have the same core capacity whatever the
> > number of CPUs, and a core with SMT 3 will be set with the same compute
> > capacity as the core with SMT 8. Does that still make sense?
>
> To me it makes sense, at least from a power 8 perspective, because
> SMT 1 > SMT 2 > SMT 4 > SMT 8. So if one core is configured for SMT 2 and
> another core is configured for SMT 4, all threads being busy, the
> individual threads running on the SMT 2 core will complete more work than
> the SMT 4 core's threads.

I agree on individual thread capacity, but at the core group level the
SMT 1 core will have the same capacity as the SMT 8 core group, so load
balance will try to balance the tasks evenly between the 2 cores whereas
core SMT 8 > core SMT 1, isn't it?

> --
> Thanks and Regards
> Srikar Dronamraju
>
Srikar Dronamraju Sept. 5, 2018, 11:14 a.m. | #9
* Vincent Guittot <vincent.guittot@linaro.org> [2018-09-05 11:11:35]:

> On Wed, 5 Sep 2018 at 10:50, Srikar Dronamraju
> <srikar@linux.vnet.ibm.com> wrote:
> >
> > * Vincent Guittot <vincent.guittot@linaro.org> [2018-09-05 09:36:42]:
> >
> > > > I don't know of any systems that have come with both single threaded
> > > > and multithreaded cores. However, some user can still offline a few
> > > > threads in a core while leaving other cores untouched. I don't really
> > > > know why somebody would want to do that. For example, some customer
> > > > was toying with SMT 3 mode in an SMT 8 power8 box.
> > >
> > > In this case, it means that we have the same core capacity whatever the
> > > number of CPUs, and a core with SMT 3 will be set with the same compute
> > > capacity as the core with SMT 8. Does that still make sense?
> >
> > To me it makes sense, at least from a power 8 perspective, because
> > SMT 1 > SMT 2 > SMT 4 > SMT 8. So if one core is configured for SMT 2 and
> > another core is configured for SMT 4, all threads being busy, the
> > individual threads running on the SMT 2 core will complete more work than
> > the SMT 4 core's threads.
>
> I agree on individual thread capacity, but at the core group level the
> SMT 1 core will have the same capacity as the SMT 8 core group, so load
> balance will try to balance the tasks evenly between the 2 cores whereas
> core SMT 8 > core SMT 1, isn't it?

I believe that core capacity, irrespective of the number of threads,
should be similar. We wanted to give a small benefit if the core has
multiple threads, and that was smt_gain. Let's say we have 8 equal SW
threads running on 2 cores, one being SMT 2 and the other being SMT 4;
then 4 threads should be spread to each core, so that we would be fair
to each of the 8 SW threads.

-- 
Thanks and Regards
Srikar Dronamraju
Vincent Guittot Sept. 6, 2018, 9:21 a.m. | #10
On Wed, 5 Sep 2018 at 13:14, Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
>
> * Vincent Guittot <vincent.guittot@linaro.org> [2018-09-05 11:11:35]:
>
> > On Wed, 5 Sep 2018 at 10:50, Srikar Dronamraju
> > <srikar@linux.vnet.ibm.com> wrote:
> > >
> > > * Vincent Guittot <vincent.guittot@linaro.org> [2018-09-05 09:36:42]:
> > >
> > > > > I don't know of any systems that have come with both single threaded
> > > > > and multithreaded cores. However, some user can still offline a few
> > > > > threads in a core while leaving other cores untouched. I don't
> > > > > really know why somebody would want to do that. For example, some
> > > > > customer was toying with SMT 3 mode in an SMT 8 power8 box.
> > > >
> > > > In this case, it means that we have the same core capacity whatever
> > > > the number of CPUs, and a core with SMT 3 will be set with the same
> > > > compute capacity as the core with SMT 8. Does that still make sense?
> > >
> > > To me it makes sense, at least from a power 8 perspective, because
> > > SMT 1 > SMT 2 > SMT 4 > SMT 8. So if one core is configured for SMT 2
> > > and another core is configured for SMT 4, all threads being busy, the
> > > individual threads running on the SMT 2 core will complete more work
> > > than the SMT 4 core's threads.
> >
> > I agree on individual thread capacity, but at the core group level the
> > SMT 1 core will have the same capacity as the SMT 8 core group, so load
> > balance will try to balance the tasks evenly between the 2 cores whereas
> > core SMT 8 > core SMT 1, isn't it?
>
> I believe that core capacity, irrespective of the number of threads,
> should be similar. We wanted to give a small benefit if the core has
> multiple threads, and that was smt_gain. Let's say we have 8 equal SW
> threads running on 2 cores, one being SMT 2 and the other being SMT 4;
> then 4 threads should be spread to each core, so that we would be fair
> to each of the 8 SW threads.

Do you mean that it would be the same with SMT 2 and SMT 8? Evenly
spreading the 8 SW threads between the 2 cores would be better than
2 SW threads on the SMT 2 core and 6 on the SMT 8 core?

> --
> Thanks and Regards
> Srikar Dronamraju
>
Peter Zijlstra Sept. 7, 2018, 12:42 p.m. | #11
On Wed, Aug 29, 2018 at 03:08:38PM +0100, Qais Yousef wrote:
> It was hard to track the history of this and *why* it's needed.

Yeah, my bad..

So at some point I wanted to do dynamic capacity and dynamic smt gain by
using the x86 APERF/MPERF stuff. But it never quite worked and we got
stuck with the remnants.
Qais Yousef Sept. 10, 2018, 11:05 a.m. | #12
On 04/09/18 11:37, Srikar Dronamraju wrote:
> * Vincent Guittot <vincent.guittot@linaro.org> [2018-09-04 11:36:26]:
>
>> Hi Srikar,
>>
>> Le Tuesday 04 Sep 2018 à 01:24:24 (-0700), Srikar Dronamraju a écrit :
>>> However, after this change the capacity_orig of each SMT thread will be
>>> 1024. For example, an SMT 8 core's capacity_orig would now be 8192.
>>>
>>> smt_gain was supposed to make a multi-threaded core look slightly more
>>> powerful than a single-threaded core. I suspect this sometimes hurts us
>> Is there a system with both single threaded and multi threaded cores?
>> That was the main open point for me (and for Qais too).
>>
> I don't know of any systems that have come with both single threaded and
> multithreaded cores. However, some user can still offline a few threads in
> a core while leaving other cores untouched. I don't really know why
> somebody would want to do that. For example, some customer was toying with
> SMT 3 mode in an SMT 8 power8 box.


What was the customer trying to achieve by this? Did it end up being
useful?

If we get mixed SMT 3 and SMT 8 in the system we might have issues, but
again I don't see how this makes sense from a system point of view. I
don't think the power savings will be significant from turning off a few
hardware threads, since the core, which I'd expect to be the main energy
consumer, is still on. From a performance perspective, SMT 3 might have
less contention (hence better 'performance'), but we don't do anything
special to schedule work in this case to take advantage of that. So I
don't think the setup is useful to cater for or encourage.

-- 
Qais Yousef
Qais Yousef Sept. 10, 2018, 11:23 a.m. | #13
On 07/09/18 13:42, Peter Zijlstra wrote:
> On Wed, Aug 29, 2018 at 03:08:38PM +0100, Qais Yousef wrote:
>> It was hard to track the history of this and *why* it's needed.
> Yeah, my bad..
>
> So at some point I wanted to do dynamic capacity and dynamic smt gain by
> using the x86 APERF/MPERF stuff. But it never quite worked and we got
> stuck with the remnants.



OK thanks for the info. I think that was 10 years ago? With all the code 
moves and renaming git log -L was getting more difficult to use as I dug 
deeper in history :p

Is this work officially dead then? If yes, it ended up not being useful 
or just never got around finishing it?

Thanks

-- 
Qais Yousef

Patch

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 2634774..212792b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -89,7 +89,6 @@  struct sched_domain {
 	unsigned int newidle_idx;
 	unsigned int wake_idx;
 	unsigned int forkexec_idx;
-	unsigned int smt_gain;
 
 	int nohz_idle;			/* NOHZ IDLE status */
 	int flags;			/* See SD_* */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4a2e8ca..b1715b8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1758,9 +1758,6 @@  unsigned long arch_scale_freq_capacity(int cpu)
 static __always_inline
 unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 {
-	if (sd && (sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
-		return sd->smt_gain / sd->span_weight;
-
 	return SCHED_CAPACITY_SCALE;
 }
 #endif
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 56a0fed..069c924 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1129,7 +1129,6 @@  sd_init(struct sched_domain_topology_level *tl,
 
 		.last_balance		= jiffies,
 		.balance_interval	= sd_weight,
-		.smt_gain		= 0,
 		.max_newidle_lb_cost	= 0,
 		.next_decay_max_lb_cost	= jiffies,
 		.child			= child,
@@ -1155,7 +1154,6 @@  sd_init(struct sched_domain_topology_level *tl,
 	if (sd->flags & SD_SHARE_CPUCAPACITY) {
 		sd->flags |= SD_PREFER_SIBLING;
 		sd->imbalance_pct = 110;
-		sd->smt_gain = 1178; /* ~15% */
 
 	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
 		sd->flags |= SD_PREFER_SIBLING;