mbox series

[v3,00/10] arm, arm64, cpufreq: frequency- and cpu-invariant accounting support for task scheduler

Message ID 20170727193312.9849-1-dietmar.eggemann@arm.com
Headers show
Series arm, arm64, cpufreq: frequency- and cpu-invariant accounting support for task scheduler | expand

Message

Dietmar Eggemann July 27, 2017, 7:33 p.m. UTC
For a more accurate (i.e. frequency- and cpu-invariant) accounting
the task scheduler needs a frequency-scaling and on a heterogeneous
system a cpu-scaling correction factor.

This patch-set implements a Frequency Invariance Engine (FIE)
based on the ratio of current frequency and maximum supported frequency
(topology_get_freq_scale()) in the arch topology driver (arm, arm64) to
provide such a frequency-scaling correction factor.
This is a solution to get frequency-invariant accounting support for
platforms without hardware-based performance tracking.

The Cpu Invariance Engine (CIE) (topology_get_cpu_scale()) providing a
cpu-scaling correction factor was already introduced by the "Fix issues
and factorize arm/arm64 capacity information code" patch-set [1] and is
currently part of v4.13-rc2.

This patch-set also enables the frequency- and cpu-invariant accounting
support. Enabling here means to associate (wire) the task scheduler
function name arch_scale_freq_capacity and arch_scale_cpu_capacity with
the FIE and CIE function names from drivers/base/arch_topology.c. This
replaces the task scheduler's default FIE and CIE in
kernel/sched/sched.h.

v2: review results:

During the v2 [2] review we agreed that cpufreq core is not the right
place to drive this kind of FIE. In case of a future fast-switching
arm/arm64 cpufreq driver, a return value other than
CPUFREQ_ENTRY_INVALID of the '->fast_switch' driver callback does not
indicate that the new frequency has been set. So calling the FIE's
setter function (arch_set_freq_scale()) in cpufreq_driver_fast_switch()
would not be correct.

Instead this version of the patch-set assumes that the FIE setter
function is provided by the cpufreq driver (slow- or fast-switching).

Slow-switching driver:

For a slow-switching driver the FIE setter function can be called in its
'->target_index' driver callback, after a non-error return value from
the (blocking) firmware call has been received.
Two slow-switching cpufreq drivers (arm-big-little (exclusively used in
arm/arm64) and cpufreq-dt) have been changed accordingly.

Fast-switching driver:

For a fast-switching driver the ARM System Control and Management
Interface (SCMI) document [3] (4.5 Performance domain management
protocol) provides an example for such a driver how to communicate with
the firmware.
It defines an optional performance domain related notification which
indicates that the requested frequency has been changed. So the FIE
setter function could be implemented here.
In case the firmware does not implement this notification or the driver
makes no use of it, the FIE setter would have to be placed in the
'->fast_switch' driver callback assuming that the firmware has set the
frequency.

Weak arch_set_freq_scale() interface:

Since a cpufreq driver (e.g. cpufreq-dt) can be build for different
arch's a default (empty) implementation of the FIE setter function in
cpufreq core is still needed for those arch's which do not implement it.

FIE/CIE inlining:

The FIE (topology_get_freq_scale()) and CIE (topology_get_cpu_scale())
getter functions have been coded as static inline functions so they can
be inlined into the task scheduler fast path (e.g.
__update_load_avg_se()).

+------------------------------+       +------------------------------+
|                              |       |                              |
| cpufreq:                     |       | arch:                        |
|                              |       |                              |
| weak arch_set_freq_scale() {}|       |                              |
|                              |       |                              |
+------------------------------+       |                              |
                                       |                              |
+------------------------------+       |                              |
|                              |       |                              |
| cpufreq driver:              |       |                              |
|                              |       |                              |
|                            +-----------> arch_set_freq_scale()      |
|                              |       |                              |
+------------------------------+       |   topology_set_cpu_scale()   |
                                       |                              |
+------------------------------+       |                              |
|                              |       |                              |
| task scheduler:              |       |                              |
|                              |       |                              |
| arch_scale_freq_capacity() +-----------> topology_get_freq_scale()  |
|                              |       |                              |
| arch_scale_cpu_capacity()  +-----------> topology_get_cpu_scale()   |
|                              |       |                              |
+------------------------------+       +------------------------------+

v1 review results:

During the v1 [4] review Viresh Kumar pointed out that such a FIE based
on cpufreq transition notifier will not work with future cpufreq
policies supporting fast frequency switching.
To include support for fast frequency switching policies the FIE
implementation has been changed. Whenever there is a frequency change
cpufreq now calls the arch specific function arch_set_freq_scale() which
has to be implemented by the architecture. In case the arch does not
specify this function FIE support is compiled out of cpufreq.
The advantage is that this would support fast frequency switching since
it does not rely on cpufreq transition (or policy) notifier anymore.

Patch high level description:

   [   01/10] Fix to free cpumask cpus_to_visit
   [   02/10] Default (empty, weak) arch_set_freq_scale() implementation
   [03,04/10] Call arch_set_freq_scale() from two cpufreq drivers
              (arm_big_little and cpufreq-dt)
   [   05/10] FIE
   [   06/10] Allow CIE inlining
   [07,08/10] Enable frequency- and cpu-invariant accounting on arm
   [09,10/10] Enable frequency- and cpu-invariant accounting on arm64

Changes v2->v3:

   - Rebase on top of v4.13-rc2
   - Fix 'notifier unregister'-'free cpumask' order in
     parsing_done_workfn() in [01/10]
   - Call arch_set_freq_scale() from cpufreq drivers (arm-big-little
     and cpufreq-dt) [02-04/10]
   - Allow inlining of topology_get_freq_scale() and    
     topology_get_cpu_scale() into task scheduler fastpath [05,06/10]

Changes v1->v2:
   - Rebase on top of next-20170630
   - Add fixup patch to free cpumask cpus_to_visit [01/10]
   - Propose solution to support fast frequency switching [02-04,07/10]
   - Add patch to allow CIE and FIE inlining [10/10]

The patch-set is based on top of v4.13-rc2 and is also available from:

   git://linux-arm.org/linux-de.git upstream/freq_and_cpu_inv_v3

Testing:

The patch-set has been tested on TC2 (arm, arm-big-little driver), JUNO
(arm64, arm-big-little driver) and Hikey620 (arm64, cpufreq-dt driver)
by running a ramp-up rt-app task pinned to a cpu with the ondemand
cpufreq governor and checking the load-tracking signals of this task.

Driver module loading has been tested on TC2, JUNO and Hikey620 as well.

[1] https://marc.info/?l=linux-kernel&m=149625018223002&w=2
[2] https://marc.info/?l=linux-pm&m=149933474313566&w=2
[3] http://arminfo.emea.arm.com/help/topic/com.arm.doc.den0056a/DEN0056A_System_Control_and_Management_Interface.pdf
[4] https://marc.info/?l=linux-kernel&m=149690865010019&w=2

Dietmar Eggemann (10):
  drivers base/arch_topology: free cpumask cpus_to_visit
  cpufreq: provide default frequency-invariance setter function
  cpufreq: arm_big_little: invoke frequency-invariance setter function
  cpufreq: dt: invoke frequency-invariance setter function
  drivers base/arch_topology: provide frequency-invariant accounting
    support
  drivers base/arch_topology: allow inlining cpu-invariant accounting
    support
  arm: wire frequency-invariant accounting support up to the task
    scheduler
  arm: wire cpu-invariant accounting support up to the task scheduler
  arm64: wire frequency-invariant accounting support up to the task
    scheduler
  arm64: wire cpu-invariant accounting support up to the task scheduler

 arch/arm/include/asm/topology.h   |  8 ++++++++
 arch/arm64/include/asm/topology.h |  8 ++++++++
 drivers/base/arch_topology.c      | 29 +++++++++++++++++++++++------
 drivers/cpufreq/arm_big_little.c  | 10 +++++++++-
 drivers/cpufreq/cpufreq-dt.c      | 12 ++++++++++--
 drivers/cpufreq/cpufreq.c         | 11 +++++++++++
 include/linux/arch_topology.h     | 18 +++++++++++++++++-
 include/linux/cpufreq.h           |  3 +++
 8 files changed, 89 insertions(+), 10 deletions(-)

-- 
2.11.0

Comments

Viresh Kumar July 28, 2017, 7:21 a.m. UTC | #1
On 27-07-17, 20:33, Dietmar Eggemann wrote:
> Free cpumask cpus_to_visit in case registering

> init_cpu_capacity_notifier has failed or the parsing of the cpu

> capacity-dmips-mhz property is done. The cpumask cpus_to_visit is

> only used inside the notifier call init_cpu_capacity_callback.

> 

> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

> Cc: Juri Lelli <juri.lelli@arm.com>

> Reported-by: Vincent Guittot <vincent.guittot@linaro.org>

> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

> Acked-by: Vincent Guittot <vincent.guittot@linaro.org>

> Tested-by: Juri Lelli <juri.lelli@arm.com>

> Reviewed-by: Juri Lelli <juri.lelli@arm.com>

> ---

>  drivers/base/arch_topology.c | 12 ++++++++++--

>  1 file changed, 10 insertions(+), 2 deletions(-)


Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


-- 
viresh
Viresh Kumar July 28, 2017, 7:30 a.m. UTC | #2
On 27-07-17, 20:33, Dietmar Eggemann wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

> index 9bf97a366029..04e2f7e4964e 100644

> --- a/drivers/cpufreq/cpufreq.c

> +++ b/drivers/cpufreq/cpufreq.c

> @@ -2404,6 +2404,17 @@ int cpufreq_boost_enabled(void)

>  EXPORT_SYMBOL_GPL(cpufreq_boost_enabled);

>  

>  /*********************************************************************

> + *               FREQUENCY INVARIANT ACCOUNTING SUPPORT              *

> + *********************************************************************/


We don't need another of these fancy headers :)

Just add below routine somewhere at the top, maybe before
cpufreq_generic_init().

> +

> +__weak void arch_set_freq_scale(struct cpumask *cpus,

> +				unsigned long cur_freq,

> +				unsigned long max_freq)

> +{

> +}

> +EXPORT_SYMBOL_GPL(arch_set_freq_scale);

> +

> +/*********************************************************************

>   *               REGISTER / UNREGISTER CPUFREQ DRIVER                *

>   *********************************************************************/

>  static enum cpuhp_state hp_online;

> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h

> index f10a9b3761cd..e38acc1a4d47 100644

> --- a/include/linux/cpufreq.h

> +++ b/include/linux/cpufreq.h

> @@ -899,6 +899,9 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)

>  

>  extern unsigned int arch_freq_get_on_cpu(int cpu);

>  

> +extern void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,

> +				unsigned long max_freq);

> +

>  /* the following are really really optional */

>  extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs;

>  extern struct freq_attr cpufreq_freq_attr_scaling_boost_freqs;

> -- 

> 2.11.0


-- 
viresh
Viresh Kumar July 28, 2017, 8:25 a.m. UTC | #3
On 27-07-17, 20:33, Dietmar Eggemann wrote:
> Call the frequency-invariance setter function arch_set_freq_scale()

> if the new frequency has been successfully set which is indicated by

> dev_pm_opp_set_rate() returning 0.

> 

> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>

> Cc: Viresh Kumar <viresh.kumar@linaro.org>

> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

> ---

>  drivers/cpufreq/cpufreq-dt.c | 12 ++++++++++--

>  1 file changed, 10 insertions(+), 2 deletions(-)

> 

> diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c

> index fef3c2160691..cbac8a7dbc50 100644

> --- a/drivers/cpufreq/cpufreq-dt.c

> +++ b/drivers/cpufreq/cpufreq-dt.c

> @@ -43,9 +43,17 @@ static struct freq_attr *cpufreq_dt_attr[] = {

>  static int set_target(struct cpufreq_policy *policy, unsigned int index)

>  {

>  	struct private_data *priv = policy->driver_data;

> +	unsigned long freq = policy->freq_table[index].frequency;

> +	int ret;

> +

> +	ret = dev_pm_opp_set_rate(priv->cpu_dev, freq * 1000);

>  

> -	return dev_pm_opp_set_rate(priv->cpu_dev,

> -				   policy->freq_table[index].frequency * 1000);

> +	if (!ret) {

> +		arch_set_freq_scale(policy->related_cpus, freq,

> +				    policy->cpuinfo.max_freq);

> +	}

> +

> +	return ret;

>  }


Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


-- 
viresh
Viresh Kumar July 28, 2017, 8:26 a.m. UTC | #4
On 27-07-17, 20:33, Dietmar Eggemann wrote:
> Implements the arch-specific (arm and arm64) frequency-invariance setter

> function arch_set_freq_scale() which provides the following frequency

> scaling factor:

> 

>   current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_supported_freq(cpu)

> 

> One possible consumer of the frequency-invariance getter function

> topology_get_freq_scale() is the Per-Entity Load Tracking (PELT)

> mechanism of the task scheduler.

> 

> Allow inlining of topology_get_freq_scale() into the task scheduler

> fast path (e.g. __update_load_avg_se()) by coding it as a static inline

> function in the arch topology header file.

> 

> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

> Cc: Juri Lelli <juri.lelli@arm.com>

> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

> ---

>  drivers/base/arch_topology.c  | 14 ++++++++++++++

>  include/linux/arch_topology.h | 10 ++++++++++

>  2 files changed, 24 insertions(+)

> 

> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c

> index 562e0c93ae52..af9ab98a233e 100644

> --- a/drivers/base/arch_topology.c

> +++ b/drivers/base/arch_topology.c

> @@ -22,6 +22,20 @@

>  #include <linux/string.h>

>  #include <linux/sched/topology.h>

>  

> +DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;

> +

> +void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,

> +			 unsigned long max_freq)

> +{

> +	unsigned long scale;

> +	int i;

> +

> +	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;

> +

> +	for_each_cpu(i, cpus)

> +		per_cpu(freq_scale, i) = scale;

> +}

> +

>  static DEFINE_MUTEX(cpu_scale_mutex);

>  static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;

>  

> diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h

> index 9af3c174c03a..3e3c2657c9a1 100644

> --- a/include/linux/arch_topology.h

> +++ b/include/linux/arch_topology.h

> @@ -4,6 +4,8 @@

>  #ifndef _LINUX_ARCH_TOPOLOGY_H_

>  #define _LINUX_ARCH_TOPOLOGY_H_

>  

> +#include <linux/percpu.h>

> +

>  void topology_normalize_cpu_scale(void);

>  

>  struct device_node;

> @@ -14,4 +16,12 @@ unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu);

>  

>  void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity);

>  

> +DECLARE_PER_CPU(unsigned long, freq_scale);

> +

> +static inline

> +unsigned long topology_get_freq_scale(struct sched_domain *sd, int cpu)

> +{

> +	return per_cpu(freq_scale, cpu);

> +}

> +

>  #endif /* _LINUX_ARCH_TOPOLOGY_H_ */


Acked-by: Viresh Kumar <viresh.kumar@linaro.org>


-- 
viresh
Dietmar Eggemann Aug. 2, 2017, 12:55 p.m. UTC | #5
Hi Viresh,

On 28/07/17 08:30, Viresh Kumar wrote:
> On 27-07-17, 20:33, Dietmar Eggemann wrote:

>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c

>> index 9bf97a366029..04e2f7e4964e 100644

>> --- a/drivers/cpufreq/cpufreq.c

>> +++ b/drivers/cpufreq/cpufreq.c

>> @@ -2404,6 +2404,17 @@ int cpufreq_boost_enabled(void)

>>  EXPORT_SYMBOL_GPL(cpufreq_boost_enabled);

>>  

>>  /*********************************************************************

>> + *               FREQUENCY INVARIANT ACCOUNTING SUPPORT              *

>> + *********************************************************************/

> 

> We don't need another of these fancy headers :)

> 

> Just add below routine somewhere at the top, maybe before

> cpufreq_generic_init().


Yes, sure. Will address this in the upcoming v4.

Thanks the review!

-- Dietmar
Dietmar Eggemann Aug. 2, 2017, 1:04 p.m. UTC | #6
Hi Russell,

On 27/07/17 20:33, Dietmar Eggemann wrote:

[...]

> Patch high level description:

> 

>    [   01/10] Fix to free cpumask cpus_to_visit

>    [   02/10] Default (empty, weak) arch_set_freq_scale() implementation

>    [03,04/10] Call arch_set_freq_scale() from two cpufreq drivers

>               (arm_big_little and cpufreq-dt)

>    [   05/10] FIE

>    [   06/10] Allow CIE inlining

>    [07,08/10] Enable frequency- and cpu-invariant accounting on arm


Would you have time to review patch 7 and 8. I already got an Acked-by
from the ARM64 maintainer for the arm64 related counterpart functionality.

Thanks,

-- Dietmar

[...]
Dietmar Eggemann Aug. 4, 2017, 5:38 p.m. UTC | #7
Hi Rafael,

On 27/07/17 20:33, Dietmar Eggemann wrote:

[...]

> Patch high level description:

> 

>    [   01/10] Fix to free cpumask cpus_to_visit

>    [   02/10] Default (empty, weak) arch_set_freq_scale() implementation

>    [03,04/10] Call arch_set_freq_scale() from two cpufreq drivers

>               (arm_big_little and cpufreq-dt)


Could you please have a look at patches 2-4 which touch the cpufreq
subsystem.

Thanks,

-- Dietmar

[....]
Russell King (Oracle) Aug. 14, 2017, 1:16 p.m. UTC | #8
On Thu, Jul 27, 2017 at 08:33:09PM +0100, Dietmar Eggemann wrote:
> Commit dfbca41f3479 ("sched: Optimize freq invariant accounting")

> changed the wiring which now has to be done by associating

> arch_scale_freq_capacity with the actual implementation provided

> by the architecture.

> 

> Define arch_scale_freq_capacity to use the arch_topology "driver"

> function topology_get_freq_scale() for the task scheduler's

> frequency-invariant accounting instead of the default

> arch_scale_freq_capacity() in kernel/sched/sched.h.

> 

> Cc: Russell King <linux@arm.linux.org.uk>

> Cc: Juri Lelli <juri.lelli@arm.com>

> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

> Acked-by: Vincent Guittot <vincent.guittot@linaro.org>

> Tested-by: Juri Lelli <juri.lelli@arm.com>

> Reviewed-by: Juri Lelli <juri.lelli@arm.com>


Acked-by: Russell King <rmk+kernel@armlinux.org.uk>


-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
Russell King (Oracle) Aug. 14, 2017, 1:16 p.m. UTC | #9
On Thu, Jul 27, 2017 at 08:33:10PM +0100, Dietmar Eggemann wrote:
> Commit 8cd5601c5060 ("sched/fair: Convert arch_scale_cpu_capacity() from

> weak function to #define") changed the wiring which now has to be done

> by associating arch_scale_cpu_capacity with the actual implementation

> provided by the architecture.

> 

> Define arch_scale_cpu_capacity to use the arch_topology "driver"

> function topology_get_cpu_scale() for the task scheduler's cpu-invariant

> accounting instead of the default arch_scale_cpu_capacity() in

> kernel/sched/sched.h.

> 

> Cc: Russell King <linux@arm.linux.org.uk>

> Cc: Juri Lelli <juri.lelli@arm.com>

> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

> Acked-by: Vincent Guittot <vincent.guittot@linaro.org>

> Tested-by: Juri Lelli <juri.lelli@arm.com>

> Reviewed-by: Juri Lelli <juri.lelli@arm.com>


Acked-by: Russell King <rmk+kernel@armlinux.org.uk>


-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up