[V3,1/2] topology: Allow multiple entities to provide sched_freq_tick() callback

Message ID d398729676f3d2b0d2ab024a2c9ea6e9ee1d0dca.1611829953.git.viresh.kumar@linaro.org
State New
Headers show
Series
  • cpufreq: cppc: Add support for frequency invariance
Related show

Commit Message

Viresh Kumar Jan. 28, 2021, 10:48 a.m.
This patch attempts to make it generic enough so other parts of the
kernel can also provide their own implementation of scale_freq_tick()
callback, which is called by the scheduler periodically to update the
per-cpu freq_scale variable.

The implementations now need to provide struct scale_freq_data for the
CPUs for which they have hardware counters available, and a callback
gets registered for each possible CPU in a per-cpu variable.

The arch specific (or ARM AMU) counters are updated to adapt to this and
they take the highest priority if they are available, i.e. they will be
used instead of CPPC based counters for example.

Note that this also defines SCALE_FREQ_SOURCE_CPUFREQ but doesn't use it
and it is added to show that cpufreq is also acts as source of
information for FIE and will be used by default if no other counters are
supported for a platform.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

---
 arch/arm64/include/asm/topology.h | 10 +---
 arch/arm64/kernel/topology.c      | 89 ++++++++++++++-----------------
 drivers/base/arch_topology.c      | 56 +++++++++++++++++--
 include/linux/arch_topology.h     | 14 ++++-
 4 files changed, 105 insertions(+), 64 deletions(-)

-- 
2.25.0.rc1.19.g042ed3e048af

Comments

Ionela Voinescu Feb. 3, 2021, 11:45 a.m. | #1
Hi Viresh,

Many thanks for the renaming of functions/variables/enums.

I've cropped all the code that looks good to me, and I kept some
portions of interest.

On Thursday 28 Jan 2021 at 16:18:55 (+0530), Viresh Kumar wrote:
> This patch attempts to make it generic enough so other parts of the

> kernel can also provide their own implementation of scale_freq_tick()

> callback, which is called by the scheduler periodically to update the

> per-cpu freq_scale variable.

[..]
>  static void amu_fie_setup(const struct cpumask *cpus)

>  {

> @@ -227,7 +263,7 @@ static void amu_fie_setup(const struct cpumask *cpus)

>  	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))

>  		return;

>  

> -	static_branch_enable(&amu_fie_key);

> +	topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus);

>  

>  	pr_debug("CPUs[%*pbl]: counters will be used for FIE.",

>  		 cpumask_pr_args(cpus));

> @@ -283,53 +319,6 @@ static int __init init_amu_fie(void)

>  }

>  core_initcall(init_amu_fie);

[..]
> +void topology_set_scale_freq_source(struct scale_freq_data *data,

> +				    const struct cpumask *cpus)

> +{

> +	struct scale_freq_data *sfd;

> +	int cpu;

> +

> +	for_each_cpu(cpu, cpus) {

> +		sfd = per_cpu(sft_data, cpu);

> +

> +		/* Use ARCH provided counters whenever possible */

> +		if (!sfd || sfd->source != SCALE_FREQ_SOURCE_ARCH) {

> +			per_cpu(sft_data, cpu) = data;

> +			cpumask_set_cpu(cpu, &scale_freq_counters_mask);

> +		}

> +	}

>  }

> +EXPORT_SYMBOL_GPL(topology_set_scale_freq_source);

[..]

I have one single comment left for this patch, and apologies in advance
for the long story.

In general, for frequency invariance, we're interested in whether the full
system is frequency invariant as well, for two reasons:
 - We want to be able to either set a scale factor for all CPUs or none
   at all.
 - If at some point during the boot process the system invariance status
   changes, we want to be able to inform dependents: schedutil and EAS.

This management is currently done on amu_fie_setup(), because before
these patches we only had two sources for frequency invariance: cpufreq
and AMU counters. But that's not enough after these two patches, from
both functional and code design point of view.

I have to mention that your code will work as it is for now, but only
because CPPC is the new other source of counters, and for CPPC we also
have cpufreq invariance available. But this only hides what can become a
problem later: if in the future we won't have cpufreq invariance for
CPPC or if another provider of counters is added and used in a system
without cpufreq invariance, the late initialization of these counters
will either break schedutil/scheduler if not all CPUs support those
counters, or keep EAS disabled, even if all CPUs support these counters,
and frequency invariance is later enabled.

Therefore, I think system level invariance management (checks and
call to rebuild_sched_domains_energy()) also needs to move from arm64
code to arch_topology code.

Thanks,
Ionela.
Viresh Kumar Feb. 5, 2021, 9:14 a.m. | #2
On 03-02-21, 11:45, Ionela Voinescu wrote:
> Therefore, I think system level invariance management (checks and

> call to rebuild_sched_domains_energy()) also needs to move from arm64

> code to arch_topology code.


Here is the 3rd patch of this series then :)

From: Viresh Kumar <viresh.kumar@linaro.org>

Date: Fri, 5 Feb 2021 13:31:53 +0530
Subject: [PATCH] drivers: arch_topology: rebuild sched domains on invariance
 change

We already do this for the arm64, move it to arch_topology.c as we
manage all sched_freq_tick sources here now.

Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

---
 arch/arm64/kernel/topology.c | 16 ----------------
 drivers/base/arch_topology.c | 22 ++++++++++++++++++++++
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1e47dfd465f8..47fca7376c93 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = {
 
 static void amu_fie_setup(const struct cpumask *cpus)
 {
-	bool invariant;
 	int cpu;
 
 	/* We are already set since the last insmod of cpufreq driver */
@@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)
 
 	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
 
-	invariant = topology_scale_freq_invariant();
-
-	/* We aren't fully invariant yet */
-	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))
-		return;
-
 	topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus);
 
 	pr_debug("CPUs[%*pbl]: counters will be used for FIE.",
 		 cpumask_pr_args(cpus));
-
-	/*
-	 * Task scheduler behavior depends on frequency invariance support,
-	 * either cpufreq or counter driven. If the support status changes as
-	 * a result of counter initialisation and use, retrigger the build of
-	 * scheduling domains to ensure the information is propagated properly.
-	 */
-	if (!invariant)
-		rebuild_sched_domains_energy();
 }
 
 static int init_amu_fie_callback(struct notifier_block *nb, unsigned long val,
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 20b511949cd8..3631877f4440 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -23,6 +23,7 @@
 
 static DEFINE_PER_CPU(struct scale_freq_data *, sft_data);
 static struct cpumask scale_freq_counters_mask;
+static bool scale_freq_invariant;
 
 static bool supports_scale_freq_counters(const struct cpumask *cpus)
 {
@@ -35,6 +36,23 @@ bool topology_scale_freq_invariant(void)
 	       supports_scale_freq_counters(cpu_online_mask);
 }
 
+static void update_scale_freq_invariant(bool status)
+{
+	if (scale_freq_invariant == status)
+		return;
+
+	/*
+	 * Task scheduler behavior depends on frequency invariance support,
+	 * either cpufreq or counter driven. If the support status changes as
+	 * a result of counter initialisation and use, retrigger the build of
+	 * scheduling domains to ensure the information is propagated properly.
+	 */
+	if (topology_scale_freq_invariant() == status) {
+		scale_freq_invariant = status;
+		rebuild_sched_domains_energy();
+	}
+}
+
 void topology_set_scale_freq_source(struct scale_freq_data *data,
 				    const struct cpumask *cpus)
 {
@@ -50,6 +68,8 @@ void topology_set_scale_freq_source(struct scale_freq_data *data,
 			cpumask_set_cpu(cpu, &scale_freq_counters_mask);
 		}
 	}
+
+	update_scale_freq_invariant(true);
 }
 EXPORT_SYMBOL_GPL(topology_set_scale_freq_source);
 
@@ -67,6 +87,8 @@ void topology_clear_scale_freq_source(enum scale_freq_source source,
 			cpumask_clear_cpu(cpu, &scale_freq_counters_mask);
 		}
 	}
+
+	update_scale_freq_invariant(false);
 }
 EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source);
 
-- 
2.25.0.rc1.19.g042ed3e048af

-- 
viresh
Ionela Voinescu Feb. 17, 2021, 12:24 a.m. | #3
Hey,

On Friday 05 Feb 2021 at 14:44:24 (+0530), Viresh Kumar wrote:
> On 03-02-21, 11:45, Ionela Voinescu wrote:

> > Therefore, I think system level invariance management (checks and

> > call to rebuild_sched_domains_energy()) also needs to move from arm64

> > code to arch_topology code.

> 

> Here is the 3rd patch of this series then :)

> 


I think it could be merged in patch 1/2 as it's part of enabling the use
of multiple sources of information for FIE. Up to you!

> From: Viresh Kumar <viresh.kumar@linaro.org>

> Date: Fri, 5 Feb 2021 13:31:53 +0530

> Subject: [PATCH] drivers: arch_topology: rebuild sched domains on invariance

>  change

> 

> We already do this for the arm64, move it to arch_topology.c as we

> manage all sched_freq_tick sources here now.

> 

> Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

> ---

>  arch/arm64/kernel/topology.c | 16 ----------------

>  drivers/base/arch_topology.c | 22 ++++++++++++++++++++++

>  2 files changed, 22 insertions(+), 16 deletions(-)

> 

> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c

> index 1e47dfd465f8..47fca7376c93 100644

> --- a/arch/arm64/kernel/topology.c

> +++ b/arch/arm64/kernel/topology.c

> @@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = {

>  

>  static void amu_fie_setup(const struct cpumask *cpus)

>  {

> -	bool invariant;

>  	int cpu;

>  

>  	/* We are already set since the last insmod of cpufreq driver */

> @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)

>  

>  	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);

>  

> -	invariant = topology_scale_freq_invariant();

> -

> -	/* We aren't fully invariant yet */

> -	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))

> -		return;

> -


You still need these checks, otherwise you could end up with only part
of the CPUs setting a scale factor, when only part of the CPUs support
AMUs and there is no cpufreq support for FIE.

>  	topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus);

>  

>  	pr_debug("CPUs[%*pbl]: counters will be used for FIE.",

>  		 cpumask_pr_args(cpus));

> -

> -	/*

> -	 * Task scheduler behavior depends on frequency invariance support,

> -	 * either cpufreq or counter driven. If the support status changes as

> -	 * a result of counter initialisation and use, retrigger the build of

> -	 * scheduling domains to ensure the information is propagated properly.

> -	 */

> -	if (!invariant)

> -		rebuild_sched_domains_energy();

>  }

>  

>  static int init_amu_fie_callback(struct notifier_block *nb, unsigned long val,

> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c

> index 20b511949cd8..3631877f4440 100644

> --- a/drivers/base/arch_topology.c

> +++ b/drivers/base/arch_topology.c

> @@ -23,6 +23,7 @@

>  

>  static DEFINE_PER_CPU(struct scale_freq_data *, sft_data);

>  static struct cpumask scale_freq_counters_mask;

> +static bool scale_freq_invariant;

>  

>  static bool supports_scale_freq_counters(const struct cpumask *cpus)

>  {

> @@ -35,6 +36,23 @@ bool topology_scale_freq_invariant(void)

>  	       supports_scale_freq_counters(cpu_online_mask);

>  }

>  

> +static void update_scale_freq_invariant(bool status)

> +{

> +	if (scale_freq_invariant == status)

> +		return;

> +

> +	/*

> +	 * Task scheduler behavior depends on frequency invariance support,

> +	 * either cpufreq or counter driven. If the support status changes as

> +	 * a result of counter initialisation and use, retrigger the build of

> +	 * scheduling domains to ensure the information is propagated properly.

> +	 */

> +	if (topology_scale_freq_invariant() == status) {

> +		scale_freq_invariant = status;

> +		rebuild_sched_domains_energy();

> +	}

> +}

> +

>  void topology_set_scale_freq_source(struct scale_freq_data *data,

>  				    const struct cpumask *cpus)

>  {

> @@ -50,6 +68,8 @@ void topology_set_scale_freq_source(struct scale_freq_data *data,

>  			cpumask_set_cpu(cpu, &scale_freq_counters_mask);

>  		}

>  	}


Small(ish) optimisation at the beginning of this function:

    if (cpumask_empty(&scale_freq_counters_mask))
        scale_freq_invariant = topology_scale_freq_invariant();

This will save you a call to rebuild_sched_domains_energy(), which is
quite expensive, when cpufreq supports FIE and we also have counters.


After comments addressed,

Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>

Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>


..for 1/2 and this addition.


Thanks,
Ionela.

> +

> +	update_scale_freq_invariant(true);

>  }

>  EXPORT_SYMBOL_GPL(topology_set_scale_freq_source);

>  

> @@ -67,6 +87,8 @@ void topology_clear_scale_freq_source(enum scale_freq_source source,

>  			cpumask_clear_cpu(cpu, &scale_freq_counters_mask);

>  		}

>  	}

> +

> +	update_scale_freq_invariant(false);

>  }

>  EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source);

>  

> -- 

> 2.25.0.rc1.19.g042ed3e048af

> 

> -- 

> viresh
Viresh Kumar Feb. 17, 2021, 4:25 a.m. | #4
On 17-02-21, 00:24, Ionela Voinescu wrote:
> I think it could be merged in patch 1/2 as it's part of enabling the use

> of multiple sources of information for FIE. Up to you!


Sure.

> >  static void amu_fie_setup(const struct cpumask *cpus)

> >  {

> > -	bool invariant;

> >  	int cpu;

> >  

> >  	/* We are already set since the last insmod of cpufreq driver */

> > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)

> >  

> >  	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);

> >  

> > -	invariant = topology_scale_freq_invariant();

> > -

> > -	/* We aren't fully invariant yet */

> > -	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))

> > -		return;

> > -

> 

> You still need these checks, otherwise you could end up with only part

> of the CPUs setting a scale factor, when only part of the CPUs support

> AMUs and there is no cpufreq support for FIE.


Both supports_scale_freq_counters() and topology_scale_freq_invariant() take
care of this now and they will keep reporting the system as invariant until the
time all the CPUs have counters (in absence of cpufreq).

The topology_set_scale_freq_source() API is supposed to be called multiple
times, probably once for each policy and so I don't see a need of these checks
anymore.

> Small(ish) optimisation at the beginning of this function:

> 

>     if (cpumask_empty(&scale_freq_counters_mask))

>         scale_freq_invariant = topology_scale_freq_invariant();

> 

> This will save you a call to rebuild_sched_domains_energy(), which is

> quite expensive, when cpufreq supports FIE and we also have counters.


Good Point.
 
> After comments addressed,

> 

> Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>


Thanks.

> Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>


Just out of curiosity, what exactly did you test and what was the setup ? :)

-- 
viresh
Ionela Voinescu Feb. 17, 2021, 11:30 a.m. | #5
Hi,

Replying this first as it's going to be relevant below:

> Just out of curiosity, what exactly did you test and what was the setup ? :)


I tested it on:

 - Juno R0 (CPUs [0, 3-5] are littles, CPUs [1-2] are bigs)
   + PMUs faking AMUs
   + userspace/schedutil +
   + cpufreq-FIE/!cpufreq-FIE
   + DT

This testing did not yet cover patch 2/2.

My checklist shows:
 - system invariance status correct - passed
 - scale factor correct (userspace cpufreq governor) - passed
 - arch_set_freq_scale bypassed - passed
 - partial "AMUs" support - failed (see below)
 - EAS enabling - passed

I don't have an automated process for this as many test cases involve
kernel source changes. In time I will automate all of these and
possibly cover all scenarios with FVP (fast models) testing, but for
now human error is possible :).

On Wednesday 17 Feb 2021 at 09:55:58 (+0530), Viresh Kumar wrote:
> On 17-02-21, 00:24, Ionela Voinescu wrote:

> > I think it could be merged in patch 1/2 as it's part of enabling the use

> > of multiple sources of information for FIE. Up to you!

> 

> Sure.

> 

> > >  static void amu_fie_setup(const struct cpumask *cpus)

> > >  {

> > > -	bool invariant;

> > >  	int cpu;

> > >  

> > >  	/* We are already set since the last insmod of cpufreq driver */

> > > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)

> > >  

> > >  	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);

> > >  

> > > -	invariant = topology_scale_freq_invariant();

> > > -

> > > -	/* We aren't fully invariant yet */

> > > -	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))

> > > -		return;

> > > -

> > 

> > You still need these checks, otherwise you could end up with only part

> > of the CPUs setting a scale factor, when only part of the CPUs support

> > AMUs and there is no cpufreq support for FIE.

> 

> Both supports_scale_freq_counters() and topology_scale_freq_invariant() take

> care of this now and they will keep reporting the system as invariant until the

> time all the CPUs have counters (in absence of cpufreq).

> 


Correct!

> The topology_set_scale_freq_source() API is supposed to be called multiple

> times, probably once for each policy and so I don't see a need of these checks

> anymore.

> 


The problem is not topology_scale_freq_invariant() but whether a scale
factor is set for some CPUs.

Scenario (test system above):
 - "AMUs" are only supported for [1-2],
 - cpufreq_supports_freq_invariance() -> false

What should happen:
 - topology_scale_freq_invariant() -> false (passed)
 - all CPUs should have their freq_scale unmodified (1024) - (failed)
   because only 2 out of 6 CPUs have a method of setting a scale factor

What does happen:
 - arch_set_freq_tick() -> topology_set_freq_tick() will set a scale
   factor for [1-2] based on AMUs. This should not happen. We will end
   up with invariant signals for bigs and signals that are not freq
   invariant for littles.

Ionela.

> > Small(ish) optimisation at the beginning of this function:

> > 

> >     if (cpumask_empty(&scale_freq_counters_mask))

> >         scale_freq_invariant = topology_scale_freq_invariant();

> > 

> > This will save you a call to rebuild_sched_domains_energy(), which is

> > quite expensive, when cpufreq supports FIE and we also have counters.

> 

> Good Point.

>  

> > After comments addressed,

> > 

> > Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>

> 

> Thanks.

> 

> > Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>

> 

> 


> -- 

> viresh
Viresh Kumar Feb. 17, 2021, 11:40 a.m. | #6
On 17-02-21, 11:30, Ionela Voinescu wrote:
> The problem is not topology_scale_freq_invariant() but whether a scale

> factor is set for some CPUs.

> 

> Scenario (test system above):

>  - "AMUs" are only supported for [1-2],

>  - cpufreq_supports_freq_invariance() -> false

> 

> What should happen:

>  - topology_scale_freq_invariant() -> false (passed)

>  - all CPUs should have their freq_scale unmodified (1024) - (failed)

>    because only 2 out of 6 CPUs have a method of setting a scale factor

> 

> What does happen:

>  - arch_set_freq_tick() -> topology_set_freq_tick() will set a scale

>    factor for [1-2] based on AMUs. This should not happen. We will end

>    up with invariant signals for bigs and signals that are not freq

>    invariant for littles.


Another case. cpufreq is included as a module and AMU is implemented
partially.

- first time cpufreq driver is inserted, we set up everything and
  freq_scale gets updated on ticks.

- remove cpufreq driver, we are back in same situation.

We can't control it that way.. Or we add another call layer in middle
before the tick-handler gets called for AMU, which will check if we
are fully invariant or not ?

-- 
viresh
Ionela Voinescu Feb. 17, 2021, 11:57 a.m. | #7
On Wednesday 17 Feb 2021 at 17:10:27 (+0530), Viresh Kumar wrote:
> On 17-02-21, 11:30, Ionela Voinescu wrote:

> > The problem is not topology_scale_freq_invariant() but whether a scale

> > factor is set for some CPUs.

> > 

> > Scenario (test system above):

> >  - "AMUs" are only supported for [1-2],

> >  - cpufreq_supports_freq_invariance() -> false

> > 

> > What should happen:

> >  - topology_scale_freq_invariant() -> false (passed)

> >  - all CPUs should have their freq_scale unmodified (1024) - (failed)

> >    because only 2 out of 6 CPUs have a method of setting a scale factor

> > 

> > What does happen:

> >  - arch_set_freq_tick() -> topology_set_freq_tick() will set a scale

> >    factor for [1-2] based on AMUs. This should not happen. We will end

> >    up with invariant signals for bigs and signals that are not freq

> >    invariant for littles.

> 

> Another case. cpufreq is included as a module and AMU is implemented

> partially.

> 

> - first time cpufreq driver is inserted, we set up everything and

>   freq_scale gets updated on ticks.

> 

> - remove cpufreq driver, we are back in same situation.

> 


Yes, but the littles (lacking AMUs) would have had a scale factor set
through arch_set_freq_scale() which will correspond to the last
frequency change through the cpufreq driver. When removing the driver,
it's unlikely that the frequency of littles will change (no driver).
 - topology_scale_freq_invariant() will still return true.
 - littles would still  have a scale factor set which is likely accurate
 - bigs will continue updating the scale factor through AMUs.

See a very useful comment someone added recently :) :

"""
+	/*
+	 * We don't need to handle CPUFREQ_REMOVE_POLICY event as the AMU
+	 * counters don't have any dependency on cpufreq driver once we have
+	 * initialized AMU support and enabled invariance. The AMU counters will
+	 * keep on working just fine in the absence of the cpufreq driver, and
+	 * for the CPUs for which there are no counters available, the last set
+	 * value of freq_scale will remain valid as that is the frequency those
+	 * CPUs are running at.
+	 */
"""

> We can't control it that way.. Or we add another call layer in middle

> before the tick-handler gets called for AMU, which will check if we

> are fully invariant or not ?

> 


I would avoid additional work done on the tick, especially for a scenario which
is unlikely. If you think this case is worth supporting, it might be best to do
it at CPUFREQ_REMOVE_POLICY event.

Thanks,
Ionela.

> -- 

> viresh
Viresh Kumar Feb. 18, 2021, 7:23 a.m. | #8
On 17-02-21, 11:57, Ionela Voinescu wrote:
> See a very useful comment someone added recently :) :

> 

> """

> +	/*

> +	 * We don't need to handle CPUFREQ_REMOVE_POLICY event as the AMU

> +	 * counters don't have any dependency on cpufreq driver once we have

> +	 * initialized AMU support and enabled invariance. The AMU counters will

> +	 * keep on working just fine in the absence of the cpufreq driver, and

> +	 * for the CPUs for which there are no counters available, the last set

> +	 * value of freq_scale will remain valid as that is the frequency those

> +	 * CPUs are running at.

> +	 */

> """


Lol... 

-- 
viresh
Viresh Kumar Feb. 18, 2021, 9:33 a.m. | #9
On 17-02-21, 00:24, Ionela Voinescu wrote:
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c

> > index 1e47dfd465f8..47fca7376c93 100644

> > --- a/arch/arm64/kernel/topology.c

> > +++ b/arch/arm64/kernel/topology.c

> > @@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = {

> >  

> >  static void amu_fie_setup(const struct cpumask *cpus)

> >  {

> > -	bool invariant;

> >  	int cpu;

> >  

> >  	/* We are already set since the last insmod of cpufreq driver */

> > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)

> >  

> >  	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);

> >  

> > -	invariant = topology_scale_freq_invariant();

> > -

> > -	/* We aren't fully invariant yet */

> > -	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))

> > -		return;

> > -

> 

> You still need these checks, otherwise you could end up with only part

> of the CPUs setting a scale factor, when only part of the CPUs support

> AMUs and there is no cpufreq support for FIE.


Another look at it and here goes another reason (hope I don't have
another in-code comment somewhere else to kill this one) :)

We don't need to care for the reason you gave (which is a valid reason
otherwise), as we are talking specifically about amu_fie_setup() here
and it gets called from cpufreq policy-notifier. i.e. we won't support
AMUs without cpufreq being there in the first place and the same goes
for cppc-driver.

Does that sound reasonable ?

-- 
viresh
Ionela Voinescu Feb. 18, 2021, 4:36 p.m. | #10
Hey,

On Thursday 18 Feb 2021 at 15:03:04 (+0530), Viresh Kumar wrote:
> On 17-02-21, 00:24, Ionela Voinescu wrote:

> > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c

> > > index 1e47dfd465f8..47fca7376c93 100644

> > > --- a/arch/arm64/kernel/topology.c

> > > +++ b/arch/arm64/kernel/topology.c

> > > @@ -240,7 +240,6 @@ static struct scale_freq_data amu_sfd = {

> > >  

> > >  static void amu_fie_setup(const struct cpumask *cpus)

> > >  {

> > > -	bool invariant;

> > >  	int cpu;

> > >  

> > >  	/* We are already set since the last insmod of cpufreq driver */

> > > @@ -257,25 +256,10 @@ static void amu_fie_setup(const struct cpumask *cpus)

> > >  

> > >  	cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);

> > >  

> > > -	invariant = topology_scale_freq_invariant();

> > > -

> > > -	/* We aren't fully invariant yet */

> > > -	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))

> > > -		return;

> > > -

> > 

> > You still need these checks, otherwise you could end up with only part

> > of the CPUs setting a scale factor, when only part of the CPUs support

> > AMUs and there is no cpufreq support for FIE.

> 

> Another look at it and here goes another reason (hope I don't have

> another in-code comment somewhere else to kill this one) :)

> 

> We don't need to care for the reason you gave (which is a valid reason

> otherwise), as we are talking specifically about amu_fie_setup() here

> and it gets called from cpufreq policy-notifier. i.e. we won't support

> AMUs without cpufreq being there in the first place and the same goes

> for cppc-driver.

> 

> Does that sound reasonable ?

> 


Yes, we don't care if there is no cpufreq driver, as the use of AMUs won't
get initialised either. But we do care if there is a cpufreq driver that
does not support frequency invariance, which is the example above.

The intention with the patches that made cpufreq based invariance generic
a while back was for it to be present, seamlessly, for as many drivers as
possible, as a less than accurate invariance default method is still
better than nothing. So only a few drivers today don't support cpufreq
based FI, but it's not a guarantee that it will stay this way.

Hope it makes sense,
Ionela.

> -- 

> viresh
Viresh Kumar Feb. 19, 2021, 4:58 a.m. | #11
On 18-02-21, 16:36, Ionela Voinescu wrote:
> Yes, we don't care if there is no cpufreq driver, as the use of AMUs won't

> get initialised either. But we do care if there is a cpufreq driver that

> does not support frequency invariance, which is the example above.

> 

> The intention with the patches that made cpufreq based invariance generic

> a while back was for it to be present, seamlessly, for as many drivers as

> possible, as a less than accurate invariance default method is still

> better than nothing.


Right.

> So only a few drivers today don't support cpufreq based FI


Only two AFAICT, both x86, and the AMU stuff doesn't conflict with
them.

drivers/cpufreq/intel_pstate.c
drivers/cpufreq/longrun.c

> but it's not a guarantee that it will stay this way.


What do you mean by "no guarantee" here ?

The very core routines (cpufreq_freq_transition_end() and
cpufreq_driver_fast_switch()) of the cpufreq core call
arch_set_freq_scale() today and this isn't going to change anytime
soon. If something gets changed there someone will need to see other
parts of the kernel which may get broken with that.

I don't see any need of complicating other parts of the kernel like,
amu or cppc code for that. They should be kept simple and they should
assume cpufreq invariance will be supported as it is today.

-- 
viresh
Ionela Voinescu Feb. 19, 2021, 9:44 a.m. | #12
On Friday 19 Feb 2021 at 10:28:23 (+0530), Viresh Kumar wrote:
> On 18-02-21, 16:36, Ionela Voinescu wrote:

> > Yes, we don't care if there is no cpufreq driver, as the use of AMUs won't

> > get initialised either. But we do care if there is a cpufreq driver that

> > does not support frequency invariance, which is the example above.

> > 

> > The intention with the patches that made cpufreq based invariance generic

> > a while back was for it to be present, seamlessly, for as many drivers as

> > possible, as a less than accurate invariance default method is still

> > better than nothing.

> 

> Right.

> 

> > So only a few drivers today don't support cpufreq based FI

> 

> Only two AFAICT, both x86, and the AMU stuff doesn't conflict with

> them.

> 

> drivers/cpufreq/intel_pstate.c

> drivers/cpufreq/longrun.c

> 

> > but it's not a guarantee that it will stay this way.

> 

> What do you mean by "no guarantee" here ?

> 

> The very core routines (cpufreq_freq_transition_end() and

> cpufreq_driver_fast_switch()) of the cpufreq core call

> arch_set_freq_scale() today and this isn't going to change anytime

> soon. If something gets changed there someone will need to see other

> parts of the kernel which may get broken with that.

> 


Yes, but it won't really be straightforward to notice this breakage if
that happens, so in my opinion it was worth to keep that condition.

> I don't see any need of complicating other parts of the kernel like,

> amu or cppc code for that. They should be kept simple and they should

> assume cpufreq invariance will be supported as it is today.

> 


Fair enough! It is a corner case after all.

Thanks,
Ionela.

> -- 

> viresh
Viresh Kumar Feb. 19, 2021, 9:48 a.m. | #13
On 19-02-21, 09:44, Ionela Voinescu wrote:
> On Friday 19 Feb 2021 at 10:28:23 (+0530), Viresh Kumar wrote:

> > The very core routines (cpufreq_freq_transition_end() and

> > cpufreq_driver_fast_switch()) of the cpufreq core call

> > arch_set_freq_scale() today and this isn't going to change anytime

> > soon. If something gets changed there someone will need to see other

> > parts of the kernel which may get broken with that.

> > 

> 

> Yes, but it won't really be straightforward to notice this breakage if

> that happens, so in my opinion it was worth to keep that condition.


Right, but chances of that happening are close to zero right now. I
don't see any changes being made there in near future and so as we
agreed, lets leave it as is.

Btw, thanks for your feedback, it was indeed very valuable.

-- 
viresh

Patch

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 3b8dca4eb08d..ec2db3419c41 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -17,17 +17,9 @@  int pcibus_to_node(struct pci_bus *bus);
 #include <linux/arch_topology.h>
 
 void update_freq_counters_refs(void);
-void topology_scale_freq_tick(void);
-
-#ifdef CONFIG_ARM64_AMU_EXTN
-/*
- * Replace task scheduler's default counter-based
- * frequency-invariance scale factor setting.
- */
-#define arch_scale_freq_tick topology_scale_freq_tick
-#endif /* CONFIG_ARM64_AMU_EXTN */
 
 /* Replace task scheduler's default frequency-invariant accounting */
+#define arch_scale_freq_tick topology_scale_freq_tick
 #define arch_set_freq_scale topology_set_freq_scale
 #define arch_scale_freq_capacity topology_get_freq_scale
 #define arch_scale_freq_invariant topology_scale_freq_invariant
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index e08a4126453a..1e47dfd465f8 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -199,8 +199,44 @@  static int freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate)
 	return 0;
 }
 
-static DEFINE_STATIC_KEY_FALSE(amu_fie_key);
-#define amu_freq_invariant() static_branch_unlikely(&amu_fie_key)
+static void amu_scale_freq_tick(void)
+{
+	u64 prev_core_cnt, prev_const_cnt;
+	u64 core_cnt, const_cnt, scale;
+
+	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
+	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
+
+	update_freq_counters_refs();
+
+	const_cnt = this_cpu_read(arch_const_cycles_prev);
+	core_cnt = this_cpu_read(arch_core_cycles_prev);
+
+	if (unlikely(core_cnt <= prev_core_cnt ||
+		     const_cnt <= prev_const_cnt))
+		return;
+
+	/*
+	 *	    /\core    arch_max_freq_scale
+	 * scale =  ------- * --------------------
+	 *	    /\const   SCHED_CAPACITY_SCALE
+	 *
+	 * See validate_cpu_freq_invariance_counters() for details on
+	 * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT.
+	 */
+	scale = core_cnt - prev_core_cnt;
+	scale *= this_cpu_read(arch_max_freq_scale);
+	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
+			  const_cnt - prev_const_cnt);
+
+	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
+	this_cpu_write(freq_scale, (unsigned long)scale);
+}
+
+static struct scale_freq_data amu_sfd = {
+	.source = SCALE_FREQ_SOURCE_ARCH,
+	.set_freq_scale = amu_scale_freq_tick,
+};
 
 static void amu_fie_setup(const struct cpumask *cpus)
 {
@@ -227,7 +263,7 @@  static void amu_fie_setup(const struct cpumask *cpus)
 	if (!invariant && !cpumask_equal(amu_fie_cpus, cpu_present_mask))
 		return;
 
-	static_branch_enable(&amu_fie_key);
+	topology_set_scale_freq_source(&amu_sfd, amu_fie_cpus);
 
 	pr_debug("CPUs[%*pbl]: counters will be used for FIE.",
 		 cpumask_pr_args(cpus));
@@ -283,53 +319,6 @@  static int __init init_amu_fie(void)
 }
 core_initcall(init_amu_fie);
 
-bool arch_freq_counters_available(const struct cpumask *cpus)
-{
-	return amu_freq_invariant() &&
-	       cpumask_subset(cpus, amu_fie_cpus);
-}
-
-void topology_scale_freq_tick(void)
-{
-	u64 prev_core_cnt, prev_const_cnt;
-	u64 core_cnt, const_cnt, scale;
-	int cpu = smp_processor_id();
-
-	if (!amu_freq_invariant())
-		return;
-
-	if (!cpumask_test_cpu(cpu, amu_fie_cpus))
-		return;
-
-	prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
-	prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
-
-	update_freq_counters_refs();
-
-	const_cnt = this_cpu_read(arch_const_cycles_prev);
-	core_cnt = this_cpu_read(arch_core_cycles_prev);
-
-	if (unlikely(core_cnt <= prev_core_cnt ||
-		     const_cnt <= prev_const_cnt))
-		return;
-
-	/*
-	 *	    /\core    arch_max_freq_scale
-	 * scale =  ------- * --------------------
-	 *	    /\const   SCHED_CAPACITY_SCALE
-	 *
-	 * See validate_cpu_freq_invariance_counters() for details on
-	 * arch_max_freq_scale and the use of SCHED_CAPACITY_SHIFT.
-	 */
-	scale = core_cnt - prev_core_cnt;
-	scale *= this_cpu_read(arch_max_freq_scale);
-	scale = div64_u64(scale >> SCHED_CAPACITY_SHIFT,
-			  const_cnt - prev_const_cnt);
-
-	scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
-	this_cpu_write(freq_scale, (unsigned long)scale);
-}
-
 #ifdef CONFIG_ACPI_CPPC_LIB
 #include <acpi/cppc_acpi.h>
 
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index de8587cc119e..20b511949cd8 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -21,17 +21,65 @@ 
 #include <linux/sched.h>
 #include <linux/smp.h>
 
+static DEFINE_PER_CPU(struct scale_freq_data *, sft_data);
+static struct cpumask scale_freq_counters_mask;
+
+static bool supports_scale_freq_counters(const struct cpumask *cpus)
+{
+	return cpumask_subset(cpus, &scale_freq_counters_mask);
+}
+
 bool topology_scale_freq_invariant(void)
 {
 	return cpufreq_supports_freq_invariance() ||
-	       arch_freq_counters_available(cpu_online_mask);
+	       supports_scale_freq_counters(cpu_online_mask);
+}
+
+void topology_set_scale_freq_source(struct scale_freq_data *data,
+				    const struct cpumask *cpus)
+{
+	struct scale_freq_data *sfd;
+	int cpu;
+
+	for_each_cpu(cpu, cpus) {
+		sfd = per_cpu(sft_data, cpu);
+
+		/* Use ARCH provided counters whenever possible */
+		if (!sfd || sfd->source != SCALE_FREQ_SOURCE_ARCH) {
+			per_cpu(sft_data, cpu) = data;
+			cpumask_set_cpu(cpu, &scale_freq_counters_mask);
+		}
+	}
 }
+EXPORT_SYMBOL_GPL(topology_set_scale_freq_source);
 
-__weak bool arch_freq_counters_available(const struct cpumask *cpus)
+void topology_clear_scale_freq_source(enum scale_freq_source source,
+				      const struct cpumask *cpus)
 {
-	return false;
+	struct scale_freq_data *sfd;
+	int cpu;
+
+	for_each_cpu(cpu, cpus) {
+		sfd = per_cpu(sft_data, cpu);
+
+		if (sfd && sfd->source == source) {
+			per_cpu(sft_data, cpu) = NULL;
+			cpumask_clear_cpu(cpu, &scale_freq_counters_mask);
+		}
+	}
 }
+EXPORT_SYMBOL_GPL(topology_clear_scale_freq_source);
+
+void topology_scale_freq_tick(void)
+{
+	struct scale_freq_data *sfd = *this_cpu_ptr(&sft_data);
+
+	if (sfd)
+		sfd->set_freq_scale();
+}
+
 DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
+EXPORT_SYMBOL_GPL(freq_scale);
 
 void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq,
 			     unsigned long max_freq)
@@ -47,7 +95,7 @@  void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq,
 	 * want to update the scale factor with information from CPUFREQ.
 	 * Instead the scale factor will be updated from arch_scale_freq_tick.
 	 */
-	if (arch_freq_counters_available(cpus))
+	if (supports_scale_freq_counters(cpus))
 		return;
 
 	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 0f6cd6b73a61..3bcfba5c21a7 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -34,7 +34,19 @@  void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq,
 			     unsigned long max_freq);
 bool topology_scale_freq_invariant(void);
 
-bool arch_freq_counters_available(const struct cpumask *cpus);
+enum scale_freq_source {
+	SCALE_FREQ_SOURCE_CPUFREQ = 0,
+	SCALE_FREQ_SOURCE_ARCH,
+};
+
+struct scale_freq_data {
+	enum scale_freq_source source;
+	void (*set_freq_scale)(void);
+};
+
+void topology_scale_freq_tick(void);
+void topology_set_scale_freq_source(struct scale_freq_data *data, const struct cpumask *cpus);
+void topology_clear_scale_freq_source(enum scale_freq_source source, const struct cpumask *cpus);
 
 DECLARE_PER_CPU(unsigned long, thermal_pressure);