Message ID | cover.1624266901.git.viresh.kumar@linaro.org |
---|---|
Headers | show |
Series | cpufreq: cppc: Add support for frequency invariance | expand |
On 6/21/2021 5:19 AM, Viresh Kumar wrote: > CPPC cpufreq driver is used for ARM servers and this patch series tries to > provide counter-based frequency invariance support for them in the absence for > architecture specific counters (like AMUs). Viresh, this series works fine on my quick tests so far. BTW, I noticed some strange things even with the series applied mentioned below when reading acpi_cppc vs cpufreq sysfs. Do you happen to know are those just hardware/firmware issues because Linux just faithfully exported the register values? == Arm64 server Foo == CPU max MHz: 3000.0000 CPU min MHz: 1000.0000 /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf 300 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq 1000 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf 200 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf 100 /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- should be 3000? 2800 /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf <--- should be 300? 280 /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf 100 == Arm64 server Bar == CPU max MHz: 3000.0000 CPU min MHz: 375.0000 /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf <--- should be 3000? There is no cpufreq boost. 3300 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq <--- don't understand why 0. 0 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf 375 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf 375 /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- ditto 0 /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf 3000 /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf 100
On 21-06-21, 16:48, Qian Cai wrote: > Viresh, this series works fine on my quick tests so far. Thanks for testing. > BTW, I > noticed some strange things even with the series applied mentioned > below when reading acpi_cppc vs cpufreq sysfs. Do you happen to know > are those just hardware/firmware issues because Linux just > faithfully exported the register values? The values are exported by drivers/acpi/cppc_acpi.c I believe and they look to be based on simple register reads. > == Arm64 server Foo == > CPU max MHz: 3000.0000 > CPU min MHz: 1000.0000 > > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf > 300 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq > 1000 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf > 200 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf > 100 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- should be 3000? > 2800 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf <--- should be 300? > 280 nominal-perf is max perf, and highest-perf is boost-perf. Same goes with nominal-freq (i.e. policy->max). So 280 and 2800 look to be the correct values, 300 and 3000 come with boost enabled. Look at the first entry, highest_perf. > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf > 100 > > == Arm64 server Bar == > CPU max MHz: 3000.0000 > CPU min MHz: 375.0000 > > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf <--- should be 3000? There is no cpufreq boost. > 3300 This isn't exported by cpufreq driver but acpi, and it just exports hardware values of highest_perf (with boost i.e.). cpufreq may or may not use this to support boost. > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq <--- don't understand why 0. > 0 Because corresponding hardware registers aren't implemented for your platform, this is the function that reads these registers: int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps) { ... /* Read optional lowest and nominal frequencies if present */ if (CPC_SUPPORTED(low_freq_reg)) cpc_read(cpunum, low_freq_reg, &low_f); if (CPC_SUPPORTED(nom_freq_reg)) cpc_read(cpunum, nom_freq_reg, &nom_f); perf_caps->lowest_freq = low_f; perf_caps->nominal_freq = nom_f; } > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf > 375 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf > 375 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- ditto > 0 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf > 3000 > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf > 100 -- viresh
On 21-06-21, 16:48, Qian Cai wrote: > > > On 6/21/2021 5:19 AM, Viresh Kumar wrote: > > CPPC cpufreq driver is used for ARM servers and this patch series tries to > > provide counter-based frequency invariance support for them in the absence for > > architecture specific counters (like AMUs). > > Viresh, this series works fine on my quick tests so far. Do you want me to add your Tested-by for the series ?
On 6/23/2021 12:16 AM, Viresh Kumar wrote: > On 21-06-21, 16:48, Qian Cai wrote: >> >> >> On 6/21/2021 5:19 AM, Viresh Kumar wrote: >>> CPPC cpufreq driver is used for ARM servers and this patch series tries to >>> provide counter-based frequency invariance support for them in the absence for >>> architecture specific counters (like AMUs). >> >> Viresh, this series works fine on my quick tests so far. > > Do you want me to add your Tested-by for the series ? Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in development, and will provide an update once ready. Also, I noticed the delivered perf is even smaller than lowest_perf (100). # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs ref:103377547901 del:54540736873 # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs ref:103379170101 del:54541599117 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 My understanding is that the delivered perf should fail into the range between lowest_perf and highest_perf. Is that assumption correct? This happens on 5.4-based kernel, so I am in process running your series on that system to see if there is any differences. In any case, if it is a bug it is pre-existing, but I'd like to understand a bit better in that front first.
On 23-06-21, 08:57, Qian Cai wrote: > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in > development, and will provide an update once ready. Oh sure, np. > Also, I noticed the delivered perf is even smaller than lowest_perf (100). > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > ref:103377547901 del:54540736873 > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > ref:103379170101 del:54541599117 > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 > > My understanding is that the delivered perf should fail into the range between > lowest_perf and highest_perf. Is that assumption correct? This happens on > 5.4-based kernel, so I am in process running your series on that system to see > if there is any differences. In any case, if it is a bug it is pre-existing, > but I'd like to understand a bit better in that front first. Vincent: Can that happen because of CPU idle ? -- viresh
On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 23-06-21, 08:57, Qian Cai wrote: > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in > > development, and will provide an update once ready. > > Oh sure, np. > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100). > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > ref:103377547901 del:54540736873 > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > ref:103379170101 del:54541599117 > > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 I'm not sure that I understand your point. The formula above says that cpu8 run @ 53% of nominal performance > > > > My understanding is that the delivered perf should fail into the range between > > lowest_perf and highest_perf. Is that assumption correct? This happens on > > 5.4-based kernel, so I am in process running your series on that system to see > > if there is any differences. In any case, if it is a bug it is pre-existing, > > but I'd like to understand a bit better in that front first. > > Vincent: > > Can that happen because of CPU idle ? > > -- > viresh
On Thu, 24 Jun 2021 at 12:48, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > Hi guys, > > On Thursday 24 Jun 2021 at 11:49:53 (+0200), Vincent Guittot wrote: > > On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > > > > On 23-06-21, 08:57, Qian Cai wrote: > > > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in > > > > development, and will provide an update once ready. > > > > > > Oh sure, np. > > > > > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100). > > > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > > ref:103377547901 del:54540736873 > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > > ref:103379170101 del:54541599117 > > > > > > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 > > > > I'm not sure that I understand your point. The formula above says that > > cpu8 run @ 53% of nominal performance > > > > I think this is based on a previous example Qian had where: > > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf > 300 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq > 1000 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf > 100 > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf > 100 > > ..so the 100 is not from obtaining percentage, is the reference > performance. > > The logic of the formula is to obtain the delivered performance when > knowing the number of ticks for each counter, so: > > So if one gets (103379170101 - 103377547901) ticks for the counter at > running at 1GHz(perf 100), what is the frequency of the core, if its > counter ticked (54541599117 - 54540736873) times in the same interval > of time? > > The answer is 530MHz(perf 53), which is lower than the lowest frequency > at 1GHz(perf 100). But the nominal_perf is 280 and not 100 if i'm not wrong so the perf value is 148 > lowest_perf in this case > > > > > > > > > > My understanding is that the delivered perf should fail into the range between > > > > lowest_perf and highest_perf. Is that assumption correct? This happens on > > > > 5.4-based kernel, so I am in process running your series on that system to see > > > > if there is any differences. In any case, if it is a bug it is pre-existing, > > > > but I'd like to understand a bit better in that front first. > > > > > > Vincent: > > > > > > Can that happen because of CPU idle ? > > > > > Not if the counters are implemented properly. The kernel considers that > both reference and delivered performance counters should stop or reset > during idle. The kernel would not account for idle itself. > > If the reference performance counter does not stop during idle, while > the core performance counter (delivered) does stop, the behavior above > should be seen very often. > > Qian, do you see these small delivered performance values often or > seldom? > > Thanks, > Ionela. > > > > -- > > > viresh
On Thu, 24 Jun 2021 at 13:23, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > On Thursday 24 Jun 2021 at 13:15:04 (+0200), Vincent Guittot wrote: > > On Thu, 24 Jun 2021 at 12:48, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > > > Hi guys, > > > > > > On Thursday 24 Jun 2021 at 11:49:53 (+0200), Vincent Guittot wrote: > > > > On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > > > > > > > > On 23-06-21, 08:57, Qian Cai wrote: > > > > > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in > > > > > > development, and will provide an update once ready. > > > > > > > > > > Oh sure, np. > > > > > > > > > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100). > > > > > > > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > > > > ref:103377547901 del:54540736873 > > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > > > > ref:103379170101 del:54541599117 > > > > > > > > > > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 > > > > > > > > I'm not sure that I understand your point. The formula above says that > > > > cpu8 run @ 53% of nominal performance > > > > > > > > > > I think this is based on a previous example Qian had where: > > > > > > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf > > > 300 > > > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq > > > 1000 > > > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf > > > 100 > > > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf > > > 100 > > > > > > ..so the 100 is not from obtaining percentage, is the reference > > > performance. > > > > > > The logic of the formula is to obtain the delivered performance when > > > knowing the number of ticks for each counter, so: > > > > > > So if one gets (103379170101 - 103377547901) ticks for the counter at > > > running at 1GHz(perf 100), what is the frequency of the core, if its > > > counter ticked (54541599117 - 54540736873) times in the same interval > > > of time? > > > > > > The answer is 530MHz(perf 53), which is lower than the lowest frequency > > > at 1GHz(perf 100). > > > > But the nominal_perf is 280 and not 100 if i'm not wrong so the perf > > value is 148 > lowest_perf in this case > > > > Nominal performance has no meaning here. The reference counter ticks > with the frequency equivalent to reference performance. > > Nominal performance is the maximum performance when !boost. Highest > performance is the maximum performance available including boost > frequencies. So nominal performance has no impact in these translations > from counter values to delivered performance. my bad, nominal_perf == reference_perf on the systems that I have locally > > Hope it helps, > Ionela. > > > > > > > > > > > > > > > > > > > > > My understanding is that the delivered perf should fail into the range between > > > > > > lowest_perf and highest_perf. Is that assumption correct? This happens on > > > > > > 5.4-based kernel, so I am in process running your series on that system to see > > > > > > if there is any differences. In any case, if it is a bug it is pre-existing, > > > > > > but I'd like to understand a bit better in that front first. > > > > > > > > > > Vincent: > > > > > > > > > > Can that happen because of CPU idle ? > > > > > > > > > > > Not if the counters are implemented properly. The kernel considers that > > > both reference and delivered performance counters should stop or reset > > > during idle. The kernel would not account for idle itself. > > > > > > If the reference performance counter does not stop during idle, while > > > the core performance counter (delivered) does stop, the behavior above > > > should be seen very often. > > > > > > Qian, do you see these small delivered performance values often or > > > seldom? > > > > > > Thanks, > > > Ionela. > > > > > > > > -- > > > > > viresh
On 6/24/2021 6:48 AM, Ionela Voinescu wrote: > Not if the counters are implemented properly. The kernel considers that > both reference and delivered performance counters should stop or reset > during idle. The kernel would not account for idle itself. > > If the reference performance counter does not stop during idle, while > the core performance counter (delivered) does stop, the behavior above > should be seen very often. > > Qian, do you see these small delivered performance values often or > seldom? Ionela, so I managed to upgrade the kernel on the system to today's linux-next which suppose to include this series. The delivered perf is now 280. However, scaling_min_freq (200 MHz) is not equal to lowest_perf (100). scaling_driver: acpi_cppc scaling_governor: schedutil Is that normal because lowest_nonlinear_perf is 200? Also, on this pretty idle system, 158 of 160 CPUs are always running in max freq (280 MHz). The other 2 are running in 243 and 213 MHz according to scaling_cur_freq. Apparently, "schedutil" does not work proper on this system. I am going to try other governors to narrow down the issue a bit. FYI, here is the acpi_cppc registers reading: /sys/devices/system/cpu/cpu0/acpi_cppc/feedback_ctrs ref:160705801 del:449594095 /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf 300 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq 1000 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf 200 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf 100 /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq 2800 /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf 280 /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf 100 /sys/devices/system/cpu/cpu0/acpi_cppc/wraparound_time 18446744073709551615
On 6/24/2021 6:48 AM, Ionela Voinescu wrote: > Not if the counters are implemented properly. The kernel considers that > both reference and delivered performance counters should stop or reset > during idle. The kernel would not account for idle itself. > > If the reference performance counter does not stop during idle, while > the core performance counter (delivered) does stop, the behavior above > should be seen very often. > > Qian, do you see these small delivered performance values often or > seldom? FYI, the latest data point it that on the new kernel, the delivered performance does match the cpufreq_cur_freq. IOW, feedback_ctrs works fine. Also, "powersave" governor could bring down the scaling_cur_freq to scaling_min_freq. Right now, looks like the puzzles on this particular system as mentioned in the previous post are, 1) lowest_freq/lowest_perf != scaling_min_freq 2) CPPC + schedutil is not able to scale down CPUs.
Hi Qian, On Thursday 24 Jun 2021 at 11:17:55 (-0400), Qian Cai wrote: > > > On 6/24/2021 6:48 AM, Ionela Voinescu wrote: > > Not if the counters are implemented properly. The kernel considers that > > both reference and delivered performance counters should stop or reset > > during idle. The kernel would not account for idle itself. > > > > If the reference performance counter does not stop during idle, while > > the core performance counter (delivered) does stop, the behavior above > > should be seen very often. > > > > Qian, do you see these small delivered performance values often or > > seldom? > > Ionela, so I managed to upgrade the kernel on the system to today's > linux-next which suppose to include this series. The delivered perf > is now 280. However, scaling_min_freq (200 MHz) is not equal to > lowest_perf (100). > > scaling_driver: acpi_cppc ^^^^^^^^^ I suppose you mean "cppc-cpufreq"? "acpi_cppc" is not a scaling driver option. > scaling_governor: schedutil > > Is that normal because lowest_nonlinear_perf is 200? > Yes, that is right : [1] > Also, on this pretty idle system, 158 of 160 CPUs are always running > in max freq (280 MHz). The other 2 are running in 243 and 213 MHz > according to scaling_cur_freq. Apparently, "schedutil" does not work > proper on this system. I am going to try other governors to narrow > down the issue a bit. So your CPUs run at frequencies between 200MHz and 280MHz? Based on your acpi_cppc information below I would have assumed 2GHz as lowest nonlinear and 2.8GHz as nominal. The reason for this is that according to the ACPI spec the frequency values in the _CPC objects are supposed to be in MHz, so 2800 MHz for nominal frequency would be 2.8GHz. When you try more governors, make sure to check out the difference between scaling_cur_freq and cpuinfo_cur_freq at [2]. The first gives you the frequency that the governor (schedutil) is asking for, while the second is giving you the current frequency obtained from the counters. So to check the actual frequency the cores are running at, please check cpuinfo_cur_freq. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/cpufreq/cppc_cpufreq.c?h=v5.13-rc7#n296 [2] https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt Thanks, Ionela. > > FYI, here is the acpi_cppc registers reading: > > /sys/devices/system/cpu/cpu0/acpi_cppc/feedback_ctrs > ref:160705801 del:449594095 > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf > 300 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq > 1000 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf > 200 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf > 100 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq > 2800 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf > 280 > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf > 100 > /sys/devices/system/cpu/cpu0/acpi_cppc/wraparound_time > 18446744073709551615
On 6/25/2021 6:21 AM, Ionela Voinescu wrote: >> scaling_driver: acpi_cppc > ^^^^^^^^^ > I suppose you mean "cppc-cpufreq"? > > "acpi_cppc" is not a scaling driver option. Ionela, yes. Sorry about that. > So your CPUs run at frequencies between 200MHz and 280MHz? 2000 to 2800 MHz. > Based on your acpi_cppc information below I would have assumed 2GHz as > lowest nonlinear and 2.8GHz as nominal. The reason for this is that > according to the ACPI spec the frequency values in the _CPC objects are > supposed to be in MHz, so 2800 MHz for nominal frequency would be > 2.8GHz. > > When you try more governors, make sure to check out the difference > between scaling_cur_freq and cpuinfo_cur_freq at [2]. The first gives > you the frequency that the governor (schedutil) is asking for, while the > second is giving you the current frequency obtained from the counters. > > So to check the actual frequency the cores are running at, please check > cpuinfo_cur_freq. The problem is that all CPUs are never scaling down. "cpuinfo_cur_freq" and "scaling_cur_freq" are always the 2800 MHz on all CPUs on this idle system. This looks like a regression somewhere as in 5.4-based kernel, I can see "cpuinfo_cur_freq" can go down to 2000 MHz in the same scenario. I'll bisect a bit unless you have better ideas?
Hey, On Friday 25 Jun 2021 at 09:31:58 (-0400), Qian Cai wrote: > > > On 6/25/2021 6:21 AM, Ionela Voinescu wrote: > >> scaling_driver: acpi_cppc > > ^^^^^^^^^ > > I suppose you mean "cppc-cpufreq"? > > > > "acpi_cppc" is not a scaling driver option. > > Ionela, yes. Sorry about that. > > > So your CPUs run at frequencies between 200MHz and 280MHz? > > 2000 to 2800 MHz. > Thank you for the clarification. > > Based on your acpi_cppc information below I would have assumed 2GHz as > > lowest nonlinear and 2.8GHz as nominal. The reason for this is that > > according to the ACPI spec the frequency values in the _CPC objects are > > supposed to be in MHz, so 2800 MHz for nominal frequency would be > > 2.8GHz. > > > > When you try more governors, make sure to check out the difference > > between scaling_cur_freq and cpuinfo_cur_freq at [2]. The first gives > > you the frequency that the governor (schedutil) is asking for, while the > > second is giving you the current frequency obtained from the counters. > > > > So to check the actual frequency the cores are running at, please check > > cpuinfo_cur_freq. > > The problem is that all CPUs are never scaling down. "cpuinfo_cur_freq" > and "scaling_cur_freq" are always the 2800 MHz on all CPUs on this idle > system. This looks like a regression somewhere as in 5.4-based kernel, > I can see "cpuinfo_cur_freq" can go down to 2000 MHz in the same > scenario. I'll bisect a bit unless you have better ideas? Quick questions for you: 1. When you say you tried a 5.4 kernel, did you try it with these patches backported? They also have some dependencies with the recent changes in the arch topology driver and cpufreq so they would not be straight forward to backport. If the 5.4 kernel you tried did not have these patches, it might be best to try next/master that has these patches, but with CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that an incorrect frequency scale factor here would affect utilization that would then affect the schedutil frequency selection. I would not expect this behavior even if the scale factor was wrong, but it would be good to rule out. 2. Is your platform booting with all CPUs? Are any hotplug operations done in your scenario? Thanks, Ionela.
On 6/25/2021 10:37 AM, Ionela Voinescu wrote: > Quick questions for you: > > 1. When you say you tried a 5.4 kernel, did you try it with these > patches backported? They also have some dependencies with the recent > changes in the arch topology driver and cpufreq so they would not be > straight forward to backport. No. It turned out that this 5.4-based kernel has "ondemand" governor by default which works fine which could even scale down to the lowest_perf (1000 MHz). Once switched the governor to "schedutil", it could keep the freq to the lowest. While on the latest kernel, it also works fine by using "ondemand" first and then switch to "schedutil". Even though it can only scale down to lowest_nonlinear_perf (2000 MHz). It is more of that using "schedutil" by default would not work. Also, on the latest kernel, even "userspace" governor only allows to scale down to 2000 MHz. > If the 5.4 kernel you tried did not have these patches, it might be best > to try next/master that has these patches, but with > CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that > an incorrect frequency scale factor here would affect utilization that > would then affect the schedutil frequency selection. I would not expect > this behavior even if the scale factor was wrong, but it would be good > to rule out. I'll try that at least see if CONFIG_ACPI_CPPC_CPUFREQ_FIE=n would make the latest kernel to be able to scale down to 1000 MHz. > 2. Is your platform booting with all CPUs? Are any hotplug operations > done in your scenario? Yes, booting with all CPUs. No additional hotplug there.
On 6/25/2021 10:37 AM, Ionela Voinescu wrote: > Quick questions for you: > > 1. When you say you tried a 5.4 kernel, did you try it with these > patches backported? They also have some dependencies with the recent > changes in the arch topology driver and cpufreq so they would not be > straight forward to backport. > > If the 5.4 kernel you tried did not have these patches, it might be best > to try next/master that has these patches, but with > CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that > an incorrect frequency scale factor here would affect utilization that > would then affect the schedutil frequency selection. I would not expect > this behavior even if the scale factor was wrong, but it would be good > to rule out. > > 2. Is your platform booting with all CPUs? Are any hotplug operations > done in your scenario? Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m will fix the previous mentioned issues here (any explanations of that?) even though the scaling down is not perfect. Now, we have the following on this idle system: # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c 79 1000000 1 1160000 73 1400000 1 2000000 4 2010000 1 2800000 1 860000 Even if I rerun a few times, there could still have a few CPUs running lower than lowest_perf (1GHz). Also, even though I set all CPUs to use "userspace" governor and set freq to the lowest. A few CPUs keep changing at will. # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c 156 1000000 3 2000000 1 760000
On 6/25/2021 10:29 PM, Qian Cai wrote: > Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m will fix the previous mentioned issues here (any explanations of that?) even though the scaling down is not perfect. Now, we have the following on this idle system: > > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 79 1000000 > 1 1160000 > 73 1400000 > 1 2000000 > 4 2010000 > 1 2800000 > 1 860000 > > Even if I rerun a few times, there could still have a few CPUs running lower than lowest_perf (1GHz). Also, even though I set all CPUs to use "userspace" governor and set freq to the lowest. A few CPUs keep changing at will. > > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 156 1000000 > 3 2000000 > 1 760000 Another date point is that set ACPI_CPPC_CPUFREQ_FIE=n fixed the issue that any CPU could run below the lowest freq. schedutil: # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c 80 1000000 78 1400000 1 2010000 1 2800000 userspace: # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c 158 1000000 2 2000000
Hi guys, On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote: > Hello, > > Changes since V2: > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it > work using policy ->init() and exit() alone. > > - Two new cleanup patches 1/4 and 2/4. > > - Improved commit log of 3/4. > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's > overlap (seen with Vincent's setup). > If you happen to have the data around, I would like to know more about your observations on ThunderX2. I tried ThunderX2 as well, with the following observations: Booting with userspace governor and all CPUs online, the CPPC frequency scale factor was all over the place (even much larger than 1024). My initial assumptions: - Counters do not behave properly in light of SMT - Firmware does not do a good job to keep the reference and core counters monotonic: save and restore at core off. So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of a single core (part of policy0). With this all works very well: root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed root@target:/sys/devices/system/cpu/cpufreq/policy0# [ 1863.095370] CPU96: cppc scale: 697. [ 1863.175370] CPU0: cppc scale: 492. [ 1863.215367] CPU64: cppc scale: 492. [ 1863.235366] CPU96: cppc scale: 492. [ 1863.485368] CPU32: cppc scale: 492. root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed root@target:/sys/devices/system/cpu/cpufreq/policy0# [ 1891.395363] CPU96: cppc scale: 558. [ 1891.415362] CPU0: cppc scale: 595. [ 1891.435362] CPU32: cppc scale: 615. [ 1891.465363] CPU96: cppc scale: 635. [ 1891.495361] CPU0: cppc scale: 673. [ 1891.515360] CPU32: cppc scale: 703. [ 1891.545360] CPU96: cppc scale: 738. [ 1891.575360] CPU0: cppc scale: 779. [ 1891.605360] CPU96: cppc scale: 829. [ 1891.635360] CPU0: cppc scale: 879. root@target:/sys/devices/system/cpu/cpufreq/policy0# root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed root@target:/sys/devices/system/cpu/cpufreq/policy0# [ 1896.585363] CPU32: cppc scale: 1004. [ 1896.675359] CPU64: cppc scale: 973. [ 1896.715359] CPU0: cppc scale: 1024. I'm doing a rate limited printk only for increase/decrease values over 64 in the scale factor value. This showed me that SMT is handled properly. Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor stops being even close to correct, for example: [238394.770328] CPU96: cppc scale: 22328. [238395.628846] CPU96: cppc scale: 245. [238516.087115] CPU96: cppc scale: 930. [238523.385009] CPU96: cppc scale: 245. [238538.767473] CPU96: cppc scale: 936. [238538.867546] CPU96: cppc scale: 245. [238599.367932] CPU97: cppc scale: 2728. [238599.859865] CPU97: cppc scale: 452. [238647.786284] CPU96: cppc scale: 1438. [238669.604684] CPU96: cppc scale: 27306. [238676.805049] CPU96: cppc scale: 245. [238737.642902] CPU97: cppc scale: 2035. [238737.664995] CPU97: cppc scale: 452. [238788.066193] CPU96: cppc scale: 2749. [238788.110192] CPU96: cppc scale: 245. [238817.231659] CPU96: cppc scale: 2698. [238818.083687] CPU96: cppc scale: 245. [238845.466850] CPU97: cppc scale: 2990. [238847.477805] CPU97: cppc scale: 452. [238936.984107] CPU97: cppc scale: 1590. [238937.029079] CPU97: cppc scale: 452. [238979.052464] CPU97: cppc scale: 911. [238980.900668] CPU97: cppc scale: 452. [239149.587889] CPU96: cppc scale: 803. [239151.085516] CPU96: cppc scale: 245. [239303.871373] CPU64: cppc scale: 956. [239303.906837] CPU64: cppc scale: 245. [239308.666786] CPU96: cppc scale: 821. [239319.440634] CPU96: cppc scale: 245. [239389.978395] CPU97: cppc scale: 4229. [239391.969562] CPU97: cppc scale: 452. [239415.894738] CPU96: cppc scale: 630. [239417.875326] CPU96: cppc scale: 245. The counter values shown by feedback_ctrs do not seem monotonic even when only core 0 threads are online. ref:2812420736 del:166051103 ref:3683620736 del:641578595 ref:1049653440 del:1548202980 ref:2099053440 del:2120997459 ref:3185853440 del:2714205997 ref:712486144 del:3708490753 ref:3658438336 del:3401357212 ref:1570998080 del:2279728438 For now I was just wondering if you have seen the same and whether you have an opinion on this. > This is tested on my Hikey platform (without the actual read/write to > performance counters), with this script for over an hour: > > while true; do > for i in `seq 1 7`; > do > echo 0 > /sys/devices/system/cpu/cpu$i/online; > done; > > for i in `seq 1 7`; > do > echo 1 > /sys/devices/system/cpu/cpu$i/online; > done; > done > > > The same is done by Vincent on ThunderX2 and no issues were seen. Hotplug worked fine for me as well on both platforms I tested (Juno R2 and ThunderX2). Thanks, Ionela.
On Mon, 28 Jun 2021 at 14:14, Vincent Guittot <vincent.guittot@linaro.org> wrote: > > On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > Hi guys, > > > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote: > > > Hello, > > > > > > Changes since V2: > > > > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it > > > work using policy ->init() and exit() alone. > > > > > > - Two new cleanup patches 1/4 and 2/4. > > > > > > - Improved commit log of 3/4. > > > > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's > > > overlap (seen with Vincent's setup). > > > > > > > If you happen to have the data around, I would like to know more about > > your observations on ThunderX2. > > > > > > I tried ThunderX2 as well, with the following observations: > > > > Booting with userspace governor and all CPUs online, the CPPC frequency > > scale factor was all over the place (even much larger than 1024). > > > > My initial assumptions: > > - Counters do not behave properly in light of SMT > > - Firmware does not do a good job to keep the reference and core > > counters monotonic: save and restore at core off. > > > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of > > a single core (part of policy0). With this all works very well: > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1863.095370] CPU96: cppc scale: 697. > > [ 1863.175370] CPU0: cppc scale: 492. > > [ 1863.215367] CPU64: cppc scale: 492. > > [ 1863.235366] CPU96: cppc scale: 492. > > [ 1863.485368] CPU32: cppc scale: 492. > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1891.395363] CPU96: cppc scale: 558. > > [ 1891.415362] CPU0: cppc scale: 595. > > [ 1891.435362] CPU32: cppc scale: 615. > > [ 1891.465363] CPU96: cppc scale: 635. > > [ 1891.495361] CPU0: cppc scale: 673. > > [ 1891.515360] CPU32: cppc scale: 703. > > [ 1891.545360] CPU96: cppc scale: 738. > > [ 1891.575360] CPU0: cppc scale: 779. > > [ 1891.605360] CPU96: cppc scale: 829. > > [ 1891.635360] CPU0: cppc scale: 879. > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1896.585363] CPU32: cppc scale: 1004. > > [ 1896.675359] CPU64: cppc scale: 973. > > [ 1896.715359] CPU0: cppc scale: 1024. > > > > I'm doing a rate limited printk only for increase/decrease values over > > 64 in the scale factor value. > > > > This showed me that SMT is handled properly. > > > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor > > stops being even close to correct, for example: > > > > [238394.770328] CPU96: cppc scale: 22328. > > [238395.628846] CPU96: cppc scale: 245. > > [238516.087115] CPU96: cppc scale: 930. > > [238523.385009] CPU96: cppc scale: 245. > > [238538.767473] CPU96: cppc scale: 936. > > [238538.867546] CPU96: cppc scale: 245. > > [238599.367932] CPU97: cppc scale: 2728. > > [238599.859865] CPU97: cppc scale: 452. > > [238647.786284] CPU96: cppc scale: 1438. > > [238669.604684] CPU96: cppc scale: 27306. > > [238676.805049] CPU96: cppc scale: 245. > > [238737.642902] CPU97: cppc scale: 2035. > > [238737.664995] CPU97: cppc scale: 452. > > [238788.066193] CPU96: cppc scale: 2749. > > [238788.110192] CPU96: cppc scale: 245. > > [238817.231659] CPU96: cppc scale: 2698. > > [238818.083687] CPU96: cppc scale: 245. > > [238845.466850] CPU97: cppc scale: 2990. > > [238847.477805] CPU97: cppc scale: 452. > > [238936.984107] CPU97: cppc scale: 1590. > > [238937.029079] CPU97: cppc scale: 452. > > [238979.052464] CPU97: cppc scale: 911. > > [238980.900668] CPU97: cppc scale: 452. > > [239149.587889] CPU96: cppc scale: 803. > > [239151.085516] CPU96: cppc scale: 245. > > [239303.871373] CPU64: cppc scale: 956. > > [239303.906837] CPU64: cppc scale: 245. > > [239308.666786] CPU96: cppc scale: 821. > > [239319.440634] CPU96: cppc scale: 245. > > [239389.978395] CPU97: cppc scale: 4229. > > [239391.969562] CPU97: cppc scale: 452. > > [239415.894738] CPU96: cppc scale: 630. > > [239417.875326] CPU96: cppc scale: 245. > > > > With the counter being 32bits and the freq scaling being update at > tick, you can easily get a overflow on it in idle system. I can easily > imagine that when you unplug CPUs there is enough activity on the CPU > to update it regularly whereas with all CPUs, the idle time is longer > that the counter overflow > > > The counter values shown by feedback_ctrs do not seem monotonic even > > when only core 0 threads are online. > > > > ref:2812420736 del:166051103 > > ref:3683620736 del:641578595 > > ref:1049653440 del:1548202980 > > ref:2099053440 del:2120997459 > > ref:3185853440 del:2714205997 > > ref:712486144 del:3708490753 > > ref:3658438336 del:3401357212 > > ref:1570998080 del:2279728438 There are 32bits and the overflow need to be handled by cppc_cpufreq driver > > > > For now I was just wondering if you have seen the same and whether you > > have an opinion on this. > > > > > This is tested on my Hikey platform (without the actual read/write to > > > performance counters), with this script for over an hour: > > > > > > while true; do > > > for i in `seq 1 7`; > > > do > > > echo 0 > /sys/devices/system/cpu/cpu$i/online; > > > done; > > > > > > for i in `seq 1 7`; > > > do > > > echo 1 > /sys/devices/system/cpu/cpu$i/online; > > > done; > > > done > > > > > > > > > The same is done by Vincent on ThunderX2 and no issues were seen. > > > > Hotplug worked fine for me as well on both platforms I tested (Juno R2 > > and ThunderX2). > > > > Thanks, > > Ionela.
On Monday 28 Jun 2021 at 14:14:14 (+0200), Vincent Guittot wrote: > On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > Hi guys, > > > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote: > > > Hello, > > > > > > Changes since V2: > > > > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it > > > work using policy ->init() and exit() alone. > > > > > > - Two new cleanup patches 1/4 and 2/4. > > > > > > - Improved commit log of 3/4. > > > > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's > > > overlap (seen with Vincent's setup). > > > > > > > If you happen to have the data around, I would like to know more about > > your observations on ThunderX2. > > > > > > I tried ThunderX2 as well, with the following observations: > > > > Booting with userspace governor and all CPUs online, the CPPC frequency > > scale factor was all over the place (even much larger than 1024). > > > > My initial assumptions: > > - Counters do not behave properly in light of SMT > > - Firmware does not do a good job to keep the reference and core > > counters monotonic: save and restore at core off. > > > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of > > a single core (part of policy0). With this all works very well: > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1863.095370] CPU96: cppc scale: 697. > > [ 1863.175370] CPU0: cppc scale: 492. > > [ 1863.215367] CPU64: cppc scale: 492. > > [ 1863.235366] CPU96: cppc scale: 492. > > [ 1863.485368] CPU32: cppc scale: 492. > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1891.395363] CPU96: cppc scale: 558. > > [ 1891.415362] CPU0: cppc scale: 595. > > [ 1891.435362] CPU32: cppc scale: 615. > > [ 1891.465363] CPU96: cppc scale: 635. > > [ 1891.495361] CPU0: cppc scale: 673. > > [ 1891.515360] CPU32: cppc scale: 703. > > [ 1891.545360] CPU96: cppc scale: 738. > > [ 1891.575360] CPU0: cppc scale: 779. > > [ 1891.605360] CPU96: cppc scale: 829. > > [ 1891.635360] CPU0: cppc scale: 879. > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1896.585363] CPU32: cppc scale: 1004. > > [ 1896.675359] CPU64: cppc scale: 973. > > [ 1896.715359] CPU0: cppc scale: 1024. > > > > I'm doing a rate limited printk only for increase/decrease values over > > 64 in the scale factor value. > > > > This showed me that SMT is handled properly. > > > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor > > stops being even close to correct, for example: > > > > [238394.770328] CPU96: cppc scale: 22328. > > [238395.628846] CPU96: cppc scale: 245. > > [238516.087115] CPU96: cppc scale: 930. > > [238523.385009] CPU96: cppc scale: 245. > > [238538.767473] CPU96: cppc scale: 936. > > [238538.867546] CPU96: cppc scale: 245. > > [238599.367932] CPU97: cppc scale: 2728. > > [238599.859865] CPU97: cppc scale: 452. > > [238647.786284] CPU96: cppc scale: 1438. > > [238669.604684] CPU96: cppc scale: 27306. > > [238676.805049] CPU96: cppc scale: 245. > > [238737.642902] CPU97: cppc scale: 2035. > > [238737.664995] CPU97: cppc scale: 452. > > [238788.066193] CPU96: cppc scale: 2749. > > [238788.110192] CPU96: cppc scale: 245. > > [238817.231659] CPU96: cppc scale: 2698. > > [238818.083687] CPU96: cppc scale: 245. > > [238845.466850] CPU97: cppc scale: 2990. > > [238847.477805] CPU97: cppc scale: 452. > > [238936.984107] CPU97: cppc scale: 1590. > > [238937.029079] CPU97: cppc scale: 452. > > [238979.052464] CPU97: cppc scale: 911. > > [238980.900668] CPU97: cppc scale: 452. > > [239149.587889] CPU96: cppc scale: 803. > > [239151.085516] CPU96: cppc scale: 245. > > [239303.871373] CPU64: cppc scale: 956. > > [239303.906837] CPU64: cppc scale: 245. > > [239308.666786] CPU96: cppc scale: 821. > > [239319.440634] CPU96: cppc scale: 245. > > [239389.978395] CPU97: cppc scale: 4229. > > [239391.969562] CPU97: cppc scale: 452. > > [239415.894738] CPU96: cppc scale: 630. > > [239417.875326] CPU96: cppc scale: 245. > > > > With the counter being 32bits and the freq scaling being update at > tick, you can easily get a overflow on it in idle system. I can easily > imagine that when you unplug CPUs there is enough activity on the CPU > to update it regularly whereas with all CPUs, the idle time is longer > that the counter overflow > Thanks! Yes, given the high wraparound time I thought they were 64 bit. All variables in software are 64 bit, but looking at bit width in the _CPC entries, the platform counters are 32 bit counters. > There are 32bits and the overflow need to be handled by cppc_cpufreq > driver I'm wondering if this would be best handled in the function that reads the counters or in the cppc_cpufreq driver that uses them. Probably the latter, as you say, as the read function should only return the raw values, but it does complicate things. Thanks, Ionela. > > The counter values shown by feedback_ctrs do not seem monotonic even > > when only core 0 threads are online. > > > > ref:2812420736 del:166051103 > > ref:3683620736 del:641578595 > > ref:1049653440 del:1548202980 > > ref:2099053440 del:2120997459 > > ref:3185853440 del:2714205997 > > ref:712486144 del:3708490753 > > ref:3658438336 del:3401357212 > > ref:1570998080 del:2279728438 > > > > For now I was just wondering if you have seen the same and whether you > > have an opinion on this. > > > > > This is tested on my Hikey platform (without the actual read/write to > > > performance counters), with this script for over an hour: > > > > > > while true; do > > > for i in `seq 1 7`; > > > do > > > echo 0 > /sys/devices/system/cpu/cpu$i/online; > > > done; > > > > > > for i in `seq 1 7`; > > > do > > > echo 1 > /sys/devices/system/cpu/cpu$i/online; > > > done; > > > done > > > > > > > > > The same is done by Vincent on ThunderX2 and no issues were seen. > > > > Hotplug worked fine for me as well on both platforms I tested (Juno R2 > > and ThunderX2). > > > > Thanks, > > Ionela.
On Monday 28 Jun 2021 at 14:08:13 (+0100), Ionela Voinescu wrote: > On Monday 28 Jun 2021 at 14:14:14 (+0200), Vincent Guittot wrote: > > On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > > > Hi guys, > > > > > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote: > > > > Hello, > > > > > > > > Changes since V2: > > > > > > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it > > > > work using policy ->init() and exit() alone. > > > > > > > > - Two new cleanup patches 1/4 and 2/4. > > > > > > > > - Improved commit log of 3/4. > > > > > > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's > > > > overlap (seen with Vincent's setup). > > > > > > > > > > If you happen to have the data around, I would like to know more about > > > your observations on ThunderX2. > > > > > > > > > I tried ThunderX2 as well, with the following observations: > > > > > > Booting with userspace governor and all CPUs online, the CPPC frequency > > > scale factor was all over the place (even much larger than 1024). > > > > > > My initial assumptions: > > > - Counters do not behave properly in light of SMT > > > - Firmware does not do a good job to keep the reference and core > > > counters monotonic: save and restore at core off. > > > > > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of > > > a single core (part of policy0). With this all works very well: > > > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > [ 1863.095370] CPU96: cppc scale: 697. > > > [ 1863.175370] CPU0: cppc scale: 492. > > > [ 1863.215367] CPU64: cppc scale: 492. > > > [ 1863.235366] CPU96: cppc scale: 492. > > > [ 1863.485368] CPU32: cppc scale: 492. > > > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > [ 1891.395363] CPU96: cppc scale: 558. > > > [ 1891.415362] CPU0: cppc scale: 595. > > > [ 1891.435362] CPU32: cppc scale: 615. > > > [ 1891.465363] CPU96: cppc scale: 635. > > > [ 1891.495361] CPU0: cppc scale: 673. > > > [ 1891.515360] CPU32: cppc scale: 703. > > > [ 1891.545360] CPU96: cppc scale: 738. > > > [ 1891.575360] CPU0: cppc scale: 779. > > > [ 1891.605360] CPU96: cppc scale: 829. > > > [ 1891.635360] CPU0: cppc scale: 879. > > > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > [ 1896.585363] CPU32: cppc scale: 1004. > > > [ 1896.675359] CPU64: cppc scale: 973. > > > [ 1896.715359] CPU0: cppc scale: 1024. > > > > > > I'm doing a rate limited printk only for increase/decrease values over > > > 64 in the scale factor value. > > > > > > This showed me that SMT is handled properly. > > > > > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor > > > stops being even close to correct, for example: > > > > > > [238394.770328] CPU96: cppc scale: 22328. > > > [238395.628846] CPU96: cppc scale: 245. > > > [238516.087115] CPU96: cppc scale: 930. > > > [238523.385009] CPU96: cppc scale: 245. > > > [238538.767473] CPU96: cppc scale: 936. > > > [238538.867546] CPU96: cppc scale: 245. > > > [238599.367932] CPU97: cppc scale: 2728. > > > [238599.859865] CPU97: cppc scale: 452. > > > [238647.786284] CPU96: cppc scale: 1438. > > > [238669.604684] CPU96: cppc scale: 27306. > > > [238676.805049] CPU96: cppc scale: 245. > > > [238737.642902] CPU97: cppc scale: 2035. > > > [238737.664995] CPU97: cppc scale: 452. > > > [238788.066193] CPU96: cppc scale: 2749. > > > [238788.110192] CPU96: cppc scale: 245. > > > [238817.231659] CPU96: cppc scale: 2698. > > > [238818.083687] CPU96: cppc scale: 245. > > > [238845.466850] CPU97: cppc scale: 2990. > > > [238847.477805] CPU97: cppc scale: 452. > > > [238936.984107] CPU97: cppc scale: 1590. > > > [238937.029079] CPU97: cppc scale: 452. > > > [238979.052464] CPU97: cppc scale: 911. > > > [238980.900668] CPU97: cppc scale: 452. > > > [239149.587889] CPU96: cppc scale: 803. > > > [239151.085516] CPU96: cppc scale: 245. > > > [239303.871373] CPU64: cppc scale: 956. > > > [239303.906837] CPU64: cppc scale: 245. > > > [239308.666786] CPU96: cppc scale: 821. > > > [239319.440634] CPU96: cppc scale: 245. > > > [239389.978395] CPU97: cppc scale: 4229. > > > [239391.969562] CPU97: cppc scale: 452. > > > [239415.894738] CPU96: cppc scale: 630. > > > [239417.875326] CPU96: cppc scale: 245. > > > > > > > With the counter being 32bits and the freq scaling being update at > > tick, you can easily get a overflow on it in idle system. I can easily > > imagine that when you unplug CPUs there is enough activity on the CPU > > to update it regularly whereas with all CPUs, the idle time is longer > > that the counter overflow For sane counters, how long the CPU is idle should not matter (please see below my definition of sane counters). > > > > Thanks! Yes, given the high wraparound time I thought they were 64 bit. > All variables in software are 64 bit, but looking at bit width in the > _CPC entries, the platform counters are 32 bit counters. > I've looked a bit more over the code, and considering this particular system (32 bit counters, maximum frequency of CPUs = 2.2GHz), I believe the wraparound is considered, and this should not cause these strange values in the scale factor. I consider the counters sane if both stop during idle - either they stop when CPU is clock gated, or some firmware does save/restore at core off. Therefore, in all idle cases they seem to have stopped, from the point of view of the OS. The ACPI spec mentions that both count "any time the processor is active". After the cores return from idle, the counters will wraparound at a minimum of 1.95s. So with a tick every 4ms at most 1 wraparound would have happened which allows the getDelta() function in cppc_cpufreq driver to get the proper difference in values. Let me know what you think. Thanks, Ionela. > > There are 32bits and the overflow need to be handled by cppc_cpufreq > > driver > > I'm wondering if this would be best handled in the function that reads > the counters or in the cppc_cpufreq driver that uses them. Probably the > latter, as you say, as the read function should only return the raw > values, but it does complicate things. > > Thanks, > Ionela. > > > > > > The counter values shown by feedback_ctrs do not seem monotonic even > > > when only core 0 threads are online. > > > > > > ref:2812420736 del:166051103 > > > ref:3683620736 del:641578595 > > > ref:1049653440 del:1548202980 > > > ref:2099053440 del:2120997459 > > > ref:3185853440 del:2714205997 > > > ref:712486144 del:3708490753 > > > ref:3658438336 del:3401357212 > > > ref:1570998080 del:2279728438 > > > > > > For now I was just wondering if you have seen the same and whether you > > > have an opinion on this. > > > > > > > This is tested on my Hikey platform (without the actual read/write to > > > > performance counters), with this script for over an hour: > > > > > > > > while true; do > > > > for i in `seq 1 7`; > > > > do > > > > echo 0 > /sys/devices/system/cpu/cpu$i/online; > > > > done; > > > > > > > > for i in `seq 1 7`; > > > > do > > > > echo 1 > /sys/devices/system/cpu/cpu$i/online; > > > > done; > > > > done > > > > > > > > > > > > The same is done by Vincent on ThunderX2 and no issues were seen. > > > > > > Hotplug worked fine for me as well on both platforms I tested (Juno R2 > > > and ThunderX2). > > > > > > Thanks, > > > Ionela.
On 25-06-21, 09:31, Qian Cai wrote: > The problem is that all CPUs are never scaling down. > "cpuinfo_cur_freq" and "scaling_cur_freq" are always the 2800 MHz on > all CPUs on this idle system. This looks like a regression somewhere > as in 5.4-based kernel, I can see "cpuinfo_cur_freq" can go down to > 2000 MHz in the same scenario. I'll bisect a bit unless you have > better ideas? Few things which may let us understand the readings properly. - cpuinfo_cur_freq: eventually makes a call to cppc_cpufreq_get_rate() and returns the *actual* frequency hardware is running at (based on counter diff around 2 us delay). - scaling_cur_freq: is the frequency the cpufreq core thinks the hardware is running at, it would more in sync with what schedutil (or other governors) wants the CPU to run at. This can be different from what the hardware is running at, i.e. given by cpuinfo_cur_freq. -- viresh
On 25-06-21, 22:29, Qian Cai wrote: > Ionela, I found that set ACPI_PROCESSOR=y instead of > ACPI_PROCESSOR=m will fix the previous mentioned issues here (any > explanations of that?) even though the scaling down is not perfect. Not sure how this affects it. > Now, we have the following on this idle system: > > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 79 1000000 > 1 1160000 > 73 1400000 > 1 2000000 > 4 2010000 > 1 2800000 > 1 860000 > > Even if I rerun a few times, there could still have a few CPUs > running lower than lowest_perf (1GHz). (Please wrap your lines at 80 columns, it makes it harder to read otherwise). I think only the counters stopping on idle can get us that. > Also, even though I set all CPUs to use "userspace" governor and set > freq to the lowest. A few CPUs keep changing at will. > > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 156 1000000 > 3 2000000 > 1 760000 I think this is expected since the hardware is in control of frequency here. The software can only request it to run at X frequency, the hardware may choose to do something else nevertheless.
On 26-06-21, 09:41, Qian Cai wrote: > Another date point is that set ACPI_CPPC_CPUFREQ_FIE=n fixed the issue that any CPU could run below the lowest freq. > > schedutil: > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 80 1000000 > 78 1400000 > 1 2010000 > 1 2800000 > > userspace: > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 158 1000000 > 2 2000000 ACPI_CPPC_CPUFREQ_FIE can play a role with schedutil, but not with userspace governor. Userspace doesn't use the values being updated by ACPI_CPPC_CPUFREQ_FIE. So I think the CPUs may not have been idle, just for some reason. Also, now that you are able run on latest kernel (linux-next), it would be better if we can talk in terms of that only going forward. 5.4 adds more to the already unstable results :)
On 28-06-21, 12:54, Ionela Voinescu wrote: > If you happen to have the data around, I would like to know more about > your observations on ThunderX2. > > > I tried ThunderX2 as well, with the following observations: > > Booting with userspace governor and all CPUs online, the CPPC frequency > scale factor was all over the place (even much larger than 1024). > > My initial assumptions: > - Counters do not behave properly in light of SMT > - Firmware does not do a good job to keep the reference and core > counters monotonic: save and restore at core off. > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of > a single core (part of policy0). With this all works very well: Interesting. > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed > root@target:/sys/devices/system/cpu/cpufreq/policy0# > [ 1863.095370] CPU96: cppc scale: 697. > [ 1863.175370] CPU0: cppc scale: 492. > [ 1863.215367] CPU64: cppc scale: 492. > [ 1863.235366] CPU96: cppc scale: 492. > [ 1863.485368] CPU32: cppc scale: 492. > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed > root@target:/sys/devices/system/cpu/cpufreq/policy0# > [ 1891.395363] CPU96: cppc scale: 558. > [ 1891.415362] CPU0: cppc scale: 595. > [ 1891.435362] CPU32: cppc scale: 615. > [ 1891.465363] CPU96: cppc scale: 635. > [ 1891.495361] CPU0: cppc scale: 673. > [ 1891.515360] CPU32: cppc scale: 703. > [ 1891.545360] CPU96: cppc scale: 738. > [ 1891.575360] CPU0: cppc scale: 779. > [ 1891.605360] CPU96: cppc scale: 829. > [ 1891.635360] CPU0: cppc scale: 879. > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed > root@target:/sys/devices/system/cpu/cpufreq/policy0# > [ 1896.585363] CPU32: cppc scale: 1004. > [ 1896.675359] CPU64: cppc scale: 973. > [ 1896.715359] CPU0: cppc scale: 1024. > > I'm doing a rate limited printk only for increase/decrease values over > 64 in the scale factor value. > > This showed me that SMT is handled properly. > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor > stops being even close to correct, for example: > > [238394.770328] CPU96: cppc scale: 22328. > [238395.628846] CPU96: cppc scale: 245. > [238516.087115] CPU96: cppc scale: 930. > [238523.385009] CPU96: cppc scale: 245. > [238538.767473] CPU96: cppc scale: 936. > [238538.867546] CPU96: cppc scale: 245. > [238599.367932] CPU97: cppc scale: 2728. > [238599.859865] CPU97: cppc scale: 452. > [238647.786284] CPU96: cppc scale: 1438. > [238669.604684] CPU96: cppc scale: 27306. > [238676.805049] CPU96: cppc scale: 245. > [238737.642902] CPU97: cppc scale: 2035. > [238737.664995] CPU97: cppc scale: 452. > [238788.066193] CPU96: cppc scale: 2749. > [238788.110192] CPU96: cppc scale: 245. > [238817.231659] CPU96: cppc scale: 2698. > [238818.083687] CPU96: cppc scale: 245. > [238845.466850] CPU97: cppc scale: 2990. > [238847.477805] CPU97: cppc scale: 452. > [238936.984107] CPU97: cppc scale: 1590. > [238937.029079] CPU97: cppc scale: 452. > [238979.052464] CPU97: cppc scale: 911. > [238980.900668] CPU97: cppc scale: 452. > [239149.587889] CPU96: cppc scale: 803. > [239151.085516] CPU96: cppc scale: 245. > [239303.871373] CPU64: cppc scale: 956. > [239303.906837] CPU64: cppc scale: 245. > [239308.666786] CPU96: cppc scale: 821. > [239319.440634] CPU96: cppc scale: 245. > [239389.978395] CPU97: cppc scale: 4229. > [239391.969562] CPU97: cppc scale: 452. > [239415.894738] CPU96: cppc scale: 630. > [239417.875326] CPU96: cppc scale: 245. > > The counter values shown by feedback_ctrs do not seem monotonic even > when only core 0 threads are online. > > ref:2812420736 del:166051103 > ref:3683620736 del:641578595 > ref:1049653440 del:1548202980 > ref:2099053440 del:2120997459 > ref:3185853440 del:2714205997 > ref:712486144 del:3708490753 > ref:3658438336 del:3401357212 > ref:1570998080 del:2279728438 > > For now I was just wondering if you have seen the same and whether you > have an opinion on this. I think we also saw numbers like this, which didn't explain a lot on ThunderX2. We thought they may be due to rounding issues, but the offlining stuff adds an interesting factor to that. -- viresh
On Mon, 28 Jun 2021 at 23:37, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > On Monday 28 Jun 2021 at 14:08:13 (+0100), Ionela Voinescu wrote: > > On Monday 28 Jun 2021 at 14:14:14 (+0200), Vincent Guittot wrote: > > > On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > > > > > Hi guys, > > > > > > > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote: > > > > > Hello, > > > > > > > > > > Changes since V2: > > > > > > > > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it > > > > > work using policy ->init() and exit() alone. > > > > > > > > > > - Two new cleanup patches 1/4 and 2/4. > > > > > > > > > > - Improved commit log of 3/4. > > > > > > > > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's > > > > > overlap (seen with Vincent's setup). > > > > > > > > > > > > > If you happen to have the data around, I would like to know more about > > > > your observations on ThunderX2. > > > > > > > > > > > > I tried ThunderX2 as well, with the following observations: > > > > > > > > Booting with userspace governor and all CPUs online, the CPPC frequency > > > > scale factor was all over the place (even much larger than 1024). > > > > > > > > My initial assumptions: > > > > - Counters do not behave properly in light of SMT > > > > - Firmware does not do a good job to keep the reference and core > > > > counters monotonic: save and restore at core off. > > > > > > > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of > > > > a single core (part of policy0). With this all works very well: > > > > > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > > [ 1863.095370] CPU96: cppc scale: 697. > > > > [ 1863.175370] CPU0: cppc scale: 492. > > > > [ 1863.215367] CPU64: cppc scale: 492. > > > > [ 1863.235366] CPU96: cppc scale: 492. > > > > [ 1863.485368] CPU32: cppc scale: 492. > > > > > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > > [ 1891.395363] CPU96: cppc scale: 558. > > > > [ 1891.415362] CPU0: cppc scale: 595. > > > > [ 1891.435362] CPU32: cppc scale: 615. > > > > [ 1891.465363] CPU96: cppc scale: 635. > > > > [ 1891.495361] CPU0: cppc scale: 673. > > > > [ 1891.515360] CPU32: cppc scale: 703. > > > > [ 1891.545360] CPU96: cppc scale: 738. > > > > [ 1891.575360] CPU0: cppc scale: 779. > > > > [ 1891.605360] CPU96: cppc scale: 829. > > > > [ 1891.635360] CPU0: cppc scale: 879. > > > > > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > > [ 1896.585363] CPU32: cppc scale: 1004. > > > > [ 1896.675359] CPU64: cppc scale: 973. > > > > [ 1896.715359] CPU0: cppc scale: 1024. > > > > > > > > I'm doing a rate limited printk only for increase/decrease values over > > > > 64 in the scale factor value. > > > > > > > > This showed me that SMT is handled properly. > > > > > > > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor > > > > stops being even close to correct, for example: > > > > > > > > [238394.770328] CPU96: cppc scale: 22328. > > > > [238395.628846] CPU96: cppc scale: 245. > > > > [238516.087115] CPU96: cppc scale: 930. > > > > [238523.385009] CPU96: cppc scale: 245. > > > > [238538.767473] CPU96: cppc scale: 936. > > > > [238538.867546] CPU96: cppc scale: 245. > > > > [238599.367932] CPU97: cppc scale: 2728. > > > > [238599.859865] CPU97: cppc scale: 452. > > > > [238647.786284] CPU96: cppc scale: 1438. > > > > [238669.604684] CPU96: cppc scale: 27306. > > > > [238676.805049] CPU96: cppc scale: 245. > > > > [238737.642902] CPU97: cppc scale: 2035. > > > > [238737.664995] CPU97: cppc scale: 452. > > > > [238788.066193] CPU96: cppc scale: 2749. > > > > [238788.110192] CPU96: cppc scale: 245. > > > > [238817.231659] CPU96: cppc scale: 2698. > > > > [238818.083687] CPU96: cppc scale: 245. > > > > [238845.466850] CPU97: cppc scale: 2990. > > > > [238847.477805] CPU97: cppc scale: 452. > > > > [238936.984107] CPU97: cppc scale: 1590. > > > > [238937.029079] CPU97: cppc scale: 452. > > > > [238979.052464] CPU97: cppc scale: 911. > > > > [238980.900668] CPU97: cppc scale: 452. > > > > [239149.587889] CPU96: cppc scale: 803. > > > > [239151.085516] CPU96: cppc scale: 245. > > > > [239303.871373] CPU64: cppc scale: 956. > > > > [239303.906837] CPU64: cppc scale: 245. > > > > [239308.666786] CPU96: cppc scale: 821. > > > > [239319.440634] CPU96: cppc scale: 245. > > > > [239389.978395] CPU97: cppc scale: 4229. > > > > [239391.969562] CPU97: cppc scale: 452. > > > > [239415.894738] CPU96: cppc scale: 630. > > > > [239417.875326] CPU96: cppc scale: 245. > > > > > > > > > > With the counter being 32bits and the freq scaling being update at > > > tick, you can easily get a overflow on it in idle system. I can easily > > > imagine that when you unplug CPUs there is enough activity on the CPU > > > to update it regularly whereas with all CPUs, the idle time is longer > > > that the counter overflow > > For sane counters, how long the CPU is idle should not matter (please > see below my definition of sane counters). > > > > > > > > Thanks! Yes, given the high wraparound time I thought they were 64 bit. > > All variables in software are 64 bit, but looking at bit width in the > > _CPC entries, the platform counters are 32 bit counters. > > > > I've looked a bit more over the code, and considering this particular > system (32 bit counters, maximum frequency of CPUs = 2.2GHz), I believe > the wraparound is considered, and this should not cause these strange > values in the scale factor. > > I consider the counters sane if both stop during idle - either they stop > when CPU is clock gated, or some firmware does save/restore at core off. > Therefore, in all idle cases they seem to have stopped, from the point of > view of the OS. The ACPI spec mentions that both count "any time the > processor is active". > > After the cores return from idle, the counters will wraparound at a > minimum of 1.95s. So with a tick every 4ms at most 1 wraparound would > have happened which allows the getDelta() function in cppc_cpufreq > driver to get the proper difference in values. > > Let me know what you think. This is not what I can see on my THX2. Counter are always increasing at the same pace even if the system is idle (nothing running except background activity) As long as you read counter often enough, the results is coherent with the freq that has been set by the governor > > Thanks, > Ionela. > > > > There are 32bits and the overflow need to be handled by cppc_cpufreq > > > driver > > > > I'm wondering if this would be best handled in the function that reads > > the counters or in the cppc_cpufreq driver that uses them. Probably the > > latter, as you say, as the read function should only return the raw > > values, but it does complicate things. > > > > Thanks, > > Ionela. > > > > > > > > > > The counter values shown by feedback_ctrs do not seem monotonic even > > > > when only core 0 threads are online. > > > > > > > > ref:2812420736 del:166051103 > > > > ref:3683620736 del:641578595 > > > > ref:1049653440 del:1548202980 > > > > ref:2099053440 del:2120997459 > > > > ref:3185853440 del:2714205997 > > > > ref:712486144 del:3708490753 > > > > ref:3658438336 del:3401357212 > > > > ref:1570998080 del:2279728438 > > > > > > > > For now I was just wondering if you have seen the same and whether you > > > > have an opinion on this. > > > > > > > > > This is tested on my Hikey platform (without the actual read/write to > > > > > performance counters), with this script for over an hour: > > > > > > > > > > while true; do > > > > > for i in `seq 1 7`; > > > > > do > > > > > echo 0 > /sys/devices/system/cpu/cpu$i/online; > > > > > done; > > > > > > > > > > for i in `seq 1 7`; > > > > > do > > > > > echo 1 > /sys/devices/system/cpu/cpu$i/online; > > > > > done; > > > > > done > > > > > > > > > > > > > > > The same is done by Vincent on ThunderX2 and no issues were seen. > > > > > > > > Hotplug worked fine for me as well on both platforms I tested (Juno R2 > > > > and ThunderX2). > > > > > > > > Thanks, > > > > Ionela.
Hi, On Tuesday 29 Jun 2021 at 10:50:28 (+0530), Viresh Kumar wrote: > On 28-06-21, 12:54, Ionela Voinescu wrote: > > If you happen to have the data around, I would like to know more about > > your observations on ThunderX2. > > > > > > I tried ThunderX2 as well, with the following observations: > > > > Booting with userspace governor and all CPUs online, the CPPC frequency > > scale factor was all over the place (even much larger than 1024). > > > > My initial assumptions: > > - Counters do not behave properly in light of SMT > > - Firmware does not do a good job to keep the reference and core > > counters monotonic: save and restore at core off. > > > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of > > a single core (part of policy0). With this all works very well: > > Interesting. > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1863.095370] CPU96: cppc scale: 697. > > [ 1863.175370] CPU0: cppc scale: 492. > > [ 1863.215367] CPU64: cppc scale: 492. > > [ 1863.235366] CPU96: cppc scale: 492. > > [ 1863.485368] CPU32: cppc scale: 492. > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1891.395363] CPU96: cppc scale: 558. > > [ 1891.415362] CPU0: cppc scale: 595. > > [ 1891.435362] CPU32: cppc scale: 615. > > [ 1891.465363] CPU96: cppc scale: 635. > > [ 1891.495361] CPU0: cppc scale: 673. > > [ 1891.515360] CPU32: cppc scale: 703. > > [ 1891.545360] CPU96: cppc scale: 738. > > [ 1891.575360] CPU0: cppc scale: 779. > > [ 1891.605360] CPU96: cppc scale: 829. > > [ 1891.635360] CPU0: cppc scale: 879. > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1896.585363] CPU32: cppc scale: 1004. > > [ 1896.675359] CPU64: cppc scale: 973. > > [ 1896.715359] CPU0: cppc scale: 1024. > > > > I'm doing a rate limited printk only for increase/decrease values over > > 64 in the scale factor value. > > > > This showed me that SMT is handled properly. > > > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor > > stops being even close to correct, for example: > > > > [238394.770328] CPU96: cppc scale: 22328. > > [238395.628846] CPU96: cppc scale: 245. > > [238516.087115] CPU96: cppc scale: 930. > > [238523.385009] CPU96: cppc scale: 245. > > [238538.767473] CPU96: cppc scale: 936. > > [238538.867546] CPU96: cppc scale: 245. > > [238599.367932] CPU97: cppc scale: 2728. > > [238599.859865] CPU97: cppc scale: 452. > > [238647.786284] CPU96: cppc scale: 1438. > > [238669.604684] CPU96: cppc scale: 27306. > > [238676.805049] CPU96: cppc scale: 245. > > [238737.642902] CPU97: cppc scale: 2035. > > [238737.664995] CPU97: cppc scale: 452. > > [238788.066193] CPU96: cppc scale: 2749. > > [238788.110192] CPU96: cppc scale: 245. > > [238817.231659] CPU96: cppc scale: 2698. > > [238818.083687] CPU96: cppc scale: 245. > > [238845.466850] CPU97: cppc scale: 2990. > > [238847.477805] CPU97: cppc scale: 452. > > [238936.984107] CPU97: cppc scale: 1590. > > [238937.029079] CPU97: cppc scale: 452. > > [238979.052464] CPU97: cppc scale: 911. > > [238980.900668] CPU97: cppc scale: 452. > > [239149.587889] CPU96: cppc scale: 803. > > [239151.085516] CPU96: cppc scale: 245. > > [239303.871373] CPU64: cppc scale: 956. > > [239303.906837] CPU64: cppc scale: 245. > > [239308.666786] CPU96: cppc scale: 821. > > [239319.440634] CPU96: cppc scale: 245. > > [239389.978395] CPU97: cppc scale: 4229. > > [239391.969562] CPU97: cppc scale: 452. > > [239415.894738] CPU96: cppc scale: 630. > > [239417.875326] CPU96: cppc scale: 245. > > > > The counter values shown by feedback_ctrs do not seem monotonic even > > when only core 0 threads are online. > > > > ref:2812420736 del:166051103 > > ref:3683620736 del:641578595 > > ref:1049653440 del:1548202980 > > ref:2099053440 del:2120997459 > > ref:3185853440 del:2714205997 > > ref:712486144 del:3708490753 > > ref:3658438336 del:3401357212 > > ref:1570998080 del:2279728438 > > > > For now I was just wondering if you have seen the same and whether you > > have an opinion on this. > > I think we also saw numbers like this, which didn't explain a lot on > ThunderX2. We thought they may be due to rounding issues, but the > offlining stuff adds an interesting factor to that. > More testing last night showed that having 1 core (with all 4 threads) online per socket works fine, while having 2 cores online per socket gives these bad values. My assumption is that the counters reset to 0 at core off which introduces the behavior. When there is a single core per socket, this single core cannot be turned off. I tried to boot with less CPUs and cpuidle.off=1 but it did not make a difference. I expect much of the idle control to be in hardware/firmware, so possibly cores were still turned off. I'll do more research on idle state management for SMT and debugging to see if it explains more, but it will take longer as I've ignored a few other responsibilities in the meantime. Thanks, Ionela. > -- > viresh
Hi Qian, Sorry for the delay. I was trying to run tests on ThunderX2 as well, as it can give me more control over testing, to get more insight on this. On Friday 25 Jun 2021 at 22:29:26 (-0400), Qian Cai wrote: > > > On 6/25/2021 10:37 AM, Ionela Voinescu wrote: > > Quick questions for you: > > > > 1. When you say you tried a 5.4 kernel, did you try it with these > > patches backported? They also have some dependencies with the recent > > changes in the arch topology driver and cpufreq so they would not be > > straight forward to backport. > > > > If the 5.4 kernel you tried did not have these patches, it might be best > > to try next/master that has these patches, but with > > CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that > > an incorrect frequency scale factor here would affect utilization that > > would then affect the schedutil frequency selection. I would not expect > > this behavior even if the scale factor was wrong, but it would be good > > to rule out. > > > > 2. Is your platform booting with all CPUs? Are any hotplug operations > > done in your scenario? > > Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m > will fix the previous mentioned issues here (any explanations of that?) > even though the scaling down is not perfect. Now, we have the following > on this idle system: > I don't see how this would have played a role. The cppc cpufreq driver depends on functionality gated by CONFIG_ACPI_CPPC_LIB which in turn needs CONFIG_ACPI_PROCESSOR to trigger the parsing of the _CPC objects. If CONFIG_ACPI_PROCESSOR functionality is not built in or loaded, the cppc cpufreq driver could not be used at all. > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 79 1000000 > 1 1160000 > 73 1400000 > 1 2000000 > 4 2010000 > 1 2800000 > 1 860000 > > Even if I rerun a few times, there could still have a few CPUs running > lower than lowest_perf (1GHz). Also, even though I set all CPUs to use > "userspace" governor and set freq to the lowest. A few CPUs keep changing > at will. > I do not believe getting values lower than lowest is worrying as long as they are not much much lower and they don't happen very often. First of all firmware has control over CPU frequencies and it can decide to select a lower frequency if it wishes. Looking at the fact that it only happens rarely in your tests, it's also possible that this is introduced by the fact that the CPU only spends only a few cycles in active state. That would reduce the resolution of the computed current performance level, which results in a lower value when converted to frequency. Hope it helps, Ionela. > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 156 1000000 > 3 2000000 > 1 760000
On 6/29/2021 5:06 AM, Ionela Voinescu wrote: >> Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m >> will fix the previous mentioned issues here (any explanations of that?) >> even though the scaling down is not perfect. Now, we have the following >> on this idle system: >> > > I don't see how this would have played a role. The cppc cpufreq driver > depends on functionality gated by CONFIG_ACPI_CPPC_LIB which in turn > needs CONFIG_ACPI_PROCESSOR to trigger the parsing of the _CPC objects. > If CONFIG_ACPI_PROCESSOR functionality is not built in or loaded, the > cppc cpufreq driver could not be used at all. Ionela, that is fine. I can live with setting ACPI_PROCESSOR=y to workaround. I was more of just curious about why manually loading processor.ko would set the lowest to 2GHz instead of 1GHz. Anyway, if running below the lowest is not the a concern, then I have concluded my testing here. Feel free to include: Tested-by: Qian Cai <quic_qiancai@quicinc.com>