Message ID | 20240127004321.1902477-2-davidai@google.com |
---|---|
State | New |
Series | [v5,1/2] dt-bindings: cpufreq: add virtual cpufreq device |
On Wed, Jan 31, 2024 at 10:23:03AM -0800, Saravana Kannan wrote:
> On Wed, Jan 31, 2024 at 9:06 AM Rob Herring <robh@kernel.org> wrote:
> >
> > On Fri, Jan 26, 2024 at 04:43:15PM -0800, David Dai wrote:
> > > Adding bindings to represent a virtual cpufreq device.
> > >
> > > Virtual machines may expose MMIO regions for a virtual cpufreq device
> > > for guests to read frequency information or to request frequency
> > > selection. The virtual cpufreq device has an individual controller for
> > > each frequency domain. Performance points for a given domain can be
> > > normalized across all domains for ease of allowing for virtual machines
> > > to migrate between hosts.
> > >
> > > Co-developed-by: Saravana Kannan <saravanak@google.com>
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > Signed-off-by: David Dai <davidai@google.com>
> > > ---
> > >  .../cpufreq/qemu,cpufreq-virtual.yaml | 110 ++++++++++++++++++
> >
> > > +    const: qemu,virtual-cpufreq
> >
> > Well, the filename almost matches the compatible.
> >
> > > +
> > > +  reg:
> > > +    maxItems: 1
> > > +    description:
> > > +      Address and size of region containing frequency controls for each of the
> > > +      frequency domains. Regions for each frequency domain is placed
> > > +      contiguously and contain registers for controlling DVFS(Dynamic Frequency
> > > +      and Voltage) characteristics. The size of the region is proportional to
> > > +      total number of frequency domains. This device also needs the CPUs to
> > > +      list their OPPs using operating-points-v2 tables. The OPP tables for the
> > > +      CPUs should use normalized "frequency" values where the OPP with the
> > > +      highest performance among all the vCPUs is listed as 1024 KHz. The rest
> > > +      of the frequencies of all the vCPUs should be normalized based on their
> > > +      performance relative to that 1024 KHz OPP. This makes it much easier to
> > > +      migrate the VM across systems which might have different physical CPU
> > > +      OPPs.
> > > +
> > > +required:
> > > +  - compatible
> > > +  - reg
> > > +
> > > +additionalProperties: false
> > > +
> > > +examples:
> > > +  - |
> > > +    // This example shows a two CPU configuration with a frequency domain
> > > +    // for each CPU showing normalized performance points.
> > > +    cpus {
> > > +      #address-cells = <1>;
> > > +      #size-cells = <0>;
> > > +
> > > +      cpu@0 {
> > > +        compatible = "arm,armv8";
> > > +        device_type = "cpu";
> > > +        reg = <0x0>;
> > > +        operating-points-v2 = <&opp_table0>;
> > > +      };
> > > +
> > > +      cpu@1 {
> > > +        compatible = "arm,armv8";
> > > +        device_type = "cpu";
> > > +        reg = <0x0>;
> > > +        operating-points-v2 = <&opp_table1>;
> > > +      };
> > > +    };
> > > +
> > > +    opp_table0: opp-table-0 {
> > > +      compatible = "operating-points-v2";
> > > +
> > > +      opp64000 { opp-hz = /bits/ 64 <64000>; };
> >
> > opp-64000 is the preferred form.
> >
> > > +      opp128000 { opp-hz = /bits/ 64 <128000>; };
> > > +      opp192000 { opp-hz = /bits/ 64 <192000>; };
> > > +      opp256000 { opp-hz = /bits/ 64 <256000>; };
> > > +      opp320000 { opp-hz = /bits/ 64 <320000>; };
> > > +      opp384000 { opp-hz = /bits/ 64 <384000>; };
> > > +      opp425000 { opp-hz = /bits/ 64 <425000>; };
> > > +    };
> > > +
> > > +    opp_table1: opp-table-1 {
> > > +      compatible = "operating-points-v2";
> > > +
> > > +      opp64000 { opp-hz = /bits/ 64 <64000>; };
> > > +      opp128000 { opp-hz = /bits/ 64 <128000>; };
> > > +      opp192000 { opp-hz = /bits/ 64 <192000>; };
> > > +      opp256000 { opp-hz = /bits/ 64 <256000>; };
> > > +      opp320000 { opp-hz = /bits/ 64 <320000>; };
> > > +      opp384000 { opp-hz = /bits/ 64 <384000>; };
> > > +      opp448000 { opp-hz = /bits/ 64 <448000>; };
> > > +      opp512000 { opp-hz = /bits/ 64 <512000>; };
> > > +      opp576000 { opp-hz = /bits/ 64 <576000>; };
> > > +      opp640000 { opp-hz = /bits/ 64 <640000>; };
> > > +      opp704000 { opp-hz = /bits/ 64 <704000>; };
> > > +      opp768000 { opp-hz = /bits/ 64 <768000>; };
> > > +      opp832000 { opp-hz = /bits/ 64 <832000>; };
> > > +      opp896000 { opp-hz = /bits/ 64 <896000>; };
> > > +      opp960000 { opp-hz = /bits/ 64 <960000>; };
> > > +      opp1024000 { opp-hz = /bits/ 64 <1024000>; };
> > > +
> > > +    };
> >
> > I don't recall your prior versions having an OPP table. Maybe it was
> > incomplete. You are designing the "h/w" interface. Why don't you make it
> > discoverable or implicit (fixed for the h/w)?
>
> We also need the OPP tables to indicate which CPUs are part of the
> same cluster, etc. Don't want to invent a new "protocol" and just use
> existing DT bindings.

Topology binding is for that.

What about when x86 and other ACPI systems need to do this too? You
define a discoverable interface, then it works regardless of firmware.
KVM, Virtio, VFIO, etc. are all their own protocols.

> > Do you really need it if the frequency is normalized?
>
> Yeah, we can have little and big CPUs and want to emulate different
> performance levels. So while the Fmax on big is 1024, we still want to
> be able to say little is 425. So we definitely need frequency tables.

You need per CPU Fmax, sure. But all the frequencies? I don't follow why
you don't just have a max available capacity and then request the
desired capacity. Then the host maps that to an underlying OPP. Why have
an intermediate set of fake frequencies?

As these are normalized, I guess you are normalizing for capacity as
well? Or you are using "capacity-dmips-mhz"?

I'm also lost how this would work when you migrate and the underlying
CPU changes. The DT is fixed.

> > Also, we have "opp-level" for opaque values that aren't Hz.
>
> Still want to keep it Hz to be compatible with arch_freq_scale and
> when virtualized CPU perf counters are available.

Seems like no one would want "opp-level" then. Shrug.

Anyway, if Viresh and Marc are fine with all this, I'll shut up.

Rob
diff --git a/Documentation/devicetree/bindings/cpufreq/qemu,cpufreq-virtual.yaml b/Documentation/devicetree/bindings/cpufreq/qemu,cpufreq-virtual.yaml
new file mode 100644
index 000000000000..cd617baf75e7
--- /dev/null
+++ b/Documentation/devicetree/bindings/cpufreq/qemu,cpufreq-virtual.yaml
@@ -0,0 +1,110 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/cpufreq/qemu,cpufreq-virtual.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Virtual CPUFreq
+
+maintainers:
+  - David Dai <davidai@google.com>
+  - Saravana Kannan <saravanak@google.com>
+
+description:
+  Virtual CPUFreq is a virtualized driver in guest kernels that sends frequency
+  selection of its vCPUs as a hint to the host through MMIO regions. Each vCPU
+  is associated with a frequency domain which can be shared with other vCPUs.
+  Each frequency domain has its own set of registers for frequency controls.
+
+properties:
+  compatible:
+    const: qemu,virtual-cpufreq
+
+  reg:
+    maxItems: 1
+    description:
+      Address and size of region containing frequency controls for each of the
+      frequency domains. Regions for each frequency domain is placed
+      contiguously and contain registers for controlling DVFS(Dynamic Frequency
+      and Voltage) characteristics. The size of the region is proportional to
+      total number of frequency domains. This device also needs the CPUs to
+      list their OPPs using operating-points-v2 tables. The OPP tables for the
+      CPUs should use normalized "frequency" values where the OPP with the
+      highest performance among all the vCPUs is listed as 1024 KHz. The rest
+      of the frequencies of all the vCPUs should be normalized based on their
+      performance relative to that 1024 KHz OPP. This makes it much easier to
+      migrate the VM across systems which might have different physical CPU
+      OPPs.
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
+
+examples:
+  - |
+    // This example shows a two CPU configuration with a frequency domain
+    // for each CPU showing normalized performance points.
+    cpus {
+      #address-cells = <1>;
+      #size-cells = <0>;
+
+      cpu@0 {
+        compatible = "arm,armv8";
+        device_type = "cpu";
+        reg = <0x0>;
+        operating-points-v2 = <&opp_table0>;
+      };
+
+      cpu@1 {
+        compatible = "arm,armv8";
+        device_type = "cpu";
+        reg = <0x0>;
+        operating-points-v2 = <&opp_table1>;
+      };
+    };
+
+    opp_table0: opp-table-0 {
+      compatible = "operating-points-v2";
+
+      opp64000 { opp-hz = /bits/ 64 <64000>; };
+      opp128000 { opp-hz = /bits/ 64 <128000>; };
+      opp192000 { opp-hz = /bits/ 64 <192000>; };
+      opp256000 { opp-hz = /bits/ 64 <256000>; };
+      opp320000 { opp-hz = /bits/ 64 <320000>; };
+      opp384000 { opp-hz = /bits/ 64 <384000>; };
+      opp425000 { opp-hz = /bits/ 64 <425000>; };
+    };
+
+    opp_table1: opp-table-1 {
+      compatible = "operating-points-v2";
+
+      opp64000 { opp-hz = /bits/ 64 <64000>; };
+      opp128000 { opp-hz = /bits/ 64 <128000>; };
+      opp192000 { opp-hz = /bits/ 64 <192000>; };
+      opp256000 { opp-hz = /bits/ 64 <256000>; };
+      opp320000 { opp-hz = /bits/ 64 <320000>; };
+      opp384000 { opp-hz = /bits/ 64 <384000>; };
+      opp448000 { opp-hz = /bits/ 64 <448000>; };
+      opp512000 { opp-hz = /bits/ 64 <512000>; };
+      opp576000 { opp-hz = /bits/ 64 <576000>; };
+      opp640000 { opp-hz = /bits/ 64 <640000>; };
+      opp704000 { opp-hz = /bits/ 64 <704000>; };
+      opp768000 { opp-hz = /bits/ 64 <768000>; };
+      opp832000 { opp-hz = /bits/ 64 <832000>; };
+      opp896000 { opp-hz = /bits/ 64 <896000>; };
+      opp960000 { opp-hz = /bits/ 64 <960000>; };
+      opp1024000 { opp-hz = /bits/ 64 <1024000>; };
+
+    };
+
+    soc {
+      #address-cells = <1>;
+      #size-cells = <1>;
+
+      cpufreq@1040000 {
+        compatible = "qemu,virtual-cpufreq";
+        reg = <0x1040000 0x10>;
+      };
+    };
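
For reference, the normalization rule in the binding description (highest-performing OPP across all vCPUs listed as 1024 KHz, everything else scaled relative to it) can be sketched roughly as below. This is only an illustration and not part of the patch: the function name, the dictionary layout, and the raw performance scores are all hypothetical, and how a host would actually measure per-OPP performance is not specified in the thread.

```python
def normalize_opps(perf_by_cpu, top_khz=1024):
    """Map raw per-OPP performance scores to normalized opp-hz values in Hz.

    perf_by_cpu is a hypothetical mapping of CPU name to a list of raw
    performance scores, one per OPP. The best score across all CPUs maps
    to top_khz; every other score is scaled relative to it.
    """
    best = max(score for scores in perf_by_cpu.values() for score in scores)
    return {
        cpu: [round(top_khz * score / best) * 1000 for score in scores]
        for cpu, scores in perf_by_cpu.items()
    }


# Hypothetical raw scores (assumption, chosen to match the 425/1024 split
# mentioned in the discussion): a little core and a big core.
raw = {"little": [128, 512, 850], "big": [512, 1024, 2048]}
tables = normalize_opps(raw)
```

With these made-up scores the big core's top OPP comes out as 1024000 Hz and the little core's as 425000 Hz, the same shape as the two opp-table nodes in the binding example.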