Message ID | 20231012175836.3408077-1-thierry.reding@gmail.com |
---|---|
Headers | show |
Series | thermal: tegra: Do not register cooling device | expand |
On 12/10/2023 19:58, Thierry Reding wrote: > From: Thierry Reding <treding@nvidia.com> > > Instead of passing around platform and plain devices and figuring out > the driver-private data within each helper, directly pass around the > driver-private data when it's available. > > Also store a pointer to the parent device in the main driver-private > data structure for easier access. > > Signed-off-by: Thierry Reding <treding@nvidia.com> > --- [ ... ] > -static void soctherm_debug_init(struct platform_device *pdev) > +static void soctherm_debug_init(struct tegra_soctherm *tegra) > { > - struct tegra_soctherm *tegra = platform_get_drvdata(pdev); > struct dentry *root; > > root = debugfs_create_dir("soctherm", NULL); > > tegra->debugfs_dir = root; > > - debugfs_create_file("reg_contents", 0644, root, pdev, ®s_fops); > + debugfs_create_file("reg_contents", 0644, root, tegra, ®s_fops); (Orthogonal to this series) : in case you are not aware of it there is the debugfs_create_regset32() function. That may make go away a bunch of code related to the debugfs code here. cf. https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/mediatek/lvts_thermal.c?h=thermal/bleeding-edge#n159 [ ... ]
Le jeu. 12 oct. 2023 à 19:58, Thierry Reding <thierry.reding@gmail.com> a écrit : > > From: Thierry Reding <treding@nvidia.com> > > Hi, > > this set of patches removes the registration of the SOCTHERM internal > throttling mechanism as cooling device. Since this throttling starts > automatically once a certain temperature threshold is crossed, it > doesn't make sense to represent it as a cooling device, which are > typically "manually" activated by the thermal framework when thermal > sensors report temperature thresholds being crossed. > > Instead of using the cooling device mechanism, this statically programs > the throttling mechanism when it is configured in device tree. In order > to do this, an additional device tree property is needed to replace the > information that was previously contained in trip points. > > There's a few preparatory patches to make the removal a bit simpler and > also some follow up cleanups included as well. > > Changes in v2: > - rework the device tree bindings: > - add nvidia,thermal-zones property to attach throttling to zones > - use -millicelsius suffix and add hysteresis > - add patch to store thermal zone device tree node for later use > - add patch to enforce self-encapsulation of the thermal core now that > no drivers need to reach into it anymore > > This applies on top of Daniel's self-encapsulation hardening series: > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > Thierry > > Thierry Reding (13): > thermal: Store device tree node for thermal zone devices > dt-bindings: thermal: tegra: Document throttle temperature > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > thermal: tegra: Use driver-private data consistently > thermal: tegra: Constify SoC-specific data > thermal: tegra: Do not register cooling device > thermal: tegra: Use unsigned int where appropriate > thermal: tegra: Avoid over-allocation of temporary array > thermal: tegra: Remove gratuitous error assignment > thermal: tegra: Minor stylistic cleanups > ARM: tegra: Rework SOCTHERM on Tegra124 > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > thermal: Enforce self-encapsulation > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > drivers/thermal/tegra/soctherm.h | 1 + > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > drivers/thermal/thermal_core.h | 2 +- > drivers/thermal/thermal_of.c | 3 + > 11 files changed, 329 insertions(+), 453 deletions(-) > > -- > 2.42.0 > I'm still experiencing the following message on jetson-tx1 with this serie applied on top of 6.6-rc5 (with iommu-next and tegra-next applied). oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: Failed to register thermal zone: -19 oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid prop Is this expected ?
On Fri, Oct 13, 2023 at 10:04:33AM +0200, Daniel Lezcano wrote: > On 12/10/2023 19:58, Thierry Reding wrote: > > From: Thierry Reding <treding@nvidia.com> > > > > Instead of passing around platform and plain devices and figuring out > > the driver-private data within each helper, directly pass around the > > driver-private data when it's available. > > > > Also store a pointer to the parent device in the main driver-private > > data structure for easier access. > > > > Signed-off-by: Thierry Reding <treding@nvidia.com> > > --- > > [ ... ] > > > -static void soctherm_debug_init(struct platform_device *pdev) > > +static void soctherm_debug_init(struct tegra_soctherm *tegra) > > { > > - struct tegra_soctherm *tegra = platform_get_drvdata(pdev); > > struct dentry *root; > > root = debugfs_create_dir("soctherm", NULL); > > tegra->debugfs_dir = root; > > - debugfs_create_file("reg_contents", 0644, root, pdev, ®s_fops); > > + debugfs_create_file("reg_contents", 0644, root, tegra, ®s_fops); > > (Orthogonal to this series) : in case you are not aware of it there is the > debugfs_create_regset32() function. That may make go away a bunch of code > related to the debugfs code here. > > cf. https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/mediatek/lvts_thermal.c?h=thermal/bleeding-edge#n159 Doesn't look like we could 1:1 transition to that because a bunch of the output is parameterized on some per-SoC variables, but it might be possible to rework the whole debugfs interfaces and split it up into multiple directories and files, much like LVTS does this. That's going to be quite a bit of work, but I can add it to my TODO list to look at when I get some spare time. Thierry
On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > <thierry.reding@gmail.com> a écrit : > > > > From: Thierry Reding <treding@nvidia.com> > > > > Hi, > > > > this set of patches removes the registration of the SOCTHERM internal > > throttling mechanism as cooling device. Since this throttling starts > > automatically once a certain temperature threshold is crossed, it > > doesn't make sense to represent it as a cooling device, which are > > typically "manually" activated by the thermal framework when thermal > > sensors report temperature thresholds being crossed. > > > > Instead of using the cooling device mechanism, this statically programs > > the throttling mechanism when it is configured in device tree. In order > > to do this, an additional device tree property is needed to replace the > > information that was previously contained in trip points. > > > > There's a few preparatory patches to make the removal a bit simpler and > > also some follow up cleanups included as well. > > > > Changes in v2: > > - rework the device tree bindings: > > - add nvidia,thermal-zones property to attach throttling to zones > > - use -millicelsius suffix and add hysteresis > > - add patch to store thermal zone device tree node for later use > > - add patch to enforce self-encapsulation of the thermal core now that > > no drivers need to reach into it anymore > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > > > Thierry > > > > Thierry Reding (13): > > thermal: Store device tree node for thermal zone devices > > dt-bindings: thermal: tegra: Document throttle temperature > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > thermal: tegra: Use driver-private data consistently > > thermal: tegra: Constify SoC-specific data > > thermal: tegra: Do not register cooling device > > thermal: tegra: Use unsigned int where appropriate > > thermal: tegra: Avoid over-allocation of temporary array > > thermal: tegra: Remove gratuitous error assignment > > thermal: tegra: Minor stylistic cleanups > > ARM: tegra: Rework SOCTHERM on Tegra124 > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > thermal: Enforce self-encapsulation > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > drivers/thermal/tegra/soctherm.h | 1 + > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > drivers/thermal/thermal_core.h | 2 +- > > drivers/thermal/thermal_of.c | 3 + > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > -- > > 2.42.0 > > > > I'm still experiencing the following message on jetson-tx1 with this > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > applied). > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > Failed to register thermal zone: -19 Not sure about this one. I don't think I've seen it. Do you know if that's somehow caused by this series, or is it preexisting? > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > prop > > Is this expected ? This one is definitely not expected. I have seen this before, and it happens when the device tree doesn't contain all the properties that are expected. Patch 12 in this series should take care of that. Have you made sure that that's been applied and is what the kernel uses to boot with? Thierry
Le ven. 13 oct. 2023 à 13:43, Thierry Reding <thierry.reding@gmail.com> a écrit : > > On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > > <thierry.reding@gmail.com> a écrit : > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > Hi, > > > > > > this set of patches removes the registration of the SOCTHERM internal > > > throttling mechanism as cooling device. Since this throttling starts > > > automatically once a certain temperature threshold is crossed, it > > > doesn't make sense to represent it as a cooling device, which are > > > typically "manually" activated by the thermal framework when thermal > > > sensors report temperature thresholds being crossed. > > > > > > Instead of using the cooling device mechanism, this statically programs > > > the throttling mechanism when it is configured in device tree. In order > > > to do this, an additional device tree property is needed to replace the > > > information that was previously contained in trip points. > > > > > > There's a few preparatory patches to make the removal a bit simpler and > > > also some follow up cleanups included as well. > > > > > > Changes in v2: > > > - rework the device tree bindings: > > > - add nvidia,thermal-zones property to attach throttling to zones > > > - use -millicelsius suffix and add hysteresis > > > - add patch to store thermal zone device tree node for later use > > > - add patch to enforce self-encapsulation of the thermal core now that > > > no drivers need to reach into it anymore > > > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > > > > > Thierry > > > > > > Thierry Reding (13): > > > thermal: Store device tree node for thermal zone devices > > > dt-bindings: thermal: tegra: Document throttle temperature > > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > > thermal: tegra: Use driver-private data consistently > > > thermal: tegra: Constify SoC-specific data > > > thermal: tegra: Do not register cooling device > > > thermal: tegra: Use unsigned int where appropriate > > > thermal: tegra: Avoid over-allocation of temporary array > > > thermal: tegra: Remove gratuitous error assignment > > > thermal: tegra: Minor stylistic cleanups > > > ARM: tegra: Rework SOCTHERM on Tegra124 > > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > > thermal: Enforce self-encapsulation > > > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > > drivers/thermal/tegra/soctherm.h | 1 + > > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > > drivers/thermal/thermal_core.h | 2 +- > > > drivers/thermal/thermal_of.c | 3 + > > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > > > -- > > > 2.42.0 > > > > > > > I'm still experiencing the following message on jetson-tx1 with this > > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > > applied). > > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > > Failed to register thermal zone: -19 > > Not sure about this one. I don't think I've seen it. Do you know if > that's somehow caused by this series, or is it preexisting? It's pre-existing from this serie. > > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > > prop > > > > Is this expected ? > > This one is definitely not expected. I have seen this before, and it > happens when the device tree doesn't contain all the properties that are > expected. Patch 12 in this series should take care of that. Have you > made sure that that's been applied and is what the kernel uses to boot > with? Yes, this dtb change in patch12 is propagated to the device (as seen in /boot/dtbs) But comparing with what's available at runtime in /proc/device-tree, I see some changes heavy { - hysteresis-millicelsius = <0xfa0>; + #cooling-cells = <0x02>; nvidia,cpu-throt-percent = <0x55>; nvidia,gpu-throt-level = <0x03>; nvidia,priority = <0x64>; - nvidia,thermal-zones = <0x49 0x4a>; - temperature-millicelsius = <0x180c4>; + phandle = <0x130>; }; I'm using u-boot 2023.07 with EFI boot (L4T 32.7.4). Could it be that the bootloader has changed these entries ? Can this be prevented ? (MAC ethernet address is set as appropropriate). Thanks
On Fri, Oct 13, 2023 at 02:45:57PM +0200, Nicolas Chauvet wrote: > Le ven. 13 oct. 2023 à 13:43, Thierry Reding > <thierry.reding@gmail.com> a écrit : > > > > On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > > > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > > > <thierry.reding@gmail.com> a écrit : > > > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > > > Hi, > > > > > > > > this set of patches removes the registration of the SOCTHERM internal > > > > throttling mechanism as cooling device. Since this throttling starts > > > > automatically once a certain temperature threshold is crossed, it > > > > doesn't make sense to represent it as a cooling device, which are > > > > typically "manually" activated by the thermal framework when thermal > > > > sensors report temperature thresholds being crossed. > > > > > > > > Instead of using the cooling device mechanism, this statically programs > > > > the throttling mechanism when it is configured in device tree. In order > > > > to do this, an additional device tree property is needed to replace the > > > > information that was previously contained in trip points. > > > > > > > > There's a few preparatory patches to make the removal a bit simpler and > > > > also some follow up cleanups included as well. > > > > > > > > Changes in v2: > > > > - rework the device tree bindings: > > > > - add nvidia,thermal-zones property to attach throttling to zones > > > > - use -millicelsius suffix and add hysteresis > > > > - add patch to store thermal zone device tree node for later use > > > > - add patch to enforce self-encapsulation of the thermal core now that > > > > no drivers need to reach into it anymore > > > > > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > > > > > > > Thierry > > > > > > > > Thierry Reding (13): > > > > thermal: Store device tree node for thermal zone devices > > > > dt-bindings: thermal: tegra: Document throttle temperature > > > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > > > thermal: tegra: Use driver-private data consistently > > > > thermal: tegra: Constify SoC-specific data > > > > thermal: tegra: Do not register cooling device > > > > thermal: tegra: Use unsigned int where appropriate > > > > thermal: tegra: Avoid over-allocation of temporary array > > > > thermal: tegra: Remove gratuitous error assignment > > > > thermal: tegra: Minor stylistic cleanups > > > > ARM: tegra: Rework SOCTHERM on Tegra124 > > > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > > > thermal: Enforce self-encapsulation > > > > > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > > > drivers/thermal/tegra/soctherm.h | 1 + > > > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > > > drivers/thermal/thermal_core.h | 2 +- > > > > drivers/thermal/thermal_of.c | 3 + > > > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > > > > > -- > > > > 2.42.0 > > > > > > > > > > I'm still experiencing the following message on jetson-tx1 with this > > > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > > > applied). > > > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > > > Failed to register thermal zone: -19 > > > > Not sure about this one. I don't think I've seen it. Do you know if > > that's somehow caused by this series, or is it preexisting? > > It's pre-existing from this serie. > > > > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > > > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > > > prop > > > > > > Is this expected ? > > > > This one is definitely not expected. I have seen this before, and it > > happens when the device tree doesn't contain all the properties that are > > expected. Patch 12 in this series should take care of that. Have you > > made sure that that's been applied and is what the kernel uses to boot > > with? > > Yes, this dtb change in patch12 is propagated to the device (as seen > in /boot/dtbs) > But comparing with what's available at runtime in /proc/device-tree, I > see some changes > > heavy { > - hysteresis-millicelsius = <0xfa0>; > + #cooling-cells = <0x02>; > nvidia,cpu-throt-percent = <0x55>; > nvidia,gpu-throt-level = <0x03>; > nvidia,priority = <0x64>; > - nvidia,thermal-zones = <0x49 0x4a>; > - temperature-millicelsius = <0x180c4>; > + phandle = <0x130>; > }; Okay, that explains the error message. > > I'm using u-boot 2023.07 with EFI boot (L4T 32.7.4). > Could it be that the bootloader has changed these entries ? Can this > be prevented ? I'm not aware of anything in the bootloader that would do this. Some versions of U-Boot that ships with L4T can copy certain nodes in DTB but I have never seen anything that would've touched thermal. Is it possible that you're not loading the DTB and end up receiving one from UEFI or cboot? > (MAC ethernet address is set as appropropriate). That's a completely separate mechanism and shouldn't touch thermal at all. Thierry
Le ven. 13 oct. 2023 à 15:13, Thierry Reding <thierry.reding@gmail.com> a écrit : > > On Fri, Oct 13, 2023 at 02:45:57PM +0200, Nicolas Chauvet wrote: > > Le ven. 13 oct. 2023 à 13:43, Thierry Reding > > <thierry.reding@gmail.com> a écrit : > > > > > > On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > > > > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > > > > <thierry.reding@gmail.com> a écrit : > > > > > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > > > > > Hi, > > > > > > > > > > this set of patches removes the registration of the SOCTHERM internal > > > > > throttling mechanism as cooling device. Since this throttling starts > > > > > automatically once a certain temperature threshold is crossed, it > > > > > doesn't make sense to represent it as a cooling device, which are > > > > > typically "manually" activated by the thermal framework when thermal > > > > > sensors report temperature thresholds being crossed. > > > > > > > > > > Instead of using the cooling device mechanism, this statically programs > > > > > the throttling mechanism when it is configured in device tree. In order > > > > > to do this, an additional device tree property is needed to replace the > > > > > information that was previously contained in trip points. > > > > > > > > > > There's a few preparatory patches to make the removal a bit simpler and > > > > > also some follow up cleanups included as well. > > > > > > > > > > Changes in v2: > > > > > - rework the device tree bindings: > > > > > - add nvidia,thermal-zones property to attach throttling to zones > > > > > - use -millicelsius suffix and add hysteresis > > > > > - add patch to store thermal zone device tree node for later use > > > > > - add patch to enforce self-encapsulation of the thermal core now that > > > > > no drivers need to reach into it anymore > > > > > > > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > > > > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > > > > > > > > > Thierry > > > > > > > > > > Thierry Reding (13): > > > > > thermal: Store device tree node for thermal zone devices > > > > > dt-bindings: thermal: tegra: Document throttle temperature > > > > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > > > > thermal: tegra: Use driver-private data consistently > > > > > thermal: tegra: Constify SoC-specific data > > > > > thermal: tegra: Do not register cooling device > > > > > thermal: tegra: Use unsigned int where appropriate > > > > > thermal: tegra: Avoid over-allocation of temporary array > > > > > thermal: tegra: Remove gratuitous error assignment > > > > > thermal: tegra: Minor stylistic cleanups > > > > > ARM: tegra: Rework SOCTHERM on Tegra124 > > > > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > > > > thermal: Enforce self-encapsulation > > > > > > > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > > > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > > > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > > > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > > > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > > > > drivers/thermal/tegra/soctherm.h | 1 + > > > > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > > > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > > > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > > > > drivers/thermal/thermal_core.h | 2 +- > > > > > drivers/thermal/thermal_of.c | 3 + > > > > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > > > > > > > -- > > > > > 2.42.0 > > > > > > > > > > > > > I'm still experiencing the following message on jetson-tx1 with this > > > > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > > > > applied). > > > > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > > > > Failed to register thermal zone: -19 > > > > > > Not sure about this one. I don't think I've seen it. Do you know if > > > that's somehow caused by this series, or is it preexisting? > > > > It's pre-existing from this serie. > > > > > > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > > > > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > > > > prop > > > > > > > > Is this expected ? > > > > > > This one is definitely not expected. I have seen this before, and it > > > happens when the device tree doesn't contain all the properties that are > > > expected. Patch 12 in this series should take care of that. Have you > > > made sure that that's been applied and is what the kernel uses to boot > > > with? > > > > Yes, this dtb change in patch12 is propagated to the device (as seen > > in /boot/dtbs) > > But comparing with what's available at runtime in /proc/device-tree, I > > see some changes > > > > heavy { > > - hysteresis-millicelsius = <0xfa0>; > > + #cooling-cells = <0x02>; > > nvidia,cpu-throt-percent = <0x55>; > > nvidia,gpu-throt-level = <0x03>; > > nvidia,priority = <0x64>; > > - nvidia,thermal-zones = <0x49 0x4a>; > > - temperature-millicelsius = <0x180c4>; > > + phandle = <0x130>; > > }; > > Okay, that explains the error message. > > > > > I'm using u-boot 2023.07 with EFI boot (L4T 32.7.4). > > Could it be that the bootloader has changed these entries ? Can this > > be prevented ? > > I'm not aware of anything in the bootloader that would do this. Some > versions of U-Boot that ships with L4T can copy certain nodes in DTB but > I have never seen anything that would've touched thermal. > > Is it possible that you're not loading the DTB and end up receiving one > from UEFI or cboot? Seems like it: a previous copy was in /boot/dts that took over /boot/dtbs. With updated dtb, the warning disappeared. Only remaining error is: kernel: max77620-thermal max77620-thermal: Failed to register thermal zone: -19 Thanks
On Fri, Oct 13, 2023 at 03:55:43PM +0200, Nicolas Chauvet wrote: > Le ven. 13 oct. 2023 à 15:13, Thierry Reding > <thierry.reding@gmail.com> a écrit : > > > > On Fri, Oct 13, 2023 at 02:45:57PM +0200, Nicolas Chauvet wrote: > > > Le ven. 13 oct. 2023 à 13:43, Thierry Reding > > > <thierry.reding@gmail.com> a écrit : > > > > > > > > On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > > > > > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > > > > > <thierry.reding@gmail.com> a écrit : > > > > > > > > > > > > From: Thierry Reding <treding@nvidia.com> > > > > > > > > > > > > Hi, > > > > > > > > > > > > this set of patches removes the registration of the SOCTHERM internal > > > > > > throttling mechanism as cooling device. Since this throttling starts > > > > > > automatically once a certain temperature threshold is crossed, it > > > > > > doesn't make sense to represent it as a cooling device, which are > > > > > > typically "manually" activated by the thermal framework when thermal > > > > > > sensors report temperature thresholds being crossed. > > > > > > > > > > > > Instead of using the cooling device mechanism, this statically programs > > > > > > the throttling mechanism when it is configured in device tree. In order > > > > > > to do this, an additional device tree property is needed to replace the > > > > > > information that was previously contained in trip points. > > > > > > > > > > > > There's a few preparatory patches to make the removal a bit simpler and > > > > > > also some follow up cleanups included as well. > > > > > > > > > > > > Changes in v2: > > > > > > - rework the device tree bindings: > > > > > > - add nvidia,thermal-zones property to attach throttling to zones > > > > > > - use -millicelsius suffix and add hysteresis > > > > > > - add patch to store thermal zone device tree node for later use > > > > > > - add patch to enforce self-encapsulation of the thermal core now that > > > > > > no drivers need to reach into it anymore > > > > > > > > > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > > > > > > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ > > > > > > > > > > > > Thierry > > > > > > > > > > > > Thierry Reding (13): > > > > > > thermal: Store device tree node for thermal zone devices > > > > > > dt-bindings: thermal: tegra: Document throttle temperature > > > > > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > > > > > thermal: tegra: Use driver-private data consistently > > > > > > thermal: tegra: Constify SoC-specific data > > > > > > thermal: tegra: Do not register cooling device > > > > > > thermal: tegra: Use unsigned int where appropriate > > > > > > thermal: tegra: Avoid over-allocation of temporary array > > > > > > thermal: tegra: Remove gratuitous error assignment > > > > > > thermal: tegra: Minor stylistic cleanups > > > > > > ARM: tegra: Rework SOCTHERM on Tegra124 > > > > > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > > > > > thermal: Enforce self-encapsulation > > > > > > > > > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > > > > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > > > > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > > > > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > > > > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > > > > > drivers/thermal/tegra/soctherm.h | 1 + > > > > > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > > > > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > > > > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > > > > > drivers/thermal/thermal_core.h | 2 +- > > > > > > drivers/thermal/thermal_of.c | 3 + > > > > > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > > > > > > > > > -- > > > > > > 2.42.0 > > > > > > > > > > > > > > > > I'm still experiencing the following message on jetson-tx1 with this > > > > > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > > > > > applied). > > > > > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > > > > > Failed to register thermal zone: -19 > > > > > > > > Not sure about this one. I don't think I've seen it. Do you know if > > > > that's somehow caused by this series, or is it preexisting? > > > > > > It's pre-existing from this serie. > > > > > > > > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > > > > > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > > > > > prop > > > > > > > > > > Is this expected ? > > > > > > > > This one is definitely not expected. I have seen this before, and it > > > > happens when the device tree doesn't contain all the properties that are > > > > expected. Patch 12 in this series should take care of that. Have you > > > > made sure that that's been applied and is what the kernel uses to boot > > > > with? > > > > > > Yes, this dtb change in patch12 is propagated to the device (as seen > > > in /boot/dtbs) > > > But comparing with what's available at runtime in /proc/device-tree, I > > > see some changes > > > > > > heavy { > > > - hysteresis-millicelsius = <0xfa0>; > > > + #cooling-cells = <0x02>; > > > nvidia,cpu-throt-percent = <0x55>; > > > nvidia,gpu-throt-level = <0x03>; > > > nvidia,priority = <0x64>; > > > - nvidia,thermal-zones = <0x49 0x4a>; > > > - temperature-millicelsius = <0x180c4>; > > > + phandle = <0x130>; > > > }; > > > > Okay, that explains the error message. > > > > > > > > I'm using u-boot 2023.07 with EFI boot (L4T 32.7.4). > > > Could it be that the bootloader has changed these entries ? Can this > > > be prevented ? > > > > I'm not aware of anything in the bootloader that would do this. Some > > versions of U-Boot that ships with L4T can copy certain nodes in DTB but > > I have never seen anything that would've touched thermal. > > > > Is it possible that you're not loading the DTB and end up receiving one > > from UEFI or cboot? > Seems like it: a previous copy was in /boot/dts that took over /boot/dtbs. > With updated dtb, the warning disappeared. Okay, great! > Only remaining error is: kernel: max77620-thermal max77620-thermal: > Failed to register thermal zone: -19 Looks like that's -ENODEV from devm_thermal_of_zone_register() via max77620_thermal_probe(). Looking at thermal_of_zone_register(), the -ENODEV case is treated specially, so perhaps we should be doing the same thing in max77620_probe()? Let me send out a patch. Thierry
On 12/10/2023 19:58, Thierry Reding wrote: > From: Thierry Reding <treding@nvidia.com> > > The throttle configurations need to be associated with one or more > thermal zones before they can be enabled, so introduce a new property, > called nvidia,thermal-zones, that contains a list of phandles to the > thermal zones for which a throttle configuration is applicable. > > Signed-off-by: Thierry Reding <treding@nvidia.com> > --- > Changes in v2: > - new patch to hook up throttling with thermal zones > > .../bindings/thermal/nvidia,tegra124-soctherm.yaml | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/Documentation/devicetree/bindings/thermal/nvidia,tegra124-soctherm.yaml b/Documentation/devicetree/bindings/thermal/nvidia,tegra124-soctherm.yaml > index 0eb6277082fe..359344f60a6e 100644 > --- a/Documentation/devicetree/bindings/thermal/nvidia,tegra124-soctherm.yaml > +++ b/Documentation/devicetree/bindings/thermal/nvidia,tegra124-soctherm.yaml > @@ -161,6 +161,11 @@ properties: > throttling is engaged after the OC event is deasserted. > default: 0 > > + nvidia,thermal-zones: > + $ref: /schemas/types.yaml#/definitions/phandle > + description: A list of phandles to the thermal zones that this > + throttle configuration applies to. > + From a DT perspective, I believe it is more correct to point to the devices the hardware throttling mechanism applies to instead of the thermal zones which is kind of software component > # optional > nvidia,thermtrips: > $ref: /schemas/types.yaml#/definitions/uint32-matrix
From: Thierry Reding <treding@nvidia.com> Hi, this set of patches removes the registration of the SOCTHERM internal throttling mechanism as cooling device. Since this throttling starts automatically once a certain temperature threshold is crossed, it doesn't make sense to represent it as a cooling device, which are typically "manually" activated by the thermal framework when thermal sensors report temperature thresholds being crossed. Instead of using the cooling device mechanism, this statically programs the throttling mechanism when it is configured in device tree. In order to do this, an additional device tree property is needed to replace the information that was previously contained in trip points. There's a few preparatory patches to make the removal a bit simpler and also some follow up cleanups included as well. Changes in v2: - rework the device tree bindings: - add nvidia,thermal-zones property to attach throttling to zones - use -millicelsius suffix and add hysteresis - add patch to store thermal zone device tree node for later use - add patch to enforce self-encapsulation of the thermal core now that no drivers need to reach into it anymore This applies on top of Daniel's self-encapsulation hardening series: https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@linaro.org/ Thierry Thierry Reding (13): thermal: Store device tree node for thermal zone devices dt-bindings: thermal: tegra: Document throttle temperature dt-bindings: thermal: tegra: Add nvidia,thermal-zones property thermal: tegra: Use driver-private data consistently thermal: tegra: Constify SoC-specific data thermal: tegra: Do not register cooling device thermal: tegra: Use unsigned int where appropriate thermal: tegra: Avoid over-allocation of temporary array thermal: tegra: Remove gratuitous error assignment thermal: tegra: Minor stylistic cleanups ARM: tegra: Rework SOCTHERM on Tegra124 arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 thermal: Enforce self-encapsulation .../thermal/nvidia,tegra124-soctherm.yaml | 19 + arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- drivers/thermal/tegra/soctherm.h | 1 + drivers/thermal/tegra/tegra124-soctherm.c | 4 + drivers/thermal/tegra/tegra132-soctherm.c | 4 + drivers/thermal/tegra/tegra210-soctherm.c | 4 + drivers/thermal/thermal_core.h | 2 +- drivers/thermal/thermal_of.c | 3 + 11 files changed, 329 insertions(+), 453 deletions(-)