Message ID | 20250530103527.2244951-1-d-gole@ti.com |
---|---|
State | New |
Headers | show |
Series | [RFC] pmdomain: arm: scmi_pm_domain: Do lazy init as part of probe | expand |
Hi, On May 30, 2025 at 17:31:08 +0100, Sudeep Holla wrote: > On Fri, May 30, 2025 at 04:05:27PM +0530, Dhruva Gole wrote: > > Optimize the SCMI power domain driver to only initialize domains that are > > actually referenced in the device tree. Previously, the driver would > > initialize all possible domains up to the maximum ID, which could lead to > > unnecessary firmware calls and longer probe times. > > > > Key changes: > > - Scan device tree to identify which power domains are actually referenced > > How do this work with runtime DT overlays ? Thanks for bringing this up, I hadn't considered runtime DT overlays for this particular patch. Off the top of my mind, we can initialize all at probe, but only query state for referenced ones. > > > - Use bitmap to track needed domains instead of initializing all > > - Only perform state queries and initialization for referenced domains > > - Maintain proper array sizing for power domain framework compatibility > > - Keep full provider structure to support late binding > > > > This optimization reduces probe time and unnecessary firmware interactions > > by only initializing power domains that are actually used in the system. > > Why is this very specific to power domains only ? This must apply for other > domains like perf or clock or reset ? Yes, it should. Starting out with just power domains for now though. I haven't looked at other places like perf and clock yet. > > > For example, in a system with 100 possible domains but only 3 referenced > > in the device tree, we now only initialize those 3 domains instead of > > all 100. > > > > Well, how much of these PD will get used in the final products ? I can > understand the need to use just 3 in devel platforms. Just trying to see > how realistic is the scenario ? Is there any other optimisation possible > from the firmware ? Does getting the state of a PD takes so much time > on the platform in question ? Well, it's not only about how much time it takes on any particular platform to query the state of a PD. Even if say it takes 10us to query it, it will add a whole 1ms to the probe time. This mean 1ms more of boot time, and perhaps boot time experts can chime in here but every optimisation possible matters! ARM systems are usually very strict on boot time requirements. Even if we somehow optimise the firmware, to me it seems like kernel is wasting time querrying for something that it doesn't need at that moment. Just replying here to your next reply on the thread: > And I missed another point. Someone will soon complain Linux no longer > turns off unused power domains. So I am inclined against this optimisation. > We need to consider all the above point before . I agree on some points like the runtime overlay. However I am not sure why Linux should be the one to turn OFF power domains that it's _unaware_ of. "unused" would probably be a wrong assumption to make. There maybe other entities that would be using power domains that are not in the device tree. (For eg. an RTOS running on the same system accessing a PD that is controlled by the SCP firmware but not mentioned in the device tree.) While one may argue that's why firmware should be the one to keep ref counts to avoid resource conflicts, one could also argue that firmwares could be the one to turn off unused power domains. Why should the kernel touch something that it hasn't been told about explicitly in the DT? Isn't the whole point of DT to be the literal hardware descriptor for the kernel? So if something doesn't exist in the DT, kernel shouldn't have other places telling it does - in this context power domains.
On Tue, Jun 10, 2025 at 05:13:14PM +0530, Dhruva Gole wrote: > Hi, > > On May 30, 2025 at 17:31:08 +0100, Sudeep Holla wrote: > > On Fri, May 30, 2025 at 04:05:27PM +0530, Dhruva Gole wrote: > > > Optimize the SCMI power domain driver to only initialize domains that are > > > actually referenced in the device tree. Previously, the driver would > > > initialize all possible domains up to the maximum ID, which could lead to > > > unnecessary firmware calls and longer probe times. > > > > > > Key changes: > > > - Scan device tree to identify which power domains are actually referenced > > Hi, forgot this thread... > > -- > Best regards, > Dhruva Gole > Texas Instruments Incorporated > > How do this work with runtime DT overlays ? > > Thanks for bringing this up, I hadn't considered runtime DT overlays for > this particular patch. > Off the top of my mind, we can initialize all at probe, but only query state > for referenced ones. Indeed, DT overlays is a good point, I missed that issue... > > > > > > - Use bitmap to track needed domains instead of initializing all > > > - Only perform state queries and initialization for referenced domains > > > - Maintain proper array sizing for power domain framework compatibility > > > - Keep full provider structure to support late binding > > > > > > This optimization reduces probe time and unnecessary firmware interactions > > > by only initializing power domains that are actually used in the system. > > > > Why is this very specific to power domains only ? This must apply for other > > domains like perf or clock or reset ? > > Yes, it should. Starting out with just power domains for now though. I > haven't looked at other places like perf and clock yet. > > > > > > For example, in a system with 100 possible domains but only 3 referenced > > > in the device tree, we now only initialize those 3 domains instead of > > > all 100. > > > > > > > Well, how much of these PD will get used in the final products ? I can > > understand the need to use just 3 in devel platforms. Just trying to see > > how realistic is the scenario ? Is there any other optimisation possible > > from the firmware ? Does getting the state of a PD takes so much time > > on the platform in question ? > > Well, it's not only about how much time it takes on any particular platform > to query the state of a PD. Even if say it takes 10us to query it, it will > add a whole 1ms to the probe time. This mean 1ms more of boot time, and > perhaps boot time experts can chime in here but every optimisation > possible matters! > ARM systems are usually very strict on boot time requirements. > > Even if we somehow optimise the firmware, to me it seems like kernel is > wasting time querrying for something that it doesn't need at that moment. > I can agree on this, and it would be interesting to optimize the usual SCMI 'query storm' at probe time across all protocols (liek Sudeep suggested), BUT first of all I would ask, if Linux refers just 3 resources why the firmware, when queried, returns the full set of N resources....one of the selling point of the SCMI protocol (especially in virtualized envs) should be that the server can expose a per-agent view so that each agent on the same system will see just the stuff that it needs, or in some case have limited access to some resources. So my point is, why are you (linux agent) even able to see such resources when querying the server, if those resources are NOT meant to be accessed at all by that agent in any capacity, not even read-oly ? I could understand that a small ratio of exposed/used resources for the sake of easy of configuration BUT it would seem unreasonable to, say, expose 100 or 100 of resources to agent A if such agent only use 3/4...consider that in turn, exposing all of such unused resources, will impact query time and query traffic (beside later also being needlessly registered with the core without your patch) > Just replying here to your next reply on the thread: > > And I missed another point. Someone will soon complain Linux no longer > > turns off unused power domains. So I am inclined against this optimisation. > > We need to consider all the above point before . > > I agree on some points like the runtime overlay. However I am not sure > why Linux should be the one to turn OFF power domains that it's > _unaware_ of. "unused" would probably be a wrong assumption to make. > There maybe other entities that would be using power domains that > are not in the device tree. (For eg. an RTOS running on the same system > accessing a PD that is controlled by the SCP firmware but not mentioned in the > device tree.) > > While one may argue that's why firmware should be the one to keep ref > counts to avoid resource conflicts, one could also argue that firmwares > could be the one to turn off unused power domains. > > Why should the kernel touch something that it hasn't been told about > explicitly in the DT? Isn't the whole point of DT to be the literal > hardware descriptor for the kernel? So if something doesn't exist in the > DT, kernel shouldn't have other places telling it does - in this context > power domains. As per my understanding, in a subsystem like clocks, the Kernel will turn off any unused (from his pov) resource that is found to be on at the end of the probe phase of the subsystem itself...in other words kernel will do try to turn off any resource that it has no use for it (so that it has not found in the DT)...so currently if we register 100 resources (all that we found) and only 3 are used (referred) it will issue an OFF for all the remaining unused 97... ..and you are right, there will be other non-Linux agents on the system issuing their requests on that same resources if they are shared, and the server will have to refcount such requests to act properly when handling shared resources, so if res_X is not used by the Kernel and we just dont register it with the Kernel subsystem (like in your patch), it wont be turned off as unused anymore AND that would be FINE, since, this agent has NO use for them and never issued any ON-request before... ...BUT.... :P ...the issue would be when you have a system where you have something like an SCMI capable bootloader that will have turned ON some of these shared resources during its boot-phase AND then left those resources ON, as a courtesy :P, so that the 'incoming' booting Kernel can found them ON (I think usually is what happens with resources related to displays if you dont want to disrupt the boot..) ...in such a scenario the booting Kernel takes-over, in the eyes of the server, the same agent role as the bootloader which is substituting, since it highjacks and uses the same SCMI channel as the disappearfing bootloader.. ...so the server will see that resources as refcounted+1 (left by the ghost of the bootloader :D)...so for power optimnization the Kernel will issue an OFF on all unused resources and that will cause a refcount-1 server side and the final physical OFF when all the other possible users of that shared resource will have released it... ...if you just dont register such resource becasue it is NOT referred in the DT, surely the Kernel wont try to turn it off as unused and so you will break this case.... ..same if the bootloader is NON-SCMI compliant and just leave a resource ON... ..all of this complicated by the never ending question around which resource state should be returned on query: physical vs virtual (currently is IMPDEF) So I think your optimization is nice in general (even better for all protocols) BUT we should address all these corner cases somehow... ... or proof all of my hallucination above non-existent/non-accurate :P, which is posibility since we discuss all of this more than 1 year ago... Thanks, Cristian
On Tue, Jun 10, 2025 at 05:13:14PM +0530, Dhruva Gole wrote: > Hi, > > On May 30, 2025 at 17:31:08 +0100, Sudeep Holla wrote: > > On Fri, May 30, 2025 at 04:05:27PM +0530, Dhruva Gole wrote: > > > Optimize the SCMI power domain driver to only initialize domains that are > > > actually referenced in the device tree. Previously, the driver would > > > initialize all possible domains up to the maximum ID, which could lead to > > > unnecessary firmware calls and longer probe times. > > > > > > Key changes: > > > - Scan device tree to identify which power domains are actually referenced > > > > How do this work with runtime DT overlays ? > > Thanks for bringing this up, I hadn't considered runtime DT overlays for > this particular patch. > Off the top of my mind, we can initialize all at probe, but only query state > for referenced ones. > Good to know. > > > > > - Use bitmap to track needed domains instead of initializing all > > > - Only perform state queries and initialization for referenced domains > > > - Maintain proper array sizing for power domain framework compatibility > > > - Keep full provider structure to support late binding > > > > > > This optimization reduces probe time and unnecessary firmware interactions > > > by only initializing power domains that are actually used in the system. > > > > Why is this very specific to power domains only ? This must apply for other > > domains like perf or clock or reset ? > > Yes, it should. Starting out with just power domains for now though. I > haven't looked at other places like perf and clock yet. > Good that we are aligned in terms of understanding the general depth of the issue. > > > > > For example, in a system with 100 possible domains but only 3 referenced > > > in the device tree, we now only initialize those 3 domains instead of > > > all 100. > > > > > > > Well, how much of these PD will get used in the final products ? I can > > understand the need to use just 3 in devel platforms. Just trying to see > > how realistic is the scenario ? Is there any other optimisation possible > > from the firmware ? Does getting the state of a PD takes so much time > > on the platform in question ? > > Well, it's not only about how much time it takes on any particular platform > to query the state of a PD. Even if say it takes 10us to query it, it will > add a whole 1ms to the probe time. This mean 1ms more of boot time, and > perhaps boot time experts can chime in here but every optimisation > possible matters! > ARM systems are usually very strict on boot time requirements. > Understood, but where we consider this as micro optimisation or not is another topic all together IMO. > Even if we somehow optimise the firmware, to me it seems like kernel is > wasting time querrying for something that it doesn't need at that moment. > Well agreed, but for overlays this may not be trivial. More fundamental questions below. > Just replying here to your next reply on the thread: > > And I missed another point. Someone will soon complain Linux no longer > > turns off unused power domains. So I am inclined against this optimisation. > > We need to consider all the above point before . > > I agree on some points like the runtime overlay. However I am not sure > why Linux should be the one to turn OFF power domains that it's > _unaware_ of. "unused" would probably be a wrong assumption to make. I have to disagree here. If the firmware is making the kernel aware of N power domains, then just not using it in DT and labeling the kernel is unaware of those PDs is simply and completely wrong. Why did the firmware make this(Linux) agent aware of those PDs then ? IMO, it is just *unused*. > There maybe other entities that would be using power domains that > are not in the device tree. (For eg. an RTOS running on the same system > accessing a PD that is controlled by the SCP firmware but not mentioned in the > device tree.) > Well I assume those RTOS or entities are different agents in terms of SCP/SCMI and they can have their own view of PDs > While one may argue that's why firmware should be the one to keep ref > counts to avoid resource conflicts, one could also argue that firmwares > could be the one to turn off unused power domains. > Yes firmware has the final say, but it can't track what resources Linux is using and what needs to stay on or off especially if no request from OS has come. There was an issue that if bootloaders turn on PD and Linux avoids making explicitly request to the firmware as it finds it ON, then firmware will leave it on even if Linux is not using it. So, yes it is Linux responsibility to turn of any unused PDs. We are not going to debate on that as there are platforms that are already relying on this feature. > Why should the kernel touch something that it hasn't been told about > explicitly in the DT? Isn't the whole point of DT to be the literal > hardware descriptor for the kernel? Well, I understand and kind of agree. But why does the firmware inform Linux about PDs that are never used by the Linux and always used by some other agent like RTOS. That is the main point here and nothing related to DT. We rely or trust the information from the firmware to be more dynamic compared to DT. So DT is second class citizen in this context. It only provides info that firmware can't provide not the other way around. That's the whole point of moving towards this firmware interfaces that are more dynamic and runtime aware than the DT which are static. > So if something doesn't exist in the DT, kernel shouldn't have other > places telling it does - in this context power domains. > Firmware is discoverable, we only add information that are hard to discover like domain IDs assigned to the device and so on in the DT. But information from the firmware must be always more accurate than DT. So if the firmware says there are 100 PDs for Linux agent but 50 of them are exclusively used for some other agent, then something wrong with the firmware here.
diff --git a/drivers/pmdomain/arm/scmi_pm_domain.c b/drivers/pmdomain/arm/scmi_pm_domain.c index 2a213c218126..8acbfe249ed2 100644 --- a/drivers/pmdomain/arm/scmi_pm_domain.c +++ b/drivers/pmdomain/arm/scmi_pm_domain.c @@ -10,6 +10,8 @@ #include <linux/module.h> #include <linux/pm_domain.h> #include <linux/scmi_protocol.h> +#include <linux/of.h> + static const struct scmi_power_proto_ops *power_ops; @@ -47,14 +49,20 @@ static int scmi_pd_power_off(struct generic_pm_domain *domain) static int scmi_pm_domain_probe(struct scmi_device *sdev) { - int num_domains, i; struct device *dev = &sdev->dev; - struct device_node *np = dev->of_node; + struct device_node *np; struct scmi_pm_domain *scmi_pd; - struct genpd_onecell_data *scmi_pd_data; - struct generic_pm_domain **domains; + struct of_phandle_args args; const struct scmi_handle *handle = sdev->handle; struct scmi_protocol_handle *ph; + struct genpd_onecell_data *scmi_pd_data; + struct generic_pm_domain **domains; + int max_id = -1; + int index, num_domains; + ktime_t start_time = ktime_get(); + unsigned long *domain_ids; + + dev_err(dev, "Starting optimized SCMI power domain probe\n"); if (!handle) return -ENODEV; @@ -69,54 +77,90 @@ static int scmi_pm_domain_probe(struct scmi_device *sdev) return num_domains; } - scmi_pd = devm_kcalloc(dev, num_domains, sizeof(*scmi_pd), GFP_KERNEL); - if (!scmi_pd) + domain_ids = devm_bitmap_zalloc(dev, num_domains, GFP_KERNEL); + if (!domain_ids) return -ENOMEM; + /* Find referenced domain IDs and mark them in bitmap */ + for_each_node_with_property(np, "power-domains") { + index = 0; + while (!of_parse_phandle_with_args(np, "power-domains", + "#power-domain-cells", + index, &args)) { + if (args.args_count >= 1 && args.np == dev->of_node) { + int id = args.args[0]; + if (id < num_domains) { + set_bit(id, domain_ids); + max_id = max(max_id, id); + dev_dbg(dev, "Found power domain reference %d from node %pOF\n", + id, np); + } + } + of_node_put(args.np); + index++; + } + } + + if (max_id < 0) { + dev_warn(dev, "No power domains referenced in device tree\n"); + /* Create provider anyway as domains might be referenced later */ + max_id = 0; + } + + dev_warn(dev, "Highest referenced domain ID: %d\n", max_id); + scmi_pd_data = devm_kzalloc(dev, sizeof(*scmi_pd_data), GFP_KERNEL); if (!scmi_pd_data) return -ENOMEM; - domains = devm_kcalloc(dev, num_domains, sizeof(*domains), GFP_KERNEL); + domains = devm_kcalloc(dev, max_id + 1, sizeof(*domains), GFP_KERNEL); if (!domains) return -ENOMEM; - for (i = 0; i < num_domains; i++, scmi_pd++) { + scmi_pd = devm_kcalloc(dev, max_id + 1, sizeof(*scmi_pd), GFP_KERNEL); + if (!scmi_pd) + return -ENOMEM; + + /* Initialize only referenced domains */ + for_each_set_bit(index, domain_ids, num_domains) { u32 state; - if (power_ops->state_get(ph, i, &state)) { - dev_warn(dev, "failed to get state for domain %d\n", i); + if (power_ops->state_get(ph, index, &state)) { + dev_err(dev, "Domain %d not available\n", index); continue; } - /* - * Register the explicit power on request to the firmware so - * that it is tracked as used by OSPM agent and not - * accidentally turned off with OSPM's knowledge - */ - if (state == SCMI_POWER_STATE_GENERIC_ON) - power_ops->state_set(ph, i, state); + dev_warn(dev, "Initializing referenced domain %d\n", index); - scmi_pd->domain = i; + scmi_pd->domain = index; scmi_pd->ph = ph; - scmi_pd->name = power_ops->name_get(ph, i); + scmi_pd->name = power_ops->name_get(ph, index); scmi_pd->genpd.name = scmi_pd->name; scmi_pd->genpd.power_off = scmi_pd_power_off; scmi_pd->genpd.power_on = scmi_pd_power_on; scmi_pd->genpd.flags = GENPD_FLAG_ACTIVE_WAKEUP; + if (state == SCMI_POWER_STATE_GENERIC_ON) { + dev_warn(dev, "Domain %d is ON, registering state\n", index); + power_ops->state_set(ph, index, state); + } + pm_genpd_init(&scmi_pd->genpd, NULL, state == SCMI_POWER_STATE_GENERIC_OFF); - domains[i] = &scmi_pd->genpd; + domains[index] = &scmi_pd->genpd; + scmi_pd++; } scmi_pd_data->domains = domains; - scmi_pd_data->num_domains = num_domains; + scmi_pd_data->num_domains = max_id + 1; dev_set_drvdata(dev, scmi_pd_data); - return of_genpd_add_provider_onecell(np, scmi_pd_data); + dev_err(dev, "SCMI power domains probe completed in %lld us\n", + ktime_us_delta(ktime_get(), start_time)); + + return of_genpd_add_provider_onecell(dev->of_node, scmi_pd_data); } static void scmi_pm_domain_remove(struct scmi_device *sdev)
Optimize the SCMI power domain driver to only initialize domains that are actually referenced in the device tree. Previously, the driver would initialize all possible domains up to the maximum ID, which could lead to unnecessary firmware calls and longer probe times. Key changes: - Scan device tree to identify which power domains are actually referenced - Use bitmap to track needed domains instead of initializing all - Only perform state queries and initialization for referenced domains - Maintain proper array sizing for power domain framework compatibility - Keep full provider structure to support late binding This optimization reduces probe time and unnecessary firmware interactions by only initializing power domains that are actually used in the system. For example, in a system with 100 possible domains but only 3 referenced in the device tree, we now only initialize those 3 domains instead of all 100. Signed-off-by: Dhruva Gole <d-gole@ti.com> --- Hi all, The approach of doing a lazy init was briefly proposed in this [1] 2024 Embedded Open Source talk. It was also brought up in the monthly ARM SCMI meetings that take place and it didn't recieve too much opposition. This greatly helps to improve the boot time, and I have some data to back this up as well. This[2] experiment was done on a TI AM62L SoC (which is yet to make it upstream) to measure the time taken in the scmi pm domain probe function when it does a full 0..N scmi pd init vs just the ones being used in the device tree. If you have any feedback on this, please let me know. If not, I will go ahead and post a "non-RFC" patch assuming everyone is mostly on board with this. Also request other SCMI consumers to test this out as much as possible to see if it breaks in any situations. [1] https://static.sched.com/hosted_files/eoss24/2f/ARM_SCMI_Primer_Dhruva_kamlesh.pdf [2] https://gist.github.com/DhruvaG2000/75d6a4c31e817f56587537b022761c45 Applies cleanly on top of next-20250530, built and tested on top of the same --- drivers/pmdomain/arm/scmi_pm_domain.c | 88 ++++++++++++++++++++------- 1 file changed, 66 insertions(+), 22 deletions(-)