clk: scpi: error when clock fails to register

Message ID 20170628135345.9337-1-jbrunet@baylibre.com
State Accepted
Commit 2b286b09a048df80fd5f7dfc5057c2837679a1ab
Commit Message

Jerome Brunet June 28, 2017, 1:53 p.m. UTC
The current implementation of scpi_clk_add just prints a warning when a
clock fails to register but then keeps going as if nothing happened. The
provider is then registered with bogus data.

This may later lead to an Oops in __clk_create_clk when
hlist_add_head(&clk->clks_node, &hw->core->clks) is called.

This patch fixes the issue by returning an error if a clock fails to register.

Fixes: cd52c2a4b5c4 ("clk: add support for clocks provided by SCP(System Control Processor)")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>

---
 drivers/clk/clk-scpi.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

-- 
2.9.4

Comments

Sudeep Holla June 28, 2017, 3:04 p.m. UTC | #1
On 28/06/17 14:53, Jerome Brunet wrote:
> Current implementation of scpi_clk_add just print a warning when clock
> fails to register but then keep going as if nothing happened. The
> provider is then registered with bogus data.
> 
> This may latter lead to an Oops in __clk_create_clk when
> hlist_add_head(&clk->clks_node, &hw->core->clks) is called.
> 

What's the path of this call? I see one in devm_clk_hw_register,
but that's the one which failed.

Also, one of the reasons for letting it continue is that if the firmware
fails on some non-critical clock, carrying on is better than punishing the
entire set of clocks, which may even fail the boot.

-- 
Regards,
Sudeep
Jerome Brunet June 28, 2017, 3:38 p.m. UTC | #2
On Wed, 2017-06-28 at 16:04 +0100, Sudeep Holla wrote:
> 
> On 28/06/17 14:53, Jerome Brunet wrote:
> > Current implementation of scpi_clk_add just print a warning when clock
> > fails to register but then keep going as if nothing happened. The
> > provider is then registered with bogus data.
> > 
> > This may latter lead to an Oops in __clk_create_clk when
> > hlist_add_head(&clk->clks_node, &hw->core->clks) is called.
> > 
> What's the path of this call ? I see one in devm_clk_hw_register
> but that's one which failed.
> 

The bL cpufreq driver requests the cpu clock, which failed to register. Here is
the Oops call trace:

[    2.202284] [<ffff00000849a058>] __clk_create_clk.part.18+0x68/0xb0
[    2.208494] [<ffff00000849ac2c>] __of_clk_get_from_provider+0xfc/0x140
[    2.214962] [<ffff000008496c28>] __of_clk_get_by_name+0x100/0x118
[    2.220999] [<ffff000008496c94>] clk_get+0x2c/0x78
[    2.225744] [<ffff000008570110>] dev_pm_opp_get_opp_table+0xb0/0x118
[    2.232039] [<ffff000008570940>] dev_pm_opp_add+0x20/0x68
[    2.237388] [<ffff0000087a0f30>] scpi_init_opp_table+0xa8/0x188
[    2.243252] [<ffff0000087a0558>] _get_cluster_clk_and_freq_table+0x80/0x180
[    2.250151] [<ffff0000087a0a48>] bL_cpufreq_init+0x3f0/0x480
[    2.255758] [<ffff00000879eed8>] cpufreq_online+0xc0/0x658
[    2.261191] [<ffff00000879f500>] cpufreq_add_dev+0x78/0x88
[    2.266625] [<ffff00000855c2c4>] subsys_interface_register+0x84/0xc8
[    2.272922] [<ffff00000879e330>] cpufreq_register_driver+0x138/0x1b8
[    2.279218] [<ffff0000087a0b4c>] bL_cpufreq_register+0x74/0x120
[    2.285083] [<ffff0000087a1038>] scpi_cpufreq_probe+0x28/0x38
[    2.290776] [<ffff00000855fbf0>] platform_drv_probe+0x50/0xb8
[    2.296468] [<ffff00000855dd84>] driver_probe_device+0x21c/0x2d8

But that's not the point. The point is that there is a path in the clk-scpi driver
which registers uninitialized data in the clock provider. That's not good.

> Also one of the reason for keeping it continuing is, if firmware fails
> on some non-critical clock, that's fine rather than punishing the entire
> set of clocks and may even fail the boot.

I understand, but you have no way to know whether a clock is critical or not, so
this explanation looks a bit weak. Plus, it does not keep the boot from failing
... not for me at least.

As explained, this approach registers corrupt data in the provider when a clock
fails. It makes the kernel Oops, so it should be fixed.

If you have a better solution later on, I don't think there would be any problem
reverting this patch.
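
To make the failure mode concrete, here is a minimal user-space sketch of it, not
kernel code. `fake_hw`, `fake_core`, `fake_register()`, `add_all_lenient()` and
`count_bogus()` are hypothetical stand-ins loosely modelled on `struct clk_hw`,
`struct clk_core`, `scpi_clk_ops_init()` and the pre-patch loop in `scpi_clk_add()`.
It assumes a failed registration leaves `core` NULL, as a zero-initialised
allocation would; under that assumption the table publishes an entry that a
consumer cannot safely use.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NCLK 3

/* Hypothetical types, loosely modelled on struct clk_core / struct clk_hw. */
struct fake_core { int users; };
struct fake_hw { struct fake_core *core; int id; };

static struct fake_core cores[NCLK];

/* Hypothetical registration: fails for fail_id and leaves hw->core NULL. */
static int fake_register(struct fake_hw *hw, int fail_id)
{
	if (hw->id == fail_id)
		return -EINVAL;
	hw->core = &cores[hw->id];
	return 0;
}

/* Pre-patch behaviour: the error is ignored and the entry is published anyway. */
static void add_all_lenient(struct fake_hw *hws, struct fake_hw **table, int fail_id)
{
	assert(hws != NULL && table != NULL);
	for (int idx = 0; idx < NCLK; idx++) {
		hws[idx].id = idx;
		hws[idx].core = NULL;
		(void)fake_register(&hws[idx], fail_id);
		table[idx] = &hws[idx];	/* published even on failure */
	}
}

/* Counts published entries whose core was never set up. */
static int count_bogus(struct fake_hw **table)
{
	int bogus = 0;

	for (int idx = 0; idx < NCLK; idx++)
		if (table[idx] && !table[idx]->core)
			bogus++;
	return bogus;
}
```

Any consumer that later reaches through such an entry's `core` pointer is working
from data that was never initialised, which is the situation the trace above ends in.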
Sudeep Holla June 28, 2017, 3:52 p.m. UTC | #3
On 28/06/17 16:38, Jerome Brunet wrote:
> On Wed, 2017-06-28 at 16:04 +0100, Sudeep Holla wrote:
>>
>> On 28/06/17 14:53, Jerome Brunet wrote:
>>> Current implementation of scpi_clk_add just print a warning when clock
>>> fails to register but then keep going as if nothing happened. The
>>> provider is then registered with bogus data.
>>>
>>> This may latter lead to an Oops in __clk_create_clk when
>>> hlist_add_head(&clk->clks_node, &hw->core->clks) is called.
>>>
>> What's the path of this call ? I see one in devm_clk_hw_register
>> but that's one which failed.
>>
> 
> bL cpu freq driver requesting the cpu clock, which failed to register. Here the
> Oops call trace:
> 
> [    2.202284] [<ffff00000849a058>] __clk_create_clk.part.18+0x68/0xb0
> [    2.208494] [<ffff00000849ac2c>] __of_clk_get_from_provider+0xfc/0x140
> [    2.214962] [<ffff000008496c28>] __of_clk_get_by_name+0x100/0x118
> [    2.220999] [<ffff000008496c94>] clk_get+0x2c/0x78
> [    2.225744] [<ffff000008570110>] dev_pm_opp_get_opp_table+0xb0/0x118
> [    2.232039] [<ffff000008570940>] dev_pm_opp_add+0x20/0x68
> [    2.237388] [<ffff0000087a0f30>] scpi_init_opp_table+0xa8/0x188
> [    2.243252] [<ffff0000087a0558>] _get_cluster_clk_and_freq_table+0x80/0x180
> [    2.250151] [<ffff0000087a0a48>] bL_cpufreq_init+0x3f0/0x480
> [    2.255758] [<ffff00000879eed8>] cpufreq_online+0xc0/0x658
> [    2.261191] [<ffff00000879f500>] cpufreq_add_dev+0x78/0x88
> [    2.266625] [<ffff00000855c2c4>] subsys_interface_register+0x84/0xc8
> [    2.272922] [<ffff00000879e330>] cpufreq_register_driver+0x138/0x1b8
> [    2.279218] [<ffff0000087a0b4c>] bL_cpufreq_register+0x74/0x120
> [    2.285083] [<ffff0000087a1038>] scpi_cpufreq_probe+0x28/0x38
> [    2.290776] [<ffff00000855fbf0>] platform_drv_probe+0x50/0xb8
> [    2.296468] [<ffff00000855dd84>] driver_probe_device+0x21c/0x2d8
> 

Thanks for this stack. I just worked out the same path myself and came
up with the patch below. That should work if my understanding is correct.

> But that's not the point. The point is there is path in clk-scpi driver which
> registers uninitialized data in the clock provider. That's not good. 
> 
>> Also one of the reason for keeping it continuing is, if firmware fails
>> on some non-critical clock, that's fine rather than punishing the entire
>> set of clocks and may even fail the boot.
> 
> I understand, but you have no way to know whether a clock is critical or not so 
> this explanation looks a bit weak, plus it does not keep the boot from failing
> ... not for me at least.
> 
> As explained this approach is registering corrupt data in the provider when
> failing. It makes the kernel Oops, it shall be fixed.
> 

Agreed. I want to look at ways to fix that, hence I asked you for more data.

> If you have a better solution later on, I don't think there would be any problem
> to revert this patch.
> 

Sure, I am not against the patch as a fix. I was just trying to better
understand the problem. I had seen the usefulness of skipping failed clocks
on Juno platforms in earlier days. Also, I am now working on the new SCMI[1]
specification and just want to cover this.

---

diff --git i/drivers/clk/clk-scpi.c w/drivers/clk/clk-scpi.c
index 96d37175d0ad..d83c0b84798d 100644
--- i/drivers/clk/clk-scpi.c
+++ w/drivers/clk/clk-scpi.c
@@ -245,11 +245,14 @@ static int scpi_clk_add(struct device *dev, struct device_node *np,
                sclk->id = val;

                err = scpi_clk_ops_init(dev, match, sclk, name);
-               if (err)
+               if (err) {
                        dev_err(dev, "failed to register clock '%s'\n", name);
-               else
+                       clk_data->clk[idx] = NULL;
+                       devm_kfree(dev, sclk);
+               } else {
                        dev_dbg(dev, "Registered clock '%s'\n", name);
-               clk_data->clk[idx] = sclk;
+                       clk_data->clk[idx] = sclk;
+               }
        }

        return of_clk_add_hw_provider(np, scpi_of_clk_src_get, clk_data);

-- 
Regards,
Sudeep
Jerome Brunet June 28, 2017, 4:46 p.m. UTC | #4
On Wed, 2017-06-28 at 16:52 +0100, Sudeep Holla wrote:
> 
> On 28/06/17 16:38, Jerome Brunet wrote:
> > On Wed, 2017-06-28 at 16:04 +0100, Sudeep Holla wrote:
> > > 
> > > On 28/06/17 14:53, Jerome Brunet wrote:
> > > > Current implementation of scpi_clk_add just print a warning when clock
> > > > fails to register but then keep going as if nothing happened. The
> > > > provider is then registered with bogus data.
> > > > 
> > > > This may latter lead to an Oops in __clk_create_clk when
> > > > hlist_add_head(&clk->clks_node, &hw->core->clks) is called.
> > > > 
> > > 
> > > What's the path of this call ? I see one in devm_clk_hw_register
> > > but that's one which failed.
> > > 
> > 
> > bL cpu freq driver requesting the cpu clock, which failed to register. Here
> > the
> > Oops call trace:
> > 
> > [    2.202284] [<ffff00000849a058>] __clk_create_clk.part.18+0x68/0xb0
> > [    2.208494] [<ffff00000849ac2c>] __of_clk_get_from_provider+0xfc/0x140
> > [    2.214962] [<ffff000008496c28>] __of_clk_get_by_name+0x100/0x118
> > [    2.220999] [<ffff000008496c94>] clk_get+0x2c/0x78
> > [    2.225744] [<ffff000008570110>] dev_pm_opp_get_opp_table+0xb0/0x118
> > [    2.232039] [<ffff000008570940>] dev_pm_opp_add+0x20/0x68
> > [    2.237388] [<ffff0000087a0f30>] scpi_init_opp_table+0xa8/0x188
> > [    2.243252] [<ffff0000087a0558>]
> > _get_cluster_clk_and_freq_table+0x80/0x180
> > [    2.250151] [<ffff0000087a0a48>] bL_cpufreq_init+0x3f0/0x480
> > [    2.255758] [<ffff00000879eed8>] cpufreq_online+0xc0/0x658
> > [    2.261191] [<ffff00000879f500>] cpufreq_add_dev+0x78/0x88
> > [    2.266625] [<ffff00000855c2c4>] subsys_interface_register+0x84/0xc8
> > [    2.272922] [<ffff00000879e330>] cpufreq_register_driver+0x138/0x1b8
> > [    2.279218] [<ffff0000087a0b4c>] bL_cpufreq_register+0x74/0x120
> > [    2.285083] [<ffff0000087a1038>] scpi_cpufreq_probe+0x28/0x38
> > [    2.290776] [<ffff00000855fbf0>] platform_drv_probe+0x50/0xb8
> > [    2.296468] [<ffff00000855dd84>] driver_probe_device+0x21c/0x2d8
> > 
> 
> Thanks for this stack. I just worked out the same path now. I did come
> up with the patch as below. That should work if my understanding is correct.

I tried it.
Unfortunately, it does not work: it still crashes, but somewhere else:
[    2.301482] [<ffff00000849e67c>] scpi_of_clk_src_get+0x14/0x58
[    2.307261] [<ffff000008495f40>] __of_clk_get_by_name+0x100/0x118
[    2.313297] [<ffff000008495fac>] clk_get+0x2c/0x78
[    2.318044] [<ffff00000856f4d0>] dev_pm_opp_get_opp_table+0xb0/0x118
[    2.324338] [<ffff00000856fd00>] dev_pm_opp_add+0x20/0x68
[    2.329687] [<ffff0000087a04f8>] scpi_init_opp_table+0xa8/0x188
[    2.335550] [<ffff00000879fb20>] _get_cluster_clk_and_freq_table+0x80/0x180
[    2.342450] [<ffff0000087a0010>] bL_cpufreq_init+0x3f0/0x480
[    2.348056] [<ffff00000879e4a0>] cpufreq_online+0xc0/0x658
[    2.353490] [<ffff00000879eac8>] cpufreq_add_dev+0x78/0x88
[    2.358924] [<ffff00000855b684>] subsys_interface_register+0x84/0xc8
[    2.365220] [<ffff00000879d8f8>] cpufreq_register_driver+0x138/0x1b8
[    2.371516] [<ffff0000087a0114>] bL_cpufreq_register+0x74/0x120
[    2.377381] [<ffff0000087a0600>] scpi_cpufreq_probe+0x28/0x38
[    2.383076] [<ffff00000855efb0>] platform_drv_probe+0x50/0xb8
[    2.388766] [<ffff00000855d144>] driver_probe_device+0x21c/0x2d8

I have not looked at ALL the clock providers, but I have seen a few and I don't
remember seeing any which fails to register a clock at some point and still
registers successfully.

It seems strange to continue with a broken controller.

Sudeep Holla June 28, 2017, 5:07 p.m. UTC | #5
On 28/06/17 17:46, Jerome Brunet wrote:
> On Wed, 2017-06-28 at 16:52 +0100, Sudeep Holla wrote:


[..]

>>
>> Thanks for this stack. I just worked out the same path now. I did come
>> up with the patch as below. That should work if my understanding is correct.
> 
> I tried.

Thanks.

> It does not work unfortunately. Still crashes but somewhere else:
> [    2.301482] [<ffff00000849e67c>] scpi_of_clk_src_get+0x14/0x58
> [    2.307261] [<ffff000008495f40>] __of_clk_get_by_name+0x100/0x118
> [    2.313297] [<ffff000008495fac>] clk_get+0x2c/0x78
> [    2.318044] [<ffff00000856f4d0>] dev_pm_opp_get_opp_table+0xb0/0x118
> [    2.324338] [<ffff00000856fd00>] dev_pm_opp_add+0x20/0x68
> [    2.329687] [<ffff0000087a04f8>] scpi_init_opp_table+0xa8/0x188
> [    2.335550] [<ffff00000879fb20>] _get_cluster_clk_and_freq_table+0x80/0x180
> [    2.342450] [<ffff0000087a0010>] bL_cpufreq_init+0x3f0/0x480
> [    2.348056] [<ffff00000879e4a0>] cpufreq_online+0xc0/0x658
> [    2.353490] [<ffff00000879eac8>] cpufreq_add_dev+0x78/0x88
> [    2.358924] [<ffff00000855b684>] subsys_interface_register+0x84/0xc8
> [    2.365220] [<ffff00000879d8f8>] cpufreq_register_driver+0x138/0x1b8
> [    2.371516] [<ffff0000087a0114>] bL_cpufreq_register+0x74/0x120
> [    2.377381] [<ffff0000087a0600>] scpi_cpufreq_probe+0x28/0x38
> [    2.383076] [<ffff00000855efb0>] platform_drv_probe+0x50/0xb8
> [    2.388766] [<ffff00000855d144>] driver_probe_device+0x21c/0x2d8
> 

Looks like a different route and I know why. I have added an extra check
now which should work if I have not missed anything more.

> I have not looked at ALL the clock providers, but I have seen a few and I don't
> remember seeing any which fails, at some point, to register a clocks and still
> register successfully. 
> 

No problem. As I said, I am fine with the patch you sent as a fix for now;
I am just curious to know which issues need fixing to continue
supporting that feature. Please bear with me.

> It seems strange to continue with a broken controller.
> 

I would have agreed if it were a single driver or h/w controlled by Linux.
Since it's in the firmware, we should allow the working clocks/opps to
work even though a few are broken. It's not good if we have to disable
everything because some piece of firmware is not yet ready or is broken.
But again, we can get it working later; for now, I am fine with your patch.

Regards,
Sudeep

---

diff --git i/drivers/clk/clk-scpi.c w/drivers/clk/clk-scpi.c
index 96d37175d0ad..a0b9b4c84be3 100644
--- i/drivers/clk/clk-scpi.c
+++ w/drivers/clk/clk-scpi.c
@@ -192,7 +192,7 @@ scpi_of_clk_src_get(struct of_phandle_args *clkspec, void *data)

        for (count = 0; count < clk_data->clk_num; count++) {
                sclk = clk_data->clk[count];
-               if (idx == sclk->id)
+               if (sclk && idx == sclk->id)
                        return &sclk->hw;
        }

@@ -245,11 +245,14 @@ static int scpi_clk_add(struct device *dev, struct device_node *np,
                sclk->id = val;

                err = scpi_clk_ops_init(dev, match, sclk, name);
-               if (err)
+               if (err) {
                        dev_err(dev, "failed to register clock '%s'\n", name);
-               else
+                       clk_data->clk[idx] = NULL;
+                       devm_kfree(dev, sclk);
+               } else {
                        dev_dbg(dev, "Registered clock '%s'\n", name);
-               clk_data->clk[idx] = sclk;
+                       clk_data->clk[idx] = sclk;
+               }
        }

        return of_clk_add_hw_provider(np, scpi_of_clk_src_get, clk_data);
Stephen Boyd June 28, 2017, 10:33 p.m. UTC | #6
On 06/28, Sudeep Holla wrote:
> everything if some piece of firmware is not yet ready or broken.
> But again, we can get it working later, for now, I am fine with you patch.
> 

So that's an acked-by, reviewed-by tag for the original patch?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Jerome Brunet June 29, 2017, 8:50 a.m. UTC | #7
On Wed, 2017-06-28 at 18:07 +0100, Sudeep Holla wrote:
> 
> On 28/06/17 17:46, Jerome Brunet wrote:
> > On Wed, 2017-06-28 at 16:52 +0100, Sudeep Holla wrote:
> 
> [..]
> 
> > > 
> > > Thanks for this stack. I just worked out the same path now. I did come
> > > up with the patch as below. That should work if my understanding is
> > > correct.
> > 
> > I tried.
> 
> Thanks.
> 
> > It does not work unfortunately. Still crashes but somewhere else:
> > [    2.301482] [<ffff00000849e67c>] scpi_of_clk_src_get+0x14/0x58
> > [    2.307261] [<ffff000008495f40>] __of_clk_get_by_name+0x100/0x118
> > [    2.313297] [<ffff000008495fac>] clk_get+0x2c/0x78
> > [    2.318044] [<ffff00000856f4d0>] dev_pm_opp_get_opp_table+0xb0/0x118
> > [    2.324338] [<ffff00000856fd00>] dev_pm_opp_add+0x20/0x68
> > [    2.329687] [<ffff0000087a04f8>] scpi_init_opp_table+0xa8/0x188
> > [    2.335550] [<ffff00000879fb20>]
> > _get_cluster_clk_and_freq_table+0x80/0x180
> > [    2.342450] [<ffff0000087a0010>] bL_cpufreq_init+0x3f0/0x480
> > [    2.348056] [<ffff00000879e4a0>] cpufreq_online+0xc0/0x658
> > [    2.353490] [<ffff00000879eac8>] cpufreq_add_dev+0x78/0x88
> > [    2.358924] [<ffff00000855b684>] subsys_interface_register+0x84/0xc8
> > [    2.365220] [<ffff00000879d8f8>] cpufreq_register_driver+0x138/0x1b8
> > [    2.371516] [<ffff0000087a0114>] bL_cpufreq_register+0x74/0x120
> > [    2.377381] [<ffff0000087a0600>] scpi_cpufreq_probe+0x28/0x38
> > [    2.383076] [<ffff00000855efb0>] platform_drv_probe+0x50/0xb8
> > [    2.388766] [<ffff00000855d144>] driver_probe_device+0x21c/0x2d8
> > 
> 
> Looks like a different route and I know why. I have added an extra check
> now which should work if I have not missed anything more.
> 
> > I have not looked at ALL the clock providers, but I have seen a few and I
> > don't
> > remember seeing any which fails, at some point, to register a clocks and
> > still
> > register successfully. 
> > 
> 
> No problem, as I said I am fine with the patch you sent as a fix for now
> but just curious to know what are the issues to be fixed to continue
> supporting that feature. Please bear with me.

I am :) and I understand what you are trying to do: having a degraded clock
provider is better than nothing according to you, correct?

I'm wondering whether this is correct or not; that's why I'm challenging this a
bit.

If you failed to register an scpi clock, it is probably because the communication
with the FW is not working, or at least 'not that good', right?

If, for some reason, you manage to register some other clocks from the same FW,
how confident can you be that communication will be OK for them, and that the
settings you request will be applied correctly?

Is it possible that you may be causing more harm/damage by playing with broken HW?

> 
> > It seems strange to continue with a broken controller.
> > 
> 
> I would have agreed if it was single driver or h/w controlled by Linux.
> Since it's in the firmware, we should allow the working clocks/opps to
> work though few are broken. It's not good if we had to disable
> everything if some piece of firmware is not yet ready or broken.
> But again, we can get it working later, for now, I am fine with you patch.

I tried your last version, and it does not Oops, at least not for me.

The end result still looks odd to me:
[    1.115219] scpi_clocks scpi:clocks: failed to register clock 'vcpu'
[    1.159490] cpu cpu0: _get_cluster_clk_and_freq_table: Failed to get clk for cpu: 0, cluster: 0
[    1.162986] cpu cpu0: _get_cluster_clk_and_freq_table: Failed to get data for cluster: 0
[    1.170945] cpu cpu1: _get_cluster_clk_and_freq_table: Failed to get clk for cpu: 1, cluster: 0
[    1.179634] cpu cpu1: _get_cluster_clk_and_freq_table: Failed to get data for cluster: 0
[    1.187654] cpu cpu2: _get_cluster_clk_and_freq_table: Failed to get clk for cpu: 2, cluster: 0
[    1.196284] cpu cpu2: _get_cluster_clk_and_freq_table: Failed to get data for cluster: 0
[    1.204375] cpu cpu3: _get_cluster_clk_and_freq_table: Failed to get clk for cpu: 3, cluster: 0
[    1.212911] cpu cpu3: _get_cluster_clk_and_freq_table: Failed to get data for cluster: 0
[    1.220612] arm_big_little: bL_cpufreq_register: Registered platform driver: scpi

So now I have an scpi clock provider which registers successfully but fails to
register its only clock. As a consequence, I also have a cpufreq driver which
manages to register but has no cpu clock to drive ...
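
The outcome described above can be modelled in a few lines of user-space C. This
is only a sketch: `fake_clk`, `fake_register()` and `add_all_tolerant()` are
hypothetical names, not driver code. It mirrors the tolerant policy (a failed
clock leaves a NULL slot and the loop continues) and returns how many clocks are
actually usable; with a single clock that fails, the count is zero.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-clock structure. */
struct fake_clk {
	int id;
};

/* Hypothetical registration that fails for one chosen id. */
static int fake_register(const struct fake_clk *clk, int fail_id)
{
	return clk->id == fail_id ? -EINVAL : 0;
}

/*
 * Tolerant policy: a clock that fails to register leaves a NULL slot and
 * the loop carries on. Returns the number of usable clocks.
 */
static size_t add_all_tolerant(struct fake_clk *clks, struct fake_clk **table,
			       size_t n, int fail_id)
{
	size_t usable = 0;

	assert(clks != NULL && table != NULL);
	for (size_t idx = 0; idx < n; idx++) {
		clks[idx].id = (int)idx;
		if (fake_register(&clks[idx], fail_id)) {
			table[idx] = NULL;
			continue;
		}
		table[idx] = &clks[idx];
		usable++;
	}
	return usable;
}
```

Whether a provider with some, or no, usable clocks should still be registered is
exactly the policy question under discussion here; the sketch does not settle it.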

Sudeep Holla June 29, 2017, 9:03 a.m. UTC | #8
Hi Jerome,

Thanks for the fix.

On 28/06/17 14:53, Jerome Brunet wrote:
> Current implementation of scpi_clk_add just print a warning when clock
> fails to register but then keep going as if nothing happened. The
> provider is then registered with bogus data.
> 
> This may latter lead to an Oops in __clk_create_clk when
> hlist_add_head(&clk->clks_node, &hw->core->clks) is called.
> 
> This patch fixes the issue and errors if a clock fails to register.
> 
> Fixes: cd52c2a4b5c4 ("clk: add support for clocks provided by SCP(System Control Processor)")

Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>


-- 
Regards,
Sudeep
Sudeep Holla June 29, 2017, 9:12 a.m. UTC | #9
Hi Jerome,

On 29/06/17 09:50, Jerome Brunet wrote:
> On Wed, 2017-06-28 at 18:07 +0100, Sudeep Holla wrote:
>>
>> On 28/06/17 17:46, Jerome Brunet wrote:
>>> On Wed, 2017-06-28 at 16:52 +0100, Sudeep Holla wrote:
>>
>> [..]
>>
>>>>
>>>> Thanks for this stack. I just worked out the same path now. I did come
>>>> up with the patch as below. That should work if my understanding is
>>>> correct.
>>>
>>> I tried.
>>
>> Thanks.
>>
>>> It does not work unfortunately. Still crashes but somewhere else:
>>> [    2.301482] [<ffff00000849e67c>] scpi_of_clk_src_get+0x14/0x58
>>> [    2.307261] [<ffff000008495f40>] __of_clk_get_by_name+0x100/0x118
>>> [    2.313297] [<ffff000008495fac>] clk_get+0x2c/0x78
>>> [    2.318044] [<ffff00000856f4d0>] dev_pm_opp_get_opp_table+0xb0/0x118
>>> [    2.324338] [<ffff00000856fd00>] dev_pm_opp_add+0x20/0x68
>>> [    2.329687] [<ffff0000087a04f8>] scpi_init_opp_table+0xa8/0x188
>>> [    2.335550] [<ffff00000879fb20>]
>>> _get_cluster_clk_and_freq_table+0x80/0x180
>>> [    2.342450] [<ffff0000087a0010>] bL_cpufreq_init+0x3f0/0x480
>>> [    2.348056] [<ffff00000879e4a0>] cpufreq_online+0xc0/0x658
>>> [    2.353490] [<ffff00000879eac8>] cpufreq_add_dev+0x78/0x88
>>> [    2.358924] [<ffff00000855b684>] subsys_interface_register+0x84/0xc8
>>> [    2.365220] [<ffff00000879d8f8>] cpufreq_register_driver+0x138/0x1b8
>>> [    2.371516] [<ffff0000087a0114>] bL_cpufreq_register+0x74/0x120
>>> [    2.377381] [<ffff0000087a0600>] scpi_cpufreq_probe+0x28/0x38
>>> [    2.383076] [<ffff00000855efb0>] platform_drv_probe+0x50/0xb8
>>> [    2.388766] [<ffff00000855d144>] driver_probe_device+0x21c/0x2d8
>>>
>>
>> Looks like a different route and I know why. I have added an extra check
>> now which should work if I have not missed anything more.
>>
>>> I have not looked at ALL the clock providers, but I have seen a few and I
>>> don't
>>> remember seeing any which fails, at some point, to register a clocks and
>>> still
>>> register successfully. 
>>>
>>
>> No problem, as I said I am fine with the patch you sent as a fix for now
>> but just curious to know what are the issues to be fixed to continue
>> supporting that feature. Please bear with me.
> 
> I am :) and I understand what you are trying to do, having a degraded clock
> provider is better than nothing according to you, correct?
> 
> I'm wondering whether this is correct or not, that why I'm challenging this a
> bit.
> 

Fair enough. But the situation I had on my platform is that it provides
DVFS support for 2 CPU clusters and 1 GPU domain. I didn't want to block
the use of CPUFreq until GPU DVFS was properly supported in the firmware.
I had a similar situation with the clocks, and hence I allowed it to continue.

> If you failed to register an scpi clock it is probably because the communication
> with the FW is not working, or at least 'not that good', right ?
> 

Not exactly. What if the error is specific to that particular clock? That's my
point. If we have reached this far, it means the communication is fine; it is
just a faulty piece of hardware, which may not be critical.

> If for some reason, you manage to register some other clocks from the same FW,
> how confident can you be that communication will be ok for them ? that the
> settings you request will be applied correctly ?
> 

Not sure; I am not registering the clock. Think of SCPI as a single clock
provider with multiple clock outputs. You don't want to disable it
entirely if one of the clock outputs has a problem. That's my
counter-argument.

> Is it possible that you may be causing more harm/damage playing with a broken HW
> ?
> 

I am not sure how, from the h/w clock provider's perspective, if we are not
registering that clock output.

>>
>>> It seems strange to continue with a broken controller.
>>>
>>
>> I would have agreed if it was single driver or h/w controlled by Linux.
>> Since it's in the firmware, we should allow the working clocks/opps to
>> work though few are broken. It's not good if we had to disable
>> everything if some piece of firmware is not yet ready or broken.
>> But again, we can get it working later, for now, I am fine with you patch.
> 
> I tried your last version, and it does not Oops, at least not for me.
> 
> The end result still looks odd to me:
> [    1.115219] scpi_clocks scpi:clocks: failed to register clock 'vcpu'
> [    1.159490] cpu cpu0: _get_cluster_clk_and_freq_table: Failed to get clk for
> cpu: 0, cluster: 0
> [    1.162986] cpu cpu0: _get_cluster_clk_and_freq_table: Failed to get data for
> cluster: 0
> [    1.170945] cpu cpu1: _get_cluster_clk_and_freq_table: Failed to get clk for
> cpu: 1, cluster: 0
> [    1.179634] cpu cpu1: _get_cluster_clk_and_freq_table: Failed to get data for
> cluster: 0
> [    1.187654] cpu cpu2: _get_cluster_clk_and_freq_table: Failed to get clk for
> cpu: 2, cluster: 0
> [    1.196284] cpu cpu2: _get_cluster_clk_and_freq_table: Failed to get data for
> cluster: 0
> [    1.204375] cpu cpu3: _get_cluster_clk_and_freq_table: Failed to get clk for
> cpu: 3, cluster: 0
> [    1.212911] cpu cpu3: _get_cluster_clk_and_freq_table: Failed to get data for
> cluster: 0
> [    1.220612] arm_big_little: bL_cpufreq_register: Registered platform driver:
> scpi
> 
> So now, I have an scpi clock provider which registers successfully but fails to
> register its only clock. As a consequence, I also have a cpufreq driver which
> manages to register but has no clock cpu clock to drive ...
> 

Yes, I agree the above is not an entirely acceptable situation.

-- 
Regards,
Sudeep
Stephen Boyd June 30, 2017, 12:25 a.m. UTC | #10
On 06/28, Jerome Brunet wrote:
> Current implementation of scpi_clk_add just print a warning when clock
> fails to register but then keep going as if nothing happened. The
> provider is then registered with bogus data.
> 
> This may latter lead to an Oops in __clk_create_clk when
> hlist_add_head(&clk->clks_node, &hw->core->clks) is called.
> 
> This patch fixes the issue and errors if a clock fails to register.
> 
> Fixes: cd52c2a4b5c4 ("clk: add support for clocks provided by SCP(System Control Processor)")
> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
> ---

Applied to clk-next

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Patch

diff --git a/drivers/clk/clk-scpi.c b/drivers/clk/clk-scpi.c
index 96d37175d0ad..e44b5ca91fed 100644
--- a/drivers/clk/clk-scpi.c
+++ b/drivers/clk/clk-scpi.c
@@ -245,10 +245,12 @@  static int scpi_clk_add(struct device *dev, struct device_node *np,
 		sclk->id = val;
 
 		err = scpi_clk_ops_init(dev, match, sclk, name);
-		if (err)
+		if (err) {
 			dev_err(dev, "failed to register clock '%s'\n", name);
-		else
-			dev_dbg(dev, "Registered clock '%s'\n", name);
+			return err;
+		}
+
+		dev_dbg(dev, "Registered clock '%s'\n", name);
 		clk_data->clk[idx] = sclk;
 	}
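
The behavioural change made by this patch can be modelled outside the kernel. The
following is a minimal user-space sketch, not driver code: `fake_clk`,
`fake_register()` and `add_all()` are hypothetical stand-ins for `struct scpi_clk`,
`scpi_clk_ops_init()` and `scpi_clk_add()`. It shows that on the first failed
registration the error is propagated and the failed clock is never stored in the
table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NCLK 3

/* Hypothetical stand-in for struct scpi_clk. */
struct fake_clk {
	int id;
	int registered;
};

/* Hypothetical stand-in for scpi_clk_ops_init(): fails for id == fail_id. */
static int fake_register(struct fake_clk *clk, int fail_id)
{
	if (clk->id == fail_id)
		return -EINVAL;
	clk->registered = 1;
	return 0;
}

/*
 * Fail-fast loop: return on the first error, so table[] never holds a
 * pointer to a clock that failed to register.
 */
static int add_all(struct fake_clk *clks, struct fake_clk **table, int fail_id)
{
	assert(clks != NULL && table != NULL);
	for (int idx = 0; idx < NCLK; idx++) {
		int err;

		clks[idx].id = idx;
		clks[idx].registered = 0;
		err = fake_register(&clks[idx], fail_id);
		if (err)
			return err;
		table[idx] = &clks[idx];
	}
	return 0;
}
```

In this model a caller that sees a non-zero return simply never goes on to
publish the table, which is the analogue of not registering the provider.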