diff mbox series

Revert "clk: fix __clk_init_parent() for single parent clocks"

Message ID 20181204163257.32085-1-jbrunet@baylibre.com
State New
Headers show
Series Revert "clk: fix __clk_init_parent() for single parent clocks" | expand

Commit Message

Jerome Brunet Dec. 4, 2018, 4:32 p.m. UTC
This reverts commit 2430a94d1e719b7b4af2a65b781a4c036eb22e64.

From the original commit message:
  It turned out a problem because there are some single-parent clocks
  that implement .get_parent() callback and return non-zero index.
  The SOCFPGA clock is the case; the commit broke the SOCFPGA boards.

It is wrong for a clock to return an index >= num_parents. CCF checks
for this condition in clk_core_get_parent_by_index(). This function sets
the parent to NULL if index is incoherent, which seems like the only
sane choice.

commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks")
appears to be a work around installed in the core framework for a problem
that is platform specific, and should probably be fixed in the platform code.

Further more, it introduces a problem in a corner case of the mux clock.
Take mux with multiple parents, but only one is known, the rest being
undocumented. The register reset has one of unknown parent set.

Before commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks"):
 * get_parent() is called, register is read and give an unknown index.
   -> the mux is orphaned.
 * a call to set_rate() will reparent the mux to the only known parent.

With commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks"):
 * the register is never read.
 * parent is wrongly assumed to be the only known one.
   As a consequence, all the calculation deriving from the mux will be
   wrong
 * Since we believe the only know parent to be set, set_parent() won't
   ever be called and the register won't be set with the correct value.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>

---
 drivers/clk/clk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.19.1

Comments

Stephen Boyd Dec. 4, 2018, 6:05 p.m. UTC | #1
Quoting Jerome Brunet (2018-12-04 08:32:57)
> This reverts commit 2430a94d1e719b7b4af2a65b781a4c036eb22e64.

> 

> From the original commit message:

>   It turned out a problem because there are some single-parent clocks

>   that implement .get_parent() callback and return non-zero index.

>   The SOCFPGA clock is the case; the commit broke the SOCFPGA boards.

> 

> It is wrong for a clock to return an index >= num_parents. CCF checks

> for this condition in clk_core_get_parent_by_index(). This function sets

> the parent to NULL if index is incoherent, which seems like the only

> sane choice.

> 

> commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks")

> appears to be a work around installed in the core framework for a problem

> that is platform specific, and should probably be fixed in the platform code.


Ouch. I see that I even pointed that out in 2016 but never got a reply
or a fix patch[1].

> 

> Further more, it introduces a problem in a corner case of the mux clock.

> Take mux with multiple parents, but only one is known, the rest being

> undocumented. The register reset has one of unknown parent set.

> 

> Before commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks"):

>  * get_parent() is called, register is read and give an unknown index.

>    -> the mux is orphaned.

>  * a call to set_rate() will reparent the mux to the only known parent.

> 

> With commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks"):

>  * the register is never read.

>  * parent is wrongly assumed to be the only known one.

>    As a consequence, all the calculation deriving from the mux will be

>    wrong

>  * Since we believe the only know parent to be set, set_parent() won't

>    ever be called and the register won't be set with the correct value.


Isn't this the broken bad case all over again? Why register a clk as a
mux and then only say it has one parent?

> 

> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>

> ---


Is this related to the other patch you sent? Can you send series for
related patches please?

[1] https://lkml.kernel.org/r/20160209181833.GA24167@codeaurora.org
Jerome Brunet Dec. 4, 2018, 7:51 p.m. UTC | #2
On Tue, 2018-12-04 at 10:05 -0800, Stephen Boyd wrote:
> Quoting Jerome Brunet (2018-12-04 08:32:57)

> > This reverts commit 2430a94d1e719b7b4af2a65b781a4c036eb22e64.

> > 

> > From the original commit message:

> >   It turned out a problem because there are some single-parent clocks

> >   that implement .get_parent() callback and return non-zero index.

> >   The SOCFPGA clock is the case; the commit broke the SOCFPGA boards.

> > 

> > It is wrong for a clock to return an index >= num_parents. CCF checks

> > for this condition in clk_core_get_parent_by_index(). This function sets

> > the parent to NULL if index is incoherent, which seems like the only

> > sane choice.

> > 

> > commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent

> > clocks")

> > appears to be a work around installed in the core framework for a problem

> > that is platform specific, and should probably be fixed in the platform

> > code.

> 

> Ouch. I see that I even pointed that out in 2016 but never got a reply

> or a fix patch[1].

> 

> > Further more, it introduces a problem in a corner case of the mux clock.

> > Take mux with multiple parents, but only one is known, the rest being

> > undocumented. The register reset has one of unknown parent set.

> > 

> > Before commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single

> > parent clocks"):

> >  * get_parent() is called, register is read and give an unknown index.

> >    -> the mux is orphaned.

> >  * a call to set_rate() will reparent the mux to the only known parent.

> > 

> > With commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent

> > clocks"):

> >  * the register is never read.

> >  * parent is wrongly assumed to be the only known one.

> >    As a consequence, all the calculation deriving from the mux will be

> >    wrong

> >  * Since we believe the only know parent to be set, set_parent() won't

> >    ever be called and the register won't be set with the correct value.

> 

> Isn't this the broken bad case all over again? Why register a clk as a

> mux and then only say it has one parent?


I understand it is a bit odd but as I explained it is a corner case.

We are really trying to drive a mux here, applying a values will change the
clock signal we get. Documentation being what it is, we only know one the
parent. The other parent could anything or maybe not connected at all, who
know. That is not the important part actually

If such mux was already set to the known entry by default, it would be OK to
ignore it. But if it is not, then we CCF to realise that and change the
setting accordingly.

This the case of the 'ao_cts_cec' clock in the following patch:
https://lore.kernel.org/patchwork/patch/1021028/

by default the value in the register is 0, but the only one that makes sense
for us is 1.

> 

> > Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>

> > ---

> 

> Is this related to the other patch you sent? Can you send series for

> related patches please?

> 

> [1] https://lkml.kernel.org/r/20160209181833.GA24167@codeaurora.org


Actually I was intially doing a series, and stopped when my cover letter
started with "those are two unrelated patches ..." ;)

I found these things while debugging the same thing but there is no deps
between them.
Stephen Boyd Dec. 5, 2018, 7:17 a.m. UTC | #3
(Adding Dinh's korg email)

I also wonder if this driver is even used anymore or maybe we can delete
it?

Quoting Stephen Boyd (2018-12-04 11:34:16)
> This driver creates a gate clk with the possibility to have multiple

> parents. That can cause problems if the common clk framework tries to

> call the get_parent() op and gets back a number that's larger than the

> number of parents the clk says it supports in

> clk_init_data::num_parents. Let's duplicate the clk_ops structure each

> time this function is called and drop the get/set parent ops when there

> is only one parent. This allows the framework to consider a number

> larger than clk_init_data::num_parents as an error condition of the

> get_parent() clk op, clearing the way for proper code.

> 

> Cc: Jerome Brunet <jbrunet@baylibre.com>

> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>

> Cc: Dinh Nguyen <dinguyen@opensource.altera.com>

> Signed-off-by: Stephen Boyd <sboyd@kernel.org>

> ---

>  drivers/clk/socfpga/clk-gate.c | 22 +++++++++++++---------

>  1 file changed, 13 insertions(+), 9 deletions(-)

> 

> diff --git a/drivers/clk/socfpga/clk-gate.c b/drivers/clk/socfpga/clk-gate.c

> index aa7a6e6a15b6..73e03328d5c5 100644

> --- a/drivers/clk/socfpga/clk-gate.c

> +++ b/drivers/clk/socfpga/clk-gate.c

> @@ -176,8 +176,7 @@ static struct clk_ops gateclk_ops = {

>         .set_parent = socfpga_clk_set_parent,

>  };

>  

> -static void __init __socfpga_gate_init(struct device_node *node,

> -       const struct clk_ops *ops)

> +void __init socfpga_gate_init(struct device_node *node)

>  {

>         u32 clk_gate[2];

>         u32 div_reg[3];

> @@ -188,12 +187,17 @@ static void __init __socfpga_gate_init(struct device_node *node,

>         const char *clk_name = node->name;

>         const char *parent_name[SOCFPGA_MAX_PARENTS];

>         struct clk_init_data init;

> +       struct clk_ops *ops;

>         int rc;

>  

>         socfpga_clk = kzalloc(sizeof(*socfpga_clk), GFP_KERNEL);

>         if (WARN_ON(!socfpga_clk))

>                 return;

>  

> +       ops = kmemdup(&gateclk_ops, sizeof(gateclk_ops), GFP_KERNEL);

> +       if (WARN_ON(!ops))

> +               return;

> +

>         rc = of_property_read_u32_array(node, "clk-gate", clk_gate, 2);

>         if (rc)

>                 clk_gate[0] = 0;

> @@ -202,8 +206,8 @@ static void __init __socfpga_gate_init(struct device_node *node,

>                 socfpga_clk->hw.reg = clk_mgr_base_addr + clk_gate[0];

>                 socfpga_clk->hw.bit_idx = clk_gate[1];

>  

> -               gateclk_ops.enable = clk_gate_ops.enable;

> -               gateclk_ops.disable = clk_gate_ops.disable;

> +               ops->enable = clk_gate_ops.enable;

> +               ops->disable = clk_gate_ops.disable;

>         }

>  

>         rc = of_property_read_u32(node, "fixed-divider", &fixed_div);

> @@ -234,6 +238,11 @@ static void __init __socfpga_gate_init(struct device_node *node,

>         init.flags = 0;

>  

>         init.num_parents = of_clk_parent_fill(node, parent_name, SOCFPGA_MAX_PARENTS);

> +       if (init.num_parents < 2) {

> +               ops->get_parent = NULL;

> +               ops->set_parent = NULL;

> +       }

> +

>         init.parent_names = parent_name;

>         socfpga_clk->hw.hw.init = &init;

>  

> @@ -246,8 +255,3 @@ static void __init __socfpga_gate_init(struct device_node *node,

>         if (WARN_ON(rc))

>                 return;

>  }

> -

> -void __init socfpga_gate_init(struct device_node *node)

> -{

> -       __socfpga_gate_init(node, &gateclk_ops);

> -}
Stephen Boyd Dec. 5, 2018, 7:54 a.m. UTC | #4
Quoting Jerome Brunet (2018-12-04 11:51:17)
> On Tue, 2018-12-04 at 10:05 -0800, Stephen Boyd wrote:

> > Quoting Jerome Brunet (2018-12-04 08:32:57)

> > > This reverts commit 2430a94d1e719b7b4af2a65b781a4c036eb22e64.

> > > 

> > > From the original commit message:

> > >   It turned out a problem because there are some single-parent clocks

> > >   that implement .get_parent() callback and return non-zero index.

> > >   The SOCFPGA clock is the case; the commit broke the SOCFPGA boards.

> > > 

> > > It is wrong for a clock to return an index >= num_parents. CCF checks

> > > for this condition in clk_core_get_parent_by_index(). This function sets

> > > the parent to NULL if index is incoherent, which seems like the only

> > > sane choice.

> > > 

> > > commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent

> > > clocks")

> > > appears to be a work around installed in the core framework for a problem

> > > that is platform specific, and should probably be fixed in the platform

> > > code.

> > 

> > Ouch. I see that I even pointed that out in 2016 but never got a reply

> > or a fix patch[1].

> > 

> > > Further more, it introduces a problem in a corner case of the mux clock.

> > > Take mux with multiple parents, but only one is known, the rest being

> > > undocumented. The register reset has one of unknown parent set.

> > > 

> > > Before commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single

> > > parent clocks"):

> > >  * get_parent() is called, register is read and give an unknown index.

> > >    -> the mux is orphaned.

> > >  * a call to set_rate() will reparent the mux to the only known parent.

> > > 

> > > With commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent

> > > clocks"):

> > >  * the register is never read.

> > >  * parent is wrongly assumed to be the only known one.

> > >    As a consequence, all the calculation deriving from the mux will be

> > >    wrong

> > >  * Since we believe the only know parent to be set, set_parent() won't

> > >    ever be called and the register won't be set with the correct value.

> > 

> > Isn't this the broken bad case all over again? Why register a clk as a

> > mux and then only say it has one parent?

> 

> I understand it is a bit odd but as I explained it is a corner case.

> 

> We are really trying to drive a mux here, applying a values will change the

> clock signal we get. Documentation being what it is, we only know one the

> parent. The other parent could anything or maybe not connected at all, who

> know. That is not the important part actually

> 

> If such mux was already set to the known entry by default, it would be OK to

> ignore it. But if it is not, then we CCF to realise that and change the

> setting accordingly.

> 

> This the case of the 'ao_cts_cec' clock in the following patch:

> https://lore.kernel.org/patchwork/patch/1021028/

> 

> by default the value in the register is 0, but the only one that makes sense

> for us is 1.


Ok. Thanks for the explanation. What do you do to fix this problem in
the 'ao_cts_cec' clk implementation? Sounds like you just rely on
clk_set_rate() to fix this all up for you when the consumer changes the
rate?

If we have something that is default parented to something we're not
telling the framework about for some reason then one solution would be
to have some init code reparent the clk in hardware before registering
it with the framework. Basically "fix up" the clk tree in hardware to be
consistent with what software can reason about. 

This also reminds me of some audio clks I've seen before where the
default parent is some external pin and it can be reparented to an
internal clk with clk_set_rate/parent. In that case, I think we forced
the parent over to the internal clk before registering so that it would
always get parented properly in the framework. Right now, this is going
to cause problems for plans to probe defer clk_get() on orphan clks.

Maybe this could work better if we allowed
'assigned-clock-parents/rates' to reparent clks regardless of orphan
status and/or had some flag that could be set on clks to indicate that
we're OK if clk_get() is called on it when it's an orphan because this
isn't a problem?

Or we can make the defer on orphan code only defer if the clk has a
single parent and it's an orphan and return the clk if there is at least
one parent of the clk that has been registered and isn't marked as an
orphan. That would need to be recursively applied, so I guess we would
update our orphan marking code to indicate that clk_get() on the clk
should probe defer or not instead of indicating the clk's orphan status.
Doing this would also side-step the problem Rockchip was having where
certain parents never appeared but the clk was parented to it in
hardware, essentially blocking the clk from ever being touched by
consumers.

On the other hand, not having the clk there causes us to do a global
search of the clk name space each time we look for parents by the
"unknown" index, which in the Rockchip case was quite often because the
clk was changing between two other parents with a third one that never
got registered. So it may be better to just register an "unknown" clk as
a root clk with a rate of 0 that you can then hook everything up to that
is hardware reset to this unknown input. That way nothing is orphaned
and everyone is happy. We could do this for the audio case I talked
about earlier too so when the external pin is left unconnected we could
register some 'unconnected' clk and parent the audio clk to it.

> 

> > 

> > > Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>

> > > ---

> > 

> > Is this related to the other patch you sent? Can you send series for

> > related patches please?

> > 

> > [1] https://lkml.kernel.org/r/20160209181833.GA24167@codeaurora.org

> 

> Actually I was intially doing a series, and stopped when my cover letter

> started with "those are two unrelated patches ..." ;)

> 

> I found these things while debugging the same thing but there is no deps

> between them.

> 


Ok.
Dinh Nguyen Dec. 5, 2018, 3:17 p.m. UTC | #5
Hi Stephen,

On 12/5/18 1:17 AM, Stephen Boyd wrote:
> (Adding Dinh's korg email)

> 

> I also wonder if this driver is even used anymore or maybe we can delete

> it?

> 


The armv7 SoCFPGA platforms are using this driver.

Dinh
Stephen Boyd Dec. 5, 2018, 3:55 p.m. UTC | #6
Quoting Dinh Nguyen (2018-12-05 07:17:41)
> Hi Stephen,

> 

> On 12/5/18 1:17 AM, Stephen Boyd wrote:

> > (Adding Dinh's korg email)

> > 

> > I also wonder if this driver is even used anymore or maybe we can delete

> > it?

> > 

> 

> The armv7 SoCFPGA platforms are using this driver.

> 


Ok and those platforms are booting in kernelci.org as
https://kernelci.org/boot/socfpga_cyclone5_de0_sockit/ perhaps?
Jerome Brunet Dec. 5, 2018, 5:20 p.m. UTC | #7
On Tue, 2018-12-04 at 23:54 -0800, Stephen Boyd wrote:
> Quoting Jerome Brunet (2018-12-04 11:51:17)

> > On Tue, 2018-12-04 at 10:05 -0800, Stephen Boyd wrote:

> > > Quoting Jerome Brunet (2018-12-04 08:32:57)

> > > > This reverts commit 2430a94d1e719b7b4af2a65b781a4c036eb22e64.

> > > > 

> > > > From the original commit message:

> > > >   It turned out a problem because there are some single-parent clocks

> > > >   that implement .get_parent() callback and return non-zero index.

> > > >   The SOCFPGA clock is the case; the commit broke the SOCFPGA boards.

> > > > 

> > > > It is wrong for a clock to return an index >= num_parents. CCF checks

> > > > for this condition in clk_core_get_parent_by_index(). This function

> > > > sets

> > > > the parent to NULL if index is incoherent, which seems like the only

> > > > sane choice.

> > > > 

> > > > commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent

> > > > clocks")

> > > > appears to be a work around installed in the core framework for a

> > > > problem

> > > > that is platform specific, and should probably be fixed in the

> > > > platform

> > > > code.

> > > 

> > > Ouch. I see that I even pointed that out in 2016 but never got a reply

> > > or a fix patch[1].

> > > 

> > > > Further more, it introduces a problem in a corner case of the mux

> > > > clock.

> > > > Take mux with multiple parents, but only one is known, the rest being

> > > > undocumented. The register reset has one of unknown parent set.

> > > > 

> > > > Before commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single

> > > > parent clocks"):

> > > >  * get_parent() is called, register is read and give an unknown index.

> > > >    -> the mux is orphaned.

> > > >  * a call to set_rate() will reparent the mux to the only known

> > > > parent.

> > > > 

> > > > With commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single

> > > > parent

> > > > clocks"):

> > > >  * the register is never read.

> > > >  * parent is wrongly assumed to be the only known one.

> > > >    As a consequence, all the calculation deriving from the mux will be

> > > >    wrong

> > > >  * Since we believe the only know parent to be set, set_parent() won't

> > > >    ever be called and the register won't be set with the correct

> > > > value.

> > > 

> > > Isn't this the broken bad case all over again? Why register a clk as a

> > > mux and then only say it has one parent?

> > 

> > I understand it is a bit odd but as I explained it is a corner case.

> > 

> > We are really trying to drive a mux here, applying a values will change

> > the

> > clock signal we get. Documentation being what it is, we only know one the

> > parent. The other parent could anything or maybe not connected at all, who

> > know. That is not the important part actually

> > 

> > If such mux was already set to the known entry by default, it would be OK

> > to

> > ignore it. But if it is not, then we CCF to realise that and change the

> > setting accordingly.

> > 

> > This the case of the 'ao_cts_cec' clock in the following patch:

> > https://lore.kernel.org/patchwork/patch/1021028/

> > 

> > by default the value in the register is 0, but the only one that makes

> > sense

> > for us is 1.

> 

> Ok. Thanks for the explanation. What do you do to fix this problem in

> the 'ao_cts_cec' clk implementation?


IMO, there is nothing to fix, the implementation is correct. 

>  Sounds like you just rely on

> clk_set_rate() to fix this all up for you when the consumer changes the

> rate?


To setup the rate as the customer expect, yes.

> 

> If we have something that is default parented to something we're not

> telling the framework about for some reason then one solution would be

> to have some init code reparent the clk in hardware before registering

> it with the framework. Basically "fix up" the clk tree in hardware to be

> consistent with what software can reason about. 


Should we really ? Something the framework does not know about is not
necessarily something wrong.

It would best if we could model the tree completly, but most of the time, we
just approximate it the best we can.

The framework just knows how to setup rates, it has no idea what rate needs to
be set or when - And I think it is best if does not make any assumption about
that.

Pushing it a bit, this unknown parent might important to the boot sequence.
How would you know when it is safe to change the setting ?
What would change it to ? It Seems obvious when there is only one known
parent, but what if there two ? which one do you pick ? Is it really the role
of CCF to make this kind of choice ? 

> 

> This also reminds me of some audio clks I've seen before where the

> default parent is some external pin and it can be reparented to an

> internal clk with clk_set_rate/parent. In that case, I think we forced

> the parent over to the internal clk before registering so that it would

> always get parented properly in the framework. Right now, this is going

> to cause problems for plans to probe defer clk_get() on orphan clks.


clk_get() on orphaned clock should probably return an error if it is not the
case already ? or 0 maybe ?

> 

> Maybe this could work better if we allowed

> 'assigned-clock-parents/rates' to reparent clks regardless of orphan

> status and/or had some flag that could be set on clks to indicate that

> we're OK if clk_get() is called on it when it's an orphan because this

> isn't a problem?


Not sure I understand your point here.

> 

> Or we can make the defer on orphan code only defer if the clk has a

> single parent and it's an orphan and return the clk if there is at least

> one parent of the clk that has been registered and isn't marked as an

> orphan. That would need to be recursively applied, so I guess we would

> update our orphan marking code to indicate that clk_get() on the clk

> should probe defer or not instead of indicating the clk's orphan status.

> Doing this would also side-step the problem Rockchip was having where

> certain parents never appeared but the clk was parented to it in

> hardware, essentially blocking the clk from ever being touched by

> consumers.


... and now you lost me completly :)

> 

> On the other hand, not having the clk there causes us to do a global

> search of the clk name space each time we look for parents by the

> "unknown" index, which in the Rockchip case was quite often because the

> clk was changing between two other parents with a third one that never

> got registered. So it may be better to just register an "unknown" clk as

> a root clk with a rate of 0 that you can then hook everything up to that

> is hardware reset to this unknown input. That way nothing is orphaned

> and everyone is happy. We could do this for the audio case I talked

> about earlier too so when the external pin is left unconnected we could

> register some 'unconnected' clk and parent the audio clk to it.


We have sometimes a few orphans in the amlogic clock tree, it does not seems
to a problem.

I don't really understand the benefit of having a fake clock that would adopt
all the orphan ? You guess there is problem I'm not aware of ...

> 

> > > > Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>

> > > > ---

> > > 

> > > Is this related to the other patch you sent? Can you send series for

> > > related patches please?

> > > 

> > > [1] https://lkml.kernel.org/r/20160209181833.GA24167@codeaurora.org

> > 

> > Actually I was intially doing a series, and stopped when my cover letter

> > started with "those are two unrelated patches ..." ;)

> > 

> > I found these things while debugging the same thing but there is no deps

> > between them.

> > 

> 

> Ok.
Dinh Nguyen Dec. 6, 2018, 3:16 p.m. UTC | #8
On 12/5/18 9:55 AM, Stephen Boyd wrote:
> Quoting Dinh Nguyen (2018-12-05 07:17:41)

>> Hi Stephen,

>>

>> On 12/5/18 1:17 AM, Stephen Boyd wrote:

>>> (Adding Dinh's korg email)

>>>

>>> I also wonder if this driver is even used anymore or maybe we can delete

>>> it?

>>>

>>

>> The armv7 SoCFPGA platforms are using this driver.

>>

> 

> Ok and those platforms are booting in kernelci.org as

> https://kernelci.org/boot/socfpga_cyclone5_de0_sockit/ perhaps?

> 


Yes that's right.

Dinh
Stephen Boyd Dec. 6, 2018, 10:08 p.m. UTC | #9
Sorry this email is long.

TL;DR is that I don't think we need to revert the patch as you suggest.
Instead, we should fix the driver to indicate NULL as the parent name of
the index it tells the framework about.

Quoting Jerome Brunet (2018-12-05 09:20:09)
> On Tue, 2018-12-04 at 23:54 -0800, Stephen Boyd wrote:

> > 

> > Ok. Thanks for the explanation. What do you do to fix this problem in

> > the 'ao_cts_cec' clk implementation?

> 

> IMO, there is nothing to fix, the implementation is correct. 

> 

> >  Sounds like you just rely on

> > clk_set_rate() to fix this all up for you when the consumer changes the

> > rate?

> 

> To setup the rate as the customer expect, yes.


Ok.

> 

> > 

> > If we have something that is default parented to something we're not

> > telling the framework about for some reason then one solution would be

> > to have some init code reparent the clk in hardware before registering

> > it with the framework. Basically "fix up" the clk tree in hardware to be

> > consistent with what software can reason about. 

> 

> Should we really ? Something the framework does not know about is not

> necessarily something wrong.

> 

> It would best if we could model the tree completly, but most of the time, we

> just approximate it the best we can.

> 

> The framework just knows how to setup rates, it has no idea what rate needs to

> be set or when - And I think it is best if does not make any assumption about

> that.

> 

> Pushing it a bit, this unknown parent might important to the boot sequence.

> How would you know when it is safe to change the setting ?

> What would change it to ? It Seems obvious when there is only one known

> parent, but what if there two ? which one do you pick ? Is it really the role

> of CCF to make this kind of choice ? 


Agreed. We can't mandate this requirement because of what you say. There
are pitfalls such as not always knowing if it's safe to change the clk
tree before clks are registered.

> 

> > 

> > This also reminds me of some audio clks I've seen before where the

> > default parent is some external pin and it can be reparented to an

> > internal clk with clk_set_rate/parent. In that case, I think we forced

> > the parent over to the internal clk before registering so that it would

> > always get parented properly in the framework. Right now, this is going

> > to cause problems for plans to probe defer clk_get() on orphan clks.

> 

> clk_get() on orphaned clock should probably return an error if it is not the

> case already ? or 0 maybe ?


I'd prefer that it return EPROBE_DEFER if the clk is an orphan, except
that it causes problems for this case where muxes move away from
something unknown to something known.

> 

> > 

> > Maybe this could work better if we allowed

> > 'assigned-clock-parents/rates' to reparent clks regardless of orphan

> > status and/or had some flag that could be set on clks to indicate that

> > we're OK if clk_get() is called on it when it's an orphan because this

> > isn't a problem?

> 

> Not sure I understand your point here.


There is a problem with assigned-clock DT properties and orphan clks and
making orphan clks defer driver probes.

> 

> > 

> > Or we can make the defer on orphan code only defer if the clk has a

> > single parent and it's an orphan and return the clk if there is at least

> > one parent of the clk that has been registered and isn't marked as an

> > orphan. That would need to be recursively applied, so I guess we would

> > update our orphan marking code to indicate that clk_get() on the clk

> > should probe defer or not instead of indicating the clk's orphan status.

> > Doing this would also side-step the problem Rockchip was having where

> > certain parents never appeared but the clk was parented to it in

> > hardware, essentially blocking the clk from ever being touched by

> > consumers.

> 

> ... and now you lost me completly :)


Hmm alright.

> 

> > 

> > On the other hand, not having the clk there causes us to do a global

> > search of the clk name space each time we look for parents by the

> > "unknown" index, which in the Rockchip case was quite often because the

> > clk was changing between two other parents with a third one that never

> > got registered. So it may be better to just register an "unknown" clk as

> > a root clk with a rate of 0 that you can then hook everything up to that

> > is hardware reset to this unknown input. That way nothing is orphaned

> > and everyone is happy. We could do this for the audio case I talked

> > about earlier too so when the external pin is left unconnected we could

> > register some 'unconnected' clk and parent the audio clk to it.

> 

> We have sometimes a few orphans in the amlogic clock tree, it does not seems

> to a problem.


I suppose drivers aren't calling clk_set_rate() on the orphaned clks and
experiencing problems? There are problems around clk rates that could
happen if there aren't any proper parents of a mux.

> 

> I don't really understand the benefit of having a fake clock that would adopt

> all the orphan ? You guess there is problem I'm not aware of ...


I see three cases that can go wrong if we make orphan clks probe defer
clk_get():

  1) NULL as the parent name for some parent index that the hardware
     indicates is the current parent of the clk when the clk is registered.

In this case, we need to make clk_get() succeed and not return
-EPROBE_DEFER, so I think we need to make a special "unknown" clk in the
framework so that clks aren't orphaned when the hardware is indicating
the parent is something we don't (and won't) know anything about.

  2) get_parent/set_parent on a clk with num_parents == 1

This seems to be how ao_cts_cec is implemented. This hides the fact that
get_parent may return some number above the number of parents and then
has that quietly make the parent of the clk NULL during registration so
that it's orphaned. Then we rely on set_rate() or set_parent() being
called to fix the parent chain at a later time.

That is complicated and non-obvious. I'd rather that we specify NULL as
a parent in the parent_names array if we're going to return that index
to the framework and then have the framework go to case #1. This makes
it easier for the framework to assume that [0, num_parents) is the valid
range of indices returned from the get_parent() op and a number outside
that range is an error, clearing the way for errors to be returned from
get_parent() if it somehow fails.

  3) A string as the parent name for some parent index that the hardware
     indicates is the current parent of the clk when the clk is registered,
     but that parent clk may never be registered with the framework.

I imagine this can happen with an external gpio clk that feeds some SoC
internal clk. That gpio could be supplied by some clk signal that the
board designer decides to send there, and it could also just "magically
appear" by applying some DT overlay. Due to our use of global strings
for parent linking, we don't have a good way to figure out that nothing
is connected to some clock input signal, making this quite a nightmare
to figure out a runtime!

One solution is to replace the parent name with NULL and ignore the
runtime update case. This will work and bring us back to case #1. The
problem is that if it is connected to something like a GPIO and it
defaults to that pin instead of some internal clk we're stuck with an
"orphan" clk that we can't clk_get(). I suppose the driver could read DT
and see if this pin is not connected and then mark this input parent
name as NULL. We can punt on solving the DT overlay problem until later,
because we would need some way to dynamically resolve a new parent of a
clk and update the tree when the DT node is modified to use a new GPIO.
Jerome Brunet Dec. 7, 2018, 10:03 a.m. UTC | #10
On Thu, 2018-12-06 at 14:08 -0800, Stephen Boyd wrote:
> Sorry this email is long.


I tried reply the best I could to your use cases

> 

> TL;DR is that I don't think we need to revert the patch as you suggest.

> Instead, we should fix the driver to indicate NULL as the parent name of

> the index it tells the framework about.


The driver, can't indicate NULL, it just provides the index of the parent.
What we are saying is not very far apart.

Drivers implementing get_parent() must have way to tell the framework "I have
no idea what I'm connected to ATM"

Since get_parent() return a u8, the only way to signal a problem like this is
to return an out of bound value, which the framework already safely understand
and maps to NULL, as you request

This is done in the other patch I sent (... and now they are related ;) ...)
by returning num_parents in the clk_mux code

This is an important case, whatever the number of parent. I think it makes no
sense that a clock with 2 known parents can return this error, a one with only
one known parent can't.

This why I still think the change should be reverted.

If you really insist on keeping it, there should at least be a warning when
registering a clock with 'get_parent != NULL' and 'num_parents == 1', to let
the user know that whatever is implemented there will be ignored ... instead
of silently ignoring it. I think it would be a mistake, but at least there
would be no suprise.

Earlier, you mention having a special parent that would adopt the the orphan.
Actually, it already exists, it is the framework itself. If, instead of
returning 0, it returned a special value to let us know we reached the root
and there actually no signal, we would be easy to make sure the orphan (with
no signal) are never accidently used.

> 

> Quoting Jerome Brunet (2018-12-05 09:20:09)

> > On Tue, 2018-12-04 at 23:54 -0800, Stephen Boyd wrote:

> > > Ok. Thanks for the explanation. What do you do to fix this problem in

> > > the 'ao_cts_cec' clk implementation?

> > 

> > IMO, there is nothing to fix, the implementation is correct. 

> > 

> > >  Sounds like you just rely on

> > > clk_set_rate() to fix this all up for you when the consumer changes the

> > > rate?

> > 

> > To setup the rate as the customer expect, yes.

> 

> Ok.

> 

> > > If we have something that is default parented to something we're not

> > > telling the framework about for some reason then one solution would be

> > > to have some init code reparent the clk in hardware before registering

> > > it with the framework. Basically "fix up" the clk tree in hardware to be

> > > consistent with what software can reason about. 

> > 

> > Should we really ? Something the framework does not know about is not

> > necessarily something wrong.

> > 

> > It would best if we could model the tree completly, but most of the time,

> > we

> > just approximate it the best we can.

> > 

> > The framework just knows how to setup rates, it has no idea what rate

> > needs to

> > be set or when - And I think it is best if does not make any assumption

> > about

> > that.

> > 

> > Pushing it a bit, this unknown parent might important to the boot

> > sequence.

> > How would you know when it is safe to change the setting ?

> > What would change it to ? It Seems obvious when there is only one known

> > parent, but what if there two ? which one do you pick ? Is it really the

> > role

> > of CCF to make this kind of choice ? 

> 

> Agreed. We can't mandate this requirement because of what you say. There

> are pitfalls such as not always knowing if it's safe to change the clk

> tree before clks are registered.

> 

> > > This also reminds me of some audio clks I've seen before where the

> > > default parent is some external pin and it can be reparented to an

> > > internal clk with clk_set_rate/parent. In that case, I think we forced

> > > the parent over to the internal clk before registering so that it would

> > > always get parented properly in the framework. Right now, this is going

> > > to cause problems for plans to probe defer clk_get() on orphan clks.

> > 

> > clk_get() on orphaned clock should probably return an error if it is not

> > the

> > case already ? or 0 maybe ?

> 

> I'd prefer that it return EPROBE_DEFER if the clk is an orphan, except

> that it causes problems for this case where muxes move away from

> something unknown to something known.

> 

> > > Maybe this could work better if we allowed

> > > 'assigned-clock-parents/rates' to reparent clks regardless of orphan

> > > status and/or had some flag that could be set on clks to indicate that

> > > we're OK if clk_get() is called on it when it's an orphan because this

> > > isn't a problem?

> > 

> > Not sure I understand your point here.

> 

> There is a problem with assigned-clock DT properties and orphan clks and

> making orphan clks defer driver probes.

> 

> > > Or we can make the defer on orphan code only defer if the clk has a

> > > single parent and it's an orphan and return the clk if there is at least

> > > one parent of the clk that has been registered and isn't marked as an

> > > orphan. That would need to be recursively applied, so I guess we would

> > > update our orphan marking code to indicate that clk_get() on the clk

> > > should probe defer or not instead of indicating the clk's orphan status.

> > > Doing this would also side-step the problem Rockchip was having where

> > > certain parents never appeared but the clk was parented to it in

> > > hardware, essentially blocking the clk from ever being touched by

> > > consumers.

> > 

> > ... and now you lost me completly :)

> 

> Hmm alright.

> 

> > > On the other hand, not having the clk there causes us to do a global

> > > search of the clk name space each time we look for parents by the

> > > "unknown" index, which in the Rockchip case was quite often because the

> > > clk was changing between two other parents with a third one that never

> > > got registered. So it may be better to just register an "unknown" clk as

> > > a root clk with a rate of 0 that you can then hook everything up to that

> > > is hardware reset to this unknown input. That way nothing is orphaned

> > > and everyone is happy. We could do this for the audio case I talked

> > > about earlier too so when the external pin is left unconnected we could

> > > register some 'unconnected' clk and parent the audio clk to it.

> > 

> > We have sometimes a few orphans in the amlogic clock tree, it does not

> > seems

> > to a problem.


To the best of my knowledge, there will always be orphan in the tree. The root
clock (usually an xtal) being a special case of this. 

It seems to me that the problem is we can't differentiate between this special
orphan and the rest because none will return an error when calling get_rate()

xtal will provide something to round_rate() while other clock which are
supposed to be orphaned will defer to the framework which gives a 0 rate.

I think this is your real issue with orphan in the framework. The framework
does not distinguish between a signal with rate == 0 and an error (no signal)
> 

> I suppose drivers aren't calling clk_set_rate() on the orphaned clks and

> experiencing problems? There are problems around clk rates that could

> happen if there aren't any proper parents of a mux.

> 

> > I don't really understand the benefit of having a fake clock that would

> > adopt

> > all the orphan ? You guess there is problem I'm not aware of ...

> 

> I see three cases that can go wrong if we make orphan clks probe defer

> clk_get():

> 

>   1) NULL as the parent name for some parent index that the hardware

>      indicates is the current parent of the clk when the clk is registered.


I don't think you can have NULL has a parent_name

> 

> In this case, we need to make clk_get() succeed and not return

> -EPROBE_DEFER, so I think we need to make a special "unknown" clk in the

> framework so that clks aren't orphaned when the hardware is indicating

> the parent is something we don't (and won't) know anything about.


I think DEFER is returned when clock is unknown, not when it is orphaned.

> 

>   2) get_parent/set_parent on a clk with num_parents == 1

> 

> This seems to be how ao_cts_cec is implemented. This hides the fact that

> get_parent may return some number above the number of parents


The index returned will be either 0, to indicate the only known parent, or
num_parents (1) to indicate an invalid parent to framework. This is the topic
of the other patch I sent you

You must have way for a driver to tell you that it is not connected to
anything it knows. That's a very basic error case and this is not specific to
num_parents == 1. 

With any number of num_parent, there is still a possibility that a driver ends
up in configuration where it cannot tell what it is his parent. (invalid
combination, reserved values, crappy doc ...)

This is why is important to call get_parent() (if available) even when
num_parent == 1. The driver might tell if it is connected as expected - or not

> and then

> has that quietly make the parent of the clk NULL during registration so

> that it's orphaned. Then we rely on set_rate() or set_parent() being

> called to fix the parent chain at a later time.

> 

> That is complicated and non-obvious. I'd rather that we specify NULL as

> a parent in the parent_names array if we're going to return that index

> to the framework and then have the framework go to case #1. This makes

> it easier for the framework to assume that [0, num_parents) is the valid

> range of indices returned from the get_parent() op and a number outside

> that range is an error, clearing the way for errors to be returned from

> get_parent() if it somehow fails.


The fix I sent is quite simple.

The above would require that:
1) the framework accept NULL in the list of parent (which it forbids ATM
AFAIK)
2) Have all drivers account for this a create this fake parent

This seems to be putting more work on the drivers with no real gain for the
framework, none that I can see at least.

> 

>   3) A string as the parent name for some parent index that the hardware

>      indicates is the current parent of the clk when the clk is registered,

>      but that parent clk may never be registered with the framework.

> 

> I imagine this can happen with an external gpio clk that feeds some SoC

> internal clk. That gpio could be supplied by some clk signal that the

> board designer decides to send there, and it could also just "magically

> appear" by applying some DT overlay. Due to our use of global strings

> for parent linking, we don't have a good way to figure out that nothing

> is connected to some clock input signal, making this quite a nightmare

> to figure out a runtime!

> 

> One solution is to replace the parent name with NULL and ignore the

> runtime update case. This will work and bring us back to case #1. The

> problem is that if it is connected to something like a GPIO and it

> defaults to that pin instead of some internal clk we're stuck with an

> "orphan" clk that we can't clk_get(). I suppose the driver could read DT

> and see if this pin is not connected and then mark this input parent

> name as NULL. We can punt on solving the DT overlay problem until later,

> because we would need some way to dynamically resolve a new parent of a

> clk and update the tree when the DT node is modified to use a new GPIO.


I don't understand what is the problem here.
In Amlogic we have a lot of this cases, with optional input clocks of the axg-
audio clock controller. Many muxes have parent names of clock that don't
exists.

CCF will return a 0 rate for these inputs.
It works well as it is, but as said above, I think it would be better if we
could distinguish between a signal with 0 rate and no signal at all (error)

>
Stephen Boyd Dec. 18, 2018, 1:54 a.m. UTC | #11
Quoting Jerome Brunet (2018-12-07 02:03:19)
> On Thu, 2018-12-06 at 14:08 -0800, Stephen Boyd wrote:

> > Sorry this email is long.

> 

> I tried reply the best I could to your use cases

> 

> > 

> > TL;DR is that I don't think we need to revert the patch as you suggest.

> > Instead, we should fix the driver to indicate NULL as the parent name of

> > the index it tells the framework about.

> 

> The driver, can't indicate NULL, it just provides the index of the parent.

> What we are saying is not very far apart.

> 

> Drivers implementing get_parent() must have way to tell the framework "I have

> no idea what I'm connected to ATM"

> 

> Since get_parent() return a u8, the only way to signal a problem like this is

> to return an out of bound value, which the framework already safely understand

> and maps to NULL, as you request


I'm saying that the clk with two parents but only one 'known' clk parent
should have a parent_names array of two strings, where the second
element is NULL.

	.parent_names = { "known_clk", NULL }
	.num_parents = 2,

And then the .get_parent op is OK to return '1' and then we know that
the clk isn't known to be anything, but it's otherwise a valid index for
the clk to be parented to. This is the orphan case. It's not the error
case, which would be some value returned from .get_parent() that's >=
num_parents.

I am not talking about the return value from get_parent() returning NULL
or the driver indicating that the parent is a NULL clk. I'm just saying
that the clk driver that wants to return an index that's outside the
bounds of the parent_names array needs to do the right thing and either
grow the parent_names array to say "it's unknown what's here, I've
indicated that with a NULL name", or it needs to return another value
outside of the array so the framework knows it's an error.

> 

> This is done in the other patch I sent (... and now they are related ;) ...)

> by returning num_parents in the clk_mux code

> 

> This is an important case, whatever the number of parent. I think it makes no

> sense that a clock with 2 known parents can return this error, a one with only

> one known parent can't.



> 

> This why I still think the change should be reverted.

> 

> If you really insist on keeping it, there should at least be a warning when

> registering a clock with 'get_parent != NULL' and 'num_parents == 1', to let

> the user know that whatever is implemented there will be ignored ... instead

> of silently ignoring it. I think it would be a mistake, but at least there

> would be no suprise.


Maybe we should add the warning. At least it would be good to grep
around and try to find these cases and update them. The documentation on
the get_parent clk op says it's unnecessary right now when 0 or 1 parent
exists, so maybe that needs an update too.

> 

> Earlier, you mention having a special parent that would adopt the the orphan.

> Actually, it already exists, it is the framework itself. If, instead of

> returning 0, it returned a special value to let us know we reached the root

> and there actually no signal, we would be easy to make sure the orphan (with

> no signal) are never accidently used.


I don't understand this part. We have orphan tracking code that knows
when a clk is no longer in an orphan chain (i.e. there is a signal
there).

> 

> > 

> > Quoting Jerome Brunet (2018-12-05 09:20:09)

> > > We have sometimes a few orphans in the amlogic clock tree, it does not

> > > seems

> > > to a problem.

> 

> To the best of my knowledge, there will always be orphan in the tree. The root

> clock (usually an xtal) being a special case of this. 

> 

> It seems to me that the problem is we can't differentiate between this special

> orphan and the rest because none will return an error when calling get_rate()


The root clk isn't an orphan. It has num_parents == 0 and is treated
specially as such.

> 

> xtal will provide something to round_rate() while other clock which are

> supposed to be orphaned will defer to the framework which gives a 0 rate.

> 

> I think this is your real issue with orphan in the framework. The framework

> does not distinguish between a signal with rate == 0 and an error (no signal)


Ok. I'm not sure that rate is the entire problem, but I suppose it's one
problem.

> > 

> > I suppose drivers aren't calling clk_set_rate() on the orphaned clks and

> > experiencing problems? There are problems around clk rates that could

> > happen if there aren't any proper parents of a mux.

> > 

> > > I don't really understand the benefit of having a fake clock that would

> > > adopt

> > > all the orphan ? You guess there is problem I'm not aware of ...

> > 

> > I see three cases that can go wrong if we make orphan clks probe defer

> > clk_get():

> > 

> >   1) NULL as the parent name for some parent index that the hardware

> >      indicates is the current parent of the clk when the clk is registered.

> 

> I don't think you can have NULL has a parent_name


Yes, it looks like Mike made that decision when introducing the CCF in
2012. I don't know why it is a problem though. Maybe kstrdup() on a NULL
string returns NULl and then errors can't be detected? Should be simple
enough to skip those ones when doing the deep copy though.

> 

> > 

> > In this case, we need to make clk_get() succeed and not return

> > -EPROBE_DEFER, so I think we need to make a special "unknown" clk in the

> > framework so that clks aren't orphaned when the hardware is indicating

> > the parent is something we don't (and won't) know anything about.

> 

> I think DEFER is returned when clock is unknown, not when it is orphaned.


Yes it's only returned right now if the provider node hasn't registered
a provider function. We have wanted to extend that further so that clks
can't be acquired until the clk is properly rooted in the clk tree
instead of being orphaned. It may not matter on amlogic but it matters
on other platforms.

> 

> > 

> >   2) get_parent/set_parent on a clk with num_parents == 1

> > 

> > This seems to be how ao_cts_cec is implemented. This hides the fact that

> > get_parent may return some number above the number of parents

> 

> The index returned will be either 0, to indicate the only known parent, or

> num_parents (1) to indicate an invalid parent to framework. This is the topic

> of the other patch I sent you

> 

> You must have way for a driver to tell you that it is not connected to

> anything it knows. That's a very basic error case and this is not specific to

> num_parents == 1. 

> 

> With any number of num_parent, there is still a possibility that a driver ends

> up in configuration where it cannot tell what it is his parent. (invalid

> combination, reserved values, crappy doc ...)

> 

> This is why is important to call get_parent() (if available) even when

> num_parent == 1. The driver might tell if it is connected as expected - or not


Ok. Thanks for the explanation. What should the framework do when the
parent is not known? Mark the clk as not an orphan but parent it to
NULL? That might work here.

> 

> > and then

> > has that quietly make the parent of the clk NULL during registration so

> > that it's orphaned. Then we rely on set_rate() or set_parent() being

> > called to fix the parent chain at a later time.

> > 

> > That is complicated and non-obvious. I'd rather that we specify NULL as

> > a parent in the parent_names array if we're going to return that index

> > to the framework and then have the framework go to case #1. This makes

> > it easier for the framework to assume that [0, num_parents) is the valid

> > range of indices returned from the get_parent() op and a number outside

> > that range is an error, clearing the way for errors to be returned from

> > get_parent() if it somehow fails.

> 

> The fix I sent is quite simple.

> 

> The above would require that:

> 1) the framework accept NULL in the list of parent (which it forbids ATM

> AFAIK)

> 2) Have all drivers account for this a create this fake parent

> 

> This seems to be putting more work on the drivers with no real gain for the

> framework, none that I can see at least.

> 

> > 

> >   3) A string as the parent name for some parent index that the hardware

> >      indicates is the current parent of the clk when the clk is registered,

> >      but that parent clk may never be registered with the framework.

> > 

> > I imagine this can happen with an external gpio clk that feeds some SoC

> > internal clk. That gpio could be supplied by some clk signal that the

> > board designer decides to send there, and it could also just "magically

> > appear" by applying some DT overlay. Due to our use of global strings

> > for parent linking, we don't have a good way to figure out that nothing

> > is connected to some clock input signal, making this quite a nightmare

> > to figure out a runtime!

> > 

> > One solution is to replace the parent name with NULL and ignore the

> > runtime update case. This will work and bring us back to case #1. The

> > problem is that if it is connected to something like a GPIO and it

> > defaults to that pin instead of some internal clk we're stuck with an

> > "orphan" clk that we can't clk_get(). I suppose the driver could read DT

> > and see if this pin is not connected and then mark this input parent

> > name as NULL. We can punt on solving the DT overlay problem until later,

> > because we would need some way to dynamically resolve a new parent of a

> > clk and update the tree when the DT node is modified to use a new GPIO.

> 

> I don't understand what is the problem here.

> In Amlogic we have a lot of this cases, with optional input clocks of the axg-

> audio clock controller. Many muxes have parent names of clock that don't

> exists.

> 

> CCF will return a 0 rate for these inputs.

> It works well as it is, but as said above, I think it would be better if we

> could distinguish between a signal with 0 rate and no signal at all (error)

> 


Ok. It sounds like we want similar things but I'm worried about the
orphan probe defer logic getting messed up.

How about we tri-state the get_parent() return value? So now [0,
num_parents) means a valid parent is known, num_parents means "I don't
know what it is but it must be something valid", and [num_parents + 1,
255] means some error occurred. Sounds obfuscated to me still.

Alternatively, we could make a get_parent_hw() op that returns a pointer
to a clk_hw structure of the parent clk. Then drivers can be converted
to use this new clk_op instead of get_parent and we can have it return
an error pointer when some error happens, NULL when the parent is not
known to the software but is otherwise valid, and the parent pointer
when the parent is findable. It would still be tri-stated then, but at
least it wouldn't be some odd u8 numberspace that it all has to fit
into. This patch starts to do that. What do you think? Clk providers
would need to implement a clk_ops that mostly does
clk_hw_get_parent_by_index(), or they could stash the parent_hw pointer
in their structure, or index into it based on some global numberspace in
their driver.

-----8<----
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index af011974d4ec..5fe3c41e0b7b 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2242,14 +2242,84 @@ struct clk *clk_get_parent(struct clk *clk)
 }
 EXPORT_SYMBOL_GPL(clk_get_parent);
 
-static struct clk_core *__clk_init_parent(struct clk_core *core)
+static struct clk_core *
+__clk_init_parent(struct clk_core *core, bool update_orphan)
 {
 	u8 index = 0;
+	struct clk_hw *parent_hw = NULL;
 
-	if (core->num_parents > 1 && core->ops->get_parent)
-		index = core->ops->get_parent(core->hw);
+	if (core->ops->get_parent_hw) {
+		parent_hw = core->ops->get_parent_hw(core->hw);
+		/*
+		 * The provider driver doesn't know what the parent is,
+		 * but it's at least something valid, so it's not an
+		 * orphan, just a clk with some unknown parent.
+		 */
+		if (!parent_hw && update_orphan)
+			core->orphan = false;
+	} else {
+		if (core->num_parents > 1 && core->ops->get_parent)
+			index = core->ops->get_parent(core->hw);
+
+		parent_hw = clk_hw_get_parent_by_index(core->hw, index);
+	}
+
+	if (IS_ERR(parent_hw)) {
+		/* Parent clk provider hasn't probed yet, orphan it */
+		if (PTR_ERR(parent_hw) == -EPROBE_DEFER) {
+			if (update_orphan)
+				core->orphan = true;
+
+			return NULL;
+		}
+
+		return ERR_CAST(parent_hw);
+	}
+
+	if (!parent_hw)
+		return NULL;
+
+	return parent_hw->core;
+}
+
+static int clk_init_parent(struct clk_core *core)
+{
+	core->parent = __clk_init_parent(core, true);
+	if (IS_ERR(core->parent))
+		return PTR_ERR(core->parent);
+
+	/*
+	 * Populate core->parent if parent has already been clk_core_init'd. If
+	 * parent has not yet been clk_core_init'd then place clk in the orphan
+	 * list.  If clk doesn't have any parents then place it in the root
+	 * clk list.
+	 *
+	 * Every time a new clk is clk_init'd then we walk the list of orphan
+	 * clocks and re-parent any that are children of the clock currently
+	 * being clk_init'd.
+	 */
+	if (core->parent) {
+		hlist_add_head(&core->child_node,
+				&core->parent->children);
+		core->orphan = core->parent->orphan;
+	} else if (!core->num_parents) {
+		hlist_add_head(&core->child_node, &clk_root_list);
+		core->orphan = false;
+	} else {
+		hlist_add_head(&core->child_node, &clk_orphan_list);
+	}
+
+	return 0;
+}
 
-	return clk_core_get_parent_by_index(core, index);
+static struct clk_core *clk_find_parent(struct clk_core *core)
+{
+	return __clk_init_parent(core, false);
+}
+
+static bool clk_has_parent_op(const struct clk_ops *ops)
+{
+	return ops->get_parent || ops->get_parent_hw;
 }
 
 static void clk_core_reparent(struct clk_core *core,
@@ -3045,14 +3115,14 @@ static int __clk_core_init(struct clk_core *core)
 		goto out;
 	}
 
-	if (core->ops->set_parent && !core->ops->get_parent) {
+	if (core->ops->set_parent && !clk_has_parent_op(core->ops)) {
 		pr_err("%s: %s must implement .get_parent & .set_parent\n",
 		       __func__, core->name);
 		ret = -EINVAL;
 		goto out;
 	}
 
-	if (core->num_parents > 1 && !core->ops->get_parent) {
+	if (core->num_parents > 1 && !clk_has_parent_op(core->ops)) {
 		pr_err("%s: %s must implement .get_parent as it has multi parents\n",
 		       __func__, core->name);
 		ret = -EINVAL;
@@ -3073,29 +3143,9 @@ static int __clk_core_init(struct clk_core *core)
 				"%s: invalid NULL in %s's .parent_names\n",
 				__func__, core->name);
 
-	core->parent = __clk_init_parent(core);
-
-	/*
-	 * Populate core->parent if parent has already been clk_core_init'd. If
-	 * parent has not yet been clk_core_init'd then place clk in the orphan
-	 * list.  If clk doesn't have any parents then place it in the root
-	 * clk list.
-	 *
-	 * Every time a new clk is clk_init'd then we walk the list of orphan
-	 * clocks and re-parent any that are children of the clock currently
-	 * being clk_init'd.
-	 */
-	if (core->parent) {
-		hlist_add_head(&core->child_node,
-				&core->parent->children);
-		core->orphan = core->parent->orphan;
-	} else if (!core->num_parents) {
-		hlist_add_head(&core->child_node, &clk_root_list);
-		core->orphan = false;
-	} else {
-		hlist_add_head(&core->child_node, &clk_orphan_list);
-		core->orphan = true;
-	}
+	ret = clk_init_parent(core);
+	if (ret)
+		goto out;
 
 	/*
 	 * optional platform-specific magic
@@ -3173,7 +3223,14 @@ static int __clk_core_init(struct clk_core *core)
 	 * parent.
 	 */
 	hlist_for_each_entry_safe(orphan, tmp2, &clk_orphan_list, child_node) {
-		struct clk_core *parent = __clk_init_parent(orphan);
+		struct clk_core *parent = clk_find_parent(orphan);
+
+		/*
+		 * Error parent should have been caught before and returned
+		 * as an error during registration.
+		 */
+		if (WARN_ON_ONCE(IS_ERR(parent)))
+			continue;
 
 		/*
 		 * We need to use __clk_set_parent_before() and _after() to
diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h
index 60c51871b04b..8b84dee942bf 100644
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -155,6 +155,14 @@ struct clk_duty {
  *		multiple parents.  It is optional (and unnecessary) for clocks
  *		with 0 or 1 parents.
  *
+ * @get_parent_hw: Queries the hardware to determine the parent of a clock.  The
+ *		return value is a clk_hw pointer corresponding to
+ *		the parent clock. In short, this function translates the parent
+ *		value read from hardware into a pointer to the clk_hw for that clk.
+ *		Currently only called when the clock is initialized by
+ *		__clk_init.  This callback is mandatory for clocks with
+ *		multiple parents.  It is optional for clocks with 0 or 1 parents.
+ *
  * @set_rate:	Change the rate of this clock. The requested rate is specified
  *		by the second argument, which should typically be the return
  *		of .round_rate call.  The third argument gives the parent rate
@@ -238,6 +246,7 @@ struct clk_ops {
 					  struct clk_rate_request *req);
 	int		(*set_parent)(struct clk_hw *hw, u8 index);
 	u8		(*get_parent)(struct clk_hw *hw);
+	struct clk_hw * (*get_parent_hw)(struct clk_hw *hw);
 	int		(*set_rate)(struct clk_hw *hw, unsigned long rate,
 				    unsigned long parent_rate);
 	int		(*set_rate_and_parent)(struct clk_hw *hw,
Stephen Boyd Dec. 18, 2018, 1:54 a.m. UTC | #12
Quoting Dinh Nguyen (2018-12-06 07:16:47)
> 

> 

> On 12/5/18 9:55 AM, Stephen Boyd wrote:

> > Quoting Dinh Nguyen (2018-12-05 07:17:41)

> >> Hi Stephen,

> >>

> >> On 12/5/18 1:17 AM, Stephen Boyd wrote:

> >>> (Adding Dinh's korg email)

> >>>

> >>> I also wonder if this driver is even used anymore or maybe we can delete

> >>> it?

> >>>

> >>

> >> The armv7 SoCFPGA platforms are using this driver.

> >>

> > 

> > Ok and those platforms are booting in kernelci.org as

> > https://kernelci.org/boot/socfpga_cyclone5_de0_sockit/ perhaps?

> > 

> 

> Yes that's right.

> 


Ok. Dinh, can you review this patch? Or test it out? Or do I need to get
someone in kernelci to try it out?
Dinh Nguyen Dec. 18, 2018, 3:48 p.m. UTC | #13
On 12/17/18 7:54 PM, Stephen Boyd wrote:
> Quoting Dinh Nguyen (2018-12-06 07:16:47)

>>

>>

>> On 12/5/18 9:55 AM, Stephen Boyd wrote:

>>> Quoting Dinh Nguyen (2018-12-05 07:17:41)

>>>> Hi Stephen,

>>>>

>>>> On 12/5/18 1:17 AM, Stephen Boyd wrote:

>>>>> (Adding Dinh's korg email)

>>>>>

>>>>> I also wonder if this driver is even used anymore or maybe we can delete

>>>>> it?

>>>>>

>>>>

>>>> The armv7 SoCFPGA platforms are using this driver.

>>>>

>>>

>>> Ok and those platforms are booting in kernelci.org as

>>> https://kernelci.org/boot/socfpga_cyclone5_de0_sockit/ perhaps?

>>>

>>

>> Yes that's right.

>>

> 

> Ok. Dinh, can you review this patch? Or test it out? Or do I need to get

> someone in kernelci to try it out?

> 


Just tested on an Arria5 board, feel free to add:

Tested-by: Dinh Nguyen <dinguyen@kernel.org>


Thanks,
Dinh
Jerome Brunet Dec. 21, 2018, 4:03 p.m. UTC | #14
On Mon, 2018-12-17 at 17:54 -0800, Stephen Boyd wrote:
> Quoting Jerome Brunet (2018-12-07 02:03:19)

> > On Thu, 2018-12-06 at 14:08 -0800, Stephen Boyd wrote:

> > > Sorry this email is long.

> > 

> > I tried reply the best I could to your use cases

> > 

> > > TL;DR is that I don't think we need to revert the patch as you suggest.

> > > Instead, we should fix the driver to indicate NULL as the parent name of

> > > the index it tells the framework about.

> > 

> > The driver, can't indicate NULL, it just provides the index of the parent.

> > What we are saying is not very far apart.

> > 

> > Drivers implementing get_parent() must have way to tell the framework "I

> > have

> > no idea what I'm connected to ATM"

> > 

> > Since get_parent() return a u8, the only way to signal a problem like this

> > is

> > to return an out of bound value, which the framework already safely

> > understand

> > and maps to NULL, as you request

> 

> I'm saying that the clk with two parents but only one 'known' clk parent

> should have a parent_names array of two strings, where the second

> element is NULL.

> 

> 	.parent_names = { "known_clk", NULL }

> 	.num_parents = 2,


The problem is that CCF disallowed this from the beginning when rejecting NULL
in parent tables.

Are saying we should find and fix all the controllers with mux using value
table to avoid unknown/disconnected mux inputs ... sounds like a lot of work
for a benefit I still don't really understand.

Anywayu, adding support for this would be nice, I just don't understand how
the proposed patch how here makes it any harder.

> 

> And then the .get_parent op is OK to return '1' and then we know that

> the clk isn't known to be anything, but it's otherwise a valid index for

> the clk to be parented to. This is the orphan case. It's not the error

> case, which would be some value returned from .get_parent() that's >=

> num_parents.

> 

> I am not talking about the return value from get_parent() returning NULL

> or the driver indicating that the parent is a NULL clk. I'm just saying

> that the clk driver that wants to return an index that's outside the

> bounds of the parent_names array needs to do the right thing and either

> grow the parent_names array to say "it's unknown what's here, I've

> indicated that with a NULL name", or it needs to return another value

> outside of the array so the framework knows it's an error.


AFAIK, the framework knows that index == 'num_parents' is an error.

> 

> > This is done in the other patch I sent (... and now they are related ;)

> > ...)

> > by returning num_parents in the clk_mux code

> > 

> > This is an important case, whatever the number of parent. I think it makes

> > no

> > sense that a clock with 2 known parents can return this error, a one with

> > only

> > one known parent can't.

> > This why I still think the change should be reverted.

> > 

> > If you really insist on keeping it, there should at least be a warning

> > when

> > registering a clock with 'get_parent != NULL' and 'num_parents == 1', to

> > let

> > the user know that whatever is implemented there will be ignored ...

> > instead

> > of silently ignoring it. I think it would be a mistake, but at least there

> > would be no suprise.

> 

> Maybe we should add the warning. At least it would be good to grep

> around and try to find these cases and update them. The documentation on

> the get_parent clk op says it's unnecessary right now when 0 or 1 parent

> exists, so maybe that needs an update too.


If we are going to ignore something, I think that would be a minimum, yes.
I still don't like the fact that may we chose to ignore this.

> 

> > Earlier, you mention having a special parent that would adopt the the

> > orphan.

> > Actually, it already exists, it is the framework itself. If, instead of

> > returning 0, it returned a special value to let us know we reached the

> > root

> > and there actually no signal, we would be easy to make sure the orphan

> > (with

> > no signal) are never accidently used.

> 

> I don't understand this part. We have orphan tracking code that knows

> when a clk is no longer in an orphan chain (i.e. there is a signal

> there).

> 

> > > Quoting Jerome Brunet (2018-12-05 09:20:09)

> > > > We have sometimes a few orphans in the amlogic clock tree, it does not

> > > > seems

> > > > to a problem.

> > 

> > To the best of my knowledge, there will always be orphan in the tree. The

> > root

> > clock (usually an xtal) being a special case of this. 

> > 

> > It seems to me that the problem is we can't differentiate between this

> > special

> > orphan and the rest because none will return an error when calling

> > get_rate()

> 

> The root clk isn't an orphan. It has num_parents == 0 and is treated

> specially as such.


got confused, sorry

> 

> > xtal will provide something to round_rate() while other clock which are

> > supposed to be orphaned will defer to the framework which gives a 0 rate.

> > 

> > I think this is your real issue with orphan in the framework. The

> > framework

> > does not distinguish between a signal with rate == 0 and an error (no

> > signal)

> 

> Ok. I'm not sure that rate is the entire problem, but I suppose it's one

> problem.

> 

> > > I suppose drivers aren't calling clk_set_rate() on the orphaned clks and

> > > experiencing problems? There are problems around clk rates that could

> > > happen if there aren't any proper parents of a mux.

> > > 

> > > > I don't really understand the benefit of having a fake clock that

> > > > would

> > > > adopt

> > > > all the orphan ? You guess there is problem I'm not aware of ...

> > > 

> > > I see three cases that can go wrong if we make orphan clks probe defer

> > > clk_get():

> > > 

> > >   1) NULL as the parent name for some parent index that the hardware

> > >      indicates is the current parent of the clk when the clk is

> > > registered.

> > 

> > I don't think you can have NULL has a parent_name

> 

> Yes, it looks like Mike made that decision when introducing the CCF in

> 2012. I don't know why it is a problem though. Maybe kstrdup() on a NULL

> string returns NULl and then errors can't be detected? Should be simple

> enough to skip those ones when doing the deep copy though.


Maybe but all controller so far have been done with this in mind, what about
those ?

> 

> > > In this case, we need to make clk_get() succeed and not return

> > > -EPROBE_DEFER, so I think we need to make a special "unknown" clk in the

> > > framework so that clks aren't orphaned when the hardware is indicating

> > > the parent is something we don't (and won't) know anything about.

> > 

> > I think DEFER is returned when clock is unknown, not when it is orphaned.

> 

> Yes it's only returned right now if the provider node hasn't registered

> a provider function. We have wanted to extend that further so that clks

> can't be acquired until the clk is properly rooted in the clk tree

> instead of being orphaned. It may not matter on amlogic but it matters

> on other platforms.

> 

> > >   2) get_parent/set_parent on a clk with num_parents == 1

> > > 

> > > This seems to be how ao_cts_cec is implemented. This hides the fact that

> > > get_parent may return some number above the number of parents

> > 

> > The index returned will be either 0, to indicate the only known parent, or

> > num_parents (1) to indicate an invalid parent to framework. This is the

> > topic

> > of the other patch I sent you

> > 

> > You must have way for a driver to tell you that it is not connected to

> > anything it knows. That's a very basic error case and this is not specific

> > to

> > num_parents == 1. 

> > 

> > With any number of num_parent, there is still a possibility that a driver

> > ends

> > up in configuration where it cannot tell what it is his parent. (invalid

> > combination, reserved values, crappy doc ...)

> > 

> > This is why is important to call get_parent() (if available) even when

> > num_parent == 1. The driver might tell if it is connected as expected - or

> > not

> 

> Ok. Thanks for the explanation. What should the framework do when the

> parent is not known? Mark the clk as not an orphan but parent it to

> NULL? That might work here.


I don't really get the difference in the framework ATM. Is this linked to the
'orphan probe defer logic' WIP you have been mentionning ?

> 

> > > and then

> > > has that quietly make the parent of the clk NULL during registration so

> > > that it's orphaned. Then we rely on set_rate() or set_parent() being

> > > called to fix the parent chain at a later time.

> > > 

> > > That is complicated and non-obvious. I'd rather that we specify NULL as

> > > a parent in the parent_names array if we're going to return that index

> > > to the framework and then have the framework go to case #1. This makes

> > > it easier for the framework to assume that [0, num_parents) is the valid

> > > range of indices returned from the get_parent() op and a number outside

> > > that range is an error, clearing the way for errors to be returned from

> > > get_parent() if it somehow fails.

> > 

> > The fix I sent is quite simple.

> > 

> > The above would require that:

> > 1) the framework accept NULL in the list of parent (which it forbids ATM

> > AFAIK)

> > 2) Have all drivers account for this a create this fake parent

> > 

> > This seems to be putting more work on the drivers with no real gain for

> > the

> > framework, none that I can see at least.

> > 

> > >   3) A string as the parent name for some parent index that the hardware

> > >      indicates is the current parent of the clk when the clk is

> > > registered,

> > >      but that parent clk may never be registered with the framework.

> > > 

> > > I imagine this can happen with an external gpio clk that feeds some SoC

> > > internal clk. That gpio could be supplied by some clk signal that the

> > > board designer decides to send there, and it could also just "magically

> > > appear" by applying some DT overlay. Due to our use of global strings

> > > for parent linking, we don't have a good way to figure out that nothing

> > > is connected to some clock input signal, making this quite a nightmare

> > > to figure out a runtime!

> > > 

> > > One solution is to replace the parent name with NULL and ignore the

> > > runtime update case. This will work and bring us back to case #1. The

> > > problem is that if it is connected to something like a GPIO and it

> > > defaults to that pin instead of some internal clk we're stuck with an

> > > "orphan" clk that we can't clk_get(). I suppose the driver could read DT

> > > and see if this pin is not connected and then mark this input parent

> > > name as NULL. We can punt on solving the DT overlay problem until later,

> > > because we would need some way to dynamically resolve a new parent of a

> > > clk and update the tree when the DT node is modified to use a new GPIO.

> > 

> > I don't understand what is the problem here.

> > In Amlogic we have a lot of this cases, with optional input clocks of the

> > axg-

> > audio clock controller. Many muxes have parent names of clock that don't

> > exists.

> > 

> > CCF will return a 0 rate for these inputs.

> > It works well as it is, but as said above, I think it would be better if

> > we

> > could distinguish between a signal with 0 rate and no signal at all

> > (error)

> > 

> 


All this below is very interresting but it looks like a very complex way to
solve what was initially a very simple problem, with considerations that are
still WIP a this stage.

Even with these future constraints in mind, I don't understand how the
proposed patch changes anything to the situtation. A clock with a get_parent()
callback (and several known parents) may still return an out of bound value
... CCF will still have to deal with this case gracefully, as it is currently
doing

> Ok. It sounds like we want similar things but I'm worried about the

> orphan probe defer logic getting messed up.

> 

> How about we tri-state the get_parent() return value? So now [0,

> num_parents) means a valid parent is known, num_parents means "I don't

> know what it is but it must be something valid", and [num_parents + 1,

> 255] means some error occurred. Sounds obfuscated to me still.

> 

> Alternatively, we could make a get_parent_hw() op that returns a pointer

> to a clk_hw structure of the parent clk. Then drivers can be converted

> to use this new clk_op instead of get_parent and we can have it return

> an error pointer when some error happens, NULL when the parent is not

> known to the software but is otherwise valid, and the parent pointer

> when the parent is findable. It would still be tri-stated then, but at

> least it wouldn't be some odd u8 numberspace that it all has to fit

> into. This patch starts to do that. What do you think? Clk providers

> would need to implement a clk_ops that mostly does

> clk_hw_get_parent_by_index(), or they could stash the parent_hw pointer

> in their structure, or index into it based on some global numberspace in

> their driver.

> 

> -----8<----

> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c

> index af011974d4ec..5fe3c41e0b7b 100644

> --- a/drivers/clk/clk.c

> +++ b/drivers/clk/clk.c

> @@ -2242,14 +2242,84 @@ struct clk *clk_get_parent(struct clk *clk)

>  }

>  EXPORT_SYMBOL_GPL(clk_get_parent);

>  

> -static struct clk_core *__clk_init_parent(struct clk_core *core)

> +static struct clk_core *

> +__clk_init_parent(struct clk_core *core, bool update_orphan)

>  {

>  	u8 index = 0;

> +	struct clk_hw *parent_hw = NULL;

>  

> -	if (core->num_parents > 1 && core->ops->get_parent)

> -		index = core->ops->get_parent(core->hw);

> +	if (core->ops->get_parent_hw) {

> +		parent_hw = core->ops->get_parent_hw(core->hw);

> +		/*

> +		 * The provider driver doesn't know what the parent is,

> +		 * but it's at least something valid, so it's not an

> +		 * orphan, just a clk with some unknown parent.

> +		 */

> +		if (!parent_hw && update_orphan)

> +			core->orphan = false;

> +	} else {

> +		if (core->num_parents > 1 && core->ops->get_parent)

> +			index = core->ops->get_parent(core->hw);

> +

> +		parent_hw = clk_hw_get_parent_by_index(core->hw, index);

> +	}

> +

> +	if (IS_ERR(parent_hw)) {

> +		/* Parent clk provider hasn't probed yet, orphan it */

> +		if (PTR_ERR(parent_hw) == -EPROBE_DEFER) {

> +			if (update_orphan)

> +				core->orphan = true;

> +

> +			return NULL;

> +		}

> +

> +		return ERR_CAST(parent_hw);

> +	}

> +

> +	if (!parent_hw)

> +		return NULL;

> +

> +	return parent_hw->core;

> +}

> +

> +static int clk_init_parent(struct clk_core *core)

> +{

> +	core->parent = __clk_init_parent(core, true);

> +	if (IS_ERR(core->parent))

> +		return PTR_ERR(core->parent);

> +

> +	/*

> +	 * Populate core->parent if parent has already been clk_core_init'd.

> If

> +	 * parent has not yet been clk_core_init'd then place clk in the

> orphan

> +	 * list.  If clk doesn't have any parents then place it in the root

> +	 * clk list.

> +	 *

> +	 * Every time a new clk is clk_init'd then we walk the list of orphan

> +	 * clocks and re-parent any that are children of the clock currently

> +	 * being clk_init'd.

> +	 */

> +	if (core->parent) {

> +		hlist_add_head(&core->child_node,

> +				&core->parent->children);

> +		core->orphan = core->parent->orphan;

> +	} else if (!core->num_parents) {

> +		hlist_add_head(&core->child_node, &clk_root_list);

> +		core->orphan = false;

> +	} else {

> +		hlist_add_head(&core->child_node, &clk_orphan_list);

> +	}

> +

> +	return 0;

> +}

>  

> -	return clk_core_get_parent_by_index(core, index);

> +static struct clk_core *clk_find_parent(struct clk_core *core)

> +{

> +	return __clk_init_parent(core, false);

> +}

> +

> +static bool clk_has_parent_op(const struct clk_ops *ops)

> +{

> +	return ops->get_parent || ops->get_parent_hw;

>  }

>  

>  static void clk_core_reparent(struct clk_core *core,

> @@ -3045,14 +3115,14 @@ static int __clk_core_init(struct clk_core *core)

>  		goto out;

>  	}

>  

> -	if (core->ops->set_parent && !core->ops->get_parent) {

> +	if (core->ops->set_parent && !clk_has_parent_op(core->ops)) {

>  		pr_err("%s: %s must implement .get_parent & .set_parent\n",

>  		       __func__, core->name);

>  		ret = -EINVAL;

>  		goto out;

>  	}

>  

> -	if (core->num_parents > 1 && !core->ops->get_parent) {

> +	if (core->num_parents > 1 && !clk_has_parent_op(core->ops)) {

>  		pr_err("%s: %s must implement .get_parent as it has multi

> parents\n",

>  		       __func__, core->name);

>  		ret = -EINVAL;

> @@ -3073,29 +3143,9 @@ static int __clk_core_init(struct clk_core *core)

>  				"%s: invalid NULL in %s's .parent_names\n",

>  				__func__, core->name);

>  

> -	core->parent = __clk_init_parent(core);

> -

> -	/*

> -	 * Populate core->parent if parent has already been clk_core_init'd.

> If

> -	 * parent has not yet been clk_core_init'd then place clk in the

> orphan

> -	 * list.  If clk doesn't have any parents then place it in the root

> -	 * clk list.

> -	 *

> -	 * Every time a new clk is clk_init'd then we walk the list of orphan

> -	 * clocks and re-parent any that are children of the clock currently

> -	 * being clk_init'd.

> -	 */

> -	if (core->parent) {

> -		hlist_add_head(&core->child_node,

> -				&core->parent->children);

> -		core->orphan = core->parent->orphan;

> -	} else if (!core->num_parents) {

> -		hlist_add_head(&core->child_node, &clk_root_list);

> -		core->orphan = false;

> -	} else {

> -		hlist_add_head(&core->child_node, &clk_orphan_list);

> -		core->orphan = true;

> -	}

> +	ret = clk_init_parent(core);

> +	if (ret)

> +		goto out;

>  

>  	/*

>  	 * optional platform-specific magic

> @@ -3173,7 +3223,14 @@ static int __clk_core_init(struct clk_core *core)

>  	 * parent.

>  	 */

>  	hlist_for_each_entry_safe(orphan, tmp2, &clk_orphan_list, child_node)

> {

> -		struct clk_core *parent = __clk_init_parent(orphan);

> +		struct clk_core *parent = clk_find_parent(orphan);

> +

> +		/*

> +		 * Error parent should have been caught before and returned

> +		 * as an error during registration.

> +		 */

> +		if (WARN_ON_ONCE(IS_ERR(parent)))

> +			continue;

>  

>  		/*

>  		 * We need to use __clk_set_parent_before() and _after() to

> diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h

> index 60c51871b04b..8b84dee942bf 100644

> --- a/include/linux/clk-provider.h

> +++ b/include/linux/clk-provider.h

> @@ -155,6 +155,14 @@ struct clk_duty {

>   *		multiple parents.  It is optional (and unnecessary) for clocks

>   *		with 0 or 1 parents.

>   *

> + * @get_parent_hw: Queries the hardware to determine the parent of a

> clock.  The

> + *		return value is a clk_hw pointer corresponding to

> + *		the parent clock. In short, this function translates the

> parent

> + *		value read from hardware into a pointer to the clk_hw for that

> clk.

> + *		Currently only called when the clock is initialized by

> + *		__clk_init.  This callback is mandatory for clocks with

> + *		multiple parents.  It is optional for clocks with 0 or 1

> parents.

> + *

>   * @set_rate:	Change the rate of this clock. The requested rate is

> specified

>   *		by the second argument, which should typically be the return

>   *		of .round_rate call.  The third argument gives the parent rate

> @@ -238,6 +246,7 @@ struct clk_ops {

>  					  struct clk_rate_request *req);

>  	int		(*set_parent)(struct clk_hw *hw, u8 index);

>  	u8		(*get_parent)(struct clk_hw *hw);

> +	struct clk_hw * (*get_parent_hw)(struct clk_hw *hw);

>  	int		(*set_rate)(struct clk_hw *hw, unsigned long rate,

>  				    unsigned long parent_rate);

>  	int		(*set_rate_and_parent)(struct clk_hw *hw,

> 

>
Stephen Boyd Dec. 21, 2018, 10:24 p.m. UTC | #15
Quoting Jerome Brunet (2018-12-21 08:03:29)
> On Mon, 2018-12-17 at 17:54 -0800, Stephen Boyd wrote:

> > 

> > Ok. Thanks for the explanation. What should the framework do when the

> > parent is not known? Mark the clk as not an orphan but parent it to

> > NULL? That might work here.

> 

> I don't really get the difference in the framework ATM. Is this linked to the

> 'orphan probe defer logic' WIP you have been mentionning ?


Yes, we have orphan marking logic, but we haven't finished the topic and
returned PROBE_DEFER for orphaned clks. It was blocked on some Allwinner
drivers having problems but that was over a year ago so I don't see
anything blocking us now. I really want to do it soon so now is the time
to fix all of this and see what breaks.

> 

> All this below is very interresting but it looks like a very complex way to

> solve what was initially a very simple problem, with considerations that are

> still WIP a this stage.

> 

> Even with these future constraints in mind, I don't understand how the

> proposed patch changes anything to the situtation. A clock with a get_parent()

> callback (and several known parents) may still return an out of bound value

> ... CCF will still have to deal with this case gracefully, as it is currently

> doing


Ok. I think the problem you see is that the clk provider doesn't know
anything besides parent name is a string? And so you're not sure how the
provider can return NULL for "this clk won't ever appear", vs. an error
pointer for "things failed while reading"?

I think you understand I'm not trying to change clk_ops::get_parent().
I'm sidestepping the problem and other problems that we have with the
return type of that clk_op (u8) by introducing a new clk_op that can
return an error pointer for exceptional cases. Drivers would need to
migrate to this new clk op to fix the problem you're solving, and those
drivers will also need to know what their clk parents are and what the
clk_hw pointers are for them.
diff mbox series

Patch

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index af011974d4ec..1b33ef3c37f4 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2246,7 +2246,7 @@  static struct clk_core *__clk_init_parent(struct clk_core *core)
 {
 	u8 index = 0;
 
-	if (core->num_parents > 1 && core->ops->get_parent)
+	if (core->ops->get_parent)
 		index = core->ops->get_parent(core->hw);
 
 	return clk_core_get_parent_by_index(core, index);