Message ID | 20181204163257.32085-1-jbrunet@baylibre.com |
---|---|
State | New |
Headers | show |
Series | Revert "clk: fix __clk_init_parent() for single parent clocks" | expand |
Quoting Jerome Brunet (2018-12-04 08:32:57) > This reverts commit 2430a94d1e719b7b4af2a65b781a4c036eb22e64. > > From the original commit message: > It turned out a problem because there are some single-parent clocks > that implement .get_parent() callback and return non-zero index. > The SOCFPGA clock is the case; the commit broke the SOCFPGA boards. > > It is wrong for a clock to return an index >= num_parents. CCF checks > for this condition in clk_core_get_parent_by_index(). This function sets > the parent to NULL if index is incoherent, which seems like the only > sane choice. > > commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks") > appears to be a work around installed in the core framework for a problem > that is platform specific, and should probably be fixed in the platform code. Ouch. I see that I even pointed that out in 2016 but never got a reply or a fix patch[1]. > > Further more, it introduces a problem in a corner case of the mux clock. > Take mux with multiple parents, but only one is known, the rest being > undocumented. The register reset has one of unknown parent set. > > Before commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks"): > * get_parent() is called, register is read and give an unknown index. > -> the mux is orphaned. > * a call to set_rate() will reparent the mux to the only known parent. > > With commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks"): > * the register is never read. > * parent is wrongly assumed to be the only known one. > As a consequence, all the calculation deriving from the mux will be > wrong > * Since we believe the only know parent to be set, set_parent() won't > ever be called and the register won't be set with the correct value. Isn't this the broken bad case all over again? Why register a clk as a mux and then only say it has one parent? > > Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> > --- Is this related to the other patch you sent? Can you send series for related patches please? [1] https://lkml.kernel.org/r/20160209181833.GA24167@codeaurora.org
On Tue, 2018-12-04 at 10:05 -0800, Stephen Boyd wrote: > Quoting Jerome Brunet (2018-12-04 08:32:57) > > This reverts commit 2430a94d1e719b7b4af2a65b781a4c036eb22e64. > > > > From the original commit message: > > It turned out a problem because there are some single-parent clocks > > that implement .get_parent() callback and return non-zero index. > > The SOCFPGA clock is the case; the commit broke the SOCFPGA boards. > > > > It is wrong for a clock to return an index >= num_parents. CCF checks > > for this condition in clk_core_get_parent_by_index(). This function sets > > the parent to NULL if index is incoherent, which seems like the only > > sane choice. > > > > commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent > > clocks") > > appears to be a work around installed in the core framework for a problem > > that is platform specific, and should probably be fixed in the platform > > code. > > Ouch. I see that I even pointed that out in 2016 but never got a reply > or a fix patch[1]. > > > Further more, it introduces a problem in a corner case of the mux clock. > > Take mux with multiple parents, but only one is known, the rest being > > undocumented. The register reset has one of unknown parent set. > > > > Before commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single > > parent clocks"): > > * get_parent() is called, register is read and give an unknown index. > > -> the mux is orphaned. > > * a call to set_rate() will reparent the mux to the only known parent. > > > > With commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent > > clocks"): > > * the register is never read. > > * parent is wrongly assumed to be the only known one. > > As a consequence, all the calculation deriving from the mux will be > > wrong > > * Since we believe the only know parent to be set, set_parent() won't > > ever be called and the register won't be set with the correct value. > > Isn't this the broken bad case all over again? Why register a clk as a > mux and then only say it has one parent? I understand it is a bit odd but as I explained it is a corner case. We are really trying to drive a mux here, applying a values will change the clock signal we get. Documentation being what it is, we only know one the parent. The other parent could anything or maybe not connected at all, who know. That is not the important part actually If such mux was already set to the known entry by default, it would be OK to ignore it. But if it is not, then we CCF to realise that and change the setting accordingly. This the case of the 'ao_cts_cec' clock in the following patch: https://lore.kernel.org/patchwork/patch/1021028/ by default the value in the register is 0, but the only one that makes sense for us is 1. > > > Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> > > --- > > Is this related to the other patch you sent? Can you send series for > related patches please? > > [1] https://lkml.kernel.org/r/20160209181833.GA24167@codeaurora.org Actually I was intially doing a series, and stopped when my cover letter started with "those are two unrelated patches ..." ;) I found these things while debugging the same thing but there is no deps between them.
(Adding Dinh's korg email) I also wonder if this driver is even used anymore or maybe we can delete it? Quoting Stephen Boyd (2018-12-04 11:34:16) > This driver creates a gate clk with the possibility to have multiple > parents. That can cause problems if the common clk framework tries to > call the get_parent() op and gets back a number that's larger than the > number of parents the clk says it supports in > clk_init_data::num_parents. Let's duplicate the clk_ops structure each > time this function is called and drop the get/set parent ops when there > is only one parent. This allows the framework to consider a number > larger than clk_init_data::num_parents as an error condition of the > get_parent() clk op, clearing the way for proper code. > > Cc: Jerome Brunet <jbrunet@baylibre.com> > Cc: Masahiro Yamada <yamada.masahiro@socionext.com> > Cc: Dinh Nguyen <dinguyen@opensource.altera.com> > Signed-off-by: Stephen Boyd <sboyd@kernel.org> > --- > drivers/clk/socfpga/clk-gate.c | 22 +++++++++++++--------- > 1 file changed, 13 insertions(+), 9 deletions(-) > > diff --git a/drivers/clk/socfpga/clk-gate.c b/drivers/clk/socfpga/clk-gate.c > index aa7a6e6a15b6..73e03328d5c5 100644 > --- a/drivers/clk/socfpga/clk-gate.c > +++ b/drivers/clk/socfpga/clk-gate.c > @@ -176,8 +176,7 @@ static struct clk_ops gateclk_ops = { > .set_parent = socfpga_clk_set_parent, > }; > > -static void __init __socfpga_gate_init(struct device_node *node, > - const struct clk_ops *ops) > +void __init socfpga_gate_init(struct device_node *node) > { > u32 clk_gate[2]; > u32 div_reg[3]; > @@ -188,12 +187,17 @@ static void __init __socfpga_gate_init(struct device_node *node, > const char *clk_name = node->name; > const char *parent_name[SOCFPGA_MAX_PARENTS]; > struct clk_init_data init; > + struct clk_ops *ops; > int rc; > > socfpga_clk = kzalloc(sizeof(*socfpga_clk), GFP_KERNEL); > if (WARN_ON(!socfpga_clk)) > return; > > + ops = kmemdup(&gateclk_ops, sizeof(gateclk_ops), GFP_KERNEL); > + if (WARN_ON(!ops)) > + return; > + > rc = of_property_read_u32_array(node, "clk-gate", clk_gate, 2); > if (rc) > clk_gate[0] = 0; > @@ -202,8 +206,8 @@ static void __init __socfpga_gate_init(struct device_node *node, > socfpga_clk->hw.reg = clk_mgr_base_addr + clk_gate[0]; > socfpga_clk->hw.bit_idx = clk_gate[1]; > > - gateclk_ops.enable = clk_gate_ops.enable; > - gateclk_ops.disable = clk_gate_ops.disable; > + ops->enable = clk_gate_ops.enable; > + ops->disable = clk_gate_ops.disable; > } > > rc = of_property_read_u32(node, "fixed-divider", &fixed_div); > @@ -234,6 +238,11 @@ static void __init __socfpga_gate_init(struct device_node *node, > init.flags = 0; > > init.num_parents = of_clk_parent_fill(node, parent_name, SOCFPGA_MAX_PARENTS); > + if (init.num_parents < 2) { > + ops->get_parent = NULL; > + ops->set_parent = NULL; > + } > + > init.parent_names = parent_name; > socfpga_clk->hw.hw.init = &init; > > @@ -246,8 +255,3 @@ static void __init __socfpga_gate_init(struct device_node *node, > if (WARN_ON(rc)) > return; > } > - > -void __init socfpga_gate_init(struct device_node *node) > -{ > - __socfpga_gate_init(node, &gateclk_ops); > -}
Quoting Jerome Brunet (2018-12-04 11:51:17) > On Tue, 2018-12-04 at 10:05 -0800, Stephen Boyd wrote: > > Quoting Jerome Brunet (2018-12-04 08:32:57) > > > This reverts commit 2430a94d1e719b7b4af2a65b781a4c036eb22e64. > > > > > > From the original commit message: > > > It turned out a problem because there are some single-parent clocks > > > that implement .get_parent() callback and return non-zero index. > > > The SOCFPGA clock is the case; the commit broke the SOCFPGA boards. > > > > > > It is wrong for a clock to return an index >= num_parents. CCF checks > > > for this condition in clk_core_get_parent_by_index(). This function sets > > > the parent to NULL if index is incoherent, which seems like the only > > > sane choice. > > > > > > commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent > > > clocks") > > > appears to be a work around installed in the core framework for a problem > > > that is platform specific, and should probably be fixed in the platform > > > code. > > > > Ouch. I see that I even pointed that out in 2016 but never got a reply > > or a fix patch[1]. > > > > > Further more, it introduces a problem in a corner case of the mux clock. > > > Take mux with multiple parents, but only one is known, the rest being > > > undocumented. The register reset has one of unknown parent set. > > > > > > Before commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single > > > parent clocks"): > > > * get_parent() is called, register is read and give an unknown index. > > > -> the mux is orphaned. > > > * a call to set_rate() will reparent the mux to the only known parent. > > > > > > With commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent > > > clocks"): > > > * the register is never read. > > > * parent is wrongly assumed to be the only known one. > > > As a consequence, all the calculation deriving from the mux will be > > > wrong > > > * Since we believe the only know parent to be set, set_parent() won't > > > ever be called and the register won't be set with the correct value. > > > > Isn't this the broken bad case all over again? Why register a clk as a > > mux and then only say it has one parent? > > I understand it is a bit odd but as I explained it is a corner case. > > We are really trying to drive a mux here, applying a values will change the > clock signal we get. Documentation being what it is, we only know one the > parent. The other parent could anything or maybe not connected at all, who > know. That is not the important part actually > > If such mux was already set to the known entry by default, it would be OK to > ignore it. But if it is not, then we CCF to realise that and change the > setting accordingly. > > This the case of the 'ao_cts_cec' clock in the following patch: > https://lore.kernel.org/patchwork/patch/1021028/ > > by default the value in the register is 0, but the only one that makes sense > for us is 1. Ok. Thanks for the explanation. What do you do to fix this problem in the 'ao_cts_cec' clk implementation? Sounds like you just rely on clk_set_rate() to fix this all up for you when the consumer changes the rate? If we have something that is default parented to something we're not telling the framework about for some reason then one solution would be to have some init code reparent the clk in hardware before registering it with the framework. Basically "fix up" the clk tree in hardware to be consistent with what software can reason about. This also reminds me of some audio clks I've seen before where the default parent is some external pin and it can be reparented to an internal clk with clk_set_rate/parent. In that case, I think we forced the parent over to the internal clk before registering so that it would always get parented properly in the framework. Right now, this is going to cause problems for plans to probe defer clk_get() on orphan clks. Maybe this could work better if we allowed 'assigned-clock-parents/rates' to reparent clks regardless of orphan status and/or had some flag that could be set on clks to indicate that we're OK if clk_get() is called on it when it's an orphan because this isn't a problem? Or we can make the defer on orphan code only defer if the clk has a single parent and it's an orphan and return the clk if there is at least one parent of the clk that has been registered and isn't marked as an orphan. That would need to be recursively applied, so I guess we would update our orphan marking code to indicate that clk_get() on the clk should probe defer or not instead of indicating the clk's orphan status. Doing this would also side-step the problem Rockchip was having where certain parents never appeared but the clk was parented to it in hardware, essentially blocking the clk from ever being touched by consumers. On the other hand, not having the clk there causes us to do a global search of the clk name space each time we look for parents by the "unknown" index, which in the Rockchip case was quite often because the clk was changing between two other parents with a third one that never got registered. So it may be better to just register an "unknown" clk as a root clk with a rate of 0 that you can then hook everything up to that is hardware reset to this unknown input. That way nothing is orphaned and everyone is happy. We could do this for the audio case I talked about earlier too so when the external pin is left unconnected we could register some 'unconnected' clk and parent the audio clk to it. > > > > > > Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> > > > --- > > > > Is this related to the other patch you sent? Can you send series for > > related patches please? > > > > [1] https://lkml.kernel.org/r/20160209181833.GA24167@codeaurora.org > > Actually I was intially doing a series, and stopped when my cover letter > started with "those are two unrelated patches ..." ;) > > I found these things while debugging the same thing but there is no deps > between them. > Ok.
Hi Stephen, On 12/5/18 1:17 AM, Stephen Boyd wrote: > (Adding Dinh's korg email) > > I also wonder if this driver is even used anymore or maybe we can delete > it? > The armv7 SoCFPGA platforms are using this driver. Dinh
Quoting Dinh Nguyen (2018-12-05 07:17:41) > Hi Stephen, > > On 12/5/18 1:17 AM, Stephen Boyd wrote: > > (Adding Dinh's korg email) > > > > I also wonder if this driver is even used anymore or maybe we can delete > > it? > > > > The armv7 SoCFPGA platforms are using this driver. > Ok and those platforms are booting in kernelci.org as https://kernelci.org/boot/socfpga_cyclone5_de0_sockit/ perhaps?
On Tue, 2018-12-04 at 23:54 -0800, Stephen Boyd wrote: > Quoting Jerome Brunet (2018-12-04 11:51:17) > > On Tue, 2018-12-04 at 10:05 -0800, Stephen Boyd wrote: > > > Quoting Jerome Brunet (2018-12-04 08:32:57) > > > > This reverts commit 2430a94d1e719b7b4af2a65b781a4c036eb22e64. > > > > > > > > From the original commit message: > > > > It turned out a problem because there are some single-parent clocks > > > > that implement .get_parent() callback and return non-zero index. > > > > The SOCFPGA clock is the case; the commit broke the SOCFPGA boards. > > > > > > > > It is wrong for a clock to return an index >= num_parents. CCF checks > > > > for this condition in clk_core_get_parent_by_index(). This function > > > > sets > > > > the parent to NULL if index is incoherent, which seems like the only > > > > sane choice. > > > > > > > > commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent > > > > clocks") > > > > appears to be a work around installed in the core framework for a > > > > problem > > > > that is platform specific, and should probably be fixed in the > > > > platform > > > > code. > > > > > > Ouch. I see that I even pointed that out in 2016 but never got a reply > > > or a fix patch[1]. > > > > > > > Further more, it introduces a problem in a corner case of the mux > > > > clock. > > > > Take mux with multiple parents, but only one is known, the rest being > > > > undocumented. The register reset has one of unknown parent set. > > > > > > > > Before commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single > > > > parent clocks"): > > > > * get_parent() is called, register is read and give an unknown index. > > > > -> the mux is orphaned. > > > > * a call to set_rate() will reparent the mux to the only known > > > > parent. > > > > > > > > With commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single > > > > parent > > > > clocks"): > > > > * the register is never read. > > > > * parent is wrongly assumed to be the only known one. > > > > As a consequence, all the calculation deriving from the mux will be > > > > wrong > > > > * Since we believe the only know parent to be set, set_parent() won't > > > > ever be called and the register won't be set with the correct > > > > value. > > > > > > Isn't this the broken bad case all over again? Why register a clk as a > > > mux and then only say it has one parent? > > > > I understand it is a bit odd but as I explained it is a corner case. > > > > We are really trying to drive a mux here, applying a values will change > > the > > clock signal we get. Documentation being what it is, we only know one the > > parent. The other parent could anything or maybe not connected at all, who > > know. That is not the important part actually > > > > If such mux was already set to the known entry by default, it would be OK > > to > > ignore it. But if it is not, then we CCF to realise that and change the > > setting accordingly. > > > > This the case of the 'ao_cts_cec' clock in the following patch: > > https://lore.kernel.org/patchwork/patch/1021028/ > > > > by default the value in the register is 0, but the only one that makes > > sense > > for us is 1. > > Ok. Thanks for the explanation. What do you do to fix this problem in > the 'ao_cts_cec' clk implementation? IMO, there is nothing to fix, the implementation is correct. > Sounds like you just rely on > clk_set_rate() to fix this all up for you when the consumer changes the > rate? To setup the rate as the customer expect, yes. > > If we have something that is default parented to something we're not > telling the framework about for some reason then one solution would be > to have some init code reparent the clk in hardware before registering > it with the framework. Basically "fix up" the clk tree in hardware to be > consistent with what software can reason about. Should we really ? Something the framework does not know about is not necessarily something wrong. It would best if we could model the tree completly, but most of the time, we just approximate it the best we can. The framework just knows how to setup rates, it has no idea what rate needs to be set or when - And I think it is best if does not make any assumption about that. Pushing it a bit, this unknown parent might important to the boot sequence. How would you know when it is safe to change the setting ? What would change it to ? It Seems obvious when there is only one known parent, but what if there two ? which one do you pick ? Is it really the role of CCF to make this kind of choice ? > > This also reminds me of some audio clks I've seen before where the > default parent is some external pin and it can be reparented to an > internal clk with clk_set_rate/parent. In that case, I think we forced > the parent over to the internal clk before registering so that it would > always get parented properly in the framework. Right now, this is going > to cause problems for plans to probe defer clk_get() on orphan clks. clk_get() on orphaned clock should probably return an error if it is not the case already ? or 0 maybe ? > > Maybe this could work better if we allowed > 'assigned-clock-parents/rates' to reparent clks regardless of orphan > status and/or had some flag that could be set on clks to indicate that > we're OK if clk_get() is called on it when it's an orphan because this > isn't a problem? Not sure I understand your point here. > > Or we can make the defer on orphan code only defer if the clk has a > single parent and it's an orphan and return the clk if there is at least > one parent of the clk that has been registered and isn't marked as an > orphan. That would need to be recursively applied, so I guess we would > update our orphan marking code to indicate that clk_get() on the clk > should probe defer or not instead of indicating the clk's orphan status. > Doing this would also side-step the problem Rockchip was having where > certain parents never appeared but the clk was parented to it in > hardware, essentially blocking the clk from ever being touched by > consumers. ... and now you lost me completly :) > > On the other hand, not having the clk there causes us to do a global > search of the clk name space each time we look for parents by the > "unknown" index, which in the Rockchip case was quite often because the > clk was changing between two other parents with a third one that never > got registered. So it may be better to just register an "unknown" clk as > a root clk with a rate of 0 that you can then hook everything up to that > is hardware reset to this unknown input. That way nothing is orphaned > and everyone is happy. We could do this for the audio case I talked > about earlier too so when the external pin is left unconnected we could > register some 'unconnected' clk and parent the audio clk to it. We have sometimes a few orphans in the amlogic clock tree, it does not seems to a problem. I don't really understand the benefit of having a fake clock that would adopt all the orphan ? You guess there is problem I'm not aware of ... > > > > > Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> > > > > --- > > > > > > Is this related to the other patch you sent? Can you send series for > > > related patches please? > > > > > > [1] https://lkml.kernel.org/r/20160209181833.GA24167@codeaurora.org > > > > Actually I was intially doing a series, and stopped when my cover letter > > started with "those are two unrelated patches ..." ;) > > > > I found these things while debugging the same thing but there is no deps > > between them. > > > > Ok.
On 12/5/18 9:55 AM, Stephen Boyd wrote: > Quoting Dinh Nguyen (2018-12-05 07:17:41) >> Hi Stephen, >> >> On 12/5/18 1:17 AM, Stephen Boyd wrote: >>> (Adding Dinh's korg email) >>> >>> I also wonder if this driver is even used anymore or maybe we can delete >>> it? >>> >> >> The armv7 SoCFPGA platforms are using this driver. >> > > Ok and those platforms are booting in kernelci.org as > https://kernelci.org/boot/socfpga_cyclone5_de0_sockit/ perhaps? > Yes that's right. Dinh
Sorry this email is long. TL;DR is that I don't think we need to revert the patch as you suggest. Instead, we should fix the driver to indicate NULL as the parent name of the index it tells the framework about. Quoting Jerome Brunet (2018-12-05 09:20:09) > On Tue, 2018-12-04 at 23:54 -0800, Stephen Boyd wrote: > > > > Ok. Thanks for the explanation. What do you do to fix this problem in > > the 'ao_cts_cec' clk implementation? > > IMO, there is nothing to fix, the implementation is correct. > > > Sounds like you just rely on > > clk_set_rate() to fix this all up for you when the consumer changes the > > rate? > > To setup the rate as the customer expect, yes. Ok. > > > > > If we have something that is default parented to something we're not > > telling the framework about for some reason then one solution would be > > to have some init code reparent the clk in hardware before registering > > it with the framework. Basically "fix up" the clk tree in hardware to be > > consistent with what software can reason about. > > Should we really ? Something the framework does not know about is not > necessarily something wrong. > > It would best if we could model the tree completly, but most of the time, we > just approximate it the best we can. > > The framework just knows how to setup rates, it has no idea what rate needs to > be set or when - And I think it is best if does not make any assumption about > that. > > Pushing it a bit, this unknown parent might important to the boot sequence. > How would you know when it is safe to change the setting ? > What would change it to ? It Seems obvious when there is only one known > parent, but what if there two ? which one do you pick ? Is it really the role > of CCF to make this kind of choice ? Agreed. We can't mandate this requirement because of what you say. There are pitfalls such as not always knowing if it's safe to change the clk tree before clks are registered. > > > > > This also reminds me of some audio clks I've seen before where the > > default parent is some external pin and it can be reparented to an > > internal clk with clk_set_rate/parent. In that case, I think we forced > > the parent over to the internal clk before registering so that it would > > always get parented properly in the framework. Right now, this is going > > to cause problems for plans to probe defer clk_get() on orphan clks. > > clk_get() on orphaned clock should probably return an error if it is not the > case already ? or 0 maybe ? I'd prefer that it return EPROBE_DEFER if the clk is an orphan, except that it causes problems for this case where muxes move away from something unknown to something known. > > > > > Maybe this could work better if we allowed > > 'assigned-clock-parents/rates' to reparent clks regardless of orphan > > status and/or had some flag that could be set on clks to indicate that > > we're OK if clk_get() is called on it when it's an orphan because this > > isn't a problem? > > Not sure I understand your point here. There is a problem with assigned-clock DT properties and orphan clks and making orphan clks defer driver probes. > > > > > Or we can make the defer on orphan code only defer if the clk has a > > single parent and it's an orphan and return the clk if there is at least > > one parent of the clk that has been registered and isn't marked as an > > orphan. That would need to be recursively applied, so I guess we would > > update our orphan marking code to indicate that clk_get() on the clk > > should probe defer or not instead of indicating the clk's orphan status. > > Doing this would also side-step the problem Rockchip was having where > > certain parents never appeared but the clk was parented to it in > > hardware, essentially blocking the clk from ever being touched by > > consumers. > > ... and now you lost me completly :) Hmm alright. > > > > > On the other hand, not having the clk there causes us to do a global > > search of the clk name space each time we look for parents by the > > "unknown" index, which in the Rockchip case was quite often because the > > clk was changing between two other parents with a third one that never > > got registered. So it may be better to just register an "unknown" clk as > > a root clk with a rate of 0 that you can then hook everything up to that > > is hardware reset to this unknown input. That way nothing is orphaned > > and everyone is happy. We could do this for the audio case I talked > > about earlier too so when the external pin is left unconnected we could > > register some 'unconnected' clk and parent the audio clk to it. > > We have sometimes a few orphans in the amlogic clock tree, it does not seems > to a problem. I suppose drivers aren't calling clk_set_rate() on the orphaned clks and experiencing problems? There are problems around clk rates that could happen if there aren't any proper parents of a mux. > > I don't really understand the benefit of having a fake clock that would adopt > all the orphan ? You guess there is problem I'm not aware of ... I see three cases that can go wrong if we make orphan clks probe defer clk_get(): 1) NULL as the parent name for some parent index that the hardware indicates is the current parent of the clk when the clk is registered. In this case, we need to make clk_get() succeed and not return -EPROBE_DEFER, so I think we need to make a special "unknown" clk in the framework so that clks aren't orphaned when the hardware is indicating the parent is something we don't (and won't) know anything about. 2) get_parent/set_parent on a clk with num_parents == 1 This seems to be how ao_cts_cec is implemented. This hides the fact that get_parent may return some number above the number of parents and then has that quietly make the parent of the clk NULL during registration so that it's orphaned. Then we rely on set_rate() or set_parent() being called to fix the parent chain at a later time. That is complicated and non-obvious. I'd rather that we specify NULL as a parent in the parent_names array if we're going to return that index to the framework and then have the framework go to case #1. This makes it easier for the framework to assume that [0, num_parents) is the valid range of indices returned from the get_parent() op and a number outside that range is an error, clearing the way for errors to be returned from get_parent() if it somehow fails. 3) A string as the parent name for some parent index that the hardware indicates is the current parent of the clk when the clk is registered, but that parent clk may never be registered with the framework. I imagine this can happen with an external gpio clk that feeds some SoC internal clk. That gpio could be supplied by some clk signal that the board designer decides to send there, and it could also just "magically appear" by applying some DT overlay. Due to our use of global strings for parent linking, we don't have a good way to figure out that nothing is connected to some clock input signal, making this quite a nightmare to figure out a runtime! One solution is to replace the parent name with NULL and ignore the runtime update case. This will work and bring us back to case #1. The problem is that if it is connected to something like a GPIO and it defaults to that pin instead of some internal clk we're stuck with an "orphan" clk that we can't clk_get(). I suppose the driver could read DT and see if this pin is not connected and then mark this input parent name as NULL. We can punt on solving the DT overlay problem until later, because we would need some way to dynamically resolve a new parent of a clk and update the tree when the DT node is modified to use a new GPIO.
On Thu, 2018-12-06 at 14:08 -0800, Stephen Boyd wrote: > Sorry this email is long. I tried reply the best I could to your use cases > > TL;DR is that I don't think we need to revert the patch as you suggest. > Instead, we should fix the driver to indicate NULL as the parent name of > the index it tells the framework about. The driver, can't indicate NULL, it just provides the index of the parent. What we are saying is not very far apart. Drivers implementing get_parent() must have way to tell the framework "I have no idea what I'm connected to ATM" Since get_parent() return a u8, the only way to signal a problem like this is to return an out of bound value, which the framework already safely understand and maps to NULL, as you request This is done in the other patch I sent (... and now they are related ;) ...) by returning num_parents in the clk_mux code This is an important case, whatever the number of parent. I think it makes no sense that a clock with 2 known parents can return this error, a one with only one known parent can't. This why I still think the change should be reverted. If you really insist on keeping it, there should at least be a warning when registering a clock with 'get_parent != NULL' and 'num_parents == 1', to let the user know that whatever is implemented there will be ignored ... instead of silently ignoring it. I think it would be a mistake, but at least there would be no suprise. Earlier, you mention having a special parent that would adopt the the orphan. Actually, it already exists, it is the framework itself. If, instead of returning 0, it returned a special value to let us know we reached the root and there actually no signal, we would be easy to make sure the orphan (with no signal) are never accidently used. > > Quoting Jerome Brunet (2018-12-05 09:20:09) > > On Tue, 2018-12-04 at 23:54 -0800, Stephen Boyd wrote: > > > Ok. Thanks for the explanation. What do you do to fix this problem in > > > the 'ao_cts_cec' clk implementation? > > > > IMO, there is nothing to fix, the implementation is correct. > > > > > Sounds like you just rely on > > > clk_set_rate() to fix this all up for you when the consumer changes the > > > rate? > > > > To setup the rate as the customer expect, yes. > > Ok. > > > > If we have something that is default parented to something we're not > > > telling the framework about for some reason then one solution would be > > > to have some init code reparent the clk in hardware before registering > > > it with the framework. Basically "fix up" the clk tree in hardware to be > > > consistent with what software can reason about. > > > > Should we really ? Something the framework does not know about is not > > necessarily something wrong. > > > > It would best if we could model the tree completly, but most of the time, > > we > > just approximate it the best we can. > > > > The framework just knows how to setup rates, it has no idea what rate > > needs to > > be set or when - And I think it is best if does not make any assumption > > about > > that. > > > > Pushing it a bit, this unknown parent might important to the boot > > sequence. > > How would you know when it is safe to change the setting ? > > What would change it to ? It Seems obvious when there is only one known > > parent, but what if there two ? which one do you pick ? Is it really the > > role > > of CCF to make this kind of choice ? > > Agreed. We can't mandate this requirement because of what you say. There > are pitfalls such as not always knowing if it's safe to change the clk > tree before clks are registered. > > > > This also reminds me of some audio clks I've seen before where the > > > default parent is some external pin and it can be reparented to an > > > internal clk with clk_set_rate/parent. In that case, I think we forced > > > the parent over to the internal clk before registering so that it would > > > always get parented properly in the framework. Right now, this is going > > > to cause problems for plans to probe defer clk_get() on orphan clks. > > > > clk_get() on orphaned clock should probably return an error if it is not > > the > > case already ? or 0 maybe ? > > I'd prefer that it return EPROBE_DEFER if the clk is an orphan, except > that it causes problems for this case where muxes move away from > something unknown to something known. > > > > Maybe this could work better if we allowed > > > 'assigned-clock-parents/rates' to reparent clks regardless of orphan > > > status and/or had some flag that could be set on clks to indicate that > > > we're OK if clk_get() is called on it when it's an orphan because this > > > isn't a problem? > > > > Not sure I understand your point here. > > There is a problem with assigned-clock DT properties and orphan clks and > making orphan clks defer driver probes. > > > > Or we can make the defer on orphan code only defer if the clk has a > > > single parent and it's an orphan and return the clk if there is at least > > > one parent of the clk that has been registered and isn't marked as an > > > orphan. That would need to be recursively applied, so I guess we would > > > update our orphan marking code to indicate that clk_get() on the clk > > > should probe defer or not instead of indicating the clk's orphan status. > > > Doing this would also side-step the problem Rockchip was having where > > > certain parents never appeared but the clk was parented to it in > > > hardware, essentially blocking the clk from ever being touched by > > > consumers. > > > > ... and now you lost me completly :) > > Hmm alright. > > > > On the other hand, not having the clk there causes us to do a global > > > search of the clk name space each time we look for parents by the > > > "unknown" index, which in the Rockchip case was quite often because the > > > clk was changing between two other parents with a third one that never > > > got registered. So it may be better to just register an "unknown" clk as > > > a root clk with a rate of 0 that you can then hook everything up to that > > > is hardware reset to this unknown input. That way nothing is orphaned > > > and everyone is happy. We could do this for the audio case I talked > > > about earlier too so when the external pin is left unconnected we could > > > register some 'unconnected' clk and parent the audio clk to it. > > > > We have sometimes a few orphans in the amlogic clock tree, it does not > > seems > > to a problem. To the best of my knowledge, there will always be orphan in the tree. The root clock (usually an xtal) being a special case of this. It seems to me that the problem is we can't differentiate between this special orphan and the rest because none will return an error when calling get_rate() xtal will provide something to round_rate() while other clock which are supposed to be orphaned will defer to the framework which gives a 0 rate. I think this is your real issue with orphan in the framework. The framework does not distinguish between a signal with rate == 0 and an error (no signal) > > I suppose drivers aren't calling clk_set_rate() on the orphaned clks and > experiencing problems? There are problems around clk rates that could > happen if there aren't any proper parents of a mux. > > > I don't really understand the benefit of having a fake clock that would > > adopt > > all the orphan ? You guess there is problem I'm not aware of ... > > I see three cases that can go wrong if we make orphan clks probe defer > clk_get(): > > 1) NULL as the parent name for some parent index that the hardware > indicates is the current parent of the clk when the clk is registered. I don't think you can have NULL has a parent_name > > In this case, we need to make clk_get() succeed and not return > -EPROBE_DEFER, so I think we need to make a special "unknown" clk in the > framework so that clks aren't orphaned when the hardware is indicating > the parent is something we don't (and won't) know anything about. I think DEFER is returned when clock is unknown, not when it is orphaned. > > 2) get_parent/set_parent on a clk with num_parents == 1 > > This seems to be how ao_cts_cec is implemented. This hides the fact that > get_parent may return some number above the number of parents The index returned will be either 0, to indicate the only known parent, or num_parents (1) to indicate an invalid parent to framework. This is the topic of the other patch I sent you You must have way for a driver to tell you that it is not connected to anything it knows. That's a very basic error case and this is not specific to num_parents == 1. With any number of num_parent, there is still a possibility that a driver ends up in configuration where it cannot tell what it is his parent. (invalid combination, reserved values, crappy doc ...) This is why is important to call get_parent() (if available) even when num_parent == 1. The driver might tell if it is connected as expected - or not > and then > has that quietly make the parent of the clk NULL during registration so > that it's orphaned. Then we rely on set_rate() or set_parent() being > called to fix the parent chain at a later time. > > That is complicated and non-obvious. I'd rather that we specify NULL as > a parent in the parent_names array if we're going to return that index > to the framework and then have the framework go to case #1. This makes > it easier for the framework to assume that [0, num_parents) is the valid > range of indices returned from the get_parent() op and a number outside > that range is an error, clearing the way for errors to be returned from > get_parent() if it somehow fails. The fix I sent is quite simple. The above would require that: 1) the framework accept NULL in the list of parent (which it forbids ATM AFAIK) 2) Have all drivers account for this a create this fake parent This seems to be putting more work on the drivers with no real gain for the framework, none that I can see at least. > > 3) A string as the parent name for some parent index that the hardware > indicates is the current parent of the clk when the clk is registered, > but that parent clk may never be registered with the framework. > > I imagine this can happen with an external gpio clk that feeds some SoC > internal clk. That gpio could be supplied by some clk signal that the > board designer decides to send there, and it could also just "magically > appear" by applying some DT overlay. Due to our use of global strings > for parent linking, we don't have a good way to figure out that nothing > is connected to some clock input signal, making this quite a nightmare > to figure out a runtime! > > One solution is to replace the parent name with NULL and ignore the > runtime update case. This will work and bring us back to case #1. The > problem is that if it is connected to something like a GPIO and it > defaults to that pin instead of some internal clk we're stuck with an > "orphan" clk that we can't clk_get(). I suppose the driver could read DT > and see if this pin is not connected and then mark this input parent > name as NULL. We can punt on solving the DT overlay problem until later, > because we would need some way to dynamically resolve a new parent of a > clk and update the tree when the DT node is modified to use a new GPIO. I don't understand what is the problem here. In Amlogic we have a lot of this cases, with optional input clocks of the axg- audio clock controller. Many muxes have parent names of clock that don't exists. CCF will return a 0 rate for these inputs. It works well as it is, but as said above, I think it would be better if we could distinguish between a signal with 0 rate and no signal at all (error) >
Quoting Jerome Brunet (2018-12-07 02:03:19) > On Thu, 2018-12-06 at 14:08 -0800, Stephen Boyd wrote: > > Sorry this email is long. > > I tried reply the best I could to your use cases > > > > > TL;DR is that I don't think we need to revert the patch as you suggest. > > Instead, we should fix the driver to indicate NULL as the parent name of > > the index it tells the framework about. > > The driver, can't indicate NULL, it just provides the index of the parent. > What we are saying is not very far apart. > > Drivers implementing get_parent() must have way to tell the framework "I have > no idea what I'm connected to ATM" > > Since get_parent() return a u8, the only way to signal a problem like this is > to return an out of bound value, which the framework already safely understand > and maps to NULL, as you request I'm saying that the clk with two parents but only one 'known' clk parent should have a parent_names array of two strings, where the second element is NULL. .parent_names = { "known_clk", NULL } .num_parents = 2, And then the .get_parent op is OK to return '1' and then we know that the clk isn't known to be anything, but it's otherwise a valid index for the clk to be parented to. This is the orphan case. It's not the error case, which would be some value returned from .get_parent() that's >= num_parents. I am not talking about the return value from get_parent() returning NULL or the driver indicating that the parent is a NULL clk. I'm just saying that the clk driver that wants to return an index that's outside the bounds of the parent_names array needs to do the right thing and either grow the parent_names array to say "it's unknown what's here, I've indicated that with a NULL name", or it needs to return another value outside of the array so the framework knows it's an error. > > This is done in the other patch I sent (... and now they are related ;) ...) > by returning num_parents in the clk_mux code > > This is an important case, whatever the number of parent. I think it makes no > sense that a clock with 2 known parents can return this error, a one with only > one known parent can't. > > This why I still think the change should be reverted. > > If you really insist on keeping it, there should at least be a warning when > registering a clock with 'get_parent != NULL' and 'num_parents == 1', to let > the user know that whatever is implemented there will be ignored ... instead > of silently ignoring it. I think it would be a mistake, but at least there > would be no suprise. Maybe we should add the warning. At least it would be good to grep around and try to find these cases and update them. The documentation on the get_parent clk op says it's unnecessary right now when 0 or 1 parent exists, so maybe that needs an update too. > > Earlier, you mention having a special parent that would adopt the the orphan. > Actually, it already exists, it is the framework itself. If, instead of > returning 0, it returned a special value to let us know we reached the root > and there actually no signal, we would be easy to make sure the orphan (with > no signal) are never accidently used. I don't understand this part. We have orphan tracking code that knows when a clk is no longer in an orphan chain (i.e. there is a signal there). > > > > > Quoting Jerome Brunet (2018-12-05 09:20:09) > > > We have sometimes a few orphans in the amlogic clock tree, it does not > > > seems > > > to a problem. > > To the best of my knowledge, there will always be orphan in the tree. The root > clock (usually an xtal) being a special case of this. > > It seems to me that the problem is we can't differentiate between this special > orphan and the rest because none will return an error when calling get_rate() The root clk isn't an orphan. It has num_parents == 0 and is treated specially as such. > > xtal will provide something to round_rate() while other clock which are > supposed to be orphaned will defer to the framework which gives a 0 rate. > > I think this is your real issue with orphan in the framework. The framework > does not distinguish between a signal with rate == 0 and an error (no signal) Ok. I'm not sure that rate is the entire problem, but I suppose it's one problem. > > > > I suppose drivers aren't calling clk_set_rate() on the orphaned clks and > > experiencing problems? There are problems around clk rates that could > > happen if there aren't any proper parents of a mux. > > > > > I don't really understand the benefit of having a fake clock that would > > > adopt > > > all the orphan ? You guess there is problem I'm not aware of ... > > > > I see three cases that can go wrong if we make orphan clks probe defer > > clk_get(): > > > > 1) NULL as the parent name for some parent index that the hardware > > indicates is the current parent of the clk when the clk is registered. > > I don't think you can have NULL has a parent_name Yes, it looks like Mike made that decision when introducing the CCF in 2012. I don't know why it is a problem though. Maybe kstrdup() on a NULL string returns NULl and then errors can't be detected? Should be simple enough to skip those ones when doing the deep copy though. > > > > > In this case, we need to make clk_get() succeed and not return > > -EPROBE_DEFER, so I think we need to make a special "unknown" clk in the > > framework so that clks aren't orphaned when the hardware is indicating > > the parent is something we don't (and won't) know anything about. > > I think DEFER is returned when clock is unknown, not when it is orphaned. Yes it's only returned right now if the provider node hasn't registered a provider function. We have wanted to extend that further so that clks can't be acquired until the clk is properly rooted in the clk tree instead of being orphaned. It may not matter on amlogic but it matters on other platforms. > > > > > 2) get_parent/set_parent on a clk with num_parents == 1 > > > > This seems to be how ao_cts_cec is implemented. This hides the fact that > > get_parent may return some number above the number of parents > > The index returned will be either 0, to indicate the only known parent, or > num_parents (1) to indicate an invalid parent to framework. This is the topic > of the other patch I sent you > > You must have way for a driver to tell you that it is not connected to > anything it knows. That's a very basic error case and this is not specific to > num_parents == 1. > > With any number of num_parent, there is still a possibility that a driver ends > up in configuration where it cannot tell what it is his parent. (invalid > combination, reserved values, crappy doc ...) > > This is why is important to call get_parent() (if available) even when > num_parent == 1. The driver might tell if it is connected as expected - or not Ok. Thanks for the explanation. What should the framework do when the parent is not known? Mark the clk as not an orphan but parent it to NULL? That might work here. > > > and then > > has that quietly make the parent of the clk NULL during registration so > > that it's orphaned. Then we rely on set_rate() or set_parent() being > > called to fix the parent chain at a later time. > > > > That is complicated and non-obvious. I'd rather that we specify NULL as > > a parent in the parent_names array if we're going to return that index > > to the framework and then have the framework go to case #1. This makes > > it easier for the framework to assume that [0, num_parents) is the valid > > range of indices returned from the get_parent() op and a number outside > > that range is an error, clearing the way for errors to be returned from > > get_parent() if it somehow fails. > > The fix I sent is quite simple. > > The above would require that: > 1) the framework accept NULL in the list of parent (which it forbids ATM > AFAIK) > 2) Have all drivers account for this a create this fake parent > > This seems to be putting more work on the drivers with no real gain for the > framework, none that I can see at least. > > > > > 3) A string as the parent name for some parent index that the hardware > > indicates is the current parent of the clk when the clk is registered, > > but that parent clk may never be registered with the framework. > > > > I imagine this can happen with an external gpio clk that feeds some SoC > > internal clk. That gpio could be supplied by some clk signal that the > > board designer decides to send there, and it could also just "magically > > appear" by applying some DT overlay. Due to our use of global strings > > for parent linking, we don't have a good way to figure out that nothing > > is connected to some clock input signal, making this quite a nightmare > > to figure out a runtime! > > > > One solution is to replace the parent name with NULL and ignore the > > runtime update case. This will work and bring us back to case #1. The > > problem is that if it is connected to something like a GPIO and it > > defaults to that pin instead of some internal clk we're stuck with an > > "orphan" clk that we can't clk_get(). I suppose the driver could read DT > > and see if this pin is not connected and then mark this input parent > > name as NULL. We can punt on solving the DT overlay problem until later, > > because we would need some way to dynamically resolve a new parent of a > > clk and update the tree when the DT node is modified to use a new GPIO. > > I don't understand what is the problem here. > In Amlogic we have a lot of this cases, with optional input clocks of the axg- > audio clock controller. Many muxes have parent names of clock that don't > exists. > > CCF will return a 0 rate for these inputs. > It works well as it is, but as said above, I think it would be better if we > could distinguish between a signal with 0 rate and no signal at all (error) > Ok. It sounds like we want similar things but I'm worried about the orphan probe defer logic getting messed up. How about we tri-state the get_parent() return value? So now [0, num_parents) means a valid parent is known, num_parents means "I don't know what it is but it must be something valid", and [num_parents + 1, 255] means some error occurred. Sounds obfuscated to me still. Alternatively, we could make a get_parent_hw() op that returns a pointer to a clk_hw structure of the parent clk. Then drivers can be converted to use this new clk_op instead of get_parent and we can have it return an error pointer when some error happens, NULL when the parent is not known to the software but is otherwise valid, and the parent pointer when the parent is findable. It would still be tri-stated then, but at least it wouldn't be some odd u8 numberspace that it all has to fit into. This patch starts to do that. What do you think? Clk providers would need to implement a clk_ops that mostly does clk_hw_get_parent_by_index(), or they could stash the parent_hw pointer in their structure, or index into it based on some global numberspace in their driver. -----8<---- diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index af011974d4ec..5fe3c41e0b7b 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -2242,14 +2242,84 @@ struct clk *clk_get_parent(struct clk *clk) } EXPORT_SYMBOL_GPL(clk_get_parent); -static struct clk_core *__clk_init_parent(struct clk_core *core) +static struct clk_core * +__clk_init_parent(struct clk_core *core, bool update_orphan) { u8 index = 0; + struct clk_hw *parent_hw = NULL; - if (core->num_parents > 1 && core->ops->get_parent) - index = core->ops->get_parent(core->hw); + if (core->ops->get_parent_hw) { + parent_hw = core->ops->get_parent_hw(core->hw); + /* + * The provider driver doesn't know what the parent is, + * but it's at least something valid, so it's not an + * orphan, just a clk with some unknown parent. + */ + if (!parent_hw && update_orphan) + core->orphan = false; + } else { + if (core->num_parents > 1 && core->ops->get_parent) + index = core->ops->get_parent(core->hw); + + parent_hw = clk_hw_get_parent_by_index(core->hw, index); + } + + if (IS_ERR(parent_hw)) { + /* Parent clk provider hasn't probed yet, orphan it */ + if (PTR_ERR(parent_hw) == -EPROBE_DEFER) { + if (update_orphan) + core->orphan = true; + + return NULL; + } + + return ERR_CAST(parent_hw); + } + + if (!parent_hw) + return NULL; + + return parent_hw->core; +} + +static int clk_init_parent(struct clk_core *core) +{ + core->parent = __clk_init_parent(core, true); + if (IS_ERR(core->parent)) + return PTR_ERR(core->parent); + + /* + * Populate core->parent if parent has already been clk_core_init'd. If + * parent has not yet been clk_core_init'd then place clk in the orphan + * list. If clk doesn't have any parents then place it in the root + * clk list. + * + * Every time a new clk is clk_init'd then we walk the list of orphan + * clocks and re-parent any that are children of the clock currently + * being clk_init'd. + */ + if (core->parent) { + hlist_add_head(&core->child_node, + &core->parent->children); + core->orphan = core->parent->orphan; + } else if (!core->num_parents) { + hlist_add_head(&core->child_node, &clk_root_list); + core->orphan = false; + } else { + hlist_add_head(&core->child_node, &clk_orphan_list); + } + + return 0; +} - return clk_core_get_parent_by_index(core, index); +static struct clk_core *clk_find_parent(struct clk_core *core) +{ + return __clk_init_parent(core, false); +} + +static bool clk_has_parent_op(const struct clk_ops *ops) +{ + return ops->get_parent || ops->get_parent_hw; } static void clk_core_reparent(struct clk_core *core, @@ -3045,14 +3115,14 @@ static int __clk_core_init(struct clk_core *core) goto out; } - if (core->ops->set_parent && !core->ops->get_parent) { + if (core->ops->set_parent && !clk_has_parent_op(core->ops)) { pr_err("%s: %s must implement .get_parent & .set_parent\n", __func__, core->name); ret = -EINVAL; goto out; } - if (core->num_parents > 1 && !core->ops->get_parent) { + if (core->num_parents > 1 && !clk_has_parent_op(core->ops)) { pr_err("%s: %s must implement .get_parent as it has multi parents\n", __func__, core->name); ret = -EINVAL; @@ -3073,29 +3143,9 @@ static int __clk_core_init(struct clk_core *core) "%s: invalid NULL in %s's .parent_names\n", __func__, core->name); - core->parent = __clk_init_parent(core); - - /* - * Populate core->parent if parent has already been clk_core_init'd. If - * parent has not yet been clk_core_init'd then place clk in the orphan - * list. If clk doesn't have any parents then place it in the root - * clk list. - * - * Every time a new clk is clk_init'd then we walk the list of orphan - * clocks and re-parent any that are children of the clock currently - * being clk_init'd. - */ - if (core->parent) { - hlist_add_head(&core->child_node, - &core->parent->children); - core->orphan = core->parent->orphan; - } else if (!core->num_parents) { - hlist_add_head(&core->child_node, &clk_root_list); - core->orphan = false; - } else { - hlist_add_head(&core->child_node, &clk_orphan_list); - core->orphan = true; - } + ret = clk_init_parent(core); + if (ret) + goto out; /* * optional platform-specific magic @@ -3173,7 +3223,14 @@ static int __clk_core_init(struct clk_core *core) * parent. */ hlist_for_each_entry_safe(orphan, tmp2, &clk_orphan_list, child_node) { - struct clk_core *parent = __clk_init_parent(orphan); + struct clk_core *parent = clk_find_parent(orphan); + + /* + * Error parent should have been caught before and returned + * as an error during registration. + */ + if (WARN_ON_ONCE(IS_ERR(parent))) + continue; /* * We need to use __clk_set_parent_before() and _after() to diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 60c51871b04b..8b84dee942bf 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -155,6 +155,14 @@ struct clk_duty { * multiple parents. It is optional (and unnecessary) for clocks * with 0 or 1 parents. * + * @get_parent_hw: Queries the hardware to determine the parent of a clock. The + * return value is a clk_hw pointer corresponding to + * the parent clock. In short, this function translates the parent + * value read from hardware into a pointer to the clk_hw for that clk. + * Currently only called when the clock is initialized by + * __clk_init. This callback is mandatory for clocks with + * multiple parents. It is optional for clocks with 0 or 1 parents. + * * @set_rate: Change the rate of this clock. The requested rate is specified * by the second argument, which should typically be the return * of .round_rate call. The third argument gives the parent rate @@ -238,6 +246,7 @@ struct clk_ops { struct clk_rate_request *req); int (*set_parent)(struct clk_hw *hw, u8 index); u8 (*get_parent)(struct clk_hw *hw); + struct clk_hw * (*get_parent_hw)(struct clk_hw *hw); int (*set_rate)(struct clk_hw *hw, unsigned long rate, unsigned long parent_rate); int (*set_rate_and_parent)(struct clk_hw *hw,
Quoting Dinh Nguyen (2018-12-06 07:16:47) > > > On 12/5/18 9:55 AM, Stephen Boyd wrote: > > Quoting Dinh Nguyen (2018-12-05 07:17:41) > >> Hi Stephen, > >> > >> On 12/5/18 1:17 AM, Stephen Boyd wrote: > >>> (Adding Dinh's korg email) > >>> > >>> I also wonder if this driver is even used anymore or maybe we can delete > >>> it? > >>> > >> > >> The armv7 SoCFPGA platforms are using this driver. > >> > > > > Ok and those platforms are booting in kernelci.org as > > https://kernelci.org/boot/socfpga_cyclone5_de0_sockit/ perhaps? > > > > Yes that's right. > Ok. Dinh, can you review this patch? Or test it out? Or do I need to get someone in kernelci to try it out?
On 12/17/18 7:54 PM, Stephen Boyd wrote: > Quoting Dinh Nguyen (2018-12-06 07:16:47) >> >> >> On 12/5/18 9:55 AM, Stephen Boyd wrote: >>> Quoting Dinh Nguyen (2018-12-05 07:17:41) >>>> Hi Stephen, >>>> >>>> On 12/5/18 1:17 AM, Stephen Boyd wrote: >>>>> (Adding Dinh's korg email) >>>>> >>>>> I also wonder if this driver is even used anymore or maybe we can delete >>>>> it? >>>>> >>>> >>>> The armv7 SoCFPGA platforms are using this driver. >>>> >>> >>> Ok and those platforms are booting in kernelci.org as >>> https://kernelci.org/boot/socfpga_cyclone5_de0_sockit/ perhaps? >>> >> >> Yes that's right. >> > > Ok. Dinh, can you review this patch? Or test it out? Or do I need to get > someone in kernelci to try it out? > Just tested on an Arria5 board, feel free to add: Tested-by: Dinh Nguyen <dinguyen@kernel.org> Thanks, Dinh
On Mon, 2018-12-17 at 17:54 -0800, Stephen Boyd wrote: > Quoting Jerome Brunet (2018-12-07 02:03:19) > > On Thu, 2018-12-06 at 14:08 -0800, Stephen Boyd wrote: > > > Sorry this email is long. > > > > I tried reply the best I could to your use cases > > > > > TL;DR is that I don't think we need to revert the patch as you suggest. > > > Instead, we should fix the driver to indicate NULL as the parent name of > > > the index it tells the framework about. > > > > The driver, can't indicate NULL, it just provides the index of the parent. > > What we are saying is not very far apart. > > > > Drivers implementing get_parent() must have way to tell the framework "I > > have > > no idea what I'm connected to ATM" > > > > Since get_parent() return a u8, the only way to signal a problem like this > > is > > to return an out of bound value, which the framework already safely > > understand > > and maps to NULL, as you request > > I'm saying that the clk with two parents but only one 'known' clk parent > should have a parent_names array of two strings, where the second > element is NULL. > > .parent_names = { "known_clk", NULL } > .num_parents = 2, The problem is that CCF disallowed this from the beginning when rejecting NULL in parent tables. Are saying we should find and fix all the controllers with mux using value table to avoid unknown/disconnected mux inputs ... sounds like a lot of work for a benefit I still don't really understand. Anywayu, adding support for this would be nice, I just don't understand how the proposed patch how here makes it any harder. > > And then the .get_parent op is OK to return '1' and then we know that > the clk isn't known to be anything, but it's otherwise a valid index for > the clk to be parented to. This is the orphan case. It's not the error > case, which would be some value returned from .get_parent() that's >= > num_parents. > > I am not talking about the return value from get_parent() returning NULL > or the driver indicating that the parent is a NULL clk. I'm just saying > that the clk driver that wants to return an index that's outside the > bounds of the parent_names array needs to do the right thing and either > grow the parent_names array to say "it's unknown what's here, I've > indicated that with a NULL name", or it needs to return another value > outside of the array so the framework knows it's an error. AFAIK, the framework knows that index == 'num_parents' is an error. > > > This is done in the other patch I sent (... and now they are related ;) > > ...) > > by returning num_parents in the clk_mux code > > > > This is an important case, whatever the number of parent. I think it makes > > no > > sense that a clock with 2 known parents can return this error, a one with > > only > > one known parent can't. > > This why I still think the change should be reverted. > > > > If you really insist on keeping it, there should at least be a warning > > when > > registering a clock with 'get_parent != NULL' and 'num_parents == 1', to > > let > > the user know that whatever is implemented there will be ignored ... > > instead > > of silently ignoring it. I think it would be a mistake, but at least there > > would be no suprise. > > Maybe we should add the warning. At least it would be good to grep > around and try to find these cases and update them. The documentation on > the get_parent clk op says it's unnecessary right now when 0 or 1 parent > exists, so maybe that needs an update too. If we are going to ignore something, I think that would be a minimum, yes. I still don't like the fact that may we chose to ignore this. > > > Earlier, you mention having a special parent that would adopt the the > > orphan. > > Actually, it already exists, it is the framework itself. If, instead of > > returning 0, it returned a special value to let us know we reached the > > root > > and there actually no signal, we would be easy to make sure the orphan > > (with > > no signal) are never accidently used. > > I don't understand this part. We have orphan tracking code that knows > when a clk is no longer in an orphan chain (i.e. there is a signal > there). > > > > Quoting Jerome Brunet (2018-12-05 09:20:09) > > > > We have sometimes a few orphans in the amlogic clock tree, it does not > > > > seems > > > > to a problem. > > > > To the best of my knowledge, there will always be orphan in the tree. The > > root > > clock (usually an xtal) being a special case of this. > > > > It seems to me that the problem is we can't differentiate between this > > special > > orphan and the rest because none will return an error when calling > > get_rate() > > The root clk isn't an orphan. It has num_parents == 0 and is treated > specially as such. got confused, sorry > > > xtal will provide something to round_rate() while other clock which are > > supposed to be orphaned will defer to the framework which gives a 0 rate. > > > > I think this is your real issue with orphan in the framework. The > > framework > > does not distinguish between a signal with rate == 0 and an error (no > > signal) > > Ok. I'm not sure that rate is the entire problem, but I suppose it's one > problem. > > > > I suppose drivers aren't calling clk_set_rate() on the orphaned clks and > > > experiencing problems? There are problems around clk rates that could > > > happen if there aren't any proper parents of a mux. > > > > > > > I don't really understand the benefit of having a fake clock that > > > > would > > > > adopt > > > > all the orphan ? You guess there is problem I'm not aware of ... > > > > > > I see three cases that can go wrong if we make orphan clks probe defer > > > clk_get(): > > > > > > 1) NULL as the parent name for some parent index that the hardware > > > indicates is the current parent of the clk when the clk is > > > registered. > > > > I don't think you can have NULL has a parent_name > > Yes, it looks like Mike made that decision when introducing the CCF in > 2012. I don't know why it is a problem though. Maybe kstrdup() on a NULL > string returns NULl and then errors can't be detected? Should be simple > enough to skip those ones when doing the deep copy though. Maybe but all controller so far have been done with this in mind, what about those ? > > > > In this case, we need to make clk_get() succeed and not return > > > -EPROBE_DEFER, so I think we need to make a special "unknown" clk in the > > > framework so that clks aren't orphaned when the hardware is indicating > > > the parent is something we don't (and won't) know anything about. > > > > I think DEFER is returned when clock is unknown, not when it is orphaned. > > Yes it's only returned right now if the provider node hasn't registered > a provider function. We have wanted to extend that further so that clks > can't be acquired until the clk is properly rooted in the clk tree > instead of being orphaned. It may not matter on amlogic but it matters > on other platforms. > > > > 2) get_parent/set_parent on a clk with num_parents == 1 > > > > > > This seems to be how ao_cts_cec is implemented. This hides the fact that > > > get_parent may return some number above the number of parents > > > > The index returned will be either 0, to indicate the only known parent, or > > num_parents (1) to indicate an invalid parent to framework. This is the > > topic > > of the other patch I sent you > > > > You must have way for a driver to tell you that it is not connected to > > anything it knows. That's a very basic error case and this is not specific > > to > > num_parents == 1. > > > > With any number of num_parent, there is still a possibility that a driver > > ends > > up in configuration where it cannot tell what it is his parent. (invalid > > combination, reserved values, crappy doc ...) > > > > This is why is important to call get_parent() (if available) even when > > num_parent == 1. The driver might tell if it is connected as expected - or > > not > > Ok. Thanks for the explanation. What should the framework do when the > parent is not known? Mark the clk as not an orphan but parent it to > NULL? That might work here. I don't really get the difference in the framework ATM. Is this linked to the 'orphan probe defer logic' WIP you have been mentionning ? > > > > and then > > > has that quietly make the parent of the clk NULL during registration so > > > that it's orphaned. Then we rely on set_rate() or set_parent() being > > > called to fix the parent chain at a later time. > > > > > > That is complicated and non-obvious. I'd rather that we specify NULL as > > > a parent in the parent_names array if we're going to return that index > > > to the framework and then have the framework go to case #1. This makes > > > it easier for the framework to assume that [0, num_parents) is the valid > > > range of indices returned from the get_parent() op and a number outside > > > that range is an error, clearing the way for errors to be returned from > > > get_parent() if it somehow fails. > > > > The fix I sent is quite simple. > > > > The above would require that: > > 1) the framework accept NULL in the list of parent (which it forbids ATM > > AFAIK) > > 2) Have all drivers account for this a create this fake parent > > > > This seems to be putting more work on the drivers with no real gain for > > the > > framework, none that I can see at least. > > > > > 3) A string as the parent name for some parent index that the hardware > > > indicates is the current parent of the clk when the clk is > > > registered, > > > but that parent clk may never be registered with the framework. > > > > > > I imagine this can happen with an external gpio clk that feeds some SoC > > > internal clk. That gpio could be supplied by some clk signal that the > > > board designer decides to send there, and it could also just "magically > > > appear" by applying some DT overlay. Due to our use of global strings > > > for parent linking, we don't have a good way to figure out that nothing > > > is connected to some clock input signal, making this quite a nightmare > > > to figure out a runtime! > > > > > > One solution is to replace the parent name with NULL and ignore the > > > runtime update case. This will work and bring us back to case #1. The > > > problem is that if it is connected to something like a GPIO and it > > > defaults to that pin instead of some internal clk we're stuck with an > > > "orphan" clk that we can't clk_get(). I suppose the driver could read DT > > > and see if this pin is not connected and then mark this input parent > > > name as NULL. We can punt on solving the DT overlay problem until later, > > > because we would need some way to dynamically resolve a new parent of a > > > clk and update the tree when the DT node is modified to use a new GPIO. > > > > I don't understand what is the problem here. > > In Amlogic we have a lot of this cases, with optional input clocks of the > > axg- > > audio clock controller. Many muxes have parent names of clock that don't > > exists. > > > > CCF will return a 0 rate for these inputs. > > It works well as it is, but as said above, I think it would be better if > > we > > could distinguish between a signal with 0 rate and no signal at all > > (error) > > > All this below is very interresting but it looks like a very complex way to solve what was initially a very simple problem, with considerations that are still WIP a this stage. Even with these future constraints in mind, I don't understand how the proposed patch changes anything to the situtation. A clock with a get_parent() callback (and several known parents) may still return an out of bound value ... CCF will still have to deal with this case gracefully, as it is currently doing > Ok. It sounds like we want similar things but I'm worried about the > orphan probe defer logic getting messed up. > > How about we tri-state the get_parent() return value? So now [0, > num_parents) means a valid parent is known, num_parents means "I don't > know what it is but it must be something valid", and [num_parents + 1, > 255] means some error occurred. Sounds obfuscated to me still. > > Alternatively, we could make a get_parent_hw() op that returns a pointer > to a clk_hw structure of the parent clk. Then drivers can be converted > to use this new clk_op instead of get_parent and we can have it return > an error pointer when some error happens, NULL when the parent is not > known to the software but is otherwise valid, and the parent pointer > when the parent is findable. It would still be tri-stated then, but at > least it wouldn't be some odd u8 numberspace that it all has to fit > into. This patch starts to do that. What do you think? Clk providers > would need to implement a clk_ops that mostly does > clk_hw_get_parent_by_index(), or they could stash the parent_hw pointer > in their structure, or index into it based on some global numberspace in > their driver. > > -----8<---- > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > index af011974d4ec..5fe3c41e0b7b 100644 > --- a/drivers/clk/clk.c > +++ b/drivers/clk/clk.c > @@ -2242,14 +2242,84 @@ struct clk *clk_get_parent(struct clk *clk) > } > EXPORT_SYMBOL_GPL(clk_get_parent); > > -static struct clk_core *__clk_init_parent(struct clk_core *core) > +static struct clk_core * > +__clk_init_parent(struct clk_core *core, bool update_orphan) > { > u8 index = 0; > + struct clk_hw *parent_hw = NULL; > > - if (core->num_parents > 1 && core->ops->get_parent) > - index = core->ops->get_parent(core->hw); > + if (core->ops->get_parent_hw) { > + parent_hw = core->ops->get_parent_hw(core->hw); > + /* > + * The provider driver doesn't know what the parent is, > + * but it's at least something valid, so it's not an > + * orphan, just a clk with some unknown parent. > + */ > + if (!parent_hw && update_orphan) > + core->orphan = false; > + } else { > + if (core->num_parents > 1 && core->ops->get_parent) > + index = core->ops->get_parent(core->hw); > + > + parent_hw = clk_hw_get_parent_by_index(core->hw, index); > + } > + > + if (IS_ERR(parent_hw)) { > + /* Parent clk provider hasn't probed yet, orphan it */ > + if (PTR_ERR(parent_hw) == -EPROBE_DEFER) { > + if (update_orphan) > + core->orphan = true; > + > + return NULL; > + } > + > + return ERR_CAST(parent_hw); > + } > + > + if (!parent_hw) > + return NULL; > + > + return parent_hw->core; > +} > + > +static int clk_init_parent(struct clk_core *core) > +{ > + core->parent = __clk_init_parent(core, true); > + if (IS_ERR(core->parent)) > + return PTR_ERR(core->parent); > + > + /* > + * Populate core->parent if parent has already been clk_core_init'd. > If > + * parent has not yet been clk_core_init'd then place clk in the > orphan > + * list. If clk doesn't have any parents then place it in the root > + * clk list. > + * > + * Every time a new clk is clk_init'd then we walk the list of orphan > + * clocks and re-parent any that are children of the clock currently > + * being clk_init'd. > + */ > + if (core->parent) { > + hlist_add_head(&core->child_node, > + &core->parent->children); > + core->orphan = core->parent->orphan; > + } else if (!core->num_parents) { > + hlist_add_head(&core->child_node, &clk_root_list); > + core->orphan = false; > + } else { > + hlist_add_head(&core->child_node, &clk_orphan_list); > + } > + > + return 0; > +} > > - return clk_core_get_parent_by_index(core, index); > +static struct clk_core *clk_find_parent(struct clk_core *core) > +{ > + return __clk_init_parent(core, false); > +} > + > +static bool clk_has_parent_op(const struct clk_ops *ops) > +{ > + return ops->get_parent || ops->get_parent_hw; > } > > static void clk_core_reparent(struct clk_core *core, > @@ -3045,14 +3115,14 @@ static int __clk_core_init(struct clk_core *core) > goto out; > } > > - if (core->ops->set_parent && !core->ops->get_parent) { > + if (core->ops->set_parent && !clk_has_parent_op(core->ops)) { > pr_err("%s: %s must implement .get_parent & .set_parent\n", > __func__, core->name); > ret = -EINVAL; > goto out; > } > > - if (core->num_parents > 1 && !core->ops->get_parent) { > + if (core->num_parents > 1 && !clk_has_parent_op(core->ops)) { > pr_err("%s: %s must implement .get_parent as it has multi > parents\n", > __func__, core->name); > ret = -EINVAL; > @@ -3073,29 +3143,9 @@ static int __clk_core_init(struct clk_core *core) > "%s: invalid NULL in %s's .parent_names\n", > __func__, core->name); > > - core->parent = __clk_init_parent(core); > - > - /* > - * Populate core->parent if parent has already been clk_core_init'd. > If > - * parent has not yet been clk_core_init'd then place clk in the > orphan > - * list. If clk doesn't have any parents then place it in the root > - * clk list. > - * > - * Every time a new clk is clk_init'd then we walk the list of orphan > - * clocks and re-parent any that are children of the clock currently > - * being clk_init'd. > - */ > - if (core->parent) { > - hlist_add_head(&core->child_node, > - &core->parent->children); > - core->orphan = core->parent->orphan; > - } else if (!core->num_parents) { > - hlist_add_head(&core->child_node, &clk_root_list); > - core->orphan = false; > - } else { > - hlist_add_head(&core->child_node, &clk_orphan_list); > - core->orphan = true; > - } > + ret = clk_init_parent(core); > + if (ret) > + goto out; > > /* > * optional platform-specific magic > @@ -3173,7 +3223,14 @@ static int __clk_core_init(struct clk_core *core) > * parent. > */ > hlist_for_each_entry_safe(orphan, tmp2, &clk_orphan_list, child_node) > { > - struct clk_core *parent = __clk_init_parent(orphan); > + struct clk_core *parent = clk_find_parent(orphan); > + > + /* > + * Error parent should have been caught before and returned > + * as an error during registration. > + */ > + if (WARN_ON_ONCE(IS_ERR(parent))) > + continue; > > /* > * We need to use __clk_set_parent_before() and _after() to > diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h > index 60c51871b04b..8b84dee942bf 100644 > --- a/include/linux/clk-provider.h > +++ b/include/linux/clk-provider.h > @@ -155,6 +155,14 @@ struct clk_duty { > * multiple parents. It is optional (and unnecessary) for clocks > * with 0 or 1 parents. > * > + * @get_parent_hw: Queries the hardware to determine the parent of a > clock. The > + * return value is a clk_hw pointer corresponding to > + * the parent clock. In short, this function translates the > parent > + * value read from hardware into a pointer to the clk_hw for that > clk. > + * Currently only called when the clock is initialized by > + * __clk_init. This callback is mandatory for clocks with > + * multiple parents. It is optional for clocks with 0 or 1 > parents. > + * > * @set_rate: Change the rate of this clock. The requested rate is > specified > * by the second argument, which should typically be the return > * of .round_rate call. The third argument gives the parent rate > @@ -238,6 +246,7 @@ struct clk_ops { > struct clk_rate_request *req); > int (*set_parent)(struct clk_hw *hw, u8 index); > u8 (*get_parent)(struct clk_hw *hw); > + struct clk_hw * (*get_parent_hw)(struct clk_hw *hw); > int (*set_rate)(struct clk_hw *hw, unsigned long rate, > unsigned long parent_rate); > int (*set_rate_and_parent)(struct clk_hw *hw, > >
Quoting Jerome Brunet (2018-12-21 08:03:29) > On Mon, 2018-12-17 at 17:54 -0800, Stephen Boyd wrote: > > > > Ok. Thanks for the explanation. What should the framework do when the > > parent is not known? Mark the clk as not an orphan but parent it to > > NULL? That might work here. > > I don't really get the difference in the framework ATM. Is this linked to the > 'orphan probe defer logic' WIP you have been mentionning ? Yes, we have orphan marking logic, but we haven't finished the topic and returned PROBE_DEFER for orphaned clks. It was blocked on some Allwinner drivers having problems but that was over a year ago so I don't see anything blocking us now. I really want to do it soon so now is the time to fix all of this and see what breaks. > > All this below is very interresting but it looks like a very complex way to > solve what was initially a very simple problem, with considerations that are > still WIP a this stage. > > Even with these future constraints in mind, I don't understand how the > proposed patch changes anything to the situtation. A clock with a get_parent() > callback (and several known parents) may still return an out of bound value > ... CCF will still have to deal with this case gracefully, as it is currently > doing Ok. I think the problem you see is that the clk provider doesn't know anything besides parent name is a string? And so you're not sure how the provider can return NULL for "this clk won't ever appear", vs. an error pointer for "things failed while reading"? I think you understand I'm not trying to change clk_ops::get_parent(). I'm sidestepping the problem and other problems that we have with the return type of that clk_op (u8) by introducing a new clk_op that can return an error pointer for exceptional cases. Drivers would need to migrate to this new clk op to fix the problem you're solving, and those drivers will also need to know what their clk parents are and what the clk_hw pointers are for them.
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index af011974d4ec..1b33ef3c37f4 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -2246,7 +2246,7 @@ static struct clk_core *__clk_init_parent(struct clk_core *core) { u8 index = 0; - if (core->num_parents > 1 && core->ops->get_parent) + if (core->ops->get_parent) index = core->ops->get_parent(core->hw); return clk_core_get_parent_by_index(core, index);
This reverts commit 2430a94d1e719b7b4af2a65b781a4c036eb22e64. From the original commit message: It turned out a problem because there are some single-parent clocks that implement .get_parent() callback and return non-zero index. The SOCFPGA clock is the case; the commit broke the SOCFPGA boards. It is wrong for a clock to return an index >= num_parents. CCF checks for this condition in clk_core_get_parent_by_index(). This function sets the parent to NULL if index is incoherent, which seems like the only sane choice. commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks") appears to be a work around installed in the core framework for a problem that is platform specific, and should probably be fixed in the platform code. Further more, it introduces a problem in a corner case of the mux clock. Take mux with multiple parents, but only one is known, the rest being undocumented. The register reset has one of unknown parent set. Before commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks"): * get_parent() is called, register is read and give an unknown index. -> the mux is orphaned. * a call to set_rate() will reparent the mux to the only known parent. With commit 2430a94d1e71 ("clk: fix __clk_init_parent() for single parent clocks"): * the register is never read. * parent is wrongly assumed to be the only known one. As a consequence, all the calculation deriving from the mux will be wrong * Since we believe the only know parent to be set, set_parent() won't ever be called and the register won't be set with the correct value. Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> --- drivers/clk/clk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.19.1