
[07/13] clk: samsung: gs101: mark PERIC0 IP TOP gate clock as critical

Message ID 20231214105243.3707730-8-tudor.ambarus@linaro.org
State New
Series GS101 Oriole: CMU_PERIC0 support and USI updates

Commit Message

Tudor Ambarus Dec. 14, 2023, 10:52 a.m. UTC
Testing USI8 I2C with an eeprom revealed that when the USI8 leaf clock
is disabled it leads to the CMU_TOP PERIC0 IP gate clock disablement,
which then makes the system hang. To prevent this, mark
CLK_GOUT_CMU_PERIC0_IP as critical. Other clocks will be marked
accordingly when tested.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
 drivers/clk/samsung/clk-gs101.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Tudor Ambarus Dec. 19, 2023, 4:47 p.m. UTC | #1
Hi, Sam!

On 12/14/23 16:43, Sam Protsenko wrote:
> On Thu, Dec 14, 2023 at 10:15 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>
>>
>>
>> On 12/14/23 16:09, Sam Protsenko wrote:
>>> On Thu, Dec 14, 2023 at 10:01 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 12/14/23 15:37, Sam Protsenko wrote:
>>>>> On Thu, Dec 14, 2023 at 4:52 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>>>>>
>>>>>> Testing USI8 I2C with an eeprom revealed that when the USI8 leaf clock
>>>>>> is disabled it leads to the CMU_TOP PERIC0 IP gate clock disablement,
>>>>>> which then makes the system hang. To prevent this, mark
>>>>>> CLK_GOUT_CMU_PERIC0_IP as critical. Other clocks will be marked
>>>>>> accordingly when tested.
>>>>>>
>>>>>> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
>>>>>> ---
>>>>>>  drivers/clk/samsung/clk-gs101.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/clk/samsung/clk-gs101.c b/drivers/clk/samsung/clk-gs101.c
>>>>>> index 3d194520b05e..08d80fca9cd6 100644
>>>>>> --- a/drivers/clk/samsung/clk-gs101.c
>>>>>> +++ b/drivers/clk/samsung/clk-gs101.c
>>>>>> @@ -1402,7 +1402,7 @@ static const struct samsung_gate_clock cmu_top_gate_clks[] __initconst = {
>>>>>>              "mout_cmu_peric0_bus", CLK_CON_GAT_GATE_CLKCMU_PERIC0_BUS,
>>>>>>              21, 0, 0),
>>>>>>         GATE(CLK_GOUT_CMU_PERIC0_IP, "gout_cmu_peric0_ip", "mout_cmu_peric0_ip",
>>>>>> -            CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP, 21, 0, 0),
>>>>>> +            CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP, 21, CLK_IS_CRITICAL, 0),
>>>>>
>>>>> This clock doesn't seem like a leaf clock. It's also not a bus clock.
>>>>> Leaving it always running makes the whole PERIC0 CMU clocked, which
>>>>> usually should be avoided. Is it possible that the system freezes
>>>>> because some other clock (which depends on peric0_ip) gets disabled as
>>>>> a consequence of disabling peric0_ip? Maybe it's some leaf clock which
>>>>> is not implemented yet in the clock driver? Just looks weird to me
>>>>> that the system hangs because of CMU IP clock disablement. It's
>>>>> usually something much more specific.
>>>>
>>>> The system hang happened when I tested USI8 in I2C configuration with an
>>>> eeprom. After the eeprom is read the leaf gate clock that gets disabled
>>>> is the one on PERIC0 (CLK_GOUT_PERIC0_CLK_PERIC0_USI8_USI_CLK). I assume
>>>> this leads to the CMU_TOP gate (CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP)
>>>> disablement which makes the system hang. Either marking the CMU_TOP gate
>>>> clock as critical (as I did in this patch) or marking the leaf PERIC0
>>>> gate clock as critical, gets rid of the system hang. Did I choose wrong?
>>>>
>>>
>>> Did you already implement 100% of clocks in CMU_PERIC0? If no, there
>>
>> yes.

I checked again all the clocks. I implemented all but one, the one
defined by the CLK_CON_BUF_CLKBUF_PERIC0_IP register. Unfortunately I
don't have any reference on how it should be defined so I won't touch it
yet. But I have some good news too, see below.

> 
> Ok. Are there any other CMUs (perhaps not implemented yet) which
> consume clocks from CMU_PERIC0, specifically PERIC0_IP clock or some
> clocks derived from it? If so, is there a chance some particular leaf
> clock in those CMUs actually renders the system frozen when disabled
> as a consequence of disabling PERIC0_IP, and would explain better why
> the freeze happens?
> 
> For now I think it's ok to have that CLK_IS_CRITICAL flag here,
> because as you said you implemented all clocks in this CMU and neither
> of those looks like a critical one. But I'd advise adding a TODO
> comment saying it's probably a temporary solution before actual leaf
> clock which leads to freeze is identified (which probably resides in
> some other not implemented yet CMU).
> 
>>
>>> is a chance some other leaf clock (which is not implemented yet in
>>> your driver) gets disabled as a result of PERIC0_IP disablement, which
>>> might actually lead to that hang you observe. Usually it's some
>>> meaningful leaf clock, e.g. GIC or interconnect clocks. Please check
>>> clk-exynos850.c driver for CLK_IS_CRITICAL and CLK_IGNORE_UNUSED flags
>>> and the corresponding comments I left there, maybe it'll give you more
>>> particular idea about what to look for. Yes, making the whole CMU
>>> always running without understanding why (i.e. because of which
>>> particular leaf clock) might not be the best way of handling this
>>
>> because of CLK_GOUT_PERIC0_CLK_PERIC0_USI8_USI_CLK
> 
> That's not a root cause here. And I think PERIC0_IP is neither.
> 

you were right!
>>
>>> issue. I might be mistaken, but at least please check if you
>>> implemented all clocks for PERIC0 first and if making some meaningful
>>> leaf clock critical makes more sense.
>>>

I determined which leaf clocks shall be marked as critical. I enabled
the debugfs clock write access. Then I made sure that the parents of the
PERIC0 CMU have at least one user so that they don't get disabled after
an enable-disable sequence on a leaf clock. Then I took all the PERIC0
gate clocks and enabled and disabled them one by one. Whichever clock hung
the system when it was disabled was marked as critical. The list of
critical leaf clocks is as follows:

"gout_peric0_peric0_cmu_peric0_pclk",
"gout_peric0_lhm_axi_p_peric0_i_clk",
"gout_peric0_peric0_top1_ipclk_0",
"gout_peric0_peric0_top1_pclk_0".

I'll update v2 with this instead. Thanks for the help, Sam!
Cheers,
ta
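
The test procedure described in this message can be illustrated with a
small stand-alone model. This is only a sketch under stated assumptions:
struct toy_clk and the toy_* helpers are hypothetical names and are not
the kernel's clock framework. It shows why the parents need a user of
their own during the test: without one, an enable-disable cycle on a leaf
also gates the parent, so a hang cannot be attributed to the leaf.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical type, not the kernel's struct clk_core. */
struct toy_clk {
	const char *name;
	struct toy_clk *parent;
	unsigned int enable_count;	/* active users, enabled children included */
	bool gate_open;			/* true while the gate would be ungated */
};

/* The first user of a clock also takes a reference on its parent chain. */
static void toy_clk_enable(struct toy_clk *clk)
{
	if (!clk)
		return;
	if (clk->enable_count++ == 0) {
		toy_clk_enable(clk->parent);
		clk->gate_open = true;
	}
}

/* Dropping the last user closes the gate and releases the parent chain. */
static void toy_clk_disable(struct toy_clk *clk)
{
	if (!clk || clk->enable_count == 0)
		return;
	if (--clk->enable_count == 0) {
		clk->gate_open = false;
		toy_clk_disable(clk->parent);
	}
}

/*
 * Run one enable/disable cycle on a leaf and report whether the parent gate
 * is still open afterwards. With hold_parent set, the parent has its own
 * user, so only the leaf gate toggles during the cycle.
 */
static bool toy_parent_survives_leaf_cycle(bool hold_parent)
{
	struct toy_clk parent = { .name = "parent_gate" };
	struct toy_clk leaf = { .name = "leaf_gate", .parent = &parent };

	if (hold_parent)
		toy_clk_enable(&parent);

	toy_clk_enable(&leaf);
	toy_clk_disable(&leaf);

	return parent.gate_open;
}
```

In this model toy_parent_survives_leaf_cycle(false) reports that the
parent was gated together with the leaf, which matches the original USI8
observation, while toy_parent_survives_leaf_cycle(true) isolates the leaf.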
Sam Protsenko Dec. 19, 2023, 5:31 p.m. UTC | #2
On Tue, Dec 19, 2023 at 10:47 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>
> Hi, Sam!
>
> On 12/14/23 16:43, Sam Protsenko wrote:
> > On Thu, Dec 14, 2023 at 10:15 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> >>
> >>
> >>
> >> On 12/14/23 16:09, Sam Protsenko wrote:
> >>> On Thu, Dec 14, 2023 at 10:01 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 12/14/23 15:37, Sam Protsenko wrote:
> >>>>> On Thu, Dec 14, 2023 at 4:52 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> >>>>>>
> >>>>>> Testing USI8 I2C with an eeprom revealed that when the USI8 leaf clock
> >>>>>> is disabled it leads to the CMU_TOP PERIC0 IP gate clock disablement,
> >>>>>> which then makes the system hang. To prevent this, mark
> >>>>>> CLK_GOUT_CMU_PERIC0_IP as critical. Other clocks will be marked
> >>>>>> accordingly when tested.
> >>>>>>
> >>>>>> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
> >>>>>> ---
> >>>>>>  drivers/clk/samsung/clk-gs101.c | 2 +-
> >>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/clk/samsung/clk-gs101.c b/drivers/clk/samsung/clk-gs101.c
> >>>>>> index 3d194520b05e..08d80fca9cd6 100644
> >>>>>> --- a/drivers/clk/samsung/clk-gs101.c
> >>>>>> +++ b/drivers/clk/samsung/clk-gs101.c
> >>>>>> @@ -1402,7 +1402,7 @@ static const struct samsung_gate_clock cmu_top_gate_clks[] __initconst = {
> >>>>>>              "mout_cmu_peric0_bus", CLK_CON_GAT_GATE_CLKCMU_PERIC0_BUS,
> >>>>>>              21, 0, 0),
> >>>>>>         GATE(CLK_GOUT_CMU_PERIC0_IP, "gout_cmu_peric0_ip", "mout_cmu_peric0_ip",
> >>>>>> -            CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP, 21, 0, 0),
> >>>>>> +            CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP, 21, CLK_IS_CRITICAL, 0),
> >>>>>
> >>>>> This clock doesn't seem like a leaf clock. It's also not a bus clock.
> >>>>> Leaving it always running makes the whole PERIC0 CMU clocked, which
> >>>>> usually should be avoided. Is it possible that the system freezes
> >>>>> because some other clock (which depends on peric0_ip) gets disabled as
> >>>>> a consequence of disabling peric0_ip? Maybe it's some leaf clock which
> >>>>> is not implemented yet in the clock driver? Just looks weird to me
> >>>>> that the system hangs because of CMU IP clock disablement. It's
> >>>>> usually something much more specific.
> >>>>
> >>>> The system hang happened when I tested USI8 in I2C configuration with an
> >>>> eeprom. After the eeprom is read the leaf gate clock that gets disabled
> >>>> is the one on PERIC0 (CLK_GOUT_PERIC0_CLK_PERIC0_USI8_USI_CLK). I assume
> >>>> this leads to the CMU_TOP gate (CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP)
> >>>> disablement which makes the system hang. Either marking the CMU_TOP gate
> >>>> clock as critical (as I did in this patch) or marking the leaf PERIC0
> >>>> gate clock as critical, gets rid of the system hang. Did I choose wrong?
> >>>>
> >>>
> >>> Did you already implement 100% of clocks in CMU_PERIC0? If no, there
> >>
> >> yes.
>
> I checked again all the clocks. I implemented all but one, the one
> defined by the CLK_CON_BUF_CLKBUF_PERIC0_IP register. Unfortunately I
> don't have any reference on how it should be defined so I won't touch it
> yet. But I have some good news too, see below.
>
> >
> > Ok. Are there any other CMUs (perhaps not implemented yet) which
> > consume clocks from CMU_PERIC0, specifically PERIC0_IP clock or some
> > clocks derived from it? If so, is there a chance some particular leaf
> > clock in those CMUs actually renders the system frozen when disabled
> > as a consequence of disabling PERIC0_IP, and would explain better why
> > the freeze happens?
> >
> > For now I think it's ok to have that CLK_IS_CRITICAL flag here,
> > because as you said you implemented all clocks in this CMU and neither
> > of those looks like a critical one. But I'd advise adding a TODO
> > comment saying it's probably a temporary solution before actual leaf
> > clock which leads to freeze is identified (which probably resides in
> > some other not implemented yet CMU).
> >
> >>
> >>> is a chance some other leaf clock (which is not implemented yet in
> >>> your driver) gets disabled as a result of PERIC0_IP disablement, which
> >>> might actually lead to that hang you observe. Usually it's some
> >>> meaningful leaf clock, e.g. GIC or interconnect clocks. Please check
> >>> clk-exynos850.c driver for CLK_IS_CRITICAL and CLK_IGNORE_UNUSED flags
> >>> and the corresponding comments I left there, maybe it'll give you more
> >>> particular idea about what to look for. Yes, making the whole CMU
> >>> always running without understanding why (i.e. because of which
> >>> particular leaf clock) might not be the best way of handling this
> >>
> >> because of CLK_GOUT_PERIC0_CLK_PERIC0_USI8_USI_CLK
> >
> > That's not a root cause here. And I think PERIC0_IP is neither.
> >
>
> you were right!
> >>
> >>> issue. I might be mistaken, but at least please check if you
> >>> implemented all clocks for PERIC0 first and if making some meaningful
> >>> leaf clock critical makes more sense.
> >>>
>
> I determined which leaf clocks shall be marked as critical. I enabled
> the debugfs clock write access. Then I made sure that the parents of the
> PERIC0 CMU have at least one user so that they don't get disabled after
> an enable-disable sequence on a leaf clock. Then I took all the PERIC0
> gate clocks and enabled and disabled them one by one. Whichever clock hung
> the system when it was disabled was marked as critical. The list of
> critical leaf clocks is as follows:
>

Nice! I used a somewhat similar procedure for clk-exynos850, doing
basically the same thing, but in core clock driver code.

> "gout_peric0_peric0_cmu_peric0_pclk",
> "gout_peric0_lhm_axi_p_peric0_i_clk",
> "gout_peric0_peric0_top1_ipclk_0",
> "gout_peric0_peric0_top1_pclk_0".
>
> I'll update v2 with this instead. Thanks for the help, Sam!

Glad you weren't discouraged by my meticulousness :) In clk-exynos850
I usually used CLK_IGNORE_UNUSED for clocks like XXX_CMU_XXX (in your
case it's PERIC0_CMU_PERIC0), with a corresponding comment. Those
clocks usually can be used to disable the bus clock for corresponding
CMU IP-core (in your case CMU_PERIC0), which makes it impossible to
access the registers from that CMU block, as its register interface is
not clocked anymore. Guess I saw something similar in Exynos5433 or
Exynos7 clk drivers, or maybe Sylwester or Krzysztof told me to do so
-- don't really remember. For AXI clock it also seems logical to keep
it running (AXI bus might be used for GIC and memory). But again,
maybe CLK_IGNORE_UNUSED flag would be more appropriate than
CLK_IS_CRITICAL? For the last two clocks -- it's hard to tell what
exactly they do. Is TOP1 some other CMU or block name, and are there
any further users for those clocks?

Anyways, if you are working on v2, please consider doing the following two
things while at it:

  1. For each critical clock: add corresponding comment explaining why
it's marked so
  2. Consider using CLK_IGNORE_UNUSED instead of CLK_IS_CRITICAL when
appropriate; both have their use in different cases

Btw, if you check other Exynos clk drivers, there are a lot of examples
for flags like those.

> Cheers,
> ta
Tudor Ambarus Dec. 20, 2023, 2:22 p.m. UTC | #3
Hi, Sam!

On 12/19/23 17:31, Sam Protsenko wrote:
> On Tue, Dec 19, 2023 at 10:47 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>
>> Hi, Sam!
>>
>> On 12/14/23 16:43, Sam Protsenko wrote:
>>> On Thu, Dec 14, 2023 at 10:15 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 12/14/23 16:09, Sam Protsenko wrote:
>>>>> On Thu, Dec 14, 2023 at 10:01 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12/14/23 15:37, Sam Protsenko wrote:
>>>>>>> On Thu, Dec 14, 2023 at 4:52 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>>>>>>>>
>>>>>>>> Testing USI8 I2C with an eeprom revealed that when the USI8 leaf clock
>>>>>>>> is disabled it leads to the CMU_TOP PERIC0 IP gate clock disablement,
>>>>>>>> which then makes the system hang. To prevent this, mark
>>>>>>>> CLK_GOUT_CMU_PERIC0_IP as critical. Other clocks will be marked
>>>>>>>> accordingly when tested.
>>>>>>>>
>>>>>>>> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
>>>>>>>> ---
>>>>>>>>  drivers/clk/samsung/clk-gs101.c | 2 +-
>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/clk/samsung/clk-gs101.c b/drivers/clk/samsung/clk-gs101.c
>>>>>>>> index 3d194520b05e..08d80fca9cd6 100644
>>>>>>>> --- a/drivers/clk/samsung/clk-gs101.c
>>>>>>>> +++ b/drivers/clk/samsung/clk-gs101.c
>>>>>>>> @@ -1402,7 +1402,7 @@ static const struct samsung_gate_clock cmu_top_gate_clks[] __initconst = {
>>>>>>>>              "mout_cmu_peric0_bus", CLK_CON_GAT_GATE_CLKCMU_PERIC0_BUS,
>>>>>>>>              21, 0, 0),
>>>>>>>>         GATE(CLK_GOUT_CMU_PERIC0_IP, "gout_cmu_peric0_ip", "mout_cmu_peric0_ip",
>>>>>>>> -            CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP, 21, 0, 0),
>>>>>>>> +            CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP, 21, CLK_IS_CRITICAL, 0),
>>>>>>>
>>>>>>> This clock doesn't seem like a leaf clock. It's also not a bus clock.
>>>>>>> Leaving it always running makes the whole PERIC0 CMU clocked, which
>>>>>>> usually should be avoided. Is it possible that the system freezes
>>>>>>> because some other clock (which depends on peric0_ip) gets disabled as
>>>>>>> a consequence of disabling peric0_ip? Maybe it's some leaf clock which
>>>>>>> is not implemented yet in the clock driver? Just looks weird to me
>>>>>>> that the system hangs because of CMU IP clock disablement. It's
>>>>>>> usually something much more specific.
>>>>>>
>>>>>> The system hang happened when I tested USI8 in I2C configuration with an
>>>>>> eeprom. After the eeprom is read the leaf gate clock that gets disabled
>>>>>> is the one on PERIC0 (CLK_GOUT_PERIC0_CLK_PERIC0_USI8_USI_CLK). I assume
>>>>>> this leads to the CMU_TOP gate (CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP)
>>>>>> disablement which makes the system hang. Either marking the CMU_TOP gate
>>>>>> clock as critical (as I did in this patch) or marking the leaf PERIC0
>>>>>> gate clock as critical, gets rid of the system hang. Did I choose wrong?
>>>>>>
>>>>>
>>>>> Did you already implement 100% of clocks in CMU_PERIC0? If no, there
>>>>
>>>> yes.
>>
>> I checked again all the clocks. I implemented all but one, the one
>> defined by the CLK_CON_BUF_CLKBUF_PERIC0_IP register. Unfortunately I
>> don't have any reference on how it should be defined so I won't touch it
>> yet. But I have some good news too, see below.
>>
>>>
>>> Ok. Are there any other CMUs (perhaps not implemented yet) which
>>> consume clocks from CMU_PERIC0, specifically PERIC0_IP clock or some
>>> clocks derived from it? If so, is there a chance some particular leaf
>>> clock in those CMUs actually renders the system frozen when disabled
>>> as a consequence of disabling PERIC0_IP, and would explain better why
>>> the freeze happens?
>>>
>>> For now I think it's ok to have that CLK_IS_CRITICAL flag here,
>>> because as you said you implemented all clocks in this CMU and neither
>>> of those looks like a critical one. But I'd advise adding a TODO
>>> comment saying it's probably a temporary solution before actual leaf
>>> clock which leads to freeze is identified (which probably resides in
>>> some other not implemented yet CMU).
>>>
>>>>
>>>>> is a chance some other leaf clock (which is not implemented yet in
>>>>> your driver) gets disabled as a result of PERIC0_IP disablement, which
>>>>> might actually lead to that hang you observe. Usually it's some
>>>>> meaningful leaf clock, e.g. GIC or interconnect clocks. Please check
>>>>> clk-exynos850.c driver for CLK_IS_CRITICAL and CLK_IGNORE_UNUSED flags
>>>>> and the corresponding comments I left there, maybe it'll give you more
>>>>> particular idea about what to look for. Yes, making the whole CMU
>>>>> always running without understanding why (i.e. because of which
>>>>> particular leaf clock) might not be the best way of handling this
>>>>
>>>> because of CLK_GOUT_PERIC0_CLK_PERIC0_USI8_USI_CLK
>>>
>>> That's not a root cause here. And I think PERIC0_IP is neither.
>>>
>>
>> you were right!
>>>>
>>>>> issue. I might be mistaken, but at least please check if you
>>>>> implemented all clocks for PERIC0 first and if making some meaningful
>>>>> leaf clock critical makes more sense.
>>>>>
>>
>> I determined which leaf clocks shall be marked as critical. I enabled
>> the debugfs clock write access. Then I made sure that the parents of the
>> PERIC0 CMU have at least one user so that they don't get disabled after
>> an enable-disable sequence on a leaf clock. Then I took all the PERIC0
>> gate clocks and enabled and disabled them one by one. Whichever clock hung
>> the system when it was disabled was marked as critical. The list of
>> critical leaf clocks is as follows:
>>
> 
> Nice! I used a somewhat similar procedure for clk-exynos850, doing
> basically the same thing, but in core clock driver code.
> 
>> "gout_peric0_peric0_cmu_peric0_pclk",
>> "gout_peric0_lhm_axi_p_peric0_i_clk",
>> "gout_peric0_peric0_top1_ipclk_0",
>> "gout_peric0_peric0_top1_pclk_0".
>>
>> I'll update v2 with this instead. Thanks for the help, Sam!
> 
> Glad you weren't discouraged by my meticulousness :) In clk-exynos850
> I usually used CLK_IGNORE_UNUSED for clocks like XXX_CMU_XXX (in your
> case it's PERIC0_CMU_PERIC0), with a corresponding comment. Those
> clocks usually can be used to disable the bus clock for corresponding
> CMU IP-core (in your case CMU_PERIC0), which makes it impossible to
> access the registers from that CMU block, as its register interface is
> not clocked anymore. Guess I saw something similar in Exynos5433 or
> Exynos7 clk drivers, or maybe Sylwester or Krzysztof told me to do so
> -- don't really remember. For AXI clock it also seems logical to keep
> it running (AXI bus might be used for GIC and memory). But again,
> maybe CLK_IGNORE_UNUSED flag would be more appropriate than
> CLK_IS_CRITICAL? For the last two clocks -- it's hard to tell what
> exactly they do. Is TOP1 some other CMU or block name, and are there
> any further users for those clocks?
> 
> Anyways, if you are working on v2, please consider doing the following two
> things while at it:
> 
>   1. For each critical clock: add corresponding comment explaining why
> it's marked so

Will do.

>   2. Consider using CLK_IGNORE_UNUSED instead of CLK_IS_CRITICAL when
> appropriate; both have their use in different cases
> 
> Btw, if you check other Exynos clk drivers, there are a lot of examples
> for flags like those.
> 
Thanks for the feedback, it's educative.

I played a little with the clk debugfs and I think all should be marked
as critical. What I did was to make sure that their parents are enabled
already and then I enabled and disabled each. Each time I disabled one
of them the system hung. Thus, if they are ever used and a consumer
disables them on an error path, the system will hang. We can't disable
them at suspend either. Thus I propose to keep them as critical.

Thanks!
ta
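
The difference between the two flags discussed in this message can be
sketched with a stand-alone model. This is an illustration under stated
assumptions, not kernel code: the TOY_* flag values, struct toy_gate and
the toy_* helpers are hypothetical. It assumes the usual semantics, where
CLK_IGNORE_UNUSED only exempts a clock from the late-init sweep of unused
clocks, while CLK_IS_CRITICAL keeps a permanent reference so the clock is
never gated.

```c
#include <stdbool.h>

/* Hypothetical flag values for this toy model; the kernel defines its own. */
#define TOY_CLK_IGNORE_UNUSED	(1u << 0)
#define TOY_CLK_IS_CRITICAL	(1u << 1)

struct toy_gate {
	unsigned int flags;
	unsigned int enable_count;
	bool gate_open;
};

/*
 * Registration: the gate is assumed to be left open by the bootloader.
 * A critical clock gets one permanent reference that is never dropped.
 */
static void toy_register(struct toy_gate *g, unsigned int flags)
{
	g->flags = flags;
	g->enable_count = (flags & TOY_CLK_IS_CRITICAL) ? 1 : 0;
	g->gate_open = true;
}

/* Late-init sweep: gate clocks without users unless told to ignore them. */
static void toy_disable_unused(struct toy_gate *g)
{
	if (g->enable_count == 0 && !(g->flags & TOY_CLK_IGNORE_UNUSED))
		g->gate_open = false;
}

static void toy_enable(struct toy_gate *g)
{
	if (g->enable_count++ == 0)
		g->gate_open = true;
}

static void toy_disable(struct toy_gate *g)
{
	if (g->enable_count && --g->enable_count == 0)
		g->gate_open = false;
}

/* Is the gate still open once the unused-clock sweep has run? */
static bool toy_open_after_boot(unsigned int flags)
{
	struct toy_gate g;

	toy_register(&g, flags);
	toy_disable_unused(&g);
	return g.gate_open;
}

/* Same, followed by one consumer enable/disable cycle (e.g. an error path). */
static bool toy_open_after_consumer_cycle(unsigned int flags)
{
	struct toy_gate g;

	toy_register(&g, flags);
	toy_disable_unused(&g);
	toy_enable(&g);
	toy_disable(&g);
	return g.gate_open;
}
```

In this model both flags keep the gate open across the boot-time sweep,
but only the critical flag keeps it open after a consumer has enabled and
then disabled the clock, which is the error-path case raised above.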
Sam Protsenko Dec. 20, 2023, 3:12 p.m. UTC | #4
On Wed, Dec 20, 2023 at 8:22 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
>
> Hi, Sam!
>
> On 12/19/23 17:31, Sam Protsenko wrote:
> > On Tue, Dec 19, 2023 at 10:47 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> >>
> >> Hi, Sam!
> >>
> >> On 12/14/23 16:43, Sam Protsenko wrote:
> >>> On Thu, Dec 14, 2023 at 10:15 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 12/14/23 16:09, Sam Protsenko wrote:
> >>>>> On Thu, Dec 14, 2023 at 10:01 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 12/14/23 15:37, Sam Protsenko wrote:
> >>>>>>> On Thu, Dec 14, 2023 at 4:52 AM Tudor Ambarus <tudor.ambarus@linaro.org> wrote:
> >>>>>>>>
> >>>>>>>> Testing USI8 I2C with an eeprom revealed that when the USI8 leaf clock
> >>>>>>>> is disabled it leads to the CMU_TOP PERIC0 IP gate clock disablement,
> >>>>>>>> which then makes the system hang. To prevent this, mark
> >>>>>>>> CLK_GOUT_CMU_PERIC0_IP as critical. Other clocks will be marked
> >>>>>>>> accordingly when tested.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
> >>>>>>>> ---
> >>>>>>>>  drivers/clk/samsung/clk-gs101.c | 2 +-
> >>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/clk/samsung/clk-gs101.c b/drivers/clk/samsung/clk-gs101.c
> >>>>>>>> index 3d194520b05e..08d80fca9cd6 100644
> >>>>>>>> --- a/drivers/clk/samsung/clk-gs101.c
> >>>>>>>> +++ b/drivers/clk/samsung/clk-gs101.c
> >>>>>>>> @@ -1402,7 +1402,7 @@ static const struct samsung_gate_clock cmu_top_gate_clks[] __initconst = {
> >>>>>>>>              "mout_cmu_peric0_bus", CLK_CON_GAT_GATE_CLKCMU_PERIC0_BUS,
> >>>>>>>>              21, 0, 0),
> >>>>>>>>         GATE(CLK_GOUT_CMU_PERIC0_IP, "gout_cmu_peric0_ip", "mout_cmu_peric0_ip",
> >>>>>>>> -            CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP, 21, 0, 0),
> >>>>>>>> +            CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP, 21, CLK_IS_CRITICAL, 0),
> >>>>>>>
> >>>>>>> This clock doesn't seem like a leaf clock. It's also not a bus clock.
> >>>>>>> Leaving it always running makes the whole PERIC0 CMU clocked, which
> >>>>>>> usually should be avoided. Is it possible that the system freezes
> >>>>>>> because some other clock (which depends on peric0_ip) gets disabled as
> >>>>>>> a consequence of disabling peric0_ip? Maybe it's some leaf clock which
> >>>>>>> is not implemented yet in the clock driver? Just looks weird to me
> >>>>>>> that the system hangs because of CMU IP clock disablement. It's
> >>>>>>> usually something much more specific.
> >>>>>>
> >>>>>> The system hang happened when I tested USI8 in I2C configuration with an
> >>>>>> eeprom. After the eeprom is read the leaf gate clock that gets disabled
> >>>>>> is the one on PERIC0 (CLK_GOUT_PERIC0_CLK_PERIC0_USI8_USI_CLK). I assume
> >>>>>> this leads to the CMU_TOP gate (CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP)
> >>>>>> disablement which makes the system hang. Either marking the CMU_TOP gate
> >>>>>> clock as critical (as I did in this patch) or marking the leaf PERIC0
> >>>>>> gate clock as critical, gets rid of the system hang. Did I choose wrong?
> >>>>>>
> >>>>>
> >>>>> Did you already implement 100% of clocks in CMU_PERIC0? If no, there
> >>>>
> >>>> yes.
> >>
> >> I checked again all the clocks. I implemented all but one, the one
> >> defined by the CLK_CON_BUF_CLKBUF_PERIC0_IP register. Unfortunately I
> >> don't have any reference on how it should be defined so I won't touch it
> >> yet. But I have some good news too, see below.
> >>
> >>>
> >>> Ok. Are there any other CMUs (perhaps not implemented yet) which
> >>> consume clocks from CMU_PERIC0, specifically PERIC0_IP clock or some
> >>> clocks derived from it? If so, is there a chance some particular leaf
> >>> clock in those CMUs actually renders the system frozen when disabled
> >>> as a consequence of disabling PERIC0_IP, and would explain better why
> >>> the freeze happens?
> >>>
> >>> For now I think it's ok to have that CLK_IS_CRITICAL flag here,
> >>> because as you said you implemented all clocks in this CMU and neither
> >>> of those looks like a critical one. But I'd advise adding a TODO
> >>> comment saying it's probably a temporary solution before actual leaf
> >>> clock which leads to freeze is identified (which probably resides in
> >>> some other not implemented yet CMU).
> >>>
> >>>>
> >>>>> is a chance some other leaf clock (which is not implemented yet in
> >>>>> your driver) gets disabled as a result of PERIC0_IP disablement, which
> >>>>> might actually lead to that hang you observe. Usually it's some
> >>>>> meaningful leaf clock, e.g. GIC or interconnect clocks. Please check
> >>>>> clk-exynos850.c driver for CLK_IS_CRITICAL and CLK_IGNORE_UNUSED flags
> >>>>> and the corresponding comments I left there, maybe it'll give you more
> >>>>> particular idea about what to look for. Yes, making the whole CMU
> >>>>> always running without understanding why (i.e. because of which
> >>>>> particular leaf clock) might not be the best way of handling this
> >>>>
> >>>> because of CLK_GOUT_PERIC0_CLK_PERIC0_USI8_USI_CLK
> >>>
> >>> That's not a root cause here. And I think PERIC0_IP is neither.
> >>>
> >>
> >> you were right!
> >>>>
> >>>>> issue. I might be mistaken, but at least please check if you
> >>>>> implemented all clocks for PERIC0 first and if making some meaningful
> >>>>> leaf clock critical makes more sense.
> >>>>>
> >>
> >> I determined which leaf clocks shall be marked as critical. I enabled
> >> the debugfs clock write access. Then I made sure that the parents of the
> >> PERIC0 CMU have at least one user so that they don't get disabled after
> >> an enable-disable sequence on a leaf clock. Then I took all the PERIC0
> >> gate clocks and enabled and disabled them one by one. Whichever clock hung
> >> the system when it was disabled was marked as critical. The list of
> >> critical leaf clocks is as follows:
> >>
> >
> > Nice! I used a somewhat similar procedure for clk-exynos850, doing
> > basically the same thing, but in core clock driver code.
> >
> >> "gout_peric0_peric0_cmu_peric0_pclk",
> >> "gout_peric0_lhm_axi_p_peric0_i_clk",
> >> "gout_peric0_peric0_top1_ipclk_0",
> >> "gout_peric0_peric0_top1_pclk_0".
> >>
> >> I'll update v2 with this instead. Thanks for the help, Sam!
> >
> > Glad you weren't discouraged by my meticulousness :) In clk-exynos850
> > I usually used CLK_IGNORE_UNUSED for clocks like XXX_CMU_XXX (in your
> > case it's PERIC0_CMU_PERIC0), with a corresponding comment. Those
> > clocks usually can be used to disable the bus clock for corresponding
> > CMU IP-core (in your case CMU_PERIC0), which makes it impossible to
> > access the registers from that CMU block, as its register interface is
> > not clocked anymore. Guess I saw something similar in Exynos5433 or
> > Exynos7 clk drivers, or maybe Sylwester or Krzysztof told me to do so
> > -- don't really remember. For AXI clock it also seems logical to keep
> > it running (AXI bus might be used for GIC and memory). But again,
> > maybe CLK_IGNORE_UNUSED flag would be more appropriate than
> > CLK_IS_CRITICAL? For the last two clocks -- it's hard to tell what
> > exactly they do. Is TOP1 some other CMU or block name, and are there
> > any further users for those clocks?
> >
> > Anyways, if you are working on v2, please consider doing the following two
> > things while at it:
> >
> >   1. For each critical clock: add corresponding comment explaining why
> > it's marked so
>
> Will do.
>
> >   2. Consider using CLK_IGNORE_UNUSED instead of CLK_IS_CRITICAL when
> > appropriate; both have their use in different cases
> >
> > Btw, if you check other Exynos clk drivers, there are a lot of examples
> > for flags like those.
> >
> Thanks for the feedback, it's educative.
>
> I played a little with the clk debugfs and I think all should be marked
> as critical. What I did was to make sure that their parents are enabled
> already and then I enabled and disabled each. Each time I disabled one
> of them the system hung. Thus, if they are ever used and a consumer
> disables them on an error path, the system will hang. We can't disable
> them at suspend either. Thus I propose to keep them as critical.
>

Do you see those clocks potentially being used by some actual consumers
in the future? If not, maybe CLK_IGNORE_UNUSED is enough (just to make sure
the core clock framework won't disable those during the clocks
initialization)? Anyway, I don't have any strong preferences in this
case. If you think CLK_IS_CRITICAL is better in this case, I'd say go
for it.

Also, on a bit different note: please make sure there is no
"clk_ignore_unused" param in your kernel cmdline (e.g. passed from the
bootloader via dts). The clock driver should be functional without
that param. Though it might take some additional work.
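
One way to check for that parameter is to look for it as a whole token in
the kernel command line. The helper below is a minimal sketch;
cmdline_has_param() is a hypothetical name, and the code assumes
parameters are separated by single spaces, with an optional trailing
newline.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the space-separated command line contains `param` as a
 * whole token, so "clk_ignore_unused" does not match a longer word that
 * merely contains it.
 */
static bool cmdline_has_param(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p = cmdline;

	if (len == 0)
		return false;

	while ((p = strstr(p, param)) != NULL) {
		bool starts_token = (p == cmdline) || p[-1] == ' ';
		char end = p[len];
		bool ends_token = end == '\0' || end == ' ' || end == '\n';

		if (starts_token && ends_token)
			return true;
		p++;	/* keep scanning after a partial match */
	}
	return false;
}
```

On a running system the string to pass in would be the contents of
/proc/cmdline; a true result for "clk_ignore_unused" means the clock
driver is not being exercised without that parameter.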

> Thanks!
> ta

Patch

diff --git a/drivers/clk/samsung/clk-gs101.c b/drivers/clk/samsung/clk-gs101.c
index 3d194520b05e..08d80fca9cd6 100644
--- a/drivers/clk/samsung/clk-gs101.c
+++ b/drivers/clk/samsung/clk-gs101.c
@@ -1402,7 +1402,7 @@  static const struct samsung_gate_clock cmu_top_gate_clks[] __initconst = {
 	     "mout_cmu_peric0_bus", CLK_CON_GAT_GATE_CLKCMU_PERIC0_BUS,
 	     21, 0, 0),
 	GATE(CLK_GOUT_CMU_PERIC0_IP, "gout_cmu_peric0_ip", "mout_cmu_peric0_ip",
-	     CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP, 21, 0, 0),
+	     CLK_CON_GAT_GATE_CLKCMU_PERIC0_IP, 21, CLK_IS_CRITICAL, 0),
 	GATE(CLK_GOUT_CMU_PERIC1_BUS, "gout_cmu_peric1_bus",
 	     "mout_cmu_peric1_bus", CLK_CON_GAT_GATE_CLKCMU_PERIC1_BUS,
 	     21, 0, 0),