clk: samsung: exynos5420: Keep top G3D clocks enabled

Message ID 20191121101145.15899-1-m.szyprowski@samsung.com
State New
Headers show
Series
  • clk: samsung: exynos5420: Keep top G3D clocks enabled
Related show

Commit Message

Marek Szyprowski Nov. 21, 2019, 10:11 a.m.
All top clocks on G3D path has to be enabled all the time to allow proper
G3D power domain operation. This is achieved by adding CRITICAL flag to
"mout_sw_aclk_g3d" clock, what keeps this clock and all its parents
enabled.

This fixes following imprecise abort issue observed on Odroid XU3/XU4
after enabling Panfrost driver by commit 1a5a85c56402 "ARM: dts: exynos:
Add Mali/GPU node on Exynos5420 and enable it on Odroid XU3/4"):

panfrost 11800000.gpu: clock rate = 400000000
panfrost 11800000.gpu: failed to get regulator: -517
panfrost 11800000.gpu: regulator init failed -517
Power domain G3D disable failed
...
panfrost 11800000.gpu: clock rate = 400000000
8<--- cut here ---
Unhandled fault: imprecise external abort (0x1406) at 0x00000000
pgd = (ptrval)
[00000000] *pgd=00000000
Internal error: : 1406 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 7 PID: 53 Comm: kworker/7:1 Not tainted 5.4.0-rc8-next-20191119-00032-g56f1001191a6 #6923
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Workqueue: events deferred_probe_work_func
PC is at panfrost_gpu_soft_reset+0x94/0x110
LR is at ___might_sleep+0x128/0x2dc
...
[<c05c231c>] (panfrost_gpu_soft_reset) from [<c05c2704>] (panfrost_gpu_init+0x10/0x67c)
[<c05c2704>] (panfrost_gpu_init) from [<c05c15d0>] (panfrost_device_init+0x158/0x2cc)
[<c05c15d0>] (panfrost_device_init) from [<c05c0cb0>] (panfrost_probe+0x80/0x178)
[<c05c0cb0>] (panfrost_probe) from [<c05cfaa0>] (platform_drv_probe+0x48/0x9c)
[<c05cfaa0>] (platform_drv_probe) from [<c05cd20c>] (really_probe+0x1c4/0x474)
[<c05cd20c>] (really_probe) from [<c05cd694>] (driver_probe_device+0x78/0x1bc)
[<c05cd694>] (driver_probe_device) from [<c05cb374>] (bus_for_each_drv+0x74/0xb8)
[<c05cb374>] (bus_for_each_drv) from [<c05ccfa8>] (__device_attach+0xd4/0x16c)
[<c05ccfa8>] (__device_attach) from [<c05cc110>] (bus_probe_device+0x88/0x90)
[<c05cc110>] (bus_probe_device) from [<c05cc634>] (deferred_probe_work_func+0x4c/0xd0)
[<c05cc634>] (deferred_probe_work_func) from [<c0149df0>] (process_one_work+0x300/0x864)
[<c0149df0>] (process_one_work) from [<c014a3ac>] (worker_thread+0x58/0x5a0)
[<c014a3ac>] (worker_thread) from [<c0151174>] (kthread+0x12c/0x160)
[<c0151174>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0xee03dfb0 to 0xee03dff8)
...
Code: e594300c e5933020 e3130c01 1a00000f (ebefff50).
---[ end trace badde2b74a65a540 ]---

In the above case, the Panfrost driver disables G3D clocks after failure
of getting the needed regulator and return with -EPROVE_DEFER code. This
causes G3D power domain disable failure and then, during second probe
an imprecise abort is triggered due to undefined power domain state.

Fixes: 45f10dabb56b ("clk: samsung: exynos5420: Add SET_RATE_PARENT flag to clocks on G3D path")
Fixes: c9f7567aff31 ("clk: samsung: exynos542x: Move G3D subsystem clocks to its sub-CMU")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

---
 drivers/clk/samsung/clk-exynos5420.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.17.1

Comments

Chanwoo Choi Nov. 22, 2019, 1:43 a.m. | #1
Hi Marek,

On 11/21/19 7:11 PM, Marek Szyprowski wrote:
> All top clocks on G3D path has to be enabled all the time to allow proper

> G3D power domain operation. This is achieved by adding CRITICAL flag to

> "mout_sw_aclk_g3d" clock, what keeps this clock and all its parents

> enabled.

> 

> This fixes following imprecise abort issue observed on Odroid XU3/XU4

> after enabling Panfrost driver by commit 1a5a85c56402 "ARM: dts: exynos:

> Add Mali/GPU node on Exynos5420 and enable it on Odroid XU3/4"):

> 

> panfrost 11800000.gpu: clock rate = 400000000

> panfrost 11800000.gpu: failed to get regulator: -517

> panfrost 11800000.gpu: regulator init failed -517

> Power domain G3D disable failed

> ...

> panfrost 11800000.gpu: clock rate = 400000000

> 8<--- cut here ---

> Unhandled fault: imprecise external abort (0x1406) at 0x00000000

> pgd = (ptrval)

> [00000000] *pgd=00000000

> Internal error: : 1406 [#1] PREEMPT SMP ARM

> Modules linked in:

> CPU: 7 PID: 53 Comm: kworker/7:1 Not tainted 5.4.0-rc8-next-20191119-00032-g56f1001191a6 #6923

> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)

> Workqueue: events deferred_probe_work_func

> PC is at panfrost_gpu_soft_reset+0x94/0x110

> LR is at ___might_sleep+0x128/0x2dc

> ...

> [<c05c231c>] (panfrost_gpu_soft_reset) from [<c05c2704>] (panfrost_gpu_init+0x10/0x67c)

> [<c05c2704>] (panfrost_gpu_init) from [<c05c15d0>] (panfrost_device_init+0x158/0x2cc)

> [<c05c15d0>] (panfrost_device_init) from [<c05c0cb0>] (panfrost_probe+0x80/0x178)

> [<c05c0cb0>] (panfrost_probe) from [<c05cfaa0>] (platform_drv_probe+0x48/0x9c)

> [<c05cfaa0>] (platform_drv_probe) from [<c05cd20c>] (really_probe+0x1c4/0x474)

> [<c05cd20c>] (really_probe) from [<c05cd694>] (driver_probe_device+0x78/0x1bc)

> [<c05cd694>] (driver_probe_device) from [<c05cb374>] (bus_for_each_drv+0x74/0xb8)

> [<c05cb374>] (bus_for_each_drv) from [<c05ccfa8>] (__device_attach+0xd4/0x16c)

> [<c05ccfa8>] (__device_attach) from [<c05cc110>] (bus_probe_device+0x88/0x90)

> [<c05cc110>] (bus_probe_device) from [<c05cc634>] (deferred_probe_work_func+0x4c/0xd0)

> [<c05cc634>] (deferred_probe_work_func) from [<c0149df0>] (process_one_work+0x300/0x864)

> [<c0149df0>] (process_one_work) from [<c014a3ac>] (worker_thread+0x58/0x5a0)

> [<c014a3ac>] (worker_thread) from [<c0151174>] (kthread+0x12c/0x160)

> [<c0151174>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)

> Exception stack(0xee03dfb0 to 0xee03dff8)

> ...

> Code: e594300c e5933020 e3130c01 1a00000f (ebefff50).

> ---[ end trace badde2b74a65a540 ]---

> 

> In the above case, the Panfrost driver disables G3D clocks after failure

> of getting the needed regulator and return with -EPROVE_DEFER code. This

> causes G3D power domain disable failure and then, during second probe

> an imprecise abort is triggered due to undefined power domain state.

> 

> Fixes: 45f10dabb56b ("clk: samsung: exynos5420: Add SET_RATE_PARENT flag to clocks on G3D path")

> Fixes: c9f7567aff31 ("clk: samsung: exynos542x: Move G3D subsystem clocks to its sub-CMU")

> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

> ---

>  drivers/clk/samsung/clk-exynos5420.c | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/drivers/clk/samsung/clk-exynos5420.c b/drivers/clk/samsung/clk-exynos5420.c

> index 3a991ca1ee36..89126ba66995 100644

> --- a/drivers/clk/samsung/clk-exynos5420.c

> +++ b/drivers/clk/samsung/clk-exynos5420.c

> @@ -712,7 +712,7 @@ static const struct samsung_mux_clock exynos5x_mux_clks[] __initconst = {

>  	MUX(0, "mout_sw_aclk266_g2d", mout_sw_aclk266_g2d_p,

>  			SRC_TOP12, 12, 1),

>  	MUX_F(0, "mout_sw_aclk_g3d", mout_sw_aclk_g3d_p, SRC_TOP12, 16, 1,

> -	      CLK_SET_RATE_PARENT, 0),

> +	      CLK_IS_CRITICAL | CLK_SET_RATE_PARENT, 0),

>  	MUX(0, "mout_sw_aclk300_jpeg", mout_sw_aclk300_jpeg_p,

>  			SRC_TOP12, 20, 1),

>  	MUX(CLK_MOUT_SW_ACLK300, "mout_sw_aclk300_disp1",

> 


Acked-by: Chanwoo Choi <cw00.choi@samsung.com>



-- 
Best Regards,
Chanwoo Choi
Samsung Electronics
Sylwester Nawrocki Dec. 12, 2019, 11:53 a.m. | #2
Cc: Stephen and Michael

On 11/21/19 11:11, Marek Szyprowski wrote:
> All top clocks on G3D path has to be enabled all the time to allow proper

> G3D power domain operation. This is achieved by adding CRITICAL flag to

> "mout_sw_aclk_g3d" clock, what keeps this clock and all its parents

> enabled.

> 

> This fixes following imprecise abort issue observed on Odroid XU3/XU4

> after enabling Panfrost driver by commit 1a5a85c56402 "ARM: dts: exynos:

> Add Mali/GPU node on Exynos5420 and enable it on Odroid XU3/4"):

> 

> panfrost 11800000.gpu: clock rate = 400000000

> panfrost 11800000.gpu: failed to get regulator: -517

> panfrost 11800000.gpu: regulator init failed -517

> Power domain G3D disable failed

> ...

> panfrost 11800000.gpu: clock rate = 400000000

> 8<--- cut here ---

> Unhandled fault: imprecise external abort (0x1406) at 0x00000000

> pgd = (ptrval)

> [00000000] *pgd=00000000

> Internal error: : 1406 [#1] PREEMPT SMP ARM

> Modules linked in:

> CPU: 7 PID: 53 Comm: kworker/7:1 Not tainted 5.4.0-rc8-next-20191119-00032-g56f1001191a6 #6923

> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)

> Workqueue: events deferred_probe_work_func

> PC is at panfrost_gpu_soft_reset+0x94/0x110

> LR is at ___might_sleep+0x128/0x2dc

> ...

> [<c05c231c>] (panfrost_gpu_soft_reset) from [<c05c2704>] (panfrost_gpu_init+0x10/0x67c)

> [<c05c2704>] (panfrost_gpu_init) from [<c05c15d0>] (panfrost_device_init+0x158/0x2cc)

> [<c05c15d0>] (panfrost_device_init) from [<c05c0cb0>] (panfrost_probe+0x80/0x178)

> [<c05c0cb0>] (panfrost_probe) from [<c05cfaa0>] (platform_drv_probe+0x48/0x9c)

> [<c05cfaa0>] (platform_drv_probe) from [<c05cd20c>] (really_probe+0x1c4/0x474)

> [<c05cd20c>] (really_probe) from [<c05cd694>] (driver_probe_device+0x78/0x1bc)

> [<c05cd694>] (driver_probe_device) from [<c05cb374>] (bus_for_each_drv+0x74/0xb8)

> [<c05cb374>] (bus_for_each_drv) from [<c05ccfa8>] (__device_attach+0xd4/0x16c)

> [<c05ccfa8>] (__device_attach) from [<c05cc110>] (bus_probe_device+0x88/0x90)

> [<c05cc110>] (bus_probe_device) from [<c05cc634>] (deferred_probe_work_func+0x4c/0xd0)

> [<c05cc634>] (deferred_probe_work_func) from [<c0149df0>] (process_one_work+0x300/0x864)

> [<c0149df0>] (process_one_work) from [<c014a3ac>] (worker_thread+0x58/0x5a0)

> [<c014a3ac>] (worker_thread) from [<c0151174>] (kthread+0x12c/0x160)

> [<c0151174>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)

> Exception stack(0xee03dfb0 to 0xee03dff8)

> ...

> Code: e594300c e5933020 e3130c01 1a00000f (ebefff50).

> ---[ end trace badde2b74a65a540 ]---

> 

> In the above case, the Panfrost driver disables G3D clocks after failure

> of getting the needed regulator and return with -EPROVE_DEFER code. This

> causes G3D power domain disable failure and then, during second probe

> an imprecise abort is triggered due to undefined power domain state.

> 

> Fixes: 45f10dabb56b ("clk: samsung: exynos5420: Add SET_RATE_PARENT flag to clocks on G3D path")

> Fixes: c9f7567aff31 ("clk: samsung: exynos542x: Move G3D subsystem clocks to its sub-CMU")

> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

> ---

>  drivers/clk/samsung/clk-exynos5420.c | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/drivers/clk/samsung/clk-exynos5420.c b/drivers/clk/samsung/clk-exynos5420.c

> index 3a991ca1ee36..89126ba66995 100644

> --- a/drivers/clk/samsung/clk-exynos5420.c

> +++ b/drivers/clk/samsung/clk-exynos5420.c

> @@ -712,7 +712,7 @@ static const struct samsung_mux_clock exynos5x_mux_clks[] __initconst = {

>  	MUX(0, "mout_sw_aclk266_g2d", mout_sw_aclk266_g2d_p,

>  			SRC_TOP12, 12, 1),

>  	MUX_F(0, "mout_sw_aclk_g3d", mout_sw_aclk_g3d_p, SRC_TOP12, 16, 1,

> -	      CLK_SET_RATE_PARENT, 0),

> +	      CLK_IS_CRITICAL | CLK_SET_RATE_PARENT, 0),

>  	MUX(0, "mout_sw_aclk300_jpeg", mout_sw_aclk300_jpeg_p,

  			SRC_TOP12, 20, 1),

Adding that flag to a mux clock doesn't look right, it feels we are not
addressing the issue properly and the root cause is not known.

AFAICS CLK_IS_CRITICAL flag will also not ensure required root clock up 
in the clk tree is enabled in case any reparenting is done after that mux 
clock has been registered.  The flag seems misused and the fix looks dubious 
and fragile to me.

I would apply that to fix crashes we are seeing in -next but we ned to come
up with a better solution.

Perhaps we could add a comment like:

"CLK_IS_CRITICAL flag is added to this clock as a workaround for imprecise 
external abort triggered by the Panfrost driver and will be removed once 
the issue is properly addressed."

Can you resend with Stephen and Mike at Cc and with that comment added?
Feel free to add 
Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com>


Alternatively, we could add an explicit clk_prepare_enable() call at the end
of exynos5x_clk_init() function, after the clock provider is registered,
so any possible reparenting through "assigned-clock-parents" is also taken 
into account. 

-- 
Thanks,
Sylwester
Stephen Boyd Dec. 13, 2019, 3:03 a.m. | #3
Quoting Sylwester Nawrocki (2019-12-12 03:53:17)
> Cc: Stephen and Michael

> 

> On 11/21/19 11:11, Marek Szyprowski wrote:

> > All top clocks on G3D path has to be enabled all the time to allow proper

> > G3D power domain operation. This is achieved by adding CRITICAL flag to

> > "mout_sw_aclk_g3d" clock, what keeps this clock and all its parents

> > enabled.

> > 

> > This fixes following imprecise abort issue observed on Odroid XU3/XU4

> > after enabling Panfrost driver by commit 1a5a85c56402 "ARM: dts: exynos:

> > Add Mali/GPU node on Exynos5420 and enable it on Odroid XU3/4"):

> > 

> > panfrost 11800000.gpu: clock rate = 400000000

> > panfrost 11800000.gpu: failed to get regulator: -517

> > panfrost 11800000.gpu: regulator init failed -517

> > Power domain G3D disable failed

> > ...

> > panfrost 11800000.gpu: clock rate = 400000000

> > 8<--- cut here ---

> > Unhandled fault: imprecise external abort (0x1406) at 0x00000000

> > pgd = (ptrval)

> > [00000000] *pgd=00000000

> > Internal error: : 1406 [#1] PREEMPT SMP ARM

> > Modules linked in:

> > CPU: 7 PID: 53 Comm: kworker/7:1 Not tainted 5.4.0-rc8-next-20191119-00032-g56f1001191a6 #6923

> > Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)

> > Workqueue: events deferred_probe_work_func

> > PC is at panfrost_gpu_soft_reset+0x94/0x110

> > LR is at ___might_sleep+0x128/0x2dc

> > ...

> > [<c05c231c>] (panfrost_gpu_soft_reset) from [<c05c2704>] (panfrost_gpu_init+0x10/0x67c)

> > [<c05c2704>] (panfrost_gpu_init) from [<c05c15d0>] (panfrost_device_init+0x158/0x2cc)

> > [<c05c15d0>] (panfrost_device_init) from [<c05c0cb0>] (panfrost_probe+0x80/0x178)

> > [<c05c0cb0>] (panfrost_probe) from [<c05cfaa0>] (platform_drv_probe+0x48/0x9c)

> > [<c05cfaa0>] (platform_drv_probe) from [<c05cd20c>] (really_probe+0x1c4/0x474)

> > [<c05cd20c>] (really_probe) from [<c05cd694>] (driver_probe_device+0x78/0x1bc)

> > [<c05cd694>] (driver_probe_device) from [<c05cb374>] (bus_for_each_drv+0x74/0xb8)

> > [<c05cb374>] (bus_for_each_drv) from [<c05ccfa8>] (__device_attach+0xd4/0x16c)

> > [<c05ccfa8>] (__device_attach) from [<c05cc110>] (bus_probe_device+0x88/0x90)

> > [<c05cc110>] (bus_probe_device) from [<c05cc634>] (deferred_probe_work_func+0x4c/0xd0)

> > [<c05cc634>] (deferred_probe_work_func) from [<c0149df0>] (process_one_work+0x300/0x864)

> > [<c0149df0>] (process_one_work) from [<c014a3ac>] (worker_thread+0x58/0x5a0)

> > [<c014a3ac>] (worker_thread) from [<c0151174>] (kthread+0x12c/0x160)

> > [<c0151174>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)

> > Exception stack(0xee03dfb0 to 0xee03dff8)

> > ...

> > Code: e594300c e5933020 e3130c01 1a00000f (ebefff50).

> > ---[ end trace badde2b74a65a540 ]---

> > 

> > In the above case, the Panfrost driver disables G3D clocks after failure

> > of getting the needed regulator and return with -EPROVE_DEFER code. This

> > causes G3D power domain disable failure and then, during second probe

> > an imprecise abort is triggered due to undefined power domain state.

> > 

> > Fixes: 45f10dabb56b ("clk: samsung: exynos5420: Add SET_RATE_PARENT flag to clocks on G3D path")

> > Fixes: c9f7567aff31 ("clk: samsung: exynos542x: Move G3D subsystem clocks to its sub-CMU")

> > Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

> > ---

> >  drivers/clk/samsung/clk-exynos5420.c | 2 +-

> >  1 file changed, 1 insertion(+), 1 deletion(-)

> > 

> > diff --git a/drivers/clk/samsung/clk-exynos5420.c b/drivers/clk/samsung/clk-exynos5420.c

> > index 3a991ca1ee36..89126ba66995 100644

> > --- a/drivers/clk/samsung/clk-exynos5420.c

> > +++ b/drivers/clk/samsung/clk-exynos5420.c

> > @@ -712,7 +712,7 @@ static const struct samsung_mux_clock exynos5x_mux_clks[] __initconst = {

> >       MUX(0, "mout_sw_aclk266_g2d", mout_sw_aclk266_g2d_p,

> >                       SRC_TOP12, 12, 1),

> >       MUX_F(0, "mout_sw_aclk_g3d", mout_sw_aclk_g3d_p, SRC_TOP12, 16, 1,

> > -           CLK_SET_RATE_PARENT, 0),

> > +           CLK_IS_CRITICAL | CLK_SET_RATE_PARENT, 0),

> >       MUX(0, "mout_sw_aclk300_jpeg", mout_sw_aclk300_jpeg_p,

>                         SRC_TOP12, 20, 1),

> 

> Adding that flag to a mux clock doesn't look right, it feels we are not

> addressing the issue properly and the root cause is not known.

> 

> AFAICS CLK_IS_CRITICAL flag will also not ensure required root clock up 

> in the clk tree is enabled in case any reparenting is done after that mux 

> clock has been registered.  The flag seems misused and the fix looks dubious 

> and fragile to me.

> 

> I would apply that to fix crashes we are seeing in -next but we ned to come

> up with a better solution.

> 

> Perhaps we could add a comment like:

> 

> "CLK_IS_CRITICAL flag is added to this clock as a workaround for imprecise 

> external abort triggered by the Panfrost driver and will be removed once 

> the issue is properly addressed."


You should always add a comment to CLK_IS_CRITICAL usage. This sort of
comment is not comforting though. When will the issue be properly
addressed? Maybe we should block panfrost from being enabled on this
platform until it is root caused?
Marek Szyprowski Dec. 13, 2019, 7:11 a.m. | #4
On 13.12.2019 04:03, Stephen Boyd wrote:
> Quoting Sylwester Nawrocki (2019-12-12 03:53:17)

>> Cc: Stephen and Michael

>>

>> On 11/21/19 11:11, Marek Szyprowski wrote:

>>> All top clocks on G3D path has to be enabled all the time to allow proper

>>> G3D power domain operation. This is achieved by adding CRITICAL flag to

>>> "mout_sw_aclk_g3d" clock, what keeps this clock and all its parents

>>> enabled.

>>>

>>> This fixes following imprecise abort issue observed on Odroid XU3/XU4

>>> after enabling Panfrost driver by commit 1a5a85c56402 "ARM: dts: exynos:

>>> Add Mali/GPU node on Exynos5420 and enable it on Odroid XU3/4"):

>>>

>>> panfrost 11800000.gpu: clock rate = 400000000

>>> panfrost 11800000.gpu: failed to get regulator: -517

>>> panfrost 11800000.gpu: regulator init failed -517

>>> Power domain G3D disable failed

>>> ...

>>> panfrost 11800000.gpu: clock rate = 400000000

>>> 8<--- cut here ---

>>> Unhandled fault: imprecise external abort (0x1406) at 0x00000000

>>> pgd = (ptrval)

>>> [00000000] *pgd=00000000

>>> Internal error: : 1406 [#1] PREEMPT SMP ARM

>>> Modules linked in:

>>> CPU: 7 PID: 53 Comm: kworker/7:1 Not tainted 5.4.0-rc8-next-20191119-00032-g56f1001191a6 #6923

>>> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)

>>> Workqueue: events deferred_probe_work_func

>>> PC is at panfrost_gpu_soft_reset+0x94/0x110

>>> LR is at ___might_sleep+0x128/0x2dc

>>> ...

>>> [<c05c231c>] (panfrost_gpu_soft_reset) from [<c05c2704>] (panfrost_gpu_init+0x10/0x67c)

>>> [<c05c2704>] (panfrost_gpu_init) from [<c05c15d0>] (panfrost_device_init+0x158/0x2cc)

>>> [<c05c15d0>] (panfrost_device_init) from [<c05c0cb0>] (panfrost_probe+0x80/0x178)

>>> [<c05c0cb0>] (panfrost_probe) from [<c05cfaa0>] (platform_drv_probe+0x48/0x9c)

>>> [<c05cfaa0>] (platform_drv_probe) from [<c05cd20c>] (really_probe+0x1c4/0x474)

>>> [<c05cd20c>] (really_probe) from [<c05cd694>] (driver_probe_device+0x78/0x1bc)

>>> [<c05cd694>] (driver_probe_device) from [<c05cb374>] (bus_for_each_drv+0x74/0xb8)

>>> [<c05cb374>] (bus_for_each_drv) from [<c05ccfa8>] (__device_attach+0xd4/0x16c)

>>> [<c05ccfa8>] (__device_attach) from [<c05cc110>] (bus_probe_device+0x88/0x90)

>>> [<c05cc110>] (bus_probe_device) from [<c05cc634>] (deferred_probe_work_func+0x4c/0xd0)

>>> [<c05cc634>] (deferred_probe_work_func) from [<c0149df0>] (process_one_work+0x300/0x864)

>>> [<c0149df0>] (process_one_work) from [<c014a3ac>] (worker_thread+0x58/0x5a0)

>>> [<c014a3ac>] (worker_thread) from [<c0151174>] (kthread+0x12c/0x160)

>>> [<c0151174>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)

>>> Exception stack(0xee03dfb0 to 0xee03dff8)

>>> ...

>>> Code: e594300c e5933020 e3130c01 1a00000f (ebefff50).

>>> ---[ end trace badde2b74a65a540 ]---

>>>

>>> In the above case, the Panfrost driver disables G3D clocks after failure

>>> of getting the needed regulator and return with -EPROVE_DEFER code. This

>>> causes G3D power domain disable failure and then, during second probe

>>> an imprecise abort is triggered due to undefined power domain state.

>>>

>>> Fixes: 45f10dabb56b ("clk: samsung: exynos5420: Add SET_RATE_PARENT flag to clocks on G3D path")

>>> Fixes: c9f7567aff31 ("clk: samsung: exynos542x: Move G3D subsystem clocks to its sub-CMU")

>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

>>> ---

>>>   drivers/clk/samsung/clk-exynos5420.c | 2 +-

>>>   1 file changed, 1 insertion(+), 1 deletion(-)

>>>

>>> diff --git a/drivers/clk/samsung/clk-exynos5420.c b/drivers/clk/samsung/clk-exynos5420.c

>>> index 3a991ca1ee36..89126ba66995 100644

>>> --- a/drivers/clk/samsung/clk-exynos5420.c

>>> +++ b/drivers/clk/samsung/clk-exynos5420.c

>>> @@ -712,7 +712,7 @@ static const struct samsung_mux_clock exynos5x_mux_clks[] __initconst = {

>>>        MUX(0, "mout_sw_aclk266_g2d", mout_sw_aclk266_g2d_p,

>>>                        SRC_TOP12, 12, 1),

>>>        MUX_F(0, "mout_sw_aclk_g3d", mout_sw_aclk_g3d_p, SRC_TOP12, 16, 1,

>>> -           CLK_SET_RATE_PARENT, 0),

>>> +           CLK_IS_CRITICAL | CLK_SET_RATE_PARENT, 0),

>>>        MUX(0, "mout_sw_aclk300_jpeg", mout_sw_aclk300_jpeg_p,

>>                          SRC_TOP12, 20, 1),

>>

>> Adding that flag to a mux clock doesn't look right, it feels we are not

>> addressing the issue properly and the root cause is not known.

>>

>> AFAICS CLK_IS_CRITICAL flag will also not ensure required root clock up

>> in the clk tree is enabled in case any reparenting is done after that mux

>> clock has been registered.  The flag seems misused and the fix looks dubious

>> and fragile to me.

>>

>> I would apply that to fix crashes we are seeing in -next but we ned to come

>> up with a better solution.

>>

>> Perhaps we could add a comment like:

>>

>> "CLK_IS_CRITICAL flag is added to this clock as a workaround for imprecise

>> external abort triggered by the Panfrost driver and will be removed once

>> the issue is properly addressed."

> You should always add a comment to CLK_IS_CRITICAL usage. This sort of

> comment is not comforting though. When will the issue be properly

> addressed? Maybe we should block panfrost from being enabled on this

> platform until it is root caused?


I can add a comment, no problem.

This issue is not related to the panfrost driver at all. Panfrost driver 
simply enables and disables the clock of its HW module. It is nothing 
extraordinary.

The real issue here is that some SoC internal busses (not assigned to 
any driver at all) are sourced from the same MUX, which that patch 
flagged as CRITICAL without any additional gates, thus if the only 
client of that MUX disables its gate clock, the whole path up to the 
root PLL is disabled what causes the HW issue. The driver (or user via 
dts) might change the root PLL, so that CRITICAL flag cannot be moved to 
the top clock in this hierarchy.

I can change that CRITICAL flag to a explicit call to 
clk_prepare_enable() during exynos542x-clk driver probe, but IMHO the 
flag better fits in such case.

I'm not sure if the issue can be properly addressed in a better way. The 
main problem here is that the whole clock topology for Exynos5422 SoCs 
is not fully documented in datasheet (especially the relation of the 
clocks and SoC's power domains) and the driver doesn't also fully model 
it, because other Exynos542x HW module drivers doesn't expect such fine 
grained clocks (mainly for historical compatibility).

I can take a look how to better model some problematic parts, but this 
won't be a simple, single patch. The issue mitigated by this patch 
already prevents v5.5-rc1 from booting with the default 
exynos_defconfig/multi_v7_defconfig, so I would like to merge it as a 
fix for v5.5 first.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Sylwester Nawrocki Dec. 16, 2019, 11:05 a.m. | #5
On 12/13/19 08:11, Marek Szyprowski wrote:
> The real issue here is that some SoC internal busses (not assigned to 

> any driver at all) are sourced from the same MUX, which that patch 

> flagged as CRITICAL without any additional gates, thus if the only 

> client of that MUX disables its gate clock, the whole path up to the 

> root PLL is disabled what causes the HW issue. The driver (or user via 

> dts) might change the root PLL, so that CRITICAL flag cannot be moved to 

> the top clock in this hierarchy.

> 

> I can change that CRITICAL flag to a explicit call to 

> clk_prepare_enable() during exynos542x-clk driver probe, but IMHO the 

> flag better fits in such case.


I would prefer an explicit clk_prepare_enable() call, similarly as it is 
done in drivers/clk/samsung/clk-exynos-audss.c. This would somewhat separate
proper clocks definition from workarounds. The CLK_IS_CRITICAL flag might be 
a bit misleading IMO because the clock for which it is being added now doesn't 
have gating ability. The flag really applies to some root PLL clock which is
behind few other muxes going up in the clk tree.

-- 
Regards
Sylwester

Patch

diff --git a/drivers/clk/samsung/clk-exynos5420.c b/drivers/clk/samsung/clk-exynos5420.c
index 3a991ca1ee36..89126ba66995 100644
--- a/drivers/clk/samsung/clk-exynos5420.c
+++ b/drivers/clk/samsung/clk-exynos5420.c
@@ -712,7 +712,7 @@  static const struct samsung_mux_clock exynos5x_mux_clks[] __initconst = {
 	MUX(0, "mout_sw_aclk266_g2d", mout_sw_aclk266_g2d_p,
 			SRC_TOP12, 12, 1),
 	MUX_F(0, "mout_sw_aclk_g3d", mout_sw_aclk_g3d_p, SRC_TOP12, 16, 1,
-	      CLK_SET_RATE_PARENT, 0),
+	      CLK_IS_CRITICAL | CLK_SET_RATE_PARENT, 0),
 	MUX(0, "mout_sw_aclk300_jpeg", mout_sw_aclk300_jpeg_p,
 			SRC_TOP12, 20, 1),
 	MUX(CLK_MOUT_SW_ACLK300, "mout_sw_aclk300_disp1",