diff mbox series

[RFT] drm: adv7511/33: fix adv7511_cec_init() failure handling

Message ID 20171115123823.1895515-1-arnd@arndb.de
State New
Headers show
Series [RFT] drm: adv7511/33: fix adv7511_cec_init() failure handling | expand

Commit Message

Arnd Bergmann Nov. 15, 2017, 12:37 p.m. UTC
An otherwise correct cleanup patch from Dan Carpenter turned a broken
failure handling from a feature patch by Hans Verkuil into a kernel
Oops, so bisection points to commit 7af35b0addbc ("drm/kirin: Checking
for IS_ERR() instead of NULL") rather than 3b1b975003e4 ("drm:
adv7511/33: add HDMI CEC support").

I've managed to piece together several partial problems, though
I'm still struggling with the bigger picture:

adv7511_probe() registers a drm_bridge structure that was allocated
with devm_kzalloc(). It calls adv7511_cec_init(), which fails for an
unknown reason, which in turn triggers the registered structure to be
removed.

Elsewhere, kirin_drm_platform_probe() gets called, which calls
of_graph_get_remote_node(), and that returns NULL. Before Dan's
patch we would go on with a NULL pointer here and register that,
now kirin_drm_platform_probe() fails with -ENODEV.

In a third driver, dsi_parse_dt() calls drm_of_find_panel_or_bridge(),
which after not finding a panel goes on to call of_drm_find_bridge(),
and that crashes due to the earlier list corruption.

This addresses the first issue by making sure that adv7511_probe()
does not leave behind any corrupted list entries. This should
get the system back to boot but needs testing. From my understanding,
there is at least one more bug that needs to be resolved to actually
get everything to work again.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Xinliang Liu <xinliang.liu@linaro.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Archit Taneja <architt@codeaurora.org>
Link: https://bugs.linaro.org/show_bug.cgi?id=3345
Link: https://lkft.validation.linaro.org/scheduler/job/48017#L3551
Fixes: 7af35b0addbc ("drm/kirin: Checking for IS_ERR() instead of NULL")
Fixes: 3b1b975003e4 ("drm: adv7511/33: add HDMI CEC support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
Untested so far, this is what I came up with after reading the
WARN_ON log from a modified kernel.

Naresh, can you give this one a go?

Hans and others, can you review in the meantime?

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/gpu/drm/bridge/adv7511/adv7511_drv.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

John Stultz Nov. 16, 2017, 10:20 p.m. UTC | #1
On Thu, Nov 16, 2017 at 1:50 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Wed, Nov 15, 2017 at 4:37 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> An otherwise correct cleanup patch from Dan Carpenter turned a broken
>> failure handling from a feature patch by Hans Verkuil into a kernel
>> Oops, so bisection points to commit 7af35b0addbc ("drm/kirin: Checking
>> for IS_ERR() instead of NULL") rather than 3b1b975003e4 ("drm:
>> adv7511/33: add HDMI CEC support").
>>
>> I've managed to piece together several partial problems, though
>> I'm still struggling with the bigger picture:
>>
>> adv7511_probe() registers a drm_bridge structure that was allocated
>> with devm_kzalloc(). It calls adv7511_cec_init(), which fails for an
>> unknown reason, which in turn triggers the registered structure to be
>> removed.
>>
>> Elsewhere, kirin_drm_platform_probe() gets called, which calls
>> of_graph_get_remote_node(), and that returns NULL. Before Dan's
>> patch we would go on with a NULL pointer here and register that,
>> now kirin_drm_platform_probe() fails with -ENODEV.
>>
>> In a third driver, dsi_parse_dt() calls drm_of_find_panel_or_bridge(),
>> which after not finding a panel goes on to call of_drm_find_bridge(),
>> and that crashes due to the earlier list corruption.
>>
>> This addresses the first issue by making sure that adv7511_probe()
>> does not leave behind any corrupted list entries. This should
>> get the system back to boot but needs testing. From my understanding,
>> there is at least one more bug that needs to be resolved to actually
>> get everything to work again.
>
> So I've started hitting the issue this patch tries to address (now
> that the related code landed in Linus' tree). The only issue is that
> with this fix, I don't see graphics initializing properly, so I
> suspect something is still wrong with the error handling (though what
> exactly I'm not sure).

So this seems to only happen when CONFIG_DRM_I2C_ADV7511_CEC is
enabled. If it is on, I don't get any graphics, but if its disabled
graphics works.

Tying this with Arnd's patch, I'm guessing adv7511_cec_init() is
failing, but it seems like instead of just disabling the CEC feature,
we're failing to load the driver entirely.

Maybe should the logic be something like:
#ifdef CONFIG_DRM_I2C_ADV7511_CEC
        ret = adv7511_cec_init(dev, adv7511, offset);
        if (ret)
#endif
            regmap_write(adv7511->regmap, ADV7511_REG_CEC_CTRL + offset,
                        ADV7511_CEC_CTRL_POWER_DOWN);

?
thanks
-john
John Stultz Nov. 16, 2017, 10:23 p.m. UTC | #2
On Thu, Nov 16, 2017 at 2:20 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Thu, Nov 16, 2017 at 1:50 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On Wed, Nov 15, 2017 at 4:37 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> An otherwise correct cleanup patch from Dan Carpenter turned a broken
>>> failure handling from a feature patch by Hans Verkuil into a kernel
>>> Oops, so bisection points to commit 7af35b0addbc ("drm/kirin: Checking
>>> for IS_ERR() instead of NULL") rather than 3b1b975003e4 ("drm:
>>> adv7511/33: add HDMI CEC support").
>>>
>>> I've managed to piece together several partial problems, though
>>> I'm still struggling with the bigger picture:
>>>
>>> adv7511_probe() registers a drm_bridge structure that was allocated
>>> with devm_kzalloc(). It calls adv7511_cec_init(), which fails for an
>>> unknown reason, which in turn triggers the registered structure to be
>>> removed.
>>>
>>> Elsewhere, kirin_drm_platform_probe() gets called, which calls
>>> of_graph_get_remote_node(), and that returns NULL. Before Dan's
>>> patch we would go on with a NULL pointer here and register that,
>>> now kirin_drm_platform_probe() fails with -ENODEV.
>>>
>>> In a third driver, dsi_parse_dt() calls drm_of_find_panel_or_bridge(),
>>> which after not finding a panel goes on to call of_drm_find_bridge(),
>>> and that crashes due to the earlier list corruption.
>>>
>>> This addresses the first issue by making sure that adv7511_probe()
>>> does not leave behind any corrupted list entries. This should
>>> get the system back to boot but needs testing. From my understanding,
>>> there is at least one more bug that needs to be resolved to actually
>>> get everything to work again.
>>
>> So I've started hitting the issue this patch tries to address (now
>> that the related code landed in Linus' tree). The only issue is that
>> with this fix, I don't see graphics initializing properly, so I
>> suspect something is still wrong with the error handling (though what
>> exactly I'm not sure).
>
> So this seems to only happen when CONFIG_DRM_I2C_ADV7511_CEC is
> enabled. If it is on, I don't get any graphics, but if its disabled
> graphics works.
>
> Tying this with Arnd's patch, I'm guessing adv7511_cec_init() is
> failing, but it seems like instead of just disabling the CEC feature,
> we're failing to load the driver entirely.
>
> Maybe should the logic be something like:
> #ifdef CONFIG_DRM_I2C_ADV7511_CEC
>         ret = adv7511_cec_init(dev, adv7511, offset);
>         if (ret)
> #endif
>             regmap_write(adv7511->regmap, ADV7511_REG_CEC_CTRL + offset,
>                         ADV7511_CEC_CTRL_POWER_DOWN);
>
> ?

I just tested with this, and this approach seems to work for me.

thanks
-john
John Stultz Nov. 20, 2017, 8:13 p.m. UTC | #3
On Fri, Nov 17, 2017 at 12:43 AM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
> index 3a33075dbb22..56eeeea6a1fa 100644
> --- a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
> +++ b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
> @@ -1202,9 +1202,7 @@ static int adv7511_probe(struct i2c_client *i2c, const struct i2c_device_id *id)
>         offset = adv7511->type == ADV7533 ? ADV7533_REG_CEC_OFFSET : 0;
>
>  #ifdef CONFIG_DRM_I2C_ADV7511_CEC
> -       ret = adv7511_cec_init(dev, adv7511, offset);
> -       if (ret)
> -               goto err_unregister_cec;
> +       adv7511_cec_init(dev, adv7511, offset);
>  #else
>         regmap_write(adv7511->regmap, ADV7511_REG_CEC_CTRL + offset,
>                      ADV7511_CEC_CTRL_POWER_DOWN);


One tiny nit-pick I realized I should have made in my patch...

In the !CONFIG_DRM_I2C_ADV7511_CEC, can you just define adv7511_cec_init() as
{
    regmap_write(adv7511->regmap, ADV7511_REG_CEC_CTRL + offset,
ADV7511_CEC_CTRL_POWER_DOWN);
}

Then we can call it either way, and don't need to have the ufly
#ifdefs in the adv7511_probe function?

thanks
-john
Archit Taneja Nov. 30, 2017, 7:02 a.m. UTC | #4
On 11/23/2017 05:52 AM, John Stultz wrote:
> On Tue, Nov 21, 2017 at 12:17 AM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
>> If the device tree for a board did not specify a cec clock, then
>> adv7511_cec_init would return an error, which would cause adv7511_probe()
>> to fail and thus there is no HDMI output.
>>
>> There is no need to have adv7511_probe() fail if the CEC initialization
>> fails, so just change adv7511_cec_init() to a void function. In addition,
>> adv7511_cec_init() should just return silently if the cec clock isn't
>> found and show a message for any other errors.
>>
>> An otherwise correct cleanup patch from Dan Carpenter turned this broken
>> failure handling into a kernel Oops, so bisection points to commit
>> 7af35b0addbc ("drm/kirin: Checking for IS_ERR() instead of NULL") rather
>> than 3b1b975003e4 ("drm: adv7511/33: add HDMI CEC support").
>>
>> Based on earlier patches from Arnd and John.
>>
>> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
>> Cc: Xinliang Liu <xinliang.liu@linaro.org>
>> Cc: Dan Carpenter <dan.carpenter@oracle.com>
>> Cc: Sean Paul <seanpaul@chromium.org>
>> Cc: Archit Taneja <architt@codeaurora.org>
>> Cc: John Stultz <john.stultz@linaro.org>
>> Link: https://bugs.linaro.org/show_bug.cgi?id=3345
>> Link: https://lkft.validation.linaro.org/scheduler/job/48017#L3551
>> Fixes: 7af35b0addbc ("drm/kirin: Checking for IS_ERR() instead of NULL")
>> Fixes: 3b1b975003e4 ("drm: adv7511/33: add HDMI CEC support")
>> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
>> Tested-by: Hans Verkuil <hans.verkuil@cisco.com>
>> ---
>> This rework of Arnd and John's patches goes a bit further and just silently
>> exits if there is no cec clock defined in the dts. I'm sure that's the
>> reason why the kirin board failed on this. BTW: if the kirin board DOES
>> support cec, then it would be nice if it can be hooked up in the dts!
>>
>> Tested with my Dragonboard and Renesas Koelsch board. Also tested what
>> happens when probing is deferred due to missing cec clock.
>>
>> John, can you test this again?
> 
> Sorry I didn't get back to you yesterday on this!
> 
> Seems to be working ok for me!
> 
> Tested-by: John Stultz <john.stultz@linaro.org>

Queued to drm-misc-fixes. Thanks for fixing this.

Archit
diff mbox series

Patch

diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
index 0e14f1572d05..93d1ecafe8fa 100644
--- a/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
+++ b/drivers/gpu/drm/bridge/adv7511/adv7511_drv.c
@@ -1204,7 +1204,7 @@  static int adv7511_probe(struct i2c_client *i2c, const struct i2c_device_id *id)
 #ifdef CONFIG_DRM_I2C_ADV7511_CEC
 	ret = adv7511_cec_init(dev, adv7511, offset);
 	if (ret)
-		goto err_unregister_cec;
+		goto err_unregister_bridge;
 #else
 	regmap_write(adv7511->regmap, ADV7511_REG_CEC_CTRL + offset,
 		     ADV7511_CEC_CTRL_POWER_DOWN);
@@ -1212,6 +1212,11 @@  static int adv7511_probe(struct i2c_client *i2c, const struct i2c_device_id *id)
 
 	return 0;
 
+#ifdef CONFIG_DRM_I2C_ADV7511_CEC
+err_unregister_bridge:
+	adv7511_audio_exit(adv7511);
+	drm_bridge_remove(&adv7511->bridge);
+#endif
 err_unregister_cec:
 	i2c_unregister_device(adv7511->i2c_cec);
 	if (adv7511->cec_clk)