diff mbox series

ACPI: AGDI: Improve error reporting for problems during .remove()

Message ID 20221014160623.467195-1-u.kleine-koenig@pengutronix.de
State Accepted
Commit 858a56630a840aba0e291fd4958e9a8f74638d56
Headers show
Series ACPI: AGDI: Improve error reporting for problems during .remove() | expand

Commit Message

Uwe Kleine-König Oct. 14, 2022, 4:06 p.m. UTC
Returning an error value in a platform driver's remove callback results in
a generic error message being emitted by the driver core, but otherwise it
doesn't make a difference. The device goes away anyhow.

So instead of triggering the generic platform error message, emit a more
helpful message if a problem occurs and return 0 to suppress the generic
message.

This patch is a preparation for making platform remove callbacks return
void.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
Hello,

note that in the situations where the driver returned an error before
and now emits a message, there is a resource leak. Someone who knows
more about this driver and maybe even can test stuff, might want to
address this. This might not only be about non-freed memory, the device
disappears but it is kept in sdei_list and so might be used after being
gone.

Best regards
Uwe

 drivers/acpi/arm64/agdi.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)


base-commit: 4fe89d07dcc2804c8b562f6c7896a45643d34b2f

Comments

James Morse Oct. 26, 2022, 4:09 p.m. UTC | #1
Hi guys,

On 18/10/2022 10:35, Lorenzo Pieralisi wrote:
> On Fri, Oct 14, 2022 at 06:06:23PM +0200, Uwe Kleine-K�nig wrote:
>> Returning an error value in a platform driver's remove callback results in
>> a generic error message being emitted by the driver core, but otherwise it
>> doesn't make a difference. The device goes away anyhow.
>>
>> So instead of triggering the generic platform error message, emit a more
>> helpful message if a problem occurs and return 0 to suppress the generic
>> message.
>>
>> This patch is a preparation for making platform remove callbacks return
>> void.
> 
> If that's the plan - I don't have anything against this patch.
> 
>> Signed-off-by: Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de>
>> ---
>> Hello,
>>
>> note that in the situations where the driver returned an error before
>> and now emits a message, there is a resource leak. Someone who knows
>> more about this driver and maybe even can test stuff, might want to
>> address this. This might not only be about non-freed memory, the device
>> disappears but it is kept in sdei_list and so might be used after being
>> gone.

> I'd need James' input on this. I guess we may ignore
> sdei_event_disable() return value and continue anyway in agdi_remove(),
> whether that's the right thing to do it is a different question.

The unregister stuff is allowed to fail if the event is 'in progress' on another CPU.
Given the handler panic()s the machine, if an event is in progress, the resource leak
isn't something worth worrying about. The real problem is that the handler code may be
free()d while another CPU is still executing it, which is only a problem for modules.

As this thing can't be built as a module, and the handler panic()s the machine, I don't
think there is going to be a problem here.


Thanks,

James


>> diff --git a/drivers/acpi/arm64/agdi.c b/drivers/acpi/arm64/agdi.c
>> index cf31abd0ed1b..f605302395c3 100644
>> --- a/drivers/acpi/arm64/agdi.c
>> +++ b/drivers/acpi/arm64/agdi.c
>> @@ -64,8 +64,11 @@ static int agdi_remove(struct platform_device *pdev)
>>  	int err, i;
>>  
>>  	err = sdei_event_disable(adata->sdei_event);
>> -	if (err)
>> -		return err;
>> +	if (err) {
>> +		dev_err(&pdev->dev, "Failed to disable sdei-event #%d (%pe)\n",
>> +			adata->sdei_event, ERR_PTR(err));
>> +		return 0;
>> +	}
>>  
>>  	for (i = 0; i < 3; i++) {
>>  		err = sdei_event_unregister(adata->sdei_event);
>> @@ -75,7 +78,11 @@ static int agdi_remove(struct platform_device *pdev)
>>  		schedule();
>>  	}
>>  
>> -	return err;
>> +	if (err)
>> +		dev_err(&pdev->dev, "Failed to unregister sdei-event #%d (%pe)\n",
>> +			adata->sdei_event, ERR_PTR(err));
>> +
>> +	return 0;
>>  }
>>  
>>  static struct platform_driver agdi_driver = {
>>
>> base-commit: 4fe89d07dcc2804c8b562f6c7896a45643d34b2f
>> -- 
>> 2.37.2
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Uwe Kleine-König Oct. 26, 2022, 5:23 p.m. UTC | #2
Hello James,

On Wed, Oct 26, 2022 at 05:09:40PM +0100, James Morse wrote:
> Hi guys,
> 
> On 18/10/2022 10:35, Lorenzo Pieralisi wrote:
> > On Fri, Oct 14, 2022 at 06:06:23PM +0200, Uwe Kleine-K�nig wrote:
> >> Returning an error value in a platform driver's remove callback results in
> >> a generic error message being emitted by the driver core, but otherwise it
> >> doesn't make a difference. The device goes away anyhow.
> >>
> >> So instead of triggering the generic platform error message, emit a more
> >> helpful message if a problem occurs and return 0 to suppress the generic
> >> message.
> >>
> >> This patch is a preparation for making platform remove callbacks return
> >> void.
> > 
> > If that's the plan - I don't have anything against this patch.
> > 
> >> Signed-off-by: Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de>
> >> ---
> >> Hello,
> >>
> >> note that in the situations where the driver returned an error before
> >> and now emits a message, there is a resource leak. Someone who knows
> >> more about this driver and maybe even can test stuff, might want to
> >> address this. This might not only be about non-freed memory, the device
> >> disappears but it is kept in sdei_list and so might be used after being
> >> gone.
> 
> > I'd need James' input on this. I guess we may ignore
> > sdei_event_disable() return value and continue anyway in agdi_remove(),
> > whether that's the right thing to do it is a different question.
> 
> The unregister stuff is allowed to fail if the event is 'in progress' on another CPU.
> Given the handler panic()s the machine, if an event is in progress, the resource leak
> isn't something worth worrying about. The real problem is that the handler code may be
> free()d while another CPU is still executing it, which is only a problem for modules.
> 
> As this thing can't be built as a module, and the handler panic()s the machine, I don't
> think there is going to be a problem here.

Is that an Ack?

Best regards
Uwe
Uwe Kleine-König Dec. 19, 2022, 10:18 p.m. UTC | #3
Hello,

On Wed, Oct 26, 2022 at 07:23:35PM +0200, Uwe Kleine-König wrote:
> On Wed, Oct 26, 2022 at 05:09:40PM +0100, James Morse wrote:
> > On 18/10/2022 10:35, Lorenzo Pieralisi wrote:
> > > On Fri, Oct 14, 2022 at 06:06:23PM +0200, Uwe Kleine-K�nig wrote:
> > >> Returning an error value in a platform driver's remove callback results in
> > >> a generic error message being emitted by the driver core, but otherwise it
> > >> doesn't make a difference. The device goes away anyhow.
> > >>
> > >> So instead of triggering the generic platform error message, emit a more
> > >> helpful message if a problem occurs and return 0 to suppress the generic
> > >> message.
> > >>
> > >> This patch is a preparation for making platform remove callbacks return
> > >> void.
> > > 
> > > If that's the plan - I don't have anything against this patch.
> > > 
> > >> Signed-off-by: Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de>
> > >> ---
> > >> Hello,
> > >>
> > >> note that in the situations where the driver returned an error before
> > >> and now emits a message, there is a resource leak. Someone who knows
> > >> more about this driver and maybe even can test stuff, might want to
> > >> address this. This might not only be about non-freed memory, the device
> > >> disappears but it is kept in sdei_list and so might be used after being
> > >> gone.
> > 
> > > I'd need James' input on this. I guess we may ignore
> > > sdei_event_disable() return value and continue anyway in agdi_remove(),
> > > whether that's the right thing to do it is a different question.
> > 
> > The unregister stuff is allowed to fail if the event is 'in progress' on another CPU.
> > Given the handler panic()s the machine, if an event is in progress, the resource leak
> > isn't something worth worrying about. The real problem is that the handler code may be
> > free()d while another CPU is still executing it, which is only a problem for modules.
> > 
> > As this thing can't be built as a module, and the handler panic()s the machine, I don't
> > think there is going to be a problem here.
> 
> Is that an Ack?

This patch wasn't applied anywhere (at least it didn't appear in next
since October). Did it fell through the cracks? Is there anything
missing?

Best regards
Uwe
Uwe Kleine-König Feb. 14, 2023, 4:36 p.m. UTC | #4
Hello,

On Mon, Dec 19, 2022 at 11:18:19PM +0100, Uwe Kleine-König wrote:
> On Wed, Oct 26, 2022 at 07:23:35PM +0200, Uwe Kleine-König wrote:
> > On Wed, Oct 26, 2022 at 05:09:40PM +0100, James Morse wrote:
> > > On 18/10/2022 10:35, Lorenzo Pieralisi wrote:
> > > > On Fri, Oct 14, 2022 at 06:06:23PM +0200, Uwe Kleine-K�nig wrote:
> > > >> Returning an error value in a platform driver's remove callback results in
> > > >> a generic error message being emitted by the driver core, but otherwise it
> > > >> doesn't make a difference. The device goes away anyhow.
> > > >>
> > > >> So instead of triggering the generic platform error message, emit a more
> > > >> helpful message if a problem occurs and return 0 to suppress the generic
> > > >> message.
> > > >>
> > > >> This patch is a preparation for making platform remove callbacks return
> > > >> void.
> > > > 
> > > > If that's the plan - I don't have anything against this patch.
> > > > 
> > > >> Signed-off-by: Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de>
> > > >> ---
> > > >> Hello,
> > > >>
> > > >> note that in the situations where the driver returned an error before
> > > >> and now emits a message, there is a resource leak. Someone who knows
> > > >> more about this driver and maybe even can test stuff, might want to
> > > >> address this. This might not only be about non-freed memory, the device
> > > >> disappears but it is kept in sdei_list and so might be used after being
> > > >> gone.
> > > 
> > > > I'd need James' input on this. I guess we may ignore
> > > > sdei_event_disable() return value and continue anyway in agdi_remove(),
> > > > whether that's the right thing to do it is a different question.
> > > 
> > > The unregister stuff is allowed to fail if the event is 'in progress' on another CPU.
> > > Given the handler panic()s the machine, if an event is in progress, the resource leak
> > > isn't something worth worrying about. The real problem is that the handler code may be
> > > free()d while another CPU is still executing it, which is only a problem for modules.
> > > 
> > > As this thing can't be built as a module, and the handler panic()s the machine, I don't
> > > think there is going to be a problem here.
> > 
> > Is that an Ack?
> 
> This patch wasn't applied anywhere (at least it didn't appear in next
> since October). Did it fell through the cracks? Is there anything
> missing?

gentle ping!

Working on making struct platform_driver::remove() return void, I'd like
to base another patch on top of this one. For that it would be great if
it entered the mainline ...

Thanks for considering,
Uwe
Uwe Kleine-König April 12, 2023, 4:24 p.m. UTC | #5
On Tue, Feb 14, 2023 at 05:36:23PM +0100, Uwe Kleine-König wrote:
> Hello,
> 
> On Mon, Dec 19, 2022 at 11:18:19PM +0100, Uwe Kleine-König wrote:
> > On Wed, Oct 26, 2022 at 07:23:35PM +0200, Uwe Kleine-König wrote:
> > > On Wed, Oct 26, 2022 at 05:09:40PM +0100, James Morse wrote:
> > > > On 18/10/2022 10:35, Lorenzo Pieralisi wrote:
> > > > > On Fri, Oct 14, 2022 at 06:06:23PM +0200, Uwe Kleine-K�nig wrote:
> > > > >> Returning an error value in a platform driver's remove callback results in
> > > > >> a generic error message being emitted by the driver core, but otherwise it
> > > > >> doesn't make a difference. The device goes away anyhow.
> > > > >>
> > > > >> So instead of triggering the generic platform error message, emit a more
> > > > >> helpful message if a problem occurs and return 0 to suppress the generic
> > > > >> message.
> > > > >>
> > > > >> This patch is a preparation for making platform remove callbacks return
> > > > >> void.
> > > > > 
> > > > > If that's the plan - I don't have anything against this patch.
> > > > > 
> > > > >> Signed-off-by: Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de>
> > > > >> ---
> > > > >> Hello,
> > > > >>
> > > > >> note that in the situations where the driver returned an error before
> > > > >> and now emits a message, there is a resource leak. Someone who knows
> > > > >> more about this driver and maybe even can test stuff, might want to
> > > > >> address this. This might not only be about non-freed memory, the device
> > > > >> disappears but it is kept in sdei_list and so might be used after being
> > > > >> gone.
> > > > 
> > > > > I'd need James' input on this. I guess we may ignore
> > > > > sdei_event_disable() return value and continue anyway in agdi_remove(),
> > > > > whether that's the right thing to do it is a different question.
> > > > 
> > > > The unregister stuff is allowed to fail if the event is 'in progress' on another CPU.
> > > > Given the handler panic()s the machine, if an event is in progress, the resource leak
> > > > isn't something worth worrying about. The real problem is that the handler code may be
> > > > free()d while another CPU is still executing it, which is only a problem for modules.
> > > > 
> > > > As this thing can't be built as a module, and the handler panic()s the machine, I don't
> > > > think there is going to be a problem here.
> > > 
> > > Is that an Ack?
> > 
> > This patch wasn't applied anywhere (at least it didn't appear in next
> > since October). Did it fell through the cracks? Is there anything
> > missing?
> 
> gentle ping!
> 
> Working on making struct platform_driver::remove() return void, I'd like
> to base another patch on top of this one. For that it would be great if
> it entered the mainline ...

gentle ping ++

Would it help to resend?

Best regards
Uwe
Lorenzo Pieralisi April 13, 2023, 8:23 a.m. UTC | #6
[+Catalin, Will: ACPI arm64 changes are sent through arm64 tree]

On Wed, Oct 26, 2022 at 05:09:40PM +0100, James Morse wrote:
> Hi guys,
> 
> On 18/10/2022 10:35, Lorenzo Pieralisi wrote:
> > On Fri, Oct 14, 2022 at 06:06:23PM +0200, Uwe Kleine-K�nig wrote:
> >> Returning an error value in a platform driver's remove callback results in
> >> a generic error message being emitted by the driver core, but otherwise it
> >> doesn't make a difference. The device goes away anyhow.
> >>
> >> So instead of triggering the generic platform error message, emit a more
> >> helpful message if a problem occurs and return 0 to suppress the generic
> >> message.
> >>
> >> This patch is a preparation for making platform remove callbacks return
> >> void.
> > 
> > If that's the plan - I don't have anything against this patch.
> > 
> >> Signed-off-by: Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de>
> >> ---
> >> Hello,
> >>
> >> note that in the situations where the driver returned an error before
> >> and now emits a message, there is a resource leak. Someone who knows
> >> more about this driver and maybe even can test stuff, might want to
> >> address this. This might not only be about non-freed memory, the device
> >> disappears but it is kept in sdei_list and so might be used after being
> >> gone.
> 
> > I'd need James' input on this. I guess we may ignore
> > sdei_event_disable() return value and continue anyway in agdi_remove(),
> > whether that's the right thing to do it is a different question.
> 
> The unregister stuff is allowed to fail if the event is 'in progress' on another CPU.
> Given the handler panic()s the machine, if an event is in progress, the resource leak
> isn't something worth worrying about. The real problem is that the handler code may be
> free()d while another CPU is still executing it, which is only a problem for modules.
> 
> As this thing can't be built as a module, and the handler panic()s the machine, I don't
> think there is going to be a problem here.

Thanks James, I think though that's something we may want to handle in a
separate patch.

This one looks fine to merge to me:

Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>

> Thanks,
> 
> James
> 
> 
> >> diff --git a/drivers/acpi/arm64/agdi.c b/drivers/acpi/arm64/agdi.c
> >> index cf31abd0ed1b..f605302395c3 100644
> >> --- a/drivers/acpi/arm64/agdi.c
> >> +++ b/drivers/acpi/arm64/agdi.c
> >> @@ -64,8 +64,11 @@ static int agdi_remove(struct platform_device *pdev)
> >>  	int err, i;
> >>  
> >>  	err = sdei_event_disable(adata->sdei_event);
> >> -	if (err)
> >> -		return err;
> >> +	if (err) {
> >> +		dev_err(&pdev->dev, "Failed to disable sdei-event #%d (%pe)\n",
> >> +			adata->sdei_event, ERR_PTR(err));
> >> +		return 0;
> >> +	}
> >>  
> >>  	for (i = 0; i < 3; i++) {
> >>  		err = sdei_event_unregister(adata->sdei_event);
> >> @@ -75,7 +78,11 @@ static int agdi_remove(struct platform_device *pdev)
> >>  		schedule();
> >>  	}
> >>  
> >> -	return err;
> >> +	if (err)
> >> +		dev_err(&pdev->dev, "Failed to unregister sdei-event #%d (%pe)\n",
> >> +			adata->sdei_event, ERR_PTR(err));
> >> +
> >> +	return 0;
> >>  }
> >>  
> >>  static struct platform_driver agdi_driver = {
> >>
> >> base-commit: 4fe89d07dcc2804c8b562f6c7896a45643d34b2f
> >> -- 
> >> 2.37.2
> >>
> >>
> >> _______________________________________________
> >> linux-arm-kernel mailing list
> >> linux-arm-kernel@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Will Deacon April 13, 2023, 2:48 p.m. UTC | #7
On Thu, Apr 13, 2023 at 10:23:50AM +0200, Lorenzo Pieralisi wrote:
> [+Catalin, Will: ACPI arm64 changes are sent through arm64 tree]
> 
> On Wed, Oct 26, 2022 at 05:09:40PM +0100, James Morse wrote:
> > Hi guys,
> > 
> > On 18/10/2022 10:35, Lorenzo Pieralisi wrote:
> > > On Fri, Oct 14, 2022 at 06:06:23PM +0200, Uwe Kleine-K�nig wrote:
> > >> Returning an error value in a platform driver's remove callback results in
> > >> a generic error message being emitted by the driver core, but otherwise it
> > >> doesn't make a difference. The device goes away anyhow.
> > >>
> > >> So instead of triggering the generic platform error message, emit a more
> > >> helpful message if a problem occurs and return 0 to suppress the generic
> > >> message.
> > >>
> > >> This patch is a preparation for making platform remove callbacks return
> > >> void.
> > > 
> > > If that's the plan - I don't have anything against this patch.
> > > 
> > >> Signed-off-by: Uwe Kleine-K�nig <u.kleine-koenig@pengutronix.de>
> > >> ---
> > >> Hello,
> > >>
> > >> note that in the situations where the driver returned an error before
> > >> and now emits a message, there is a resource leak. Someone who knows
> > >> more about this driver and maybe even can test stuff, might want to
> > >> address this. This might not only be about non-freed memory, the device
> > >> disappears but it is kept in sdei_list and so might be used after being
> > >> gone.
> > 
> > > I'd need James' input on this. I guess we may ignore
> > > sdei_event_disable() return value and continue anyway in agdi_remove(),
> > > whether that's the right thing to do it is a different question.
> > 
> > The unregister stuff is allowed to fail if the event is 'in progress' on another CPU.
> > Given the handler panic()s the machine, if an event is in progress, the resource leak
> > isn't something worth worrying about. The real problem is that the handler code may be
> > free()d while another CPU is still executing it, which is only a problem for modules.
> > 
> > As this thing can't be built as a module, and the handler panic()s the machine, I don't
> > think there is going to be a problem here.
> 
> Thanks James, I think though that's something we may want to handle in a
> separate patch.
> 
> This one looks fine to merge to me:
> 
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>

Cheers, Lorenzo. I'll pick this one up.

Will
Will Deacon April 17, 2023, 3:03 p.m. UTC | #8
On Fri, 14 Oct 2022 18:06:23 +0200, Uwe Kleine-König wrote:
> Returning an error value in a platform driver's remove callback results in
> a generic error message being emitted by the driver core, but otherwise it
> doesn't make a difference. The device goes away anyhow.
> 
> So instead of triggering the generic platform error message, emit a more
> helpful message if a problem occurs and return 0 to suppress the generic
> message.
> 
> [...]

Applied to arm64 (for-next/acpi), thanks!

[1/1] ACPI: AGDI: Improve error reporting for problems during .remove()
      https://git.kernel.org/arm64/c/858a56630a84

Cheers,
diff mbox series

Patch

diff --git a/drivers/acpi/arm64/agdi.c b/drivers/acpi/arm64/agdi.c
index cf31abd0ed1b..f605302395c3 100644
--- a/drivers/acpi/arm64/agdi.c
+++ b/drivers/acpi/arm64/agdi.c
@@ -64,8 +64,11 @@  static int agdi_remove(struct platform_device *pdev)
 	int err, i;
 
 	err = sdei_event_disable(adata->sdei_event);
-	if (err)
-		return err;
+	if (err) {
+		dev_err(&pdev->dev, "Failed to disable sdei-event #%d (%pe)\n",
+			adata->sdei_event, ERR_PTR(err));
+		return 0;
+	}
 
 	for (i = 0; i < 3; i++) {
 		err = sdei_event_unregister(adata->sdei_event);
@@ -75,7 +78,11 @@  static int agdi_remove(struct platform_device *pdev)
 		schedule();
 	}
 
-	return err;
+	if (err)
+		dev_err(&pdev->dev, "Failed to unregister sdei-event #%d (%pe)\n",
+			adata->sdei_event, ERR_PTR(err));
+
+	return 0;
 }
 
 static struct platform_driver agdi_driver = {