
libata: Fix devres handling

Message ID 20170519230314.15718-1-linus.walleij@linaro.org
State New

Commit Message

Linus Walleij May 19, 2017, 11:03 p.m. UTC
The ATA hosts are allocated using devres with:
host = devres_alloc(ata_host_release, sz, GFP_KERNEL);
However, in the ata_host_release() function the host is retrieved
using dev_get_drvdata(), which is not what other devres handlers
do; instead we should probably use the passed resource.

Before this my kernel crashes badly when I fail to start a host
in ata_host_start() and need to bail out, because dev_get_drvdata()
gets the wrong-but-almost-correct pointer (so on some systems it
may by chance be the right pointer what do I know).

On ARMv4 Gemini it is not:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ../lib/refcount.c:184 refcount_sub_and_test+0x9c/0xac
refcount_t: underflow; use-after-free.
CPU: 0 PID: 1 Comm: swapper Not tainted 4.12.0-rc1+ #657
Hardware name: Gemini (Device Tree)
[<c0010f10>] (unwind_backtrace) from [<c000d8a4>] (show_stack+0x10/0x14)
[<c000d8a4>] (show_stack) from [<c0018720>] (__warn+0xcc/0xf4)
[<c0018720>] (__warn) from [<c0018780>] (warn_slowpath_fmt+0x38/0x48)
[<c0018780>] (warn_slowpath_fmt) from [<c01fffcc>] (refcount_sub_and_test+0x9c/0xac)
[<c01fffcc>] (refcount_sub_and_test) from [<c01e8a5c>] (kobject_put+0x28/0xe0)
[<c01e8a5c>] (kobject_put) from [<c029b294>] (ata_host_release+0xb0/0x144)
[<c029b294>] (ata_host_release) from [<c027326c>] (release_nodes+0x178/0x1fc)
[<c027326c>] (release_nodes) from [<c02707e4>] (driver_probe_device+0xd0/0x2dc)
[<c02707e4>] (driver_probe_device) from [<c0270aac>] (__driver_attach+0xbc/0xc0)
[<c0270aac>] (__driver_attach) from [<c026eeac>] (bus_for_each_dev+0x70/0xa0)
[<c026eeac>] (bus_for_each_dev) from [<c026f824>] (bus_add_driver+0x178/0x200)
[<c026f824>] (bus_add_driver) from [<c0271184>] (driver_register+0x78/0xf8)
[<c0271184>] (driver_register) from [<c05b2d90>] (do_one_initcall+0xac/0x174)
[<c05b2d90>] (do_one_initcall) from [<c05b2f6c>] (kernel_init_freeable+0x114/0x1cc)
[<c05b2f6c>] (kernel_init_freeable) from [<c04beeb4>] (kernel_init+0x8/0xf4)
[<c04beeb4>] (kernel_init) from [<c000a270>] (ret_from_fork+0x14/0x24)
---[ end trace 0a4570446a019085 ]---

Then there is a second (worse) crash when it tries to iterate
to the next port. But it is all because the host pointer is
wrong.

In this case, the host should be 0xc7a3f3d0 as it was when it got
allocated but instead what dev_get_drvdata() returns is 0xc7a3f370.
Using the passed resource gives the right pointer.

Cc: stable@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

---
 drivers/ata/libata-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.9.4

Comments

Tejun Heo May 23, 2017, 9:16 p.m. UTC | #1
Hello, Linus.

On Sat, May 20, 2017 at 01:03:14AM +0200, Linus Walleij wrote:
> The ATA hosts are allocated using devres with:
> host = devres_alloc(ata_host_release, sz, GFP_KERNEL);
> However in the ata_host_release() function the host is retrieved
> using dev_get_drvdata() which is not what other devres handlers
> do, instead we should probably use the passed resource.
>
> Before this my kernel crashes badly when I fail to start a host
> in ata_host_start() and need to bail out, because dev_get_drvdata()
> gets the wrong-but-almost-correct pointer (so on some systems it
> may by chance be the right pointer what do I know).
>
> On ARMv4 Gemini it is not:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at ../lib/refcount.c:184 refcount_sub_and_test+0x9c/0xac
> refcount_t: underflow; use-after-free.
> CPU: 0 PID: 1 Comm: swapper Not tainted 4.12.0-rc1+ #657
> Hardware name: Gemini (Device Tree)
> [<c0010f10>] (unwind_backtrace) from [<c000d8a4>] (show_stack+0x10/0x14)
> [<c000d8a4>] (show_stack) from [<c0018720>] (__warn+0xcc/0xf4)
> [<c0018720>] (__warn) from [<c0018780>] (warn_slowpath_fmt+0x38/0x48)
> [<c0018780>] (warn_slowpath_fmt) from [<c01fffcc>] (refcount_sub_and_test+0x9c/0xac)
> [<c01fffcc>] (refcount_sub_and_test) from [<c01e8a5c>] (kobject_put+0x28/0xe0)
> [<c01e8a5c>] (kobject_put) from [<c029b294>] (ata_host_release+0xb0/0x144)
> [<c029b294>] (ata_host_release) from [<c027326c>] (release_nodes+0x178/0x1fc)
> [<c027326c>] (release_nodes) from [<c02707e4>] (driver_probe_device+0xd0/0x2dc)
> [<c02707e4>] (driver_probe_device) from [<c0270aac>] (__driver_attach+0xbc/0xc0)
> [<c0270aac>] (__driver_attach) from [<c026eeac>] (bus_for_each_dev+0x70/0xa0)
> [<c026eeac>] (bus_for_each_dev) from [<c026f824>] (bus_add_driver+0x178/0x200)
> [<c026f824>] (bus_add_driver) from [<c0271184>] (driver_register+0x78/0xf8)
> [<c0271184>] (driver_register) from [<c05b2d90>] (do_one_initcall+0xac/0x174)
> [<c05b2d90>] (do_one_initcall) from [<c05b2f6c>] (kernel_init_freeable+0x114/0x1cc)
> [<c05b2f6c>] (kernel_init_freeable) from [<c04beeb4>] (kernel_init+0x8/0xf4)
> [<c04beeb4>] (kernel_init) from [<c000a270>] (ret_from_fork+0x14/0x24)
> ---[ end trace 0a4570446a019085 ]---
>
> Then there is a second (worse) crash when it tries to iterate
> to the next port. But it is all because the host pointer is
> wrong.

This is really weird.  The two can't be different, well, at least
shouldn't.

> In this case, the host should be 0xc7a3f3d0 as it was when it got
> allocated but instead what dev_get_drvdata() returns is 0xc7a3f370.
> Using the passed resource gives the right pointer.

That's 96 bytes of difference, which seems too big for devres_node,
especially on 32bit machines.  Can you check what gdb says on "print
((struct devres *)0)->data" or "print sizeof(struct devres_node)"?

There gotta be something else going on.  devres_alloc() returns the
data pointer which is the same one which gets passed into the release
function.
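
The pointer identity described above can be illustrated with a minimal user-space sketch. This is not the kernel's devres code: `struct model_devres`, `model_devres_alloc()`, `model_devres_release()` and `model_roundtrip_matches()` are hypothetical names invented for the illustration. It assumes only the general shape discussed in this thread: a bookkeeping header followed by the payload, with the allocator returning the payload pointer and the release path handing that same pointer to the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model of a managed resource; not the kernel's devres. */
typedef void (*model_release_t)(void *dev, void *res);

struct model_devres {
	model_release_t release;      /* bookkeeping header */
	unsigned long long data[];    /* payload handed to the caller */
};

/* Allocate header + payload and return a pointer to the payload only. */
void *model_devres_alloc(model_release_t release, size_t size)
{
	struct model_devres *dr = calloc(1, sizeof(*dr) + size);

	if (!dr)
		return NULL;
	dr->release = release;
	return dr->data;
}

/* Step back from the payload to the header, call release, then free. */
void model_devres_release(void *dev, void *res)
{
	struct model_devres *dr = (struct model_devres *)
		((char *)res - offsetof(struct model_devres, data));

	dr->release(dev, res);
	free(dr);
}

static void *expected_res;   /* pointer returned by the allocator */
static int release_matched;  /* 1 if release saw that same pointer */

static void record_release(void *dev, void *res)
{
	(void)dev;
	release_matched = (res == expected_res);
}

/* Returns 1 when the release callback receives the allocator's pointer. */
int model_roundtrip_matches(size_t size)
{
	release_matched = 0;
	expected_res = model_devres_alloc(record_release, size);
	if (!expected_res)
		return 0;
	model_devres_release(NULL, expected_res);
	return release_matched;
}
```

In this model the only way for the release callback to see a different pointer is for the caller to look it up somewhere else, which is the situation the patch is about.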

Thanks.

-- 
tejun
Tejun Heo May 23, 2017, 9:27 p.m. UTC | #2
Hello, again.

On Tue, May 23, 2017 at 05:16:08PM -0400, Tejun Heo wrote:
> That's 96 bytes of difference, which seems too big for devres_node,
> especially on 32bit machines.  Can you check what gdb says on "print
> ((struct devres *)0)->data" or "print sizeof(struct devres_node)"?
>
> There gotta be something else going on.  devres_alloc() returns the
> data pointer which is the same one which gets passed into the release
> function.

Also, can you please throw in some printks and see what's being
returned from devres_alloc() and getting set on driver data?  This is
really weird.  I can't think of a reason why they'd differ.

Thanks.

-- 
tejun
Linus Walleij May 30, 2017, 9:21 a.m. UTC | #3
On Tue, May 23, 2017 at 11:16 PM, Tejun Heo <tj@kernel.org> wrote:

>> The ATA hosts are allocated using devres with:
>> host = devres_alloc(ata_host_release, sz, GFP_KERNEL);
>> However in the ata_host_release() function the host is retrieved
>> using dev_get_drvdata() which is not what other devres handlers
>> do, instead we should probably use the passed resource.
>>
>> Before this my kernel crashes badly when I fail to start a host
>> in ata_host_start() and need to bail out, because dev_get_drvdata()
>> gets the wrong-but-almost-correct pointer (so on some systems it
>> may by chance be the right pointer what do I know).
>>
>> On ARMv4 Gemini it is not:
(...)
> This is really weird.  The two can't be different, well, at least
> shouldn't.

I found the problem.

This is because my driver issues platform_set_drvdata(pdev)
on the same struct device * overwriting the data with
its own. That function is just an alias for dev_set_drvdata().

Amazingly, libata survives this until release.

Maybe we should print a warning if dev_get_drvdata()
and res differ? It's a sign that something is wrong because
someone screwed with the drvdata behind the back of
libata.
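
A minimal sketch of the proposed check, as a user-space model rather than kernel code. `struct model_device`, `model_set_drvdata()`, `model_get_drvdata()` and `model_drvdata_clobbered()` are hypothetical stand-ins invented for this illustration. In the kernel the comparison would presumably sit in ata_host_release() and be wrapped in something like WARN_ON(), but that placement is an assumption, not something this thread settles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device: one drvdata slot per device. */
struct model_device {
	void *driver_data;
};

/* Models dev_set_drvdata(): the last writer wins, silently. */
void model_set_drvdata(struct model_device *dev, void *data)
{
	dev->driver_data = data;
}

/* Models dev_get_drvdata(): returns whatever was stored last. */
void *model_get_drvdata(const struct model_device *dev)
{
	return dev->driver_data;
}

/*
 * The check suggested above: at release time, compare the drvdata slot
 * against the resource pointer passed to the release handler.
 * Returns 1 if someone overwrote drvdata behind the owner's back.
 */
int model_drvdata_clobbered(const struct model_device *dev, const void *res)
{
	return model_get_drvdata(dev) != res;
}
```

If the core stores the host in drvdata and a driver later stores its own state in the same slot, `model_drvdata_clobbered(dev, host)` returns 1; while the slot is untouched it returns 0.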

It appears further that I am in bad company: there are a few
drivers in drivers/ata that have a broken error path because
they do exactly this or variants of this. So it turns up in my
driver too because of copy-paste.

Device drivers assume that they "own" the drvdata inside the
device, but with libata they do not, as shown above.
As a rule rather than an exception, drivers use drvdata to pass
a state container from probe() over to remove().

Overwriting drvdata is a common pattern, and the following
drivers have this bug in one way or another:

pata_bf54x.c: platform_set_drvdata(pdev, host);
pata_ep93xx.c: platform_set_drvdata(pdev, drv_data);
sata_dwc_460ex.c: dev_set_drvdata(&ofdev->dev, host);

These drivers:

pata_rb532_cf.c: platform_set_drvdata(pdev, ah);
pata_samsung_cf.c: platform_set_drvdata(pdev, host);
sata_fsl.c: platform_set_drvdata(ofdev, host);

They set the ATA host as drvdata, essentially overwriting
the drvdata pointer with the same value.

I guess I will simply make a cleanup series for these,
making sure they use host->private_data instead and do not
double-write the drvdata.
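
The cleanup described above can be sketched as follows, again as a user-space model and not actual driver code. `struct model_host`, `struct model_device`, `struct my_driver_state`, `model_probe()` and `model_remove_lookup()` are hypothetical names. The only assumptions taken from the thread are that the host carries a private_data pointer and that drvdata is left pointing at the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ata_host, reduced to what matters here. */
struct model_host {
	void *private_data;   /* slot reserved for the low-level driver */
	int n_ports;
};

/* Hypothetical stand-in for struct device. */
struct model_device {
	void *driver_data;    /* left pointing at the host; driver never writes it */
};

/* Hypothetical per-driver state container. */
struct my_driver_state {
	int irq;
};

/*
 * probe(): the driver parks its state in host->private_data.
 * The drvdata assignment stands in for what the core does when the
 * host is allocated; the driver itself does not touch drvdata.
 */
void model_probe(struct model_device *dev, struct model_host *host,
		 struct my_driver_state *st)
{
	host->private_data = st;
	dev->driver_data = host;
}

/* remove(): go device -> host -> driver state, without a second drvdata write. */
struct my_driver_state *model_remove_lookup(const struct model_device *dev)
{
	struct model_host *host = dev->driver_data;

	return host->private_data;
}
```

With this layout the release path can keep trusting drvdata, and the driver still gets its state container back in remove() through one extra pointer hop.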

Yours,
Linus Walleij
Tejun Heo May 30, 2017, 5:59 p.m. UTC | #4
Hello, Linus.

On Tue, May 30, 2017 at 11:21:24AM +0200, Linus Walleij wrote:
> This is because my driver issues platform_set_drvdata(pdev)
> on the same struct device * overwriting the data with
> its own. That function is just an alias for dev_set_drvdata().

I see.

> Amazingly, libata survives this until release.


That is surprising given that libata does depend on that drvdata quite
a bit.

> Maybe we should print a warning if dev_get_drvdata()
> and res differ? It's a sign that something is wrong because
> someone screwed with the drvdata behind the back of
> libata.

Please feel free to submit a patch to add WARN_ON there.

> I guess I will simply make a cleanup series for these,
> making sure they use host->private_data instead and do not
> double-write the drvdata.

Great.

Thanks for working on this!

-- 
tejun

Patch

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 2d83b8c75965..5487c4a29bc3 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5921,7 +5921,7 @@  struct ata_port *ata_port_alloc(struct ata_host *host)
 
 static void ata_host_release(struct device *gendev, void *res)
 {
-	struct ata_host *host = dev_get_drvdata(gendev);
+	struct ata_host *host = res;
 	int i;
 
 	for (i = 0; i < host->n_ports; i++) {