Message ID | 20170519230314.15718-1-linus.walleij@linaro.org |
---|---|
State | New |
Headers | show |
Hello, Linus. On Sat, May 20, 2017 at 01:03:14AM +0200, Linus Walleij wrote: > The ATA hosts are allocated using devres with: > host = devres_alloc(ata_host_release, sz, GFP_KERNEL); > However in the ata_host_release() function the host is retrieved > using dev_get_drvdata() which is not what other devres handlers > do, instead we should probably use the passed resource. > > Before this my kernel crashes badly when I fail to start a host > in ata_host_start() and need to bail out, because dev_get_drvdata() > gets the wrong-but-almost-correct pointer (so on some systems it > may by chance be the right pointer what do I know). > > On ARMv4 Gemini it is not: > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 1 at ../lib/refcount.c:184 refcount_sub_and_test+0x9c/0xac > refcount_t: underflow; use-after-free. > CPU: 0 PID: 1 Comm: swapper Not tainted 4.12.0-rc1+ #657 > Hardware name: Gemini (Device Tree) > [<c0010f10>] (unwind_backtrace) from [<c000d8a4>] (show_stack+0x10/0x14) > [<c000d8a4>] (show_stack) from [<c0018720>] (__warn+0xcc/0xf4) > [<c0018720>] (__warn) from [<c0018780>] (warn_slowpath_fmt+0x38/0x48) > [<c0018780>] (warn_slowpath_fmt) from [<c01fffcc>] (refcount_sub_and_test+0x9c/0xac) > [<c01fffcc>] (refcount_sub_and_test) from [<c01e8a5c>] (kobject_put+0x28/0xe0) > [<c01e8a5c>] (kobject_put) from [<c029b294>] (ata_host_release+0xb0/0x144) > [<c029b294>] (ata_host_release) from [<c027326c>] (release_nodes+0x178/0x1fc) > [<c027326c>] (release_nodes) from [<c02707e4>] (driver_probe_device+0xd0/0x2dc) > [<c02707e4>] (driver_probe_device) from [<c0270aac>] (__driver_attach+0xbc/0xc0) > [<c0270aac>] (__driver_attach) from [<c026eeac>] (bus_for_each_dev+0x70/0xa0) > [<c026eeac>] (bus_for_each_dev) from [<c026f824>] (bus_add_driver+0x178/0x200) > [<c026f824>] (bus_add_driver) from [<c0271184>] (driver_register+0x78/0xf8) > [<c0271184>] (driver_register) from [<c05b2d90>] (do_one_initcall+0xac/0x174) > [<c05b2d90>] (do_one_initcall) from [<c05b2f6c>] (kernel_init_freeable+0x114/0x1cc) > [<c05b2f6c>] (kernel_init_freeable) from [<c04beeb4>] (kernel_init+0x8/0xf4) > [<c04beeb4>] (kernel_init) from [<c000a270>] (ret_from_fork+0x14/0x24) > ---[ end trace 0a4570446a019085 ]--- > > Then there is a second (worse) crash when it tries to iterate > to the next port. But it is all because the host pointer is > wrong. This is really weird. The two can't be different, well, at least shouldn't. > In this case, the host should be 0xc7a3f3d0 as it was when it got > allocated but instead what dev_get_drvdata() returns is 0xc7a3f370. > Using the passed resource gives the right pointer. That's 96 bytes of difference, which seems too big for devres_node, especially on 32bit machines. Can you check what gdb says on "print ((struct devres *)0)->data" or "print sizeof(struct devres_node)"? There gotta be something else going on. devres_alloc() returns the data pointer which is the same one which gets passed into the release function. Thanks. -- tejun
Hello, again. On Tue, May 23, 2017 at 05:16:08PM -0400, Tejun Heo wrote: > That's 96 bytes of difference, which seems too big for devres_node, > especially on 32bit machines. Can you check what gdb says on "print > ((struct devres *)0)->data" or "print sizeof(struct devres_node)"? > > There gotta be something else going on. devres_alloc() returns the > data pointer which is the same one which gets passed into the release > function. Also, can you please add throw in some printks and see what's being returned from devres_alloc() and getting set on driver data? This is really weird. I can't think of a reason why they'd differ. Thanks. -- tejun
On Tue, May 23, 2017 at 11:16 PM, Tejun Heo <tj@kernel.org> wrote: >> The ATA hosts are allocated using devres with: >> host = devres_alloc(ata_host_release, sz, GFP_KERNEL); >> However in the ata_host_release() function the host is retrieved >> using dev_get_drvdata() which is not what other devres handlers >> do, instead we should probably use the passed resource. >> >> Before this my kernel crashes badly when I fail to start a host >> in ata_host_start() and need to bail out, because dev_get_drvdata() >> gets the wrong-but-almost-correct pointer (so on some systems it >> may by chance be the right pointer what do I know). >> >> On ARMv4 Gemini it is not: (...) > This is really weird. The two can't be different, well, at least > shouldn't. I found the problem. This is because my driver issues platform_set_drvdata(pdev) on the same struct device * overwriting the data with its own. That function is just an alias for dev_set_drvdata(). Amazingly, libata survives this until release. Maybe we should print a warning if dev_get_drvdata() and res differ? It's a sign that something is wrong because someone screwed with the drvdata behind the back of libata. It appears further that I am in bad company: there are a few drivers in drivers/ata that have broken errorpath because they do exactly this or variants of this. So it turns up in my driver too because of copypaste. Device drivers assume that they "own" drvdata inside the device, but with libata they do not, as shown above. It is used more as a rule than an exception to pass a state container from probe() over to remove(). It is a common pattern to overwrite drvdata, the following drivers now have this bug in one way or another: pata_bf54x.c: platform_set_drvdata(pdev, host); pata_ep93xx.c: platform_set_drvdata(pdev, drv_data); sata_dwc_460ex.c: dev_set_drvdata(&ofdev->dev, host); These drivers: pata_rb532_cf.c: platform_set_drvdata(pdev, ah); pata_samsung_cf.c: platform_set_drvdata(pdev, host); sata_fsl.c: platform_set_drvdata(ofdev, host); They set the ATA host as drvdata, essentially overwriting the drvdata pointer with the same value. I guess I will simply make a cleanup series for these, making sure they use host->private_data instead and do not double-write the drvdata. Yours, Linus Walleij
Hello, Linus. On Tue, May 30, 2017 at 11:21:24AM +0200, Linus Walleij wrote: > This is because my driver issues platform_set_drvdata(pdev) > on the same struct device * overwriting the data with > its own. That function is just an alias for dev_set_drvdata(). I see. > Amazingly, libata survives this until release. That is surprising given that libata does depend on that drvdata quite a bit. > Maybe we should print a warning if dev_get_drvdata() > and res differ? It's a sign that something is wrong because > someone screwed with the drvdata behind the back of > libata. Please feel free to submit a patch to add WARN_ON there. > I guess I will simply make a cleanup series for these, > making sure they use host->private_data instead and do not > double-write the drvdata. Great. Thanks for working on this! -- tejun
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 2d83b8c75965..5487c4a29bc3 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -5921,7 +5921,7 @@ struct ata_port *ata_port_alloc(struct ata_host *host) static void ata_host_release(struct device *gendev, void *res) { - struct ata_host *host = dev_get_drvdata(gendev); + struct ata_host *host = res; int i; for (i = 0; i < host->n_ports; i++) {
The ATA hosts are allocated using devres with: host = devres_alloc(ata_host_release, sz, GFP_KERNEL); However in the ata_host_release() function the host is retrieved using dev_get_drvdata() which is not what other devres handlers do, instead we should probably use the passed resource. Before this my kernel crashes badly when I fail to start a host in ata_host_start() and need to bail out, because dev_get_drvdata() gets the wrong-but-almost-correct pointer (so on some systems it may by chance be the right pointer what do I know). On ARMv4 Gemini it is not: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at ../lib/refcount.c:184 refcount_sub_and_test+0x9c/0xac refcount_t: underflow; use-after-free. CPU: 0 PID: 1 Comm: swapper Not tainted 4.12.0-rc1+ #657 Hardware name: Gemini (Device Tree) [<c0010f10>] (unwind_backtrace) from [<c000d8a4>] (show_stack+0x10/0x14) [<c000d8a4>] (show_stack) from [<c0018720>] (__warn+0xcc/0xf4) [<c0018720>] (__warn) from [<c0018780>] (warn_slowpath_fmt+0x38/0x48) [<c0018780>] (warn_slowpath_fmt) from [<c01fffcc>] (refcount_sub_and_test+0x9c/0xac) [<c01fffcc>] (refcount_sub_and_test) from [<c01e8a5c>] (kobject_put+0x28/0xe0) [<c01e8a5c>] (kobject_put) from [<c029b294>] (ata_host_release+0xb0/0x144) [<c029b294>] (ata_host_release) from [<c027326c>] (release_nodes+0x178/0x1fc) [<c027326c>] (release_nodes) from [<c02707e4>] (driver_probe_device+0xd0/0x2dc) [<c02707e4>] (driver_probe_device) from [<c0270aac>] (__driver_attach+0xbc/0xc0) [<c0270aac>] (__driver_attach) from [<c026eeac>] (bus_for_each_dev+0x70/0xa0) [<c026eeac>] (bus_for_each_dev) from [<c026f824>] (bus_add_driver+0x178/0x200) [<c026f824>] (bus_add_driver) from [<c0271184>] (driver_register+0x78/0xf8) [<c0271184>] (driver_register) from [<c05b2d90>] (do_one_initcall+0xac/0x174) [<c05b2d90>] (do_one_initcall) from [<c05b2f6c>] (kernel_init_freeable+0x114/0x1cc) [<c05b2f6c>] (kernel_init_freeable) from [<c04beeb4>] (kernel_init+0x8/0xf4) [<c04beeb4>] (kernel_init) from [<c000a270>] (ret_from_fork+0x14/0x24) ---[ end trace 0a4570446a019085 ]--- Then there is a second (worse) crash when it tries to iterate to the next port. But it is all because the host pointer is wrong. In this case, the host should be 0xc7a3f3d0 as it was when it got allocated but instead what dev_get_drvdata() returns is 0xc7a3f370. Using the passed resource gives the right pointer. Cc: stable@vger.kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> --- drivers/ata/libata-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- 2.9.4