mbox series

[RFC,0/1] of: dynamic: allow freeing of_nodes after the overlay changeset

Message ID 20230511151047.1779841-1-nuno.sa@analog.com
Headers show
Series of: dynamic: allow freeing of_nodes after the overlay changeset | expand

Message

Nuno Sa May 11, 2023, 3:10 p.m. UTC
There are valid cases where we might get in here with an unexpected
refcount. One example are devlinks between suppliers <-> consumers.
When a link is created, it will grab a reference to both the supplier and
the consumer devices (which can, potentially, hold a reference to it's
of_node) and this reference is not synchronously dropped when unbinding the
device from the driver. Instead, a work item is queued (see
devlink_dev_release()). This async nature will make that
__of_changeset_entry_destroy() is reached with a refcount > 1. But,
eventually, all the refcounts are released and of_node_release() is
reached.

All the above will lead to leaks and the following dumps:

[ 1297.035424] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /fpga-axi/dummy_dev
[ 1297.050763] OF: ERROR: memory leak, expected refcount 1 instead of 3, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /regulator-vio
[ 1297.065654] OF: ERROR: memory leak, expected refcount 1 instead of 3, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /regulator-vdd-1-8
[ 1297.080885] OF: ERROR: memory leak, expected refcount 1 instead of 3, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /regulator-vref

 1297.168367] ------------[ cut here ]------------
[ 1297.173000] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:25 kobject_get+0x9c/0xa0
[ 1297.180514] refcount_t: addition on 0; use-after-free.
[ 1297.185661] Modules linked in:
[ 1297.188727] CPU: 0 PID: 15340 Comm: kworker/0:2 Not tainted 6.3.1-dirty #5
[ 1297.195617] Hardware name: Xilinx Zynq Platform
[ 1297.200158] Workqueue: events_long device_link_release_fn
[ 1297.205600]  unwind_backtrace from show_stack+0x10/0x14
[ 1297.210880]  show_stack from dump_stack_lvl+0x40/0x4c
[ 1297.215983]  dump_stack_lvl from __warn+0x78/0x124
[ 1297.220816]  __warn from warn_slowpath_fmt+0x134/0x18c
[ 1297.220857]  warn_slowpath_fmt from kobject_get+0x9c/0xa0
[ 1297.220892]  kobject_get from of_node_get+0x14/0x1c
[ 1297.236279]  of_node_get from of_fwnode_get+0x34/0x40
[ 1297.236324]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
[ 1297.247455]  fwnode_full_name_string from device_node_string+0x5a8/0x750
[ 1297.247488]  device_node_string from pointer+0x3d0/0x67c
[ 1297.259497]  pointer from vsnprintf+0x20c/0x410
[ 1297.264063]  vsnprintf from vprintk_store+0x12c/0x430
[ 1297.269176]  vprintk_store from vprintk_emit+0xe0/0x23c
[ 1297.274454]  vprintk_emit from vprintk_default+0x20/0x28
[ 1297.274503]  vprintk_default from _printk+0x2c/0x58
[ 1297.274534]  _printk from kobject_put+0x8c/0x130
[ 1297.289299]  kobject_put from platform_device_release+0x10/0x3c
[ 1297.295245]  platform_device_release from device_release+0x30/0xa0
[ 1297.301458]  device_release from kobject_put+0x8c/0x130
[ 1297.306718]  kobject_put from device_link_release_fn+0x54/0xa8
[ 1297.312588]  device_link_release_fn from process_one_work+0x1fc/0x4c8
[ 1297.319079]  process_one_work from worker_thread+0x50/0x54c
[ 1297.324684]  worker_thread from kthread+0xd0/0xec
[ 1297.329421]  kthread from ret_from_fork+0x14/0x2c
[ 1297.334153] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
[ 1297.339200] 5fa0:                                     00000000 00000000 00000000 00000000
[ 1297.347386] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1297.355577] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 1297.362192] ---[ end trace 0000000000000000 ]---
[ 1297.366805] ------------[ cut here ]------------
[ 1297.371416] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:28 fwnode_full_name_string+0x8c/0xa0
[ 1297.379957] refcount_t: underflow; use-after-free.
[ 1297.384739] Modules linked in:
[ 1297.387789] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G        W          6.3.1-dirty #5
[ 1297.396136] Hardware name: Xilinx Zynq Platform
[ 1297.400660] Workqueue: events_long device_link_release_fn
[ 1297.406079]  unwind_backtrace from show_stack+0x10/0x14
[ 1297.411319]  show_stack from dump_stack_lvl+0x40/0x4c
[ 1297.416389]  dump_stack_lvl from __warn+0x78/0x124
[ 1297.421206]  __warn from warn_slowpath_fmt+0x134/0x18c
[ 1297.426361]  warn_slowpath_fmt from fwnode_full_name_string+0x8c/0xa0
[ 1297.432820]  fwnode_full_name_string from device_node_string+0x5a8/0x750
[ 1297.439537]  device_node_string from pointer+0x3d0/0x67c
[ 1297.444867]  pointer from vsnprintf+0x20c/0x410
[ 1297.449406]  vsnprintf from vprintk_store+0x12c/0x430
[ 1297.454484]  vprintk_store from vprintk_emit+0xe0/0x23c
[ 1297.459727]  vprintk_emit from vprintk_default+0x20/0x28
[ 1297.465056]  vprintk_default from _printk+0x2c/0x58
[ 1297.469944]  _printk from kobject_put+0x8c/0x130
[ 1297.474570]  kobject_put from platform_device_release+0x10/0x3c
[ 1297.480507]  platform_device_release from device_release+0x30/0xa0
[ 1297.486705]  device_release from kobject_put+0x8c/0x130
[ 1297.491947]  kobject_put from device_link_release_fn+0x54/0xa8
[ 1297.497798]  device_link_release_fn from process_one_work+0x1fc/0x4c8
[ 1297.504256]  process_one_work from worker_thread+0x50/0x54c
[ 1297.509828]  worker_thread from kthread+0xd0/0xec
[ 1297.514550]  kthread from ret_from_fork+0x14/0x2c
[ 1297.519263] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
[ 1297.524308] 5fa0:                                     00000000 00000000 00000000 00000000
[ 1297.532478] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1297.540653] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 1297.547256] ---[ end trace 0000000000000000 ]---
[ 1297.168353] OF: ERROR: memory leak before free overlay changeset,  /regulator-vref
[ 1297.551874] ------------[ cut here ]------------
[ 1297.551880] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:22 kobject_get+0x88/0xa0
[ 1297.551903] refcount_t: saturated; leaking memory.
[ 1297.551908] Modules linked in:
[ 1297.551918] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G        W          6.3.1-dirty #5
[ 1297.551931] Hardware name: Xilinx Zynq Platform
[ 1297.551938] Workqueue: events_long device_link_release_fn
[ 1297.551965]  unwind_backtrace from show_stack+0x10/0x14
[ 1297.551994]  show_stack from dump_stack_lvl+0x40/0x4c
[ 1297.552023]  dump_stack_lvl from __warn+0x78/0x124
[ 1297.552059]  __warn from warn_slowpath_fmt+0x134/0x18c
[ 1297.552092]  warn_slowpath_fmt from kobject_get+0x88/0xa0
[ 1297.552123]  kobject_get from of_node_get+0x14/0x1c
[ 1297.552146]  of_node_get from of_fwnode_get+0x34/0x40
[ 1297.552173]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
[ 1297.552204]  fwnode_full_name_string from device_node_string+0x5a8/0x750
[ 1297.552234]  device_node_string from pointer+0x3d0/0x67c
[ 1297.552262]  pointer from vsnprintf+0x20c/0x410
[ 1297.552289]  vsnprintf from vscnprintf+0x10/0x24
[ 1297.552317]  vscnprintf from printk_sprint+0x18/0xf4
[ 1297.552353]  printk_sprint from vprintk_store+0x2d0/0x430
[ 1297.552388]  vprintk_store from vprintk_emit+0xe0/0x23c
[ 1297.552422]  vprintk_emit from vprintk_default+0x20/0x28
[ 1297.552457]  vprintk_default from _printk+0x2c/0x58
[ 1297.552483]  _printk from kobject_put+0x8c/0x130
[ 1297.552507]  kobject_put from platform_device_release+0x10/0x3c
[ 1297.552539]  platform_device_release from device_release+0x30/0xa0
[ 1297.552571]  device_release from kobject_put+0x8c/0x130
[ 1297.552599]  kobject_put from device_link_release_fn+0x54/0xa8
[ 1297.552630]  device_link_release_fn from process_one_work+0x1fc/0x4c8
[ 1297.552657]  process_one_work from worker_thread+0x50/0x54c
[ 1297.552674]  worker_thread from kthread+0xd0/0xec
[ 1297.552701]  kthread from ret_from_fork+0x14/0x2c
[ 1297.552724] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
[ 1297.552736] 5fa0:                                     00000000 00000000 00000000 00000000
[ 1297.552749] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1297.552760] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 1297.552768] ---[ end trace 0000000000000000 ]---
[ 1297.769464] OF: ERROR: memory leak before free overlay changeset,  /regulator-vdd-1-8
[ 1297.777337] OF: ERROR: memory leak before free overlay changeset,  /fpga-axi/dummy_dev
[ 1297.777360] OF: ERROR: memory leak before free overlay changeset,  /regulator-vio

the above is easily reproducible with a dummy platform device that just
gets some regulators (devlinks are created at that point).

The culprit of the above is actually this line:

https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/of/dynamic.c#L366

This also makes me wonder if we should not have an extra patch just printing
'node->full_name' instead of '%pOF'. As seen, we should not make any assumptions
(like parent's being around :)) at this stage.

To fix the issue, I'm adding a new OVERLAY flag to inform that the changeset
is already gone so if we ever reach of_node_release() we can proceed
normally. Obviously, I'm not really sure about this and that's the whole reason
why I'm bringing this as an RFC. This looks like a nasty corner case and I know
that adding/removing devices dynamically is far from being easy to handle...

Also not sure if the driver core folks should be aware of this...

Lastly, one of workarounding this issue is by manually unbiding the device
from the driver before removing the overlay which would (potentially) give
time for the workers to run.

Nuno Sa (1):
  of: dynamic: allow freeing of_nodes after the overlay changeset

 drivers/of/dynamic.c | 31 +++++++++++++++++++++++++++----
 include/linux/of.h   |  1 +
 2 files changed, 28 insertions(+), 4 deletions(-)

Comments

Frank Rowand May 14, 2023, 9:20 p.m. UTC | #1
On 5/11/23 10:10, Nuno Sa wrote:
> There are valid cases where we might get in here with an unexpected
> refcount. One example are devlinks between suppliers <-> consumers.
> When a link is created, it will grab a reference to both the supplier and
> the consumer devices (which can, potentially, hold a reference to it's
> of_node) and this reference is not synchronously dropped when unbinding the
> device from the driver. Instead, a work item is queued (see
> devlink_dev_release()). This async nature will make that
> __of_changeset_entry_destroy() is reached with a refcount > 1. But,
> eventually, all the refcounts are released and of_node_release() is
> reached.
> 
> All the above will lead to leaks and the following dumps:
> 
> [ 1297.035424] OF: ERROR: memory leak, expected refcount 1 instead of 2, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /fpga-axi/dummy_dev
> [ 1297.050763] OF: ERROR: memory leak, expected refcount 1 instead of 3, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /regulator-vio
> [ 1297.065654] OF: ERROR: memory leak, expected refcount 1 instead of 3, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /regulator-vdd-1-8
> [ 1297.080885] OF: ERROR: memory leak, expected refcount 1 instead of 3, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay node /regulator-vref
> 
>  1297.168367] ------------[ cut here ]------------
> [ 1297.173000] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:25 kobject_get+0x9c/0xa0
> [ 1297.180514] refcount_t: addition on 0; use-after-free.
> [ 1297.185661] Modules linked in:
> [ 1297.188727] CPU: 0 PID: 15340 Comm: kworker/0:2 Not tainted 6.3.1-dirty #5
> [ 1297.195617] Hardware name: Xilinx Zynq Platform
> [ 1297.200158] Workqueue: events_long device_link_release_fn
> [ 1297.205600]  unwind_backtrace from show_stack+0x10/0x14
> [ 1297.210880]  show_stack from dump_stack_lvl+0x40/0x4c
> [ 1297.215983]  dump_stack_lvl from __warn+0x78/0x124
> [ 1297.220816]  __warn from warn_slowpath_fmt+0x134/0x18c
> [ 1297.220857]  warn_slowpath_fmt from kobject_get+0x9c/0xa0
> [ 1297.220892]  kobject_get from of_node_get+0x14/0x1c
> [ 1297.236279]  of_node_get from of_fwnode_get+0x34/0x40
> [ 1297.236324]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
> [ 1297.247455]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> [ 1297.247488]  device_node_string from pointer+0x3d0/0x67c
> [ 1297.259497]  pointer from vsnprintf+0x20c/0x410
> [ 1297.264063]  vsnprintf from vprintk_store+0x12c/0x430
> [ 1297.269176]  vprintk_store from vprintk_emit+0xe0/0x23c
> [ 1297.274454]  vprintk_emit from vprintk_default+0x20/0x28
> [ 1297.274503]  vprintk_default from _printk+0x2c/0x58
> [ 1297.274534]  _printk from kobject_put+0x8c/0x130
> [ 1297.289299]  kobject_put from platform_device_release+0x10/0x3c
> [ 1297.295245]  platform_device_release from device_release+0x30/0xa0
> [ 1297.301458]  device_release from kobject_put+0x8c/0x130
> [ 1297.306718]  kobject_put from device_link_release_fn+0x54/0xa8
> [ 1297.312588]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> [ 1297.319079]  process_one_work from worker_thread+0x50/0x54c
> [ 1297.324684]  worker_thread from kthread+0xd0/0xec
> [ 1297.329421]  kthread from ret_from_fork+0x14/0x2c
> [ 1297.334153] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> [ 1297.339200] 5fa0:                                     00000000 00000000 00000000 00000000
> [ 1297.347386] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 1297.355577] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 1297.362192] ---[ end trace 0000000000000000 ]---
> [ 1297.366805] ------------[ cut here ]------------
> [ 1297.371416] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:28 fwnode_full_name_string+0x8c/0xa0
> [ 1297.379957] refcount_t: underflow; use-after-free.
> [ 1297.384739] Modules linked in:
> [ 1297.387789] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G        W          6.3.1-dirty #5
> [ 1297.396136] Hardware name: Xilinx Zynq Platform
> [ 1297.400660] Workqueue: events_long device_link_release_fn
> [ 1297.406079]  unwind_backtrace from show_stack+0x10/0x14
> [ 1297.411319]  show_stack from dump_stack_lvl+0x40/0x4c
> [ 1297.416389]  dump_stack_lvl from __warn+0x78/0x124
> [ 1297.421206]  __warn from warn_slowpath_fmt+0x134/0x18c
> [ 1297.426361]  warn_slowpath_fmt from fwnode_full_name_string+0x8c/0xa0
> [ 1297.432820]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> [ 1297.439537]  device_node_string from pointer+0x3d0/0x67c
> [ 1297.444867]  pointer from vsnprintf+0x20c/0x410
> [ 1297.449406]  vsnprintf from vprintk_store+0x12c/0x430
> [ 1297.454484]  vprintk_store from vprintk_emit+0xe0/0x23c
> [ 1297.459727]  vprintk_emit from vprintk_default+0x20/0x28
> [ 1297.465056]  vprintk_default from _printk+0x2c/0x58
> [ 1297.469944]  _printk from kobject_put+0x8c/0x130
> [ 1297.474570]  kobject_put from platform_device_release+0x10/0x3c
> [ 1297.480507]  platform_device_release from device_release+0x30/0xa0
> [ 1297.486705]  device_release from kobject_put+0x8c/0x130
> [ 1297.491947]  kobject_put from device_link_release_fn+0x54/0xa8
> [ 1297.497798]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> [ 1297.504256]  process_one_work from worker_thread+0x50/0x54c
> [ 1297.509828]  worker_thread from kthread+0xd0/0xec
> [ 1297.514550]  kthread from ret_from_fork+0x14/0x2c
> [ 1297.519263] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> [ 1297.524308] 5fa0:                                     00000000 00000000 00000000 00000000
> [ 1297.532478] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 1297.540653] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 1297.547256] ---[ end trace 0000000000000000 ]---
> [ 1297.168353] OF: ERROR: memory leak before free overlay changeset,  /regulator-vref
> [ 1297.551874] ------------[ cut here ]------------
> [ 1297.551880] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:22 kobject_get+0x88/0xa0
> [ 1297.551903] refcount_t: saturated; leaking memory.
> [ 1297.551908] Modules linked in:
> [ 1297.551918] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G        W          6.3.1-dirty #5
> [ 1297.551931] Hardware name: Xilinx Zynq Platform
> [ 1297.551938] Workqueue: events_long device_link_release_fn
> [ 1297.551965]  unwind_backtrace from show_stack+0x10/0x14
> [ 1297.551994]  show_stack from dump_stack_lvl+0x40/0x4c
> [ 1297.552023]  dump_stack_lvl from __warn+0x78/0x124
> [ 1297.552059]  __warn from warn_slowpath_fmt+0x134/0x18c
> [ 1297.552092]  warn_slowpath_fmt from kobject_get+0x88/0xa0
> [ 1297.552123]  kobject_get from of_node_get+0x14/0x1c
> [ 1297.552146]  of_node_get from of_fwnode_get+0x34/0x40
> [ 1297.552173]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
> [ 1297.552204]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> [ 1297.552234]  device_node_string from pointer+0x3d0/0x67c
> [ 1297.552262]  pointer from vsnprintf+0x20c/0x410
> [ 1297.552289]  vsnprintf from vscnprintf+0x10/0x24
> [ 1297.552317]  vscnprintf from printk_sprint+0x18/0xf4
> [ 1297.552353]  printk_sprint from vprintk_store+0x2d0/0x430
> [ 1297.552388]  vprintk_store from vprintk_emit+0xe0/0x23c
> [ 1297.552422]  vprintk_emit from vprintk_default+0x20/0x28
> [ 1297.552457]  vprintk_default from _printk+0x2c/0x58
> [ 1297.552483]  _printk from kobject_put+0x8c/0x130
> [ 1297.552507]  kobject_put from platform_device_release+0x10/0x3c
> [ 1297.552539]  platform_device_release from device_release+0x30/0xa0
> [ 1297.552571]  device_release from kobject_put+0x8c/0x130
> [ 1297.552599]  kobject_put from device_link_release_fn+0x54/0xa8
> [ 1297.552630]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> [ 1297.552657]  process_one_work from worker_thread+0x50/0x54c
> [ 1297.552674]  worker_thread from kthread+0xd0/0xec
> [ 1297.552701]  kthread from ret_from_fork+0x14/0x2c
> [ 1297.552724] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> [ 1297.552736] 5fa0:                                     00000000 00000000 00000000 00000000
> [ 1297.552749] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 1297.552760] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [ 1297.552768] ---[ end trace 0000000000000000 ]---
> [ 1297.769464] OF: ERROR: memory leak before free overlay changeset,  /regulator-vdd-1-8
> [ 1297.777337] OF: ERROR: memory leak before free overlay changeset,  /fpga-axi/dummy_dev
> [ 1297.777360] OF: ERROR: memory leak before free overlay changeset,  /regulator-vio
> 
> the above is easily reproducible with a dummy platform device that just
> gets some regulators (devlinks are created at that point).
> 
> The culprit of the above is actually this line:
> 
> https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/of/dynamic.c#L366
> 
> This also makes me wonder if we should not have an extra patch just printing
> 'node->full_name' instead of '%pOF'. As seen, we should not make any assumptions
> (like parent's being around :)) at this stage.
> 
> To fix the issue, I'm adding a new OVERLAY flag to inform that the changeset
> is already gone so if we ever reach of_node_release() we can proceed
> normally. Obviously, I'm not really sure about this and that's the whole reason
> why I'm bringing this as an RFC. This looks like a nasty corner case and I know
> that adding/removing devices dynamically is far from being easy to handle...
> 
> Also not sure if the driver core folks should be aware of this...
> 
> Lastly, one of workarounding this issue is by manually unbiding the device
> from the driver before removing the overlay which would (potentially) give
> time for the workers to run.
> 
> Nuno Sa (1):
>   of: dynamic: allow freeing of_nodes after the overlay changeset
> 
>  drivers/of/dynamic.c | 31 +++++++++++++++++++++++++++----
>  include/linux/of.h   |  1 +
>  2 files changed, 28 insertions(+), 4 deletions(-)
> 

Very nice problem description, thanks.

You have found a real problem of how the devlink implementation did not consider
the impact on overlays.  The overlay removal process does not expect any node
created by an overlay apply to exist at the moment that the overlay removal
completes (to be a little more pedantic, the removal process of deletes nodes and
properties occurs in the exact opposite order that they were created.  Thus even
partly through the overlay removal, a node is deleted before its parent is
deleted.  Also, all of the properties for any node deleted during the removal
are deleted (even if the node refcount does not reach zero).

The RFC patch does not attempt to fix the actual problem, it merely suppresses
reporting the symptom.

-Frank
Nuno Sá May 15, 2023, 6:57 a.m. UTC | #2
On Sun, 2023-05-14 at 16:20 -0500, Frank Rowand wrote:
> On 5/11/23 10:10, Nuno Sa wrote:
> > There are valid cases where we might get in here with an unexpected
> > refcount. One example are devlinks between suppliers <-> consumers.
> > When a link is created, it will grab a reference to both the supplier and
> > the consumer devices (which can, potentially, hold a reference to it's
> > of_node) and this reference is not synchronously dropped when unbinding the
> > device from the driver. Instead, a work item is queued (see
> > devlink_dev_release()). This async nature will make that
> > __of_changeset_entry_destroy() is reached with a refcount > 1. But,
> > eventually, all the refcounts are released and of_node_release() is
> > reached.
> > 
> > All the above will lead to leaks and the following dumps:
> > 
> > [ 1297.035424] OF: ERROR: memory leak, expected refcount 1 instead of 2,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /fpga-axi/dummy_dev
> > [ 1297.050763] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /regulator-vio
> > [ 1297.065654] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /regulator-vdd-1-8
> > [ 1297.080885] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /regulator-vref
> > 
> >  1297.168367] ------------[ cut here ]------------
> > [ 1297.173000] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:25
> > kobject_get+0x9c/0xa0
> > [ 1297.180514] refcount_t: addition on 0; use-after-free.
> > [ 1297.185661] Modules linked in:
> > [ 1297.188727] CPU: 0 PID: 15340 Comm: kworker/0:2 Not tainted 6.3.1-dirty
> > #5
> > [ 1297.195617] Hardware name: Xilinx Zynq Platform
> > [ 1297.200158] Workqueue: events_long device_link_release_fn
> > [ 1297.205600]  unwind_backtrace from show_stack+0x10/0x14
> > [ 1297.210880]  show_stack from dump_stack_lvl+0x40/0x4c
> > [ 1297.215983]  dump_stack_lvl from __warn+0x78/0x124
> > [ 1297.220816]  __warn from warn_slowpath_fmt+0x134/0x18c
> > [ 1297.220857]  warn_slowpath_fmt from kobject_get+0x9c/0xa0
> > [ 1297.220892]  kobject_get from of_node_get+0x14/0x1c
> > [ 1297.236279]  of_node_get from of_fwnode_get+0x34/0x40
> > [ 1297.236324]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
> > [ 1297.247455]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > [ 1297.247488]  device_node_string from pointer+0x3d0/0x67c
> > [ 1297.259497]  pointer from vsnprintf+0x20c/0x410
> > [ 1297.264063]  vsnprintf from vprintk_store+0x12c/0x430
> > [ 1297.269176]  vprintk_store from vprintk_emit+0xe0/0x23c
> > [ 1297.274454]  vprintk_emit from vprintk_default+0x20/0x28
> > [ 1297.274503]  vprintk_default from _printk+0x2c/0x58
> > [ 1297.274534]  _printk from kobject_put+0x8c/0x130
> > [ 1297.289299]  kobject_put from platform_device_release+0x10/0x3c
> > [ 1297.295245]  platform_device_release from device_release+0x30/0xa0
> > [ 1297.301458]  device_release from kobject_put+0x8c/0x130
> > [ 1297.306718]  kobject_put from device_link_release_fn+0x54/0xa8
> > [ 1297.312588]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > [ 1297.319079]  process_one_work from worker_thread+0x50/0x54c
> > [ 1297.324684]  worker_thread from kthread+0xd0/0xec
> > [ 1297.329421]  kthread from ret_from_fork+0x14/0x2c
> > [ 1297.334153] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > [ 1297.339200] 5fa0:                                     00000000 00000000
> > 00000000 00000000
> > [ 1297.347386] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000
> > [ 1297.355577] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 1297.362192] ---[ end trace 0000000000000000 ]---
> > [ 1297.366805] ------------[ cut here ]------------
> > [ 1297.371416] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:28
> > fwnode_full_name_string+0x8c/0xa0
> > [ 1297.379957] refcount_t: underflow; use-after-free.
> > [ 1297.384739] Modules linked in:
> > [ 1297.387789] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G       
> > W          6.3.1-dirty #5
> > [ 1297.396136] Hardware name: Xilinx Zynq Platform
> > [ 1297.400660] Workqueue: events_long device_link_release_fn
> > [ 1297.406079]  unwind_backtrace from show_stack+0x10/0x14
> > [ 1297.411319]  show_stack from dump_stack_lvl+0x40/0x4c
> > [ 1297.416389]  dump_stack_lvl from __warn+0x78/0x124
> > [ 1297.421206]  __warn from warn_slowpath_fmt+0x134/0x18c
> > [ 1297.426361]  warn_slowpath_fmt from fwnode_full_name_string+0x8c/0xa0
> > [ 1297.432820]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > [ 1297.439537]  device_node_string from pointer+0x3d0/0x67c
> > [ 1297.444867]  pointer from vsnprintf+0x20c/0x410
> > [ 1297.449406]  vsnprintf from vprintk_store+0x12c/0x430
> > [ 1297.454484]  vprintk_store from vprintk_emit+0xe0/0x23c
> > [ 1297.459727]  vprintk_emit from vprintk_default+0x20/0x28
> > [ 1297.465056]  vprintk_default from _printk+0x2c/0x58
> > [ 1297.469944]  _printk from kobject_put+0x8c/0x130
> > [ 1297.474570]  kobject_put from platform_device_release+0x10/0x3c
> > [ 1297.480507]  platform_device_release from device_release+0x30/0xa0
> > [ 1297.486705]  device_release from kobject_put+0x8c/0x130
> > [ 1297.491947]  kobject_put from device_link_release_fn+0x54/0xa8
> > [ 1297.497798]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > [ 1297.504256]  process_one_work from worker_thread+0x50/0x54c
> > [ 1297.509828]  worker_thread from kthread+0xd0/0xec
> > [ 1297.514550]  kthread from ret_from_fork+0x14/0x2c
> > [ 1297.519263] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > [ 1297.524308] 5fa0:                                     00000000 00000000
> > 00000000 00000000
> > [ 1297.532478] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000
> > [ 1297.540653] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 1297.547256] ---[ end trace 0000000000000000 ]---
> > [ 1297.168353] OF: ERROR: memory leak before free overlay changeset, 
> > /regulator-vref
> > [ 1297.551874] ------------[ cut here ]------------
> > [ 1297.551880] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:22
> > kobject_get+0x88/0xa0
> > [ 1297.551903] refcount_t: saturated; leaking memory.
> > [ 1297.551908] Modules linked in:
> > [ 1297.551918] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G       
> > W          6.3.1-dirty #5
> > [ 1297.551931] Hardware name: Xilinx Zynq Platform
> > [ 1297.551938] Workqueue: events_long device_link_release_fn
> > [ 1297.551965]  unwind_backtrace from show_stack+0x10/0x14
> > [ 1297.551994]  show_stack from dump_stack_lvl+0x40/0x4c
> > [ 1297.552023]  dump_stack_lvl from __warn+0x78/0x124
> > [ 1297.552059]  __warn from warn_slowpath_fmt+0x134/0x18c
> > [ 1297.552092]  warn_slowpath_fmt from kobject_get+0x88/0xa0
> > [ 1297.552123]  kobject_get from of_node_get+0x14/0x1c
> > [ 1297.552146]  of_node_get from of_fwnode_get+0x34/0x40
> > [ 1297.552173]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
> > [ 1297.552204]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > [ 1297.552234]  device_node_string from pointer+0x3d0/0x67c
> > [ 1297.552262]  pointer from vsnprintf+0x20c/0x410
> > [ 1297.552289]  vsnprintf from vscnprintf+0x10/0x24
> > [ 1297.552317]  vscnprintf from printk_sprint+0x18/0xf4
> > [ 1297.552353]  printk_sprint from vprintk_store+0x2d0/0x430
> > [ 1297.552388]  vprintk_store from vprintk_emit+0xe0/0x23c
> > [ 1297.552422]  vprintk_emit from vprintk_default+0x20/0x28
> > [ 1297.552457]  vprintk_default from _printk+0x2c/0x58
> > [ 1297.552483]  _printk from kobject_put+0x8c/0x130
> > [ 1297.552507]  kobject_put from platform_device_release+0x10/0x3c
> > [ 1297.552539]  platform_device_release from device_release+0x30/0xa0
> > [ 1297.552571]  device_release from kobject_put+0x8c/0x130
> > [ 1297.552599]  kobject_put from device_link_release_fn+0x54/0xa8
> > [ 1297.552630]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > [ 1297.552657]  process_one_work from worker_thread+0x50/0x54c
> > [ 1297.552674]  worker_thread from kthread+0xd0/0xec
> > [ 1297.552701]  kthread from ret_from_fork+0x14/0x2c
> > [ 1297.552724] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > [ 1297.552736] 5fa0:                                     00000000 00000000
> > 00000000 00000000
> > [ 1297.552749] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000
> > [ 1297.552760] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 1297.552768] ---[ end trace 0000000000000000 ]---
> > [ 1297.769464] OF: ERROR: memory leak before free overlay changeset, 
> > /regulator-vdd-1-8
> > [ 1297.777337] OF: ERROR: memory leak before free overlay changeset,  /fpga-
> > axi/dummy_dev
> > [ 1297.777360] OF: ERROR: memory leak before free overlay changeset, 
> > /regulator-vio
> > 
> > the above is easily reproducible with a dummy platform device that just
> > gets some regulators (devlinks are created at that point).
> > 
> > The culprit of the above is actually this line:
> > 
> > https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/of/dynamic.c#L366
> > 
> > This also makes me wonder if we should not have an extra patch just printing
> > 'node->full_name' instead of '%pOF'. As seen, we should not make any
> > assumptions
> > (like parent's being around :)) at this stage.
> > 
> > To fix the issue, I'm adding a new OVERLAY flag to inform that the changeset
> > is already gone so if we ever reach of_node_release() we can proceed
> > normally. Obviously, I'm not really sure about this and that's the whole
> > reason
> > why I'm bringing this as an RFC. This looks like a nasty corner case and I
> > know
> > that adding/removing devices dynamically is far from being easy to handle...
> > 
> > Also not sure if the driver core folks should be aware of this...
> > 
> > Lastly, one of workarounding this issue is by manually unbiding the device
> > from the driver before removing the overlay which would (potentially) give
> > time for the workers to run.
> > 
> > Nuno Sa (1):
> >   of: dynamic: allow freeing of_nodes after the overlay changeset
> > 
> >  drivers/of/dynamic.c | 31 +++++++++++++++++++++++++++----
> >  include/linux/of.h   |  1 +
> >  2 files changed, 28 insertions(+), 4 deletions(-)
> > 
> 
> Very nice problem description, thanks.
> 
> You have found a real problem of how the devlink implementation did not
> consider
> the impact on overlays.  The overlay removal process does not expect any node
> created by an overlay apply to exist at the moment that the overlay removal
> completes (to be a little more pedantic, the removal process of deletes nodes
> and
> properties occurs in the exact opposite order that they were created.  Thus
> even
> partly through the overlay removal, a node is deleted before its parent is
> deleted.  Also, all of the properties for any node deleted during the removal
> are deleted (even if the node refcount does not reach zero).
> 
> The RFC patch does not attempt to fix the actual problem, it merely suppresses
> reporting the symptom.
> 
> -Frank
> 

Hi Frank,

Thanks for taking a look into this. I was afraid that this was to
straightforward indeed. So I guess you're saying this is something that needs to
be fixed at the devlink level?

+cc Greg and Rafael for some help and advice :)

So, I'd very much try to give this a proper fix but this is messing with very
core components so any help would be welcome. The only think that comes to my
mind is to have some kind of syncing between the moment (or even before) the
work is queued and the moment the node is deleted. I guess this would have to
happen at the kobject level...

Another piece of information is that this is an issue for platform devices and
pretty much for any device that releases the node during it's .dev.release
callback (which BTW, sound correct to me). For example, spi and i2c devices are
releasing the node during the driver - device unbind and not when the device
gets released. This actually gives time for things to settle before reaching the
point of deleting the node (not saying this correct or should be the solution
for this case).

- Nuno Sá
Nuno Sá June 13, 2023, 8:17 a.m. UTC | #3
On Mon, 2023-05-15 at 08:57 +0200, Nuno Sá wrote:
> On Sun, 2023-05-14 at 16:20 -0500, Frank Rowand wrote:
> > On 5/11/23 10:10, Nuno Sa wrote:
> > > There are valid cases where we might get in here with an unexpected
> > > refcount. One example are devlinks between suppliers <-> consumers.
> > > When a link is created, it will grab a reference to both the supplier and
> > > the consumer devices (which can, potentially, hold a reference to it's
> > > of_node) and this reference is not synchronously dropped when unbinding the
> > > device from the driver. Instead, a work item is queued (see
> > > devlink_dev_release()). This async nature will make that
> > > __of_changeset_entry_destroy() is reached with a refcount > 1. But,
> > > eventually, all the refcounts are released and of_node_release() is
> > > reached.
> > > 
> > > All the above will lead to leaks and the following dumps:
> > > 
> > > [ 1297.035424] OF: ERROR: memory leak, expected refcount 1 instead of 2,
> > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > > node /fpga-axi/dummy_dev
> > > [ 1297.050763] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > > node /regulator-vio
> > > [ 1297.065654] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > > node /regulator-vdd-1-8
> > > [ 1297.080885] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > > node /regulator-vref
> > > 
> > >  1297.168367] ------------[ cut here ]------------
> > > [ 1297.173000] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:25
> > > kobject_get+0x9c/0xa0
> > > [ 1297.180514] refcount_t: addition on 0; use-after-free.
> > > [ 1297.185661] Modules linked in:
> > > [ 1297.188727] CPU: 0 PID: 15340 Comm: kworker/0:2 Not tainted 6.3.1-dirty
> > > #5
> > > [ 1297.195617] Hardware name: Xilinx Zynq Platform
> > > [ 1297.200158] Workqueue: events_long device_link_release_fn
> > > [ 1297.205600]  unwind_backtrace from show_stack+0x10/0x14
> > > [ 1297.210880]  show_stack from dump_stack_lvl+0x40/0x4c
> > > [ 1297.215983]  dump_stack_lvl from __warn+0x78/0x124
> > > [ 1297.220816]  __warn from warn_slowpath_fmt+0x134/0x18c
> > > [ 1297.220857]  warn_slowpath_fmt from kobject_get+0x9c/0xa0
> > > [ 1297.220892]  kobject_get from of_node_get+0x14/0x1c
> > > [ 1297.236279]  of_node_get from of_fwnode_get+0x34/0x40
> > > [ 1297.236324]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
> > > [ 1297.247455]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > > [ 1297.247488]  device_node_string from pointer+0x3d0/0x67c
> > > [ 1297.259497]  pointer from vsnprintf+0x20c/0x410
> > > [ 1297.264063]  vsnprintf from vprintk_store+0x12c/0x430
> > > [ 1297.269176]  vprintk_store from vprintk_emit+0xe0/0x23c
> > > [ 1297.274454]  vprintk_emit from vprintk_default+0x20/0x28
> > > [ 1297.274503]  vprintk_default from _printk+0x2c/0x58
> > > [ 1297.274534]  _printk from kobject_put+0x8c/0x130
> > > [ 1297.289299]  kobject_put from platform_device_release+0x10/0x3c
> > > [ 1297.295245]  platform_device_release from device_release+0x30/0xa0
> > > [ 1297.301458]  device_release from kobject_put+0x8c/0x130
> > > [ 1297.306718]  kobject_put from device_link_release_fn+0x54/0xa8
> > > [ 1297.312588]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > > [ 1297.319079]  process_one_work from worker_thread+0x50/0x54c
> > > [ 1297.324684]  worker_thread from kthread+0xd0/0xec
> > > [ 1297.329421]  kthread from ret_from_fork+0x14/0x2c
> > > [ 1297.334153] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > > [ 1297.339200] 5fa0:                                     00000000 00000000
> > > 00000000 00000000
> > > [ 1297.347386] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000
> > > [ 1297.355577] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > > [ 1297.362192] ---[ end trace 0000000000000000 ]---
> > > [ 1297.366805] ------------[ cut here ]------------
> > > [ 1297.371416] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:28
> > > fwnode_full_name_string+0x8c/0xa0
> > > [ 1297.379957] refcount_t: underflow; use-after-free.
> > > [ 1297.384739] Modules linked in:
> > > [ 1297.387789] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G       
> > > W          6.3.1-dirty #5
> > > [ 1297.396136] Hardware name: Xilinx Zynq Platform
> > > [ 1297.400660] Workqueue: events_long device_link_release_fn
> > > [ 1297.406079]  unwind_backtrace from show_stack+0x10/0x14
> > > [ 1297.411319]  show_stack from dump_stack_lvl+0x40/0x4c
> > > [ 1297.416389]  dump_stack_lvl from __warn+0x78/0x124
> > > [ 1297.421206]  __warn from warn_slowpath_fmt+0x134/0x18c
> > > [ 1297.426361]  warn_slowpath_fmt from fwnode_full_name_string+0x8c/0xa0
> > > [ 1297.432820]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > > [ 1297.439537]  device_node_string from pointer+0x3d0/0x67c
> > > [ 1297.444867]  pointer from vsnprintf+0x20c/0x410
> > > [ 1297.449406]  vsnprintf from vprintk_store+0x12c/0x430
> > > [ 1297.454484]  vprintk_store from vprintk_emit+0xe0/0x23c
> > > [ 1297.459727]  vprintk_emit from vprintk_default+0x20/0x28
> > > [ 1297.465056]  vprintk_default from _printk+0x2c/0x58
> > > [ 1297.469944]  _printk from kobject_put+0x8c/0x130
> > > [ 1297.474570]  kobject_put from platform_device_release+0x10/0x3c
> > > [ 1297.480507]  platform_device_release from device_release+0x30/0xa0
> > > [ 1297.486705]  device_release from kobject_put+0x8c/0x130
> > > [ 1297.491947]  kobject_put from device_link_release_fn+0x54/0xa8
> > > [ 1297.497798]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > > [ 1297.504256]  process_one_work from worker_thread+0x50/0x54c
> > > [ 1297.509828]  worker_thread from kthread+0xd0/0xec
> > > [ 1297.514550]  kthread from ret_from_fork+0x14/0x2c
> > > [ 1297.519263] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > > [ 1297.524308] 5fa0:                                     00000000 00000000
> > > 00000000 00000000
> > > [ 1297.532478] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000
> > > [ 1297.540653] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > > [ 1297.547256] ---[ end trace 0000000000000000 ]---
> > > [ 1297.168353] OF: ERROR: memory leak before free overlay changeset, 
> > > /regulator-vref
> > > [ 1297.551874] ------------[ cut here ]------------
> > > [ 1297.551880] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:22
> > > kobject_get+0x88/0xa0
> > > [ 1297.551903] refcount_t: saturated; leaking memory.
> > > [ 1297.551908] Modules linked in:
> > > [ 1297.551918] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G       
> > > W          6.3.1-dirty #5
> > > [ 1297.551931] Hardware name: Xilinx Zynq Platform
> > > [ 1297.551938] Workqueue: events_long device_link_release_fn
> > > [ 1297.551965]  unwind_backtrace from show_stack+0x10/0x14
> > > [ 1297.551994]  show_stack from dump_stack_lvl+0x40/0x4c
> > > [ 1297.552023]  dump_stack_lvl from __warn+0x78/0x124
> > > [ 1297.552059]  __warn from warn_slowpath_fmt+0x134/0x18c
> > > [ 1297.552092]  warn_slowpath_fmt from kobject_get+0x88/0xa0
> > > [ 1297.552123]  kobject_get from of_node_get+0x14/0x1c
> > > [ 1297.552146]  of_node_get from of_fwnode_get+0x34/0x40
> > > [ 1297.552173]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
> > > [ 1297.552204]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > > [ 1297.552234]  device_node_string from pointer+0x3d0/0x67c
> > > [ 1297.552262]  pointer from vsnprintf+0x20c/0x410
> > > [ 1297.552289]  vsnprintf from vscnprintf+0x10/0x24
> > > [ 1297.552317]  vscnprintf from printk_sprint+0x18/0xf4
> > > [ 1297.552353]  printk_sprint from vprintk_store+0x2d0/0x430
> > > [ 1297.552388]  vprintk_store from vprintk_emit+0xe0/0x23c
> > > [ 1297.552422]  vprintk_emit from vprintk_default+0x20/0x28
> > > [ 1297.552457]  vprintk_default from _printk+0x2c/0x58
> > > [ 1297.552483]  _printk from kobject_put+0x8c/0x130
> > > [ 1297.552507]  kobject_put from platform_device_release+0x10/0x3c
> > > [ 1297.552539]  platform_device_release from device_release+0x30/0xa0
> > > [ 1297.552571]  device_release from kobject_put+0x8c/0x130
> > > [ 1297.552599]  kobject_put from device_link_release_fn+0x54/0xa8
> > > [ 1297.552630]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > > [ 1297.552657]  process_one_work from worker_thread+0x50/0x54c
> > > [ 1297.552674]  worker_thread from kthread+0xd0/0xec
> > > [ 1297.552701]  kthread from ret_from_fork+0x14/0x2c
> > > [ 1297.552724] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > > [ 1297.552736] 5fa0:                                     00000000 00000000
> > > 00000000 00000000
> > > [ 1297.552749] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000
> > > [ 1297.552760] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > > [ 1297.552768] ---[ end trace 0000000000000000 ]---
> > > [ 1297.769464] OF: ERROR: memory leak before free overlay changeset, 
> > > /regulator-vdd-1-8
> > > [ 1297.777337] OF: ERROR: memory leak before free overlay changeset,  /fpga-
> > > axi/dummy_dev
> > > [ 1297.777360] OF: ERROR: memory leak before free overlay changeset, 
> > > /regulator-vio
> > > 
> > > the above is easily reproducible with a dummy platform device that just
> > > gets some regulators (devlinks are created at that point).
> > > 
> > > The culprit of the above is actually this line:
> > > 
> > > https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/of/dynamic.c#L366
> > > 
> > > This also makes me wonder if we should not have an extra patch just printing
> > > 'node->full_name' instead of '%pOF'. As seen, we should not make any
> > > assumptions
> > > (like parent's being around :)) at this stage.
> > > 
> > > To fix the issue, I'm adding a new OVERLAY flag to inform that the changeset
> > > is already gone so if we ever reach of_node_release() we can proceed
> > > normally. Obviously, I'm not really sure about this and that's the whole
> > > reason
> > > why I'm bringing this as an RFC. This looks like a nasty corner case and I
> > > know
> > > that adding/removing devices dynamically is far from being easy to handle...
> > > 
> > > Also not sure if the driver core folks should be aware of this...
> > > 
> > > Lastly, one of workarounding this issue is by manually unbiding the device
> > > from the driver before removing the overlay which would (potentially) give
> > > time for the workers to run.
> > > 
> > > Nuno Sa (1):
> > >   of: dynamic: allow freeing of_nodes after the overlay changeset
> > > 
> > >  drivers/of/dynamic.c | 31 +++++++++++++++++++++++++++----
> > >  include/linux/of.h   |  1 +
> > >  2 files changed, 28 insertions(+), 4 deletions(-)
> > > 
> > 
> > Very nice problem description, thanks.
> > 
> > You have found a real problem of how the devlink implementation did not
> > consider
> > the impact on overlays.  The overlay removal process does not expect any node
> > created by an overlay apply to exist at the moment that the overlay removal
> > completes (to be a little more pedantic, the removal process of deletes nodes
> > and
> > properties occurs in the exact opposite order that they were created.  Thus
> > even
> > partly through the overlay removal, a node is deleted before its parent is
> > deleted.  Also, all of the properties for any node deleted during the removal
> > are deleted (even if the node refcount does not reach zero).
> > 
> > The RFC patch does not attempt to fix the actual problem, it merely suppresses
> > reporting the symptom.
> > 
> > -Frank
> > 
> 
> Hi Frank,
> 
> Thanks for taking a look into this. I was afraid that this was to
> straightforward indeed. So I guess you're saying this is something that needs to
> be fixed at the devlink level?
> 
> +cc Greg and Rafael for some help and advice :)
> 
> So, I'd very much try to give this a proper fix but this is messing with very
> core components so any help would be welcome. The only think that comes to my
> mind is to have some kind of syncing between the moment (or even before) the
> work is queued and the moment the node is deleted. I guess this would have to
> happen at the kobject level...
> 
> Another piece of information is that this is an issue for platform devices and
> pretty much for any device that releases the node during it's .dev.release
> callback (which BTW, sound correct to me). For example, spi and i2c devices are
> releasing the node during the driver - device unbind and not when the device
> gets released. This actually gives time for things to settle before reaching the
> point of deleting the node (not saying this correct or should be the solution
> for this case).
> 
> - Nuno Sá

Pinging this one just to understand if this is something that we wanna fix at the
driver core level...

I'll wait a bit more to see if anyone has some comments that might already help me in
going in the right direction.


- Nuno Sá
Nuno Sá July 7, 2023, 1:58 p.m. UTC | #4
Hi Frank,

Sorry to bother again :)... See below

On Sun, 2023-05-14 at 16:20 -0500, Frank Rowand wrote:
> On 5/11/23 10:10, Nuno Sa wrote:
> > There are valid cases where we might get in here with an unexpected
> > refcount. One example are devlinks between suppliers <-> consumers.
> > When a link is created, it will grab a reference to both the supplier and
> > the consumer devices (which can, potentially, hold a reference to it's
> > of_node) and this reference is not synchronously dropped when unbinding the
> > device from the driver. Instead, a work item is queued (see
> > devlink_dev_release()). This async nature will make that
> > __of_changeset_entry_destroy() is reached with a refcount > 1. But,
> > eventually, all the refcounts are released and of_node_release() is
> > reached.
> > 
> > All the above will lead to leaks and the following dumps:
> > 
> > [ 1297.035424] OF: ERROR: memory leak, expected refcount 1 instead of 2,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /fpga-axi/dummy_dev
> > [ 1297.050763] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /regulator-vio
> > [ 1297.065654] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /regulator-vdd-1-8
> > [ 1297.080885] OF: ERROR: memory leak, expected refcount 1 instead of 3,
> > of_node_get()/of_node_put() unbalanced - destroy cset entry: attach overlay
> > node /regulator-vref
> > 
> >  1297.168367] ------------[ cut here ]------------
> > [ 1297.173000] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:25
> > kobject_get+0x9c/0xa0
> > [ 1297.180514] refcount_t: addition on 0; use-after-free.
> > [ 1297.185661] Modules linked in:
> > [ 1297.188727] CPU: 0 PID: 15340 Comm: kworker/0:2 Not tainted 6.3.1-dirty
> > #5
> > [ 1297.195617] Hardware name: Xilinx Zynq Platform
> > [ 1297.200158] Workqueue: events_long device_link_release_fn
> > [ 1297.205600]  unwind_backtrace from show_stack+0x10/0x14
> > [ 1297.210880]  show_stack from dump_stack_lvl+0x40/0x4c
> > [ 1297.215983]  dump_stack_lvl from __warn+0x78/0x124
> > [ 1297.220816]  __warn from warn_slowpath_fmt+0x134/0x18c
> > [ 1297.220857]  warn_slowpath_fmt from kobject_get+0x9c/0xa0
> > [ 1297.220892]  kobject_get from of_node_get+0x14/0x1c
> > [ 1297.236279]  of_node_get from of_fwnode_get+0x34/0x40
> > [ 1297.236324]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
> > [ 1297.247455]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > [ 1297.247488]  device_node_string from pointer+0x3d0/0x67c
> > [ 1297.259497]  pointer from vsnprintf+0x20c/0x410
> > [ 1297.264063]  vsnprintf from vprintk_store+0x12c/0x430
> > [ 1297.269176]  vprintk_store from vprintk_emit+0xe0/0x23c
> > [ 1297.274454]  vprintk_emit from vprintk_default+0x20/0x28
> > [ 1297.274503]  vprintk_default from _printk+0x2c/0x58
> > [ 1297.274534]  _printk from kobject_put+0x8c/0x130
> > [ 1297.289299]  kobject_put from platform_device_release+0x10/0x3c
> > [ 1297.295245]  platform_device_release from device_release+0x30/0xa0
> > [ 1297.301458]  device_release from kobject_put+0x8c/0x130
> > [ 1297.306718]  kobject_put from device_link_release_fn+0x54/0xa8
> > [ 1297.312588]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > [ 1297.319079]  process_one_work from worker_thread+0x50/0x54c
> > [ 1297.324684]  worker_thread from kthread+0xd0/0xec
> > [ 1297.329421]  kthread from ret_from_fork+0x14/0x2c
> > [ 1297.334153] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > [ 1297.339200] 5fa0:                                     00000000 00000000
> > 00000000 00000000
> > [ 1297.347386] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000
> > [ 1297.355577] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 1297.362192] ---[ end trace 0000000000000000 ]---
> > [ 1297.366805] ------------[ cut here ]------------
> > [ 1297.371416] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:28
> > fwnode_full_name_string+0x8c/0xa0
> > [ 1297.379957] refcount_t: underflow; use-after-free.
> > [ 1297.384739] Modules linked in:
> > [ 1297.387789] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G       
> > W          6.3.1-dirty #5
> > [ 1297.396136] Hardware name: Xilinx Zynq Platform
> > [ 1297.400660] Workqueue: events_long device_link_release_fn
> > [ 1297.406079]  unwind_backtrace from show_stack+0x10/0x14
> > [ 1297.411319]  show_stack from dump_stack_lvl+0x40/0x4c
> > [ 1297.416389]  dump_stack_lvl from __warn+0x78/0x124
> > [ 1297.421206]  __warn from warn_slowpath_fmt+0x134/0x18c
> > [ 1297.426361]  warn_slowpath_fmt from fwnode_full_name_string+0x8c/0xa0
> > [ 1297.432820]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > [ 1297.439537]  device_node_string from pointer+0x3d0/0x67c
> > [ 1297.444867]  pointer from vsnprintf+0x20c/0x410
> > [ 1297.449406]  vsnprintf from vprintk_store+0x12c/0x430
> > [ 1297.454484]  vprintk_store from vprintk_emit+0xe0/0x23c
> > [ 1297.459727]  vprintk_emit from vprintk_default+0x20/0x28
> > [ 1297.465056]  vprintk_default from _printk+0x2c/0x58
> > [ 1297.469944]  _printk from kobject_put+0x8c/0x130
> > [ 1297.474570]  kobject_put from platform_device_release+0x10/0x3c
> > [ 1297.480507]  platform_device_release from device_release+0x30/0xa0
> > [ 1297.486705]  device_release from kobject_put+0x8c/0x130
> > [ 1297.491947]  kobject_put from device_link_release_fn+0x54/0xa8
> > [ 1297.497798]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > [ 1297.504256]  process_one_work from worker_thread+0x50/0x54c
> > [ 1297.509828]  worker_thread from kthread+0xd0/0xec
> > [ 1297.514550]  kthread from ret_from_fork+0x14/0x2c
> > [ 1297.519263] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > [ 1297.524308] 5fa0:                                     00000000 00000000
> > 00000000 00000000
> > [ 1297.532478] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000
> > [ 1297.540653] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 1297.547256] ---[ end trace 0000000000000000 ]---
> > [ 1297.168353] OF: ERROR: memory leak before free overlay changeset, 
> > /regulator-vref
> > [ 1297.551874] ------------[ cut here ]------------
> > [ 1297.551880] WARNING: CPU: 0 PID: 15340 at lib/refcount.c:22
> > kobject_get+0x88/0xa0
> > [ 1297.551903] refcount_t: saturated; leaking memory.
> > [ 1297.551908] Modules linked in:
> > [ 1297.551918] CPU: 0 PID: 15340 Comm: kworker/0:2 Tainted: G       
> > W          6.3.1-dirty #5
> > [ 1297.551931] Hardware name: Xilinx Zynq Platform
> > [ 1297.551938] Workqueue: events_long device_link_release_fn
> > [ 1297.551965]  unwind_backtrace from show_stack+0x10/0x14
> > [ 1297.551994]  show_stack from dump_stack_lvl+0x40/0x4c
> > [ 1297.552023]  dump_stack_lvl from __warn+0x78/0x124
> > [ 1297.552059]  __warn from warn_slowpath_fmt+0x134/0x18c
> > [ 1297.552092]  warn_slowpath_fmt from kobject_get+0x88/0xa0
> > [ 1297.552123]  kobject_get from of_node_get+0x14/0x1c
> > [ 1297.552146]  of_node_get from of_fwnode_get+0x34/0x40
> > [ 1297.552173]  of_fwnode_get from fwnode_full_name_string+0x34/0xa0
> > [ 1297.552204]  fwnode_full_name_string from device_node_string+0x5a8/0x750
> > [ 1297.552234]  device_node_string from pointer+0x3d0/0x67c
> > [ 1297.552262]  pointer from vsnprintf+0x20c/0x410
> > [ 1297.552289]  vsnprintf from vscnprintf+0x10/0x24
> > [ 1297.552317]  vscnprintf from printk_sprint+0x18/0xf4
> > [ 1297.552353]  printk_sprint from vprintk_store+0x2d0/0x430
> > [ 1297.552388]  vprintk_store from vprintk_emit+0xe0/0x23c
> > [ 1297.552422]  vprintk_emit from vprintk_default+0x20/0x28
> > [ 1297.552457]  vprintk_default from _printk+0x2c/0x58
> > [ 1297.552483]  _printk from kobject_put+0x8c/0x130
> > [ 1297.552507]  kobject_put from platform_device_release+0x10/0x3c
> > [ 1297.552539]  platform_device_release from device_release+0x30/0xa0
> > [ 1297.552571]  device_release from kobject_put+0x8c/0x130
> > [ 1297.552599]  kobject_put from device_link_release_fn+0x54/0xa8
> > [ 1297.552630]  device_link_release_fn from process_one_work+0x1fc/0x4c8
> > [ 1297.552657]  process_one_work from worker_thread+0x50/0x54c
> > [ 1297.552674]  worker_thread from kthread+0xd0/0xec
> > [ 1297.552701]  kthread from ret_from_fork+0x14/0x2c
> > [ 1297.552724] Exception stack(0xe0b95fb0 to 0xe0b95ff8)
> > [ 1297.552736] 5fa0:                                     00000000 00000000
> > 00000000 00000000
> > [ 1297.552749] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000
> > [ 1297.552760] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [ 1297.552768] ---[ end trace 0000000000000000 ]---
> > [ 1297.769464] OF: ERROR: memory leak before free overlay changeset, 
> > /regulator-vdd-1-8
> > [ 1297.777337] OF: ERROR: memory leak before free overlay changeset,  /fpga-
> > axi/dummy_dev
> > [ 1297.777360] OF: ERROR: memory leak before free overlay changeset, 
> > /regulator-vio
> > 
> > the above is easily reproducible with a dummy platform device that just
> > gets some regulators (devlinks are created at that point).
> > 
> > The culprit of the above is actually this line:
> > 
> > https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/of/dynamic.c#L366
> > 
> > This also makes me wonder if we should not have an extra patch just printing
> > 'node->full_name' instead of '%pOF'. As seen, we should not make any
> > assumptions
> > (like parent's being around :)) at this stage.
> > 
> > To fix the issue, I'm adding a new OVERLAY flag to inform that the changeset
> > is already gone so if we ever reach of_node_release() we can proceed
> > normally. Obviously, I'm not really sure about this and that's the whole
> > reason
> > why I'm bringing this as an RFC. This looks like a nasty corner case and I
> > know
> > that adding/removing devices dynamically is far from being easy to handle...
> > 
> > Also not sure if the driver core folks should be aware of this...
> > 
> > Lastly, one of workarounding this issue is by manually unbiding the device
> > from the driver before removing the overlay which would (potentially) give
> > time for the workers to run.
> > 
> > Nuno Sa (1):
> >   of: dynamic: allow freeing of_nodes after the overlay changeset
> > 
> >  drivers/of/dynamic.c | 31 +++++++++++++++++++++++++++----
> >  include/linux/of.h   |  1 +
> >  2 files changed, 28 insertions(+), 4 deletions(-)
> > 
> 
> Very nice problem description, thanks.
> 
> You have found a real problem of how the devlink implementation did not
> consider
> the impact on overlays.  The overlay removal process does not expect any node
> created by an overlay apply to exist at the moment that the overlay removal
> completes (to be a little more pedantic, the removal process of deletes nodes
> and
> properties occurs in the exact opposite order that they were created.  Thus
> even
> partly through the overlay removal, a node is deleted before its parent is
> deleted.  Also, all of the properties for any node deleted during the removal
> are deleted (even if the node refcount does not reach zero).
> 
> The RFC patch does not attempt to fix the actual problem, it merely suppresses
> reporting the symptom.
> 

So I agree this patch was not fixing the real problem and was clumsy and I'm
aware the following is also kind of a workaround but hopefully one more
acceptable:


diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index cd3821a6444f..550370555bf7 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -461,11 +461,23 @@ struct device_node *__of_node_dup(const struct device_node
*np,
        return NULL;
 }

+static unsigned int __of_check_kref(struct kref *kref)
+{
+       unsigned int cnt = kref_read(kref);
+
+       if (cnt <= 1)
+               return 0;
+
+       msleep(50);
+
+       return kref_read(kref);
+}
+
 static void __of_changeset_entry_destroy(struct of_changeset_entry *ce)
 {
        if (ce->action == OF_RECONFIG_ATTACH_NODE &&
            of_node_check_flag(ce->np, OF_OVERLAY)) {
-               if (kref_read(&ce->np->kobj.kref) > 1) {
+               if (__of_check_kref(&ce->np->kobj.kref) > 1) {
                        pr_err("ERROR: memory leak, expected refcount 1 instead
of %d, of_node_get()/of_node_put() unbalanced - destroy cset entry: attach
overlay node %pOF\n",
                               kref_read(&ce->np->kobj.kref), ce->np);
                } else {

----

This would still need a proper comment before the sleep and maybe better fine
tune the sleep time (and actually test it :)). 

The thing is, I'm guessing the dev_link stuff is async for some reason and I'm
fairly sure that changing it to a synchronous fashion is likely a NAK. So the
only other thing that comes to my mind would be some kind of synchronization at
the kobject level which would probably feel a bit hacky. Anyways, bottom line,
in the overlay code path (and assuming the async nature won't change) we would
still need to do some kind of waiting and probably with some timeout (so we
don't wait indefinitely). Hence, the above patch does not feel that bad as we
are anyways doing some waiting to see if all the references where released in
the meantime. And yes, if after X time, we still have kref > 1, then possible
there's a real bug of someone taking a lot of time to release the reference (or
not releasing at all).

Would the above patch be something more acceptable?

- Nuno Sá