Message ID | 20250529-icc-bw-lockdep-v1-1-3d714b6a9374@gmail.com |
---|---|
State | New |
Series | interconnect: avoid memory allocation when 'icc_bw_lock' is held |
Hello Bryan,

Sorry for the late reply, I missed your mail.

On 2025-05-30 at 11:16, Bryan O'Donoghue wrote:
> On 29/05/2025 15:46, Gabor Juhos wrote:
>> The 'icc_bw_lock' mutex is introduced in commit af42269c3523
>> ("interconnect: Fix locking for runpm vs reclaim") in order
>> to decouple serialization of bw aggregation from codepaths
>> that require memory allocation.
>>
>> However commit d30f83d278a9 ("interconnect: core: Add dynamic
>> id allocation support") added a devm_kasprintf() call into a
>> path protected by the 'icc_bw_lock' which causes this lockdep
>> warning (at least on the IPQ9574 platform):
>
> Missing a Fixes tag.

Erm, it is before my s-o-b tag.

...

>> Move the memory allocation part of the code outside of the protected
>> path to eliminate the warning. Also add a note about why it is moved
>> to there,
>>
>> Fixes: d30f83d278a9 ("interconnect: core: Add dynamic id allocation support")
>> Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
>> ---
>>  drivers/interconnect/core.c | 14 ++++++++++----
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
>> index 1a41e59c77f85a811f78986e98401625f4cadfa3..acdb3b8f1e54942dbb1b71ec2b170b08ad709e6b 100644
>> --- a/drivers/interconnect/core.c
>> +++ b/drivers/interconnect/core.c
>> @@ -1023,6 +1023,16 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
>>  		return;
>>
>>  	mutex_lock(&icc_lock);
>> +
>> +	if (node->id >= ICC_DYN_ID_START) {
>> +		/*
>> +		 * Memory allocation must be done outside of codepaths
>> +		 * protected by icc_bw_lock.
>> +		 */
>> +		node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
>> +					    node->name, dev_name(provider->dev));
>> +	}
>> +
>>  	mutex_lock(&icc_bw_lock);
>>
>>  	node->provider = provider;
>> @@ -1038,10 +1048,6 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
>>  	node->avg_bw = node->init_avg;
>>  	node->peak_bw = node->init_peak;
>>
>> -	if (node->id >= ICC_DYN_ID_START)
>> -		node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
>> -					    node->name, dev_name(provider->dev));
>> -
>>  	if (node->avg_bw || node->peak_bw) {
>>  		if (provider->pre_aggregate)
>>  			provider->pre_aggregate(node);
>>
>> ---
>> base-commit: 5fed7fe33c2cd7104fc87b7bc699a7be892befa2
>> change-id: 20250529-icc-bw-lockdep-ed030d892a19
>>
>> Best regards,
>> --
>> Gabor Juhos <j4g8y7@gmail.com>
>>
>>
>
> The locking in this code is a mess.
>
> Which data-structures does icc_lock protect node* pointers I think and which
> data-structures does icc_bw_lock protect - "bw" data structures ?
>
> Hmm.
>
> Looking at this code I'm not sure at all what icc_lock was introduced to do.

Initially, only the 'icc_lock' mutex was here, and that protected
'everything'. The 'icc_bw_lock' was introduced later by commit
af42269c3523 ("interconnect: Fix locking for runpm vs reclaim") as part of
the "drm/msm+PM+icc: Make job_run() reclaim-safe" series [1].

Here is the reason, copied from the original commit message:

  "For cases where icc_bw_set() can be called in callbaths that could
   deadlock against shrinker/reclaim, such as runpm resume, we need to
   decouple the icc locking. Introduce a new icc_bw_lock for cases where
   we need to serialize bw aggregation and update to decouple that from
   paths that require memory allocation such as node/link creation/
   destruction."

> Can we not just drop it entirely ?

I'm not an expert in locking, but I doubt that we can easily drop either of
the two mutexes without reintroducing the problem fixed by the change
mentioned above.

[1] https://lore.kernel.org/all/20230807171148.210181-1-robdclark@gmail.com/

Regards,
Gabor
diff --git a/drivers/interconnect/core.c b/drivers/interconnect/core.c
index 1a41e59c77f85a811f78986e98401625f4cadfa3..acdb3b8f1e54942dbb1b71ec2b170b08ad709e6b 100644
--- a/drivers/interconnect/core.c
+++ b/drivers/interconnect/core.c
@@ -1023,6 +1023,16 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
 		return;
 
 	mutex_lock(&icc_lock);
+
+	if (node->id >= ICC_DYN_ID_START) {
+		/*
+		 * Memory allocation must be done outside of codepaths
+		 * protected by icc_bw_lock.
+		 */
+		node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
+					    node->name, dev_name(provider->dev));
+	}
+
 	mutex_lock(&icc_bw_lock);
 
 	node->provider = provider;
@@ -1038,10 +1048,6 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider)
 	node->avg_bw = node->init_avg;
 	node->peak_bw = node->init_peak;
 
-	if (node->id >= ICC_DYN_ID_START)
-		node->name = devm_kasprintf(provider->dev, GFP_KERNEL, "%s@%s",
-					    node->name, dev_name(provider->dev));
-
 	if (node->avg_bw || node->peak_bw) {
 		if (provider->pre_aggregate)
 			provider->pre_aggregate(node);
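The ordering that the patch establishes can be illustrated outside the kernel. What follows is only a userspace sketch under stated assumptions: pthread mutexes stand in for the kernel mutexes, malloc() stands in for devm_kasprintf(), and struct demo_node, demo_make_name() and demo_node_add() are hypothetical names invented for this illustration. The NULL check and the -1 return value are additions of the sketch, not something the patch does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the two kernel mutexes; names mirror the patch. */
static pthread_mutex_t icc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t icc_bw_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical, heavily reduced node; not the kernel's struct icc_node. */
struct demo_node {
	char *name;
	unsigned int avg_bw;
	unsigned int init_avg;
};

/* Build "<name>@<dev>" on the heap; stands in for devm_kasprintf(). */
static char *demo_make_name(const char *name, const char *dev)
{
	size_t len;
	char *buf;

	assert(name && dev);
	len = strlen(name) + 1 + strlen(dev) + 1;
	buf = malloc(len);
	if (buf)
		snprintf(buf, len, "%s@%s", name, dev);
	return buf;
}

/* Returns 0 on success, -1 if the allocation failed. */
static int demo_node_add(struct demo_node *node, const char *base,
			 const char *dev)
{
	int ret = 0;

	pthread_mutex_lock(&icc_lock);

	/* Allocate while only the outer lock is held. */
	node->name = demo_make_name(base, dev);
	if (!node->name)
		ret = -1;

	/* The inner lock covers only allocation-free bandwidth bookkeeping. */
	pthread_mutex_lock(&icc_bw_lock);
	node->avg_bw = node->init_avg;
	pthread_mutex_unlock(&icc_bw_lock);

	pthread_mutex_unlock(&icc_lock);
	return ret;
}
```

Calling demo_node_add() with a base name of "master" and a device name of "clk0" leaves node->name pointing at "master@clk0"; the caller owns that buffer and frees it with free().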
The 'icc_bw_lock' mutex is introduced in commit af42269c3523
("interconnect: Fix locking for runpm vs reclaim") in order
to decouple serialization of bw aggregation from codepaths
that require memory allocation.

However commit d30f83d278a9 ("interconnect: core: Add dynamic
id allocation support") added a devm_kasprintf() call into a
path protected by the 'icc_bw_lock' which causes this lockdep
warning (at least on the IPQ9574 platform):

    ======================================================
    WARNING: possible circular locking dependency detected
    6.15.0-next-20250529 #0 Not tainted
    ------------------------------------------------------
    swapper/0/1 is trying to acquire lock:
    ffffffc081df57d8 (icc_bw_lock){+.+.}-{4:4}, at: icc_init+0x8/0x108

    but task is already holding lock:
    ffffffc081d7db10 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x28/0x108

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (fs_reclaim){+.+.}-{0:0}:
           fs_reclaim_acquire+0x7c/0xb8
           slab_alloc_node.isra.0+0x48/0x188
           __kmalloc_node_track_caller_noprof+0xa4/0x2b8
           devm_kmalloc+0x5c/0x138
           devm_kvasprintf+0x6c/0xb8
           devm_kasprintf+0x50/0x68
           icc_node_add+0xbc/0x160
           icc_clk_register+0x15c/0x230
           devm_icc_clk_register+0x20/0x90
           qcom_cc_really_probe+0x320/0x338
           nss_cc_ipq9574_probe+0xac/0x1e8
           platform_probe+0x70/0xd0
           really_probe+0xdc/0x3b8
           __driver_probe_device+0x94/0x178
           driver_probe_device+0x48/0xf0
           __driver_attach+0x13c/0x208
           bus_for_each_dev+0x6c/0xb8
           driver_attach+0x2c/0x40
           bus_add_driver+0x100/0x250
           driver_register+0x68/0x138
           __platform_driver_register+0x2c/0x40
           nss_cc_ipq9574_driver_init+0x24/0x38
           do_one_initcall+0x88/0x340
           kernel_init_freeable+0x2ac/0x4f8
           kernel_init+0x28/0x1e8
           ret_from_fork+0x10/0x20

    -> #0 (icc_bw_lock){+.+.}-{4:4}:
           __lock_acquire+0x1348/0x2090
           lock_acquire+0x108/0x2d8
           icc_init+0x50/0x108
           do_one_initcall+0x88/0x340
           kernel_init_freeable+0x2ac/0x4f8
           kernel_init+0x28/0x1e8
           ret_from_fork+0x10/0x20

    other info that might help us debug this:

     Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(fs_reclaim);
                                   lock(icc_bw_lock);
                                   lock(fs_reclaim);
      lock(icc_bw_lock);

     *** DEADLOCK ***

    1 lock held by swapper/0/1:
     #0: ffffffc081d7db10 (fs_reclaim){+.+.}-{0:0}, at: icc_init+0x28/0x108

    stack backtrace:
    CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.15.0-next-20250529 #0 NONE
    Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C7 (DT)
    Call trace:
     show_stack+0x20/0x38 (C)
     dump_stack_lvl+0x90/0xd0
     dump_stack+0x18/0x28
     print_circular_bug+0x334/0x448
     check_noncircular+0x12c/0x140
     __lock_acquire+0x1348/0x2090
     lock_acquire+0x108/0x2d8
     icc_init+0x50/0x108
     do_one_initcall+0x88/0x340
     kernel_init_freeable+0x2ac/0x4f8
     kernel_init+0x28/0x1e8
     ret_from_fork+0x10/0x20

Move the memory allocation part of the code outside of the protected
path to eliminate the warning. Also add a note about why it is moved
there.

Fixes: d30f83d278a9 ("interconnect: core: Add dynamic id allocation support")
Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
---
 drivers/interconnect/core.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

---
base-commit: 5fed7fe33c2cd7104fc87b7bc699a7be892befa2
change-id: 20250529-icc-bw-lockdep-ed030d892a19

Best regards,
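The circular dependency reported in the warning can be modelled with a toy lock-order graph. This is a sketch for illustration only and makes no claim about how lockdep is implemented: struct lock_graph, graph_add_dep(), graph_would_deadlock() and the DEMO_* identifiers are all hypothetical. An edge A -> B records that B was acquired while A was held, and an acquisition is flagged when the lock being taken can already reach the held lock through recorded edges.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCKS 8

/* Hypothetical lock identifiers for the two locks named in the warning. */
enum demo_lock { DEMO_FS_RECLAIM, DEMO_ICC_BW_LOCK };

/* dep[a][b] != 0 means "b was acquired while a was held". */
struct lock_graph {
	int dep[MAX_LOCKS][MAX_LOCKS];
};

static void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

static void graph_add_dep(struct lock_graph *g, int held, int acquired)
{
	assert(held >= 0 && held < MAX_LOCKS);
	assert(acquired >= 0 && acquired < MAX_LOCKS);
	g->dep[held][acquired] = 1;
}

/* Depth-first search: can 'to' be reached from 'from' via recorded deps? */
static int graph_reaches(const struct lock_graph *g, int from, int to,
			 int *seen)
{
	if (from == to)
		return 1;
	seen[from] = 1;
	for (int next = 0; next < MAX_LOCKS; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    graph_reaches(g, next, to, seen))
			return 1;
	}
	return 0;
}

/* Would taking 'acquired' while holding 'held' close a cycle? */
static int graph_would_deadlock(const struct lock_graph *g, int held,
				int acquired)
{
	int seen[MAX_LOCKS] = { 0 };

	return graph_reaches(g, acquired, held, seen);
}
```

In terms of the trace, the devm_kasprintf() call under 'icc_bw_lock' corresponds to graph_add_dep(&g, DEMO_ICC_BW_LOCK, DEMO_FS_RECLAIM). Once that edge exists, graph_would_deadlock(&g, DEMO_FS_RECLAIM, DEMO_ICC_BW_LOCK) returns 1, which mirrors icc_init() taking 'icc_bw_lock' while holding fs_reclaim. Without the edge, which is the situation the patch aims for, the same query returns 0.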