From patchwork Wed Aug 12 09:52:16 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viresh Kumar
X-Patchwork-Id: 52331
From: Viresh Kumar
To: edubezval@gmail.com
Cc: linaro-kernel@lists.linaro.org, linux-pm@vger.kernel.org,
	Rafael Wysocki, linux-arm-kernel@lists.infradead.org,
	Russell King, Viresh Kumar,
	linux-kernel@vger.kernel.org (open list), Zhang Rui
Subject: [PATCH V2] thermal: cpu_cooling: fix lockdep problems in cpu_cooling
Date: Wed, 12 Aug 2015 15:22:16 +0530
Message-Id: <16e7d7d811fa2b0f8dcf6cade345a5f2f3e3c36a.1439372926.git.viresh.kumar@linaro.org>
X-Mailer: git-send-email 2.4.0
X-Mailing-List: linux-kernel@vger.kernel.org

From: Russell King

A recent change to the cpu_cooling code introduced an AB-BA deadlock
scenario between the cpufreq_policy_notifier_list rwsem and
cooling_cpufreq_lock. This is caused by cooling_cpufreq_lock being held
across the registration/removal of the notifier block (an operation
which takes the rwsem), while the notifier code itself takes the locks
in the reverse order:

======================================================
[ INFO: possible circular locking dependency detected ]
3.18.0+ #1453 Not tainted
-------------------------------------------------------
rc.local/770 is trying to acquire lock:
 (cooling_cpufreq_lock){+.+.+.}, at: [] cpufreq_thermal_notifier+0x34/0xfc

but task is already holding lock:
 ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: []
__blocking_notifier_call_chain+0x34/0x68

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #1 ((cpufreq_policy_notifier_list).rwsem){++++.+}:
       [] down_write+0x44/0x9c
       [] blocking_notifier_chain_register+0x28/0xd8
       [] cpufreq_register_notifier+0x68/0x90
       [] __cpufreq_cooling_register.part.1+0x120/0x180
       [] __cpufreq_cooling_register+0x98/0xa4
       [] cpufreq_cooling_register+0x18/0x1c
       [] imx_thermal_probe+0x1c0/0x470 [imx_thermal]
       [] platform_drv_probe+0x50/0xac
       [] driver_probe_device+0x114/0x234
       [] __driver_attach+0x9c/0xa0
       [] bus_for_each_dev+0x5c/0x90
       [] driver_attach+0x24/0x28
       [] bus_add_driver+0xe0/0x1d8
       [] driver_register+0x80/0xfc
       [] __platform_driver_register+0x50/0x64
       [] 0xbf007018
       [] do_one_initcall+0x88/0x1d8
       [] load_module+0x1768/0x1ef8
       [] SyS_init_module+0xe0/0xf4
       [] ret_fast_syscall+0x0/0x48

-> #0 (cooling_cpufreq_lock){+.+.+.}:
       [] lock_acquire+0xb0/0x124
       [] mutex_lock_nested+0x5c/0x3d8
       [] cpufreq_thermal_notifier+0x34/0xfc
       [] notifier_call_chain+0x4c/0x8c
       [] __blocking_notifier_call_chain+0x50/0x68
       [] blocking_notifier_call_chain+0x20/0x28
       [] cpufreq_set_policy+0x7c/0x1d0
       [] store_scaling_governor+0x74/0x9c
       [] store+0x90/0xc0
       [] sysfs_kf_write+0x54/0x58
       [] kernfs_fop_write+0xdc/0x190
       [] vfs_write+0xac/0x1b4
       [] SyS_write+0x44/0x90
       [] ret_fast_syscall+0x0/0x48

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((cpufreq_policy_notifier_list).rwsem);
                               lock(cooling_cpufreq_lock);
                               lock((cpufreq_policy_notifier_list).rwsem);
  lock(cooling_cpufreq_lock);

 *** DEADLOCK ***

7 locks held by rc.local/770:
 #0:  (sb_writers#6){.+.+.+}, at: [] vfs_write+0x18c/0x1b4
 #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0xa0/0x190
 #2:  (s_active#52){.+.+.+}, at: [] kernfs_fop_write+0xa8/0x190
 #3:  (cpu_hotplug.lock){++++++}, at: [] get_online_cpus+0x34/0x90
 #4:  (cpufreq_rwsem){.+.+.+}, at: [] store+0x58/0xc0
 #5:  (&policy->rwsem){+.+.+.}, at: [] store+0x70/0xc0
 #6:  ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [] __blocking_notifier_call_chain+0x34/0x68

stack backtrace:
CPU: 0 PID: 770 Comm: rc.local Not tainted 3.18.0+ #1453
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[] (dump_backtrace) from [] (show_stack+0x18/0x1c)
 r6:c0b85a80 r5:c0b75630 r4:00000000 r3:00000000
[] (show_stack) from [] (dump_stack+0x7c/0x98)
[] (dump_stack) from [] (print_circular_bug+0x28c/0x2d8)
 r4:c0b85a80 r3:d0071d40
[] (print_circular_bug) from [] (__lock_acquire+0x1acc/0x1bb0)
 r10:c0b50660 r8:c09e6d80 r7:d0071d40 r6:c11d0f0c r5:00000007 r4:d0072240
[] (__lock_acquire) from [] (lock_acquire+0xb0/0x124)
 r10:00000000 r9:c04abfc4 r8:00000000 r7:00000000 r6:00000000 r5:c0a06f0c
 r4:00000000
[] (lock_acquire) from [] (mutex_lock_nested+0x5c/0x3d8)
 r10:ec853800 r9:c0a06ed4 r8:d0071d40 r7:c0a06ed4 r6:c11d0f0c r5:00000000
 r4:c04abfc4
[] (mutex_lock_nested) from [] (cpufreq_thermal_notifier+0x34/0xfc)
 r10:ec853800 r9:ec85380c r8:d00d7d3c r7:c0a06ed4 r6:d00d7d3c r5:00000000
 r4:fffffffe
[] (cpufreq_thermal_notifier) from [] (notifier_call_chain+0x4c/0x8c)
 r7:00000000 r6:00000000 r5:00000000 r4:fffffffe
[] (notifier_call_chain) from [] (__blocking_notifier_call_chain+0x50/0x68)
 r8:c0a072a4 r7:00000000 r6:d00d7d3c r5:ffffffff r4:c0a06fc8 r3:ffffffff
[] (__blocking_notifier_call_chain) from [] (blocking_notifier_call_chain+0x20/0x28)
 r7:ec98b540 r6:c13ebc80 r5:ed76e600 r4:d00d7d3c
[] (blocking_notifier_call_chain) from [] (cpufreq_set_policy+0x7c/0x1d0)
[] (cpufreq_set_policy) from [] (store_scaling_governor+0x74/0x9c)
 r7:ec98b540 r6:0000000c r5:ec98b540 r4:ed76e600
[] (store_scaling_governor) from [] (store+0x90/0xc0)
 r6:0000000c r5:ed76e6d4 r4:ed76e600
[] (store) from [] (sysfs_kf_write+0x54/0x58)
 r8:0000000c r7:d00d7f78 r6:ec98b540 r5:0000000c r4:ec853800 r3:0000000c
[] (sysfs_kf_write) from [] (kernfs_fop_write+0xdc/0x190)
 r6:ec98b540 r5:00000000 r4:00000000 r3:c0175330
[] (kernfs_fop_write) from [] (vfs_write+0xac/0x1b4)
 r10:0162aa70 r9:d00d6000 r8:0000000c r7:d00d7f78 r6:0162aa70 r5:0000000c
 r4:eccde500
[] (vfs_write) from [] (SyS_write+0x44/0x90)
 r10:0162aa70 r8:0000000c r7:eccde500 r6:eccde500 r5:00000000 r4:00000000
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x48)
 r10:00000000 r8:c000edc4 r7:00000004 r6:000216cc r5:0000000c r4:0162aa70

Solve this by moving to finer-grained locking: use one mutex to protect
the cpufreq_dev_list as a whole, and a separate lock to ensure correct
ordering of cpufreq notifier registration and removal.

cooling_list_lock is taken within cooling_cpufreq_lock on
(un)registration to preserve the behavior of the code, i.e. to
atomically add/remove to the list and (un)register the notifier.

Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq policy with
thermal constraints")
Reviewed-by: Viresh Kumar
Signed-off-by: Russell King
Signed-off-by: Viresh Kumar
---
Hi Eduardo/Zhang,

This is 4.2-rc material; please apply soon.

V1->V2:
- V1 was posted by Russell this morning and I made a few minor changes
  to it.
- Some sections (especially the Rant) of the commit log are dropped and
  the last one is updated, as we do take cooling_list_lock within
  cooling_cpufreq_lock.
- The above is reflected in the code in the unregister sequence.
- cooling_list_lock is taken at a few more places.
- Minor reorganization of the code.

A minimal userspace illustration of the AB-BA lock ordering is appended
after the diff.

 drivers/thermal/cpu_cooling.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 6509c61b9648..5ae0524bed19 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -107,6 +107,9 @@ struct cpufreq_cooling_device {
 static DEFINE_IDR(cpufreq_idr);
 static DEFINE_MUTEX(cooling_cpufreq_lock);
 
+static unsigned int cpufreq_dev_count;
+
+static DEFINE_MUTEX(cooling_list_lock);
 static LIST_HEAD(cpufreq_dev_list);
 
 /**
@@ -185,14 +188,14 @@ unsigned long cpufreq_cooling_get_level(unsigned int cpu, unsigned int freq)
 {
 	struct cpufreq_cooling_device *cpufreq_dev;
 
-	mutex_lock(&cooling_cpufreq_lock);
+	mutex_lock(&cooling_list_lock);
 	list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
 		if (cpumask_test_cpu(cpu, &cpufreq_dev->allowed_cpus)) {
-			mutex_unlock(&cooling_cpufreq_lock);
+			mutex_unlock(&cooling_list_lock);
 			return get_level(cpufreq_dev, freq);
 		}
 	}
-	mutex_unlock(&cooling_cpufreq_lock);
+	mutex_unlock(&cooling_list_lock);
 
 	pr_err("%s: cpu:%d not part of any cooling device\n", __func__, cpu);
 	return THERMAL_CSTATE_INVALID;
@@ -221,7 +224,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
 	switch (event) {
 
 	case CPUFREQ_ADJUST:
-		mutex_lock(&cooling_cpufreq_lock);
+		mutex_lock(&cooling_list_lock);
 		list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
 			if (!cpumask_test_cpu(policy->cpu,
 					      &cpufreq_dev->allowed_cpus))
@@ -233,7 +236,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
 			cpufreq_verify_within_limits(policy, 0, max_freq);
 		}
-		mutex_unlock(&cooling_cpufreq_lock);
+		mutex_unlock(&cooling_list_lock);
 		break;
 	default:
 		return NOTIFY_DONE;
@@ -866,12 +869,14 @@ __cpufreq_cooling_register(struct device_node *np,
 
 	mutex_lock(&cooling_cpufreq_lock);
 
+	mutex_lock(&cooling_list_lock);
+	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
+	mutex_unlock(&cooling_list_lock);
+
 	/* Register the notifier for first cpufreq cooling device */
-	if (list_empty(&cpufreq_dev_list))
+	if (!cpufreq_dev_count++)
 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
 					  CPUFREQ_POLICY_NOTIFIER);
-	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
-
 	mutex_unlock(&cooling_cpufreq_lock);
 
 	return cool_dev;
@@ -1013,13 +1018,17 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
 		return;
 
 	cpufreq_dev = cdev->devdata;
-	mutex_lock(&cooling_cpufreq_lock);
-	list_del(&cpufreq_dev->node);
 
 	/* Unregister the notifier for the last cpufreq cooling device */
-	if (list_empty(&cpufreq_dev_list))
+	mutex_lock(&cooling_cpufreq_lock);
+	if (!--cpufreq_dev_count)
 		cpufreq_unregister_notifier(&thermal_cpufreq_notifier_block,
 					    CPUFREQ_POLICY_NOTIFIER);
+
+	mutex_lock(&cooling_list_lock);
+	list_del(&cpufreq_dev->node);
+	mutex_unlock(&cooling_list_lock);
+
 	mutex_unlock(&cooling_cpufreq_lock);
 
 	thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
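
For anyone unfamiliar with the AB-BA pattern the lockdep report above
describes, here is a minimal standalone userspace sketch of it. It is
illustrative only, not part of the patch: cooling_lock and
notifier_rwsem are invented stand-ins for cooling_cpufreq_lock and
(cpufreq_policy_notifier_list).rwsem, and the usleep() calls are
scaffolding to make the bad interleaving likely.

/*
 * abba.c: minimal sketch of the AB-BA inversion (illustrative names).
 * Build: gcc -pthread abba.c -o abba
 * Running it will usually hang; the hang is the deadlock.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t cooling_lock   = PTHREAD_MUTEX_INITIALIZER; /* "A" */
static pthread_mutex_t notifier_rwsem = PTHREAD_MUTEX_INITIALIZER; /* "B" */

/* Mirrors the registration path: cooling_cpufreq_lock (A) is held
 * while cpufreq_register_notifier() takes the notifier rwsem (B). */
static void *register_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&cooling_lock);        /* A */
	usleep(1000);     /* widen the race window for the demo */
	pthread_mutex_lock(&notifier_rwsem);      /* then B */
	pthread_mutex_unlock(&notifier_rwsem);
	pthread_mutex_unlock(&cooling_lock);
	return NULL;
}

/* Mirrors the notifier call chain: the rwsem (B) is already held when
 * cpufreq_thermal_notifier() takes cooling_cpufreq_lock (A). */
static void *notifier_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&notifier_rwsem);      /* B */
	usleep(1000);
	pthread_mutex_lock(&cooling_lock);        /* then A: inverted */
	pthread_mutex_unlock(&cooling_lock);
	pthread_mutex_unlock(&notifier_rwsem);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	/* Once each thread holds its first lock, neither can take its
	 * second one: the circular wait lockdep warns about. */
	pthread_create(&t1, NULL, register_path, NULL);
	pthread_create(&t2, NULL, notifier_path, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("no deadlock this run; the race was not hit\n");
	return 0;
}

The patch breaks this cycle by never walking the list and registering
the notifier under the same lock: the list gets its own
cooling_list_lock, nested strictly inside cooling_cpufreq_lock, so the
notifier path no longer needs the lock the registration path holds.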