From patchwork Wed Aug 12 09:52:16 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viresh Kumar
X-Patchwork-Id: 52331
From: Viresh Kumar
To: edubezval@gmail.com
Cc: linaro-kernel@lists.linaro.org, linux-pm@vger.kernel.org,
	Rafael Wysocki, linux-arm-kernel@lists.infradead.org,
	Russell King, Viresh Kumar,
	linux-kernel@vger.kernel.org (open list), Zhang Rui
Subject: [PATCH V2] thermal: cpu_cooling: fix lockdep problems in cpu_cooling
Date: Wed, 12 Aug 2015 15:22:16 +0530
Message-Id: <16e7d7d811fa2b0f8dcf6cade345a5f2f3e3c36a.1439372926.git.viresh.kumar@linaro.org>
X-Mailer: git-send-email 2.4.0
X-Mailing-List: linux-kernel@vger.kernel.org

From: Russell King

A recent change to the cpu_cooling code introduced an AB-BA deadlock
scenario between the cpufreq_policy_notifier_list rwsem and
cooling_cpufreq_lock. This is caused by cooling_cpufreq_lock being held
across the registration/removal of the notifier block (an operation
which takes the rwsem), while the notifier code itself takes the locks
in the reverse order:

======================================================
[ INFO: possible circular locking dependency detected ]
3.18.0+ #1453 Not tainted
-------------------------------------------------------
rc.local/770 is trying to acquire lock:
 (cooling_cpufreq_lock){+.+.+.}, at: [] cpufreq_thermal_notifier+0x34/0xfc

but task is already holding lock:
 ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: []
__blocking_notifier_call_chain+0x34/0x68

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #1 ((cpufreq_policy_notifier_list).rwsem){++++.+}:
       [] down_write+0x44/0x9c
       [] blocking_notifier_chain_register+0x28/0xd8
       [] cpufreq_register_notifier+0x68/0x90
       [] __cpufreq_cooling_register.part.1+0x120/0x180
       [] __cpufreq_cooling_register+0x98/0xa4
       [] cpufreq_cooling_register+0x18/0x1c
       [] imx_thermal_probe+0x1c0/0x470 [imx_thermal]
       [] platform_drv_probe+0x50/0xac
       [] driver_probe_device+0x114/0x234
       [] __driver_attach+0x9c/0xa0
       [] bus_for_each_dev+0x5c/0x90
       [] driver_attach+0x24/0x28
       [] bus_add_driver+0xe0/0x1d8
       [] driver_register+0x80/0xfc
       [] __platform_driver_register+0x50/0x64
       [] 0xbf007018
       [] do_one_initcall+0x88/0x1d8
       [] load_module+0x1768/0x1ef8
       [] SyS_init_module+0xe0/0xf4
       [] ret_fast_syscall+0x0/0x48

-> #0 (cooling_cpufreq_lock){+.+.+.}:
       [] lock_acquire+0xb0/0x124
       [] mutex_lock_nested+0x5c/0x3d8
       [] cpufreq_thermal_notifier+0x34/0xfc
       [] notifier_call_chain+0x4c/0x8c
       [] __blocking_notifier_call_chain+0x50/0x68
       [] blocking_notifier_call_chain+0x20/0x28
       [] cpufreq_set_policy+0x7c/0x1d0
       [] store_scaling_governor+0x74/0x9c
       [] store+0x90/0xc0
       [] sysfs_kf_write+0x54/0x58
       [] kernfs_fop_write+0xdc/0x190
       [] vfs_write+0xac/0x1b4
       [] SyS_write+0x44/0x90
       [] ret_fast_syscall+0x0/0x48

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((cpufreq_policy_notifier_list).rwsem);
                               lock(cooling_cpufreq_lock);
                               lock((cpufreq_policy_notifier_list).rwsem);
  lock(cooling_cpufreq_lock);

 *** DEADLOCK ***

7 locks held by rc.local/770:
 #0:  (sb_writers#6){.+.+.+}, at: [] vfs_write+0x18c/0x1b4
 #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0xa0/0x190
 #2:  (s_active#52){.+.+.+}, at: [] kernfs_fop_write+0xa8/0x190
 #3:  (cpu_hotplug.lock){++++++}, at: [] get_online_cpus+0x34/0x90
 #4:  (cpufreq_rwsem){.+.+.+}, at: [] store+0x58/0xc0
 #5:  (&policy->rwsem){+.+.+.}, at: [] store+0x70/0xc0
 #6:  ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [] __blocking_notifier_call_chain+0x34/0x68

stack backtrace:
CPU: 0 PID: 770 Comm: rc.local Not tainted 3.18.0+ #1453
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[] (dump_backtrace) from [] (show_stack+0x18/0x1c)
 r6:c0b85a80 r5:c0b75630 r4:00000000 r3:00000000
[] (show_stack) from [] (dump_stack+0x7c/0x98)
[] (dump_stack) from [] (print_circular_bug+0x28c/0x2d8)
 r4:c0b85a80 r3:d0071d40
[] (print_circular_bug) from [] (__lock_acquire+0x1acc/0x1bb0)
 r10:c0b50660 r8:c09e6d80 r7:d0071d40 r6:c11d0f0c r5:00000007 r4:d0072240
[] (__lock_acquire) from [] (lock_acquire+0xb0/0x124)
 r10:00000000 r9:c04abfc4 r8:00000000 r7:00000000 r6:00000000 r5:c0a06f0c
 r4:00000000
[] (lock_acquire) from [] (mutex_lock_nested+0x5c/0x3d8)
 r10:ec853800 r9:c0a06ed4 r8:d0071d40 r7:c0a06ed4 r6:c11d0f0c r5:00000000
 r4:c04abfc4
[] (mutex_lock_nested) from [] (cpufreq_thermal_notifier+0x34/0xfc)
 r10:ec853800 r9:ec85380c r8:d00d7d3c r7:c0a06ed4 r6:d00d7d3c r5:00000000
 r4:fffffffe
[] (cpufreq_thermal_notifier) from [] (notifier_call_chain+0x4c/0x8c)
 r7:00000000 r6:00000000 r5:00000000 r4:fffffffe
[] (notifier_call_chain) from [] (__blocking_notifier_call_chain+0x50/0x68)
 r8:c0a072a4 r7:00000000 r6:d00d7d3c r5:ffffffff r4:c0a06fc8 r3:ffffffff
[] (__blocking_notifier_call_chain) from [] (blocking_notifier_call_chain+0x20/0x28)
 r7:ec98b540 r6:c13ebc80 r5:ed76e600 r4:d00d7d3c
[] (blocking_notifier_call_chain) from [] (cpufreq_set_policy+0x7c/0x1d0)
[] (cpufreq_set_policy) from [] (store_scaling_governor+0x74/0x9c)
 r7:ec98b540 r6:0000000c r5:ec98b540 r4:ed76e600
[] (store_scaling_governor) from [] (store+0x90/0xc0)
 r6:0000000c r5:ed76e6d4 r4:ed76e600
[] (store) from [] (sysfs_kf_write+0x54/0x58)
 r8:0000000c r7:d00d7f78 r6:ec98b540 r5:0000000c r4:ec853800 r3:0000000c
[] (sysfs_kf_write) from [] (kernfs_fop_write+0xdc/0x190)
 r6:ec98b540 r5:00000000 r4:00000000 r3:c0175330
[] (kernfs_fop_write) from [] (vfs_write+0xac/0x1b4)
 r10:0162aa70 r9:d00d6000 r8:0000000c r7:d00d7f78 r6:0162aa70 r5:0000000c
 r4:eccde500
[] (vfs_write) from [] (SyS_write+0x44/0x90)
 r10:0162aa70 r8:0000000c r7:eccde500 r6:eccde500 r5:00000000 r4:00000000
[] (SyS_write) from [] (ret_fast_syscall+0x0/0x48)
 r10:00000000 r8:c000edc4 r7:00000004 r6:000216cc r5:0000000c r4:0162aa70

Solve this by moving to finer-grained locking: use one mutex to protect
the cpufreq_dev_list as a whole, and a separate lock to ensure correct
ordering of cpufreq notifier registration and removal.

cooling_list_lock is taken within cooling_cpufreq_lock on
(un)registration to preserve the behavior of the code, i.e. to
atomically add/remove to the list and (un)register the notifier.

Fixes: 2dcd851fe4b4 ("thermal: cpu_cooling: Update always cpufreq policy with
thermal constraints")
Reviewed-by: Viresh Kumar
Signed-off-by: Russell King
Signed-off-by: Viresh Kumar
---
Hi Eduardo/Zhang,

This is 4.2-rc material; please apply soon.

V1->V2:
- V1 was posted by Russell this morning and I made a few minor changes
  to it.
- Some sections (especially the Rant) of the commit log are dropped and
  the last one is updated, as we do take cooling_list_lock within
  cooling_cpufreq_lock.
- The above is reflected in the code in the unregister sequence.
- cooling_list_lock is taken at a few more places.
- Minor reorganization of the code.

A minimal userspace illustration of the AB-BA lock ordering is appended
after the diff.

 drivers/thermal/cpu_cooling.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 6509c61b9648..5ae0524bed19 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -107,6 +107,9 @@ struct cpufreq_cooling_device {
 static DEFINE_IDR(cpufreq_idr);
 static DEFINE_MUTEX(cooling_cpufreq_lock);
 
+static unsigned int cpufreq_dev_count;
+
+static DEFINE_MUTEX(cooling_list_lock);
 static LIST_HEAD(cpufreq_dev_list);
 
 /**
@@ -185,14 +188,14 @@ unsigned long cpufreq_cooling_get_level(unsigned int cpu, unsigned int freq)
 {
 	struct cpufreq_cooling_device *cpufreq_dev;
 
-	mutex_lock(&cooling_cpufreq_lock);
+	mutex_lock(&cooling_list_lock);
 	list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
 		if (cpumask_test_cpu(cpu, &cpufreq_dev->allowed_cpus)) {
-			mutex_unlock(&cooling_cpufreq_lock);
+			mutex_unlock(&cooling_list_lock);
 			return get_level(cpufreq_dev, freq);
 		}
 	}
-	mutex_unlock(&cooling_cpufreq_lock);
+	mutex_unlock(&cooling_list_lock);
 
 	pr_err("%s: cpu:%d not part of any cooling device\n", __func__, cpu);
 	return THERMAL_CSTATE_INVALID;
@@ -221,7 +224,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
 	switch (event) {
 
 	case CPUFREQ_ADJUST:
-		mutex_lock(&cooling_cpufreq_lock);
+		mutex_lock(&cooling_list_lock);
 		list_for_each_entry(cpufreq_dev, &cpufreq_dev_list, node) {
 			if (!cpumask_test_cpu(policy->cpu,
 					      &cpufreq_dev->allowed_cpus))
@@ -233,7 +236,7 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
 			cpufreq_verify_within_limits(policy, 0, max_freq);
 		}
-		mutex_unlock(&cooling_cpufreq_lock);
+		mutex_unlock(&cooling_list_lock);
 		break;
 	default:
 		return NOTIFY_DONE;
@@ -866,12 +869,14 @@ __cpufreq_cooling_register(struct device_node *np,
 
 	mutex_lock(&cooling_cpufreq_lock);
 
+	mutex_lock(&cooling_list_lock);
+	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
+	mutex_unlock(&cooling_list_lock);
+
 	/* Register the notifier for first cpufreq cooling device */
-	if (list_empty(&cpufreq_dev_list))
+	if (!cpufreq_dev_count++)
 		cpufreq_register_notifier(&thermal_cpufreq_notifier_block,
 					  CPUFREQ_POLICY_NOTIFIER);
-	list_add(&cpufreq_dev->node, &cpufreq_dev_list);
-
 	mutex_unlock(&cooling_cpufreq_lock);
 
 	return cool_dev;
@@ -1013,13 +1018,17 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
 		return;
 
 	cpufreq_dev = cdev->devdata;
-	mutex_lock(&cooling_cpufreq_lock);
-	list_del(&cpufreq_dev->node);
 
 	/* Unregister the notifier for the last cpufreq cooling device */
-	if (list_empty(&cpufreq_dev_list))
+	mutex_lock(&cooling_cpufreq_lock);
+	if (!--cpufreq_dev_count)
 		cpufreq_unregister_notifier(&thermal_cpufreq_notifier_block,
 					    CPUFREQ_POLICY_NOTIFIER);
+
+	mutex_lock(&cooling_list_lock);
+	list_del(&cpufreq_dev->node);
+	mutex_unlock(&cooling_list_lock);
+
 	mutex_unlock(&cooling_cpufreq_lock);
 
 	thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
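
For anyone unfamiliar with the AB-BA pattern the lockdep report above
describes, here is a minimal standalone userspace sketch of it. It is
illustrative only, not part of the patch: cooling_lock and
notifier_rwsem are invented stand-ins for cooling_cpufreq_lock and
(cpufreq_policy_notifier_list).rwsem, and the usleep() calls are
scaffolding to make the bad interleaving likely.

/*
 * abba.c: minimal sketch of the AB-BA inversion (illustrative names).
 * Build: gcc -pthread abba.c -o abba
 * Running it will usually hang; the hang is the deadlock.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t cooling_lock   = PTHREAD_MUTEX_INITIALIZER; /* "A" */
static pthread_mutex_t notifier_rwsem = PTHREAD_MUTEX_INITIALIZER; /* "B" */

/* Mirrors the registration path: cooling_cpufreq_lock (A) is held
 * while cpufreq_register_notifier() takes the notifier rwsem (B). */
static void *register_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&cooling_lock);        /* A */
	usleep(1000);     /* widen the race window for the demo */
	pthread_mutex_lock(&notifier_rwsem);      /* then B */
	pthread_mutex_unlock(&notifier_rwsem);
	pthread_mutex_unlock(&cooling_lock);
	return NULL;
}

/* Mirrors the notifier call chain: the rwsem (B) is already held when
 * cpufreq_thermal_notifier() takes cooling_cpufreq_lock (A). */
static void *notifier_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&notifier_rwsem);      /* B */
	usleep(1000);
	pthread_mutex_lock(&cooling_lock);        /* then A: inverted */
	pthread_mutex_unlock(&cooling_lock);
	pthread_mutex_unlock(&notifier_rwsem);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	/* Once each thread holds its first lock, neither can take its
	 * second one: the circular wait lockdep warns about. */
	pthread_create(&t1, NULL, register_path, NULL);
	pthread_create(&t2, NULL, notifier_path, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("no deadlock this run; the race was not hit\n");
	return 0;
}

The patch breaks this cycle by never walking the list and registering
the notifier under the same lock: the list gets its own
cooling_list_lock, nested strictly inside cooling_cpufreq_lock, so the
notifier path no longer needs the lock the registration path holds.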