From patchwork Thu Feb 4 06:24:39 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viresh Kumar
X-Patchwork-Id: 61156
Delivered-To: patch@linaro.org
Received: by 10.112.43.199 with SMTP id y7csp278198lbl;
        Wed, 3 Feb 2016 22:24:47 -0800 (PST)
X-Received: by 10.98.72.133 with SMTP id q5mr8685083pfi.166.1454567087202;
        Wed, 03 Feb 2016 22:24:47 -0800 (PST)
Return-Path:
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
        by mx.google.com with ESMTP id rr5si14431488pab.188.2016.02.03.22.24.46;
        Wed, 03 Feb 2016 22:24:47 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of
        linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as
        permitted sender) client-ip=209.132.180.67;
Authentication-Results: mx.google.com;
        spf=pass (google.com: best guess record for domain of
        linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as
        permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
        dkim=pass header.i=@linaro.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757973AbcBDGYp (ORCPT + 30 others);
        Thu, 4 Feb 2016 01:24:45 -0500
Received: from mail-pf0-f170.google.com ([209.85.192.170]:36141 "EHLO
        mail-pf0-f170.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1752348AbcBDGYn (ORCPT );
        Thu, 4 Feb 2016 01:24:43 -0500
Received: by mail-pf0-f170.google.com with SMTP id n128so33633002pfn.3
        for ; Wed, 03 Feb 2016 22:24:43 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google;
        h=date:from:to:cc:subject:message-id:references:mime-version
         :content-type:content-disposition:in-reply-to:user-agent;
        bh=9oMRvi1B3HdvRKwazsVDr2xjzLVFKl/gGXpKtHXtRhw=;
        b=Xx3MCKpYoBrZ9VeeqnYld4lQZnBBkbK9bTw0Hz33A2qZiSrGmIN7WII2PEc8+iiNIY
         3RfEfJeQYvMMYNl1LtOzGfpve6XPJFLTUVViFVRhdAmWDGzY3oLgnnRUYVVPDTQteiGz
         ZKmMUqzbF2/eBLq388jDTBWiBPQny2fxweK8A=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20130820;
        h=x-gm-message-state:date:from:to:cc:subject:message-id:references
         :mime-version:content-type:content-disposition:in-reply-to
         :user-agent;
        bh=9oMRvi1B3HdvRKwazsVDr2xjzLVFKl/gGXpKtHXtRhw=;
        b=Id8JIFKJvr+QyN1sahKjHCIP7Mr3KpfkiCNiYs7+I+0nbapeQtBfgLQHkHSymBDBZG
         jw5kv2LHnPtdx/S6FwMhBdjhjwChsr1n3w/dA7vaYTZkNSBnBuOygGGnfKZqcjtX23Dt
         a+DAV38mmNnp9oCLP3XMwXHVuJMY7OAzhIWFf9cR/0fDE0X3zbM1KXCaDCUmTNmt9Cei
         SFYtWk1/Px2XUSq1Bo65VKdUa+zZrPISy00G3StWBad3NKu6HwDp/qDfFIAV3/5r74wv
         Qj+TkF4PP25eaJJwdd+9fxh6CyNjV8k9CfClqxr9jwQunZbzGyzY33tP1NZsxozhfTZ5
         AFzQ==
X-Gm-Message-State: AG10YOR32Q3Ftmb6znYp3n3NetCjvArrf0If0TmaWXZjuWXbbCsZxUX/QXv024AwQmIzYss4
X-Received: by 10.66.230.201 with SMTP id ta9mr8619439pac.52.1454567082910;
        Wed, 03 Feb 2016 22:24:42 -0800 (PST)
Received: from localhost ([122.172.22.246])
        by smtp.gmail.com with ESMTPSA id xz6sm13981452pab.42.2016.02.03.22.24.41
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Wed, 03 Feb 2016 22:24:42 -0800 (PST)
Date: Thu, 4 Feb 2016 11:54:39 +0530
From: Viresh Kumar
To: Juri Lelli
Cc: Rafael Wysocki , linaro-kernel@lists.linaro.org,
        linux-pm@vger.kernel.org, skannan@codeaurora.org,
        peterz@infradead.org, mturquette@baylibre.com,
        steve.muckle@linaro.org, vincent.guittot@linaro.org,
        morten.rasmussen@arm.com, dietmar.eggemann@arm.com,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
Message-ID: <20160204062439.GZ3469@vireshk>
References: <20160203155428.GY3947@e106622-lin>
        <20160203161059.GH3469@vireshk>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20160203161059.GH3469@vireshk>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 03-02-16, 21:40, Viresh Kumar wrote:
> On 03-02-16, 15:54, Juri Lelli wrote:
> > Ouch, I've just got this executing -f basic on Juno. :(
> > It happens with the hotplug_1_by_1 test.
> >
> > [ 1086.531252] IRQ1 no longer affine to CPU1
> > [ 1086.531495] CPU1: shutdown
> > [ 1086.538199] psci: CPU1 killed.
> > [ 1086.583396]
> > [ 1086.584881] ======================================================
> > [ 1086.590999] [ INFO: possible circular locking dependency detected ]
> > [ 1086.597205] 4.5.0-rc2+ #37 Not tainted
> > [ 1086.600914] -------------------------------------------------------
> > [ 1086.607118] runme.sh/1052 is trying to acquire lock:
> > [ 1086.612031]  (sb_writers#7){.+.+.+}, at: [] __sb_start_write+0xcc/0xe0
> > [ 1086.620090]
> > [ 1086.620090] but task is already holding lock:
> > [ 1086.625865]  (&policy->rwsem){+++++.}, at: [] cpufreq_offline+0x7c/0x278
> > [ 1086.634081]
> > [ 1086.634081] which lock already depends on the new lock.
> > [ 1086.634081]
> > [ 1086.642180]
> > [ 1086.642180] the existing dependency chain (in reverse order) is:
> > [ 1086.649589]
> > -> #1 (&policy->rwsem){+++++.}:
> > [ 1086.653929]        [] check_prev_add+0x670/0x754
> > [ 1086.660060]        [] validate_chain.isra.36+0x724/0xa0c
> > [ 1086.666876]        [] __lock_acquire+0x4e4/0xba0
> > [ 1086.673001]        [] lock_release+0x244/0x570
> > [ 1086.678955]        [] __mutex_unlock_slowpath+0xa0/0x18c
> > [ 1086.685771]        [] mutex_unlock+0x20/0x2c
> > [ 1086.691553]        [] kernfs_fop_write+0xb0/0x194
> > [ 1086.697768]        [] __vfs_write+0x48/0x104
> > [ 1086.703550]        [] vfs_write+0x98/0x198
> > [ 1086.709161]        [] SyS_write+0x54/0xb0
> > [ 1086.714684]        [] el0_svc_naked+0x24/0x28
> > [ 1086.720555]
> > -> #0 (sb_writers#7){.+.+.+}:
> > [ 1086.724730]        [] print_circular_bug+0x80/0x2e4
> > [ 1086.731116]        [] check_prev_add+0x13c/0x754
> > [ 1086.737243]        [] validate_chain.isra.36+0x724/0xa0c
> > [ 1086.744059]        [] __lock_acquire+0x4e4/0xba0
> > [ 1086.750184]        [] lock_acquire+0xe4/0x204
> > [ 1086.756052]        [] percpu_down_read+0x50/0xe4
> > [ 1086.762180]        [] __sb_start_write+0xcc/0xe0
> > [ 1086.768306]        [] mnt_want_write+0x28/0x54
> > [ 1086.774263]        [] do_last+0x660/0xcb8
> > [ 1086.779788]        [] path_openat+0x8c/0x2b0
> > [ 1086.785570]        [] do_filp_open+0x78/0xf0
> > [ 1086.791353]        [] do_sys_open+0x150/0x214
> > [ 1086.797222]        [] SyS_openat+0x3c/0x48
> > [ 1086.802831]        [] el0_svc_naked+0x24/0x28
> > [ 1086.808700]
> > [ 1086.808700] other info that might help us debug this:
> > [ 1086.808700]
> > [ 1086.816627]  Possible unsafe locking scenario:
> > [ 1086.816627]
> > [ 1086.822488]        CPU0                    CPU1
> > [ 1086.826971]        ----                    ----
> > [ 1086.831453]   lock(&policy->rwsem);
> > [ 1086.834918]                                lock(sb_writers#7);
> > [ 1086.840713]                                lock(&policy->rwsem);
> > [ 1086.846671]   lock(sb_writers#7);
> > [ 1086.849972]
> > [ 1086.849972]  *** DEADLOCK ***
> > [ 1086.849972]
> > [ 1086.855836] 1 lock held by runme.sh/1052:
> > [ 1086.859802]  #0:  (&policy->rwsem){+++++.}, at: [] cpufreq_offline+0x7c/0x278
> > [ 1086.868453]
> > [ 1086.868453] stack backtrace:
> > [ 1086.872769] CPU: 5 PID: 1052 Comm: runme.sh Not tainted 4.5.0-rc2+ #37
> > [ 1086.879229] Hardware name: ARM Juno development board (r2) (DT)
> > [ 1086.885089] Call trace:
> > [ 1086.887511] [] dump_backtrace+0x0/0x1f4
> > [ 1086.892858] [] show_stack+0x20/0x28
> > [ 1086.897861] [] dump_stack+0x84/0xc0
> > [ 1086.902863] [] print_circular_bug+0x1d4/0x2e4
> > [ 1086.908725] [] check_prev_add+0x13c/0x754
> > [ 1086.914244] [] validate_chain.isra.36+0x724/0xa0c
> > [ 1086.920448] [] __lock_acquire+0x4e4/0xba0
> > [ 1086.925965] [] lock_acquire+0xe4/0x204
> > [ 1086.931224] [] percpu_down_read+0x50/0xe4
> > [ 1086.936742] [] __sb_start_write+0xcc/0xe0
> > [ 1086.942260] [] mnt_want_write+0x28/0x54
> > [ 1086.947605] [] do_last+0x660/0xcb8
> > [ 1086.952520] [] path_openat+0x8c/0x2b0
> > [ 1086.957693] [] do_filp_open+0x78/0xf0
> > [ 1086.962865] [] do_sys_open+0x150/0x214
> > [ 1086.968123] [] SyS_openat+0x3c/0x48
> > [ 1086.973124] [] el0_svc_naked+0x24/0x28
> > [ 1087.019315] Detected PIPT I-cache on CPU1
> > [ 1087.019373] CPU1: Booted secondary processor [410fd080]
>
> Urg..
> Urg square :(
>
> I failed to understand it for now though. Please test only the first 4
> patches and leave the bottom three. AFAICT, this is caused by the 6th
> patch.

From the code I still failed to understand this until some time back,
when something caught my eye: the 6th patch needs this fixup (diff at
the end of this mail).

I tried the basic tests using './runme' and they aren't reporting the
same lockdep now. And yes, your lockdep occurred on my exynos board as
well :)

I have re-pushed my patches again to the same branch. All 7 look fine to
me now :)

-- 
viresh

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 7bc8a5ed97e5..ac3348ecde7b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1351,7 +1351,7 @@ static void cpufreq_offline(unsigned int cpu)
 				pr_err("%s: Failed to start governor\n", __func__);
 		}
 
-		return;
+		goto unlock;
 	}
 
 	if (cpufreq_driver->stop_cpu)
@@ -1373,6 +1373,8 @@ static void cpufreq_offline(unsigned int cpu)
 		cpufreq_driver->exit(policy);
 		policy->freq_table = NULL;
 	}
+
+unlock:
 	up_write(&policy->rwsem);
 }
 