From patchwork Thu Mar 30 21:01:24 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 96334
From: John Stultz
To: lkml
Cc: Tom Hromatka, Ingo Molnar, Thomas Gleixner, Daniel Lezcano,
 Richard Cochran, Prarit Bhargava, Stephen Boyd, John Stultz
Subject: [PATCH 9/9] sysrq: Reset the watchdog timers while displaying
 high-resolution timers
Date: Thu, 30 Mar 2017 14:01:24 -0700
Message-Id: <1490907684-11186-10-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1490907684-11186-1-git-send-email-john.stultz@linaro.org>
References: <1490907684-11186-1-git-send-email-john.stultz@linaro.org>

From: Tom Hromatka

On systems with a large number of CPUs, running sysrq- can cause
watchdog timeouts. There are two slow sections of code in the
sysrq- path in timer_list.c.

1. print_active_timers() - This function is called by print_cpu() and
   contains a slow goto loop. On a machine with hundreds of CPUs,
   this loop took approximately 100ms for the first CPU in a NUMA
   node. (Subsequent CPUs in the same node ran much quicker.) The
   total time to print all of the CPUs is ultimately long enough to
   trigger the soft lockup watchdog.

2. print_tickdevice() - This function outputs a large amount of
   textual information. This function also took approximately 100ms
   per CPU.

Since sysrq- is not a performance-critical path, there should be no
harm in touching the NMI watchdog in both slow sections above.
Touching it in just one location was insufficient on systems with
hundreds of CPUs as occasional timeouts were still observed during
testing.

This issue was observed on an Oracle T7 machine with 128 CPUs, but I
anticipate it may affect other systems with similarly large numbers
of CPUs.

Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Daniel Lezcano
Cc: Richard Cochran
Cc: Prarit Bhargava
Cc: Stephen Boyd
Signed-off-by: Tom Hromatka
Reviewed-by: Rob Gardner
Signed-off-by: John Stultz
---
 kernel/time/timer_list.c | 6 ++++++
 1 file changed, 6 insertions(+)

-- 
2.7.4

diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index ff8d5c1..0e7f542 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -86,6 +87,9 @@ print_active_timers(struct seq_file *m, struct hrtimer_clock_base *base,
 
 next_one:
 	i = 0;
+
+	touch_nmi_watchdog();
+
 	raw_spin_lock_irqsave(&base->cpu_base->lock, flags);
 
 	curr = timerqueue_getnext(&base->active);
@@ -197,6 +201,8 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
 {
 	struct clock_event_device *dev = td->evtdev;
 
+	touch_nmi_watchdog();
+
 	SEQ_printf(m, "Tick Device: mode: %d\n", td->mode);
 	if (cpu < 0)
 		SEQ_printf(m, "Broadcast device\n");
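
Illustrative note, not part of the patch: the effect of touching the
watchdog inside the slow sections can be modelled in user space. The
sketch below is a toy simulation only. The names sim_watchdog,
sim_touch(), sim_check() and sim_dump() are hypothetical, and the
200ms per-CPU cost and 20000ms threshold are assumed round numbers
chosen for the example, not measurements or kernel defaults taken
from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a lockup watchdog; not kernel code. */
struct sim_watchdog {
	unsigned long last_touch_ms;	/* time of the most recent touch */
	unsigned long threshold_ms;	/* longer silence counts as a lockup */
	bool fired;			/* set once the threshold is exceeded */
};

/* Model of touch_nmi_watchdog(): record that progress was made. */
static void sim_touch(struct sim_watchdog *wd, unsigned long now_ms)
{
	assert(wd != NULL);
	wd->last_touch_ms = now_ms;
}

/* Model of the periodic check: fire if nothing touched us recently. */
static void sim_check(struct sim_watchdog *wd, unsigned long now_ms)
{
	assert(wd != NULL);
	if (now_ms - wd->last_touch_ms > wd->threshold_ms)
		wd->fired = true;
}

/*
 * Simulate printing timer state for ncpus CPUs, each costing cost_ms.
 * With touch_per_cpu set, the watchdog is touched before each CPU's
 * slow section. Returns true if the simulated watchdog fired.
 */
bool sim_dump(unsigned int ncpus, unsigned long cost_ms,
	      unsigned long threshold_ms, bool touch_per_cpu)
{
	struct sim_watchdog wd = { 0, threshold_ms, false };
	unsigned long now_ms = 0;

	for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
		if (touch_per_cpu)
			sim_touch(&wd, now_ms);
		now_ms += cost_ms;	/* the slow per-CPU printing */
		sim_check(&wd, now_ms);
	}
	return wd.fired;
}
```

Under these assumed numbers, sim_dump(128, 200, 20000, false) reports
a lockup because the total time (25600ms) exceeds the threshold, while
sim_dump(128, 200, 20000, true) does not, since no single interval
between touches is longer than 200ms.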