From patchwork Fri Jan 9 16:16:10 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Daniel Thompson
X-Patchwork-Id: 42913
From: Daniel Thompson
To: Russell King, Will Deacon
Cc: Daniel Thompson, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, Shawn Guo, Sascha Hauer, Peter Zijlstra,
 Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, Thomas Gleixner,
 Lucas Stach, Linus Walleij, Mark Rutland, patches@linaro.org,
 linaro-kernel@lists.linaro.org, John Stultz, Sumit Semwal
Subject: [PATCH v4] arm: perf: Directly handle SMP platforms with one SPI
Date: Fri, 9 Jan 2015 16:16:10 +0000
Message-Id: <1420820170-6127-1-git-send-email-daniel.thompson@linaro.org>
X-Mailer: git-send-email 1.9.3
In-Reply-To: <1416581603-30557-1-git-send-email-daniel.thompson@linaro.org>
References: <1416581603-30557-1-git-send-email-daniel.thompson@linaro.org>

Some ARM platforms mux the PMU interrupt of every core into a single
SPI. On such platforms if the PMU of any core except 0 raises an
interrupt then it cannot be serviced and eventually, if you are lucky,
the spurious irq detection might forcefully disable the interrupt.

On these SoCs it is not possible to determine which core raised the
interrupt so work around this issue by queuing irqwork on the other
cores whenever the primary interrupt handler is unable to service the
interrupt.

The u8500 platform has an alternative workaround that dynamically alters
the affinity of the PMU interrupt. This workaround logic is no longer
required so the original code is removed as is the hook it relied upon.

Tested on imx6q (which has four cores/PMUs all muxed to a single SPI).

Signed-off-by: Daniel Thompson
---

Notes:
    v2 was tested on u8500 (thanks to Linus Walleij). v4 doesn't change
    anything conceptual but the changes were sufficient for me not to
    preserve the Tested-By:.

    v4:

    * Ripped out the logic that tried to preserve the operation of the
      spurious interrupt detector. It was complex and not really needed
      (Will Deacon).
    * Removed a redundant memory barrier and added a comment explaining
      why it is not needed (Will Deacon).
    * Made fully safe w.r.t. hotplug by falling back to a work queue
      if there is a hotplug operation in flight when the PMU interrupt
      comes in (Will Deacon). The work queue code paths have been tested
      synthetically (by changing the if condition).
    * Posted the correct, as in compilable and tested, version of the code
      (Will Deacon).

    v3:

    * Removed function pointer indirection when deploying workaround code
      and reorganised the code accordingly (Mark Rutland).
    * Moved the workaround state tracking into the existing percpu data
      structure (Mark Rutland).
    * Renamed cret to percpu_ret and rewrote the comment describing the
      purpose of this variable (Mark Rutland).
    * Copied the cpu_online_mask and used that to act on a consistent set
      of cpus throughout the workaround (Mark Rutland).
    * Changed "single_irq" to "muxed_spi" to more explicitly describe
      the problem.

    v2:

    * Fixed build problems on systems without SMP.

    v1:

    * Thanks to Lucas Stach, Russell King and Thomas Gleixner for
      critiquing an older, completely different way to tackle the same
      problem.

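For readers of the notes above, the countdown that decides when the muxed
SPI is re-enabled can be modelled in user space. The sketch below is
illustrative only and is not part of the patch: struct muxed_irq_model and
the model_* functions are hypothetical names, the per-CPU irq_work is run
synchronously in a loop rather than via irq_work_queue_on(), and the model
assumes at least two online CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct muxed_irq_model {
	atomic_int remaining;	/* models cpu_pmu->remaining_irq_work */
	bool irq_enabled;	/* models whether the shared SPI is unmasked */
	int serviced_by;	/* CPU that found its PMU overflowed, -1 if none */
};

/* Models cpu_pmu_do_percpu_work(): try to service, then count down. */
static void model_percpu_work(struct muxed_irq_model *m, int cpu,
			      int overflowed_cpu)
{
	if (cpu == overflowed_cpu)
		m->serviced_by = cpu;

	/* atomic_fetch_sub() returns the old value, so 1 means we were last. */
	if (atomic_fetch_sub(&m->remaining, 1) == 1)
		m->irq_enabled = true;	/* stands in for enable_irq() */
}

/* Models cpu_pmu_handle_irq_none() followed by cpu_pmu_queue_percpu_work(). */
static void model_handle_irq_none(struct muxed_irq_model *m, int this_cpu,
				  int ncpus, int overflowed_cpu)
{
	assert(ncpus > 1);	/* assumption of this model only */

	m->irq_enabled = false;	/* stands in for disable_irq_nosync() */
	atomic_fetch_add(&m->remaining, ncpus - 1);

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (cpu == this_cpu)
			continue;
		/* The patch queues irq_work here; the model runs it inline. */
		model_percpu_work(m, cpu, overflowed_cpu);
	}
}
```

Whichever CPU runs its work last brings the counter to zero and re-enables
the interrupt, so in this model the SPI stays masked until every other core
has had a chance to check its own PMU.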
 arch/arm/include/asm/pmu.h       |  13 ++++
 arch/arm/kernel/perf_event.c     |   9 +--
 arch/arm/kernel/perf_event_cpu.c | 148 +++++++++++++++++++++++++++++++++++++++
 arch/arm/kernel/perf_event_v7.c  |   2 +-
 arch/arm/mach-ux500/cpu-db8500.c |  29 --------
 5 files changed, 163 insertions(+), 38 deletions(-)

-- 
1.9.3

diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
index b1596bd59129..26c7d29c976d 100644
--- a/arch/arm/include/asm/pmu.h
+++ b/arch/arm/include/asm/pmu.h
@@ -87,6 +87,14 @@ struct pmu_hw_events {
 	 * already have to allocate this struct per cpu.
 	 */
 	struct arm_pmu *percpu_pmu;
+
+#ifdef CONFIG_SMP
+	/*
+	 * This is used to schedule workaround logic on platforms where all
+	 * the PMUs are attached to a single SPI.
+	 */
+	struct irq_work work;
+#endif
 };
 
 struct arm_pmu {
@@ -117,6 +125,11 @@ struct arm_pmu {
 	struct platform_device *plat_device;
 	struct pmu_hw_events __percpu *hw_events;
 	struct notifier_block hotplug_nb;
+#ifdef CONFIG_SMP
+	int muxed_spi_workaround_irq;
+	struct work_struct muxed_spi_workaround_work;
+	atomic_t remaining_irq_work;
+#endif
 };
 
 #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index f7c65adaa428..e5c537b57f94 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -299,8 +299,6 @@ validate_group(struct perf_event *event)
 static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 {
 	struct arm_pmu *armpmu;
-	struct platform_device *plat_device;
-	struct arm_pmu_platdata *plat;
 	int ret;
 	u64 start_clock, finish_clock;
 
@@ -311,14 +309,9 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 	 * dereference.
 	 */
 	armpmu = *(void **)dev;
-	plat_device = armpmu->plat_device;
-	plat = dev_get_platdata(&plat_device->dev);
 
 	start_clock = sched_clock();
-	if (plat && plat->handle_irq)
-		ret = plat->handle_irq(irq, armpmu, armpmu->handle_irq);
-	else
-		ret = armpmu->handle_irq(irq, armpmu);
+	ret = armpmu->handle_irq(irq, armpmu);
 	finish_clock = sched_clock();
 
 	perf_sample_event_took(finish_clock - start_clock);
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index dd9acc95ebc0..76227484baa9 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -59,6 +59,142 @@ int perf_num_counters(void)
 }
 EXPORT_SYMBOL_GPL(perf_num_counters);
 
+#ifdef CONFIG_SMP
+/*
+ * Workaround logic that is distributed to all cores if the PMU has only
+ * a single IRQ and the CPU receiving that IRQ cannot handle it. Its
+ * job is to try to service the interrupt on the current CPU. It will
+ * also enable the IRQ again if all the other CPUs have already tried to
+ * service it.
+ */
+static void cpu_pmu_do_percpu_work(struct irq_work *w)
+{
+	struct pmu_hw_events *hw_events =
+		container_of(w, struct pmu_hw_events, work);
+	struct arm_pmu *cpu_pmu = hw_events->percpu_pmu;
+
+	/* Ignore the return code, we can do nothing useful with it */
+	cpu_pmu->handle_irq(0, cpu_pmu);
+
+	if (atomic_dec_and_test(&cpu_pmu->remaining_irq_work))
+		enable_irq(cpu_pmu->muxed_spi_workaround_irq);
+}
+
+/*
+ * Issue work to the other CPUs. Must be called whilst we own the
+ * hotplug locks.
+ */
+static void cpu_pmu_queue_percpu_work(struct arm_pmu *cpu_pmu)
+{
+	int cpu;
+
+	atomic_add(num_online_cpus() - 1, &cpu_pmu->remaining_irq_work);
+
+	for_each_online_cpu(cpu) {
+		struct pmu_hw_events *hw_events =
+			per_cpu_ptr(cpu_pmu->hw_events, cpu);
+
+		if (cpu == smp_processor_id())
+			continue;
+
+		/*
+		 * We assume that the IPI within irq_work_queue_on()
+		 * implies a full memory barrier making the value of
+		 * cpu_pmu->remaining_irq_work visible to the target.
+		 */
+		if (!irq_work_queue_on(&hw_events->work, cpu))
+			if (atomic_dec_and_test(&cpu_pmu->remaining_irq_work))
+				enable_irq(cpu_pmu->muxed_spi_workaround_irq);
+	}
+}
+
+void cpu_pmu_muxed_spi_workaround_worker(struct work_struct *work)
+{
+	struct arm_pmu *cpu_pmu =
+		container_of(work, struct arm_pmu, muxed_spi_workaround_work);
+
+	get_online_cpus();
+	cpu_pmu_queue_percpu_work(cpu_pmu);
+	put_online_cpus();
+}
+
+/*
+ * Called when the main interrupt handler cannot determine the source
+ * of interrupt. It will deploy a workaround if we are running on an SMP
+ * platform with only a single muxed SPI.
+ *
+ * The workaround disables the interrupt and distributes irqwork to all
+ * other processors in the system. Hopefully one of them will clear the
+ * interrupt...
+ */
+static irqreturn_t cpu_pmu_handle_irq_none(int irq_num, struct arm_pmu *cpu_pmu)
+{
+
+	if (irq_num != cpu_pmu->muxed_spi_workaround_irq)
+		return IRQ_NONE;
+
+	disable_irq_nosync(cpu_pmu->muxed_spi_workaround_irq);
+
+	if (try_get_online_cpus()) {
+		cpu_pmu_queue_percpu_work(cpu_pmu);
+		put_online_cpus();
+	} else {
+		/*
+		 * There is a CPU hotplug operation in flight making it
+		 * unsafe for us to queue the percpu work. The PMU is
+		 * already silenced so we'll leave it like that and
+		 * schedule some work to tidy things up.
+		 *
+		 * Taking this code path should be very rare which is
+		 * good because the latencies involved here are way too
+		 * long for good profiling.
+		 */
+		schedule_work(&cpu_pmu->muxed_spi_workaround_work);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static int cpu_pmu_muxed_spi_workaround_init(struct arm_pmu *cpu_pmu)
+{
+	struct platform_device *pmu_device = cpu_pmu->plat_device;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct pmu_hw_events *hw_events =
+			per_cpu_ptr(cpu_pmu->hw_events, cpu);
+
+		init_irq_work(&hw_events->work, cpu_pmu_do_percpu_work);
+	}
+
+	INIT_WORK(&cpu_pmu->muxed_spi_workaround_work,
+		  cpu_pmu_muxed_spi_workaround_worker);
+	atomic_set(&cpu_pmu->remaining_irq_work, 0);
+	cpu_pmu->muxed_spi_workaround_irq = platform_get_irq(pmu_device, 0);
+
+	return 0;
+}
+
+static void cpu_pmu_muxed_spi_workaround_term(struct arm_pmu *cpu_pmu)
+{
+	cpu_pmu->muxed_spi_workaround_irq = 0;
+}
+#else /* CONFIG_SMP */
+static int cpu_pmu_muxed_spi_workaround_init(struct arm_pmu *cpu_pmu)
+{
+	return 0;
+}
+
+static void cpu_pmu_muxed_spi_workaround_term(struct arm_pmu *cpu_pmu)
+{
+}
+
+static irqreturn_t cpu_pmu_handle_irq_none(int irq_num, struct arm_pmu *cpu_pmu)
+{
+	return IRQ_NONE;
+}
+#endif /* CONFIG_SMP */
+
 /* Include the PMU-specific implementations. */
 #include "perf_event_xscale.c"
 #include "perf_event_v6.c"
@@ -98,6 +234,8 @@ static void cpu_pmu_free_irq(struct arm_pmu *cpu_pmu)
 			if (irq >= 0)
 				free_irq(irq, per_cpu_ptr(&hw_events->percpu_pmu, i));
 		}
+
+		cpu_pmu_muxed_spi_workaround_term(cpu_pmu);
 	}
 }
 
@@ -155,6 +293,16 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
 
 			cpumask_set_cpu(i, &cpu_pmu->active_irqs);
 		}
+
+		/*
+		 * If we are running SMP and have only one interrupt source
+		 * then get ready to share that single irq among the cores.
+		 */
+		if (nr_cpu_ids > 1 && irqs == 1) {
+			err = cpu_pmu_muxed_spi_workaround_init(cpu_pmu);
+			if (err)
+				return err;
+		}
 	}
 
 	return 0;
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 8993770c47de..0dd914c10803 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -792,7 +792,7 @@ static irqreturn_t armv7pmu_handle_irq(int irq_num, void *dev)
 	 * Did an overflow occur?
 	 */
 	if (!armv7_pmnc_has_overflowed(pmnc))
-		return IRQ_NONE;
+		return cpu_pmu_handle_irq_none(irq_num, cpu_pmu);
 
 	/*
 	 * Handle the counter(s) overflow(s)
diff --git a/arch/arm/mach-ux500/cpu-db8500.c b/arch/arm/mach-ux500/cpu-db8500.c
index 6f63954c8bde..917774999c5c 100644
--- a/arch/arm/mach-ux500/cpu-db8500.c
+++ b/arch/arm/mach-ux500/cpu-db8500.c
@@ -12,8 +12,6 @@
 #include
 #include
 #include
-#include
-#include
 #include
 #include
 #include
@@ -23,7 +21,6 @@
 #include
 #include
 
-#include
 #include
 
 #include "setup.h"
@@ -99,30 +96,6 @@ static void __init u8500_map_io(void)
 	iotable_init(u8500_io_desc, ARRAY_SIZE(u8500_io_desc));
 }
 
-/*
- * The PMU IRQ lines of two cores are wired together into a single interrupt.
- * Bounce the interrupt to the other core if it's not ours.
- */
-static irqreturn_t db8500_pmu_handler(int irq, void *dev, irq_handler_t handler)
-{
-	irqreturn_t ret = handler(irq, dev);
-	int other = !smp_processor_id();
-
-	if (ret == IRQ_NONE && cpu_online(other))
-		irq_set_affinity(irq, cpumask_of(other));
-
-	/*
-	 * We should be able to get away with the amount of IRQ_NONEs we give,
-	 * while still having the spurious IRQ detection code kick in if the
-	 * interrupt really starts hitting spuriously.
-	 */
-	return ret;
-}
-
-static struct arm_pmu_platdata db8500_pmu_platdata = {
-	.handle_irq = db8500_pmu_handler,
-};
-
 static const char *db8500_read_soc_id(void)
 {
 	void __iomem *uid = __io_address(U8500_BB_UID_BASE);
@@ -143,8 +116,6 @@ static struct device * __init db8500_soc_device_init(void)
 }
 
 static struct of_dev_auxdata u8500_auxdata_lookup[] __initdata = {
-	/* Requires call-back bindings. */
-	OF_DEV_AUXDATA("arm,cortex-a9-pmu", 0, "arm-pmu", &db8500_pmu_platdata),
 	/* Requires DMA bindings. */
 	OF_DEV_AUXDATA("stericsson,ux500-msp-i2s", 0x80123000,
 		       "ux500-msp-i2s.0", &msp0_platform_data),