
[v5,08/14] ARM: hisi: add hip04 SoC support

Message ID: 1399473888-12947-9-git-send-email-haojian.zhuang@linaro.org
State: Changes Requested

Commit Message

Haojian Zhuang May 7, 2014, 2:44 p.m. UTC
Hisilicon Hi3xxx SoCs are based on the Cortex-A9 core, while the new
HiP04 SoC is based on the Cortex-A15 core. Since HiP04 uses multiple
clusters, it can be built on MCPM.

HiP04 also supports LPAE in order to address large memory.

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
---
 .../bindings/arm/hisilicon/hisilicon.txt           |  19 ++
 .../devicetree/bindings/clock/hip04-clock.txt      |  20 ++
 arch/arm/mach-hisi/Kconfig                         |  10 +-
 arch/arm/mach-hisi/Makefile                        |   1 +
 arch/arm/mach-hisi/core.h                          |   2 +
 arch/arm/mach-hisi/hisilicon.c                     |  12 +
 arch/arm/mach-hisi/platmcpm.c                      | 314 +++++++++++++++++++++
 7 files changed, 377 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/clock/hip04-clock.txt
 create mode 100644 arch/arm/mach-hisi/platmcpm.c

Comments

Olof Johansson May 7, 2014, 4:50 p.m. UTC | #1
On Wed, May 07, 2014 at 10:44:42PM +0800, Haojian Zhuang wrote:
> Hisilicon Hi3xxx SoCs are based on the Cortex-A9 core, while the new
> HiP04 SoC is based on the Cortex-A15 core. Since HiP04 uses multiple
> clusters, it can be built on MCPM.
> 
> HiP04 also supports LPAE in order to address large memory.
> 
> Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
> ---
>  .../bindings/arm/hisilicon/hisilicon.txt           |  19 ++
>  .../devicetree/bindings/clock/hip04-clock.txt      |  20 ++
>  arch/arm/mach-hisi/Kconfig                         |  10 +-
>  arch/arm/mach-hisi/Makefile                        |   1 +
>  arch/arm/mach-hisi/core.h                          |   2 +
>  arch/arm/mach-hisi/hisilicon.c                     |  12 +
>  arch/arm/mach-hisi/platmcpm.c                      | 314 +++++++++++++++++++++
>  7 files changed, 377 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/devicetree/bindings/clock/hip04-clock.txt
>  create mode 100644 arch/arm/mach-hisi/platmcpm.c
> 
> diff --git a/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt b/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
> index df0a452..20736b0 100644
> --- a/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
> +++ b/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
> @@ -4,6 +4,10 @@ Hisilicon Platforms Device Tree Bindings
>  Hi4511 Board
>  Required root node properties:
>  	- compatible = "hisilicon,hi3620-hi4511";
> +HiP04 D01 Board
> +Required root node properties:
> +	- compatible = "hisilicon,hip04-d01";
> +
>  
>  Hisilicon system controller
>  
> @@ -19,6 +23,13 @@ Optional properties:
>  		If reg value is not zero, cpun exit wfi and go
>  - resume-offset : offset in sysctrl for notifying cpu0 when resume
>  - reboot-offset : offset in sysctrl for system reboot
> +- relocation-entry : relocation address of secondary cpu boot code
> +- relocation-size : relocation size of secondary cpu boot code
> +- bootwrapper-phys : physical address of boot wrapper
> +- bootwrapper-size : size of boot wrapper
> +- bootwrapper-magic : magic number for secondary cpu in boot wrapper
> +The memory area of [bootwrapper-phys : bootwrapper-phys+bootwrapper-size]
> +should be reserved. This should be set in /memreserve/ node in DTS file.

This binding spec should be split up into separate files; it's hard to
tell from the patch which part of the binding this is appending to.
Please do that in a preceding patch, and then add the new contents on
top.


-Olof
Nicolas Pitre May 7, 2014, 5:44 p.m. UTC | #2
On Wed, 7 May 2014, Haojian Zhuang wrote:

> Hisilicon Hi3xxx SoCs are based on the Cortex-A9 core, while the new
> HiP04 SoC is based on the Cortex-A15 core. Since HiP04 uses multiple
> clusters, it can be built on MCPM.
> 
> HiP04 also supports LPAE in order to address large memory.

This makes for a large patch with multiple things mixed together (DT
bindings, MCPM backend, etc.). It would be easier to review if split
into separate patches.

> Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
> ---
>  .../bindings/arm/hisilicon/hisilicon.txt           |  19 ++
>  .../devicetree/bindings/clock/hip04-clock.txt      |  20 ++
>  arch/arm/mach-hisi/Kconfig                         |  10 +-
>  arch/arm/mach-hisi/Makefile                        |   1 +
>  arch/arm/mach-hisi/core.h                          |   2 +
>  arch/arm/mach-hisi/hisilicon.c                     |  12 +
>  arch/arm/mach-hisi/platmcpm.c                      | 314 +++++++++++++++++++++
>  7 files changed, 377 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/devicetree/bindings/clock/hip04-clock.txt
>  create mode 100644 arch/arm/mach-hisi/platmcpm.c

[...]

> diff --git a/arch/arm/mach-hisi/platmcpm.c b/arch/arm/mach-hisi/platmcpm.c
> new file mode 100644
> index 0000000..e0d7afd
> --- /dev/null
> +++ b/arch/arm/mach-hisi/platmcpm.c
> @@ -0,0 +1,314 @@
> +/*
> + * Copyright (c) 2013-2014 Linaro Ltd.
> + * Copyright (c) 2013-2014 Hisilicon Limited.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + */
> +#include <linux/delay.h>
> +#include <linux/io.h>
> +#include <linux/memblock.h>
> +#include <linux/of_address.h>
> +
> +#include <asm/cputype.h>
> +#include <asm/cp15.h>
> +#include <asm/mcpm.h>
> +
> +#include "core.h"
> +
> +/* bits definition in SC_CPU_RESET_REQ[x]/SC_CPU_RESET_DREQ[x]
> + * 1 -- unreset; 0 -- reset
> + */
> +#define CORE_RESET_BIT(x)		(1 << x)
> +#define NEON_RESET_BIT(x)		(1 << (x + 4))
> +#define CORE_DEBUG_RESET_BIT(x)		(1 << (x + 9))
> +#define CLUSTER_L2_RESET_BIT		(1 << 8)
> +#define CLUSTER_DEBUG_RESET_BIT		(1 << 13)
> +
> +/*
> + * bits definition in SC_CPU_RESET_STATUS[x]
> + * 1 -- reset status; 0 -- unreset status
> + */
> +#define CORE_RESET_STATUS(x)		(1 << x)
> +#define NEON_RESET_STATUS(x)		(1 << (x + 4))
> +#define CORE_DEBUG_RESET_STATUS(x)	(1 << (x + 9))
> +#define CLUSTER_L2_RESET_STATUS		(1 << 8)
> +#define CLUSTER_DEBUG_RESET_STATUS	(1 << 13)
> +#define CORE_WFI_STATUS(x)		(1 << (x + 16))
> +#define CORE_WFE_STATUS(x)		(1 << (x + 20))
> +#define CORE_DEBUG_ACK(x)		(1 << (x + 24))
> +
> +#define SC_CPU_RESET_REQ(x)		(0x520 + (x << 3))	/* reset */
> +#define SC_CPU_RESET_DREQ(x)		(0x524 + (x << 3))	/* unreset */
> +#define SC_CPU_RESET_STATUS(x)		(0x1520 + (x << 3))
> +
> +#define FAB_SF_MODE			0x0c
> +#define FAB_SF_INVLD			0x10
> +
> +/* bits definition in FB_SF_INVLD */
> +#define FB_SF_INVLD_START		(1 << 8)
> +
> +#define HIP04_MAX_CLUSTERS		4
> +#define HIP04_MAX_CPUS_PER_CLUSTER	4
> +
> +#define POLL_MSEC	10
> +#define TIMEOUT_MSEC	1000
> +
> +struct hip04_secondary_cpu_data {
> +	u32	bootwrapper_phys;
> +	u32	bootwrapper_size;
> +	u32	bootwrapper_magic;
> +	u32	relocation_entry;
> +	u32	relocation_size;
> +};
> +
> +static void __iomem *relocation, *sysctrl, *fabric;
> +static int hip04_cpu_table[HIP04_MAX_CLUSTERS][HIP04_MAX_CPUS_PER_CLUSTER];
> +static DEFINE_SPINLOCK(boot_lock);
> +static struct hip04_secondary_cpu_data hip04_boot;
> +
> +static bool hip04_cluster_down(unsigned int cluster)
> +{
> +	int i;
> +
> +	for (i = 0; i < HIP04_MAX_CPUS_PER_CLUSTER; i++)
> +		if (hip04_cpu_table[cluster][i])
> +			return false;
> +	return true;
> +}
> +
> +static void hip04_set_snoop_filter(unsigned int cluster, unsigned int on)
> +{
> +	unsigned long data;
> +
> +	if (!fabric)
> +		return;
> +	data = readl_relaxed(fabric + FAB_SF_MODE);
> +	if (on)
> +		data |= 1 << cluster;
> +	else
> +		data &= ~(1 << cluster);
> +	writel_relaxed(data, fabric + FAB_SF_MODE);
> +	while (1) {
> +		if (data == readl_relaxed(fabric + FAB_SF_MODE))
> +			break;
> +	}
> +}
> +
> +static int hip04_mcpm_power_up(unsigned int cpu, unsigned int cluster)
> +{
> +	unsigned long data, mask;
> +
> +	if (!relocation || !sysctrl)
> +		return -ENODEV;
> +	if (cluster >= HIP04_MAX_CLUSTERS || cpu >= HIP04_MAX_CPUS_PER_CLUSTER)
> +		return -EINVAL;
> +
> +	spin_lock_irq(&boot_lock);
> +	writel_relaxed(hip04_boot.bootwrapper_phys, relocation);
> +	writel_relaxed(hip04_boot.bootwrapper_magic, relocation + 4);
> +	writel_relaxed(virt_to_phys(mcpm_entry_point), relocation + 8);
> +	writel_relaxed(0, relocation + 12);
> +
> +	if (hip04_cluster_down(cluster)) {
> +		data = CLUSTER_DEBUG_RESET_BIT;
> +		writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
> +		do {
> +			mask = CLUSTER_DEBUG_RESET_STATUS;
> +			data = readl_relaxed(sysctrl + \
> +					     SC_CPU_RESET_STATUS(cluster));
> +		} while (data & mask);
> +		hip04_set_snoop_filter(cluster, 1);
> +	}

Isn't this racy?  What if the new CPU starts executing code before 
snoops are enabled for it?

Normally we solve this race by letting the waking-up CPU turn on its 
snoops by itself from an assembly code callback provided as argument to 
mcpm_sync_init().  This is also the only way to eventually support deep 
C-states with cpuidle.

> +
> +	hip04_cpu_table[cluster][cpu]++;
> +
> +	data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
> +	       CORE_DEBUG_RESET_BIT(cpu);
> +	writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
> +	spin_unlock_irq(&boot_lock);
> +	msleep(POLL_MSEC);

What is that msleep() for?

> +	return 0;
> +}
> +
> +static void hip04_mcpm_power_down(void)
> +{
> +	unsigned int mpidr, cpu, cluster;
> +	bool skip_wfi = false;
> +
> +	mpidr = read_cpuid_mpidr();
> +	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
> +	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
> +
> +	__mcpm_cpu_going_down(cpu, cluster);
> +
> +	spin_lock(&boot_lock);
> +	BUG_ON(__mcpm_cluster_state(cluster) != CLUSTER_UP);
> +	hip04_cpu_table[cluster][cpu]--;
> +	if (hip04_cpu_table[cluster][cpu] == 1) {
> +		/* A power_up request went ahead of us. */
> +		skip_wfi = true;
> +	} else if (hip04_cpu_table[cluster][cpu] > 1) {
> +		pr_err("Cluster %d CPU%d is still running\n", cluster, cpu);
> +		BUG();
> +	}
> +
> +	spin_unlock(&boot_lock);
> +
> +	v7_exit_coherency_flush(louis);

Don't you have to turn off snoop filters in the case of a complete 
cluster down as well?
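
For illustration, here is a minimal sketch of that last-man path inside
hip04_mcpm_power_down(), reusing only helpers already present in the
patch. It assumes the dying CPU can still safely write the fabric
register after the cache flush, which this thread does not confirm:

        bool last_man;

        spin_lock(&boot_lock);
        hip04_cpu_table[cluster][cpu]--;
        /* true when no CPU of this cluster is marked as running any more */
        last_man = hip04_cluster_down(cluster);
        spin_unlock(&boot_lock);

        if (last_man) {
                /* last CPU down: flush all cache levels, then stop snoops */
                v7_exit_coherency_flush(all);
                hip04_set_snoop_filter(cluster, 0);
        } else {
                v7_exit_coherency_flush(louis);
        }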

> +
> +	__mcpm_cpu_down(cpu, cluster);
> +
> +	isb();
> +	dsb();

Those isb/dsb are redundant here.  They're implicit from 
__mcpm_cpu_down().

> +
> +	if (!skip_wfi)
> +		wfi();
> +}
> +
> +static int hip04_mcpm_wait_for_powerdown(unsigned int cpu, unsigned int cluster)
> +{
> +	unsigned int data, tries;
> +
> +	BUG_ON(cluster >= HIP04_MAX_CLUSTERS ||
> +	       cpu >= HIP04_MAX_CPUS_PER_CLUSTER);
> +
> +	for (tries = 0; tries < TIMEOUT_MSEC / POLL_MSEC; tries++) {
> +		data = readl_relaxed(sysctrl + SC_CPU_RESET_STATUS(cluster));
> +		if (!(data & CORE_WFI_STATUS(cpu))) {
> +			msleep(POLL_MSEC);
> +			continue;
> +		}
> +		data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
> +		       CORE_DEBUG_RESET_BIT(cpu);
> +		writel_relaxed(data, sysctrl + SC_CPU_RESET_REQ(cluster));

You are not supposed to perform any changes here.  This method is there 
only to wait for the given CPU to be down.  It is not guaranteed it will be
called.  Furthermore this can race with the power_up method.

Normally you should perform this reset business from 
hip04_mcpm_power_down() while the lock is held.  Now... this assumes 
that any power/reset would be effective only when the CPU executes a 
WFI.  Is this the case on this hardware?  If not then we'll have to 
think about a different strategy.
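
To make that concrete, here is a sketch of hip04_mcpm_power_down() with
the reset request issued under boot_lock instead (still assuming, as
asked above, that the reset only takes effect once this CPU has entered
WFI):

        static void hip04_mcpm_power_down(void)
        {
                unsigned int mpidr, cpu, cluster, data;
                bool skip_wfi = false;

                mpidr = read_cpuid_mpidr();
                cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
                cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);

                __mcpm_cpu_going_down(cpu, cluster);

                spin_lock(&boot_lock);
                BUG_ON(__mcpm_cluster_state(cluster) != CLUSTER_UP);
                hip04_cpu_table[cluster][cpu]--;
                if (hip04_cpu_table[cluster][cpu] == 1) {
                        /* A power_up request went ahead of us. */
                        skip_wfi = true;
                } else if (hip04_cpu_table[cluster][cpu] > 1) {
                        pr_err("Cluster %d CPU%d is still running\n",
                               cluster, cpu);
                        BUG();
                } else {
                        /* Request the reset here, while the lock is held;
                         * it takes effect once this CPU executes WFI below. */
                        data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) |
                               CORE_DEBUG_RESET_BIT(cpu);
                        writel_relaxed(data, sysctrl + SC_CPU_RESET_REQ(cluster));
                }
                spin_unlock(&boot_lock);

                v7_exit_coherency_flush(louis);

                __mcpm_cpu_down(cpu, cluster);

                if (!skip_wfi)
                        wfi();
        }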

> +		return 0;
> +	}
> +
> +	return -ETIMEDOUT;
> +}
> +
> +static void hip04_mcpm_powered_up(void)
> +{
> +	if (!relocation)
> +		return;
> +	spin_lock(&boot_lock);
> +	writel_relaxed(0, relocation);
> +	writel_relaxed(0, relocation + 4);
> +	writel_relaxed(0, relocation + 8);
> +	writel_relaxed(0, relocation + 12);
> +	spin_unlock(&boot_lock);
> +}
> +
> +static const struct mcpm_platform_ops hip04_mcpm_ops = {
> +	.power_up		= hip04_mcpm_power_up,
> +	.power_down		= hip04_mcpm_power_down,
> +	.wait_for_powerdown	= hip04_mcpm_wait_for_powerdown,
> +	.powered_up		= hip04_mcpm_powered_up,
> +};
> +
> +static bool __init hip04_cpu_table_init(void)
> +{
> +	unsigned int mpidr, cpu, cluster;
> +
> +	mpidr = read_cpuid_mpidr();
> +	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
> +	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
> +
> +	if (cluster >= HIP04_MAX_CLUSTERS ||
> +	    cpu >= HIP04_MAX_CPUS_PER_CLUSTER) {
> +		pr_err("%s: boot CPU is out of bound!\n", __func__);
> +		return false;
> +	}
> +	hip04_set_snoop_filter(cluster, 1);
> +	hip04_cpu_table[cluster][cpu] = 1;
> +	return true;
> +}
> +
> +static int __init hip04_mcpm_init(void)
> +{
> +	struct device_node *np, *np_fab;
> +	int ret = -ENODEV;
> +
> +	np = of_find_compatible_node(NULL, NULL, "hisilicon,sysctrl");
> +	if (!np)
> +		goto err;
> +	np_fab = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-fabric");
> +	if (!np_fab)
> +		goto err;
> +
> +	if (of_property_read_u32(np, "bootwrapper-phys",
> +				 &hip04_boot.bootwrapper_phys)) {
> +		pr_err("failed to get bootwrapper-phys\n");
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +	if (of_property_read_u32(np, "bootwrapper-size",
> +				 &hip04_boot.bootwrapper_size)) {
> +		pr_err("failed to get bootwrapper-size\n");
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +	if (of_property_read_u32(np, "bootwrapper-magic",
> +				 &hip04_boot.bootwrapper_magic)) {
> +		pr_err("failed to get bootwrapper-magic\n");
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +	if (of_property_read_u32(np, "relocation-entry",
> +				 &hip04_boot.relocation_entry)) {
> +		pr_err("failed to get relocation-entry\n");
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +	if (of_property_read_u32(np, "relocation-size",
> +				 &hip04_boot.relocation_size)) {
> +		pr_err("failed to get relocation-size\n");
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
> +	relocation = ioremap(hip04_boot.relocation_entry,
> +			     hip04_boot.relocation_size);
> +	if (!relocation) {
> +		pr_err("failed to map relocation space\n");
> +		ret = -ENOMEM;
> +		goto err;
> +	}
> +	sysctrl = of_iomap(np, 0);
> +	if (!sysctrl) {
> +		pr_err("failed to get sysctrl base\n");
> +		ret = -ENOMEM;
> +		goto err_sysctrl;
> +	}
> +	fabric = of_iomap(np_fab, 0);
> +	if (!fabric) {
> +		pr_err("failed to get fabric base\n");
> +		ret = -ENOMEM;
> +		goto err_fabric;
> +	}
> +
> +	if (!hip04_cpu_table_init())
> +		return -EINVAL;
> +	ret = mcpm_platform_register(&hip04_mcpm_ops);
> +	if (!ret) {
> +		mcpm_sync_init(NULL);
> +		pr_info("HiP04 MCPM initialized\n");
> +	}
> +	return ret;
> +err_fabric:
> +	iounmap(sysctrl);
> +err_sysctrl:
> +	iounmap(relocation);
> +err:
> +	return ret;
> +}
> +early_initcall(hip04_mcpm_init);
> +
> +bool __init hip04_smp_init_ops(void)
> +{
> +	mcpm_smp_set_ops();
> +	return true;
> +}

You should be able to call mcpm_smp_set_ops() directly from 
hip04_mcpm_init(), and only if there were no errors.
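
In other words, a sketch of the tail of hip04_mcpm_init() with the SMP
ops installed only on success, which would let hip04_smp_init_ops() go
away entirely:

        ret = mcpm_platform_register(&hip04_mcpm_ops);
        if (!ret) {
                mcpm_sync_init(NULL);
                /* install the MCPM SMP ops only once everything succeeded */
                mcpm_smp_set_ops();
                pr_info("HiP04 MCPM initialized\n");
        }
        return ret;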


Nicolas
Haojian Zhuang May 12, 2014, 12:44 a.m. UTC | #3
On 11 May 2014 16:02, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
>
>
> On 8 May 2014 01:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> On Wed, 7 May 2014, Haojian Zhuang wrote:
>>
>>> +
>>> +static int hip04_mcpm_power_up(unsigned int cpu, unsigned int cluster)
>>> +{
>>> +     unsigned long data, mask;
>>> +
>>> +     if (!relocation || !sysctrl)
>>> +             return -ENODEV;
>>> +     if (cluster >= HIP04_MAX_CLUSTERS || cpu >=
>>> HIP04_MAX_CPUS_PER_CLUSTER)
>>> +             return -EINVAL;
>>> +
>>> +     spin_lock_irq(&boot_lock);
>>> +     writel_relaxed(hip04_boot.bootwrapper_phys, relocation);
>>> +     writel_relaxed(hip04_boot.bootwrapper_magic, relocation + 4);
>>> +     writel_relaxed(virt_to_phys(mcpm_entry_point), relocation + 8);
>>> +     writel_relaxed(0, relocation + 12);
>>> +
>>> +     if (hip04_cluster_down(cluster)) {
>>> +             data = CLUSTER_DEBUG_RESET_BIT;
>>> +             writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
>>> +             do {
>>> +                     mask = CLUSTER_DEBUG_RESET_STATUS;
>>> +                     data = readl_relaxed(sysctrl + \
>>> +                                          SC_CPU_RESET_STATUS(cluster));
>>> +             } while (data & mask);
>>> +             hip04_set_snoop_filter(cluster, 1);
>>> +     }
>>
>> Isn't this racy?  What if the new CPU starts executing code before
>> snoops are enabled for it?
>>
>> Normally we solve this race by letting the waking-up CPU turn on its
>> snoops by itself from an assembly code callback provided as argument to
>> mcpm_sync_init().  This is also the only way to eventually support deep
>> C-states with cpuidle.
>>
>
> There's no race. Here, I enable snoops without taking the new CPU out of reset.
>
>>> +
>>> +     hip04_cpu_table[cluster][cpu]++;
>>> +
>>> +     data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
>>> +            CORE_DEBUG_RESET_BIT(cpu);
>>> +     writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
>
> Here, I take the new CPU out of reset.
>
>
>>> +     spin_unlock_irq(&boot_lock);
>>> +     msleep(POLL_MSEC);
>>
>> What is that msleep() for?
>>
> Without this delay, I hit a hang.
> 
> Since I'm using reset to wake the CPU from wfi/wfe state, I don't need
> an IPI here. If I remove the msleep() here, it seems that the IPI hangs
> my system. So it's a workaround on my system.
>

I'm sorry, it's not actually a hang. The result is that the new CPU
cannot come online.

>>> +     return 0;
>>> +}
>>> +
>>> +static void hip04_mcpm_power_down(void)
>>> +{
>>> +     unsigned int mpidr, cpu, cluster;
>>> +     bool skip_wfi = false;
>>> +
>>> +     mpidr = read_cpuid_mpidr();
>>> +     cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
>>> +     cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
>>> +
>>> +     __mcpm_cpu_going_down(cpu, cluster);
>>> +
>>> +     spin_lock(&boot_lock);
>>> +     BUG_ON(__mcpm_cluster_state(cluster) != CLUSTER_UP);
>>> +     hip04_cpu_table[cluster][cpu]--;
>>> +     if (hip04_cpu_table[cluster][cpu] == 1) {
>>> +             /* A power_up request went ahead of us. */
>>> +             skip_wfi = true;
>>> +     } else if (hip04_cpu_table[cluster][cpu] > 1) {
>>> +             pr_err("Cluster %d CPU%d is still running\n", cluster,
>>> cpu);
>>> +             BUG();
>>> +     }
>>> +
>>> +     spin_unlock(&boot_lock);
>>> +
>>> +     v7_exit_coherency_flush(louis);
>>
>> Don't you have to turn off snoop filters in the case of a complete
>> cluster down as well?
>>
>
> I still have some issues with disabling a whole cluster, so completing
> that will be done later in an additional patch.
>
>
>>> +
>>> +     __mcpm_cpu_down(cpu, cluster);
>>> +
>>> +     isb();
>>> +     dsb();
>>
>> Those isb/dsb are redundant here.  They're implicit from
>> __mcpm_cpu_down().
>>
> OK. I'll remove them.
>
>
>>> +
>>> +     if (!skip_wfi)
>>> +             wfi();
>>> +}
>>> +
>>> +static int hip04_mcpm_wait_for_powerdown(unsigned int cpu, unsigned int
>>> cluster)
>>> +{
>>> +     unsigned int data, tries;
>>> +
>>> +     BUG_ON(cluster >= HIP04_MAX_CLUSTERS ||
>>> +            cpu >= HIP04_MAX_CPUS_PER_CLUSTER);
>>> +
>>> +     for (tries = 0; tries < TIMEOUT_MSEC / POLL_MSEC; tries++) {
>>> +             data = readl_relaxed(sysctrl +
>>> SC_CPU_RESET_STATUS(cluster));
>>> +             if (!(data & CORE_WFI_STATUS(cpu))) {
>>> +                     msleep(POLL_MSEC);
>>> +                     continue;
>>> +             }
>>> +             data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
>>> +                    CORE_DEBUG_RESET_BIT(cpu);
>>> +             writel_relaxed(data, sysctrl + SC_CPU_RESET_REQ(cluster));
>>
>> You are not supposed to perform any changes here.  This method is there
>> only to wait for the given CPU to be down.  It is not guaranteed it will be
>> called.  Furthermore this can race with the power_up method.
>>
>> Normally you should perform this reset business from
>> hip04_mcpm_power_down() while the lock is held.  Now... this assumes
>> that any power/reset would be effective only when the CPU executes a
>> WFI.  Is this the case on this hardware?  If not then we'll have to
>> think about a different strategy.
>>
> OK. I can move the reset operation into hip04_mcpm_power_down().
>
>>> +             return 0;
>>> +     }
>>> +
>>> +     return -ETIMEDOUT;
>>> +}
>>> +
>>> +static void hip04_mcpm_powered_up(void)
>>> +{
>>> +     if (!relocation)
>>> +             return;
>>> +     spin_lock(&boot_lock);
>>> +     writel_relaxed(0, relocation);
>>> +     writel_relaxed(0, relocation + 4);
>>> +     writel_relaxed(0, relocation + 8);
>>> +     writel_relaxed(0, relocation + 12);
>>> +     spin_unlock(&boot_lock);
>>> +}
>>> +
>>> +static const struct mcpm_platform_ops hip04_mcpm_ops = {
>>> +     .power_up               = hip04_mcpm_power_up,
>>> +     .power_down             = hip04_mcpm_power_down,
>>> +     .wait_for_powerdown     = hip04_mcpm_wait_for_powerdown,
>>> +     .powered_up             = hip04_mcpm_powered_up,
>>> +};
>>> +
>>> +static bool __init hip04_cpu_table_init(void)
>>> +{
>>> +     unsigned int mpidr, cpu, cluster;
>>> +
>>> +     mpidr = read_cpuid_mpidr();
>>> +     cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
>>> +     cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
>>> +
>>> +     if (cluster >= HIP04_MAX_CLUSTERS ||
>>> +         cpu >= HIP04_MAX_CPUS_PER_CLUSTER) {
>>> +             pr_err("%s: boot CPU is out of bound!\n", __func__);
>>> +             return false;
>>> +     }
>>> +     hip04_set_snoop_filter(cluster, 1);
>>> +     hip04_cpu_table[cluster][cpu] = 1;
>>> +     return true;
>>> +}
>>> +
>>> +static int __init hip04_mcpm_init(void)
>>> +{
>>> +     struct device_node *np, *np_fab;
>>> +     int ret = -ENODEV;
>>> +
>>> +     np = of_find_compatible_node(NULL, NULL, "hisilicon,sysctrl");
>>> +     if (!np)
>>> +             goto err;
>>> +     np_fab = of_find_compatible_node(NULL, NULL,
>>> "hisilicon,hip04-fabric");
>>> +     if (!np_fab)
>>> +             goto err;
>>> +
>>> +     if (of_property_read_u32(np, "bootwrapper-phys",
>>> +                              &hip04_boot.bootwrapper_phys)) {
>>> +             pr_err("failed to get bootwrapper-phys\n");
>>> +             ret = -EINVAL;
>>> +             goto err;
>>> +     }
>>> +     if (of_property_read_u32(np, "bootwrapper-size",
>>> +                              &hip04_boot.bootwrapper_size)) {
>>> +             pr_err("failed to get bootwrapper-size\n");
>>> +             ret = -EINVAL;
>>> +             goto err;
>>> +     }
>>> +     if (of_property_read_u32(np, "bootwrapper-magic",
>>> +                              &hip04_boot.bootwrapper_magic)) {
>>> +             pr_err("failed to get bootwrapper-magic\n");
>>> +             ret = -EINVAL;
>>> +             goto err;
>>> +     }
>>> +     if (of_property_read_u32(np, "relocation-entry",
>>> +                              &hip04_boot.relocation_entry)) {
>>> +             pr_err("failed to get relocation-entry\n");
>>> +             ret = -EINVAL;
>>> +             goto err;
>>> +     }
>>> +     if (of_property_read_u32(np, "relocation-size",
>>> +                              &hip04_boot.relocation_size)) {
>>> +             pr_err("failed to get relocation-size\n");
>>> +             ret = -EINVAL;
>>> +             goto err;
>>> +     }
>>> +
>>> +     relocation = ioremap(hip04_boot.relocation_entry,
>>> +                          hip04_boot.relocation_size);
>>> +     if (!relocation) {
>>> +             pr_err("failed to map relocation space\n");
>>> +             ret = -ENOMEM;
>>> +             goto err;
>>> +     }
>>> +     sysctrl = of_iomap(np, 0);
>>> +     if (!sysctrl) {
>>> +             pr_err("failed to get sysctrl base\n");
>>> +             ret = -ENOMEM;
>>> +             goto err_sysctrl;
>>> +     }
>>> +     fabric = of_iomap(np_fab, 0);
>>> +     if (!fabric) {
>>> +             pr_err("failed to get fabric base\n");
>>> +             ret = -ENOMEM;
>>> +             goto err_fabric;
>>> +     }
>>> +
>>> +     if (!hip04_cpu_table_init())
>>> +             return -EINVAL;
>>> +     ret = mcpm_platform_register(&hip04_mcpm_ops);
>>> +     if (!ret) {
>>> +             mcpm_sync_init(NULL);
>>> +             pr_info("HiP04 MCPM initialized\n");
>>> +     }
>>> +     return ret;
>>> +err_fabric:
>>> +     iounmap(sysctrl);
>>> +err_sysctrl:
>>> +     iounmap(relocation);
>>> +err:
>>> +     return ret;
>>> +}
>>> +early_initcall(hip04_mcpm_init);
>>> +
>>> +bool __init hip04_smp_init_ops(void)
>>> +{
>>> +     mcpm_smp_set_ops();
>>> +     return true;
>>> +}
>>
>> You should be able to call mcpm_smp_set_ops() directly from
>> hip04_mcpm_init(), and only if there were no errors.
>>
> if (!mdesc->smp_init || !mdesc->smp_init()) {
>         if (psci_smp_available())
>                 smp_set_ops(&psci_smp_ops);
>         else if (mdesc->smp)
>                 smp_set_ops(mdesc->smp);
> }
>
> Since the above code checks the return value of mdesc->smp_init(),
> I think it's better to call mcpm_smp_set_ops() in mdesc->smp_init().
>
> Otherwise, my mdesc->smp_init() would only return true without doing
> anything, which would confuse people reading the code.
>
>>
>> Nicolas
Nicolas Pitre May 12, 2014, 7:39 p.m. UTC | #4
On Mon, 12 May 2014, Haojian Zhuang wrote:

> On 11 May 2014 16:02, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
> >
> >
> > On 8 May 2014 01:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >> On Wed, 7 May 2014, Haojian Zhuang wrote:
> >>
> >>> +
> >>> +static int hip04_mcpm_power_up(unsigned int cpu, unsigned int cluster)
> >>> +{
> >>> +     unsigned long data, mask;
> >>> +
> >>> +     if (!relocation || !sysctrl)
> >>> +             return -ENODEV;
> >>> +     if (cluster >= HIP04_MAX_CLUSTERS || cpu >=
> >>> HIP04_MAX_CPUS_PER_CLUSTER)
> >>> +             return -EINVAL;
> >>> +
> >>> +     spin_lock_irq(&boot_lock);
> >>> +     writel_relaxed(hip04_boot.bootwrapper_phys, relocation);
> >>> +     writel_relaxed(hip04_boot.bootwrapper_magic, relocation + 4);
> >>> +     writel_relaxed(virt_to_phys(mcpm_entry_point), relocation + 8);
> >>> +     writel_relaxed(0, relocation + 12);
> >>> +
> >>> +     if (hip04_cluster_down(cluster)) {
> >>> +             data = CLUSTER_DEBUG_RESET_BIT;
> >>> +             writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
> >>> +             do {
> >>> +                     mask = CLUSTER_DEBUG_RESET_STATUS;
> >>> +                     data = readl_relaxed(sysctrl + \
> >>> +                                          SC_CPU_RESET_STATUS(cluster));
> >>> +             } while (data & mask);
> >>> +             hip04_set_snoop_filter(cluster, 1);
> >>> +     }
> >>
> >> Isn't this racy?  What if the new CPU starts executing code before
> >> snoops are enabled for it?
> >>
> >> Normally we solve this race by letting the waking-up CPU turn on its
> >> snoops by itself from an assembly code callback provided as argument to
> >> mcpm_sync_init().  This is also the only way to eventually support deep
> >> C-states with cpuidle.
> >>
> >
> > There's no race. Here, I enable snoops without taking the new CPU out of reset.
> >
> >>> +
> >>> +     hip04_cpu_table[cluster][cpu]++;
> >>> +
> >>> +     data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
> >>> +            CORE_DEBUG_RESET_BIT(cpu);
> >>> +     writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
> >
> > Here, I take the new CPU out of reset.

That's my point.  At least on systems with the CCI, you risk hanging the 
system if you enable snoops for a CPU that is not yet running.

Yet you cannot enable the CPU and then turn on snoops as you have the 
opposite race in that case.

So the only way to be absolutely safe is to let the waking-up CPU turn 
on snoops for itself _before_ it starts using the cache.  That is what 
the power_up_setup callback provided to mcpm_sync_init() is all about.  

It must be written all in assembly as there is no stack available yet 
when it is called.  See arch/arm/mach-vexpress/dcscb_setup.S for an 
example skeleton where cci_enable_port_for_self is called when the 
affinity level is 1 (cluster level) and nothing special is done when the 
affinity level is 0 (CPU level).  All the synchronization and race 
avoidance is all handled for you in arch/arm/common/mcpm_head.S already.
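
As a sketch of that structure for HiP04 (hip04_power_up_setup() is
hypothetical here; it would be an assembly routine modeled on
dcscb_setup.S that enables the fabric snoops for its own cluster when
called with affinity level 1):

        /* hypothetical callback, implemented in assembly (no stack yet) */
        extern void hip04_power_up_setup(unsigned int affinity_level);

        static int __init hip04_mcpm_init(void)
        {
                ...
                ret = mcpm_platform_register(&hip04_mcpm_ops);
                if (!ret)
                        /* the waking CPU now enables its own snoops */
                        ret = mcpm_sync_init(hip04_power_up_setup);
                return ret;
        }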

> >
> >
> >>> +     spin_unlock_irq(&boot_lock);
> >>> +     msleep(POLL_MSEC);
> >>
> >> What is that msleep() for?
> >>
> > Without this delay, I hit a hang.
> >
> > Since I'm using reset to wake the CPU from wfi/wfe state, I don't need
> > an IPI here. If I remove the msleep() here, it seems that the IPI hangs
> > my system. So it's a workaround on my system.

Could you confirm that the IPI is actually what causes you trouble?

If so maybe you should make sure that the CPU is actually masking out 
IRQs at all times before they're unmasked by the kernel.

> >>> +     return 0;
> >>> +}
> >>> +
> >>> +static void hip04_mcpm_power_down(void)
> >>> +{
> >>> +     unsigned int mpidr, cpu, cluster;
> >>> +     bool skip_wfi = false;
> >>> +
> >>> +     mpidr = read_cpuid_mpidr();
> >>> +     cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
> >>> +     cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
> >>> +
> >>> +     __mcpm_cpu_going_down(cpu, cluster);
> >>> +
> >>> +     spin_lock(&boot_lock);
> >>> +     BUG_ON(__mcpm_cluster_state(cluster) != CLUSTER_UP);
> >>> +     hip04_cpu_table[cluster][cpu]--;
> >>> +     if (hip04_cpu_table[cluster][cpu] == 1) {
> >>> +             /* A power_up request went ahead of us. */
> >>> +             skip_wfi = true;
> >>> +     } else if (hip04_cpu_table[cluster][cpu] > 1) {
> >>> +             pr_err("Cluster %d CPU%d is still running\n", cluster,
> >>> cpu);
> >>> +             BUG();
> >>> +     }
> >>> +
> >>> +     spin_unlock(&boot_lock);
> >>> +
> >>> +     v7_exit_coherency_flush(louis);
> >>
> >> Don't you have to turn off snoop filters in the case of a complete
> >> cluster down as well?
> >>
> >
> > I still have some issues with disabling a whole cluster, so completing
> > that will be done later in an additional patch.

Most likely related to my comment above.

> >
> >
> >>> +
> >>> +     __mcpm_cpu_down(cpu, cluster);
> >>> +
> >>> +     isb();
> >>> +     dsb();
> >>
> >> Those isb/dsb are redundant here.  They're implicit from
> >> __mcpm_cpu_down().
> >>
> > OK. I'll remove them.
> >
> >
> >>> +
> >>> +     if (!skip_wfi)
> >>> +             wfi();
> >>> +}
> >>> +
> >>> +static int hip04_mcpm_wait_for_powerdown(unsigned int cpu, unsigned int
> >>> cluster)
> >>> +{
> >>> +     unsigned int data, tries;
> >>> +
> >>> +     BUG_ON(cluster >= HIP04_MAX_CLUSTERS ||
> >>> +            cpu >= HIP04_MAX_CPUS_PER_CLUSTER);
> >>> +
> >>> +     for (tries = 0; tries < TIMEOUT_MSEC / POLL_MSEC; tries++) {
> >>> +             data = readl_relaxed(sysctrl +
> >>> SC_CPU_RESET_STATUS(cluster));
> >>> +             if (!(data & CORE_WFI_STATUS(cpu))) {
> >>> +                     msleep(POLL_MSEC);
> >>> +                     continue;
> >>> +             }
> >>> +             data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
> >>> +                    CORE_DEBUG_RESET_BIT(cpu);
> >>> +             writel_relaxed(data, sysctrl + SC_CPU_RESET_REQ(cluster));
> >>
> >> You are not supposed to perform any changes here.  This method is there
> >> only to wait for the given CPU to be down.  It is not guaranteed it will be
> >> called.  Furthermore this can race with the power_up method.
> >>
> >> Normally you should perform this reset business from
> >> hip04_mcpm_power_down() while the lock is held.  Now... this assumes
> >> that any power/reset would be effective only when the CPU executes a
> >> WFI.  Is this the case on this hardware?  If not then we'll have to
> >> think about a different strategy.
> >>
> > OK. I can move the reset operation into hip04_mcpm_power_down().
> >
> >>> +             return 0;
> >>> +     }
> >>> +
> >>> +     return -ETIMEDOUT;
> >>> +}
> >>> +
> >>> +static void hip04_mcpm_powered_up(void)
> >>> +{
> >>> +     if (!relocation)
> >>> +             return;
> >>> +     spin_lock(&boot_lock);
> >>> +     writel_relaxed(0, relocation);
> >>> +     writel_relaxed(0, relocation + 4);
> >>> +     writel_relaxed(0, relocation + 8);
> >>> +     writel_relaxed(0, relocation + 12);
> >>> +     spin_unlock(&boot_lock);
> >>> +}
> >>> +
> >>> +static const struct mcpm_platform_ops hip04_mcpm_ops = {
> >>> +     .power_up               = hip04_mcpm_power_up,
> >>> +     .power_down             = hip04_mcpm_power_down,
> >>> +     .wait_for_powerdown     = hip04_mcpm_wait_for_powerdown,
> >>> +     .powered_up             = hip04_mcpm_powered_up,
> >>> +};
> >>> +
> >>> +static bool __init hip04_cpu_table_init(void)
> >>> +{
> >>> +     unsigned int mpidr, cpu, cluster;
> >>> +
> >>> +     mpidr = read_cpuid_mpidr();
> >>> +     cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
> >>> +     cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
> >>> +
> >>> +     if (cluster >= HIP04_MAX_CLUSTERS ||
> >>> +         cpu >= HIP04_MAX_CPUS_PER_CLUSTER) {
> >>> +             pr_err("%s: boot CPU is out of bound!\n", __func__);
> >>> +             return false;
> >>> +     }
> >>> +     hip04_set_snoop_filter(cluster, 1);
> >>> +     hip04_cpu_table[cluster][cpu] = 1;
> >>> +     return true;
> >>> +}
> >>> +
> >>> +static int __init hip04_mcpm_init(void)
> >>> +{
> >>> +     struct device_node *np, *np_fab;
> >>> +     int ret = -ENODEV;
> >>> +
> >>> +     np = of_find_compatible_node(NULL, NULL, "hisilicon,sysctrl");
> >>> +     if (!np)
> >>> +             goto err;
> >>> +     np_fab = of_find_compatible_node(NULL, NULL,
> >>> "hisilicon,hip04-fabric");
> >>> +     if (!np_fab)
> >>> +             goto err;
> >>> +
> >>> +     if (of_property_read_u32(np, "bootwrapper-phys",
> >>> +                              &hip04_boot.bootwrapper_phys)) {
> >>> +             pr_err("failed to get bootwrapper-phys\n");
> >>> +             ret = -EINVAL;
> >>> +             goto err;
> >>> +     }
> >>> +     if (of_property_read_u32(np, "bootwrapper-size",
> >>> +                              &hip04_boot.bootwrapper_size)) {
> >>> +             pr_err("failed to get bootwrapper-size\n");
> >>> +             ret = -EINVAL;
> >>> +             goto err;
> >>> +     }
> >>> +     if (of_property_read_u32(np, "bootwrapper-magic",
> >>> +                              &hip04_boot.bootwrapper_magic)) {
> >>> +             pr_err("failed to get bootwrapper-magic\n");
> >>> +             ret = -EINVAL;
> >>> +             goto err;
> >>> +     }
> >>> +     if (of_property_read_u32(np, "relocation-entry",
> >>> +                              &hip04_boot.relocation_entry)) {
> >>> +             pr_err("failed to get relocation-entry\n");
> >>> +             ret = -EINVAL;
> >>> +             goto err;
> >>> +     }
> >>> +     if (of_property_read_u32(np, "relocation-size",
> >>> +                              &hip04_boot.relocation_size)) {
> >>> +             pr_err("failed to get relocation-size\n");
> >>> +             ret = -EINVAL;
> >>> +             goto err;
> >>> +     }
> >>> +
> >>> +     relocation = ioremap(hip04_boot.relocation_entry,
> >>> +                          hip04_boot.relocation_size);
> >>> +     if (!relocation) {
> >>> +             pr_err("failed to map relocation space\n");
> >>> +             ret = -ENOMEM;
> >>> +             goto err;
> >>> +     }
> >>> +     sysctrl = of_iomap(np, 0);
> >>> +     if (!sysctrl) {
> >>> +             pr_err("failed to get sysctrl base\n");
> >>> +             ret = -ENOMEM;
> >>> +             goto err_sysctrl;
> >>> +     }
> >>> +     fabric = of_iomap(np_fab, 0);
> >>> +     if (!fabric) {
> >>> +             pr_err("failed to get fabric base\n");
> >>> +             ret = -ENOMEM;
> >>> +             goto err_fabric;
> >>> +     }
> >>> +
> >>> +     if (!hip04_cpu_table_init())
> >>> +             return -EINVAL;
> >>> +     ret = mcpm_platform_register(&hip04_mcpm_ops);
> >>> +     if (!ret) {
> >>> +             mcpm_sync_init(NULL);
> >>> +             pr_info("HiP04 MCPM initialized\n");
> >>> +     }
> >>> +     return ret;
> >>> +err_fabric:
> >>> +     iounmap(sysctrl);
> >>> +err_sysctrl:
> >>> +     iounmap(relocation);
> >>> +err:
> >>> +     return ret;
> >>> +}
> >>> +early_initcall(hip04_mcpm_init);
> >>> +
> >>> +bool __init hip04_smp_init_ops(void)
> >>> +{
> >>> +     mcpm_smp_set_ops();
> >>> +     return true;
> >>> +}
> >>
> >> You should be able to call mcpm_smp_set_ops() directly from
> >> hip04_mcpm_init(), and only if there were no errors.
> >>
> > if (!mdesc->smp_init || !mdesc->smp_init()) {
> >         if (psci_smp_available())
> >                 smp_set_ops(&psci_smp_ops);
> >         else if (mdesc->smp)
> >                 smp_set_ops(mdesc->smp);
> > }
> >
> > Since the above code checks the return value of mdesc->smp_init(),
> > I think it's better to call mcpm_smp_set_ops() in mdesc->smp_init().
> >
> > Otherwise, my mdesc->smp_init() would only return true without doing
> > anything, which would confuse people reading the code.

Simply don't provide a mdesc->smp_init at all.  If you have a default 
value for mdesc->smp then it'll be replaced when hip04_mcpm_init() 
completes successfully.
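
A sketch of that arrangement (the descriptor contents are hypothetical,
since the patch's hisilicon.c changes are not quoted in this thread); it
relies on hip04_mcpm_init() being an early_initcall(), which runs before
secondary CPUs are booted:

        static const char *hip04_compat[] __initconst = {
                "hisilicon,hip04-d01",
                NULL,
        };

        DT_MACHINE_START(HIP04_DT, "Hisilicon HiP04 (Flattened Device Tree)")
                .dt_compat      = hip04_compat,
                /* no .smp_init: hip04_mcpm_init() installs the MCPM SMP
                 * ops itself via mcpm_smp_set_ops() once it succeeds */
        MACHINE_END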


Nicolas
Haojian Zhuang May 13, 2014, 9:57 a.m. UTC | #5
On 13 May 2014 03:39, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Mon, 12 May 2014, Haojian Zhuang wrote:
>
>> On 11 May 2014 16:02, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
>> >
>> >
>> > On 8 May 2014 01:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> >> On Wed, 7 May 2014, Haojian Zhuang wrote:
>> >>
>> >>> +
>> >>> +static int hip04_mcpm_power_up(unsigned int cpu, unsigned int cluster)
>> >>> +{
>> >>> +     unsigned long data, mask;
>> >>> +
>> >>> +     if (!relocation || !sysctrl)
>> >>> +             return -ENODEV;
>> >>> +     if (cluster >= HIP04_MAX_CLUSTERS || cpu >=
>> >>> HIP04_MAX_CPUS_PER_CLUSTER)
>> >>> +             return -EINVAL;
>> >>> +
>> >>> +     spin_lock_irq(&boot_lock);
>> >>> +     writel_relaxed(hip04_boot.bootwrapper_phys, relocation);
>> >>> +     writel_relaxed(hip04_boot.bootwrapper_magic, relocation + 4);
>> >>> +     writel_relaxed(virt_to_phys(mcpm_entry_point), relocation + 8);
>> >>> +     writel_relaxed(0, relocation + 12);
>> >>> +
>> >>> +     if (hip04_cluster_down(cluster)) {
>> >>> +             data = CLUSTER_DEBUG_RESET_BIT;
>> >>> +             writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
>> >>> +             do {
>> >>> +                     mask = CLUSTER_DEBUG_RESET_STATUS;
>> >>> +                     data = readl_relaxed(sysctrl + \
>> >>> +                                          SC_CPU_RESET_STATUS(cluster));
>> >>> +             } while (data & mask);
>> >>> +             hip04_set_snoop_filter(cluster, 1);
>> >>> +     }
>> >>
>> >> Isn't this racy?  What if the new CPU starts executing code before
>> >> snoops are enabled for it?
>> >>
>> >> Normally we solve this race by letting the waking-up CPU turn on its
>> >> snoops by itself from an assembly code callback provided as argument to
>> >> mcpm_sync_init().  This is also the only way to eventually support deep
>> >> C-states with cpuidle.
>> >>
>> >
>> > There's no race. Here, I enable snoops without taking the new CPU out of reset.
>> >
>> >>> +
>> >>> +     hip04_cpu_table[cluster][cpu]++;
>> >>> +
>> >>> +     data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
>> >>> +            CORE_DEBUG_RESET_BIT(cpu);
>> >>> +     writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
>> >
>> > Here, I take the new CPU out of reset.
>
> That's my point.  At least on systems with the CCI, you risk hanging the
> system if you enable snoops for a CPU that is not yet running.
>
> Yet you cannot enable the CPU and then turn on snoops as you have the
> opposite race in that case.
>
> So the only way to be absolutely safe is to let the waking-up CPU turn
> on snoops for itself _before_ it starts using the cache.  That is what
> the power_up_setup callback provided to mcpm_sync_init() is all about.
>
> It must be written all in assembly as there is no stack available yet
> when it is called.  See arch/arm/mach-vexpress/dcscb_setup.S for an
> example skeleton where cci_enable_port_for_self is called when the
> affinity level is 1 (cluster level) and nothing special is done when the
> affinity level is 0 (CPU level).  All the synchronization and race
> avoidance is all handled for you in arch/arm/common/mcpm_head.S already.
>

There's no CCI in HiP04. Their own fabric is used instead.

>> >
>> >
>> >>> +     spin_unlock_irq(&boot_lock);
>> >>> +     msleep(POLL_MSEC);
>> >>
>> >> What is that msleep() for?
>> >>
>> > Without this delay, I hit a hang.
>> >
>> > Since I'm using reset to wake the CPU from wfi/wfe state, I don't need
>> > an IPI here. If I remove the msleep() here, it seems that the IPI hangs
>> > my system. So it's a workaround on my system.
>
> Could you confirm that the IPI is actually what causes you trouble?
>
> If so maybe you should make sure that the CPU is actually masking out
> IRQs at all times before they're unmasked by the kernel.
>

I think so, but I don't have a hardware debugger to confirm it.

I think the unnecessary interrupt puts the new CPU into interrupt mode,
and then this CPU can't come online.

If I remove the code that sends the IPI, the new CPU comes online correctly.

>> >>> +     return 0;
>> >>> +}
>> >>> +
>> >>> +static void hip04_mcpm_power_down(void)
>> >>> +{
>> >>> +     unsigned int mpidr, cpu, cluster;
>> >>> +     bool skip_wfi = false;
>> >>> +
>> >>> +     mpidr = read_cpuid_mpidr();
>> >>> +     cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
>> >>> +     cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
>> >>> +
>> >>> +     __mcpm_cpu_going_down(cpu, cluster);
>> >>> +
>> >>> +     spin_lock(&boot_lock);
>> >>> +     BUG_ON(__mcpm_cluster_state(cluster) != CLUSTER_UP);
>> >>> +     hip04_cpu_table[cluster][cpu]--;
>> >>> +     if (hip04_cpu_table[cluster][cpu] == 1) {
>> >>> +             /* A power_up request went ahead of us. */
>> >>> +             skip_wfi = true;
>> >>> +     } else if (hip04_cpu_table[cluster][cpu] > 1) {
>> >>> +             pr_err("Cluster %d CPU%d is still running\n", cluster,
>> >>> cpu);
>> >>> +             BUG();
>> >>> +     }
>> >>> +
>> >>> +     spin_unlock(&boot_lock);
>> >>> +
>> >>> +     v7_exit_coherency_flush(louis);
>> >>
>> >> Don't you have to turn off snoop filters in the case of a complete
>> >> cluster down as well?
>> >>
>> >
>> > I still have some issues with disabling a whole cluster, so completing
>> > that will be done later in an additional patch.
>
> Most likely related to my comment above.
>
Maybe it's not. In my original code, I called hip04_set_snoop_filter()
here, but it didn't help.

Until I root-cause this issue, I don't plan to add
hip04_set_snoop_filter() here.

>> >> You should be able to call mcpm_smp_set_ops() directly from
>> >> hip04_mcpm_init(), and only if there were no errors.
>> >>
>> > if (!mdesc->smp_init || !mdesc->smp_init()) {
>> >         if (psci_smp_available())
>> >                 smp_set_ops(&psci_smp_ops);
>> >         else if (mdesc->smp)
>> >                 smp_set_ops(mdesc->smp);
>> > }
>> >
>> > Since the above code checks the return value of mdesc->smp_init(),
>> > I think it's better to call mcpm_smp_set_ops() in mdesc->smp_init().
>> >
>> > Otherwise, my mdesc->smp_init() would only return true without doing
>> > anything, which would confuse people reading the code.
>
> Simply don't provide a mdesc->smp_init at all.  If you have a default
> value for mdesc->smp then it'll be replaced when hip04_mcpm_init()
> completes successfully.
>
>
> Nicolas
Nicolas Pitre May 13, 2014, 6:44 p.m. UTC | #6
On Tue, 13 May 2014, Haojian Zhuang wrote:

> On 13 May 2014 03:39, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Mon, 12 May 2014, Haojian Zhuang wrote:
> >
> >> On 11 May 2014 16:02, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
> >> >
> >> >
> >> > On 8 May 2014 01:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >> >> On Wed, 7 May 2014, Haojian Zhuang wrote:
> >> >>
> >> >>> +
> >> >>> +static int hip04_mcpm_power_up(unsigned int cpu, unsigned int cluster)
> >> >>> +{
> >> >>> +     unsigned long data, mask;
> >> >>> +
> >> >>> +     if (!relocation || !sysctrl)
> >> >>> +             return -ENODEV;
> >> >>> +     if (cluster >= HIP04_MAX_CLUSTERS || cpu >=
> >> >>> HIP04_MAX_CPUS_PER_CLUSTER)
> >> >>> +             return -EINVAL;
> >> >>> +
> >> >>> +     spin_lock_irq(&boot_lock);
> >> >>> +     writel_relaxed(hip04_boot.bootwrapper_phys, relocation);
> >> >>> +     writel_relaxed(hip04_boot.bootwrapper_magic, relocation + 4);
> >> >>> +     writel_relaxed(virt_to_phys(mcpm_entry_point), relocation + 8);
> >> >>> +     writel_relaxed(0, relocation + 12);
> >> >>> +
> >> >>> +     if (hip04_cluster_down(cluster)) {
> >> >>> +             data = CLUSTER_DEBUG_RESET_BIT;
> >> >>> +             writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
> >> >>> +             do {
> >> >>> +                     mask = CLUSTER_DEBUG_RESET_STATUS;
> >> >>> +                     data = readl_relaxed(sysctrl + \
> >> >>> +                                          SC_CPU_RESET_STATUS(cluster));
> >> >>> +             } while (data & mask);
> >> >>> +             hip04_set_snoop_filter(cluster, 1);
> >> >>> +     }
> >> >>
> >> >> Isn't this racy?  What if the new CPU starts executing code before
> >> >> snoops are enabled for it?
> >> >>
> >> >> Normally we solve this race by letting the waking-up CPU turn on its
> >> >> snoops by itself from an assembly code callback provided as argument to
> >> >> mcpm_sync_init().  This is also the only way to eventually support deep
> >> >> C-states with cpuidle.
> >> >>
> >> >
> >> > There's no race. Here, I enable snoops without taking the new CPU out of reset.
> >> >
> >> >>> +
> >> >>> +     hip04_cpu_table[cluster][cpu]++;
> >> >>> +
> >> >>> +     data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
> >> >>> +            CORE_DEBUG_RESET_BIT(cpu);
> >> >>> +     writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
> >> >
> >> > Here, I take the new CPU out of reset.
> >
> > That's my point.  At least on systems with the CCI, you risk hanging the
> > system if you enable snoops for a CPU that is not yet running.
> >
> > Yet you cannot enable the CPU and then turn on snoops as you have the
> > opposite race in that case.
> >
> > So the only way to be absolutely safe is to let the waking-up CPU turn
> > on snoops for itself _before_ it starts using the cache.  That is what
> > the power_up_setup callback provided to mcpm_sync_init() is all about.
> >
> > It must be written all in assembly as there is no stack available yet
> > when it is called.  See arch/arm/mach-vexpress/dcscb_setup.S for an
> > example skeleton where cci_enable_port_for_self is called when the
> > affinity level is 1 (cluster level) and nothing special is done when the
> > affinity level is 0 (CPU level).  All the synchronization and race
> > avoidance is all handled for you in arch/arm/common/mcpm_head.S already.
> >
> 
> There's no CCI in HiP04. Their own fabric is used instead.

OK.  Still, in order to support deep C-States with cpuidle, the 
waking-up of CPUs must happen asynchronously, and in that case the CPU 
being awakened must do snooping setup by itself.  So I'd be interested to
see if that can be done that way as well for HiP04, at least to be
similar in structure to other backend implementations, and possibly for 
being cpuidle ready as well.

> >> >
> >> >
> >> >>> +     spin_unlock_irq(&boot_lock);
> >> >>> +     msleep(POLL_MSEC);
> >> >>
> >> >> What is that msleep() for?
> >> >>
> >> > Without this delay, I hit a hang.
> >> >
> >> > Since I'm using reset to wake the CPU from wfi/wfe state, I don't need
> >> > an IPI here. If I remove the msleep() here, it seems that the IPI hangs
> >> > my system. So it's a workaround on my system.
> >
> > Could you confirm that the IPI is actually what causes you trouble?
> >
> > If so maybe you should make sure that the CPU is actually masking out
> > IRQs at all times before they're unmasked by the kernel.
> >
> 
> I think so, but I don't have a hardware debugger to confirm it.
> 
> I think the unnecessary interrupt puts the new CPU into interrupt mode,
> and then this CPU can't come online.
> 
> If I remove the code that sends the IPI, the new CPU comes online correctly.

If so that's a bug in the firmware/bootloader/whatever code that needs 
to be fixed.  CPUs must never unmask IRQs if they're not ready to cope
with them.


Nicolas
Haojian Zhuang May 15, 2014, 6:28 a.m. UTC | #7
On 14 May 2014 02:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Tue, 13 May 2014, Haojian Zhuang wrote:
>
>> On 13 May 2014 03:39, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> > On Mon, 12 May 2014, Haojian Zhuang wrote:
>> >
>> >> On 11 May 2014 16:02, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
>> >> >
>> >> >
>> >> > On 8 May 2014 01:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> >> >> On Wed, 7 May 2014, Haojian Zhuang wrote:
>> >> >>
>> >> >>> +
>> >> >>> +static int hip04_mcpm_power_up(unsigned int cpu, unsigned int cluster)
>> >> >>> +{
>> >> >>> +     unsigned long data, mask;
>> >> >>> +
>> >> >>> +     if (!relocation || !sysctrl)
>> >> >>> +             return -ENODEV;
>> >> >>> +     if (cluster >= HIP04_MAX_CLUSTERS || cpu >=
>> >> >>> HIP04_MAX_CPUS_PER_CLUSTER)
>> >> >>> +             return -EINVAL;
>> >> >>> +
>> >> >>> +     spin_lock_irq(&boot_lock);
>> >> >>> +     writel_relaxed(hip04_boot.bootwrapper_phys, relocation);
>> >> >>> +     writel_relaxed(hip04_boot.bootwrapper_magic, relocation + 4);
>> >> >>> +     writel_relaxed(virt_to_phys(mcpm_entry_point), relocation + 8);
>> >> >>> +     writel_relaxed(0, relocation + 12);
>> >> >>> +
>> >> >>> +     if (hip04_cluster_down(cluster)) {
>> >> >>> +             data = CLUSTER_DEBUG_RESET_BIT;
>> >> >>> +             writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
>> >> >>> +             do {
>> >> >>> +                     mask = CLUSTER_DEBUG_RESET_STATUS;
>> >> >>> +                     data = readl_relaxed(sysctrl + \
>> >> >>> +                                          SC_CPU_RESET_STATUS(cluster));
>> >> >>> +             } while (data & mask);
>> >> >>> +             hip04_set_snoop_filter(cluster, 1);
>> >> >>> +     }
>> >> >>
>> >> >> Isn't this racy?  What if the new CPU starts executing code before
>> >> >> snoops are enabled for it?
>> >> >>
>> >> >> Normally we solve this race by letting the waking-up CPU turn on its
>> >> >> snoops by itself from an assembly code callback provided as argument to
>> >> >> mcpm_sync_init().  This is also the only way to eventually support deep
>> >> >> C-states with cpuidle.
>> >> >>
>> >> >
>> >> > There's no race. Here, I enable snoops without taking the new CPU out of reset.
>> >> >
>> >> >>> +
>> >> >>> +     hip04_cpu_table[cluster][cpu]++;
>> >> >>> +
>> >> >>> +     data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
>> >> >>> +            CORE_DEBUG_RESET_BIT(cpu);
>> >> >>> +     writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
>> >> >
>> >> > Here, I take the new CPU out of reset.
>> >
>> > That's my point.  At least on systems with the CCI, you risk hanging the
>> > system if you enable snoops for a CPU that is not yet running.
>> >
>> > Yet you cannot enable the CPU and then turn on snoops as you have the
>> > opposite race in that case.
>> >
>> > So the only way to be absolutely safe is to let the waking-up CPU turn
>> > on snoops for itself _before_ it starts using the cache.  That is what
>> > the power_up_setup callback provided to mcpm_sync_init() is all about.
>> >
>> > It must be written all in assembly as there is no stack available yet
>> > when it is called.  See arch/arm/mach-vexpress/dcscb_setup.S for an
>> > example skeleton where cci_enable_port_for_self is called when the
>> > affinity level is 1 (cluster level) and nothing special is done when the
>> > affinity level is 0 (CPU level).  All the synchronization and race
>> > avoidance is all handled for you in arch/arm/common/mcpm_head.S already.
>> >
>>
>> There's no CCI in HiP04. Their own fabric is used instead.
>
> OK.  Still, in order to support deep C-States with cpuidle, the
> waking-up of CPUs must happen asynchronously, and in that case the CPU
> being awakened must do snooping setup by itself.  So I'd be interested to
> see if that can be done that way as well for HiP04, at least to be
> similar in structure to other backend implementations, and possibly for
> being cpuidle ready as well.
>

But it fails on my platform if I execute snooping setup on the new CPU.

>> >> >
>> >> >
>> >> >>> +     spin_unlock_irq(&boot_lock);
>> >> >>> +     msleep(POLL_MSEC);
>> >> >>
>> >> >> What is that msleep() for?
>> >> >>
>> >> > Without this delay, I hit a hang.
>> >> >
>> >> > Since I'm using reset to wake the CPU from wfi/wfe state, I don't need
>> >> > an IPI here. If I remove the msleep() here, it seems that the IPI hangs
>> >> > my system. So it's a workaround on my system.
>> >
>> > Could you confirm that the IPI is actually what causes you trouble?
>> >
>> > If so maybe you should make sure that the CPU is actually masking out
>> > IRQs at all times before they're unmasked by the kernel.
>> >
>>
>> I think so, but I don't have a hardware debugger to confirm it.
>>
>> I think the unnecessary interrupt puts the new CPU into interrupt mode,
>> and then this CPU can't come online.
>>
>> If I remove the code that sends the IPI, the new CPU comes online correctly.
>
> If so that's a bug in the firmware/bootloader/whatever code that needs
> to be fixed.  CPUs must never unmask IRQs if they're not ready to cope
> with them.
>

I agree that it should be fixed in firmware. But I need some time. So I
want to keep the workaround until the bug is fixed in firmware.

Regards
Haojian
Nicolas Pitre May 15, 2014, 8:10 p.m. UTC | #8
On Thu, 15 May 2014, Haojian Zhuang wrote:

> On 14 May 2014 02:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Tue, 13 May 2014, Haojian Zhuang wrote:
> >
> >> On 13 May 2014 03:39, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >> > On Mon, 12 May 2014, Haojian Zhuang wrote:
> >> >
> >> >> On 11 May 2014 16:02, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
> >> >> >
> >> >> >
> >> >> > On 8 May 2014 01:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >> >> >> On Wed, 7 May 2014, Haojian Zhuang wrote:
> >> >> >>
> >> >> >>> +
> >> >> >>> +static int hip04_mcpm_power_up(unsigned int cpu, unsigned int cluster)
> >> >> >>> +{
> >> >> >>> +     unsigned long data, mask;
> >> >> >>> +
> >> >> >>> +     if (!relocation || !sysctrl)
> >> >> >>> +             return -ENODEV;
> >> >> >>> +     if (cluster >= HIP04_MAX_CLUSTERS || cpu >=
> >> >> >>> HIP04_MAX_CPUS_PER_CLUSTER)
> >> >> >>> +             return -EINVAL;
> >> >> >>> +
> >> >> >>> +     spin_lock_irq(&boot_lock);
> >> >> >>> +     writel_relaxed(hip04_boot.bootwrapper_phys, relocation);
> >> >> >>> +     writel_relaxed(hip04_boot.bootwrapper_magic, relocation + 4);
> >> >> >>> +     writel_relaxed(virt_to_phys(mcpm_entry_point), relocation + 8);
> >> >> >>> +     writel_relaxed(0, relocation + 12);
> >> >> >>> +
> >> >> >>> +     if (hip04_cluster_down(cluster)) {
> >> >> >>> +             data = CLUSTER_DEBUG_RESET_BIT;
> >> >> >>> +             writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
> >> >> >>> +             do {
> >> >> >>> +                     mask = CLUSTER_DEBUG_RESET_STATUS;
> >> >> >>> +                     data = readl_relaxed(sysctrl + \
> >> >> >>> +                                          SC_CPU_RESET_STATUS(cluster));
> >> >> >>> +             } while (data & mask);
> >> >> >>> +             hip04_set_snoop_filter(cluster, 1);
> >> >> >>> +     }
> >> >> >>
> >> >> >> Isn't this racy?  What if the new CPU starts executing code before
> >> >> >> snoops are enabled for it?
> >> >> >>
> >> >> >> Normally we solve this race by letting the waking-up CPU turn on its
> >> >> >> snoops by itself from an assembly code callback provided as argument to
> >> >> >> mcpm_sync_init().  This is also the only way to eventually support deep
> >> >> >> C-states with cpuidle.
> >> >> >>
> >> >> >
> >> >> > There's no race. Here, I enable snoops without taking the new CPU out of reset.
> >> >> >
> >> >> >>> +
> >> >> >>> +     hip04_cpu_table[cluster][cpu]++;
> >> >> >>> +
> >> >> >>> +     data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
> >> >> >>> +            CORE_DEBUG_RESET_BIT(cpu);
> >> >> >>> +     writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
> >> >> >
> >> >> > Here, I take the new CPU out of reset.
> >> >
> >> > That's my point.  At least on systems with the CCI, you risk hanging the
> >> > system if you enable snoops for a CPU that is not yet running.
> >> >
> >> > Yet you cannot enable the CPU and then turn on snoops as you have the
> >> > opposite race in that case.
> >> >
> >> > So the only way to be absolutely safe is to let the waking-up CPU turn
> >> > on snoops for itself _before_ it starts using the cache.  That is what
> >> > the power_up_setup callback provided to mcpm_sync_init() is all about.
> >> >
> >> > It must be written all in assembly as there is no stack available yet
> >> > when it is called.  See arch/arm/mach-vexpress/dcscb_setup.S for an
> >> > example skeleton where cci_enable_port_for_self is called when the
> >> > affinity level is 1 (cluster level) and nothing special is done when the
> >> > affinity level is 0 (CPU level).  All the synchronization and race
> >> > avoidance is all handled for you in arch/arm/common/mcpm_head.S already.
> >> >
> >>
> >> There's no CCI in HiP04. Their own fabric is used instead.
> >
> > OK.  Still, in order to support deep C-States with cpuidle, the
> > waking-up of CPUs must happen asynchronously, and in that case the CPU
> > being woken must do the snooping setup by itself.  So I'd be interested to
> > see if that can be done that way as well for HiP04, at least to be
> > similar in structure to other backend implementations, and possibly for
> > being cpuidle ready as well.
> >
> 
> But it fails on my platform if I run the snooping setup on the new CPU.

It fails how?  I want to be sure whether the problem is with the hardware
design or the code.
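
To be concrete, the C side of what I'm suggesting is only this much.
This is just a sketch with hypothetical hip04 names, mirroring what
dcscb.c already does; the callback itself has to be written in
assembly since no stack is available when it runs:

#include <asm/mcpm.h>

/*
 * Hypothetical, would live in a hip04 counterpart of dcscb_setup.S:
 * enable snoops for the cluster at affinity level 1, nothing special
 * at level 0.  It runs on the waking CPU before any stack exists.
 */
extern void hip04_power_up_setup(unsigned int affinity_level);

static int __init hip04_mcpm_sync_setup(void)
{
        /* instead of the mcpm_sync_init(NULL) call in this patch */
        return mcpm_sync_init(hip04_power_up_setup);
}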

> >> >> >
> >> >> >>> +     spin_unlock_irq(&boot_lock);
> >> >> >>> +     msleep(POLL_MSEC);
> >> >> >>
> >> >> >> What is that msleep() for?
> >> >> >>
> >> >> > Without this delay, I hit a hang.
> >> >> >
> >> >> > Since I'm using reset to wake the CPU from wfi/wfe state, I don't need
> >> >> > the IPI here. If I remove the msleep() here, it seems the IPI causes my
> >> >> > system to hang. So it's a workaround for my system.
> >> >
> >> > Could you confirm that the IPI is actually what causes you trouble?
> >> >
> >> > If so maybe you should make sure that the CPU is actually masking out
> >> > IRQs at all times before they're unmasked by the kernel.
> >> >
> >>
> >> I think so. But I don't have any hardware debugger to confirm it.
> >>
> >> I think the unnecessary interrupt leaves the new CPU in interrupt
> >> mode, so the CPU can't come online.
> >>
> >> If I remove the code that sends the IPI, the new CPU comes online fine.
> >
> > If so that's a bug in the firmware/bootloader/whatever code that needs
> > to be fixed.  CPUs must never unmask IRQs if they're not ready to cope
> > with them.
> >
> 
> I agree that it should be fixed in firmware, but I need some time, so I
> want to keep the workaround until the bug is fixed in firmware.

This is still a very bad workaround.  As an alternative, there might be 
a way to simply mask the IPI at the GIC level and unmask it from the 
powered_up method.
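
Roughly like this, with made-up helper names and completely untested.
Note that the SGI enable bits live in the banked GICD_ISENABLER0 and
GICD_ICENABLER0 registers, and whether SGIs can be disabled at all is
IMPLEMENTATION DEFINED, so this would need checking against the HiP04
GIC:

#include <linux/io.h>
#include <linux/irqchip/arm-gic.h>

#define WAKEUP_SGI      0       /* IPI_WAKEUP, cf. mcpm_platsmp.c */

static void __iomem *gic_dist_base;     /* assumed mapped during init */

static void hip04_wakeup_sgi_mask(void)
{
        /* banked register: this only affects the CPU doing the write */
        writel_relaxed(1 << WAKEUP_SGI,
                       gic_dist_base + GIC_DIST_ENABLE_CLEAR);
}

static void hip04_wakeup_sgi_unmask(void)
{
        /* to be called from the powered_up method */
        writel_relaxed(1 << WAKEUP_SGI,
                       gic_dist_base + GIC_DIST_ENABLE_SET);
}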

Also... if this is going to be fixed in the firmware, maybe not 
including such a workaround in the upstream submission will keep the
incentive alive for having it actually fixed.


Nicolas
Haojian Zhuang May 20, 2014, 4:53 a.m. UTC | #9
On 16 May 2014 04:10, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Thu, 15 May 2014, Haojian Zhuang wrote:
>
>> On 14 May 2014 02:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> >>
>> >> There's no CCI in HiP04. Their own fabric is used instead.
>> >
>> > OK.  Still, in order to support deep C-States with cpuidle, the
>> > waking-up of CPUs must happen asynchronously, and in that case the CPU
>> > being woken must do the snooping setup by itself.  So I'd be interested to
>> > see if that can be done that way as well for HiP04, at least to be
>> > similar in structure to other backend implementations, and possibly for
>> > being cpuidle ready as well.
>> >
>>
>> But it fails on my platform if I run the snooping setup on the new CPU.
>
> It fails how?  I want to be sure whether the problem is with the hardware
> design or the code.
>
The failure isn't caused by your assembly code. I even rewrote it.

It's caused by the hardware design. The Hisilicon guys said that snooping
must be set up before the new CPU is powered on. I tried that and got the
same result.

>> >> >> >
>> >> >> >>> +     spin_unlock_irq(&boot_lock);
>> >> >> >>> +     msleep(POLL_MSEC);
>> >> >> >>
>> >> >> >> What is that msleep() for?
>> >> >> >>
>> >> >> > Without this delay, I hit a hang.
>> >> >> >
>> >> >> > Since I'm using reset to wake the CPU from wfi/wfe state, I don't need
>> >> >> > the IPI here. If I remove the msleep() here, it seems the IPI causes my
>> >> >> > system to hang. So it's a workaround for my system.
>> >> >
>> >> > Could you confirm that the IPI is actually what causes you trouble?
>> >> >
>> >> > If so maybe you should make sure that the CPU is actually masking out
>> >> > IRQs at all times before they're unmasked by the kernel.
>> >> >
>> >>
>> >> I think so. But I don't have any hardware debugger to confirm it.
>> >>
>> >> I think the unnecessary interrupt leaves the new CPU in interrupt
>> >> mode, so the CPU can't come online.
>> >>
>> >> If I remove the code that sends the IPI, the new CPU comes online fine.
>> >
>> > If so that's a bug in the firmware/bootloader/whatever code that needs
>> > to be fixed.  CPUs must never unmask IRQs if they're not ready to cope
>> > with them.
>> >
>>
>> I agree that it should be fixed in firmware, but I need some time, so I
>> want to keep the workaround until the bug is fixed in firmware.
>
> This is still a very bad workaround.  As an alternative, there might be
> a way to simply mask the IPI at the GIC level and unmask it from the
> powered_up method.
>
> Also... if this is going to be fixed in the firmware, maybe not
> including such a workaround in the upstream submission will keep the
> incentive alive for having it actually fixed.
>


OK. Let me remove the workaround.

Regards
Haojian
Dave Martin May 21, 2014, 9:47 a.m. UTC | #10
On Tue, May 20, 2014 at 12:53:31PM +0800, Haojian Zhuang wrote:
> On 16 May 2014 04:10, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Thu, 15 May 2014, Haojian Zhuang wrote:
> >
> >> On 14 May 2014 02:44, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >> >>
> >> >> There's no CCI in HiP04. Their own fabric is used instead.
> >> >
> >> > OK.  Still, in order to support deep C-States with cpuidle, the
> >> > waking-up of CPUs must happen asynchronously, and in that case the CPU
> >> > being woken must do the snooping setup by itself.  So I'd be interested to
> >> > see if that can be done that way as well for HiP04, at least to be
> >> > similar in structure to other backend implementations, and possibly for
> >> > being cpuidle ready as well.
> >> >
> >>
> >> But it fails on my platform if I run the snooping setup on the new CPU.
> >
> > It fails how?  I want to be sure whether the problem is with the hardware
> > design or the code.
> >
> The failure isn't caused by your assembly code. I even rewrote it.
> 
> It's caused by the hardware design. The Hisilicon guys said that snooping
> must be set up before the new CPU is powered on. I tried that and got the
> same result.

How does this design support cluster powerdown during idle, then ... or
is this just not intended to be supported?

Unless there is some extra logic in the hardware taking care of snoop
control races, it's not clear how interrupt-driven wakeup from a cluster
powerdown could work safely.

Cheers
---Dave

Patch

diff --git a/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt b/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
index df0a452..20736b0 100644
--- a/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
+++ b/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
@@ -4,6 +4,10 @@  Hisilicon Platforms Device Tree Bindings
 Hi4511 Board
 Required root node properties:
 	- compatible = "hisilicon,hi3620-hi4511";
+HiP04 D01 Board
+Required root node properties:
+	- compatible = "hisilicon,hip04-d01";
+
 
 Hisilicon system controller
 
@@ -19,6 +23,13 @@  Optional properties:
 		If reg value is not zero, cpun exit wfi and go
 - resume-offset : offset in sysctrl for notifying cpu0 when resume
 - reboot-offset : offset in sysctrl for system reboot
+- relocation-entry : relocation address of secondary cpu boot code
+- relocation-size : relocation size of secondary cpu boot code
+- bootwrapper-phys : physical address of boot wrapper
+- bootwrapper-size : size of boot wrapper
+- bootwrapper-magic : magic number for secondary cpu in boot wrapper
+The memory area of [bootwrapper-phys : bootwrapper-phys+bootwrapper-size]
+should be reserved. This should be set in /memreserve/ node in DTS file.
 
 Example:
 
@@ -31,6 +42,7 @@  Example:
 		reboot-offset = <0x4>;
 	};
 
+
 PCTRL: Peripheral misc control register
 
 Required Properties:
@@ -44,3 +56,10 @@  Example:
 		compatible = "hisilicon,pctrl";
 		reg = <0xfca09000 0x1000>;
 	};
+
+
+Fabric:
+
+Required Properties:
+- compatible: "hisilicon,hip04-fabric";
+- reg: Address and size of Fabric
diff --git a/Documentation/devicetree/bindings/clock/hip04-clock.txt b/Documentation/devicetree/bindings/clock/hip04-clock.txt
new file mode 100644
index 0000000..4d31ae3
--- /dev/null
+++ b/Documentation/devicetree/bindings/clock/hip04-clock.txt
@@ -0,0 +1,20 @@ 
+* Hisilicon HiP04 Clock Controller
+
+The HiP04 clock controller generates and supplies clock to various
+controllers within the HiP04 SoC.
+
+Required Properties:
+
+- compatible: should be one of the following.
+  - "hisilicon,hip04-clock" - controller compatible with HiP04 SoC.
+
+- reg: physical base address of the controller and length of memory mapped
+  region.
+
+- #clock-cells: should be 1.
+
+
+Each clock is assigned an identifier and client nodes use this identifier
+to specify the clock which they consume.
+
+All these identifier could be found in <dt-bindings/clock/hip04-clock.h>.
diff --git a/arch/arm/mach-hisi/Kconfig b/arch/arm/mach-hisi/Kconfig
index da16efd..17af1d2 100644
--- a/arch/arm/mach-hisi/Kconfig
+++ b/arch/arm/mach-hisi/Kconfig
@@ -17,7 +17,15 @@  config ARCH_HI3xxx
 	select PINCTRL
 	select PINCTRL_SINGLE
 	help
-	  Support for Hisilicon Hi36xx/Hi37xx processor family
+	  Support for Hisilicon Hi36xx/Hi37xx SoC family
+
+config ARCH_HIP04
+	bool "Hisilicon HiP04 Cortex A15 family" if ARCH_MULTI_V7_LPAE
+	select HAVE_ARM_ARCH_TIMER
+	select MCPM if SMP
+	select MCPM_QUAD_CLUSTER if SMP
+	help
+	  Support for Hisilicon HiP04 SoC family
 
 endmenu
 
diff --git a/arch/arm/mach-hisi/Makefile b/arch/arm/mach-hisi/Makefile
index 2ae1b59..e7a8640 100644
--- a/arch/arm/mach-hisi/Makefile
+++ b/arch/arm/mach-hisi/Makefile
@@ -3,4 +3,5 @@ 
 #
 
 obj-y	+= hisilicon.o
+obj-$(CONFIG_MCPM)		+= platmcpm.o
 obj-$(CONFIG_SMP)		+= platsmp.o hotplug.o
diff --git a/arch/arm/mach-hisi/core.h b/arch/arm/mach-hisi/core.h
index af23ec2..1e60795 100644
--- a/arch/arm/mach-hisi/core.h
+++ b/arch/arm/mach-hisi/core.h
@@ -12,4 +12,6 @@  extern void hi3xxx_cpu_die(unsigned int cpu);
 extern int hi3xxx_cpu_kill(unsigned int cpu);
 extern void hi3xxx_set_cpu(int cpu, bool enable);
 
+extern bool __init hip04_smp_init_ops(void);
+
 #endif
diff --git a/arch/arm/mach-hisi/hisilicon.c b/arch/arm/mach-hisi/hisilicon.c
index 741faf3..6489e57 100644
--- a/arch/arm/mach-hisi/hisilicon.c
+++ b/arch/arm/mach-hisi/hisilicon.c
@@ -88,3 +88,15 @@  DT_MACHINE_START(HI3620, "Hisilicon Hi3620 (Flattened Device Tree)")
 	.smp		= smp_ops(hi3xxx_smp_ops),
 	.restart	= hi3xxx_restart,
 MACHINE_END
+
+#ifdef CONFIG_ARCH_HIP04
+static const char *hip04_compat[] __initconst = {
+	"hisilicon,hip04-d01",
+	NULL,
+};
+
+DT_MACHINE_START(HIP04, "Hisilicon HiP04 (Flattened Device Tree)")
+	.dt_compat	= hip04_compat,
+	.smp_init	= smp_init_ops(hip04_smp_init_ops),
+MACHINE_END
+#endif
diff --git a/arch/arm/mach-hisi/platmcpm.c b/arch/arm/mach-hisi/platmcpm.c
new file mode 100644
index 0000000..e0d7afd
--- /dev/null
+++ b/arch/arm/mach-hisi/platmcpm.c
@@ -0,0 +1,314 @@ 
+/*
+ * Copyright (c) 2013-2014 Linaro Ltd.
+ * Copyright (c) 2013-2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ */
+#include <linux/delay.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/of_address.h>
+
+#include <asm/cputype.h>
+#include <asm/cp15.h>
+#include <asm/mcpm.h>
+
+#include "core.h"
+
+/* bits definition in SC_CPU_RESET_REQ[x]/SC_CPU_RESET_DREQ[x]
+ * 1 -- unreset; 0 -- reset
+ */
+#define CORE_RESET_BIT(x)		(1 << x)
+#define NEON_RESET_BIT(x)		(1 << (x + 4))
+#define CORE_DEBUG_RESET_BIT(x)		(1 << (x + 9))
+#define CLUSTER_L2_RESET_BIT		(1 << 8)
+#define CLUSTER_DEBUG_RESET_BIT		(1 << 13)
+
+/*
+ * bits definition in SC_CPU_RESET_STATUS[x]
+ * 1 -- reset status; 0 -- unreset status
+ */
+#define CORE_RESET_STATUS(x)		(1 << x)
+#define NEON_RESET_STATUS(x)		(1 << (x + 4))
+#define CORE_DEBUG_RESET_STATUS(x)	(1 << (x + 9))
+#define CLUSTER_L2_RESET_STATUS		(1 << 8)
+#define CLUSTER_DEBUG_RESET_STATUS	(1 << 13)
+#define CORE_WFI_STATUS(x)		(1 << (x + 16))
+#define CORE_WFE_STATUS(x)		(1 << (x + 20))
+#define CORE_DEBUG_ACK(x)		(1 << (x + 24))
+
+#define SC_CPU_RESET_REQ(x)		(0x520 + (x << 3))	/* reset */
+#define SC_CPU_RESET_DREQ(x)		(0x524 + (x << 3))	/* unreset */
+#define SC_CPU_RESET_STATUS(x)		(0x1520 + (x << 3))
+
+#define FAB_SF_MODE			0x0c
+#define FAB_SF_INVLD			0x10
+
+/* bits definition in FB_SF_INVLD */
+#define FB_SF_INVLD_START		(1 << 8)
+
+#define HIP04_MAX_CLUSTERS		4
+#define HIP04_MAX_CPUS_PER_CLUSTER	4
+
+#define POLL_MSEC	10
+#define TIMEOUT_MSEC	1000
+
+struct hip04_secondary_cpu_data {
+	u32	bootwrapper_phys;
+	u32	bootwrapper_size;
+	u32	bootwrapper_magic;
+	u32	relocation_entry;
+	u32	relocation_size;
+};
+
+static void __iomem *relocation, *sysctrl, *fabric;
+static int hip04_cpu_table[HIP04_MAX_CLUSTERS][HIP04_MAX_CPUS_PER_CLUSTER];
+static DEFINE_SPINLOCK(boot_lock);
+static struct hip04_secondary_cpu_data hip04_boot;
+
+static bool hip04_cluster_down(unsigned int cluster)
+{
+	int i;
+
+	for (i = 0; i < HIP04_MAX_CPUS_PER_CLUSTER; i++)
+		if (hip04_cpu_table[cluster][i])
+			return false;
+	return true;
+}
+
+static void hip04_set_snoop_filter(unsigned int cluster, unsigned int on)
+{
+	unsigned long data;
+
+	if (!fabric)
+		return;
+	data = readl_relaxed(fabric + FAB_SF_MODE);
+	if (on)
+		data |= 1 << cluster;
+	else
+		data &= ~(1 << cluster);
+	writel_relaxed(data, fabric + FAB_SF_MODE);
+	while (1) {
+		if (data == readl_relaxed(fabric + FAB_SF_MODE))
+			break;
+	}
+}
+
+static int hip04_mcpm_power_up(unsigned int cpu, unsigned int cluster)
+{
+	unsigned long data, mask;
+
+	if (!relocation || !sysctrl)
+		return -ENODEV;
+	if (cluster >= HIP04_MAX_CLUSTERS || cpu >= HIP04_MAX_CPUS_PER_CLUSTER)
+		return -EINVAL;
+
+	spin_lock_irq(&boot_lock);
+	writel_relaxed(hip04_boot.bootwrapper_phys, relocation);
+	writel_relaxed(hip04_boot.bootwrapper_magic, relocation + 4);
+	writel_relaxed(virt_to_phys(mcpm_entry_point), relocation + 8);
+	writel_relaxed(0, relocation + 12);
+
+	if (hip04_cluster_down(cluster)) {
+		data = CLUSTER_DEBUG_RESET_BIT;
+		writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
+		do {
+			mask = CLUSTER_DEBUG_RESET_STATUS;
+			data = readl_relaxed(sysctrl + \
+					     SC_CPU_RESET_STATUS(cluster));
+		} while (data & mask);
+		hip04_set_snoop_filter(cluster, 1);
+	}
+
+	hip04_cpu_table[cluster][cpu]++;
+
+	data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
+	       CORE_DEBUG_RESET_BIT(cpu);
+	writel_relaxed(data, sysctrl + SC_CPU_RESET_DREQ(cluster));
+	spin_unlock_irq(&boot_lock);
+	msleep(POLL_MSEC);
+
+	return 0;
+}
+
+static void hip04_mcpm_power_down(void)
+{
+	unsigned int mpidr, cpu, cluster;
+	bool skip_wfi = false;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+	__mcpm_cpu_going_down(cpu, cluster);
+
+	spin_lock(&boot_lock);
+	BUG_ON(__mcpm_cluster_state(cluster) != CLUSTER_UP);
+	hip04_cpu_table[cluster][cpu]--;
+	if (hip04_cpu_table[cluster][cpu] == 1) {
+		/* A power_up request went ahead of us. */
+		skip_wfi = true;
+	} else if (hip04_cpu_table[cluster][cpu] > 1) {
+		pr_err("Cluster %d CPU%d is still running\n", cluster, cpu);
+		BUG();
+	}
+
+	spin_unlock(&boot_lock);
+
+	v7_exit_coherency_flush(louis);
+
+	__mcpm_cpu_down(cpu, cluster);
+
+	isb();
+	dsb();
+
+	if (!skip_wfi)
+		wfi();
+}
+
+static int hip04_mcpm_wait_for_powerdown(unsigned int cpu, unsigned int cluster)
+{
+	unsigned int data, tries;
+
+	BUG_ON(cluster >= HIP04_MAX_CLUSTERS ||
+	       cpu >= HIP04_MAX_CPUS_PER_CLUSTER);
+
+	for (tries = 0; tries < TIMEOUT_MSEC / POLL_MSEC; tries++) {
+		data = readl_relaxed(sysctrl + SC_CPU_RESET_STATUS(cluster));
+		if (!(data & CORE_WFI_STATUS(cpu))) {
+			msleep(POLL_MSEC);
+			continue;
+		}
+		data = CORE_RESET_BIT(cpu) | NEON_RESET_BIT(cpu) | \
+		       CORE_DEBUG_RESET_BIT(cpu);
+		writel_relaxed(data, sysctrl + SC_CPU_RESET_REQ(cluster));
+		return 0;
+	}
+
+	return -ETIMEDOUT;
+}
+
+static void hip04_mcpm_powered_up(void)
+{
+	if (!relocation)
+		return;
+	spin_lock(&boot_lock);
+	writel_relaxed(0, relocation);
+	writel_relaxed(0, relocation + 4);
+	writel_relaxed(0, relocation + 8);
+	writel_relaxed(0, relocation + 12);
+	spin_unlock(&boot_lock);
+}
+
+static const struct mcpm_platform_ops hip04_mcpm_ops = {
+	.power_up		= hip04_mcpm_power_up,
+	.power_down		= hip04_mcpm_power_down,
+	.wait_for_powerdown	= hip04_mcpm_wait_for_powerdown,
+	.powered_up		= hip04_mcpm_powered_up,
+};
+
+static bool __init hip04_cpu_table_init(void)
+{
+	unsigned int mpidr, cpu, cluster;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+	if (cluster >= HIP04_MAX_CLUSTERS ||
+	    cpu >= HIP04_MAX_CPUS_PER_CLUSTER) {
+		pr_err("%s: boot CPU is out of bound!\n", __func__);
+		return false;
+	}
+	hip04_set_snoop_filter(cluster, 1);
+	hip04_cpu_table[cluster][cpu] = 1;
+	return true;
+}
+
+static int __init hip04_mcpm_init(void)
+{
+	struct device_node *np, *np_fab;
+	int ret = -ENODEV;
+
+	np = of_find_compatible_node(NULL, NULL, "hisilicon,sysctrl");
+	if (!np)
+		goto err;
+	np_fab = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-fabric");
+	if (!np_fab)
+		goto err;
+
+	if (of_property_read_u32(np, "bootwrapper-phys",
+				 &hip04_boot.bootwrapper_phys)) {
+		pr_err("failed to get bootwrapper-phys\n");
+		ret = -EINVAL;
+		goto err;
+	}
+	if (of_property_read_u32(np, "bootwrapper-size",
+				 &hip04_boot.bootwrapper_size)) {
+		pr_err("failed to get bootwrapper-size\n");
+		ret = -EINVAL;
+		goto err;
+	}
+	if (of_property_read_u32(np, "bootwrapper-magic",
+				 &hip04_boot.bootwrapper_magic)) {
+		pr_err("failed to get bootwrapper-magic\n");
+		ret = -EINVAL;
+		goto err;
+	}
+	if (of_property_read_u32(np, "relocation-entry",
+				 &hip04_boot.relocation_entry)) {
+		pr_err("failed to get relocation-entry\n");
+		ret = -EINVAL;
+		goto err;
+	}
+	if (of_property_read_u32(np, "relocation-size",
+				 &hip04_boot.relocation_size)) {
+		pr_err("failed to get relocation-size\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	relocation = ioremap(hip04_boot.relocation_entry,
+			     hip04_boot.relocation_size);
+	if (!relocation) {
+		pr_err("failed to map relocation space\n");
+		ret = -ENOMEM;
+		goto err;
+	}
+	sysctrl = of_iomap(np, 0);
+	if (!sysctrl) {
+		pr_err("failed to get sysctrl base\n");
+		ret = -ENOMEM;
+		goto err_sysctrl;
+	}
+	fabric = of_iomap(np_fab, 0);
+	if (!fabric) {
+		pr_err("failed to get fabric base\n");
+		ret = -ENOMEM;
+		goto err_fabric;
+	}
+
+	if (!hip04_cpu_table_init())
+		return -EINVAL;
+	ret = mcpm_platform_register(&hip04_mcpm_ops);
+	if (!ret) {
+		mcpm_sync_init(NULL);
+		pr_info("HiP04 MCPM initialized\n");
+	}
+	return ret;
+err_fabric:
+	iounmap(sysctrl);
+err_sysctrl:
+	iounmap(relocation);
+err:
+	return ret;
+}
+early_initcall(hip04_mcpm_init);
+
+bool __init hip04_smp_init_ops(void)
+{
+	mcpm_smp_set_ops();
+	return true;
+}