[v2,00/15] Introduce PECI subsystem

Message ID	20210803113134.2262882-1-iwona.winiarska@intel.com
Headers	show Return-Path: <devicetree-owner@kernel.org> From: Iwona Winiarska <iwona.winiarska@intel.com> To: linux-kernel@vger.kernel.org, openbmc@lists.ozlabs.org, Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: x86@kernel.org, devicetree@vger.kernel.org, linux-aspeed@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org, linux-hwmon@vger.kernel.org, linux-doc@vger.kernel.org, Rob Herring <robh+dt@kernel.org>, Joel Stanley <joel@jms.id.au>, Andrew Jeffery <andrew@aj.id.au>, Jean Delvare <jdelvare@suse.com>, Guenter Roeck <linux@roeck-us.net>, Arnd Bergmann <arnd@arndb.de>, Olof Johansson <olof@lixom.net>, Jonathan Corbet <corbet@lwn.net>, Thomas Gleixner <tglx@linutronix.de>, Andy Lutomirski <luto@kernel.org>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Yazen Ghannam <yazen.ghannam@amd.com>, Mauro Carvalho Chehab <mchehab@kernel.org>, Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>, Tony Luck <tony.luck@intel.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>, Dan Williams <dan.j.williams@intel.com>, Randy Dunlap <rdunlap@infradead.org>, Zev Weiss <zweiss@equinix.com>, David Muller <d.mueller@elsoft.ch>, Iwona Winiarska <iwona.winiarska@intel.com> Subject: [PATCH v2 00/15] Introduce PECI subsystem Date: Tue, 3 Aug 2021 13:31:19 +0200 Message-Id: <20210803113134.2262882-1-iwona.winiarska@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	Introduce PECI subsystem \| expand [v2,00/15] Introduce PECI subsystem [v2,02/15] x86/cpu: Extract cpuid helpers to arch-independent [v2,04/15] dt-bindings: Add bindings for peci-aspeed [v2,05/15] ARM: dts: aspeed: Add PECI controller nodes [v2,08/15] peci: Add device detection [v2,10/15] peci: Add support for PECI device drivers [v2,13/15] hwmon: peci: Add dimmtemp driver [v2,14/15] docs: hwmon: Document PECI drivers [v2,15/15] docs: Add PECI documentation

Iwona Winiarska Aug. 3, 2021, 11:31 a.m. UTC

Hi Greg,

This is a second round of patches introducing PECI subsystem.
I don't think it is ready to be applied right away (we're still
missing r-b's), but I hope we have chance to complete discussion in
the 5.15 development cycle. I would appreciate if you could take
a look.

Note: All changes to arch/x86 are contained within patches 01-02, plus
small Kconfig change adding "depends on PECI" to GENERIC_LIB_X86
Kconfig in patch 10.

The Platform Environment Control Interface (PECI) is a communication
interface between Intel processors and management controllers (e.g.
Baseboard Management Controller, BMC).

This series adds a PECI subsystem and introduces drivers which run in
the Linux instance on the management controller (not the main Intel
processor) and is intended to be used by the OpenBMC [1], a Linux
distribution for BMC devices.
The information exposed over PECI (like processor and DIMM
temperature) refers to the Intel processor and can be consumed by
daemons running on the BMC to, for example, display the processor
temperature in its web interface.

The PECI bus is collection of code that provides interface support
between PECI devices (that actually represent processors) and PECI
controllers (such as the "peci-aspeed" controller) that allow to
access physical PECI interface. PECI devices are bound to PECI
drivers that provides access to PECI services. This series introduces
a generic "peci-cpu" driver that exposes hardware monitoring "cputemp"
and "dimmtemp" using the auxiliary bus.

Exposing "raw" PECI to userspace, either to write userspace drivers or
for debug/testing purpose was left out of this series to encourage
writing kernel drivers instead, but may be pursued in the future.

Introducing PECI to upstream Linux was already attempted before [2].
Since it's been over a year since last revision, and the series
changed quite a bit in the meantime, I've decided to start from v1.

I would also like to give credit to everyone who helped me with
different aspects of preliminary review:
- Pierre-Louis Bossart,
- Tony Luck, 
- Andy Shevchenko,
- Dave Hansen.

[1] https://github.com/openbmc/openbmc
[2] https://lore.kernel.org/openbmc/20191211194624.2872-1-jae.hyun.yoo@linux.intel.com/

Changes v1 -> v2:

Biggest changes when it comes to diffstat are locking in HWMON
(I decided to clean things up a bit while adding it), switching to
devres usage in more places and exposing sysfs interface in separate patch.

* Moved extending X86 ARCHITECTURE MAINTAINERS earlier in series (Dan)
* Removed "default n" for GENERIC_LIB_X86 (Dan)
* Added vendor prefix for peci-aspeed specific properties (Rob)
* Refactored PECI to use devres consistently (Dan)
* Added missing sysfs documentation and excluded adding peci-sysfs to
  separate patch (Dan)
* Used module_init() instead of subsys_init() for peci module initialization (Dan)
* Removed redundant struct peci_device member (Dan)
* Improved PECI Kconfig help (Randy/Dan)
* Fixed/removed log messages (Dan, Guenter)
* Refactored peci-cputemp and peci-dimmtemp and added missing locks (Guenter)
* Removed unused dev_set_drvdata() in peci-cputemp and peci-dimmtemp (Guenter)
* Fixed used types, names, fixed broken and added additional comments
  to peci-hwmon (Guenter, Zev)
* Refactored peci-dimmtemp to not return -ETIMEDOUT (Guenter)
* Added sanity check for min_peci_revision in peci-hwmon drivers (Zev)
* Added assert for DIMM_NUMS_MAX and additional warning in peci-dimmtemp (Zev)
* Fixed macro names in peci-aspeed (Zev)
* Refactored peci-aspeed sanitizing properties to a single helper function (Zev)
* Fixed peci_cpu_device_ids definition for Broadwell Xeon D (David)
* Refactor peci_request to use a single allocation (Zev)
* Used min_t() to improve code readability (Zev)
* Added macro for PECI_RDENDPTCFG_MMIO_WR_LEN_BASE and fixed adev type
  array name to more descriptive (Zev)
* Fixed peci-hwmon commit-msg and documentation (Zev)

Thanks
-Iwona

Iwona Winiarska (13):
  x86/cpu: Move intel-family to arch-independent headers
  x86/cpu: Extract cpuid helpers to arch-independent
  dt-bindings: Add generic bindings for PECI
  dt-bindings: Add bindings for peci-aspeed
  ARM: dts: aspeed: Add PECI controller nodes
  peci: Add core infrastructure
  peci: Add device detection
  peci: Add sysfs interface for PECI bus
  peci: Add support for PECI device drivers
  peci: Add peci-cpu driver
  hwmon: peci: Add cputemp driver
  hwmon: peci: Add dimmtemp driver
  docs: Add PECI documentation

Jae Hyun Yoo (2):
  peci: Add peci-aspeed controller driver
  docs: hwmon: Document PECI drivers

 Documentation/ABI/testing/sysfs-bus-peci      |  16 +
 .../devicetree/bindings/peci/peci-aspeed.yaml | 109 ++++
 .../bindings/peci/peci-controller.yaml        |  33 +
 Documentation/hwmon/index.rst                 |   2 +
 Documentation/hwmon/peci-cputemp.rst          |  90 +++
 Documentation/hwmon/peci-dimmtemp.rst         |  57 ++
 Documentation/index.rst                       |   1 +
 Documentation/peci/index.rst                  |  16 +
 Documentation/peci/peci.rst                   |  48 ++
 MAINTAINERS                                   |  32 +
 arch/arm/boot/dts/aspeed-g4.dtsi              |  14 +
 arch/arm/boot/dts/aspeed-g5.dtsi              |  14 +
 arch/arm/boot/dts/aspeed-g6.dtsi              |  14 +
 arch/x86/Kconfig                              |   1 +
 arch/x86/include/asm/cpu.h                    |   3 -
 arch/x86/include/asm/intel-family.h           | 141 +---
 arch/x86/include/asm/microcode.h              |   2 +-
 arch/x86/kvm/cpuid.h                          |   3 +-
 arch/x86/lib/Makefile                         |   2 +-
 drivers/Kconfig                               |   3 +
 drivers/Makefile                              |   1 +
 drivers/edac/mce_amd.c                        |   3 +-
 drivers/hwmon/Kconfig                         |   2 +
 drivers/hwmon/Makefile                        |   1 +
 drivers/hwmon/peci/Kconfig                    |  31 +
 drivers/hwmon/peci/Makefile                   |   7 +
 drivers/hwmon/peci/common.h                   |  58 ++
 drivers/hwmon/peci/cputemp.c                  | 591 +++++++++++++++++
 drivers/hwmon/peci/dimmtemp.c                 | 614 ++++++++++++++++++
 drivers/peci/Kconfig                          |  37 ++
 drivers/peci/Makefile                         |  10 +
 drivers/peci/controller/Kconfig               |  16 +
 drivers/peci/controller/Makefile              |   3 +
 drivers/peci/controller/peci-aspeed.c         | 445 +++++++++++++
 drivers/peci/core.c                           | 238 +++++++
 drivers/peci/cpu.c                            | 344 ++++++++++
 drivers/peci/device.c                         | 221 +++++++
 drivers/peci/internal.h                       | 137 ++++
 drivers/peci/request.c                        | 477 ++++++++++++++
 drivers/peci/sysfs.c                          |  82 +++
 include/linux/peci-cpu.h                      |  38 ++
 include/linux/peci.h                          | 110 ++++
 include/linux/x86/cpu.h                       |   9 +
 include/linux/x86/intel-family.h              | 146 +++++
 lib/Kconfig                                   |   4 +
 lib/Makefile                                  |   2 +
 lib/x86/Makefile                              |   3 +
 {arch/x86/lib => lib/x86}/cpu.c               |   2 +-
 48 files changed, 4084 insertions(+), 149 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-peci
 create mode 100644 Documentation/devicetree/bindings/peci/peci-aspeed.yaml
 create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml
 create mode 100644 Documentation/hwmon/peci-cputemp.rst
 create mode 100644 Documentation/hwmon/peci-dimmtemp.rst
 create mode 100644 Documentation/peci/index.rst
 create mode 100644 Documentation/peci/peci.rst
 create mode 100644 drivers/hwmon/peci/Kconfig
 create mode 100644 drivers/hwmon/peci/Makefile
 create mode 100644 drivers/hwmon/peci/common.h
 create mode 100644 drivers/hwmon/peci/cputemp.c
 create mode 100644 drivers/hwmon/peci/dimmtemp.c
 create mode 100644 drivers/peci/Kconfig
 create mode 100644 drivers/peci/Makefile
 create mode 100644 drivers/peci/controller/Kconfig
 create mode 100644 drivers/peci/controller/Makefile
 create mode 100644 drivers/peci/controller/peci-aspeed.c
 create mode 100644 drivers/peci/core.c
 create mode 100644 drivers/peci/cpu.c
 create mode 100644 drivers/peci/device.c
 create mode 100644 drivers/peci/internal.h
 create mode 100644 drivers/peci/request.c
 create mode 100644 drivers/peci/sysfs.c
 create mode 100644 include/linux/peci-cpu.h
 create mode 100644 include/linux/peci.h
 create mode 100644 include/linux/x86/cpu.h
 create mode 100644 include/linux/x86/intel-family.h
 create mode 100644 lib/x86/Makefile
 rename {arch/x86/lib => lib/x86}/cpu.c (95%)

Guenter Roeck Aug. 3, 2021, 3:24 p.m. UTC | #1

On 8/3/21 4:31 AM, Iwona Winiarska wrote:
> Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
> readings of the processor package and processor cores that are
> accessible via the PECI interface.
> 
> The main use case for the driver (and PECI interface) is out-of-band
> management, where we're able to obtain the DTS readings from an external
> entity connected with PECI, e.g. BMC on server platforms.
> 
> Co-developed-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> ---
>   MAINTAINERS                  |   7 +
>   drivers/hwmon/Kconfig        |   2 +
>   drivers/hwmon/Makefile       |   1 +
>   drivers/hwmon/peci/Kconfig   |  18 ++
>   drivers/hwmon/peci/Makefile  |   5 +
>   drivers/hwmon/peci/common.h  |  58 ++++
>   drivers/hwmon/peci/cputemp.c | 591 +++++++++++++++++++++++++++++++++++
>   7 files changed, 682 insertions(+)
>   create mode 100644 drivers/hwmon/peci/Kconfig
>   create mode 100644 drivers/hwmon/peci/Makefile
>   create mode 100644 drivers/hwmon/peci/common.h
>   create mode 100644 drivers/hwmon/peci/cputemp.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3f5d48e1d143..e36b5c0824e3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14512,6 +14512,13 @@ L:	platform-driver-x86@vger.kernel.org
>   S:	Maintained
>   F:	drivers/platform/x86/peaq-wmi.c
>   
> +PECI HARDWARE MONITORING DRIVERS
> +M:	Iwona Winiarska <iwona.winiarska@intel.com>
> +R:	Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> +L:	linux-hwmon@vger.kernel.org
> +S:	Supported
> +F:	drivers/hwmon/peci/
> +
>   PECI SUBSYSTEM
>   M:	Iwona Winiarska <iwona.winiarska@intel.com>
>   R:	Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> index e3675377bc5d..61c0e3404415 100644
> --- a/drivers/hwmon/Kconfig
> +++ b/drivers/hwmon/Kconfig
> @@ -1507,6 +1507,8 @@ config SENSORS_PCF8591
>   	  These devices are hard to detect and rarely found on mainstream
>   	  hardware. If unsure, say N.
>   
> +source "drivers/hwmon/peci/Kconfig"
> +
>   source "drivers/hwmon/pmbus/Kconfig"
>   
>   config SENSORS_PWM_FAN
> diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> index d712c61c1f5e..f52331f212ed 100644
> --- a/drivers/hwmon/Makefile
> +++ b/drivers/hwmon/Makefile
> @@ -202,6 +202,7 @@ obj-$(CONFIG_SENSORS_WM8350)	+= wm8350-hwmon.o
>   obj-$(CONFIG_SENSORS_XGENE)	+= xgene-hwmon.o
>   
>   obj-$(CONFIG_SENSORS_OCC)	+= occ/
> +obj-$(CONFIG_SENSORS_PECI)	+= peci/
>   obj-$(CONFIG_PMBUS)		+= pmbus/
>   
>   ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
> diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
> new file mode 100644
> index 000000000000..e10eed68d70a
> --- /dev/null
> +++ b/drivers/hwmon/peci/Kconfig
> @@ -0,0 +1,18 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config SENSORS_PECI_CPUTEMP
> +	tristate "PECI CPU temperature monitoring client"
> +	depends on PECI
> +	select SENSORS_PECI
> +	select PECI_CPU
> +	help
> +	  If you say yes here you get support for the generic Intel PECI
> +	  cputemp driver which provides Digital Thermal Sensor (DTS) thermal
> +	  readings of the CPU package and CPU cores that are accessible via
> +	  the processor PECI interface.
> +
> +	  This driver can also be built as a module. If so, the module
> +	  will be called peci-cputemp.
> +
> +config SENSORS_PECI
> +	tristate
> diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
> new file mode 100644
> index 000000000000..e8a0ada5ab1f
> --- /dev/null
> +++ b/drivers/hwmon/peci/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +peci-cputemp-y := cputemp.o
> +
> +obj-$(CONFIG_SENSORS_PECI_CPUTEMP)	+= peci-cputemp.o
> diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
> new file mode 100644
> index 000000000000..734506b0eca2
> --- /dev/null
> +++ b/drivers/hwmon/peci/common.h
> @@ -0,0 +1,58 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (c) 2021 Intel Corporation */
> +
> +#include <linux/mutex.h>
> +#include <linux/types.h>
> +
> +#ifndef __PECI_HWMON_COMMON_H
> +#define __PECI_HWMON_COMMON_H
> +
> +#define PECI_HWMON_UPDATE_INTERVAL	HZ
> +
> +/**
> + * struct peci_sensor_state - PECI state information
> + * @valid: flag to indicate the sensor value is valid
> + * @last_updated: time of the last update in jiffies
> + * @lock: mutex to protect sensor access
> + */
> +struct peci_sensor_state {
> +	bool valid;
> +	unsigned long last_updated;
> +	struct mutex lock; /* protect sensor access */
> +};
> +
> +/**
> + * struct peci_sensor_data - PECI sensor information
> + * @value: sensor value in milli units
> + * @state: sensor update state
> + */
> +
> +struct peci_sensor_data {
> +	s32 value;
> +	struct peci_sensor_state state;
> +};
> +
> +/**
> + * peci_sensor_need_update() - check whether sensor update is needed or not
> + * @sensor: pointer to sensor data struct
> + *
> + * Return: true if update is needed, false if not.
> + */
> +
> +static inline bool peci_sensor_need_update(struct peci_sensor_state *state)
> +{
> +	return !state->valid ||
> +	       time_after(jiffies, state->last_updated + PECI_HWMON_UPDATE_INTERVAL);
> +}
> +
> +/**
> + * peci_sensor_mark_updated() - mark the sensor is updated
> + * @sensor: pointer to sensor data struct
> + */
> +static inline void peci_sensor_mark_updated(struct peci_sensor_state *state)
> +{
> +	state->valid = true;
> +	state->last_updated = jiffies;
> +}
> +
> +#endif /* __PECI_HWMON_COMMON_H */
> diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
> new file mode 100644
> index 000000000000..9c6858a9fb6d
> --- /dev/null
> +++ b/drivers/hwmon/peci/cputemp.c
> @@ -0,0 +1,591 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) 2018-2021 Intel Corporation
> +
> +#include <linux/auxiliary_bus.h>
> +#include <linux/bitfield.h>
> +#include <linux/bitops.h>
> +#include <linux/hwmon.h>
> +#include <linux/jiffies.h>
> +#include <linux/module.h>
> +#include <linux/peci.h>
> +#include <linux/peci-cpu.h>
> +#include <linux/units.h>
> +#include <linux/x86/intel-family.h>
> +
> +#include "common.h"
> +
> +#define CORE_NUMS_MAX		64
> +
> +#define BASE_CHANNEL_NUMS	5
> +#define CPUTEMP_CHANNEL_NUMS	(BASE_CHANNEL_NUMS + CORE_NUMS_MAX)
> +
> +#define TEMP_TARGET_FAN_TEMP_MASK	GENMASK(15, 8)
> +#define TEMP_TARGET_REF_TEMP_MASK	GENMASK(23, 16)
> +#define TEMP_TARGET_TJ_OFFSET_MASK	GENMASK(29, 24)
> +
> +#define DTS_MARGIN_MASK		GENMASK(15, 0)
> +#define PCS_MODULE_TEMP_MASK	GENMASK(15, 0)
> +
> +#define DTS_FIXED_POINT_FRACTION	64
> +
> +struct resolved_cores_reg {
> +	u8 bus;
> +	u8 dev;
> +	u8 func;
> +	u8 offset;
> +};
> +
> +struct cpu_info {
> +	struct resolved_cores_reg *reg;
> +	u8 min_peci_revision;
> +};
> +
> +struct peci_temp_target {
> +	s32 tcontrol;
> +	s32 tthrottle;
> +	s32 tjmax;
> +	struct peci_sensor_state state;
> +};
> +
> +enum peci_temp_target_type {
> +	tcontrol_type,
> +	tthrottle_type,
> +	tjmax_type,
> +	crit_hyst_type,
> +};
> +
> +struct peci_cputemp {
> +	struct peci_device *peci_dev;
> +	struct device *dev;
> +	const char *name;
> +	const struct cpu_info *gen_info;
> +	struct {
> +		struct peci_temp_target target;
> +		struct peci_sensor_data die;
> +		struct peci_sensor_data dts;
> +		struct peci_sensor_data core[CORE_NUMS_MAX];
> +	} temp;
> +	const char **coretemp_label;
> +	DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
> +};
> +
> +enum cputemp_channels {
> +	channel_die,
> +	channel_dts,
> +	channel_tcontrol,
> +	channel_tthrottle,
> +	channel_tjmax,
> +	channel_core,
> +};
> +
> +static const char * const cputemp_label[BASE_CHANNEL_NUMS] = {
> +	"Die",
> +	"DTS",
> +	"Tcontrol",
> +	"Tthrottle",
> +	"Tjmax",
> +};
> +
> +static int update_temp_target(struct peci_cputemp *priv)
> +{
> +	s32 tthrottle_offset, tcontrol_margin;
> +	u32 pcs;
> +	int ret;
> +
> +	if (!peci_sensor_need_update(&priv->temp.target.state))
> +		return 0;
> +
> +	ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
> +	if (ret)
> +		return ret;
> +
> +	priv->temp.target.tjmax =
> +		FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> +
> +	tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
> +	tcontrol_margin = sign_extend32(tcontrol_margin, 7) * MILLIDEGREE_PER_DEGREE;
> +	priv->temp.target.tcontrol = priv->temp.target.tjmax - tcontrol_margin;
> +
> +	tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) * MILLIDEGREE_PER_DEGREE;
> +	priv->temp.target.tthrottle = priv->temp.target.tjmax - tthrottle_offset;
> +
> +	peci_sensor_mark_updated(&priv->temp.target.state);
> +
> +	return 0;
> +}
> +
> +static int get_temp_target(struct peci_cputemp *priv, enum peci_temp_target_type type, long *val)
> +{
> +	int ret;
> +
> +	mutex_lock(&priv->temp.target.state.lock);
> +
> +	ret = update_temp_target(priv);
> +	if (ret)
> +		goto unlock;
> +
> +	switch (type) {
> +	case tcontrol_type:
> +		*val = priv->temp.target.tcontrol;
> +		break;
> +	case tthrottle_type:
> +		*val = priv->temp.target.tthrottle;
> +		break;
> +	case tjmax_type:
> +		*val = priv->temp.target.tjmax;
> +		break;
> +	case crit_hyst_type:
> +		*val = priv->temp.target.tjmax - priv->temp.target.tcontrol;
> +		break;
> +	default:
> +		ret = -EOPNOTSUPP;
> +		break;
> +	}
> +unlock:
> +	mutex_unlock(&priv->temp.target.state.lock);
> +
> +	return ret;
> +}
> +
> +/*
> + * Processors return a value of DTS reading in S10.6 fixed point format
> + * (16 bits: 10-bit signed magnitude, 6-bit fraction).
> + * Error codes:
> + *   0x8000: General sensor error
> + *   0x8001: Reserved
> + *   0x8002: Underflow on reading value
> + *   0x8003-0x81ff: Reserved
> + */
> +static bool dts_valid(s32 val)
> +{
> +	return val < 0x8000 || val > 0x81ff;
> +}
> +
> +static s32 dts_to_millidegree(s32 val)
> +{
> +	return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE / DTS_FIXED_POINT_FRACTION;
> +}
> +
> +static int get_die_temp(struct peci_cputemp *priv, long *val)
> +{
> +	long tjmax;
> +	s16 temp;
> +	int ret;
> +
> +	mutex_lock(&priv->temp.die.state.lock);
> +	if (!peci_sensor_need_update(&priv->temp.die.state))
> +		goto skip_update;
> +
> +	ret = peci_temp_read(priv->peci_dev, &temp);
> +	if (ret)
> +		goto err_unlock;
> +
> +	if (!dts_valid(temp)) {
> +		ret = -EIO;
> +		goto err_unlock;
> +	}
> +
> +	ret = get_temp_target(priv, tjmax_type, &tjmax);
> +	if (ret)
> +		goto err_unlock;
> +
> +	priv->temp.die.value = (s32)tjmax + dts_to_millidegree(temp);
> +
> +	peci_sensor_mark_updated(&priv->temp.die.state);
> +
> +skip_update:
> +	*val = priv->temp.die.value;
> +	mutex_unlock(&priv->temp.die.state.lock);
> +
> +	return 0;
> +
> +err_unlock:
> +	mutex_unlock(&priv->temp.die.state.lock);
> +	return ret;
> +}
> +
> +static int get_dts(struct peci_cputemp *priv, long *val)
> +{
> +	s32 dts_margin;
> +	long tcontrol;
> +	u32 pcs;
> +	int ret;
> +
> +	mutex_lock(&priv->temp.dts.state.lock);
> +	if (!peci_sensor_need_update(&priv->temp.dts.state))
> +		goto skip_update;
> +
> +	ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0, &pcs);
> +	if (ret)
> +		goto err_unlock;
> +
> +	dts_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
> +	if (!dts_valid(dts_margin)) {
> +		ret = -EIO;
> +		goto err_unlock;
> +	}
> +
> +	ret = get_temp_target(priv, tcontrol_type, &tcontrol);
> +	if (ret)
> +		goto err_unlock;
> +
> +	/* Note that the tcontrol should be available before calling it */
> +	priv->temp.dts.value = (s32)tcontrol - dts_to_millidegree(dts_margin);
> +
> +	peci_sensor_mark_updated(&priv->temp.dts.state);
> +
> +skip_update:
> +	*val = priv->temp.dts.value;
> +	mutex_unlock(&priv->temp.dts.state.lock);
> +
> +	return 0;
> +
> +err_unlock:
> +	mutex_unlock(&priv->temp.dts.state.lock);
> +	return ret;

Simplify (see below)

> +}
> +
> +static int get_core_temp(struct peci_cputemp *priv, int core_index, long *val)
> +{
> +	s32 core_dts_margin;
> +	long tjmax;
> +	u32 pcs;
> +	int ret;

	int ret = 0;

to handle simplification below.

> +
> +	mutex_lock(&priv->temp.core[core_index].state.lock);
> +	if (!peci_sensor_need_update(&priv->temp.core[core_index].state))
> +		goto skip_update;
> +
> +	ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP, core_index, &pcs);
> +	if (ret)
> +		goto err_unlock;
> +
> +	core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
> +	if (!dts_valid(core_dts_margin)) {
> +		ret = -EIO;
> +		goto err_unlock;
> +	}
> +
> +	ret = get_temp_target(priv, tjmax_type, &tjmax);
> +	if (ret)
> +		goto err_unlock;
> +
> +	/* Note that the tjmax should be available before calling it */
> +	priv->temp.core[core_index].value = (s32)tjmax + dts_to_millidegree(core_dts_margin);
> +
> +	peci_sensor_mark_updated(&priv->temp.core[core_index].state);
> +
> +skip_update:
> +	*val = priv->temp.core[core_index].value;
> +	mutex_unlock(&priv->temp.core[core_index].state.lock);
> +
> +	return 0;
> +
> +err_unlock:
> +	mutex_unlock(&priv->temp.core[core_index].state.lock);
> +	return ret;

Simplify:

skip_update:
	*val = priv->temp.core[core_index].value;
err_unlock:
	mutex_unlock(&priv->temp.core[core_index].state.lock);
	return ret;

> +}
> +
> +static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types type,
> +			       u32 attr, int channel, const char **str)
> +{
> +	struct peci_cputemp *priv = dev_get_drvdata(dev);
> +
> +	if (attr != hwmon_temp_label)
> +		return -EOPNOTSUPP;
> +
> +	*str = channel < channel_core ?
> +		cputemp_label[channel] : priv->coretemp_label[channel - channel_core];
> +
> +	return 0;
> +}
> +
> +static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
> +			u32 attr, int channel, long *val)
> +{
> +	struct peci_cputemp *priv = dev_get_drvdata(dev);
> +
> +	switch (attr) {
> +	case hwmon_temp_input:
> +		switch (channel) {
> +		case channel_die:
> +			return get_die_temp(priv, val);
> +		case channel_dts:
> +			return get_dts(priv, val);
> +		case channel_tcontrol:
> +			return get_temp_target(priv, tcontrol_type, val);
> +		case channel_tthrottle:
> +			return get_temp_target(priv, tthrottle_type, val);
> +		case channel_tjmax:
> +			return get_temp_target(priv, tjmax_type, val);
> +		default:
> +			return get_core_temp(priv, channel - channel_core, val);
> +		}
> +		break;
> +	case hwmon_temp_max:
> +		return get_temp_target(priv, tcontrol_type, val);
> +	case hwmon_temp_crit:
> +		return get_temp_target(priv, tjmax_type, val);
> +	case hwmon_temp_crit_hyst:
> +		return get_temp_target(priv, crit_hyst_type, val);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +
> +	return 0;
> +}
> +
> +static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types type,
> +				  u32 attr, int channel)
> +{
> +	const struct peci_cputemp *priv = data;
> +
> +	if (channel > CPUTEMP_CHANNEL_NUMS)
> +		return 0;
> +
> +	if (channel < channel_core)
> +		return 0444;
> +
> +	if (test_bit(channel - channel_core, priv->core_mask))
> +		return 0444;
> +
> +	return 0;
> +}
> +
> +static int init_core_mask(struct peci_cputemp *priv)
> +{
> +	struct peci_device *peci_dev = priv->peci_dev;
> +	struct resolved_cores_reg *reg = priv->gen_info->reg;
> +	u64 core_mask;
> +	u32 data;
> +	int ret;
> +
> +	/* Get the RESOLVED_CORES register value */
> +	switch (peci_dev->info.model) {
> +	case INTEL_FAM6_ICELAKE_X:
> +	case INTEL_FAM6_ICELAKE_D:
> +		ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
> +					     reg->func, reg->offset + 4, &data);
> +		if (ret)
> +			return ret;
> +
> +		core_mask = (u64)data << 32;
> +
> +		ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg->dev,
> +					     reg->func, reg->offset, &data);
> +		if (ret)
> +			return ret;
> +
> +		core_mask |= data;
> +
> +		break;
> +	default:
> +		ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
> +					  reg->func, reg->offset, &data);
> +		if (ret)
> +			return ret;
> +
> +		core_mask = data;
> +
> +		break;
> +	}
> +
> +	if (!core_mask)
> +		return -EIO;
> +
> +	bitmap_from_u64(priv->core_mask, core_mask);
> +
> +	return 0;
> +}
> +
> +static int create_temp_label(struct peci_cputemp *priv)
> +{
> +	unsigned long core_max = find_last_bit(priv->core_mask, CORE_NUMS_MAX);
> +	int i;
> +
> +	priv->coretemp_label = devm_kzalloc(priv->dev, core_max * sizeof(char *), GFP_KERNEL);
> +	if (!priv->coretemp_label)
> +		return -ENOMEM;
> +
> +	for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
> +		priv->coretemp_label[i] = devm_kasprintf(priv->dev, GFP_KERNEL, "Core %d", i);
> +		if (!priv->coretemp_label[i])
> +			return -ENOMEM;
> +	}
> +
> +	return 0;
> +}
> +
> +static void check_resolved_cores(struct peci_cputemp *priv)
> +{
> +	/*
> +	 * Failure to resolve cores is non-critical, we're still able to
> +	 * provide other sensor data.
> +	 */
> +
> +	if (init_core_mask(priv))
> +		return;
> +
> +	if (create_temp_label(priv))
> +		bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
> +}
> +
> +static void sensor_init(struct peci_cputemp *priv)
> +{
> +	int i;
> +
> +	mutex_init(&priv->temp.target.state.lock);
> +	mutex_init(&priv->temp.die.state.lock);
> +	mutex_init(&priv->temp.dts.state.lock);
> +
> +	for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX)
> +		mutex_init(&priv->temp.core[i].state.lock);
> +}
> +
> +static const struct hwmon_ops peci_cputemp_ops = {
> +	.is_visible = cputemp_is_visible,
> +	.read_string = cputemp_read_string,
> +	.read = cputemp_read,
> +};
> +
> +static const u32 peci_cputemp_temp_channel_config[] = {
> +	/* Die temperature */
> +	HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
> +	/* DTS margin */
> +	HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_CRIT_HYST,
> +	/* Tcontrol temperature */
> +	HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
> +	/* Tthrottle temperature */
> +	HWMON_T_LABEL | HWMON_T_INPUT,
> +	/* Tjmax temperature */
> +	HWMON_T_LABEL | HWMON_T_INPUT,
> +	/* Core temperature - for all core channels */
> +	[channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL | HWMON_T_INPUT,
> +	0
> +};
> +
> +static const struct hwmon_channel_info peci_cputemp_temp_channel = {
> +	.type = hwmon_temp,
> +	.config = peci_cputemp_temp_channel_config,
> +};
> +
> +static const struct hwmon_channel_info *peci_cputemp_info[] = {
> +	&peci_cputemp_temp_channel,
> +	NULL
> +};
> +
> +static const struct hwmon_chip_info peci_cputemp_chip_info = {
> +	.ops = &peci_cputemp_ops,
> +	.info = peci_cputemp_info,
> +};
> +
> +static int peci_cputemp_probe(struct auxiliary_device *adev,
> +			      const struct auxiliary_device_id *id)
> +{
> +	struct device *dev = &adev->dev;
> +	struct peci_device *peci_dev = to_peci_device(dev->parent);
> +	struct peci_cputemp *priv;
> +	struct device *hwmon_dev;
> +
> +	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +
> +	priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
> +				    peci_dev->info.socket_id);
> +	if (!priv->name)
> +		return -ENOMEM;
> +
> +	priv->dev = dev;
> +	priv->peci_dev = peci_dev;
> +	priv->gen_info = (const struct cpu_info *)id->driver_data;
> +
> +	/*
> +	 * This is just a sanity check. Since we're using commands that are
> +	 * guaranteed to be supported on a given platform, we should never see
> +	 * revision lower than expected.
> +	 */
> +	if (peci_dev->info.peci_revision < priv->gen_info->min_peci_revision)
> +		dev_warn(priv->dev,
> +			 "Unexpected PECI revision %#x, some features may be unavailable\n",
> +			 peci_dev->info.peci_revision);
> +
> +	check_resolved_cores(priv);
> +
> +	sensor_init(priv);
> +
> +	hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv->name,
> +							 priv, &peci_cputemp_chip_info, NULL);
> +
> +	return PTR_ERR_OR_ZERO(hwmon_dev);
> +}
> +
> +/*
> + * RESOLVED_CORES PCI configuration register may have different location on
> + * different platforms.
> + */
> +static struct resolved_cores_reg resolved_cores_reg_hsx = {
> +	.bus = 1,
> +	.dev = 30,
> +	.func = 3,
> +	.offset = 0xb4,
> +};
> +
> +static struct resolved_cores_reg resolved_cores_reg_icx = {
> +	.bus = 14,
> +	.dev = 30,
> +	.func = 3,
> +	.offset = 0xd0,
> +};
> +
> +static const struct cpu_info cpu_hsx = {
> +	.reg		= &resolved_cores_reg_hsx,
> +	.min_peci_revision = 0x33,
> +};
> +
> +static const struct cpu_info cpu_icx = {
> +	.reg		= &resolved_cores_reg_icx,
> +	.min_peci_revision = 0x40,
> +};
> +
> +static const struct auxiliary_device_id peci_cputemp_ids[] = {
> +	{
> +		.name = "peci_cpu.cputemp.hsx",
> +		.driver_data = (kernel_ulong_t)&cpu_hsx,
> +	},
> +	{
> +		.name = "peci_cpu.cputemp.bdx",
> +		.driver_data = (kernel_ulong_t)&cpu_hsx,
> +	},
> +	{
> +		.name = "peci_cpu.cputemp.bdxd",
> +		.driver_data = (kernel_ulong_t)&cpu_hsx,
> +	},
> +	{
> +		.name = "peci_cpu.cputemp.skx",
> +		.driver_data = (kernel_ulong_t)&cpu_hsx,
> +	},
> +	{
> +		.name = "peci_cpu.cputemp.icx",
> +		.driver_data = (kernel_ulong_t)&cpu_icx,
> +	},
> +	{
> +		.name = "peci_cpu.cputemp.icxd",
> +		.driver_data = (kernel_ulong_t)&cpu_icx,
> +	},
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
> +
> +static struct auxiliary_driver peci_cputemp_driver = {
> +	.probe		= peci_cputemp_probe,
> +	.id_table	= peci_cputemp_ids,
> +};
> +
> +module_auxiliary_driver(peci_cputemp_driver);
> +
> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
> +MODULE_AUTHOR("Iwona Winiarska <iwona.winiarska@intel.com>");
> +MODULE_DESCRIPTION("PECI cputemp driver");
> +MODULE_LICENSE("GPL");
> +MODULE_IMPORT_NS(PECI_CPU);
>

Iwona Winiarska Aug. 4, 2021, 10:43 a.m. UTC | #2

On Tue, 2021-08-03 at 08:24 -0700, Guenter Roeck wrote:
> On 8/3/21 4:31 AM, Iwona Winiarska wrote:
> > Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
> > readings of the processor package and processor cores that are
> > accessible via the PECI interface.
> > 
> > The main use case for the driver (and PECI interface) is out-of-band
> > management, where we're able to obtain the DTS readings from an external
> > entity connected with PECI, e.g. BMC on server platforms.
> > 
> > Co-developed-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> > Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> > Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
> > Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > ---
> >   MAINTAINERS                  |   7 +
> >   drivers/hwmon/Kconfig        |   2 +
> >   drivers/hwmon/Makefile       |   1 +
> >   drivers/hwmon/peci/Kconfig   |  18 ++
> >   drivers/hwmon/peci/Makefile  |   5 +
> >   drivers/hwmon/peci/common.h  |  58 ++++
> >   drivers/hwmon/peci/cputemp.c | 591 +++++++++++++++++++++++++++++++++++
> >   7 files changed, 682 insertions(+)
> >   create mode 100644 drivers/hwmon/peci/Kconfig
> >   create mode 100644 drivers/hwmon/peci/Makefile
> >   create mode 100644 drivers/hwmon/peci/common.h
> >   create mode 100644 drivers/hwmon/peci/cputemp.c
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 3f5d48e1d143..e36b5c0824e3 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -14512,6 +14512,13 @@ L:     platform-driver-x86@vger.kernel.org
> >   S:    Maintained
> >   F:    drivers/platform/x86/peaq-wmi.c
> >   
> > +PECI HARDWARE MONITORING DRIVERS
> > +M:     Iwona Winiarska <iwona.winiarska@intel.com>
> > +R:     Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> > +L:     linux-hwmon@vger.kernel.org
> > +S:     Supported
> > +F:     drivers/hwmon/peci/
> > +
> >   PECI SUBSYSTEM
> >   M:    Iwona Winiarska <iwona.winiarska@intel.com>
> >   R:    Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> > diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
> > index e3675377bc5d..61c0e3404415 100644
> > --- a/drivers/hwmon/Kconfig
> > +++ b/drivers/hwmon/Kconfig
> > @@ -1507,6 +1507,8 @@ config SENSORS_PCF8591
> >           These devices are hard to detect and rarely found on mainstream
> >           hardware. If unsure, say N.
> >   
> > +source "drivers/hwmon/peci/Kconfig"
> > +
> >   source "drivers/hwmon/pmbus/Kconfig"
> >   
> >   config SENSORS_PWM_FAN
> > diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
> > index d712c61c1f5e..f52331f212ed 100644
> > --- a/drivers/hwmon/Makefile
> > +++ b/drivers/hwmon/Makefile
> > @@ -202,6 +202,7 @@ obj-$(CONFIG_SENSORS_WM8350)        += wm8350-hwmon.o
> >   obj-$(CONFIG_SENSORS_XGENE)   += xgene-hwmon.o
> >   
> >   obj-$(CONFIG_SENSORS_OCC)     += occ/
> > +obj-$(CONFIG_SENSORS_PECI)     += peci/
> >   obj-$(CONFIG_PMBUS)           += pmbus/
> >   
> >   ccflags-$(CONFIG_HWMON_DEBUG_CHIP) := -DDEBUG
> > diff --git a/drivers/hwmon/peci/Kconfig b/drivers/hwmon/peci/Kconfig
> > new file mode 100644
> > index 000000000000..e10eed68d70a
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/Kconfig
> > @@ -0,0 +1,18 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config SENSORS_PECI_CPUTEMP
> > +       tristate "PECI CPU temperature monitoring client"
> > +       depends on PECI
> > +       select SENSORS_PECI
> > +       select PECI_CPU
> > +       help
> > +         If you say yes here you get support for the generic Intel PECI
> > +         cputemp driver which provides Digital Thermal Sensor (DTS) thermal
> > +         readings of the CPU package and CPU cores that are accessible via
> > +         the processor PECI interface.
> > +
> > +         This driver can also be built as a module. If so, the module
> > +         will be called peci-cputemp.
> > +
> > +config SENSORS_PECI
> > +       tristate
> > diff --git a/drivers/hwmon/peci/Makefile b/drivers/hwmon/peci/Makefile
> > new file mode 100644
> > index 000000000000..e8a0ada5ab1f
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +peci-cputemp-y := cputemp.o
> > +
> > +obj-$(CONFIG_SENSORS_PECI_CPUTEMP)     += peci-cputemp.o
> > diff --git a/drivers/hwmon/peci/common.h b/drivers/hwmon/peci/common.h
> > new file mode 100644
> > index 000000000000..734506b0eca2
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/common.h
> > @@ -0,0 +1,58 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/* Copyright (c) 2021 Intel Corporation */
> > +
> > +#include <linux/mutex.h>
> > +#include <linux/types.h>
> > +
> > +#ifndef __PECI_HWMON_COMMON_H
> > +#define __PECI_HWMON_COMMON_H
> > +
> > +#define PECI_HWMON_UPDATE_INTERVAL     HZ
> > +
> > +/**
> > + * struct peci_sensor_state - PECI state information
> > + * @valid: flag to indicate the sensor value is valid
> > + * @last_updated: time of the last update in jiffies
> > + * @lock: mutex to protect sensor access
> > + */
> > +struct peci_sensor_state {
> > +       bool valid;
> > +       unsigned long last_updated;
> > +       struct mutex lock; /* protect sensor access */
> > +};
> > +
> > +/**
> > + * struct peci_sensor_data - PECI sensor information
> > + * @value: sensor value in milli units
> > + * @state: sensor update state
> > + */
> > +
> > +struct peci_sensor_data {
> > +       s32 value;
> > +       struct peci_sensor_state state;
> > +};
> > +
> > +/**
> > + * peci_sensor_need_update() - check whether sensor update is needed or not
> > + * @sensor: pointer to sensor data struct
> > + *
> > + * Return: true if update is needed, false if not.
> > + */
> > +
> > +static inline bool peci_sensor_need_update(struct peci_sensor_state *state)
> > +{
> > +       return !state->valid ||
> > +              time_after(jiffies, state->last_updated +
> > PECI_HWMON_UPDATE_INTERVAL);
> > +}
> > +
> > +/**
> > + * peci_sensor_mark_updated() - mark the sensor is updated
> > + * @sensor: pointer to sensor data struct
> > + */
> > +static inline void peci_sensor_mark_updated(struct peci_sensor_state
> > *state)
> > +{
> > +       state->valid = true;
> > +       state->last_updated = jiffies;
> > +}
> > +
> > +#endif /* __PECI_HWMON_COMMON_H */
> > diff --git a/drivers/hwmon/peci/cputemp.c b/drivers/hwmon/peci/cputemp.c
> > new file mode 100644
> > index 000000000000..9c6858a9fb6d
> > --- /dev/null
> > +++ b/drivers/hwmon/peci/cputemp.c
> > @@ -0,0 +1,591 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (c) 2018-2021 Intel Corporation
> > +
> > +#include <linux/auxiliary_bus.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/bitops.h>
> > +#include <linux/hwmon.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/module.h>
> > +#include <linux/peci.h>
> > +#include <linux/peci-cpu.h>
> > +#include <linux/units.h>
> > +#include <linux/x86/intel-family.h>
> > +
> > +#include "common.h"
> > +
> > +#define CORE_NUMS_MAX          64
> > +
> > +#define BASE_CHANNEL_NUMS      5
> > +#define CPUTEMP_CHANNEL_NUMS   (BASE_CHANNEL_NUMS + CORE_NUMS_MAX)
> > +
> > +#define TEMP_TARGET_FAN_TEMP_MASK      GENMASK(15, 8)
> > +#define TEMP_TARGET_REF_TEMP_MASK      GENMASK(23, 16)
> > +#define TEMP_TARGET_TJ_OFFSET_MASK     GENMASK(29, 24)
> > +
> > +#define DTS_MARGIN_MASK                GENMASK(15, 0)
> > +#define PCS_MODULE_TEMP_MASK   GENMASK(15, 0)
> > +
> > +#define DTS_FIXED_POINT_FRACTION       64
> > +
> > +struct resolved_cores_reg {
> > +       u8 bus;
> > +       u8 dev;
> > +       u8 func;
> > +       u8 offset;
> > +};
> > +
> > +struct cpu_info {
> > +       struct resolved_cores_reg *reg;
> > +       u8 min_peci_revision;
> > +};
> > +
> > +struct peci_temp_target {
> > +       s32 tcontrol;
> > +       s32 tthrottle;
> > +       s32 tjmax;
> > +       struct peci_sensor_state state;
> > +};
> > +
> > +enum peci_temp_target_type {
> > +       tcontrol_type,
> > +       tthrottle_type,
> > +       tjmax_type,
> > +       crit_hyst_type,
> > +};
> > +
> > +struct peci_cputemp {
> > +       struct peci_device *peci_dev;
> > +       struct device *dev;
> > +       const char *name;
> > +       const struct cpu_info *gen_info;
> > +       struct {
> > +               struct peci_temp_target target;
> > +               struct peci_sensor_data die;
> > +               struct peci_sensor_data dts;
> > +               struct peci_sensor_data core[CORE_NUMS_MAX];
> > +       } temp;
> > +       const char **coretemp_label;
> > +       DECLARE_BITMAP(core_mask, CORE_NUMS_MAX);
> > +};
> > +
> > +enum cputemp_channels {
> > +       channel_die,
> > +       channel_dts,
> > +       channel_tcontrol,
> > +       channel_tthrottle,
> > +       channel_tjmax,
> > +       channel_core,
> > +};
> > +
> > +static const char * const cputemp_label[BASE_CHANNEL_NUMS] = {
> > +       "Die",
> > +       "DTS",
> > +       "Tcontrol",
> > +       "Tthrottle",
> > +       "Tjmax",
> > +};
> > +
> > +static int update_temp_target(struct peci_cputemp *priv)
> > +{
> > +       s32 tthrottle_offset, tcontrol_margin;
> > +       u32 pcs;
> > +       int ret;
> > +
> > +       if (!peci_sensor_need_update(&priv->temp.target.state))
> > +               return 0;
> > +
> > +       ret = peci_pcs_read(priv->peci_dev, PECI_PCS_TEMP_TARGET, 0, &pcs);
> > +       if (ret)
> > +               return ret;
> > +
> > +       priv->temp.target.tjmax =
> > +               FIELD_GET(TEMP_TARGET_REF_TEMP_MASK, pcs) *
> > MILLIDEGREE_PER_DEGREE;
> > +
> > +       tcontrol_margin = FIELD_GET(TEMP_TARGET_FAN_TEMP_MASK, pcs);
> > +       tcontrol_margin = sign_extend32(tcontrol_margin, 7) *
> > MILLIDEGREE_PER_DEGREE;
> > +       priv->temp.target.tcontrol = priv->temp.target.tjmax -
> > tcontrol_margin;
> > +
> > +       tthrottle_offset = FIELD_GET(TEMP_TARGET_TJ_OFFSET_MASK, pcs) *
> > MILLIDEGREE_PER_DEGREE;
> > +       priv->temp.target.tthrottle = priv->temp.target.tjmax -
> > tthrottle_offset;
> > +
> > +       peci_sensor_mark_updated(&priv->temp.target.state);
> > +
> > +       return 0;
> > +}
> > +
> > +static int get_temp_target(struct peci_cputemp *priv, enum
> > peci_temp_target_type type, long *val)
> > +{
> > +       int ret;
> > +
> > +       mutex_lock(&priv->temp.target.state.lock);
> > +
> > +       ret = update_temp_target(priv);
> > +       if (ret)
> > +               goto unlock;
> > +
> > +       switch (type) {
> > +       case tcontrol_type:
> > +               *val = priv->temp.target.tcontrol;
> > +               break;
> > +       case tthrottle_type:
> > +               *val = priv->temp.target.tthrottle;
> > +               break;
> > +       case tjmax_type:
> > +               *val = priv->temp.target.tjmax;
> > +               break;
> > +       case crit_hyst_type:
> > +               *val = priv->temp.target.tjmax - priv->temp.target.tcontrol;
> > +               break;
> > +       default:
> > +               ret = -EOPNOTSUPP;
> > +               break;
> > +       }
> > +unlock:
> > +       mutex_unlock(&priv->temp.target.state.lock);
> > +
> > +       return ret;
> > +}
> > +
> > +/*
> > + * Processors return a value of DTS reading in S10.6 fixed point format
> > + * (16 bits: 10-bit signed magnitude, 6-bit fraction).
> > + * Error codes:
> > + *   0x8000: General sensor error
> > + *   0x8001: Reserved
> > + *   0x8002: Underflow on reading value
> > + *   0x8003-0x81ff: Reserved
> > + */
> > +static bool dts_valid(s32 val)
> > +{
> > +       return val < 0x8000 || val > 0x81ff;
> > +}
> > +
> > +static s32 dts_to_millidegree(s32 val)
> > +{
> > +       return sign_extend32(val, 15) * MILLIDEGREE_PER_DEGREE /
> > DTS_FIXED_POINT_FRACTION;
> > +}
> > +
> > +static int get_die_temp(struct peci_cputemp *priv, long *val)
> > +{
> > +       long tjmax;
> > +       s16 temp;
> > +       int ret;
> > +
> > +       mutex_lock(&priv->temp.die.state.lock);
> > +       if (!peci_sensor_need_update(&priv->temp.die.state))
> > +               goto skip_update;
> > +
> > +       ret = peci_temp_read(priv->peci_dev, &temp);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       if (!dts_valid(temp)) {
> > +               ret = -EIO;
> > +               goto err_unlock;
> > +       }
> > +
> > +       ret = get_temp_target(priv, tjmax_type, &tjmax);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       priv->temp.die.value = (s32)tjmax + dts_to_millidegree(temp);
> > +
> > +       peci_sensor_mark_updated(&priv->temp.die.state);
> > +
> > +skip_update:
> > +       *val = priv->temp.die.value;
> > +       mutex_unlock(&priv->temp.die.state.lock);
> > +
> > +       return 0;
> > +
> > +err_unlock:
> > +       mutex_unlock(&priv->temp.die.state.lock);
> > +       return ret;
> > +}
> > +
> > +static int get_dts(struct peci_cputemp *priv, long *val)
> > +{
> > +       s32 dts_margin;
> > +       long tcontrol;
> > +       u32 pcs;
> > +       int ret;
> > +
> > +       mutex_lock(&priv->temp.dts.state.lock);
> > +       if (!peci_sensor_need_update(&priv->temp.dts.state))
> > +               goto skip_update;
> > +
> > +       ret = peci_pcs_read(priv->peci_dev, PECI_PCS_THERMAL_MARGIN, 0,
> > &pcs);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       dts_margin = FIELD_GET(DTS_MARGIN_MASK, pcs);
> > +       if (!dts_valid(dts_margin)) {
> > +               ret = -EIO;
> > +               goto err_unlock;
> > +       }
> > +
> > +       ret = get_temp_target(priv, tcontrol_type, &tcontrol);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       /* Note that the tcontrol should be available before calling it */
> > +       priv->temp.dts.value = (s32)tcontrol -
> > dts_to_millidegree(dts_margin);
> > +
> > +       peci_sensor_mark_updated(&priv->temp.dts.state);
> > +
> > +skip_update:
> > +       *val = priv->temp.dts.value;
> > +       mutex_unlock(&priv->temp.dts.state.lock);
> > +
> > +       return 0;
> > +
> > +err_unlock:
> > +       mutex_unlock(&priv->temp.dts.state.lock);
> > +       return ret;
> 
> Simplify (see below)
> 
> > +}
> > +
> > +static int get_core_temp(struct peci_cputemp *priv, int core_index, long
> > *val)
> > +{
> > +       s32 core_dts_margin;
> > +       long tjmax;
> > +       u32 pcs;
> > +       int ret;
> 
>         int ret = 0;
> 
> to handle simplification below.
> 
> > +
> > +       mutex_lock(&priv->temp.core[core_index].state.lock);
> > +       if (!peci_sensor_need_update(&priv->temp.core[core_index].state))
> > +               goto skip_update;
> > +
> > +       ret = peci_pcs_read(priv->peci_dev, PECI_PCS_MODULE_TEMP,
> > core_index, &pcs);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       core_dts_margin = FIELD_GET(PCS_MODULE_TEMP_MASK, pcs);
> > +       if (!dts_valid(core_dts_margin)) {
> > +               ret = -EIO;
> > +               goto err_unlock;
> > +       }
> > +
> > +       ret = get_temp_target(priv, tjmax_type, &tjmax);
> > +       if (ret)
> > +               goto err_unlock;
> > +
> > +       /* Note that the tjmax should be available before calling it */
> > +       priv->temp.core[core_index].value = (s32)tjmax +
> > dts_to_millidegree(core_dts_margin);
> > +
> > +       peci_sensor_mark_updated(&priv->temp.core[core_index].state);
> > +
> > +skip_update:
> > +       *val = priv->temp.core[core_index].value;
> > +       mutex_unlock(&priv->temp.core[core_index].state.lock);
> > +
> > +       return 0;
> > +
> > +err_unlock:
> > +       mutex_unlock(&priv->temp.core[core_index].state.lock);
> > +       return ret;
> 
> Simplify:
> 
> skip_update:
>         *val = priv->temp.core[core_index].value;
> err_unlock:
>         mutex_unlock(&priv->temp.core[core_index].state.lock);
>         return ret;

Sure, I'll use the same pattern in other places as well.

Thank you
-Iwona

> 
> > +}
> > +
> > +static int cputemp_read_string(struct device *dev, enum hwmon_sensor_types
> > type,
> > +                              u32 attr, int channel, const char **str)
> > +{
> > +       struct peci_cputemp *priv = dev_get_drvdata(dev);
> > +
> > +       if (attr != hwmon_temp_label)
> > +               return -EOPNOTSUPP;
> > +
> > +       *str = channel < channel_core ?
> > +               cputemp_label[channel] : priv->coretemp_label[channel -
> > channel_core];
> > +
> > +       return 0;
> > +}
> > +
> > +static int cputemp_read(struct device *dev, enum hwmon_sensor_types type,
> > +                       u32 attr, int channel, long *val)
> > +{
> > +       struct peci_cputemp *priv = dev_get_drvdata(dev);
> > +
> > +       switch (attr) {
> > +       case hwmon_temp_input:
> > +               switch (channel) {
> > +               case channel_die:
> > +                       return get_die_temp(priv, val);
> > +               case channel_dts:
> > +                       return get_dts(priv, val);
> > +               case channel_tcontrol:
> > +                       return get_temp_target(priv, tcontrol_type, val);
> > +               case channel_tthrottle:
> > +                       return get_temp_target(priv, tthrottle_type, val);
> > +               case channel_tjmax:
> > +                       return get_temp_target(priv, tjmax_type, val);
> > +               default:
> > +                       return get_core_temp(priv, channel - channel_core,
> > val);
> > +               }
> > +               break;
> > +       case hwmon_temp_max:
> > +               return get_temp_target(priv, tcontrol_type, val);
> > +       case hwmon_temp_crit:
> > +               return get_temp_target(priv, tjmax_type, val);
> > +       case hwmon_temp_crit_hyst:
> > +               return get_temp_target(priv, crit_hyst_type, val);
> > +       default:
> > +               return -EOPNOTSUPP;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static umode_t cputemp_is_visible(const void *data, enum hwmon_sensor_types
> > type,
> > +                                 u32 attr, int channel)
> > +{
> > +       const struct peci_cputemp *priv = data;
> > +
> > +       if (channel > CPUTEMP_CHANNEL_NUMS)
> > +               return 0;
> > +
> > +       if (channel < channel_core)
> > +               return 0444;
> > +
> > +       if (test_bit(channel - channel_core, priv->core_mask))
> > +               return 0444;
> > +
> > +       return 0;
> > +}
> > +
> > +static int init_core_mask(struct peci_cputemp *priv)
> > +{
> > +       struct peci_device *peci_dev = priv->peci_dev;
> > +       struct resolved_cores_reg *reg = priv->gen_info->reg;
> > +       u64 core_mask;
> > +       u32 data;
> > +       int ret;
> > +
> > +       /* Get the RESOLVED_CORES register value */
> > +       switch (peci_dev->info.model) {
> > +       case INTEL_FAM6_ICELAKE_X:
> > +       case INTEL_FAM6_ICELAKE_D:
> > +               ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg-
> > >dev,
> > +                                            reg->func, reg->offset + 4,
> > &data);
> > +               if (ret)
> > +                       return ret;
> > +
> > +               core_mask = (u64)data << 32;
> > +
> > +               ret = peci_ep_pci_local_read(peci_dev, 0, reg->bus, reg-
> > >dev,
> > +                                            reg->func, reg->offset, &data);
> > +               if (ret)
> > +                       return ret;
> > +
> > +               core_mask |= data;
> > +
> > +               break;
> > +       default:
> > +               ret = peci_pci_local_read(peci_dev, reg->bus, reg->dev,
> > +                                         reg->func, reg->offset, &data);
> > +               if (ret)
> > +                       return ret;
> > +
> > +               core_mask = data;
> > +
> > +               break;
> > +       }
> > +
> > +       if (!core_mask)
> > +               return -EIO;
> > +
> > +       bitmap_from_u64(priv->core_mask, core_mask);
> > +
> > +       return 0;
> > +}
> > +
> > +static int create_temp_label(struct peci_cputemp *priv)
> > +{
> > +       unsigned long core_max = find_last_bit(priv->core_mask,
> > CORE_NUMS_MAX);
> > +       int i;
> > +
> > +       priv->coretemp_label = devm_kzalloc(priv->dev, core_max *
> > sizeof(char *), GFP_KERNEL);
> > +       if (!priv->coretemp_label)
> > +               return -ENOMEM;
> > +
> > +       for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX) {
> > +               priv->coretemp_label[i] = devm_kasprintf(priv->dev,
> > GFP_KERNEL, "Core %d", i);
> > +               if (!priv->coretemp_label[i])
> > +                       return -ENOMEM;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static void check_resolved_cores(struct peci_cputemp *priv)
> > +{
> > +       /*
> > +        * Failure to resolve cores is non-critical, we're still able to
> > +        * provide other sensor data.
> > +        */
> > +
> > +       if (init_core_mask(priv))
> > +               return;
> > +
> > +       if (create_temp_label(priv))
> > +               bitmap_zero(priv->core_mask, CORE_NUMS_MAX);
> > +}
> > +
> > +static void sensor_init(struct peci_cputemp *priv)
> > +{
> > +       int i;
> > +
> > +       mutex_init(&priv->temp.target.state.lock);
> > +       mutex_init(&priv->temp.die.state.lock);
> > +       mutex_init(&priv->temp.dts.state.lock);
> > +
> > +       for_each_set_bit(i, priv->core_mask, CORE_NUMS_MAX)
> > +               mutex_init(&priv->temp.core[i].state.lock);
> > +}
> > +
> > +static const struct hwmon_ops peci_cputemp_ops = {
> > +       .is_visible = cputemp_is_visible,
> > +       .read_string = cputemp_read_string,
> > +       .read = cputemp_read,
> > +};
> > +
> > +static const u32 peci_cputemp_temp_channel_config[] = {
> > +       /* Die temperature */
> > +       HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> > HWMON_T_CRIT_HYST,
> > +       /* DTS margin */
> > +       HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
> > HWMON_T_CRIT_HYST,
> > +       /* Tcontrol temperature */
> > +       HWMON_T_LABEL | HWMON_T_INPUT | HWMON_T_CRIT,
> > +       /* Tthrottle temperature */
> > +       HWMON_T_LABEL | HWMON_T_INPUT,
> > +       /* Tjmax temperature */
> > +       HWMON_T_LABEL | HWMON_T_INPUT,
> > +       /* Core temperature - for all core channels */
> > +       [channel_core ... CPUTEMP_CHANNEL_NUMS - 1] = HWMON_T_LABEL |
> > HWMON_T_INPUT,
> > +       0
> > +};
> > +
> > +static const struct hwmon_channel_info peci_cputemp_temp_channel = {
> > +       .type = hwmon_temp,
> > +       .config = peci_cputemp_temp_channel_config,
> > +};
> > +
> > +static const struct hwmon_channel_info *peci_cputemp_info[] = {
> > +       &peci_cputemp_temp_channel,
> > +       NULL
> > +};
> > +
> > +static const struct hwmon_chip_info peci_cputemp_chip_info = {
> > +       .ops = &peci_cputemp_ops,
> > +       .info = peci_cputemp_info,
> > +};
> > +
> > +static int peci_cputemp_probe(struct auxiliary_device *adev,
> > +                             const struct auxiliary_device_id *id)
> > +{
> > +       struct device *dev = &adev->dev;
> > +       struct peci_device *peci_dev = to_peci_device(dev->parent);
> > +       struct peci_cputemp *priv;
> > +       struct device *hwmon_dev;
> > +
> > +       priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> > +       if (!priv)
> > +               return -ENOMEM;
> > +
> > +       priv->name = devm_kasprintf(dev, GFP_KERNEL, "peci_cputemp.cpu%d",
> > +                                   peci_dev->info.socket_id);
> > +       if (!priv->name)
> > +               return -ENOMEM;
> > +
> > +       priv->dev = dev;
> > +       priv->peci_dev = peci_dev;
> > +       priv->gen_info = (const struct cpu_info *)id->driver_data;
> > +
> > +       /*
> > +        * This is just a sanity check. Since we're using commands that are
> > +        * guaranteed to be supported on a given platform, we should never
> > see
> > +        * revision lower than expected.
> > +        */
> > +       if (peci_dev->info.peci_revision < priv->gen_info-
> > >min_peci_revision)
> > +               dev_warn(priv->dev,
> > +                        "Unexpected PECI revision %#x, some features may be
> > unavailable\n",
> > +                        peci_dev->info.peci_revision);
> > +
> > +       check_resolved_cores(priv);
> > +
> > +       sensor_init(priv);
> > +
> > +       hwmon_dev = devm_hwmon_device_register_with_info(priv->dev, priv-
> > >name,
> > +                                                        priv,
> > &peci_cputemp_chip_info, NULL);
> > +
> > +       return PTR_ERR_OR_ZERO(hwmon_dev);
> > +}
> > +
> > +/*
> > + * RESOLVED_CORES PCI configuration register may have different location on
> > + * different platforms.
> > + */
> > +static struct resolved_cores_reg resolved_cores_reg_hsx = {
> > +       .bus = 1,
> > +       .dev = 30,
> > +       .func = 3,
> > +       .offset = 0xb4,
> > +};
> > +
> > +static struct resolved_cores_reg resolved_cores_reg_icx = {
> > +       .bus = 14,
> > +       .dev = 30,
> > +       .func = 3,
> > +       .offset = 0xd0,
> > +};
> > +
> > +static const struct cpu_info cpu_hsx = {
> > +       .reg            = &resolved_cores_reg_hsx,
> > +       .min_peci_revision = 0x33,
> > +};
> > +
> > +static const struct cpu_info cpu_icx = {
> > +       .reg            = &resolved_cores_reg_icx,
> > +       .min_peci_revision = 0x40,
> > +};
> > +
> > +static const struct auxiliary_device_id peci_cputemp_ids[] = {
> > +       {
> > +               .name = "peci_cpu.cputemp.hsx",
> > +               .driver_data = (kernel_ulong_t)&cpu_hsx,
> > +       },
> > +       {
> > +               .name = "peci_cpu.cputemp.bdx",
> > +               .driver_data = (kernel_ulong_t)&cpu_hsx,
> > +       },
> > +       {
> > +               .name = "peci_cpu.cputemp.bdxd",
> > +               .driver_data = (kernel_ulong_t)&cpu_hsx,
> > +       },
> > +       {
> > +               .name = "peci_cpu.cputemp.skx",
> > +               .driver_data = (kernel_ulong_t)&cpu_hsx,
> > +       },
> > +       {
> > +               .name = "peci_cpu.cputemp.icx",
> > +               .driver_data = (kernel_ulong_t)&cpu_icx,
> > +       },
> > +       {
> > +               .name = "peci_cpu.cputemp.icxd",
> > +               .driver_data = (kernel_ulong_t)&cpu_icx,
> > +       },
> > +       { }
> > +};
> > +MODULE_DEVICE_TABLE(auxiliary, peci_cputemp_ids);
> > +
> > +static struct auxiliary_driver peci_cputemp_driver = {
> > +       .probe          = peci_cputemp_probe,
> > +       .id_table       = peci_cputemp_ids,
> > +};
> > +
> > +module_auxiliary_driver(peci_cputemp_driver);
> > +
> > +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
> > +MODULE_AUTHOR("Iwona Winiarska <iwona.winiarska@intel.com>");
> > +MODULE_DESCRIPTION("PECI cputemp driver");
> > +MODULE_LICENSE("GPL");
> > +MODULE_IMPORT_NS(PECI_CPU);
> > 
>

Greg Kroah-Hartman Aug. 5, 2021, 12:17 p.m. UTC | #3

On Tue, Aug 03, 2021 at 01:31:19PM +0200, Iwona Winiarska wrote:
> Hi Greg,

> 

> This is a second round of patches introducing PECI subsystem.

> I don't think it is ready to be applied right away (we're still

> missing r-b's), but I hope we have chance to complete discussion in

> the 5.15 development cycle. I would appreciate if you could take

> a look.


I will wait to review this when you all feel it is ready so as to not
waste my time finding things that you already know need to be resolved.

thanks,

greg k-h

Rob Herring (Arm) Aug. 11, 2021, 6:11 p.m. UTC | #4

On Tue, 03 Aug 2021 13:31:22 +0200, Iwona Winiarska wrote:
> Add device tree bindings for the PECI controller.

> 

> Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>

> ---

>  .../bindings/peci/peci-controller.yaml        | 33 +++++++++++++++++++

>  1 file changed, 33 insertions(+)

>  create mode 100644 Documentation/devicetree/bindings/peci/peci-controller.yaml

> 


Reviewed-by: Rob Herring <robh@kernel.org>

Dan Williams Aug. 25, 2021, 10:58 p.m. UTC | #5

On Tue, Aug 3, 2021 at 4:35 AM Iwona Winiarska
<iwona.winiarska@intel.com> wrote:
>

> Intel processors provide access for various services designed to support

> processor and DRAM thermal management, platform manageability and

> processor interface tuning and diagnostics.

> Those services are available via the Platform Environment Control

> Interface (PECI) that provides a communication channel between the

> processor and the Baseboard Management Controller (BMC) or other

> platform management device.

>

> This change introduces PECI subsystem by adding the initial core module

> and API for controller drivers.

>

> Co-developed-by: Jason M Bills <jason.m.bills@linux.intel.com>

> Signed-off-by: Jason M Bills <jason.m.bills@linux.intel.com>

> Co-developed-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

> Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>

> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> ---

>  MAINTAINERS             |   9 +++

>  drivers/Kconfig         |   3 +

>  drivers/Makefile        |   1 +

>  drivers/peci/Kconfig    |  15 ++++

>  drivers/peci/Makefile   |   5 ++

>  drivers/peci/core.c     | 155 ++++++++++++++++++++++++++++++++++++++++

>  drivers/peci/internal.h |  16 +++++

>  include/linux/peci.h    |  99 +++++++++++++++++++++++++

>  8 files changed, 303 insertions(+)

>  create mode 100644 drivers/peci/Kconfig

>  create mode 100644 drivers/peci/Makefile

>  create mode 100644 drivers/peci/core.c

>  create mode 100644 drivers/peci/internal.h

>  create mode 100644 include/linux/peci.h

>

> diff --git a/MAINTAINERS b/MAINTAINERS

> index 7cdab7229651..d411974aaa5e 100644

> --- a/MAINTAINERS

> +++ b/MAINTAINERS

> @@ -14503,6 +14503,15 @@ L:     platform-driver-x86@vger.kernel.org

>  S:     Maintained

>  F:     drivers/platform/x86/peaq-wmi.c

>

> +PECI SUBSYSTEM

> +M:     Iwona Winiarska <iwona.winiarska@intel.com>

> +R:     Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

> +L:     openbmc@lists.ozlabs.org (moderated for non-subscribers)

> +S:     Supported

> +F:     Documentation/devicetree/bindings/peci/

> +F:     drivers/peci/

> +F:     include/linux/peci.h

> +

>  PENSANDO ETHERNET DRIVERS

>  M:     Shannon Nelson <snelson@pensando.io>

>  M:     drivers@pensando.io

> diff --git a/drivers/Kconfig b/drivers/Kconfig

> index 8bad63417a50..f472b3d972b3 100644

> --- a/drivers/Kconfig

> +++ b/drivers/Kconfig

> @@ -236,4 +236,7 @@ source "drivers/interconnect/Kconfig"

>  source "drivers/counter/Kconfig"

>

>  source "drivers/most/Kconfig"

> +

> +source "drivers/peci/Kconfig"

> +

>  endmenu

> diff --git a/drivers/Makefile b/drivers/Makefile

> index 27c018bdf4de..8d96f0c3dde5 100644

> --- a/drivers/Makefile

> +++ b/drivers/Makefile

> @@ -189,3 +189,4 @@ obj-$(CONFIG_GNSS)          += gnss/

>  obj-$(CONFIG_INTERCONNECT)     += interconnect/

>  obj-$(CONFIG_COUNTER)          += counter/

>  obj-$(CONFIG_MOST)             += most/

> +obj-$(CONFIG_PECI)             += peci/

> diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig

> new file mode 100644

> index 000000000000..71a4ad81225a

> --- /dev/null

> +++ b/drivers/peci/Kconfig

> @@ -0,0 +1,15 @@

> +# SPDX-License-Identifier: GPL-2.0-only

> +

> +menuconfig PECI

> +       tristate "PECI support"

> +       help

> +         The Platform Environment Control Interface (PECI) is an interface

> +         that provides a communication channel to Intel processors and

> +         chipset components from external monitoring or control devices.

> +

> +         If you are building a Baseboard Management Controller (BMC) kernel

> +         for Intel platform say Y here and also to the specific driver for

> +         your adapter(s) below. If unsure say N.

> +

> +         This support is also available as a module. If so, the module

> +         will be called peci.

> diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile

> new file mode 100644

> index 000000000000..e789a354e842

> --- /dev/null

> +++ b/drivers/peci/Makefile

> @@ -0,0 +1,5 @@

> +# SPDX-License-Identifier: GPL-2.0-only

> +

> +# Core functionality

> +peci-y := core.o

> +obj-$(CONFIG_PECI) += peci.o

> diff --git a/drivers/peci/core.c b/drivers/peci/core.c

> new file mode 100644

> index 000000000000..7b3938af0396

> --- /dev/null

> +++ b/drivers/peci/core.c

> @@ -0,0 +1,155 @@

> +// SPDX-License-Identifier: GPL-2.0-only

> +// Copyright (c) 2018-2021 Intel Corporation

> +

> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt


This looks like overkill for only one print statement in this module,
especially when the dev_ print helpers offer more detail.

> +

> +#include <linux/bug.h>

> +#include <linux/device.h>

> +#include <linux/export.h>

> +#include <linux/idr.h>

> +#include <linux/module.h>

> +#include <linux/of.h>

> +#include <linux/peci.h>

> +#include <linux/pm_runtime.h>

> +#include <linux/property.h>

> +#include <linux/slab.h>

> +

> +#include "internal.h"

> +

> +static DEFINE_IDA(peci_controller_ida);

> +

> +static void peci_controller_dev_release(struct device *dev)

> +{

> +       struct peci_controller *controller = to_peci_controller(dev);

> +

> +       pm_runtime_disable(&controller->dev);


This seems late to be disabling power management, the device is about
to be freed. Keep in mind the lifetime of the this object can be
artificially prolonged. I expect this to be done when the device is
unregistered from the bus.

> +

> +       mutex_destroy(&controller->bus_lock);

> +       ida_free(&peci_controller_ida, controller->id);

> +       fwnode_handle_put(controller->dev.fwnode);


Shouldn't the get / put of this handle reference be bound to specific
accesses not held for the entire lifetime of the object? At a minimum
it seems to be a reference that can taken at registration and dropped
at unregistration.

> +       kfree(controller);

> +}

> +

> +struct device_type peci_controller_type = {

> +       .release        = peci_controller_dev_release,

> +};

> +

> +static struct peci_controller *peci_controller_alloc(struct device *dev,

> +                                                    struct peci_controller_ops *ops)

> +{

> +       struct fwnode_handle *node = fwnode_handle_get(dev_fwnode(dev));

> +       struct peci_controller *controller;

> +       int ret;

> +

> +       if (!ops->xfer)

> +               return ERR_PTR(-EINVAL);

> +

> +       controller = kzalloc(sizeof(*controller), GFP_KERNEL);

> +       if (!controller)

> +               return ERR_PTR(-ENOMEM);

> +

> +       ret = ida_alloc_max(&peci_controller_ida, U8_MAX, GFP_KERNEL);

> +       if (ret < 0)

> +               goto err;

> +       controller->id = ret;

> +

> +       controller->ops = ops;

> +

> +       controller->dev.parent = dev;

> +       controller->dev.bus = &peci_bus_type;

> +       controller->dev.type = &peci_controller_type;

> +       controller->dev.fwnode = node;

> +       controller->dev.of_node = to_of_node(node);

> +

> +       device_initialize(&controller->dev);

> +

> +       mutex_init(&controller->bus_lock);

> +

> +       pm_runtime_no_callbacks(&controller->dev);

> +       pm_suspend_ignore_children(&controller->dev, true);

> +       pm_runtime_enable(&controller->dev);


Per above, are you sure unregistered devices need pm_runtime enabled?

Rest looks ok to me.

Dan Williams Aug. 26, 2021, 1:35 a.m. UTC | #6

On Tue, Aug 3, 2021 at 4:35 AM Iwona Winiarska
<iwona.winiarska@intel.com> wrote:
>

> From: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

>

> ASPEED AST24xx/AST25xx/AST26xx SoCs supports the PECI electrical

> interface (a.k.a PECI wire).


Maybe a one sentence blurb here and in the Kconfig reminding people
why they should care if they have a PECI driver or not?

>

> Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

> Co-developed-by: Iwona Winiarska <iwona.winiarska@intel.com>

> Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>

> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> ---

>  MAINTAINERS                           |   9 +

>  drivers/peci/Kconfig                  |   6 +

>  drivers/peci/Makefile                 |   3 +

>  drivers/peci/controller/Kconfig       |  16 +

>  drivers/peci/controller/Makefile      |   3 +

>  drivers/peci/controller/peci-aspeed.c | 445 ++++++++++++++++++++++++++

>  6 files changed, 482 insertions(+)

>  create mode 100644 drivers/peci/controller/Kconfig

>  create mode 100644 drivers/peci/controller/Makefile

>  create mode 100644 drivers/peci/controller/peci-aspeed.c

>

> diff --git a/MAINTAINERS b/MAINTAINERS

> index d411974aaa5e..6e9d53ff68ab 100644

> --- a/MAINTAINERS

> +++ b/MAINTAINERS

> @@ -2866,6 +2866,15 @@ S:       Maintained

>  F:     Documentation/hwmon/asc7621.rst

>  F:     drivers/hwmon/asc7621.c

>

> +ASPEED PECI CONTROLLER

> +M:     Iwona Winiarska <iwona.winiarska@intel.com>

> +M:     Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

> +L:     linux-aspeed@lists.ozlabs.org (moderated for non-subscribers)

> +L:     openbmc@lists.ozlabs.org (moderated for non-subscribers)

> +S:     Supported

> +F:     Documentation/devicetree/bindings/peci/peci-aspeed.yaml

> +F:     drivers/peci/controller/peci-aspeed.c

> +

>  ASPEED PINCTRL DRIVERS

>  M:     Andrew Jeffery <andrew@aj.id.au>

>  L:     linux-aspeed@lists.ozlabs.org (moderated for non-subscribers)

> diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig

> index 71a4ad81225a..99279df97a78 100644

> --- a/drivers/peci/Kconfig

> +++ b/drivers/peci/Kconfig

> @@ -13,3 +13,9 @@ menuconfig PECI

>

>           This support is also available as a module. If so, the module

>           will be called peci.

> +

> +if PECI

> +

> +source "drivers/peci/controller/Kconfig"

> +

> +endif # PECI

> diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile

> index e789a354e842..926d8df15cbd 100644

> --- a/drivers/peci/Makefile

> +++ b/drivers/peci/Makefile

> @@ -3,3 +3,6 @@

>  # Core functionality

>  peci-y := core.o

>  obj-$(CONFIG_PECI) += peci.o

> +

> +# Hardware specific bus drivers

> +obj-y += controller/

> diff --git a/drivers/peci/controller/Kconfig b/drivers/peci/controller/Kconfig

> new file mode 100644

> index 000000000000..6d48df08db1c

> --- /dev/null

> +++ b/drivers/peci/controller/Kconfig

> @@ -0,0 +1,16 @@

> +# SPDX-License-Identifier: GPL-2.0-only

> +

> +config PECI_ASPEED

> +       tristate "ASPEED PECI support"

> +       depends on ARCH_ASPEED || COMPILE_TEST

> +       depends on OF

> +       depends on HAS_IOMEM

> +       help

> +         This option enables PECI controller driver for ASPEED AST2400,

> +         AST2500 and AST2600 SoCs.

> +

> +         Say Y here if your system runs on ASPEED SoC and you are using it

> +         as BMC for Intel platform.

> +

> +         This driver can also be built as a module. If so, the module will

> +         be called peci-aspeed.

> diff --git a/drivers/peci/controller/Makefile b/drivers/peci/controller/Makefile

> new file mode 100644

> index 000000000000..022c28ef1bf0

> --- /dev/null

> +++ b/drivers/peci/controller/Makefile

> @@ -0,0 +1,3 @@

> +# SPDX-License-Identifier: GPL-2.0-only

> +

> +obj-$(CONFIG_PECI_ASPEED)      += peci-aspeed.o

> diff --git a/drivers/peci/controller/peci-aspeed.c b/drivers/peci/controller/peci-aspeed.c

> new file mode 100644

> index 000000000000..1d708c983749

> --- /dev/null

> +++ b/drivers/peci/controller/peci-aspeed.c

> @@ -0,0 +1,445 @@

> +// SPDX-License-Identifier: GPL-2.0-only

> +// Copyright (C) 2012-2017 ASPEED Technology Inc.

> +// Copyright (c) 2018-2021 Intel Corporation


Why different copyright capitalization?

> +

> +#include <linux/bitfield.h>

> +#include <linux/clk.h>

> +#include <linux/delay.h>

> +#include <linux/interrupt.h>

> +#include <linux/io.h>

> +#include <linux/iopoll.h>

> +#include <linux/jiffies.h>

> +#include <linux/module.h>

> +#include <linux/of.h>

> +#include <linux/peci.h>

> +#include <linux/platform_device.h>

> +#include <linux/reset.h>

> +

> +#include <asm/unaligned.h>


Why is this included?

> +

> +/* ASPEED PECI Registers */

> +/* Control Register */

> +#define ASPEED_PECI_CTRL                       0x00

> +#define   ASPEED_PECI_CTRL_SAMPLING_MASK       GENMASK(19, 16)

> +#define   ASPEED_PECI_CTRL_RD_MODE_MASK                GENMASK(13, 12)

> +#define     ASPEED_PECI_CTRL_RD_MODE_DBG       BIT(13)

> +#define     ASPEED_PECI_CTRL_RD_MODE_COUNT     BIT(12)

> +#define   ASPEED_PECI_CTRL_CLK_SOURCE          BIT(11)

> +#define   ASPEED_PECI_CTRL_CLK_DIV_MASK                GENMASK(10, 8)

> +#define   ASPEED_PECI_CTRL_INVERT_OUT          BIT(7)

> +#define   ASPEED_PECI_CTRL_INVERT_IN           BIT(6)

> +#define   ASPEED_PECI_CTRL_BUS_CONTENTION_EN   BIT(5)

> +#define   ASPEED_PECI_CTRL_PECI_EN             BIT(4)

> +#define   ASPEED_PECI_CTRL_PECI_CLK_EN         BIT(0)

> +

> +/* Timing Negotiation Register */

> +#define ASPEED_PECI_TIMING_NEGOTIATION         0x04

> +#define   ASPEED_PECI_T_NEGO_MSG_MASK          GENMASK(15, 8)

> +#define   ASPEED_PECI_T_NEGO_ADDR_MASK         GENMASK(7, 0)

> +

> +/* Command Register */

> +#define ASPEED_PECI_CMD                                0x08

> +#define   ASPEED_PECI_CMD_PIN_MONITORING       BIT(31)

> +#define   ASPEED_PECI_CMD_STS_MASK             GENMASK(27, 24)

> +#define     ASPEED_PECI_CMD_STS_ADDR_T_NEGO    0x3

> +#define   ASPEED_PECI_CMD_IDLE_MASK            \

> +         (ASPEED_PECI_CMD_STS_MASK | ASPEED_PECI_CMD_PIN_MONITORING)

> +#define   ASPEED_PECI_CMD_FIRE                 BIT(0)

> +

> +/* Read/Write Length Register */

> +#define ASPEED_PECI_RW_LENGTH                  0x0c

> +#define   ASPEED_PECI_AW_FCS_EN                        BIT(31)

> +#define   ASPEED_PECI_RD_LEN_MASK              GENMASK(23, 16)

> +#define   ASPEED_PECI_WR_LEN_MASK              GENMASK(15, 8)

> +#define   ASPEED_PECI_TARGET_ADDR_MASK         GENMASK(7, 0)

> +

> +/* Expected FCS Data Register */

> +#define ASPEED_PECI_EXPECTED_FCS               0x10

> +#define   ASPEED_PECI_EXPECTED_RD_FCS_MASK     GENMASK(23, 16)

> +#define   ASPEED_PECI_EXPECTED_AW_FCS_AUTO_MASK        GENMASK(15, 8)

> +#define   ASPEED_PECI_EXPECTED_WR_FCS_MASK     GENMASK(7, 0)

> +

> +/* Captured FCS Data Register */

> +#define ASPEED_PECI_CAPTURED_FCS               0x14

> +#define   ASPEED_PECI_CAPTURED_RD_FCS_MASK     GENMASK(23, 16)

> +#define   ASPEED_PECI_CAPTURED_WR_FCS_MASK     GENMASK(7, 0)

> +

> +/* Interrupt Register */

> +#define ASPEED_PECI_INT_CTRL                   0x18

> +#define   ASPEED_PECI_TIMING_NEGO_SEL_MASK     GENMASK(31, 30)

> +#define     ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO   0

> +#define     ASPEED_PECI_2ND_BIT_OF_ADDR_NEGO   1

> +#define     ASPEED_PECI_MESSAGE_NEGO           2

> +#define   ASPEED_PECI_INT_MASK                 GENMASK(4, 0)

> +#define     ASPEED_PECI_INT_BUS_TIMEOUT                BIT(4)

> +#define     ASPEED_PECI_INT_BUS_CONTENTION     BIT(3)

> +#define     ASPEED_PECI_INT_WR_FCS_BAD         BIT(2)

> +#define     ASPEED_PECI_INT_WR_FCS_ABORT       BIT(1)

> +#define     ASPEED_PECI_INT_CMD_DONE           BIT(0)

> +

> +/* Interrupt Status Register */

> +#define ASPEED_PECI_INT_STS                    0x1c

> +#define   ASPEED_PECI_INT_TIMING_RESULT_MASK   GENMASK(29, 16)

> +         /* bits[4..0]: Same bit fields in the 'Interrupt Register' */

> +

> +/* Rx/Tx Data Buffer Registers */

> +#define ASPEED_PECI_WR_DATA0                   0x20

> +#define ASPEED_PECI_WR_DATA1                   0x24

> +#define ASPEED_PECI_WR_DATA2                   0x28

> +#define ASPEED_PECI_WR_DATA3                   0x2c

> +#define ASPEED_PECI_RD_DATA0                   0x30

> +#define ASPEED_PECI_RD_DATA1                   0x34

> +#define ASPEED_PECI_RD_DATA2                   0x38

> +#define ASPEED_PECI_RD_DATA3                   0x3c

> +#define ASPEED_PECI_WR_DATA4                   0x40

> +#define ASPEED_PECI_WR_DATA5                   0x44

> +#define ASPEED_PECI_WR_DATA6                   0x48

> +#define ASPEED_PECI_WR_DATA7                   0x4c

> +#define ASPEED_PECI_RD_DATA4                   0x50

> +#define ASPEED_PECI_RD_DATA5                   0x54

> +#define ASPEED_PECI_RD_DATA6                   0x58

> +#define ASPEED_PECI_RD_DATA7                   0x5c

> +#define   ASPEED_PECI_DATA_BUF_SIZE_MAX                32

> +

> +/* Timing Negotiation */

> +#define ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT  8

> +#define ASPEED_PECI_RD_SAMPLING_POINT_MAX      (BIT(4) - 1)

> +#define ASPEED_PECI_CLK_DIV_DEFAULT            0

> +#define ASPEED_PECI_CLK_DIV_MAX                        (BIT(3) - 1)

> +#define ASPEED_PECI_MSG_TIMING_DEFAULT         1

> +#define ASPEED_PECI_MSG_TIMING_MAX             (BIT(8) - 1)

> +#define ASPEED_PECI_ADDR_TIMING_DEFAULT                1

> +#define ASPEED_PECI_ADDR_TIMING_MAX            (BIT(8) - 1)

> +

> +/* Timeout */

> +#define ASPEED_PECI_IDLE_CHECK_TIMEOUT_US      (50 * USEC_PER_MSEC)

> +#define ASPEED_PECI_IDLE_CHECK_INTERVAL_US     (10 * USEC_PER_MSEC)

> +#define ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT     (1000)

> +#define ASPEED_PECI_CMD_TIMEOUT_MS_MAX         (1000)

> +

> +struct aspeed_peci {

> +       struct peci_controller *controller;

> +       struct device *dev;

> +       void __iomem *base;

> +       struct clk *clk;

> +       struct reset_control *rst;

> +       int irq;

> +       spinlock_t lock; /* to sync completion status handling */

> +       struct completion xfer_complete;

> +       u32 status;

> +       u32 cmd_timeout_ms;

> +       u32 msg_timing;

> +       u32 addr_timing;

> +       u32 rd_sampling_point;

> +       u32 clk_div;

> +};

> +

> +static void aspeed_peci_init_regs(struct aspeed_peci *priv)

> +{

> +       u32 val;

> +

> +       val = FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, ASPEED_PECI_CLK_DIV_DEFAULT);

> +       val |= ASPEED_PECI_CTRL_PECI_CLK_EN;

> +       writel(val, priv->base + ASPEED_PECI_CTRL);

> +       /*

> +        * Timing negotiation period setting.

> +        * The unit of the programmed value is 4 times of PECI clock period.

> +        */

> +       val = FIELD_PREP(ASPEED_PECI_T_NEGO_MSG_MASK, priv->msg_timing);

> +       val |= FIELD_PREP(ASPEED_PECI_T_NEGO_ADDR_MASK, priv->addr_timing);

> +       writel(val, priv->base + ASPEED_PECI_TIMING_NEGOTIATION);

> +

> +       /* Clear interrupts */

> +       val = readl(priv->base + ASPEED_PECI_INT_STS) | ASPEED_PECI_INT_MASK;

> +       writel(val, priv->base + ASPEED_PECI_INT_STS);

> +

> +       /* Set timing negotiation mode and enable interrupts */

> +       val = FIELD_PREP(ASPEED_PECI_TIMING_NEGO_SEL_MASK, ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO);

> +       val |= ASPEED_PECI_INT_MASK;

> +       writel(val, priv->base + ASPEED_PECI_INT_CTRL);

> +

> +       val = FIELD_PREP(ASPEED_PECI_CTRL_SAMPLING_MASK, priv->rd_sampling_point);

> +       val |= FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, priv->clk_div);

> +       val |= ASPEED_PECI_CTRL_PECI_EN;

> +       val |= ASPEED_PECI_CTRL_PECI_CLK_EN;

> +       writel(val, priv->base + ASPEED_PECI_CTRL);

> +}

> +

> +static inline int aspeed_peci_check_idle(struct aspeed_peci *priv)

> +{

> +       u32 cmd_sts = readl(priv->base + ASPEED_PECI_CMD);

> +

> +       if (FIELD_GET(ASPEED_PECI_CMD_STS_MASK, cmd_sts) == ASPEED_PECI_CMD_STS_ADDR_T_NEGO)

> +               aspeed_peci_init_regs(priv);

> +

> +       return readl_poll_timeout(priv->base + ASPEED_PECI_CMD,

> +                                 cmd_sts,

> +                                 !(cmd_sts & ASPEED_PECI_CMD_IDLE_MASK),

> +                                 ASPEED_PECI_IDLE_CHECK_INTERVAL_US,

> +                                 ASPEED_PECI_IDLE_CHECK_TIMEOUT_US);

> +}

> +

> +static int aspeed_peci_xfer(struct peci_controller *controller,

> +                           u8 addr, struct peci_request *req)

> +{

> +       struct aspeed_peci *priv = dev_get_drvdata(controller->dev.parent);

> +       unsigned long flags, timeout = msecs_to_jiffies(priv->cmd_timeout_ms);

> +       u32 peci_head;

> +       int ret;

> +

> +       if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||

> +           req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)

> +               return -EINVAL;

> +

> +       /* Check command sts and bus idle state */

> +       ret = aspeed_peci_check_idle(priv);

> +       if (ret)

> +               return ret; /* -ETIMEDOUT */

> +

> +       spin_lock_irqsave(&priv->lock, flags);

> +       reinit_completion(&priv->xfer_complete);

> +

> +       peci_head = FIELD_PREP(ASPEED_PECI_TARGET_ADDR_MASK, addr) |

> +                   FIELD_PREP(ASPEED_PECI_WR_LEN_MASK, req->tx.len) |

> +                   FIELD_PREP(ASPEED_PECI_RD_LEN_MASK, req->rx.len);

> +

> +       writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);

> +

> +       memcpy_toio(priv->base + ASPEED_PECI_WR_DATA0, req->tx.buf, min_t(u8, req->tx.len, 16));

> +       if (req->tx.len > 16)

> +               memcpy_toio(priv->base + ASPEED_PECI_WR_DATA4, req->tx.buf + 16,

> +                           req->tx.len - 16);

> +

> +       dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);

> +       print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf, req->tx.len);


On CONFIG_DYNAMIC_DEBUG=n builds the kernel will do all the work of
reading through this buffer, but skip emitting it. Are you sure you
want to pay that overhead for every transaction?

> +

> +       priv->status = 0;

> +       writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);

> +       spin_unlock_irqrestore(&priv->lock, flags);

> +

> +       ret = wait_for_completion_interruptible_timeout(&priv->xfer_complete, timeout);


spin_lock_irqsave() says "I don't know if interrupts are disabled
already, so I'll save the state, whatever it is, and restore later"

wait_for_completion_interruptible_timeout() says "I know I am in a
sleepable context where interrupts are enabled"

So, one of those is wrong, i.e. should it be spin_{lock,unlock}_irq()?


> +       if (ret < 0)

> +               return ret;

> +

> +       if (ret == 0) {

> +               dev_dbg(priv->dev, "Timeout waiting for a response!\n");

> +               return -ETIMEDOUT;

> +       }

> +

> +       spin_lock_irqsave(&priv->lock, flags);

> +

> +       writel(0, priv->base + ASPEED_PECI_CMD);

> +

> +       if (priv->status != ASPEED_PECI_INT_CMD_DONE) {

> +               spin_unlock_irqrestore(&priv->lock, flags);

> +               dev_dbg(priv->dev, "No valid response!\n");

> +               return -EIO;

> +       }

> +

> +       spin_unlock_irqrestore(&priv->lock, flags);

> +

> +       memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_RD_DATA0, min_t(u8, req->rx.len, 16));

> +       if (req->rx.len > 16)

> +               memcpy_fromio(req->rx.buf + 16, priv->base + ASPEED_PECI_RD_DATA4,

> +                             req->rx.len - 16);

> +

> +       print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf, req->rx.len);

> +

> +       return 0;

> +}

> +

> +static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)

> +{

> +       struct aspeed_peci *priv = arg;

> +       u32 status;

> +

> +       spin_lock(&priv->lock);

> +       status = readl(priv->base + ASPEED_PECI_INT_STS);

> +       writel(status, priv->base + ASPEED_PECI_INT_STS);

> +       priv->status |= (status & ASPEED_PECI_INT_MASK);

> +

> +       /*

> +        * In most cases, interrupt bits will be set one by one but also note

> +        * that multiple interrupt bits could be set at the same time.

> +        */

> +       if (status & ASPEED_PECI_INT_BUS_TIMEOUT)

> +               dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_TIMEOUT\n");

> +

> +       if (status & ASPEED_PECI_INT_BUS_CONTENTION)

> +               dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_BUS_CONTENTION\n");

> +

> +       if (status & ASPEED_PECI_INT_WR_FCS_BAD)

> +               dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_WR_FCS_BAD\n");

> +

> +       if (status & ASPEED_PECI_INT_WR_FCS_ABORT)

> +               dev_dbg_ratelimited(priv->dev, "ASPEED_PECI_INT_WR_FCS_ABORT\n");


Are you sure these would not be better as tracepoints? If you're
debugging an interrupt related failure, the ratelimiting might get in
your way when you really need to know when one of these error
interrupts fire relative to another event.

> +

> +       /*

> +        * All commands should be ended up with a ASPEED_PECI_INT_CMD_DONE bit

> +        * set even in an error case.

> +        */

> +       if (status & ASPEED_PECI_INT_CMD_DONE)

> +               complete(&priv->xfer_complete);


Hmm, no need to check if there was a sequencing error, like a command
was never submitted?

> +

> +       spin_unlock(&priv->lock);

> +

> +       return IRQ_HANDLED;

> +}

> +

> +static void aspeed_peci_property_sanitize(struct device *dev, const char *propname,

> +                                         u32 min, u32 max, u32 default_val, u32 *propval)

> +{

> +       u32 val;

> +       int ret;

> +

> +       ret = device_property_read_u32(dev, propname, &val);

> +       if (ret) {

> +               val = default_val;

> +       } else if (val > max || val < min) {

> +               dev_warn(dev, "Invalid %s: %u, falling back to: %u\n",

> +                        propname, val, default_val);

> +

> +               val = default_val;

> +       }

> +

> +       *propval = val;

> +}

> +

> +static void aspeed_peci_property_setup(struct aspeed_peci *priv)

> +{

> +       aspeed_peci_property_sanitize(priv->dev, "aspeed,clock-divider",

> +                                     0, ASPEED_PECI_CLK_DIV_MAX,

> +                                     ASPEED_PECI_CLK_DIV_DEFAULT, &priv->clk_div);

> +       aspeed_peci_property_sanitize(priv->dev, "aspeed,msg-timing",

> +                                     0, ASPEED_PECI_MSG_TIMING_MAX,

> +                                     ASPEED_PECI_MSG_TIMING_DEFAULT, &priv->msg_timing);

> +       aspeed_peci_property_sanitize(priv->dev, "aspeed,addr-timing",

> +                                     0, ASPEED_PECI_ADDR_TIMING_MAX,

> +                                     ASPEED_PECI_ADDR_TIMING_DEFAULT, &priv->addr_timing);

> +       aspeed_peci_property_sanitize(priv->dev, "aspeed,rd-sampling-point",

> +                                     0, ASPEED_PECI_RD_SAMPLING_POINT_MAX,

> +                                     ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT,

> +                                     &priv->rd_sampling_point);

> +       aspeed_peci_property_sanitize(priv->dev, "cmd-timeout-ms",

> +                                     1, ASPEED_PECI_CMD_TIMEOUT_MS_MAX,

> +                                     ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT, &priv->cmd_timeout_ms);

> +}

> +

> +static struct peci_controller_ops aspeed_ops = {

> +       .xfer = aspeed_peci_xfer,

> +};

> +

> +static void aspeed_peci_reset_control_release(void *data)

> +{

> +       reset_control_assert(data);

> +}

> +

> +int aspeed_peci_reset_control_deassert(struct device *dev, struct reset_control *rst)


I'd recommend naming this devm_aspeed_peci_reset_control_deassert(),
because I came looking here from reading probe for why there was no
reassertion of reset on driver ->remove().

> +{

> +       int ret;

> +

> +       ret = reset_control_deassert(rst);

> +       if (ret)

> +               return ret;

> +

> +       return devm_add_action_or_reset(dev, aspeed_peci_reset_control_release, rst);

> +}

> +

> +static void aspeed_peci_clk_release(void *data)

> +{

> +       clk_disable_unprepare(data);

> +}

> +

> +static int aspeed_peci_clk_enable(struct device *dev, struct clk *clk)


...ditto on the devm prefix, just to speed readability.

> +{

> +       int ret;

> +

> +       ret = clk_prepare_enable(clk);

> +       if (ret)

> +               return ret;

> +

> +       return devm_add_action_or_reset(dev, aspeed_peci_clk_release, clk);

> +}

> +

> +static int aspeed_peci_probe(struct platform_device *pdev)

> +{

> +       struct peci_controller *controller;

> +       struct aspeed_peci *priv;

> +       int ret;

> +

> +       priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);

> +       if (!priv)

> +               return -ENOMEM;

> +

> +       priv->dev = &pdev->dev;

> +       dev_set_drvdata(priv->dev, priv);

> +

> +       priv->base = devm_platform_ioremap_resource(pdev, 0);

> +       if (IS_ERR(priv->base))

> +               return PTR_ERR(priv->base);

> +

> +       priv->irq = platform_get_irq(pdev, 0);

> +       if (!priv->irq)

> +               return priv->irq;

> +

> +       ret = devm_request_irq(&pdev->dev, priv->irq, aspeed_peci_irq_handler,

> +                              0, "peci-aspeed", priv);

> +       if (ret)

> +               return ret;

> +

> +       init_completion(&priv->xfer_complete);

> +       spin_lock_init(&priv->lock);

> +

> +       priv->rst = devm_reset_control_get(&pdev->dev, NULL);

> +       if (IS_ERR(priv->rst))

> +               return dev_err_probe(priv->dev, PTR_ERR(priv->rst),

> +                                    "failed to get reset control\n");

> +

> +       ret = aspeed_peci_reset_control_deassert(priv->dev, priv->rst);

> +       if (ret)

> +               return dev_err_probe(priv->dev, ret, "cannot deassert reset control\n");

> +

> +       priv->clk = devm_clk_get(priv->dev, NULL);

> +       if (IS_ERR(priv->clk))

> +               return dev_err_probe(priv->dev, PTR_ERR(priv->clk), "failed to get clk\n");

> +

> +       ret = aspeed_peci_clk_enable(priv->dev, priv->clk);

> +       if (ret)

> +               return dev_err_probe(priv->dev, ret, "failed to enable clock\n");

> +

> +       aspeed_peci_property_setup(priv);

> +

> +       aspeed_peci_init_regs(priv);

> +

> +       controller = devm_peci_controller_add(priv->dev, &aspeed_ops);

> +       if (IS_ERR(controller))

> +               return dev_err_probe(priv->dev, PTR_ERR(controller),

> +                                    "failed to add aspeed peci controller\n");

> +

> +       priv->controller = controller;

> +

> +       return 0;

> +}

> +

> +static const struct of_device_id aspeed_peci_of_table[] = {

> +       { .compatible = "aspeed,ast2400-peci", },

> +       { .compatible = "aspeed,ast2500-peci", },

> +       { .compatible = "aspeed,ast2600-peci", },

> +       { }

> +};

> +MODULE_DEVICE_TABLE(of, aspeed_peci_of_table);

> +

> +static struct platform_driver aspeed_peci_driver = {

> +       .probe  = aspeed_peci_probe,

> +       .driver = {

> +               .name           = "peci-aspeed",

> +               .of_match_table = aspeed_peci_of_table,

> +       },

> +};

> +module_platform_driver(aspeed_peci_driver);

> +

> +MODULE_AUTHOR("Ryan Chen <ryan_chen@aspeedtech.com>");

> +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");

> +MODULE_DESCRIPTION("ASPEED PECI driver");

> +MODULE_LICENSE("GPL");

> +MODULE_IMPORT_NS(PECI);

> --

> 2.31.1

>

Iwona Winiarska Aug. 26, 2021, 10:40 p.m. UTC | #7

On Wed, 2021-08-25 at 15:58 -0700, Dan Williams wrote:
> On Tue, Aug 3, 2021 at 4:35 AM Iwona Winiarska

> <iwona.winiarska@intel.com> wrote:

> > 

> > Intel processors provide access for various services designed to support

> > processor and DRAM thermal management, platform manageability and

> > processor interface tuning and diagnostics.

> > Those services are available via the Platform Environment Control

> > Interface (PECI) that provides a communication channel between the

> > processor and the Baseboard Management Controller (BMC) or other

> > platform management device.

> > 

> > This change introduces PECI subsystem by adding the initial core module

> > and API for controller drivers.

> > 

> > Co-developed-by: Jason M Bills <jason.m.bills@linux.intel.com>

> > Signed-off-by: Jason M Bills <jason.m.bills@linux.intel.com>

> > Co-developed-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

> > Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

> > Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>

> > Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>

> > ---

> >  MAINTAINERS             |   9 +++

> >  drivers/Kconfig         |   3 +

> >  drivers/Makefile        |   1 +

> >  drivers/peci/Kconfig    |  15 ++++

> >  drivers/peci/Makefile   |   5 ++

> >  drivers/peci/core.c     | 155 ++++++++++++++++++++++++++++++++++++++++

> >  drivers/peci/internal.h |  16 +++++

> >  include/linux/peci.h    |  99 +++++++++++++++++++++++++

> >  8 files changed, 303 insertions(+)

> >  create mode 100644 drivers/peci/Kconfig

> >  create mode 100644 drivers/peci/Makefile

> >  create mode 100644 drivers/peci/core.c

> >  create mode 100644 drivers/peci/internal.h

> >  create mode 100644 include/linux/peci.h

> > 

> > diff --git a/MAINTAINERS b/MAINTAINERS

> > index 7cdab7229651..d411974aaa5e 100644

> > --- a/MAINTAINERS

> > +++ b/MAINTAINERS

> > @@ -14503,6 +14503,15 @@ L:     platform-driver-x86@vger.kernel.org

> >  S:     Maintained

> >  F:     drivers/platform/x86/peaq-wmi.c

> > 

> > +PECI SUBSYSTEM

> > +M:     Iwona Winiarska <iwona.winiarska@intel.com>

> > +R:     Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

> > +L:     openbmc@lists.ozlabs.org (moderated for non-subscribers)

> > +S:     Supported

> > +F:     Documentation/devicetree/bindings/peci/

> > +F:     drivers/peci/

> > +F:     include/linux/peci.h

> > +

> >  PENSANDO ETHERNET DRIVERS

> >  M:     Shannon Nelson <snelson@pensando.io>

> >  M:     drivers@pensando.io

> > diff --git a/drivers/Kconfig b/drivers/Kconfig

> > index 8bad63417a50..f472b3d972b3 100644

> > --- a/drivers/Kconfig

> > +++ b/drivers/Kconfig

> > @@ -236,4 +236,7 @@ source "drivers/interconnect/Kconfig"

> >  source "drivers/counter/Kconfig"

> > 

> >  source "drivers/most/Kconfig"

> > +

> > +source "drivers/peci/Kconfig"

> > +

> >  endmenu

> > diff --git a/drivers/Makefile b/drivers/Makefile

> > index 27c018bdf4de..8d96f0c3dde5 100644

> > --- a/drivers/Makefile

> > +++ b/drivers/Makefile

> > @@ -189,3 +189,4 @@ obj-$(CONFIG_GNSS)          += gnss/

> >  obj-$(CONFIG_INTERCONNECT)     += interconnect/

> >  obj-$(CONFIG_COUNTER)          += counter/

> >  obj-$(CONFIG_MOST)             += most/

> > +obj-$(CONFIG_PECI)             += peci/

> > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig

> > new file mode 100644

> > index 000000000000..71a4ad81225a

> > --- /dev/null

> > +++ b/drivers/peci/Kconfig

> > @@ -0,0 +1,15 @@

> > +# SPDX-License-Identifier: GPL-2.0-only

> > +

> > +menuconfig PECI

> > +       tristate "PECI support"

> > +       help

> > +         The Platform Environment Control Interface (PECI) is an interface

> > +         that provides a communication channel to Intel processors and

> > +         chipset components from external monitoring or control devices.

> > +

> > +         If you are building a Baseboard Management Controller (BMC) kernel

> > +         for Intel platform say Y here and also to the specific driver for

> > +         your adapter(s) below. If unsure say N.

> > +

> > +         This support is also available as a module. If so, the module

> > +         will be called peci.

> > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile

> > new file mode 100644

> > index 000000000000..e789a354e842

> > --- /dev/null

> > +++ b/drivers/peci/Makefile

> > @@ -0,0 +1,5 @@

> > +# SPDX-License-Identifier: GPL-2.0-only

> > +

> > +# Core functionality

> > +peci-y := core.o

> > +obj-$(CONFIG_PECI) += peci.o

> > diff --git a/drivers/peci/core.c b/drivers/peci/core.c

> > new file mode 100644

> > index 000000000000..7b3938af0396

> > --- /dev/null

> > +++ b/drivers/peci/core.c

> > @@ -0,0 +1,155 @@

> > +// SPDX-License-Identifier: GPL-2.0-only

> > +// Copyright (c) 2018-2021 Intel Corporation

> > +

> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

> 

> This looks like overkill for only one print statement in this module,

> especially when the dev_ print helpers offer more detail.


Ok, I'll remove it.

> 

> > +

> > +#include <linux/bug.h>

> > +#include <linux/device.h>

> > +#include <linux/export.h>

> > +#include <linux/idr.h>

> > +#include <linux/module.h>

> > +#include <linux/of.h>

> > +#include <linux/peci.h>

> > +#include <linux/pm_runtime.h>

> > +#include <linux/property.h>

> > +#include <linux/slab.h>

> > +

> > +#include "internal.h"

> > +

> > +static DEFINE_IDA(peci_controller_ida);

> > +

> > +static void peci_controller_dev_release(struct device *dev)

> > +{

> > +       struct peci_controller *controller = to_peci_controller(dev);

> > +

> > +       pm_runtime_disable(&controller->dev);

> 

> This seems late to be disabling power management, the device is about

> to be freed. Keep in mind the lifetime of the this object can be

> artificially prolonged. I expect this to be done when the device is

> unregistered from the bus.


Makes sense.

> 

> > +

> > +       mutex_destroy(&controller->bus_lock);

> > +       ida_free(&peci_controller_ida, controller->id);

> > +       fwnode_handle_put(controller->dev.fwnode);

> 

> Shouldn't the get / put of this handle reference be bound to specific

> accesses not held for the entire lifetime of the object? At a minimum

> it seems to be a reference that can taken at registration and dropped

> at unregistration.


I'll move it to take ref at registration and to drop it at unregistration.

> 

> > +       kfree(controller);

> > +}

> > +

> > +struct device_type peci_controller_type = {

> > +       .release        = peci_controller_dev_release,

> > +};

> > +

> > +static struct peci_controller *peci_controller_alloc(struct device *dev,

> > +                                                    struct

> > peci_controller_ops *ops)

> > +{

> > +       struct fwnode_handle *node = fwnode_handle_get(dev_fwnode(dev));

> > +       struct peci_controller *controller;

> > +       int ret;

> > +

> > +       if (!ops->xfer)

> > +               return ERR_PTR(-EINVAL);

> > +

> > +       controller = kzalloc(sizeof(*controller), GFP_KERNEL);

> > +       if (!controller)

> > +               return ERR_PTR(-ENOMEM);

> > +

> > +       ret = ida_alloc_max(&peci_controller_ida, U8_MAX, GFP_KERNEL);

> > +       if (ret < 0)

> > +               goto err;

> > +       controller->id = ret;

> > +

> > +       controller->ops = ops;

> > +

> > +       controller->dev.parent = dev;

> > +       controller->dev.bus = &peci_bus_type;

> > +       controller->dev.type = &peci_controller_type;

> > +       controller->dev.fwnode = node;

> > +       controller->dev.of_node = to_of_node(node);

> > +

> > +       device_initialize(&controller->dev);

> > +

> > +       mutex_init(&controller->bus_lock);

> > +

> > +       pm_runtime_no_callbacks(&controller->dev);

> > +       pm_suspend_ignore_children(&controller->dev, true);

> > +       pm_runtime_enable(&controller->dev);

> 

> Per above, are you sure unregistered devices need pm_runtime enabled?

> 

> Rest looks ok to me.


Thanks
-Iwona

Iwona Winiarska Aug. 26, 2021, 11:54 p.m. UTC | #8

On Wed, 2021-08-25 at 18:35 -0700, Dan Williams wrote:
> On Tue, Aug 3, 2021 at 4:35 AM Iwona Winiarska
> <iwona.winiarska@intel.com> wrote:
> > 
> > From: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> > 
> > ASPEED AST24xx/AST25xx/AST26xx SoCs supports the PECI electrical
> > interface (a.k.a PECI wire).
> 
> Maybe a one sentence blurb here and in the Kconfig reminding people
> why they should care if they have a PECI driver or not?

Ok, I'll expand it a bit.

> 
> > 
> > Signed-off-by: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> > Co-developed-by: Iwona Winiarska <iwona.winiarska@intel.com>
> > Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>
> > Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> > ---
> >  MAINTAINERS                           |   9 +
> >  drivers/peci/Kconfig                  |   6 +
> >  drivers/peci/Makefile                 |   3 +
> >  drivers/peci/controller/Kconfig       |  16 +
> >  drivers/peci/controller/Makefile      |   3 +
> >  drivers/peci/controller/peci-aspeed.c | 445 ++++++++++++++++++++++++++
> >  6 files changed, 482 insertions(+)
> >  create mode 100644 drivers/peci/controller/Kconfig
> >  create mode 100644 drivers/peci/controller/Makefile
> >  create mode 100644 drivers/peci/controller/peci-aspeed.c
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index d411974aaa5e..6e9d53ff68ab 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2866,6 +2866,15 @@ S:       Maintained
> >  F:     Documentation/hwmon/asc7621.rst
> >  F:     drivers/hwmon/asc7621.c
> > 
> > +ASPEED PECI CONTROLLER
> > +M:     Iwona Winiarska <iwona.winiarska@intel.com>
> > +M:     Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>
> > +L:     linux-aspeed@lists.ozlabs.org (moderated for non-subscribers)
> > +L:     openbmc@lists.ozlabs.org (moderated for non-subscribers)
> > +S:     Supported
> > +F:     Documentation/devicetree/bindings/peci/peci-aspeed.yaml
> > +F:     drivers/peci/controller/peci-aspeed.c
> > +
> >  ASPEED PINCTRL DRIVERS
> >  M:     Andrew Jeffery <andrew@aj.id.au>
> >  L:     linux-aspeed@lists.ozlabs.org (moderated for non-subscribers)
> > diff --git a/drivers/peci/Kconfig b/drivers/peci/Kconfig
> > index 71a4ad81225a..99279df97a78 100644
> > --- a/drivers/peci/Kconfig
> > +++ b/drivers/peci/Kconfig
> > @@ -13,3 +13,9 @@ menuconfig PECI
> > 
> >           This support is also available as a module. If so, the module
> >           will be called peci.
> > +
> > +if PECI
> > +
> > +source "drivers/peci/controller/Kconfig"
> > +
> > +endif # PECI
> > diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile
> > index e789a354e842..926d8df15cbd 100644
> > --- a/drivers/peci/Makefile
> > +++ b/drivers/peci/Makefile
> > @@ -3,3 +3,6 @@
> >  # Core functionality
> >  peci-y := core.o
> >  obj-$(CONFIG_PECI) += peci.o
> > +
> > +# Hardware specific bus drivers
> > +obj-y += controller/
> > diff --git a/drivers/peci/controller/Kconfig
> > b/drivers/peci/controller/Kconfig
> > new file mode 100644
> > index 000000000000..6d48df08db1c
> > --- /dev/null
> > +++ b/drivers/peci/controller/Kconfig
> > @@ -0,0 +1,16 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +config PECI_ASPEED
> > +       tristate "ASPEED PECI support"
> > +       depends on ARCH_ASPEED || COMPILE_TEST
> > +       depends on OF
> > +       depends on HAS_IOMEM
> > +       help
> > +         This option enables PECI controller driver for ASPEED AST2400,
> > +         AST2500 and AST2600 SoCs.
> > +
> > +         Say Y here if your system runs on ASPEED SoC and you are using it
> > +         as BMC for Intel platform.
> > +
> > +         This driver can also be built as a module. If so, the module will
> > +         be called peci-aspeed.
> > diff --git a/drivers/peci/controller/Makefile
> > b/drivers/peci/controller/Makefile
> > new file mode 100644
> > index 000000000000..022c28ef1bf0
> > --- /dev/null
> > +++ b/drivers/peci/controller/Makefile
> > @@ -0,0 +1,3 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +
> > +obj-$(CONFIG_PECI_ASPEED)      += peci-aspeed.o
> > diff --git a/drivers/peci/controller/peci-aspeed.c
> > b/drivers/peci/controller/peci-aspeed.c
> > new file mode 100644
> > index 000000000000..1d708c983749
> > --- /dev/null
> > +++ b/drivers/peci/controller/peci-aspeed.c
> > @@ -0,0 +1,445 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +// Copyright (C) 2012-2017 ASPEED Technology Inc.
> > +// Copyright (c) 2018-2021 Intel Corporation
> 
> Why different copyright capitalization?

I'll make them consistent.

> 
> > +
> > +#include <linux/bitfield.h>
> > +#include <linux/clk.h>
> > +#include <linux/delay.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/io.h>
> > +#include <linux/iopoll.h>
> > +#include <linux/jiffies.h>
> > +#include <linux/module.h>
> > +#include <linux/of.h>
> > +#include <linux/peci.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/reset.h>
> > +
> > +#include <asm/unaligned.h>
> 
> Why is this included?

Leftover - I'll remove it.

> 
> > +
> > +/* ASPEED PECI Registers */
> > +/* Control Register */
> > +#define ASPEED_PECI_CTRL                       0x00
> > +#define   ASPEED_PECI_CTRL_SAMPLING_MASK       GENMASK(19, 16)
> > +#define   ASPEED_PECI_CTRL_RD_MODE_MASK                GENMASK(13, 12)
> > +#define     ASPEED_PECI_CTRL_RD_MODE_DBG       BIT(13)
> > +#define     ASPEED_PECI_CTRL_RD_MODE_COUNT     BIT(12)
> > +#define   ASPEED_PECI_CTRL_CLK_SOURCE          BIT(11)
> > +#define   ASPEED_PECI_CTRL_CLK_DIV_MASK                GENMASK(10, 8)
> > +#define   ASPEED_PECI_CTRL_INVERT_OUT          BIT(7)
> > +#define   ASPEED_PECI_CTRL_INVERT_IN           BIT(6)
> > +#define   ASPEED_PECI_CTRL_BUS_CONTENTION_EN   BIT(5)
> > +#define   ASPEED_PECI_CTRL_PECI_EN             BIT(4)
> > +#define   ASPEED_PECI_CTRL_PECI_CLK_EN         BIT(0)
> > +
> > +/* Timing Negotiation Register */
> > +#define ASPEED_PECI_TIMING_NEGOTIATION         0x04
> > +#define   ASPEED_PECI_T_NEGO_MSG_MASK          GENMASK(15, 8)
> > +#define   ASPEED_PECI_T_NEGO_ADDR_MASK         GENMASK(7, 0)
> > +
> > +/* Command Register */
> > +#define ASPEED_PECI_CMD                                0x08
> > +#define   ASPEED_PECI_CMD_PIN_MONITORING       BIT(31)
> > +#define   ASPEED_PECI_CMD_STS_MASK             GENMASK(27, 24)
> > +#define     ASPEED_PECI_CMD_STS_ADDR_T_NEGO    0x3
> > +#define   ASPEED_PECI_CMD_IDLE_MASK            \
> > +         (ASPEED_PECI_CMD_STS_MASK | ASPEED_PECI_CMD_PIN_MONITORING)
> > +#define   ASPEED_PECI_CMD_FIRE                 BIT(0)
> > +
> > +/* Read/Write Length Register */
> > +#define ASPEED_PECI_RW_LENGTH                  0x0c
> > +#define   ASPEED_PECI_AW_FCS_EN                        BIT(31)
> > +#define   ASPEED_PECI_RD_LEN_MASK              GENMASK(23, 16)
> > +#define   ASPEED_PECI_WR_LEN_MASK              GENMASK(15, 8)
> > +#define   ASPEED_PECI_TARGET_ADDR_MASK         GENMASK(7, 0)
> > +
> > +/* Expected FCS Data Register */
> > +#define ASPEED_PECI_EXPECTED_FCS               0x10
> > +#define   ASPEED_PECI_EXPECTED_RD_FCS_MASK     GENMASK(23, 16)
> > +#define   ASPEED_PECI_EXPECTED_AW_FCS_AUTO_MASK        GENMASK(15, 8)
> > +#define   ASPEED_PECI_EXPECTED_WR_FCS_MASK     GENMASK(7, 0)
> > +
> > +/* Captured FCS Data Register */
> > +#define ASPEED_PECI_CAPTURED_FCS               0x14
> > +#define   ASPEED_PECI_CAPTURED_RD_FCS_MASK     GENMASK(23, 16)
> > +#define   ASPEED_PECI_CAPTURED_WR_FCS_MASK     GENMASK(7, 0)
> > +
> > +/* Interrupt Register */
> > +#define ASPEED_PECI_INT_CTRL                   0x18
> > +#define   ASPEED_PECI_TIMING_NEGO_SEL_MASK     GENMASK(31, 30)
> > +#define     ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO   0
> > +#define     ASPEED_PECI_2ND_BIT_OF_ADDR_NEGO   1
> > +#define     ASPEED_PECI_MESSAGE_NEGO           2
> > +#define   ASPEED_PECI_INT_MASK                 GENMASK(4, 0)
> > +#define     ASPEED_PECI_INT_BUS_TIMEOUT                BIT(4)
> > +#define     ASPEED_PECI_INT_BUS_CONTENTION     BIT(3)
> > +#define     ASPEED_PECI_INT_WR_FCS_BAD         BIT(2)
> > +#define     ASPEED_PECI_INT_WR_FCS_ABORT       BIT(1)
> > +#define     ASPEED_PECI_INT_CMD_DONE           BIT(0)
> > +
> > +/* Interrupt Status Register */
> > +#define ASPEED_PECI_INT_STS                    0x1c
> > +#define   ASPEED_PECI_INT_TIMING_RESULT_MASK   GENMASK(29, 16)
> > +         /* bits[4..0]: Same bit fields in the 'Interrupt Register' */
> > +
> > +/* Rx/Tx Data Buffer Registers */
> > +#define ASPEED_PECI_WR_DATA0                   0x20
> > +#define ASPEED_PECI_WR_DATA1                   0x24
> > +#define ASPEED_PECI_WR_DATA2                   0x28
> > +#define ASPEED_PECI_WR_DATA3                   0x2c
> > +#define ASPEED_PECI_RD_DATA0                   0x30
> > +#define ASPEED_PECI_RD_DATA1                   0x34
> > +#define ASPEED_PECI_RD_DATA2                   0x38
> > +#define ASPEED_PECI_RD_DATA3                   0x3c
> > +#define ASPEED_PECI_WR_DATA4                   0x40
> > +#define ASPEED_PECI_WR_DATA5                   0x44
> > +#define ASPEED_PECI_WR_DATA6                   0x48
> > +#define ASPEED_PECI_WR_DATA7                   0x4c
> > +#define ASPEED_PECI_RD_DATA4                   0x50
> > +#define ASPEED_PECI_RD_DATA5                   0x54
> > +#define ASPEED_PECI_RD_DATA6                   0x58
> > +#define ASPEED_PECI_RD_DATA7                   0x5c
> > +#define   ASPEED_PECI_DATA_BUF_SIZE_MAX                32
> > +
> > +/* Timing Negotiation */
> > +#define ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT  8
> > +#define ASPEED_PECI_RD_SAMPLING_POINT_MAX      (BIT(4) - 1)
> > +#define ASPEED_PECI_CLK_DIV_DEFAULT            0
> > +#define ASPEED_PECI_CLK_DIV_MAX                        (BIT(3) - 1)
> > +#define ASPEED_PECI_MSG_TIMING_DEFAULT         1
> > +#define ASPEED_PECI_MSG_TIMING_MAX             (BIT(8) - 1)
> > +#define ASPEED_PECI_ADDR_TIMING_DEFAULT                1
> > +#define ASPEED_PECI_ADDR_TIMING_MAX            (BIT(8) - 1)
> > +
> > +/* Timeout */
> > +#define ASPEED_PECI_IDLE_CHECK_TIMEOUT_US      (50 * USEC_PER_MSEC)
> > +#define ASPEED_PECI_IDLE_CHECK_INTERVAL_US     (10 * USEC_PER_MSEC)
> > +#define ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT     (1000)
> > +#define ASPEED_PECI_CMD_TIMEOUT_MS_MAX         (1000)
> > +
> > +struct aspeed_peci {
> > +       struct peci_controller *controller;
> > +       struct device *dev;
> > +       void __iomem *base;
> > +       struct clk *clk;
> > +       struct reset_control *rst;
> > +       int irq;
> > +       spinlock_t lock; /* to sync completion status handling */
> > +       struct completion xfer_complete;
> > +       u32 status;
> > +       u32 cmd_timeout_ms;
> > +       u32 msg_timing;
> > +       u32 addr_timing;
> > +       u32 rd_sampling_point;
> > +       u32 clk_div;
> > +};
> > +
> > +static void aspeed_peci_init_regs(struct aspeed_peci *priv)
> > +{
> > +       u32 val;
> > +
> > +       val = FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK,
> > ASPEED_PECI_CLK_DIV_DEFAULT);
> > +       val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
> > +       writel(val, priv->base + ASPEED_PECI_CTRL);
> > +       /*
> > +        * Timing negotiation period setting.
> > +        * The unit of the programmed value is 4 times of PECI clock period.
> > +        */
> > +       val = FIELD_PREP(ASPEED_PECI_T_NEGO_MSG_MASK, priv->msg_timing);
> > +       val |= FIELD_PREP(ASPEED_PECI_T_NEGO_ADDR_MASK, priv->addr_timing);
> > +       writel(val, priv->base + ASPEED_PECI_TIMING_NEGOTIATION);
> > +
> > +       /* Clear interrupts */
> > +       val = readl(priv->base + ASPEED_PECI_INT_STS) |
> > ASPEED_PECI_INT_MASK;
> > +       writel(val, priv->base + ASPEED_PECI_INT_STS);
> > +
> > +       /* Set timing negotiation mode and enable interrupts */
> > +       val = FIELD_PREP(ASPEED_PECI_TIMING_NEGO_SEL_MASK,
> > ASPEED_PECI_1ST_BIT_OF_ADDR_NEGO);
> > +       val |= ASPEED_PECI_INT_MASK;
> > +       writel(val, priv->base + ASPEED_PECI_INT_CTRL);
> > +
> > +       val = FIELD_PREP(ASPEED_PECI_CTRL_SAMPLING_MASK, priv-
> > >rd_sampling_point);
> > +       val |= FIELD_PREP(ASPEED_PECI_CTRL_CLK_DIV_MASK, priv->clk_div);
> > +       val |= ASPEED_PECI_CTRL_PECI_EN;
> > +       val |= ASPEED_PECI_CTRL_PECI_CLK_EN;
> > +       writel(val, priv->base + ASPEED_PECI_CTRL);
> > +}
> > +
> > +static inline int aspeed_peci_check_idle(struct aspeed_peci *priv)
> > +{
> > +       u32 cmd_sts = readl(priv->base + ASPEED_PECI_CMD);
> > +
> > +       if (FIELD_GET(ASPEED_PECI_CMD_STS_MASK, cmd_sts) ==
> > ASPEED_PECI_CMD_STS_ADDR_T_NEGO)
> > +               aspeed_peci_init_regs(priv);
> > +
> > +       return readl_poll_timeout(priv->base + ASPEED_PECI_CMD,
> > +                                 cmd_sts,
> > +                                 !(cmd_sts & ASPEED_PECI_CMD_IDLE_MASK),
> > +                                 ASPEED_PECI_IDLE_CHECK_INTERVAL_US,
> > +                                 ASPEED_PECI_IDLE_CHECK_TIMEOUT_US);
> > +}
> > +
> > +static int aspeed_peci_xfer(struct peci_controller *controller,
> > +                           u8 addr, struct peci_request *req)
> > +{
> > +       struct aspeed_peci *priv = dev_get_drvdata(controller->dev.parent);
> > +       unsigned long flags, timeout = msecs_to_jiffies(priv-
> > >cmd_timeout_ms);
> > +       u32 peci_head;
> > +       int ret;
> > +
> > +       if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||
> > +           req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)
> > +               return -EINVAL;
> > +
> > +       /* Check command sts and bus idle state */
> > +       ret = aspeed_peci_check_idle(priv);
> > +       if (ret)
> > +               return ret; /* -ETIMEDOUT */
> > +
> > +       spin_lock_irqsave(&priv->lock, flags);
> > +       reinit_completion(&priv->xfer_complete);
> > +
> > +       peci_head = FIELD_PREP(ASPEED_PECI_TARGET_ADDR_MASK, addr) |
> > +                   FIELD_PREP(ASPEED_PECI_WR_LEN_MASK, req->tx.len) |
> > +                   FIELD_PREP(ASPEED_PECI_RD_LEN_MASK, req->rx.len);
> > +
> > +       writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);
> > +
> > +       memcpy_toio(priv->base + ASPEED_PECI_WR_DATA0, req->tx.buf,
> > min_t(u8, req->tx.len, 16));
> > +       if (req->tx.len > 16)
> > +               memcpy_toio(priv->base + ASPEED_PECI_WR_DATA4, req->tx.buf +
> > 16,
> > +                           req->tx.len - 16);
> > +
> > +       dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);
> > +       print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf, req-
> > >tx.len);
> 
> On CONFIG_DYNAMIC_DEBUG=n builds the kernel will do all the work of
> reading through this buffer, but skip emitting it. Are you sure you
> want to pay that overhead for every transaction?

I can remove it or I can add something like:

#if IS_ENABLED(CONFIG_PECI_DEBUG)
#define peci_debug(fmt, ...) pr_debug(fmt, ##__VA_ARGS__)
#else
#define peci_debug(...) do { } while (0)
#endif

(and similar peci_trace with trace_printk for usage in IRQ handlers and such).

What do you think?

> 
> > +
> > +       priv->status = 0;
> > +       writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);
> > +       spin_unlock_irqrestore(&priv->lock, flags);
> > +
> > +       ret = wait_for_completion_interruptible_timeout(&priv-
> > >xfer_complete, timeout);
> 
> spin_lock_irqsave() says "I don't know if interrupts are disabled
> already, so I'll save the state, whatever it is, and restore later"
> 
> wait_for_completion_interruptible_timeout() says "I know I am in a
> sleepable context where interrupts are enabled"
> 
> So, one of those is wrong, i.e. should it be spin_{lock,unlock}_irq()?

You're right - I'll fix it.

> 
> 
> > +       if (ret < 0)
> > +               return ret;
> > +
> > +       if (ret == 0) {
> > +               dev_dbg(priv->dev, "Timeout waiting for a response!\n");
> > +               return -ETIMEDOUT;
> > +       }
> > +
> > +       spin_lock_irqsave(&priv->lock, flags);
> > +
> > +       writel(0, priv->base + ASPEED_PECI_CMD);
> > +
> > +       if (priv->status != ASPEED_PECI_INT_CMD_DONE) {
> > +               spin_unlock_irqrestore(&priv->lock, flags);
> > +               dev_dbg(priv->dev, "No valid response!\n");
> > +               return -EIO;
> > +       }
> > +
> > +       spin_unlock_irqrestore(&priv->lock, flags);
> > +
> > +       memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_RD_DATA0,
> > min_t(u8, req->rx.len, 16));
> > +       if (req->rx.len > 16)
> > +               memcpy_fromio(req->rx.buf + 16, priv->base +
> > ASPEED_PECI_RD_DATA4,
> > +                             req->rx.len - 16);
> > +
> > +       print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf, req-
> > >rx.len);
> > +
> > +       return 0;
> > +}
> > +
> > +static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)
> > +{
> > +       struct aspeed_peci *priv = arg;
> > +       u32 status;
> > +
> > +       spin_lock(&priv->lock);
> > +       status = readl(priv->base + ASPEED_PECI_INT_STS);
> > +       writel(status, priv->base + ASPEED_PECI_INT_STS);
> > +       priv->status |= (status & ASPEED_PECI_INT_MASK);
> > +
> > +       /*
> > +        * In most cases, interrupt bits will be set one by one but also
> > note
> > +        * that multiple interrupt bits could be set at the same time.
> > +        */
> > +       if (status & ASPEED_PECI_INT_BUS_TIMEOUT)
> > +               dev_dbg_ratelimited(priv->dev,
> > "ASPEED_PECI_INT_BUS_TIMEOUT\n");
> > +
> > +       if (status & ASPEED_PECI_INT_BUS_CONTENTION)
> > +               dev_dbg_ratelimited(priv->dev,
> > "ASPEED_PECI_INT_BUS_CONTENTION\n");
> > +
> > +       if (status & ASPEED_PECI_INT_WR_FCS_BAD)
> > +               dev_dbg_ratelimited(priv->dev,
> > "ASPEED_PECI_INT_WR_FCS_BAD\n");
> > +
> > +       if (status & ASPEED_PECI_INT_WR_FCS_ABORT)
> > +               dev_dbg_ratelimited(priv->dev,
> > "ASPEED_PECI_INT_WR_FCS_ABORT\n");
> 
> Are you sure these would not be better as tracepoints? If you're
> debugging an interrupt related failure, the ratelimiting might get in
> your way when you really need to know when one of these error
> interrupts fire relative to another event.

Tracepoints are ABI(ish), and using a full blown tracepoint just for IRQ status
would probably be too much.
I was thinking about something like trace_printk hidden under a
"CONFIG_PECI_DEBUG" (see above), but perhaps that's something for the future
improvement?

> 
> > +
> > +       /*
> > +        * All commands should be ended up with a ASPEED_PECI_INT_CMD_DONE
> > bit
> > +        * set even in an error case.
> > +        */
> > +       if (status & ASPEED_PECI_INT_CMD_DONE)
> > +               complete(&priv->xfer_complete);
> 
> Hmm, no need to check if there was a sequencing error, like a command
> was never submitted?

It's handled by checking if HW is idle in xfer before a command is sent, where
we just expect a single interrupt per command.

> 
> > +
> > +       spin_unlock(&priv->lock);
> > +
> > +       return IRQ_HANDLED;
> > +}
> > +
> > +static void aspeed_peci_property_sanitize(struct device *dev, const char
> > *propname,
> > +                                         u32 min, u32 max, u32 default_val,
> > u32 *propval)
> > +{
> > +       u32 val;
> > +       int ret;
> > +
> > +       ret = device_property_read_u32(dev, propname, &val);
> > +       if (ret) {
> > +               val = default_val;
> > +       } else if (val > max || val < min) {
> > +               dev_warn(dev, "Invalid %s: %u, falling back to: %u\n",
> > +                        propname, val, default_val);
> > +
> > +               val = default_val;
> > +       }
> > +
> > +       *propval = val;
> > +}
> > +
> > +static void aspeed_peci_property_setup(struct aspeed_peci *priv)
> > +{
> > +       aspeed_peci_property_sanitize(priv->dev, "aspeed,clock-divider",
> > +                                     0, ASPEED_PECI_CLK_DIV_MAX,
> > +                                     ASPEED_PECI_CLK_DIV_DEFAULT, &priv-
> > >clk_div);
> > +       aspeed_peci_property_sanitize(priv->dev, "aspeed,msg-timing",
> > +                                     0, ASPEED_PECI_MSG_TIMING_MAX,
> > +                                     ASPEED_PECI_MSG_TIMING_DEFAULT, &priv-
> > >msg_timing);
> > +       aspeed_peci_property_sanitize(priv->dev, "aspeed,addr-timing",
> > +                                     0, ASPEED_PECI_ADDR_TIMING_MAX,
> > +                                     ASPEED_PECI_ADDR_TIMING_DEFAULT,
> > &priv->addr_timing);
> > +       aspeed_peci_property_sanitize(priv->dev, "aspeed,rd-sampling-point",
> > +                                     0, ASPEED_PECI_RD_SAMPLING_POINT_MAX,
> > +                                     ASPEED_PECI_RD_SAMPLING_POINT_DEFAULT,
> > +                                     &priv->rd_sampling_point);
> > +       aspeed_peci_property_sanitize(priv->dev, "cmd-timeout-ms",
> > +                                     1, ASPEED_PECI_CMD_TIMEOUT_MS_MAX,
> > +                                     ASPEED_PECI_CMD_TIMEOUT_MS_DEFAULT,
> > &priv->cmd_timeout_ms);
> > +}
> > +
> > +static struct peci_controller_ops aspeed_ops = {
> > +       .xfer = aspeed_peci_xfer,
> > +};
> > +
> > +static void aspeed_peci_reset_control_release(void *data)
> > +{
> > +       reset_control_assert(data);
> > +}
> > +
> > +int aspeed_peci_reset_control_deassert(struct device *dev, struct
> > reset_control *rst)
> 
> I'd recommend naming this devm_aspeed_peci_reset_control_deassert(),
> because I came looking here from reading probe for why there was no
> reassertion of reset on driver ->remove().

Ok.

> 
> > +{
> > +       int ret;
> > +
> > +       ret = reset_control_deassert(rst);
> > +       if (ret)
> > +               return ret;
> > +
> > +       return devm_add_action_or_reset(dev,
> > aspeed_peci_reset_control_release, rst);
> > +}
> > +
> > +static void aspeed_peci_clk_release(void *data)
> > +{
> > +       clk_disable_unprepare(data);
> > +}
> > +
> > +static int aspeed_peci_clk_enable(struct device *dev, struct clk *clk)
> 
> ...ditto on the devm prefix, just to speed readability.

Ok.

Thanks
-Iwona

> 
> > +{
> > +       int ret;
> > +
> > +       ret = clk_prepare_enable(clk);
> > +       if (ret)
> > +               return ret;
> > +
> > +       return devm_add_action_or_reset(dev, aspeed_peci_clk_release, clk);
> > +}
> > +
> > +static int aspeed_peci_probe(struct platform_device *pdev)
> > +{
> > +       struct peci_controller *controller;
> > +       struct aspeed_peci *priv;
> > +       int ret;
> > +
> > +       priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
> > +       if (!priv)
> > +               return -ENOMEM;
> > +
> > +       priv->dev = &pdev->dev;
> > +       dev_set_drvdata(priv->dev, priv);
> > +
> > +       priv->base = devm_platform_ioremap_resource(pdev, 0);
> > +       if (IS_ERR(priv->base))
> > +               return PTR_ERR(priv->base);
> > +
> > +       priv->irq = platform_get_irq(pdev, 0);
> > +       if (!priv->irq)
> > +               return priv->irq;
> > +
> > +       ret = devm_request_irq(&pdev->dev, priv->irq,
> > aspeed_peci_irq_handler,
> > +                              0, "peci-aspeed", priv);
> > +       if (ret)
> > +               return ret;
> > +
> > +       init_completion(&priv->xfer_complete);
> > +       spin_lock_init(&priv->lock);
> > +
> > +       priv->rst = devm_reset_control_get(&pdev->dev, NULL);
> > +       if (IS_ERR(priv->rst))
> > +               return dev_err_probe(priv->dev, PTR_ERR(priv->rst),
> > +                                    "failed to get reset control\n");
> > +
> > +       ret = aspeed_peci_reset_control_deassert(priv->dev, priv->rst);
> > +       if (ret)
> > +               return dev_err_probe(priv->dev, ret, "cannot deassert reset
> > control\n");
> > +
> > +       priv->clk = devm_clk_get(priv->dev, NULL);
> > +       if (IS_ERR(priv->clk))
> > +               return dev_err_probe(priv->dev, PTR_ERR(priv->clk), "failed
> > to get clk\n");
> > +
> > +       ret = aspeed_peci_clk_enable(priv->dev, priv->clk);
> > +       if (ret)
> > +               return dev_err_probe(priv->dev, ret, "failed to enable
> > clock\n");
> > +
> > +       aspeed_peci_property_setup(priv);
> > +
> > +       aspeed_peci_init_regs(priv);
> > +
> > +       controller = devm_peci_controller_add(priv->dev, &aspeed_ops);
> > +       if (IS_ERR(controller))
> > +               return dev_err_probe(priv->dev, PTR_ERR(controller),
> > +                                    "failed to add aspeed peci
> > controller\n");
> > +
> > +       priv->controller = controller;
> > +
> > +       return 0;
> > +}
> > +
> > +static const struct of_device_id aspeed_peci_of_table[] = {
> > +       { .compatible = "aspeed,ast2400-peci", },
> > +       { .compatible = "aspeed,ast2500-peci", },
> > +       { .compatible = "aspeed,ast2600-peci", },
> > +       { }
> > +};
> > +MODULE_DEVICE_TABLE(of, aspeed_peci_of_table);
> > +
> > +static struct platform_driver aspeed_peci_driver = {
> > +       .probe  = aspeed_peci_probe,
> > +       .driver = {
> > +               .name           = "peci-aspeed",
> > +               .of_match_table = aspeed_peci_of_table,
> > +       },
> > +};
> > +module_platform_driver(aspeed_peci_driver);
> > +
> > +MODULE_AUTHOR("Ryan Chen <ryan_chen@aspeedtech.com>");
> > +MODULE_AUTHOR("Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>");
> > +MODULE_DESCRIPTION("ASPEED PECI driver");
> > +MODULE_LICENSE("GPL");
> > +MODULE_IMPORT_NS(PECI);
> > --
> > 2.31.1
> >

Dan Williams Aug. 27, 2021, 4:24 p.m. UTC | #9

On Thu, Aug 26, 2021 at 4:55 PM Winiarska, Iwona
<iwona.winiarska@intel.com> wrote:
>

> On Wed, 2021-08-25 at 18:35 -0700, Dan Williams wrote:

> > On Tue, Aug 3, 2021 at 4:35 AM Iwona Winiarska

> > <iwona.winiarska@intel.com> wrote:

> > >

> > > From: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

> > >

> > > ASPEED AST24xx/AST25xx/AST26xx SoCs supports the PECI electrical

> > > interface (a.k.a PECI wire).

> >

> > Maybe a one sentence blurb here and in the Kconfig reminding people

> > why they should care if they have a PECI driver or not?

>

> Ok, I'll expand it a bit.

[..]
> > > +static int aspeed_peci_xfer(struct peci_controller *controller,

> > > +                           u8 addr, struct peci_request *req)

> > > +{

> > > +       struct aspeed_peci *priv = dev_get_drvdata(controller->dev.parent);

> > > +       unsigned long flags, timeout = msecs_to_jiffies(priv-

> > > >cmd_timeout_ms);

> > > +       u32 peci_head;

> > > +       int ret;

> > > +

> > > +       if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||

> > > +           req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)

> > > +               return -EINVAL;

> > > +

> > > +       /* Check command sts and bus idle state */

> > > +       ret = aspeed_peci_check_idle(priv);

> > > +       if (ret)

> > > +               return ret; /* -ETIMEDOUT */

> > > +

> > > +       spin_lock_irqsave(&priv->lock, flags);

> > > +       reinit_completion(&priv->xfer_complete);

> > > +

> > > +       peci_head = FIELD_PREP(ASPEED_PECI_TARGET_ADDR_MASK, addr) |

> > > +                   FIELD_PREP(ASPEED_PECI_WR_LEN_MASK, req->tx.len) |

> > > +                   FIELD_PREP(ASPEED_PECI_RD_LEN_MASK, req->rx.len);

> > > +

> > > +       writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);

> > > +

> > > +       memcpy_toio(priv->base + ASPEED_PECI_WR_DATA0, req->tx.buf,

> > > min_t(u8, req->tx.len, 16));

> > > +       if (req->tx.len > 16)

> > > +               memcpy_toio(priv->base + ASPEED_PECI_WR_DATA4, req->tx.buf +

> > > 16,

> > > +                           req->tx.len - 16);

> > > +

> > > +       dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);

> > > +       print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf, req-

> > > >tx.len);

> >

> > On CONFIG_DYNAMIC_DEBUG=n builds the kernel will do all the work of

> > reading through this buffer, but skip emitting it. Are you sure you

> > want to pay that overhead for every transaction?

>

> I can remove it or I can add something like:

>

> #if IS_ENABLED(CONFIG_PECI_DEBUG)

> #define peci_debug(fmt, ...) pr_debug(fmt, ##__VA_ARGS__)

> #else

> #define peci_debug(...) do { } while (0)

> #endif


It's the hex dump I'm worried about, not the debug statements as much.

I think the choices are remove the print_hex_dump_bytes(), put it
behind an IS_ENABLED(CONFIG_DYNAMIC_DEBUG) to ensure the overhead is
skipped in the CONFIG_DYNAMIC_DEBUG=n case, or live with the overhead
if this is not a fast path / infrequently used.

>

> (and similar peci_trace with trace_printk for usage in IRQ handlers and such).

>

> What do you think?


In general, no, don't wrap the base level print routines with
driver-specific ones. Also, trace_printk() is only for debug builds.
Note that trace points are built to be even less overhead than
dev_dbg(), so there's no overhead concern with disabled tracepoints,
they literally translate to nops when disabled.

>

> >

> > > +

> > > +       priv->status = 0;

> > > +       writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);

> > > +       spin_unlock_irqrestore(&priv->lock, flags);

> > > +

> > > +       ret = wait_for_completion_interruptible_timeout(&priv-

> > > >xfer_complete, timeout);

> >

> > spin_lock_irqsave() says "I don't know if interrupts are disabled

> > already, so I'll save the state, whatever it is, and restore later"

> >

> > wait_for_completion_interruptible_timeout() says "I know I am in a

> > sleepable context where interrupts are enabled"

> >

> > So, one of those is wrong, i.e. should it be spin_{lock,unlock}_irq()?

>

> You're right - I'll fix it.

>

> >

> >

> > > +       if (ret < 0)

> > > +               return ret;

> > > +

> > > +       if (ret == 0) {

> > > +               dev_dbg(priv->dev, "Timeout waiting for a response!\n");

> > > +               return -ETIMEDOUT;

> > > +       }

> > > +

> > > +       spin_lock_irqsave(&priv->lock, flags);

> > > +

> > > +       writel(0, priv->base + ASPEED_PECI_CMD);

> > > +

> > > +       if (priv->status != ASPEED_PECI_INT_CMD_DONE) {

> > > +               spin_unlock_irqrestore(&priv->lock, flags);

> > > +               dev_dbg(priv->dev, "No valid response!\n");

> > > +               return -EIO;

> > > +       }

> > > +

> > > +       spin_unlock_irqrestore(&priv->lock, flags);

> > > +

> > > +       memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_RD_DATA0,

> > > min_t(u8, req->rx.len, 16));

> > > +       if (req->rx.len > 16)

> > > +               memcpy_fromio(req->rx.buf + 16, priv->base +

> > > ASPEED_PECI_RD_DATA4,

> > > +                             req->rx.len - 16);

> > > +

> > > +       print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf, req-

> > > >rx.len);

> > > +

> > > +       return 0;

> > > +}

> > > +

> > > +static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)

> > > +{

> > > +       struct aspeed_peci *priv = arg;

> > > +       u32 status;

> > > +

> > > +       spin_lock(&priv->lock);

> > > +       status = readl(priv->base + ASPEED_PECI_INT_STS);

> > > +       writel(status, priv->base + ASPEED_PECI_INT_STS);

> > > +       priv->status |= (status & ASPEED_PECI_INT_MASK);

> > > +

> > > +       /*

> > > +        * In most cases, interrupt bits will be set one by one but also

> > > note

> > > +        * that multiple interrupt bits could be set at the same time.

> > > +        */

> > > +       if (status & ASPEED_PECI_INT_BUS_TIMEOUT)

> > > +               dev_dbg_ratelimited(priv->dev,

> > > "ASPEED_PECI_INT_BUS_TIMEOUT\n");

> > > +

> > > +       if (status & ASPEED_PECI_INT_BUS_CONTENTION)

> > > +               dev_dbg_ratelimited(priv->dev,

> > > "ASPEED_PECI_INT_BUS_CONTENTION\n");

> > > +

> > > +       if (status & ASPEED_PECI_INT_WR_FCS_BAD)

> > > +               dev_dbg_ratelimited(priv->dev,

> > > "ASPEED_PECI_INT_WR_FCS_BAD\n");

> > > +

> > > +       if (status & ASPEED_PECI_INT_WR_FCS_ABORT)

> > > +               dev_dbg_ratelimited(priv->dev,

> > > "ASPEED_PECI_INT_WR_FCS_ABORT\n");

> >

> > Are you sure these would not be better as tracepoints? If you're

> > debugging an interrupt related failure, the ratelimiting might get in

> > your way when you really need to know when one of these error

> > interrupts fire relative to another event.

>

> Tracepoints are ABI(ish), and using a full blown tracepoint just for IRQ status

> would probably be too much.


Tracepoints become ABI once someone ships tooling that depends on them
being there. These don't look  attractive for a tool, and they don't
look difficult to maintain if the interrupt handler needs to be
reworked. I.e. it would be trivial to keep a dead tracepoint around if
worse came to worse to keep a tool from failing to load.

> I was thinking about something like trace_printk hidden under a

> "CONFIG_PECI_DEBUG" (see above), but perhaps that's something for the future

> improvement?


Again trace_printk() is only for private builds.

>

> >

> > > +

> > > +       /*

> > > +        * All commands should be ended up with a ASPEED_PECI_INT_CMD_DONE

> > > bit

> > > +        * set even in an error case.

> > > +        */

> > > +       if (status & ASPEED_PECI_INT_CMD_DONE)

> > > +               complete(&priv->xfer_complete);

> >

> > Hmm, no need to check if there was a sequencing error, like a command

> > was never submitted?

>

> It's handled by checking if HW is idle in xfer before a command is sent, where

> we just expect a single interrupt per command.


I'm asking how do you determine if this status was spurious, or there
was a sequencing error in the driver?

Dan Williams Aug. 27, 2021, 7:11 p.m. UTC | #10

On Tue, Aug 3, 2021 at 4:35 AM Iwona Winiarska
<iwona.winiarska@intel.com> wrote:
>

> PECI devices may not be discoverable at the time when PECI controller is

> being added (e.g. BMC can boot up when the Host system is still in S5).

> Since we currently don't have the capabilities to figure out the Host

> system state inside the PECI subsystem itself, we have to rely on

> userspace to do it for us.

>

> In the future, PECI subsystem may be expanded with mechanisms that allow

> us to avoid depending on userspace interaction (e.g. CPU presence could

> be detected using GPIO, and the information on whether it's discoverable

> could be obtained over IPMI).


Thanks for this detail.

> Unfortunately, those methods may ultimately not be available (support

> will vary from platform to platform), which means that we still need

> platform independent method triggered by userspace.

>

> Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>

> ---

>  Documentation/ABI/testing/sysfs-bus-peci | 16 +++++

>  drivers/peci/Makefile                    |  2 +-

>  drivers/peci/core.c                      |  3 +-

>  drivers/peci/device.c                    |  1 +

>  drivers/peci/internal.h                  |  5 ++

>  drivers/peci/sysfs.c                     | 82 ++++++++++++++++++++++++

>  6 files changed, 107 insertions(+), 2 deletions(-)

>  create mode 100644 Documentation/ABI/testing/sysfs-bus-peci

>  create mode 100644 drivers/peci/sysfs.c

>

> diff --git a/Documentation/ABI/testing/sysfs-bus-peci b/Documentation/ABI/testing/sysfs-bus-peci

> new file mode 100644

> index 000000000000..56c2b2216bbd

> --- /dev/null

> +++ b/Documentation/ABI/testing/sysfs-bus-peci

> @@ -0,0 +1,16 @@

> +What:          /sys/bus/peci/rescan

> +Date:          July 2021

> +KernelVersion: 5.15

> +Contact:       Iwona Winiarska <iwona.winiarska@intel.com>

> +Description:

> +               Writing a non-zero value to this attribute will

> +               initiate scan for PECI devices on all PECI controllers

> +               in the system.

> +

> +What:          /sys/bus/peci/devices/<controller_id>-<device_addr>/remove

> +Date:          July 2021

> +KernelVersion: 5.15

> +Contact:       Iwona Winiarska <iwona.winiarska@intel.com>

> +Description:

> +               Writing a non-zero value to this attribute will

> +               remove the PECI device and any of its children.

> diff --git a/drivers/peci/Makefile b/drivers/peci/Makefile

> index c5f9d3fe21bb..917f689e147a 100644

> --- a/drivers/peci/Makefile

> +++ b/drivers/peci/Makefile

> @@ -1,7 +1,7 @@

>  # SPDX-License-Identifier: GPL-2.0-only

>

>  # Core functionality

> -peci-y := core.o request.o device.o

> +peci-y := core.o request.o device.o sysfs.o

>  obj-$(CONFIG_PECI) += peci.o

>

>  # Hardware specific bus drivers

> diff --git a/drivers/peci/core.c b/drivers/peci/core.c

> index d143f1a7fe98..c473acb3c2a0 100644

> --- a/drivers/peci/core.c

> +++ b/drivers/peci/core.c

> @@ -34,7 +34,7 @@ struct device_type peci_controller_type = {

>         .release        = peci_controller_dev_release,

>  };

>

> -static int peci_controller_scan_devices(struct peci_controller *controller)

> +int peci_controller_scan_devices(struct peci_controller *controller)

>  {

>         int ret;

>         u8 addr;

> @@ -159,6 +159,7 @@ EXPORT_SYMBOL_NS_GPL(devm_peci_controller_add, PECI);

>

>  struct bus_type peci_bus_type = {

>         .name           = "peci",

> +       .bus_groups     = peci_bus_groups,

>  };

>

>  static int __init peci_init(void)

> diff --git a/drivers/peci/device.c b/drivers/peci/device.c

> index 32811248997b..d77d9dabd51e 100644

> --- a/drivers/peci/device.c

> +++ b/drivers/peci/device.c

> @@ -110,5 +110,6 @@ static void peci_device_release(struct device *dev)

>  }

>

>  struct device_type peci_device_type = {

> +       .groups         = peci_device_groups,

>         .release        = peci_device_release,

>  };

> diff --git a/drivers/peci/internal.h b/drivers/peci/internal.h

> index 57d11a902c5d..978e12c8e1d3 100644

> --- a/drivers/peci/internal.h

> +++ b/drivers/peci/internal.h

> @@ -8,6 +8,7 @@

>  #include <linux/types.h>

>

>  struct peci_controller;

> +struct attribute_group;

>  struct peci_device;

>  struct peci_request;

>

> @@ -19,12 +20,16 @@ struct peci_request *peci_request_alloc(struct peci_device *device, u8 tx_len, u

>  void peci_request_free(struct peci_request *req);

>

>  extern struct device_type peci_device_type;

> +extern const struct attribute_group *peci_device_groups[];

>

>  int peci_device_create(struct peci_controller *controller, u8 addr);

>  void peci_device_destroy(struct peci_device *device);

>

>  extern struct bus_type peci_bus_type;

> +extern const struct attribute_group *peci_bus_groups[];


To me, sysfs.c is small enough to just fold into core.c, then no need
to declare public attribute arrays like this, but up to you if you
prefer the sysfs.c split.

>

>  extern struct device_type peci_controller_type;

>

> +int peci_controller_scan_devices(struct peci_controller *controller);

> +

>  #endif /* __PECI_INTERNAL_H */

> diff --git a/drivers/peci/sysfs.c b/drivers/peci/sysfs.c

> new file mode 100644

> index 000000000000..db9ef05776e3

> --- /dev/null

> +++ b/drivers/peci/sysfs.c

> @@ -0,0 +1,82 @@

> +// SPDX-License-Identifier: GPL-2.0-only

> +// Copyright (c) 2021 Intel Corporation

> +

> +#include <linux/device.h>

> +#include <linux/kernel.h>

> +#include <linux/peci.h>

> +

> +#include "internal.h"

> +

> +static int rescan_controller(struct device *dev, void *data)

> +{

> +       if (dev->type != &peci_controller_type)

> +               return 0;

> +

> +       return peci_controller_scan_devices(to_peci_controller(dev));

> +}

> +

> +static ssize_t rescan_store(struct bus_type *bus, const char *buf, size_t count)

> +{

> +       bool res;

> +       int ret;

> +

> +       ret = kstrtobool(buf, &res);

> +       if (ret)

> +               return ret;

> +

> +       if (!res)

> +               return count;

> +

> +       ret = bus_for_each_dev(&peci_bus_type, NULL, NULL, rescan_controller);

> +       if (ret)

> +               return ret;

> +

> +       return count;

> +}

> +static BUS_ATTR_WO(rescan);

> +

> +static struct attribute *peci_bus_attrs[] = {

> +       &bus_attr_rescan.attr,

> +       NULL

> +};

> +

> +static const struct attribute_group peci_bus_group = {

> +       .attrs = peci_bus_attrs,

> +};

> +

> +const struct attribute_group *peci_bus_groups[] = {

> +       &peci_bus_group,

> +       NULL

> +};

> +

> +static ssize_t remove_store(struct device *dev, struct device_attribute *attr,

> +                           const char *buf, size_t count)

> +{

> +       struct peci_device *device = to_peci_device(dev);

> +       bool res;

> +       int ret;

> +

> +       ret = kstrtobool(buf, &res);

> +       if (ret)

> +               return ret;

> +

> +       if (res && device_remove_file_self(dev, attr))

> +               peci_device_destroy(device);


How do you solve races between sysfs device remove and controller
device remove? Looks like double-free at first glance. Have a look at
the  kill_device() helper as one way to resolve this double-delete
race..

> +

> +       return count;

> +}

> +static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0200, NULL, remove_store);

> +

> +static struct attribute *peci_device_attrs[] = {

> +       &dev_attr_remove.attr,

> +       NULL

> +};

> +

> +static const struct attribute_group peci_device_group = {

> +       .attrs = peci_device_attrs,

> +};

> +

> +const struct attribute_group *peci_device_groups[] = {

> +       &peci_device_group,

> +       NULL

> +};

> --

> 2.31.1

>

Iwona Winiarska Aug. 29, 2021, 7:42 p.m. UTC | #11

On Fri, 2021-08-27 at 09:24 -0700, Dan Williams wrote:
> On Thu, Aug 26, 2021 at 4:55 PM Winiarska, Iwona

> <iwona.winiarska@intel.com> wrote:

> > 

> > On Wed, 2021-08-25 at 18:35 -0700, Dan Williams wrote:

> > > On Tue, Aug 3, 2021 at 4:35 AM Iwona Winiarska

> > > <iwona.winiarska@intel.com> wrote:

> > > > 

> > > > From: Jae Hyun Yoo <jae.hyun.yoo@linux.intel.com>

> > > > 

> > > > ASPEED AST24xx/AST25xx/AST26xx SoCs supports the PECI electrical

> > > > interface (a.k.a PECI wire).

> > > 

> > > Maybe a one sentence blurb here and in the Kconfig reminding people

> > > why they should care if they have a PECI driver or not?

> > 

> > Ok, I'll expand it a bit.

> [..]

> > > > +static int aspeed_peci_xfer(struct peci_controller *controller,

> > > > +                           u8 addr, struct peci_request *req)

> > > > +{

> > > > +       struct aspeed_peci *priv = dev_get_drvdata(controller-

> > > > >dev.parent);

> > > > +       unsigned long flags, timeout = msecs_to_jiffies(priv-

> > > > > cmd_timeout_ms);

> > > > +       u32 peci_head;

> > > > +       int ret;

> > > > +

> > > > +       if (req->tx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX ||

> > > > +           req->rx.len > ASPEED_PECI_DATA_BUF_SIZE_MAX)

> > > > +               return -EINVAL;

> > > > +

> > > > +       /* Check command sts and bus idle state */

> > > > +       ret = aspeed_peci_check_idle(priv);

> > > > +       if (ret)

> > > > +               return ret; /* -ETIMEDOUT */

> > > > +

> > > > +       spin_lock_irqsave(&priv->lock, flags);

> > > > +       reinit_completion(&priv->xfer_complete);

> > > > +

> > > > +       peci_head = FIELD_PREP(ASPEED_PECI_TARGET_ADDR_MASK, addr) |

> > > > +                   FIELD_PREP(ASPEED_PECI_WR_LEN_MASK, req->tx.len) |

> > > > +                   FIELD_PREP(ASPEED_PECI_RD_LEN_MASK, req->rx.len);

> > > > +

> > > > +       writel(peci_head, priv->base + ASPEED_PECI_RW_LENGTH);

> > > > +

> > > > +       memcpy_toio(priv->base + ASPEED_PECI_WR_DATA0, req->tx.buf,

> > > > min_t(u8, req->tx.len, 16));

> > > > +       if (req->tx.len > 16)

> > > > +               memcpy_toio(priv->base + ASPEED_PECI_WR_DATA4, req-

> > > > >tx.buf +

> > > > 16,

> > > > +                           req->tx.len - 16);

> > > > +

> > > > +       dev_dbg(priv->dev, "HEAD : 0x%08x\n", peci_head);

> > > > +       print_hex_dump_bytes("TX : ", DUMP_PREFIX_NONE, req->tx.buf,

> > > > req-

> > > > > tx.len);

> > > 

> > > On CONFIG_DYNAMIC_DEBUG=n builds the kernel will do all the work of

> > > reading through this buffer, but skip emitting it. Are you sure you

> > > want to pay that overhead for every transaction?

> > 

> > I can remove it or I can add something like:

> > 

> > #if IS_ENABLED(CONFIG_PECI_DEBUG)

> > #define peci_debug(fmt, ...) pr_debug(fmt, ##__VA_ARGS__)

> > #else

> > #define peci_debug(...) do { } while (0)

> > #endif

> 

> It's the hex dump I'm worried about, not the debug statements as much.

> 

> I think the choices are remove the print_hex_dump_bytes(), put it

> behind an IS_ENABLED(CONFIG_DYNAMIC_DEBUG) to ensure the overhead is

> skipped in the CONFIG_DYNAMIC_DEBUG=n case, or live with the overhead

> if this is not a fast path / infrequently used.


I will place it behind IS_ENABLED(CONFIG_DYNAMIC_DEBUG).

> 

> > 

> > (and similar peci_trace with trace_printk for usage in IRQ handlers and

> > such).

> > 

> > What do you think?

> 

> In general, no, don't wrap the base level print routines with

> driver-specific ones. Also, trace_printk() is only for debug builds.

> Note that trace points are built to be even less overhead than

> dev_dbg(), so there's no overhead concern with disabled tracepoints,

> they literally translate to nops when disabled.


Ack.

> 

> > 

> > > 

> > > > +

> > > > +       priv->status = 0;

> > > > +       writel(ASPEED_PECI_CMD_FIRE, priv->base + ASPEED_PECI_CMD);

> > > > +       spin_unlock_irqrestore(&priv->lock, flags);

> > > > +

> > > > +       ret = wait_for_completion_interruptible_timeout(&priv-

> > > > > xfer_complete, timeout);

> > > 

> > > spin_lock_irqsave() says "I don't know if interrupts are disabled

> > > already, so I'll save the state, whatever it is, and restore later"

> > > 

> > > wait_for_completion_interruptible_timeout() says "I know I am in a

> > > sleepable context where interrupts are enabled"

> > > 

> > > So, one of those is wrong, i.e. should it be spin_{lock,unlock}_irq()?

> > 

> > You're right - I'll fix it.

> > 

> > > 

> > > 

> > > > +       if (ret < 0)

> > > > +               return ret;

> > > > +

> > > > +       if (ret == 0) {

> > > > +               dev_dbg(priv->dev, "Timeout waiting for a response!\n");

> > > > +               return -ETIMEDOUT;

> > > > +       }

> > > > +

> > > > +       spin_lock_irqsave(&priv->lock, flags);

> > > > +

> > > > +       writel(0, priv->base + ASPEED_PECI_CMD);

> > > > +

> > > > +       if (priv->status != ASPEED_PECI_INT_CMD_DONE) {

> > > > +               spin_unlock_irqrestore(&priv->lock, flags);

> > > > +               dev_dbg(priv->dev, "No valid response!\n");

> > > > +               return -EIO;

> > > > +       }

> > > > +

> > > > +       spin_unlock_irqrestore(&priv->lock, flags);

> > > > +

> > > > +       memcpy_fromio(req->rx.buf, priv->base + ASPEED_PECI_RD_DATA0,

> > > > min_t(u8, req->rx.len, 16));

> > > > +       if (req->rx.len > 16)

> > > > +               memcpy_fromio(req->rx.buf + 16, priv->base +

> > > > ASPEED_PECI_RD_DATA4,

> > > > +                             req->rx.len - 16);

> > > > +

> > > > +       print_hex_dump_bytes("RX : ", DUMP_PREFIX_NONE, req->rx.buf,

> > > > req-

> > > > > rx.len);

> > > > +

> > > > +       return 0;

> > > > +}

> > > > +

> > > > +static irqreturn_t aspeed_peci_irq_handler(int irq, void *arg)

> > > > +{

> > > > +       struct aspeed_peci *priv = arg;

> > > > +       u32 status;

> > > > +

> > > > +       spin_lock(&priv->lock);

> > > > +       status = readl(priv->base + ASPEED_PECI_INT_STS);

> > > > +       writel(status, priv->base + ASPEED_PECI_INT_STS);

> > > > +       priv->status |= (status & ASPEED_PECI_INT_MASK);

> > > > +

> > > > +       /*

> > > > +        * In most cases, interrupt bits will be set one by one but also

> > > > note

> > > > +        * that multiple interrupt bits could be set at the same time.

> > > > +        */

> > > > +       if (status & ASPEED_PECI_INT_BUS_TIMEOUT)

> > > > +               dev_dbg_ratelimited(priv->dev,

> > > > "ASPEED_PECI_INT_BUS_TIMEOUT\n");

> > > > +

> > > > +       if (status & ASPEED_PECI_INT_BUS_CONTENTION)

> > > > +               dev_dbg_ratelimited(priv->dev,

> > > > "ASPEED_PECI_INT_BUS_CONTENTION\n");

> > > > +

> > > > +       if (status & ASPEED_PECI_INT_WR_FCS_BAD)

> > > > +               dev_dbg_ratelimited(priv->dev,

> > > > "ASPEED_PECI_INT_WR_FCS_BAD\n");

> > > > +

> > > > +       if (status & ASPEED_PECI_INT_WR_FCS_ABORT)

> > > > +               dev_dbg_ratelimited(priv->dev,

> > > > "ASPEED_PECI_INT_WR_FCS_ABORT\n");

> > > 

> > > Are you sure these would not be better as tracepoints? If you're

> > > debugging an interrupt related failure, the ratelimiting might get in

> > > your way when you really need to know when one of these error

> > > interrupts fire relative to another event.

> > 

> > Tracepoints are ABI(ish), and using a full blown tracepoint just for IRQ

> > status

> > would probably be too much.

> 

> Tracepoints become ABI once someone ships tooling that depends on them

> being there. These don't look  attractive for a tool, and they don't

> look difficult to maintain if the interrupt handler needs to be

> reworked. I.e. it would be trivial to keep a dead tracepoint around if

> worse came to worse to keep a tool from failing to load.


After more consideration, I would prefer to remove these logs for now - in case
of error I'll log full status in xfer().

> 

> > I was thinking about something like trace_printk hidden under a

> > "CONFIG_PECI_DEBUG" (see above), but perhaps that's something for the future

> > improvement?

> 

> Again trace_printk() is only for private builds.

> 

> > 

> > > 

> > > > +

> > > > +       /*

> > > > +        * All commands should be ended up with a

> > > > ASPEED_PECI_INT_CMD_DONE

> > > > bit

> > > > +        * set even in an error case.

> > > > +        */

> > > > +       if (status & ASPEED_PECI_INT_CMD_DONE)

> > > > +               complete(&priv->xfer_complete);

> > > 

> > > Hmm, no need to check if there was a sequencing error, like a command

> > > was never submitted?

> > 

> > It's handled by checking if HW is idle in xfer before a command is sent,

> > where

> > we just expect a single interrupt per command.

> 

> I'm asking how do you determine if this status was spurious, or there

> was a sequencing error in the driver?


I don't think we have any means to determine it.
PECI itself doesn't provide any mechanism to verify it (there is no sequence
number or tag to match request/response).
We're relying on the fact that BMC is a requester and initiates communication
with CPU - the interrupt won't be generated if BMC doesn't send any request.

Thanks
-Iwona

Borislav Petkov Oct. 4, 2021, 7:03 p.m. UTC | #12

On Tue, Aug 03, 2021 at 01:31:20PM +0200, Iwona Winiarska wrote:
> Baseboard management controllers (BMC) often run Linux but are usually

> implemented with non-X86 processors. They can use PECI to access package

> config space (PCS) registers on the host CPU and since some information,

> e.g. figuring out the core count, can be obtained using different

> registers on different CPU generations, they need to decode the family

> and model.

> 

> Move the data from arch/x86/include/asm/intel-family.h into a new file

> include/linux/x86/intel-family.h so that it can be used by other

> architectures.

> 

> Signed-off-by: Iwona Winiarska <iwona.winiarska@intel.com>

> Reviewed-by: Tony Luck <tony.luck@intel.com>

> Reviewed-by: Dan Williams <dan.j.williams@intel.com>

> ---

> To limit tree-wide changes and help people that were expecting

> intel-family defines in arch/x86 to find it more easily without going

> through git history, we're not removing the original header

> completely, we're keeping it as a "stub" that includes the new one.

> If there is a consensus that the tree-wide option is better,

> we can choose this approach.


Why can't the linux/ namespace header include the x86 one so that
nothing changes for arch/x86/?

And if it is really only a handful of families you need, you might just
as well copy them into the peci headers and slap a comment above it
saying where they come from and save yourself all that churn...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

[v2,00/15] Introduce PECI subsystem

Message

Comments