mbox series

[0/4] thermal: Introduce Qualcomm Cooling Driver suppport

Message ID 20220912085049.3517140-1-bhupesh.sharma@linaro.org
Headers show
Series thermal: Introduce Qualcomm Cooling Driver suppport | expand

Message

Bhupesh Sharma Sept. 12, 2022, 8:50 a.m. UTC
This patchset introduces the Qualcomm Cooling Driver (aka Qualcomm Thermal
Mitigation Driver) and the related support code (dt-binding, MAINTAINER
file etc).

Several Qualcomm Snapdragon SoCs have Thermal Mitigation Devices (TMDs)
present on various remote subsystem(s) (for e.g. the Compute DSP, aka cDSP),
which can be used for several mitigations for remote subsystem(s), including
remote processor mitigation, rail voltage restriction etc. 

Here we introduce the Qualcomm Cooling Driver which is based on the
kernel thermal driver framework and also employs the kernel QMI interface
to send the message to remote subsystem(s).

At the very top-level, the dts is supposed to describe a TMD node, which
should further represent the remote subsystem(s) present on the SoC and
further each child of a subsystem should represent the separate cooling devices
available on the remote subsystem.

Note that this patchset is targeted for the 'linux-pm' tree and the dts
patchset/changes targeted for 'linux-arm-msm' tree will be sent as a
separate patchset.

This patchset is based on the CONFIG_QCOM_THERMAL related fix sent via
[1]. Otherwise the latest changes from 'linux-next/master' are used to
rebase the patchset.

[1]. https://lore.kernel.org/all/CAA8EJpoM5nW=pVJB4zy4Jh9Q3gE4KOju2QVy_WtmUokKMyXtuw@mail.gmail.com/T/#m4e2b765e68e3123b3c0e28c806409dae4b988432

Cc: andersson@kernel.org
Cc: robh@kernel.org
Cc: daniel.lezcano@linaro.org
Cc: rafael@kernel.org

Bhupesh Sharma (4):
  thermal: qcom: qmi_cooling: Add skeletal qmi cooling driver
  thermal: qcom: Add Kconfig entry & compilation support for qmi cooling
    driver
  dt-bindings: thermal: Add qcom,qmi-tmd-device and qcom,tmd-device yaml
    bindings
  MAINTAINERS: Add entry for Qualcomm Cooling Driver

 .../bindings/thermal/qcom,qmi-tmd-device.yaml |  78 +++
 .../bindings/thermal/qcom,tmd-device.yaml     | 122 ++++
 MAINTAINERS                                   |  10 +
 drivers/thermal/qcom/Kconfig                  |   4 +
 drivers/thermal/qcom/Makefile                 |   2 +
 drivers/thermal/qcom/qmi_cooling/Kconfig      |  14 +
 drivers/thermal/qcom/qmi_cooling/Makefile     |   3 +
 .../qcom/qmi_cooling/qcom_qmi_cooling.c       | 632 ++++++++++++++++++
 .../qcom/qmi_cooling/qcom_tmd_services.c      | 352 ++++++++++
 .../qcom/qmi_cooling/qcom_tmd_services.h      | 120 ++++
 include/dt-bindings/thermal/qcom,tmd.h        |  14 +
 11 files changed, 1351 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/thermal/qcom,qmi-tmd-device.yaml
 create mode 100644 Documentation/devicetree/bindings/thermal/qcom,tmd-device.yaml
 create mode 100644 drivers/thermal/qcom/qmi_cooling/Kconfig
 create mode 100644 drivers/thermal/qcom/qmi_cooling/Makefile
 create mode 100644 drivers/thermal/qcom/qmi_cooling/qcom_qmi_cooling.c
 create mode 100644 drivers/thermal/qcom/qmi_cooling/qcom_tmd_services.c
 create mode 100644 drivers/thermal/qcom/qmi_cooling/qcom_tmd_services.h
 create mode 100644 include/dt-bindings/thermal/qcom,tmd.h

Comments

Rob Herring Sept. 12, 2022, 3:38 p.m. UTC | #1
On Mon, 12 Sep 2022 14:20:48 +0530, Bhupesh Sharma wrote:
> Add qcom,qmi-tmd-device and qcom,tmd-device yaml bindings.
> 
> Qualcomm QMI based TMD cooling device(s) are used for various
> mitigations for remote subsystem(s) including remote processor
> mitigation, rail voltage restriction etc.
> 
> Each child node represents one remote subsystem and each child
> of this subsystem in-turn represents separate TMD cooling device.
> 
> Cc: daniel.lezcano@linaro.org
> Cc: rafael@kernel.org
> Cc: andersson@kernel.org
> Cc: robh@kernel.org
> Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
> ---
>  .../bindings/thermal/qcom,qmi-tmd-device.yaml |  78 +++++++++++
>  .../bindings/thermal/qcom,tmd-device.yaml     | 122 ++++++++++++++++++
>  include/dt-bindings/thermal/qcom,tmd.h        |  14 ++
>  3 files changed, 214 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/thermal/qcom,qmi-tmd-device.yaml
>  create mode 100644 Documentation/devicetree/bindings/thermal/qcom,tmd-device.yaml
>  create mode 100644 include/dt-bindings/thermal/qcom,tmd.h
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/remoteproc/qcom,adsp.example.dtb: adsp: 'qcom,instance-id' is a required property
	From schema: /builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/thermal/qcom,tmd-device.yaml
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/remoteproc/qcom,adsp.example.dtb: adsp: 'clock-names', 'clocks', 'compatible', 'cx-supply', 'interrupt-names', 'interrupts-extended', 'memory-region', 'qcom,smem-state-names', 'qcom,smem-states', 'smd-edge' do not match any of the regexes: '^tmd-device[0-9]?$', 'pinctrl-[0-9]+'
	From schema: /builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/thermal/qcom,tmd-device.yaml

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.
Bjorn Andersson Sept. 13, 2022, 3:39 p.m. UTC | #2
On Mon, Sep 12, 2022 at 02:23:21PM -0700, Jeff Johnson wrote:
> On 9/12/2022 1:50 AM, Bhupesh Sharma wrote:
[..]
> > diff --git a/drivers/thermal/qcom/qmi_cooling/qcom_qmi_cooling.c b/drivers/thermal/qcom/qmi_cooling/qcom_qmi_cooling.c
[..]
> > +static struct qmi_elem_info tmd_mitigation_dev_id_type_v01_ei[] = {
> 
> note that commit ff6d365898d ("soc: qcom: qmi: use const for struct
> qmi_elem_info") allows QMI message encoding/decoding rules to be const
> 
> <https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git/commit/?h=for-next&id=ff6d365898d4d31bd557954c7fc53f38977b491c>
> 
> I'm waiting for that to land in the soc tree before I submit my changes to
> all of the existing drivers, but you can do this now for the new driver
> 

I did merge your patch recently, so you should be able to fetch
linux-next and continue this work:

https://patchwork.kernel.org/project/linux-arm-msm/patch/20220822153435.7856-1-quic_jjohnson@quicinc.com/

Looking forward to the continuation,
Bjorn
Bjorn Andersson Sept. 13, 2022, 6:42 p.m. UTC | #3
On Mon, Sep 12, 2022 at 02:20:46PM +0530, Bhupesh Sharma wrote:
> Add a skeleton driver for supporting Qualcomm QMI thermal mitigation
> (TMD) cooling devices.
> 
> The QMI TMD cooling devices are used for various mitigations for
> remote subsystem(s) including remote processor mitigation, rail
> voltage restriction etc. This driver uses kernel QMI interface
> to send the message to remote subsystem(s).
> 
> Each child node of the QMI TMD devicetree node should represent
> each remote subsystem and each child of this subsystem represents
> separate cooling devices.
> 
> Cc: daniel.lezcano@linaro.org
> Cc: rafael@kernel.org
> Cc: andersson@kernel.org
> Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
> ---
>  .../qcom/qmi_cooling/qcom_qmi_cooling.c       | 632 ++++++++++++++++++
>  .../qcom/qmi_cooling/qcom_tmd_services.c      | 352 ++++++++++
>  .../qcom/qmi_cooling/qcom_tmd_services.h      | 120 ++++
>  3 files changed, 1104 insertions(+)
>  create mode 100644 drivers/thermal/qcom/qmi_cooling/qcom_qmi_cooling.c
>  create mode 100644 drivers/thermal/qcom/qmi_cooling/qcom_tmd_services.c
>  create mode 100644 drivers/thermal/qcom/qmi_cooling/qcom_tmd_services.h
> 
> diff --git a/drivers/thermal/qcom/qmi_cooling/qcom_qmi_cooling.c b/drivers/thermal/qcom/qmi_cooling/qcom_qmi_cooling.c
> new file mode 100644
> index 000000000000..4cb601533b9d
> --- /dev/null
> +++ b/drivers/thermal/qcom/qmi_cooling/qcom_qmi_cooling.c
> @@ -0,0 +1,632 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022, Linaro Limited

Afaict this is based on code which is copyrighted Linux Foundation. You
should keep that (in addition to Linaro).

> + */
> +
> +#include <linux/err.h>
> +#include <linux/module.h>
> +#include <linux/net.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/remoteproc/qcom_rproc.h>
> +#include <linux/slab.h>
> +#include <linux/soc/qcom/qmi.h>
> +#include <linux/thermal.h>
> +
> +#include "qcom_tmd_services.h"
> +
> +#define QMI_TMD_RESP_TIMEOUT		msecs_to_jiffies(100)
> +#define QMI_CLIENT_NAME_LENGTH		40
> +#define QMI_MAX_ALLOWED_INSTANCE_ID	0x80
> +
> +/**
> + * struct qmi_plat_data - qmi compile-time platform data
> + * @ninstances: Number of instances supported by platform
> + */
> +struct qmi_plat_data {
> +	const u32		ninstances;
> +};

This is unused.

> +
> +struct qmi_cooling_device {
> +	struct device_node		*np;

This is a local variable used at probe time, I think you can pass it
around on the stack instead.

> +	char				cdev_name[THERMAL_NAME_LENGTH];

That said, np->name is what you carry in cdev_name. So you'd save some
memory by just carrying np...

> +	char				qmi_name[QMI_CLIENT_NAME_LENGTH];
> +	bool                            connection_active;
> +	struct list_head		qmi_node;

There's no other node, so "node" would probably suffice and save you
some line wraps below.

> +	struct thermal_cooling_device	*cdev;
> +	unsigned int			mtgn_state;

Seems like this could be named just "state".

> +	unsigned int			max_level;
> +	struct qmi_tmd_instance		*instance;
> +};
> +
> +struct qmi_tmd_instance {
> +	struct device			*dev;
> +	struct qmi_handle		handle;
> +	struct mutex			mutex;
> +	u32				instance_id;
> +	struct list_head		tmd_cdev_list;
> +	struct work_struct		svc_arrive_work;
> +};
> +
> +/**
> + * struct qmi_tmd_priv
> + * @dev: device.
> + * @instances: array of QMI TMD instances.
> + * @ninstances: number of QMI TMD instances.
> + */
> +struct qmi_tmd_priv {
> +	struct device			*dev;
> +	struct qmi_tmd_instance		*instances;
> +	u32				ninstances;
> +};
> +
> +static char device_clients[][QMI_CLIENT_NAME_LENGTH] = {

Afaict the only reason you have this list is to validate the label
specified in DT. I think you should drop it and trust the dtb
information - and possibly check this in the DT binding.

> +	{"pa"},
> +	{"pa_fr1"},
> +	{"cx_vdd_limit"},
> +	{"modem"},
> +	{"modem_current"},
> +	{"modem_skin"},
> +	{"modem_bw"},
> +	{"modem_bw_backoff"},
> +	{"vbatt_low"},
> +	{"charge_state"},
> +	{"mmw0"},
> +	{"mmw1"},
> +	{"mmw2"},
> +	{"mmw3"},
> +	{"mmw_skin0"},
> +	{"mmw_skin1"},
> +	{"mmw_skin2"},
> +	{"mmw_skin3"},
> +	{"wlan"},
> +	{"wlan_bw"},
> +	{"mmw_skin0_dsc"},
> +	{"mmw_skin1_dsc"},
> +	{"mmw_skin2_dsc"},
> +	{"mmw_skin3_dsc"},
> +	{"modem_skin_lte_dsc"},
> +	{"modem_skin_nr_dsc"},
> +	{"pa_dsc"},
> +	{"pa_fr1_dsc"},
> +	{"cdsp_sw"},
> +	{"cdsp_hw"},
> +	{"cpuv_restriction_cold"},
> +	{"cpr_cold"},
> +	{"modem_lte_dsc"},
> +	{"modem_nr_dsc"},
> +	{"modem_nr_scg_dsc"},
> +	{"sdr0_lte_dsc"},
> +	{"sdr1_lte_dsc"},
> +	{"sdr0_nr_dsc"},
> +	{"sdr1_nr_dsc"},
> +	{"pa_lte_sdr0_dsc"},
> +	{"pa_lte_sdr1_dsc"},
> +	{"pa_nr_sdr0_dsc"},
> +	{"pa_nr_sdr1_dsc"},
> +	{"pa_nr_sdr0_scg_dsc"},
> +	{"pa_nr_sdr1_scg_dsc"},
> +	{"mmw0_dsc"},
> +	{"mmw1_dsc"},
> +	{"mmw2_dsc"},
> +	{"mmw3_dsc"},
> +	{"mmw_ific_dsc"},
> +};
> +
> +static int qmi_get_max_state(struct thermal_cooling_device *cdev,
> +				 unsigned long *state)
> +{
> +	struct qmi_cooling_device *qmi_cdev = cdev->devdata;
> +
> +	if (!qmi_cdev)
> +		return -EINVAL;
> +
> +	*state = qmi_cdev->max_level;
> +
> +	return 0;
> +}
> +
> +static int qmi_get_cur_state(struct thermal_cooling_device *cdev,
> +				 unsigned long *state)
> +{
> +	struct qmi_cooling_device *qmi_cdev = cdev->devdata;
> +
> +	if (!qmi_cdev)
> +		return -EINVAL;
> +
> +	*state = qmi_cdev->mtgn_state;
> +
> +	return 0;
> +}
> +
> +static int qmi_tmd_send_state_request(struct qmi_cooling_device *qmi_cdev,
> +				uint8_t state)
> +{
> +	int ret = 0;
> +	struct tmd_set_mitigation_level_req_msg_v01 req;
> +	struct tmd_set_mitigation_level_resp_msg_v01 tmd_resp;
> +	struct qmi_tmd_instance *tmd_instance = qmi_cdev->instance;
> +	struct qmi_txn txn;
> +
> +	memset(&req, 0, sizeof(req));
> +	memset(&tmd_resp, 0, sizeof(tmd_resp));
> +
> +	strscpy(req.mitigation_dev_id.mitigation_dev_id, qmi_cdev->qmi_name,
> +		QMI_TMD_MITIGATION_DEV_ID_LENGTH_MAX_V01);
> +	req.mitigation_level = state;
> +
> +	mutex_lock(&tmd_instance->mutex);
> +
> +	ret = qmi_txn_init(&tmd_instance->handle, &txn,
> +		tmd_set_mitigation_level_resp_msg_v01_ei, &tmd_resp);
> +	if (ret < 0) {
> +		pr_err("qmi set state:%d txn init failed for %s ret:%d\n",
> +			state, qmi_cdev->cdev_name, ret);

Carry the struct device and use dev_err() instead of pr_err

> +		goto qmi_send_exit;
> +	}
> +
> +	ret = qmi_send_request(&tmd_instance->handle, NULL, &txn,
> +			QMI_TMD_SET_MITIGATION_LEVEL_REQ_V01,
> +			TMD_SET_MITIGATION_LEVEL_REQ_MSG_V01_MAX_MSG_LEN,
> +			tmd_set_mitigation_level_req_msg_v01_ei, &req);
> +	if (ret < 0) {
> +		pr_err("qmi set state:%d txn send failed for %s ret:%d\n",
> +			state, qmi_cdev->cdev_name, ret);
> +		qmi_txn_cancel(&txn);
> +		goto qmi_send_exit;
> +	}
> +
> +	ret = qmi_txn_wait(&txn, QMI_TMD_RESP_TIMEOUT);
> +	if (ret < 0) {
> +		pr_err("qmi set state:%d txn wait failed for %s ret:%d\n",
> +			state, qmi_cdev->cdev_name, ret);
> +		goto qmi_send_exit;
> +	}
> +	if (tmd_resp.resp.result != QMI_RESULT_SUCCESS_V01) {
> +		ret = tmd_resp.resp.result;
> +		pr_err("qmi set state:%d NOT success for %s ret:%d\n",
> +			state, qmi_cdev->cdev_name, ret);
> +		goto qmi_send_exit;
> +	}
> +	ret = 0;
> +	pr_debug("Requested qmi state:%d for %s\n", state, qmi_cdev->cdev_name);
> +
> +qmi_send_exit:
> +	mutex_unlock(&tmd_instance->mutex);
> +	return ret;
> +}
> +
> +static int qmi_set_cur_state(struct thermal_cooling_device *cdev,
> +				 unsigned long state)
> +{
> +	struct qmi_cooling_device *qmi_cdev = cdev->devdata;
> +	int ret = 0;
> +
> +	if (!qmi_cdev)
> +		return -EINVAL;
> +
> +	if (state > qmi_cdev->max_level)
> +		return -EINVAL;
> +
> +	if (qmi_cdev->mtgn_state == state)
> +		return 0;
> +
> +	/* save it and return if server exit */
> +	if (!qmi_cdev->connection_active) {
> +		qmi_cdev->mtgn_state = state;
> +		pr_debug("Pending request:%ld for %s\n", state,
> +				qmi_cdev->cdev_name);
> +		return 0;
> +	}
> +
> +	/* It is best effort to save state even if QMI fail */

Why is this preferred over leaving mtgn_state at its old value?

If it really should be done this way, you could write this:

	qmi_cdev->state = state;

	return qmi_tmd_send_state_request();

> +	ret = qmi_tmd_send_state_request(qmi_cdev, (uint8_t)state);
> +
> +	qmi_cdev->mtgn_state = state;
> +
> +	return ret;
> +}
> +
> +static struct thermal_cooling_device_ops qmi_device_ops = {
> +	.get_max_state = qmi_get_max_state,
> +	.get_cur_state = qmi_get_cur_state,
> +	.set_cur_state = qmi_set_cur_state,
> +};
> +
> +static int qmi_register_cooling_device(struct qmi_cooling_device *qmi_cdev)
> +{
> +	qmi_cdev->cdev = thermal_of_cooling_device_register(
> +					qmi_cdev->np,
> +					qmi_cdev->cdev_name,
> +					qmi_cdev,
> +					&qmi_device_ops);
> +	if (IS_ERR(qmi_cdev->cdev)) {
> +		pr_err("Cooling register failed for %s, ret:%ld\n",
> +			qmi_cdev->cdev_name, PTR_ERR(qmi_cdev->cdev));
> +		return PTR_ERR(qmi_cdev->cdev);

If this happens qmi_cdev->dev will be IS_ERR(), but the code treat !NULL
as initialized.

So I think you should use a local variable here and only assign
qmi_cdev->cdev on success.

> +	}
> +	pr_debug("Cooling register success for %s\n", qmi_cdev->cdev_name);

Use dev_dbg();

> +
> +	return 0;
> +}
> +
> +static int verify_devices_and_register(struct qmi_tmd_instance *tmd_instance)
> +{
> +	struct tmd_get_mitigation_device_list_req_msg_v01 req;
> +	struct tmd_get_mitigation_device_list_resp_msg_v01 *tmd_resp;
> +	int ret = 0, i;
> +	struct qmi_txn txn;
> +
> +	memset(&req, 0, sizeof(req));
> +	/* size of tmd_resp is very high, use heap memory rather than stack */

What is "very high"? Presumably this is "a few kB"?

> +	tmd_resp = kzalloc(sizeof(*tmd_resp), GFP_KERNEL);
> +	if (!tmd_resp)
> +		return -ENOMEM;
> +
> +	mutex_lock(&tmd_instance->mutex);
> +	ret = qmi_txn_init(&tmd_instance->handle, &txn,
> +		tmd_get_mitigation_device_list_resp_msg_v01_ei, tmd_resp);
> +	if (ret < 0) {
> +		pr_err("Transaction Init error for instance_id:0x%x ret:%d\n",
> +			tmd_instance->instance_id, ret);
> +		goto reg_exit;
> +	}
> +
> +	ret = qmi_send_request(&tmd_instance->handle, NULL, &txn,
> +			QMI_TMD_GET_MITIGATION_DEVICE_LIST_REQ_V01,
> +			TMD_GET_MITIGATION_DEVICE_LIST_REQ_MSG_V01_MAX_MSG_LEN,
> +			tmd_get_mitigation_device_list_req_msg_v01_ei,
> +			&req);
> +	if (ret < 0) {
> +		qmi_txn_cancel(&txn);
> +		goto reg_exit;
> +	}
> +
> +	ret = qmi_txn_wait(&txn, QMI_TMD_RESP_TIMEOUT);
> +	if (ret < 0) {
> +		pr_err("Transaction wait error for instance_id:0x%x ret:%d\n",
> +			tmd_instance->instance_id, ret);
> +		goto reg_exit;
> +	}
> +	if (tmd_resp->resp.result != QMI_RESULT_SUCCESS_V01) {
> +		ret = tmd_resp->resp.result;
> +		pr_err("Get device list NOT success for instance_id:0x%x ret:%d\n",
> +			tmd_instance->instance_id, ret);
> +		goto reg_exit;
> +	}
> +	mutex_unlock(&tmd_instance->mutex);
> +
> +	for (i = 0; i < tmd_resp->mitigation_device_list_len; i++) {
> +		struct qmi_cooling_device *qmi_cdev = NULL;
> +
> +		list_for_each_entry(qmi_cdev, &tmd_instance->tmd_cdev_list,
> +					qmi_node) {
> +			struct tmd_mitigation_dev_list_type_v01 *device =
> +				&tmd_resp->mitigation_device_list[i];
> +
> +			if ((strncasecmp(qmi_cdev->qmi_name,
> +				device->mitigation_dev_id.mitigation_dev_id,
> +				QMI_TMD_MITIGATION_DEV_ID_LENGTH_MAX_V01)))
> +				continue;
> +
> +			qmi_cdev->connection_active = true;
> +			qmi_cdev->max_level = device->max_mitigation_level;
> +			/*
> +			 * It is better to set current state
> +			 * initially or during restart
> +			 */
> +			qmi_tmd_send_state_request(qmi_cdev,
> +							qmi_cdev->mtgn_state);

Unwrap this line.

> +			if (!qmi_cdev->cdev)
> +				ret = qmi_register_cooling_device(qmi_cdev);
> +			break;
> +		}
> +	}

I find this hard to follow.

You loop through the response and for each response you loop through the
cooling devices and if one found you register it and then you pick the
next one.

I believe you can write that cleaner.

Also, for each supported cooling device returned by the remote you're
overwriting "ret". So the final return value will depend on the success
of registering the last cooling device only.

> +
> +	kfree(tmd_resp);
> +	return ret;
> +
> +reg_exit:
> +	mutex_unlock(&tmd_instance->mutex);
> +	kfree(tmd_resp);
> +
> +	return ret;
> +}
> +
> +static void qmi_tmd_svc_arrive(struct work_struct *work)
> +{
> +	struct qmi_tmd_instance *tmd_instance = container_of(work,
> +						struct qmi_tmd_instance,
> +						svc_arrive_work);
> +
> +	verify_devices_and_register(tmd_instance);

Seems like this could be inlined.

> +}
> +
> +static void thermal_qmi_net_reset(struct qmi_handle *qmi)
> +{
> +	struct qmi_tmd_instance *tmd_instance = container_of(qmi,
> +						struct qmi_tmd_instance,
> +						handle);
> +	struct qmi_cooling_device *qmi_cdev = NULL;
> +
> +	list_for_each_entry(qmi_cdev, &tmd_instance->tmd_cdev_list,
> +					qmi_node) {
> +		if (qmi_cdev->connection_active)
> +			qmi_tmd_send_state_request(qmi_cdev,
> +							qmi_cdev->mtgn_state);

Unwrap this line.

> +	}
> +}
> +
> +static void thermal_qmi_del_server(struct qmi_handle *qmi,
> +				    struct qmi_service *service)
> +{
> +	struct qmi_tmd_instance *tmd_instance = container_of(qmi,
> +						struct qmi_tmd_instance,
> +						handle);
> +	struct qmi_cooling_device *qmi_cdev = NULL;
> +
> +	list_for_each_entry(qmi_cdev, &tmd_instance->tmd_cdev_list, qmi_node)
> +		qmi_cdev->connection_active = false;
> +}
> +
> +static int thermal_qmi_new_server(struct qmi_handle *qmi,
> +				    struct qmi_service *service)
> +{
> +	struct qmi_tmd_instance *tmd_instance = container_of(qmi,
> +						struct qmi_tmd_instance,
> +						handle);
> +	struct sockaddr_qrtr sq = {AF_QIPCRTR, service->node, service->port};
> +
> +	mutex_lock(&tmd_instance->mutex);
> +	kernel_connect(qmi->sock, (struct sockaddr *)&sq, sizeof(sq), 0);
> +	mutex_unlock(&tmd_instance->mutex);
> +	queue_work(system_highpri_wq, &tmd_instance->svc_arrive_work);
> +
> +	return 0;
> +}
> +
> +static struct qmi_ops thermal_qmi_event_ops = {
> +	.new_server = thermal_qmi_new_server,
> +	.del_server = thermal_qmi_del_server,
> +	.net_reset = thermal_qmi_net_reset,
> +};
> +
> +static void qmi_tmd_cleanup(struct qmi_tmd_priv *priv)
> +{
> +	int i;
> +	struct qmi_tmd_instance *tmd_instance = priv->instances;
> +	struct qmi_cooling_device *qmi_cdev, *c_next;
> +
> +	for (i = 0; i < priv->ninstances; i++) {
> +		mutex_lock(&tmd_instance[i].mutex);
> +		list_for_each_entry_safe(qmi_cdev, c_next,
> +				&tmd_instance[i].tmd_cdev_list, qmi_node) {
> +			qmi_cdev->connection_active = false;
> +			if (qmi_cdev->cdev)
> +				thermal_cooling_device_unregister(
> +					qmi_cdev->cdev);
> +
> +			list_del(&qmi_cdev->qmi_node);
> +		}
> +		qmi_handle_release(&tmd_instance[i].handle);

I think you should release your qmi handle first, as that will ensure
that your driver won't receive any further notifications (and
qmi_send_request() will just fail gracefully).

> +
> +		mutex_unlock(&tmd_instance[i].mutex);

You might have work scheduled when you get here.

> +	}
> +}
> +
> +static int qmi_get_dt_instance_data(struct qmi_tmd_priv *priv,
> +				    struct qmi_tmd_instance *instance,
> +				    struct device_node *node)
> +{
> +	struct device *dev = priv->dev;
> +	struct qmi_cooling_device *qmi_cdev;
> +	struct device_node *subnode;
> +	int ret, i;
> +	u32 instance_id;
> +
> +	ret = of_property_read_u32(node, "qcom,instance-id", &instance_id);
> +	if (ret) {
> +		dev_err(dev, "error reading qcom,instance-id (%d)\n",
> +				ret);
> +		return ret;
> +	}
> +
> +	if (instance_id >= QMI_MAX_ALLOWED_INSTANCE_ID) {
> +		dev_err(dev, "Instance ID exceeds max allowed value (%d)\n", instance_id);
> +		return -EINVAL;
> +	}
> +
> +	instance->instance_id = instance_id;
> +
> +	instance->dev = dev;
> +	mutex_init(&instance->mutex);
> +	INIT_LIST_HEAD(&instance->tmd_cdev_list);
> +	INIT_WORK(&instance->svc_arrive_work, qmi_tmd_svc_arrive);
> +
> +	for_each_available_child_of_node(node, subnode) {
> +		const char *qmi_name;
> +
> +		qmi_cdev = devm_kzalloc(dev, sizeof(*qmi_cdev),
> +				GFP_KERNEL);

Unwrap this line please.

> +		if (!qmi_cdev) {
> +			ret = -ENOMEM;
> +			goto data_error;
> +		}
> +
> +		strscpy(qmi_cdev->cdev_name, subnode->name,
> +				THERMAL_NAME_LENGTH);
> +
> +		if (!of_property_read_string(subnode,
> +					"label",
> +					&qmi_name)) {

Afaict this is still < 80 chars if you unwrap this line.

> +			strscpy(qmi_cdev->qmi_name, qmi_name,
> +					QMI_CLIENT_NAME_LENGTH);
> +		} else {
> +			dev_err(dev, "Fail to parse dev name for %s\n",
> +					subnode->name);
> +			of_node_put(subnode);
> +			break;
> +		}
> +
> +		/* Check for supported qmi dev */
> +		for (i = 0; i < ARRAY_SIZE(device_clients); i++) {

As mentioned above, I suggest that you drop this check.

> +			if (strcmp(device_clients[i],
> +						qmi_cdev->qmi_name) == 0)
> +				break;
> +		}
> +
> +		if (i >= ARRAY_SIZE(device_clients)) {
> +			dev_err(dev, "Not supported dev name for %s\n",
> +					subnode->name);
> +			of_node_put(subnode);
> +			break;
> +		}
> +		qmi_cdev->instance = instance;
> +		qmi_cdev->np = subnode;
> +		qmi_cdev->mtgn_state = 0;
> +		list_add(&qmi_cdev->qmi_node, &instance->tmd_cdev_list);
> +	}
> +
> +	of_node_put(node);

You never increment "node", so I don't think you should put it either.

> +
> +	return 0;
> +data_error:
> +	of_node_put(subnode);
> +
> +	return ret;
> +}
> +
> +static int qmi_tmd_device_init(struct qmi_tmd_priv *priv)
> +{
> +	int i, ret;
> +	u32 ninstances = priv->ninstances;
> +
> +	for (i = 0; i < ninstances; i++) {
> +		struct qmi_tmd_instance *tmd_instance = &priv->instances[i];
> +
> +		if (list_empty(&tmd_instance->tmd_cdev_list))
> +			continue;
> +
> +		ret = qmi_handle_init(&tmd_instance->handle,
> +			TMD_GET_MITIGATION_DEVICE_LIST_RESP_MSG_V01_MAX_MSG_LEN,
> +			&thermal_qmi_event_ops, NULL);
> +		if (ret < 0) {
> +			dev_err(priv->dev, "QMI[0x%x] handle init failed. err:%d\n",
> +					tmd_instance->instance_id, ret);
> +			priv->ninstances = i;
> +			return ret;
> +		}
> +
> +		ret = qmi_add_lookup(&tmd_instance->handle, TMD_SERVICE_ID_V01,
> +					TMD_SERVICE_VERS_V01,
> +					tmd_instance->instance_id);
> +		if (ret < 0) {
> +			dev_err(priv->dev, "QMI register failed for 0x%x, ret:%d\n",
> +				tmd_instance->instance_id, ret);
> +			return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id qmi_tmd_device_table[] = {
> +	{.compatible = "qcom,qmi-tmd-devices"},
> +	{}
> +};
> +MODULE_DEVICE_TABLE(of, qmi_tmd_device_table);

Please move this below qmi_tmd_device_remove()

> +
> +static int qmi_tmd_device_probe(struct platform_device *pdev)
> +{
> +	struct device *dev;
> +	struct device_node *np;
> +	struct device_node *child;
> +	struct qmi_tmd_instance *instances;
> +	const struct qmi_plat_data *data;
> +	const struct of_device_id *id;
> +	struct qmi_tmd_priv *priv;
> +	int ret;
> +	u32 ninstances;
> +
> +	if (pdev->dev.of_node)
> +		dev = &pdev->dev;
> +	else
> +		dev = pdev->dev.parent;
> +
> +	np = dev->of_node;
> +
> +	id = of_match_node(qmi_tmd_device_table, np);

Use of_device_get_match_data().

> +	if (!id)
> +		return -ENODEV;
> +
> +	data = id->data;

That said, you don't actually use or specify the data.

> +
> +	if (np)

The driver is only bound by DT match, so you don't need to guard against
!np.

> +		ninstances = of_get_available_child_count(np);
> +
> +	if (ninstances <= 0) {

And if for some reason np was NULL, then ninstances would be
uninitialized.

Also, there's no reason to check for negative return values from
of_get_available_child_count().

> +		dev_err(dev, "No instances to process\n");
> +		return -EINVAL;
> +	}
> +
> +	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +
> +	priv->dev = dev;
> +	priv->ninstances = ninstances;
> +
> +	priv->instances = devm_kcalloc(dev, priv->ninstances,
> +					sizeof(*priv->instances), GFP_KERNEL);
> +	if (!priv->instances)
> +		return -ENOMEM;
> +
> +	instances = priv->instances;
> +
> +	for_each_available_child_of_node(np, child) {
> +		ret = qmi_get_dt_instance_data(priv, instances, child);
> +		if (ret) {
> +			of_node_put(child);
> +			return ret;
> +		}
> +
> +		instances++;
> +	}
> +
> +	platform_set_drvdata(pdev, priv);
> +
> +	ret = qmi_tmd_device_init(priv);
> +	if (ret)
> +		goto probe_err;
> +
> +	dev_dbg(dev, "QMI Thermal Mitigation Device driver probe success!\n");

Please skip this, you can look in sysfs to see that your device probed.

> +	return 0;
> +
> +probe_err:
> +	qmi_tmd_cleanup(priv);
> +	return ret;
> +}
> +
> +static int qmi_tmd_device_remove(struct platform_device *pdev)
> +{
> +	struct qmi_tmd_priv *priv = platform_get_drvdata(pdev);
> +
> +	qmi_tmd_cleanup(priv);
> +
> +	return 0;
> +}
> +
> +static struct platform_driver qmi_tmd_device_driver = {
> +	.probe          = qmi_tmd_device_probe,
> +	.remove         = qmi_tmd_device_remove,
> +	.driver         = {
> +		.name   = "qcom-qmi-tmd-devices",

Why not qcom-qmi-cooling?

> +		.of_match_table = qmi_tmd_device_table,
> +	},
> +};
> +
> +module_platform_driver(qmi_tmd_device_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Qualcomm QMI Thermal Mitigation Device driver");
> +MODULE_ALIAS("platform:qcom-qmi-tmd-devices");

Module auto loading happens based on MODULE_DEVICE_TABLE(of, ), so no
need to specify this MODULE_ALIAS.

> diff --git a/drivers/thermal/qcom/qmi_cooling/qcom_tmd_services.c b/drivers/thermal/qcom/qmi_cooling/qcom_tmd_services.c
> new file mode 100644
> index 000000000000..5b950b8952f0
> --- /dev/null
> +++ b/drivers/thermal/qcom/qmi_cooling/qcom_tmd_services.c
> @@ -0,0 +1,352 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022, Linaro Limited

Linux Foundation.

> + */
> +
> +#include <linux/soc/qcom/qmi.h>
> +
> +#include "qcom_tmd_services.h"
> +
> +static struct qmi_elem_info tmd_mitigation_dev_id_type_v01_ei[] = {
> +	{
> +		.data_type      = QMI_STRING,
> +		.elem_len       = QMI_TMD_MITIGATION_DEV_ID_LENGTH_MAX_V01 + 1,
> +		.elem_size      = sizeof(char),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0,
> +		.offset         = offsetof(
> +					struct tmd_mitigation_dev_id_type_v01,
> +					mitigation_dev_id),
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +static struct qmi_elem_info tmd_mitigation_dev_list_type_v01_ei[] = {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct tmd_mitigation_dev_id_type_v01),
> +		.array_type       = NO_ARRAY,
> +		.tlv_type       = 0,
> +		.offset         = offsetof(
> +					struct tmd_mitigation_dev_list_type_v01,
> +					mitigation_dev_id),
> +		.ei_array      = tmd_mitigation_dev_id_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_UNSIGNED_1_BYTE,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(uint8_t),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0,
> +		.offset         = offsetof(
> +					struct tmd_mitigation_dev_list_type_v01,
> +					max_mitigation_level),
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info tmd_get_mitigation_device_list_req_msg_v01_ei[] = {
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info tmd_get_mitigation_device_list_resp_msg_v01_ei[] = {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct qmi_response_type_v01),
> +		.array_type       = NO_ARRAY,
> +		.tlv_type       = 0x02,
> +		.offset         = offsetof(
> +			struct tmd_get_mitigation_device_list_resp_msg_v01,
> +			resp),
> +		.ei_array      = qmi_response_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_OPT_FLAG,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(uint8_t),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x10,
> +		.offset         = offsetof(
> +			struct tmd_get_mitigation_device_list_resp_msg_v01,
> +				mitigation_device_list_valid),
> +	},
> +	{
> +		.data_type      = QMI_DATA_LEN,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(uint8_t),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x10,
> +		.offset         = offsetof(
> +			struct tmd_get_mitigation_device_list_resp_msg_v01,
> +				mitigation_device_list_len),
> +	},
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = QMI_TMD_MITIGATION_DEV_LIST_MAX_V01,
> +		.elem_size      = sizeof(
> +				struct tmd_mitigation_dev_list_type_v01),
> +		.array_type       = VAR_LEN_ARRAY,
> +		.tlv_type       = 0x10,
> +		.offset         = offsetof(
> +			struct tmd_get_mitigation_device_list_resp_msg_v01,
> +				mitigation_device_list),
> +		.ei_array      = tmd_mitigation_dev_list_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info tmd_set_mitigation_level_req_msg_v01_ei[] = {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct tmd_mitigation_dev_id_type_v01),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x01,
> +		.offset         = offsetof(
> +				struct tmd_set_mitigation_level_req_msg_v01,
> +					mitigation_dev_id),
> +		.ei_array      = tmd_mitigation_dev_id_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_UNSIGNED_1_BYTE,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(uint8_t),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x02,
> +		.offset         = offsetof(
> +				struct tmd_set_mitigation_level_req_msg_v01,
> +					mitigation_level),
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info tmd_set_mitigation_level_resp_msg_v01_ei[] = {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct qmi_response_type_v01),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x02,
> +		.offset         = offsetof(
> +			struct tmd_set_mitigation_level_resp_msg_v01,
> +				resp),
> +		.ei_array      = qmi_response_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info tmd_get_mitigation_level_req_msg_v01_ei[] = {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct tmd_mitigation_dev_id_type_v01),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x01,
> +		.offset         = offsetof(
> +				struct tmd_get_mitigation_level_req_msg_v01,
> +					mitigation_device),
> +		.ei_array      = tmd_mitigation_dev_id_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info tmd_get_mitigation_level_resp_msg_ei[] = {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct qmi_response_type_v01),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x02,
> +		.offset         = offsetof(
> +				struct tmd_get_mitigation_level_resp_msg_v01,
> +					resp),
> +		.ei_array      = qmi_response_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_OPT_FLAG,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(uint8_t),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x10,
> +		.offset         = offsetof(
> +				struct tmd_get_mitigation_level_resp_msg_v01,
> +					current_mitigation_level_valid),
> +	},
> +	{
> +		.data_type      = QMI_UNSIGNED_1_BYTE,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(uint8_t),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x10,
> +		.offset         = offsetof(
> +				struct tmd_get_mitigation_level_resp_msg_v01,
> +					current_mitigation_level),
> +	},
> +	{
> +		.data_type      = QMI_OPT_FLAG,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(uint8_t),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x11,
> +		.offset         = offsetof(
> +				struct tmd_get_mitigation_level_resp_msg_v01,
> +					requested_mitigation_level_valid),
> +	},
> +	{
> +		.data_type      = QMI_UNSIGNED_1_BYTE,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(uint8_t),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x11,
> +		.offset         = offsetof(
> +				struct tmd_get_mitigation_level_resp_msg_v01,
> +					requested_mitigation_level),
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info
> +	tmd_register_notification_mitigation_level_req_msg_v01_ei[] = {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct tmd_mitigation_dev_id_type_v01),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x01,
> +		.offset         = offsetof(
> +		struct tmd_register_notification_mitigation_level_req_msg_v01,
> +				mitigation_device),
> +		.ei_array      = tmd_mitigation_dev_id_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info
> +	tmd_register_notification_mitigation_level_resp_msg_v01_ei[]
> +									= {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct qmi_response_type_v01),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x02,
> +		.offset         = offsetof(
> +		struct tmd_register_notification_mitigation_level_resp_msg_v01,
> +				resp),
> +		.ei_array      = qmi_response_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info
> +	tmd_deregister_notification_mitigation_level_req_msg_v01_ei[]
> +									= {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct tmd_mitigation_dev_id_type_v01),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x01,
> +		.offset         = offsetof(struct
> +		tmd_deregister_notification_mitigation_level_req_msg_v01,
> +				mitigation_device),
> +		.ei_array      = tmd_mitigation_dev_id_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info
> +	tmd_deregister_notification_mitigation_level_resp_msg_v01_ei[]
> +									= {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct qmi_response_type_v01),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x02,
> +		.offset         = offsetof(struct
> +		tmd_deregister_notification_mitigation_level_resp_msg_v01,
> +					   resp),
> +		.ei_array      = qmi_response_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> +
> +struct qmi_elem_info tmd_mitigation_level_report_ind_msg_v01_ei[] = {
> +	{
> +		.data_type      = QMI_STRUCT,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(struct tmd_mitigation_dev_id_type_v01),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x01,
> +		.offset         = offsetof(
> +				struct tmd_mitigation_level_report_ind_msg_v01,
> +					mitigation_device),
> +		.ei_array      = tmd_mitigation_dev_id_type_v01_ei,
> +	},
> +	{
> +		.data_type      = QMI_UNSIGNED_1_BYTE,
> +		.elem_len       = 1,
> +		.elem_size      = sizeof(uint8_t),
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = 0x02,
> +		.offset         = offsetof(
> +				struct tmd_mitigation_level_report_ind_msg_v01,
> +					   current_mitigation_level),
> +	},
> +	{
> +		.data_type      = QMI_EOTI,
> +		.array_type     = NO_ARRAY,
> +		.tlv_type       = QMI_COMMON_TLV_TYPE,
> +	},
> +};
> diff --git a/drivers/thermal/qcom/qmi_cooling/qcom_tmd_services.h b/drivers/thermal/qcom/qmi_cooling/qcom_tmd_services.h
> new file mode 100644
> index 000000000000..8af0bfd7eb48
> --- /dev/null
> +++ b/drivers/thermal/qcom/qmi_cooling/qcom_tmd_services.h
> @@ -0,0 +1,120 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022, Linaro Limited

Linux Foundation.

Regards,
Bjorn

> + */
> +
> +#ifndef __QCOM_TMD_SERVICES_H
> +#define __QCOM_TMD_SERVICES_H
> +
> +#define TMD_SERVICE_ID_V01 0x18
> +#define TMD_SERVICE_VERS_V01 0x01
> +
> +#define QMI_TMD_GET_MITIGATION_DEVICE_LIST_RESP_V01	0x0020
> +#define QMI_TMD_GET_MITIGATION_LEVEL_REQ_V01		0x0022
> +#define QMI_TMD_GET_SUPPORTED_MSGS_REQ_V01		0x001E
> +#define QMI_TMD_SET_MITIGATION_LEVEL_REQ_V01		0x0021
> +#define QMI_TMD_REGISTER_NOTIFICATION_MITIGATION_LEVEL_RESP_V01		0x0023
> +#define QMI_TMD_GET_SUPPORTED_MSGS_RESP_V01				0x001E
> +#define QMI_TMD_SET_MITIGATION_LEVEL_RESP_V01				0x0021
> +#define QMI_TMD_DEREGISTER_NOTIFICATION_MITIGATION_LEVEL_RESP_V01	0x0024
> +#define QMI_TMD_MITIGATION_LEVEL_REPORT_IND_V01				0x0025
> +#define QMI_TMD_GET_MITIGATION_LEVEL_RESP_V01				0x0022
> +#define QMI_TMD_GET_SUPPORTED_FIELDS_REQ_V01				0x001F
> +#define QMI_TMD_GET_MITIGATION_DEVICE_LIST_REQ_V01			0x0020
> +#define QMI_TMD_REGISTER_NOTIFICATION_MITIGATION_LEVEL_REQ_V01		0x0023
> +#define QMI_TMD_DEREGISTER_NOTIFICATION_MITIGATION_LEVEL_REQ_V01	0x0024
> +#define QMI_TMD_GET_SUPPORTED_FIELDS_RESP_V01				0x001F
> +
> +#define QMI_TMD_MITIGATION_DEV_ID_LENGTH_MAX_V01	32
> +#define QMI_TMD_MITIGATION_DEV_LIST_MAX_V01		32
> +
> +struct tmd_mitigation_dev_id_type_v01 {
> +	char mitigation_dev_id[QMI_TMD_MITIGATION_DEV_ID_LENGTH_MAX_V01 + 1];
> +};
> +
> +struct tmd_mitigation_dev_list_type_v01 {
> +	struct tmd_mitigation_dev_id_type_v01 mitigation_dev_id;
> +	uint8_t max_mitigation_level;
> +};
> +
> +struct tmd_get_mitigation_device_list_req_msg_v01 {
> +	char placeholder;
> +};
> +#define TMD_GET_MITIGATION_DEVICE_LIST_REQ_MSG_V01_MAX_MSG_LEN	0
> +extern struct qmi_elem_info tmd_get_mitigation_device_list_req_msg_v01_ei[];
> +
> +struct tmd_get_mitigation_device_list_resp_msg_v01 {
> +	struct qmi_response_type_v01 resp;
> +	uint8_t mitigation_device_list_valid;
> +	uint32_t mitigation_device_list_len;
> +	struct tmd_mitigation_dev_list_type_v01
> +		mitigation_device_list[QMI_TMD_MITIGATION_DEV_LIST_MAX_V01];
> +};
> +#define TMD_GET_MITIGATION_DEVICE_LIST_RESP_MSG_V01_MAX_MSG_LEN	1099
> +extern struct qmi_elem_info tmd_get_mitigation_device_list_resp_msg_v01_ei[];
> +
> +struct tmd_set_mitigation_level_req_msg_v01 {
> +	struct tmd_mitigation_dev_id_type_v01 mitigation_dev_id;
> +	uint8_t mitigation_level;
> +};
> +#define TMD_SET_MITIGATION_LEVEL_REQ_MSG_V01_MAX_MSG_LEN	40
> +extern struct qmi_elem_info tmd_set_mitigation_level_req_msg_v01_ei[];
> +
> +struct tmd_set_mitigation_level_resp_msg_v01 {
> +	struct qmi_response_type_v01 resp;
> +};
> +#define TMD_SET_MITIGATION_LEVEL_RESP_MSG_V01_MAX_MSG_LEN	7
> +extern struct qmi_elem_info tmd_set_mitigation_level_resp_msg_v01_ei[];
> +
> +struct tmd_get_mitigation_level_req_msg_v01 {
> +	struct tmd_mitigation_dev_id_type_v01 mitigation_device;
> +};
> +#define TMD_GET_MITIGATION_LEVEL_REQ_MSG_V01_MAX_MSG_LEN	36
> +extern struct qmi_elem_info tmd_get_mitigation_level_req_msg_v01_ei[];
> +
> +struct tmd_get_mitigation_level_resp_msg_v01 {
> +	struct qmi_response_type_v01 resp;
> +	uint8_t current_mitigation_level_valid;
> +	uint8_t current_mitigation_level;
> +	uint8_t requested_mitigation_level_valid;
> +	uint8_t requested_mitigation_level;
> +};
> +#define TMD_GET_MITIGATION_LEVEL_RESP_MSG_V01_MAX_MSG_LEN	15
> +extern struct qmi_elem_info tmd_get_mitigation_level_resp_msg_v01_ei[];
> +
> +struct tmd_register_notification_mitigation_level_req_msg_v01 {
> +	struct tmd_mitigation_dev_id_type_v01 mitigation_device;
> +};
> +#define TMD_REGISTER_NOTIFICATION_MITIGATION_LEVEL_REQ_MSG_V01_MAX_MSG_LEN	36
> +extern struct qmi_elem_info
> +		tmd_register_notification_mitigation_level_req_msg_v01_ei[];
> +
> +struct tmd_register_notification_mitigation_level_resp_msg_v01 {
> +	struct qmi_response_type_v01 resp;
> +};
> +#define TMD_REGISTER_NOTIFICATION_MITIGATION_LEVEL_RESP_MSG_V01_MAX_MSG_LEN	7
> +extern struct qmi_elem_info
> +	tmd_register_notification_mitigation_level_resp_msg_v01_ei[];
> +
> +struct tmd_deregister_notification_mitigation_level_req_msg_v01 {
> +	struct tmd_mitigation_dev_id_type_v01 mitigation_device;
> +};
> +#define TMD_DEREGISTER_NOTIFICATION_MITIGATION_LEVEL_REQ_MSG_V01_MAX_MSG_LEN	36
> +extern struct qmi_elem_info
> +	tmd_deregister_notification_mitigation_level_req_msg_v01_ei[];
> +
> +struct tmd_deregister_notification_mitigation_level_resp_msg_v01 {
> +	struct qmi_response_type_v01 resp;
> +};
> +#define TMD_DEREGISTER_NOTIFICATION_MITIGATION_LEVEL_RESP_MSG_V01_MAX_MSG_LEN	7
> +extern struct qmi_elem_info
> +	tmd_deregister_notification_mitigation_level_resp_msg_v01_ei[];
> +
> +struct tmd_mitigation_level_report_ind_msg_v01 {
> +	struct tmd_mitigation_dev_id_type_v01 mitigation_device;
> +	uint8_t current_mitigation_level;
> +};
> +#define TMD_MITIGATION_LEVEL_REPORT_IND_MSG_V01_MAX_MSG_LEN			40
> +extern struct qmi_elem_info tmd_mitigation_level_report_ind_msg_v01_ei[];
> +
> +#endif
> -- 
> 2.37.1
>
Bjorn Andersson Sept. 13, 2022, 7:10 p.m. UTC | #4
On Mon, Sep 12, 2022 at 02:20:48PM +0530, Bhupesh Sharma wrote:
> Add qcom,qmi-tmd-device and qcom,tmd-device yaml bindings.

Looks like a duplicate of $subject.

> 
> Qualcomm QMI based TMD cooling device(s) are used for various

What is "TMD" an abbreviation of?

> mitigations for remote subsystem(s) including remote processor
> mitigation, rail voltage restriction etc.
> 
> Each child node represents one remote subsystem and each child
> of this subsystem in-turn represents separate TMD cooling device.
> 
> Cc: daniel.lezcano@linaro.org
> Cc: rafael@kernel.org
> Cc: andersson@kernel.org
> Cc: robh@kernel.org
> Signed-off-by: Bhupesh Sharma <bhupesh.sharma@linaro.org>
> ---
>  .../bindings/thermal/qcom,qmi-tmd-device.yaml |  78 +++++++++++
>  .../bindings/thermal/qcom,tmd-device.yaml     | 122 ++++++++++++++++++
>  include/dt-bindings/thermal/qcom,tmd.h        |  14 ++
>  3 files changed, 214 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/thermal/qcom,qmi-tmd-device.yaml
>  create mode 100644 Documentation/devicetree/bindings/thermal/qcom,tmd-device.yaml
>  create mode 100644 include/dt-bindings/thermal/qcom,tmd.h
> 
> diff --git a/Documentation/devicetree/bindings/thermal/qcom,qmi-tmd-device.yaml b/Documentation/devicetree/bindings/thermal/qcom,qmi-tmd-device.yaml
> new file mode 100644
> index 000000000000..dfda5b611a93
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/thermal/qcom,qmi-tmd-device.yaml
> @@ -0,0 +1,78 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +
> +%YAML 1.2
> +---
> +$id: "http://devicetree.org/schemas/thermal/qcom,qmi-tmd-device.yaml#"
> +$schema: "http://devicetree.org/meta-schemas/core.yaml#"
> +
> +title: Qualcomm QMI based thermal mitigation (TMD) cooling devices.
> +
> +maintainers:
> +  - Bhupesh Sharma <bhupesh.sharma@linaro.org>
> +
> +description:
> +  Qualcomm QMI based TMD cooling device(s) are used for various
> +  mitigations for remote subsystem(s) including remote processor
> +  mitigation, rail voltage restriction etc.
> +
> +properties:
> +  $nodename:
> +    const: qmi-tmd-devices
> +
> +  compatible:
> +    items:
> +      - const: qcom,qmi-tmd-devices
> +
> +  modem0:
> +    $ref: /schemas/thermal/qcom,tmd-device.yaml#
> +
> +  adsp:
> +    $ref: /schemas/thermal/qcom,tmd-device.yaml#
> +
> +  cdsp:
> +    $ref: /schemas/thermal/qcom,tmd-device.yaml#
> +
> +  slpi:
> +    $ref: /schemas/thermal/qcom,tmd-device.yaml#
> +
> +required:
> +  - compatible
> +
> +unevaluatedProperties: false
> +
> +examples:
> +  - |
> +    #include <dt-bindings/thermal/qcom,tmd.h>
> +    qmi-tmd-devices {

Looking at the implementation I see no relationship between the
individual instances (i.e. between the children of this node).

My suggestion is that you drop this top-level node and just list out
modem, adsp etc individually - which would mean that you can remove one
layer of indirection in the driver, as each instance would just need a
list of cooling-devices.

> +      compatible = "qcom,qmi-tmd-devices";
> +
> +      modem0 {

So you would move the compatible here.

> +        qcom,instance-id = <MODEM0_INSTANCE_ID>;
> +
> +        modem0_pa: tmd-device0 {
> +          label = "pa";
> +          #cooling-cells = <2>;
> +        };
> +
> +        modem0_proc: tmd-device1 {
> +          label = "modem";
> +          #cooling-cells = <2>;
> +        };
> +
> +        modem0_current: tmd-device2 {
> +          label = "modem_current";
> +          #cooling-cells = <2>;
> +        };
> +
> +        modem0_skin: tmd-device3 {
> +          label = "modem_skin";
> +          #cooling-cells = <2>;
> +        };
> +
> +        modem0_vdd: tmd-device4 {
> +          label = "cpuv_restriction_cold";
> +          #cooling-cells = <2>;
> +        };
> +      };
> +    };
> +...
> diff --git a/Documentation/devicetree/bindings/thermal/qcom,tmd-device.yaml b/Documentation/devicetree/bindings/thermal/qcom,tmd-device.yaml
> new file mode 100644
> index 000000000000..38ac62f03376
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/thermal/qcom,tmd-device.yaml
> @@ -0,0 +1,122 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +
> +%YAML 1.2
> +---
> +$id: "http://devicetree.org/schemas/thermal/qcom,tmd-device.yaml#"
> +$schema: "http://devicetree.org/meta-schemas/core.yaml#"
> +

I see no reason for splitting this into a separate binding.

> +title: Qualcomm thermal mitigation (TMD) cooling devices
> +
> +maintainers:
> +  - Bhupesh Sharma <bhupesh.sharma@linaro.org>
> +
> +description:
> +  Qualcomm thermal mitigation (TMD) cooling devices. Each child node
> +  represents one remote subsystem and each child of this subsystem in-turn
> +  represents separate cooling devices.
> +
> +properties:
> +  $nodename:
> +    pattern: "^(modem|adsp|cdsp|slpi[0-9])?$"
> +
> +  qcom,instance-id:
> +    $ref: /schemas/types.yaml#/definitions/uint32
> +    description:
> +      Remote subsystem QMI server instance id to be used for communicating with QMI.
> +
> +patternProperties:
> +  "^tmd-device[0-9]?$":

So max 10 cooling devices per remote?

> +    type: object
> +    description:
> +      Subnodes indicating tmd cooling device of a specific category.
> +    properties:
> +      label:
> +        maxItems: 1
> +        description: |
> +          Remote subsystem device identifier. Acceptable device names -
> +          "pa" -> for pa cooling device,
> +          "cpuv_restriction_cold" -> for vdd restriction,
> +          "cx_vdd_limit" -> for vdd limit,
> +          "modem" -> for processor passive cooling device,
> +          "modem_current" -> for current limiting device,
> +          "modem_bw" ->  for bus bandwidth limiting device,
> +          "cpr_cold" -> for cpr restriction.

Afaict there are about 50 valid cooling devices listed in the driver.
Why limit this to these 7 here?

> +
> +      "#cooling-cells":
> +        const: 2
> +
> +    required:
> +      - label
> +      - "#cooling-cells"
> +
> +    additionalProperties: false
> +
> +required:
> +  - qcom,instance-id
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    #include <dt-bindings/thermal/qcom,tmd.h>
> +    modem0 {

As written here this example is incomplete, as these nodes can't live on
their own.

But this is actually what I propose above.

> +      qcom,instance-id = <MODEM0_INSTANCE_ID>;
> +
> +      modem0_pa: tmd-device0 {
> +        label = "pa";
> +        #cooling-cells = <2>;
> +      };
> +
> +      modem0_proc: tmd-device1 {
> +        label = "modem";
> +        #cooling-cells = <2>;
> +      };
> +
> +      modem0_current: tmd-device2 {
> +        label = "modem_current";
> +        #cooling-cells = <2>;
> +      };
> +
> +      modem0_skin: tmd-device3 {
> +        label = "modem_skin";
> +        #cooling-cells = <2>;
> +      };
> +
> +      modem0_vdd: tmd-device4 {
> +        label = "cpuv_restriction_cold";
> +        #cooling-cells = <2>;
> +      };
> +    };
> +
> +  - |
> +    #include <dt-bindings/thermal/qcom,tmd.h>
> +    adsp {
> +      qcom,instance-id = <ADSP_INSTANCE_ID>;
> +
> +      adsp_vdd: tmd-device1 {
> +        label = "cpuv_restriction_cold";
> +        #cooling-cells = <2>;
> +      };
> +    };
> +
> +  - |
> +    #include <dt-bindings/thermal/qcom,tmd.h>
> +    cdsp {
> +      qcom,instance-id = <CDSP_INSTANCE_ID>;
> +
> +      cdsp_vdd: tmd-device1 {
> +        label = "cpuv_restriction_cold";
> +        #cooling-cells = <2>;
> +      };
> +    };
> +
> +  - |
> +    #include <dt-bindings/thermal/qcom,tmd.h>
> +    slpi {
> +      qcom,instance-id = <SLPI_INSTANCE_ID>;
> +
> +      slpi_vdd: tmd-device1 {
> +        label = "cpuv_restriction_cold";
> +        #cooling-cells = <2>;
> +      };
> +    };
> diff --git a/include/dt-bindings/thermal/qcom,tmd.h b/include/dt-bindings/thermal/qcom,tmd.h
> new file mode 100644
> index 000000000000..5ede4422e04e
> --- /dev/null
> +++ b/include/dt-bindings/thermal/qcom,tmd.h

This is a quite generic name, how about qcom,qmi-cooling.h?

> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * This header provides constants for the Qualcomm TMD instances.
> + */
> +
> +#ifndef _DT_BINDINGS_THERMAL_QCOM_TMD_H_
> +#define _DT_BINDINGS_THERMAL_QCOM_TMD_H_
> +
> +#define MODEM0_INSTANCE_ID	0x0
> +#define ADSP_INSTANCE_ID	0x1
> +#define CDSP_INSTANCE_ID	0x43
> +#define SLPI_INSTANCE_ID	0x53

QMI cooling isn't the only thing dealing with "instance id" and all of
them would deal with instances ids of type modem, adsp, cdsp, slpi etc.

As such I think these are too generic, how about

QMI_COOLING_ADSP etc?

Regards,
Bjorn

> +
> +#endif
> -- 
> 2.37.1
>