[v3,0/3] Introduce Protection Domain Restart (PDR) Helpers

Message ID 20191230050008.8143-1-sibis@codeaurora.org

Message

Sibi Sankar Dec. 30, 2019, 5 a.m. UTC
Qualcomm SoCs (starting with MSM8998) allow for multiple protection
domains (PDs) to run on the same Q6 sub-system. This allows for
services like AVS AUDIO to have their own separate address space and
crash/recover without disrupting the other PDs running on the same Q6
ADSP. This patch series introduces pdr helper library and adds PD
tracking functionality for "avs/audio" allowing apr services to register
themselves asynchronously once the dependent PDs are up.
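
A minimal sketch of how a client might consume these helpers, based on
the calls exported by patch 1 (pdr_add_lookup(), pdr_restart_pd() and
the status() callback); the header path is taken from the diffstat
below, while the surrounding client code is purely illustrative:

#include <linux/soc/qcom/pdr.h>

/* Hypothetical status() callback wired into the pdr_handle by the
 * client; reacts to the PD state reported in pds->state. */
static void audio_pd_status(struct pdr_handle *pdr, struct pdr_service *pds)
{
	/* register/unregister apr services based on pds->state */
}

static int audio_track_pd(struct pdr_handle *pdr)
{
	int ret;

	/* Ask the helpers to track the "avs/audio" PD. */
	ret = pdr_add_lookup(pdr, "avs/audio", "msm/adsp/audio_pd");
	if (ret)
		return ret;

	/* Optionally force a restart of the tracked PD. */
	return pdr_restart_pd(pdr, "msm/adsp/audio_pd");
}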

V3:
 * patches 2 and 3 remain unchanged.
 * reset servloc_addr/servreg_addr.
 * fixup the helpers to handle servloc_work/servreg_work asynchronously.
 * fixup usage of list_lock across traversals, insertions and deletions.
 * fixup the helpers to use a single lookup list.
 * skip waiting for response on ind_ack send.
 * introduce pdr_servreg_link_create to re-use existing qmi connection to
   servreg instances. This helps tracking PDs running on the same remote
   processor.
 * have a per node curr_state in pdr_list_node to preserve all state
   updates during indack_cb.
 * introduce additional servreg_service_state values to help the client
   distinguish between fatal and non-fatal pdr_lookup errors.
 * re-order pdr_handle_release sequence.
 * fixup "!ind_msg->service_path returns true always" warning.
 * fixup comments.

V2:
 * fixup pd_status callback to return void.
 * return 0 from pdr_get_domain_list on success.
 * introduce status_lock to linearize the pd_status reported back
   to the clients.
 * use the correct service name length across various string operations
   performed.
 * service locator will now schedule the pending lookups registered
   when it comes up.
 * other minor cleanups that Bjorn suggested.
 * use pr_warn to indicate that the wait for service locator timed
   out.
 * add Bjorn/Srini's "Reviewed-by" for the dt-bindings.

To Do:
 * Add support for pd tracking in fastrpc driver.

Sibi Sankar (3):
  soc: qcom: Introduce Protection Domain Restart helpers
  dt-bindings: soc: qcom: apr: Add protection domain bindings
  soc: qcom: apr: Add avs/audio tracking functionality

 .../devicetree/bindings/soc/qcom/qcom,apr.txt |  59 ++
 drivers/soc/qcom/Kconfig                      |   6 +
 drivers/soc/qcom/Makefile                     |   1 +
 drivers/soc/qcom/apr.c                        | 100 ++-
 drivers/soc/qcom/pdr_interface.c              | 709 ++++++++++++++++++
 drivers/soc/qcom/pdr_internal.h               | 375 +++++++++
 include/linux/soc/qcom/apr.h                  |   1 +
 include/linux/soc/qcom/pdr.h                  | 109 +++
 include/linux/soc/qcom/qmi.h                  |   1 +
 9 files changed, 1350 insertions(+), 11 deletions(-)
 create mode 100644 drivers/soc/qcom/pdr_interface.c
 create mode 100644 drivers/soc/qcom/pdr_internal.h
 create mode 100644 include/linux/soc/qcom/pdr.h

Comments

Bjorn Andersson Jan. 2, 2020, 8:45 p.m. UTC | #1
On Sun 29 Dec 21:00 PST 2019, Sibi Sankar wrote:
[..]
> diff --git a/drivers/soc/qcom/pdr_interface.c b/drivers/soc/qcom/pdr_interface.c
[..]
> +static int servreg_locator_new_server(struct qmi_handle *qmi,
> +				      struct qmi_service *svc)
> +{
> +	struct pdr_handle *pdr = container_of(qmi, struct pdr_handle,
> +					      servloc_client);
> +	struct pdr_service *pds, *tmp;
> +
> +	/* Create a Local client port for QMI communication */
> +	pdr->servloc_addr.sq_family = AF_QIPCRTR;
> +	pdr->servloc_addr.sq_node = svc->node;
> +	pdr->servloc_addr.sq_port = svc->port;
> +
> +	mutex_lock(&pdr->locator_lock);
> +	pdr->locator_available = true;
> +	mutex_unlock(&pdr->locator_lock);
> +
> +	/* Service pending lookup requests */
> +	mutex_lock(&pdr->list_lock);
> +	list_for_each_entry_safe(pds, tmp, &pdr->lookups, node) {

No need to make this _safe, as you're not modifying the list in the
loop.
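
(For reference, a sketch of the non-_safe form being suggested, with an
early break added since the worker only needs to be scheduled once;
this is an illustration, not the code in the patch:)

	mutex_lock(&pdr->list_lock);
	list_for_each_entry(pds, &pdr->lookups, node) {
		if (pds->need_servreg_lookup) {
			schedule_work(&pdr->servloc_work);
			break;
		}
	}
	mutex_unlock(&pdr->list_lock);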

> +		if (pds->need_servreg_lookup)
> +			schedule_work(&pdr->servloc_work);
> +	}
> +	mutex_unlock(&pdr->list_lock);
> +
> +	return 0;
> +}
[..]
> +static void pdr_servreg_link_create(struct pdr_handle *pdr,
> +				    struct pdr_service *pds)
> +{
> +	struct pdr_service *pds_iter, *tmp;
> +	bool link_exists = false;
> +
> +	/* Check if a QMI link to SERVREG instance already exists */
> +	mutex_lock(&pdr->list_lock);
> +	list_for_each_entry_safe(pds_iter, tmp, &pdr->lookups, node) {
> +		if (pds_iter->instance == pds->instance &&

Flip this condition around and continue if it's not a match, to save
indentation and to split the two expressions into two distinct checks.
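
(Roughly, a sketch of the flipped form, keeping the same match
semantics as the quoted condition; illustration only:)

	list_for_each_entry(pds_iter, &pdr->lookups, node) {
		if (pds_iter->instance != pds->instance)
			continue;
		if (!strcmp(pds_iter->service_path, pds->service_path))
			continue;

		/* existing QMI link to this SERVREG instance found */
		...
	}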

> +		    strcmp(pds_iter->service_path, pds->service_path)) {

Isn't this just saying:
	if (pds_iter == pds)
		continue;

With the purpose of link_exists to be !empty(set(lookups) - pds) ? 

But if I read pdr_add_lookup() correctly it's possible that a client
could call pdr_add_lookup() more than once before pdr_servloc_work() is
scheduled, in which case "set(lookup) - pds" isn't empty and as such you
won't add the lookup?

> +			link_exists = true;
> +			pds->service_connected = pds_iter->service_connected;
> +			if (pds_iter->service_connected)
> +				pds->need_servreg_register = true;
> +			else
> +				pds->need_servreg_remove = true;
> +			queue_work(pdr->servreg_wq, &pdr->servreg_work);
> +			break;
> +		}
> +	}
[..]
> +static void pdr_servloc_work(struct work_struct *work)
> +{
> +	struct pdr_handle *pdr = container_of(work, struct pdr_handle,
> +					      servloc_work);
> +	struct pdr_service *pds, *tmp;
> +	int ret;
> +
> +	/* Bail out early if PD Mapper is not up */
> +	mutex_lock(&pdr->locator_lock);
> +	if (!pdr->locator_available) {
> +		mutex_unlock(&pdr->locator_lock);
> +		pr_warn("PDR: SERVICE LOCATOR service not available\n");
> +		return;
> +	}
> +	mutex_unlock(&pdr->locator_lock);
> +
> +	mutex_lock(&pdr->list_lock);
> +	list_for_each_entry_safe(pds, tmp, &pdr->lookups, node) {

As written right now you don't need _safe here, because in the only case
you're modifying the list you end up exiting the loop.

> +		if (!pds->need_servreg_lookup)
> +			continue;
> +
> +		pds->need_servreg_lookup = false;
> +		mutex_unlock(&pdr->list_lock);

You should probably just hold on to list_lock over this entire loop.

> +
> +		ret = pdr_locate_service(pdr, pds);
> +		if (ret < 0) {
> +			if (ret == -ENXIO)
> +				pds->state = SERVREG_LOCATOR_UNKNOWN_SERVICE;
> +			else if (ret == -EAGAIN)
> +				pds->state = SERVREG_LOCATOR_DB_UPDATED;

Isn't this something that we should recover from?

> +			else
> +				pds->state = SERVREG_LOCATOR_ERR;
> +
> +			pr_err("PDR: service lookup for %s failed: %d\n",
> +			       pds->service_name, ret);
> +
> +			/* Remove from lookup list */
> +			mutex_lock(&pdr->list_lock);
> +			list_del(&pds->node);

What should I do in my driver when this happens?

> +			mutex_unlock(&pdr->list_lock);
> +
> +			/* Notify Lookup failed */
> +			mutex_lock(&pdr->status_lock);
> +			pdr->status(pdr, pds);
> +			mutex_unlock(&pdr->status_lock);
> +			kfree(pds);
> +		} else {
> +			pdr_servreg_link_create(pdr, pds);
> +		}
> +
> +		return;

There might be more pds entries with need_servreg_lookup in the list,
shouldn't we allow this to continue?

This would though imply that you should hold onto the list_lock over the
entire loop, which I think looks fine.
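
(A sketch of what continuing the loop under list_lock could look like;
the notification handling on the error path is elided, and whether
pdr_locate_service() and the status() callback may safely be called
with list_lock held is an assumption that would need checking:)

	mutex_lock(&pdr->list_lock);
	list_for_each_entry_safe(pds, tmp, &pdr->lookups, node) {
		if (!pds->need_servreg_lookup)
			continue;

		pds->need_servreg_lookup = false;

		ret = pdr_locate_service(pdr, pds);
		if (ret < 0) {
			/* set pds->state and notify the client here,
			 * then drop the entry - _safe is still needed */
			list_del(&pds->node);
			kfree(pds);
			continue;
		}

		pdr_servreg_link_create(pdr, pds);
	}
	mutex_unlock(&pdr->list_lock);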

> +	}
> +	mutex_unlock(&pdr->list_lock);
> +}
> +
> +/**
> + * pdr_add_lookup() - register a tracking request for a PD
> + * @pdr:		PDR client handle
> + * @service_name:	service name of the tracking request
> + * @service_path:	service path of the tracking request
> + *
> + * Registering a pdr lookup allows for tracking the life cycle of the PD.
> + *
> + * Return: 0 on success, negative errno on failure.
> + */
> +int pdr_add_lookup(struct pdr_handle *pdr, const char *service_name,
> +		   const char *service_path)
> +{
> +	struct pdr_service *pds, *pds_iter, *tmp;
> +	int ret;
> +
> +	if (!service_name || strlen(service_name) > SERVREG_NAME_LENGTH ||
> +	    !service_path || strlen(service_path) > SERVREG_NAME_LENGTH)
> +		return -EINVAL;
> +
> +	pds = kzalloc(sizeof(*pds), GFP_KERNEL);
> +	if (!pds)
> +		return -ENOMEM;
> +
> +	pds->service = SERVREG_NOTIFIER_SERVICE;
> +	strcpy(pds->service_name, service_name);
> +	strcpy(pds->service_path, service_path);
> +	pds->need_servreg_lookup = true;
> +
> +	mutex_lock(&pdr->list_lock);
> +	list_for_each_entry_safe(pds_iter, tmp, &pdr->lookups, node) {

No _safe

> +		if (!strcmp(pds_iter->service_path, service_path)) {
> +			mutex_unlock(&pdr->list_lock);
> +			ret = -EALREADY;
> +			goto err;
> +		}
> +	}
> +
> +	list_add(&pds->node, &pdr->lookups);
> +	mutex_unlock(&pdr->list_lock);
> +
> +	schedule_work(&pdr->servloc_work);
> +
> +	return 0;
> +err:
> +	kfree(pds);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(pdr_add_lookup);
> +
> +/**
> + * pdr_restart_pd() - restart PD
> + * @pdr:		PDR client handle
> + * @service_path:	service path of restart request
> + *
> + * Restarts the PD tracked by the PDR client handle for a given service path.
> + *
> + * Return: 0 on success, negative errno on failure.
> + */
> +int pdr_restart_pd(struct pdr_handle *pdr, const char *service_path)
> +{
> +	struct servreg_restart_pd_req req;
> +	struct servreg_restart_pd_resp resp;
> +	struct pdr_service *pds = NULL, *pds_iter, *tmp;
> +	struct qmi_txn txn;
> +	int ret;
> +
> +	if (!service_path || strlen(service_path) > SERVREG_NAME_LENGTH)
> +		return -EINVAL;
> +
> +	mutex_lock(&pdr->list_lock);
> +	list_for_each_entry_safe(pds_iter, tmp, &pdr->lookups, node) {
> +		if (!pds_iter->service_connected)
> +			continue;
> +
> +		if (!strcmp(pds_iter->service_path, service_path)) {
> +			pds = pds_iter;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&pdr->list_lock);
> +
> +	if (!pds)

Given that you may only call pdr_restart_pd() on something created by
first calling pdr_add_lookup(), how about returning the struct
pdr_service from pdr_add_lookup() instead and then have the client pass
that as an argument to this function.

Most clients don't care about pdr_restart_pd() so they would only have
to IS_ERR(pdr_add_lookup()) anyway, and the ones that care can carry
the returned pointer.


Note that the struct pdr_service doesn't have to be defined in a way
that it's possible to dereference by clients.
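
(A sketch of what the suggested interface could look like, assuming the
lookup handle is returned via ERR_PTR() encoding and stays opaque to
clients; names unchanged, semantics per the suggestion above:)

struct pdr_service *pdr_add_lookup(struct pdr_handle *pdr,
				   const char *service_name,
				   const char *service_path);
int pdr_restart_pd(struct pdr_handle *pdr, struct pdr_service *pds);

A client that never restarts a PD would then simply do
IS_ERR(pdr_add_lookup(...)), while one that does keeps the returned
pointer around and hands it back to pdr_restart_pd().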

> +		return -EINVAL;
> +
> +	/* Prepare req message */
> +	strcpy(req.service_path, pds->service_path);
> +
> +	ret = qmi_txn_init(&pdr->servreg_client, &txn,
> +			   servreg_restart_pd_resp_ei,
> +			   &resp);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = qmi_send_request(&pdr->servreg_client, &pdr->servreg_addr,
> +			       &txn, SERVREG_RESTART_PD_REQ,
> +			       SERVREG_RESTART_PD_REQ_MAX_LEN,
> +			       servreg_restart_pd_req_ei, &req);
> +	if (ret < 0) {
> +		qmi_txn_cancel(&txn);
> +		return ret;
> +	}
> +
> +	ret = qmi_txn_wait(&txn, 5 * HZ);
> +	if (ret < 0) {
> +		pr_err("PDR: %s PD restart txn wait failed: %d\n",
> +		       pds->service_path, ret);
> +		return ret;
> +	}
> +
> +	/* Check response if PDR is disabled */
> +	if (resp.resp.result == QMI_RESULT_FAILURE_V01 &&
> +	    resp.resp.error == QMI_ERR_DISABLED_V01) {
> +		pr_err("PDR: %s PD restart is disabled: 0x%x\n",
> +		       pds->service_path, resp.resp.error);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	/* Check the response for other error case*/
> +	if (resp.resp.result != QMI_RESULT_SUCCESS_V01) {
> +		pr_err("PDR: %s request for PD restart failed: 0x%x\n",
> +		       pds->service_path, resp.resp.error);
> +		return -EREMOTEIO;
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(pdr_restart_pd);
[..]
> +/**
> + * struct pdr_service - context to track lookups/restarts
> + * @service_name:		name of the service running on the PD
> + * @service_path:		service path of the PD
> + * @service_data_valid:		indicates if service_data field has valid data
> + * @service_data:		service data provided by servreg_locator service
> + * @need_servreg_lookup:	state flag for tracking servreg lookup requests
> + * @need_servreg_register:	state flag for tracking pending servreg register
> + * @need_servreg_remove:	state flag for tracking pending servreg remove
> + * @service_connected:		current state of servreg_notifier qmi service
> + * @state:			current state of PD
> + * @service:			servreg_notifer service type
> + * @instance:			instance id of the @service
> + * @priv:			handle for client's use
> + * @node:			list_head for house keeping
> + */
> +struct pdr_service {

This is primarily internal bookkeeping, how about not exposing it to the
clients? This would imply that status() would have to be called with
pdr_service->priv and pdr_service->state as arguments instead.
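
(i.e. something along these lines, keeping struct pdr_service private
to pdr_interface.c; a sketch only:)

	void (*status)(struct pdr_handle *pdr, int state, void *priv);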

> +	char service_name[SERVREG_NAME_LENGTH + 1];
> +	char service_path[SERVREG_NAME_LENGTH + 1];
> +
> +	u8 service_data_valid;
> +	u32 service_data;
> +
> +	bool need_servreg_lookup;
> +	bool need_servreg_register;
> +	bool need_servreg_remove;
> +	bool service_connected;
> +	int state;
> +
> +	unsigned int instance;
> +	unsigned int service;
> +
> +	void *priv;
> +	struct list_head node;
> +};
> +
[..]
> +	void (*status)(struct pdr_handle *pdr, struct pdr_service *pds);
> +};

Regards,
Bjorn
Sibi Sankar Jan. 3, 2020, 12:36 p.m. UTC | #2
Hey Bjorn,

Thanks for taking the time to review the series.

On 2020-01-03 02:15, Bjorn Andersson wrote:
> On Sun 29 Dec 21:00 PST 2019, Sibi Sankar wrote:
> [..]
>> diff --git a/drivers/soc/qcom/pdr_interface.c 
>> b/drivers/soc/qcom/pdr_interface.c
> [..]
>> +static int servreg_locator_new_server(struct qmi_handle *qmi,
>> +				      struct qmi_service *svc)
>> +{
>> +	struct pdr_handle *pdr = container_of(qmi, struct pdr_handle,
>> +					      servloc_client);
>> +	struct pdr_service *pds, *tmp;
>> +
>> +	/* Create a Local client port for QMI communication */
>> +	pdr->servloc_addr.sq_family = AF_QIPCRTR;
>> +	pdr->servloc_addr.sq_node = svc->node;
>> +	pdr->servloc_addr.sq_port = svc->port;
>> +
>> +	mutex_lock(&pdr->locator_lock);
>> +	pdr->locator_available = true;
>> +	mutex_unlock(&pdr->locator_lock);
>> +
>> +	/* Service pending lookup requests */
>> +	mutex_lock(&pdr->list_lock);
>> +	list_for_each_entry_safe(pds, tmp, &pdr->lookups, node) {
> 
> No need to make this _safe, as you're not modifying the list in the
> loop.

sure I'll do that

> 
>> +		if (pds->need_servreg_lookup)
>> +			schedule_work(&pdr->servloc_work);
>> +	}
>> +	mutex_unlock(&pdr->list_lock);
>> +
>> +	return 0;
>> +}
> [..]
>> +static void pdr_servreg_link_create(struct pdr_handle *pdr,
>> +				    struct pdr_service *pds)
>> +{
>> +	struct pdr_service *pds_iter, *tmp;
>> +	bool link_exists = false;
>> +
>> +	/* Check if a QMI link to SERVREG instance already exists */
>> +	mutex_lock(&pdr->list_lock);
>> +	list_for_each_entry_safe(pds_iter, tmp, &pdr->lookups, node) {
>> +		if (pds_iter->instance == pds->instance &&
> 
> Flip this condition around and continue if it's not a match, to save
> indentation and to split the two expressions into two distinct checks.

sure I'll do that

> 
>> +		    strcmp(pds_iter->service_path, pds->service_path)) {
> 
> Isn't this just saying:
> 	if (pds_iter == pds)
> 		continue;
> 
> With the purpose of link_exists to be !empty(set(lookups) - pds) ?

More like:
!empty(set(lookups_with_same_instance) - pds)

servreg_link_create was added to re-use an existing qmi_lookup, i.e. to
deal with PDs running on the same remote processor. This can be
identified by looking for a lookup with the same instance value but
with a different service path. We still need to register the
service_path with the servreg service once it's up.

> 
> But if I read pdr_add_lookup() correctly it's possible that a client
> could call pdr_add_lookup() more than once before pdr_servloc_work() is
> scheduled, in which case "set(lookup) - pds" isn't empty and as such 
> you
> won't add the lookup?

Holding the lock over the entire servloc_work should handle that
scenario? That way we can ensure qmi_lookup is called at least once.

> 
>> +			link_exists = true;
>> +			pds->service_connected = pds_iter->service_connected;
>> +			if (pds_iter->service_connected)
>> +				pds->need_servreg_register = true;
>> +			else
>> +				pds->need_servreg_remove = true;
>> +			queue_work(pdr->servreg_wq, &pdr->servreg_work);
>> +			break;
>> +		}
>> +	}
> [..]
>> +static void pdr_servloc_work(struct work_struct *work)
>> +{
>> +	struct pdr_handle *pdr = container_of(work, struct pdr_handle,
>> +					      servloc_work);
>> +	struct pdr_service *pds, *tmp;
>> +	int ret;
>> +
>> +	/* Bail out early if PD Mapper is not up */
>> +	mutex_lock(&pdr->locator_lock);
>> +	if (!pdr->locator_available) {
>> +		mutex_unlock(&pdr->locator_lock);
>> +		pr_warn("PDR: SERVICE LOCATOR service not available\n");
>> +		return;
>> +	}
>> +	mutex_unlock(&pdr->locator_lock);
>> +
>> +	mutex_lock(&pdr->list_lock);
>> +	list_for_each_entry_safe(pds, tmp, &pdr->lookups, node) {
> 
> As written right now you don't need _safe here, because in the only 
> case
> you're modifying the list you end up exiting the loop.

sure

> 
>> +		if (!pds->need_servreg_lookup)
>> +			continue;
>> +
>> +		pds->need_servreg_lookup = false;
>> +		mutex_unlock(&pdr->list_lock);
> 
> You should probably just hold on to list_lock over this entire loop.
> 
>> +
>> +		ret = pdr_locate_service(pdr, pds);
>> +		if (ret < 0) {
>> +			if (ret == -ENXIO)
>> +				pds->state = SERVREG_LOCATOR_UNKNOWN_SERVICE;
>> +			else if (ret == -EAGAIN)
>> +				pds->state = SERVREG_LOCATOR_DB_UPDATED;
> 
> Isn't this something that we should recover from?

Yes, it's a case where the JSON referenced by pd-mapper has been
updated mid-lookup. Calling lookup again should ideally fix this, but
we'll have to decide on the max number of retries. I guess I can
simulate such a scenario with a custom JSON file and pd-mapper
changes.
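
(A sketch of what a bounded retry inside pdr_servloc_work() could look
like; PDR_LOCATOR_MAX_RETRIES is a made-up name for illustration:)

	int retries = 0;

	do {
		ret = pdr_locate_service(pdr, pds);
	} while (ret == -EAGAIN && ++retries < PDR_LOCATOR_MAX_RETRIES);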

> 
>> +			else
>> +				pds->state = SERVREG_LOCATOR_ERR;
>> +
>> +			pr_err("PDR: service lookup for %s failed: %d\n",
>> +			       pds->service_name, ret);
>> +
>> +			/* Remove from lookup list */
>> +			mutex_lock(&pdr->list_lock);
>> +			list_del(&pds->node);
> 
> What should I do in my driver when this happens?

db_updated -> retry should fix this error

unknown_service -> lookup not found.

^^ With the way pd-mapper is implemented it's not really recoverable
until pd-mapper is restarted with different args.

locator_err -> not really recoverable

> 
>> +			mutex_unlock(&pdr->list_lock);
>> +
>> +			/* Notify Lookup failed */
>> +			mutex_lock(&pdr->status_lock);
>> +			pdr->status(pdr, pds);
>> +			mutex_unlock(&pdr->status_lock);
>> +			kfree(pds);
>> +		} else {
>> +			pdr_servreg_link_create(pdr, pds);
>> +		}
>> +
>> +		return;
> 
> There might be more pds entries with need_servreg_lookup in the list,
> shouldn't we allow this to continue?

but we've already scheduled a
number of workers to deal with
this.

> 
> This would though imply that you should hold onto the list_lock over 
> the
> entire loop, which I think looks fine.

sure

> 
>> +	}
>> +	mutex_unlock(&pdr->list_lock);
>> +}
>> +
>> +/**
>> + * pdr_add_lookup() - register a tracking request for a PD
>> + * @pdr:		PDR client handle
>> + * @service_name:	service name of the tracking request
>> + * @service_path:	service path of the tracking request
>> + *
>> + * Registering a pdr lookup allows for tracking the life cycle of the 
>> PD.
>> + *
>> + * Return: 0 on success, negative errno on failure.
>> + */
>> +int pdr_add_lookup(struct pdr_handle *pdr, const char *service_name,
>> +		   const char *service_path)
>> +{
>> +	struct pdr_service *pds, *pds_iter, *tmp;
>> +	int ret;
>> +
>> +	if (!service_name || strlen(service_name) > SERVREG_NAME_LENGTH ||
>> +	    !service_path || strlen(service_path) > SERVREG_NAME_LENGTH)
>> +		return -EINVAL;
>> +
>> +	pds = kzalloc(sizeof(*pds), GFP_KERNEL);
>> +	if (!pds)
>> +		return -ENOMEM;
>> +
>> +	pds->service = SERVREG_NOTIFIER_SERVICE;
>> +	strcpy(pds->service_name, service_name);
>> +	strcpy(pds->service_path, service_path);
>> +	pds->need_servreg_lookup = true;
>> +
>> +	mutex_lock(&pdr->list_lock);
>> +	list_for_each_entry_safe(pds_iter, tmp, &pdr->lookups, node) {
> 
> No _safe

Thanks will update

> 
>> +		if (!strcmp(pds_iter->service_path, service_path)) {
>> +			mutex_unlock(&pdr->list_lock);
>> +			ret = -EALREADY;
>> +			goto err;
>> +		}
>> +	}
>> +
>> +	list_add(&pds->node, &pdr->lookups);
>> +	mutex_unlock(&pdr->list_lock);
>> +
>> +	schedule_work(&pdr->servloc_work);
>> +
>> +	return 0;
>> +err:
>> +	kfree(pds);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(pdr_add_lookup);
>> +
>> +/**
>> + * pdr_restart_pd() - restart PD
>> + * @pdr:		PDR client handle
>> + * @service_path:	service path of restart request
>> + *
>> + * Restarts the PD tracked by the PDR client handle for a given 
>> service path.
>> + *
>> + * Return: 0 on success, negative errno on failure.
>> + */
>> +int pdr_restart_pd(struct pdr_handle *pdr, const char *service_path)
>> +{
>> +	struct servreg_restart_pd_req req;
>> +	struct servreg_restart_pd_resp resp;
>> +	struct pdr_service *pds = NULL, *pds_iter, *tmp;
>> +	struct qmi_txn txn;
>> +	int ret;
>> +
>> +	if (!service_path || strlen(service_path) > SERVREG_NAME_LENGTH)
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&pdr->list_lock);
>> +	list_for_each_entry_safe(pds_iter, tmp, &pdr->lookups, node) {
>> +		if (!pds_iter->service_connected)
>> +			continue;
>> +
>> +		if (!strcmp(pds_iter->service_path, service_path)) {
>> +			pds = pds_iter;
>> +			break;
>> +		}
>> +	}
>> +	mutex_unlock(&pdr->list_lock);
>> +
>> +	if (!pds)
> 
> Given that you may only call pdr_restart_pd() on something created by
> first calling pdr_add_lookup(), how about returning the struct
> pdr_service from pdr_add_lookup() instead and then have the client pass
> that as an argument to this function.
> 
> Most clients don't care about pdr_restart_pd() so they would only
> have
> to IS_ERR(pdr_add_lookup()) anyway, and the ones that care can carry
> the returned pointer.
> 
> 
> Note that the struct pdr_service doesn't have to be defined in a way
> that it's possible to dereference by clients.

sure will update the design in the
next re-spin.

> 
>> +		return -EINVAL;
>> +
>> +	/* Prepare req message */
>> +	strcpy(req.service_path, pds->service_path);
>> +
>> +	ret = qmi_txn_init(&pdr->servreg_client, &txn,
>> +			   servreg_restart_pd_resp_ei,
>> +			   &resp);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	ret = qmi_send_request(&pdr->servreg_client, &pdr->servreg_addr,
>> +			       &txn, SERVREG_RESTART_PD_REQ,
>> +			       SERVREG_RESTART_PD_REQ_MAX_LEN,
>> +			       servreg_restart_pd_req_ei, &req);
>> +	if (ret < 0) {
>> +		qmi_txn_cancel(&txn);
>> +		return ret;
>> +	}
>> +
>> +	ret = qmi_txn_wait(&txn, 5 * HZ);
>> +	if (ret < 0) {
>> +		pr_err("PDR: %s PD restart txn wait failed: %d\n",
>> +		       pds->service_path, ret);
>> +		return ret;
>> +	}
>> +
>> +	/* Check response if PDR is disabled */
>> +	if (resp.resp.result == QMI_RESULT_FAILURE_V01 &&
>> +	    resp.resp.error == QMI_ERR_DISABLED_V01) {
>> +		pr_err("PDR: %s PD restart is disabled: 0x%x\n",
>> +		       pds->service_path, resp.resp.error);
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	/* Check the response for other error case*/
>> +	if (resp.resp.result != QMI_RESULT_SUCCESS_V01) {
>> +		pr_err("PDR: %s request for PD restart failed: 0x%x\n",
>> +		       pds->service_path, resp.resp.error);
>> +		return -EREMOTEIO;
>> +	}
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(pdr_restart_pd);
> [..]
>> +/**
>> + * struct pdr_service - context to track lookups/restarts
>> + * @service_name:		name of the service running on the PD
>> + * @service_path:		service path of the PD
>> + * @service_data_valid:		indicates if service_data field has valid 
>> data
>> + * @service_data:		service data provided by servreg_locator service
>> + * @need_servreg_lookup:	state flag for tracking servreg lookup 
>> requests
>> + * @need_servreg_register:	state flag for tracking pending servreg 
>> register
>> + * @need_servreg_remove:	state flag for tracking pending servreg 
>> remove
>> + * @service_connected:		current state of servreg_notifier qmi service
>> + * @state:			current state of PD
>> + * @service:			servreg_notifer service type
>> + * @instance:			instance id of the @service
>> + * @priv:			handle for client's use
>> + * @node:			list_head for house keeping
>> + */
>> +struct pdr_service {
> 
> This is primarily internal bookkeeping, how about not exposing it to 
> the
> clients? This would imply that status() would have to be called with
> pdr_service->priv and pdr_service->state as arguments instead.

sure will update the design in the
next re-spin.

> 
>> +	char service_name[SERVREG_NAME_LENGTH + 1];
>> +	char service_path[SERVREG_NAME_LENGTH + 1];
>> +
>> +	u8 service_data_valid;
>> +	u32 service_data;
>> +
>> +	bool need_servreg_lookup;
>> +	bool need_servreg_register;
>> +	bool need_servreg_remove;
>> +	bool service_connected;
>> +	int state;
>> +
>> +	unsigned int instance;
>> +	unsigned int service;
>> +
>> +	void *priv;
>> +	struct list_head node;
>> +};
>> +
> [..]
>> +	void (*status)(struct pdr_handle *pdr, struct pdr_service *pds);
>> +};
> 
> Regards,
> Bjorn
Rob Herring Jan. 31, 2020, 2:34 p.m. UTC | #3
On Mon, Dec 30, 2019 at 10:30:07AM +0530, Sibi Sankar wrote:
> Add optional "qcom,protection-domain" bindings for APR services. This
> helps to capture the dependencies between APR services and the PD on
> which each apr service run.

This is meaningless to me...

> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
> ---
>  .../devicetree/bindings/soc/qcom/qcom,apr.txt | 59 +++++++++++++++++++
>  1 file changed, 59 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt
> index db501269f47b8..f87c0b2a48de4 100644
> --- a/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt
> +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt
> @@ -45,6 +45,12 @@ by the individual bindings for the specific service
>  			12 - Ultrasound stream manager.
>  			13 - Listen stream manager.
>  
> +- qcom,protection-domain
> +	Usage: optional
> +	Value type: <stringlist>
> +	Definition: Must list the protection domain service name and path
> +		    that the particular apr service has a dependency on.

How many strings can there be and can you enumerate the possible 
strings?

> +
>  = EXAMPLE
>  The following example represents a QDSP based sound card on a MSM8996 device
>  which uses apr as communication between Apps and QDSP.
> @@ -82,3 +88,56 @@ which uses apr as communication between Apps and QDSP.
>  			...
>  		};
>  	};
> +
> += EXAMPLE 2
> +The following example represents a QDSP based sound card on SDM845 device.
> +Here the apr services are dependent on "avs/audio" service running on AUDIO
> +Protection Domain hosted on ADSP remote processor.
> +
> +	apr {
> +		compatible = "qcom,apr-v2";
> +		qcom,glink-channels = "apr_audio_svc";
> +		qcom,apr-domain = <APR_DOMAIN_ADSP>;
> +
> +		q6core {
> +			compatible = "qcom,q6core";
> +			reg = <APR_SVC_ADSP_CORE>;
> +			qcom,protection-domain = "avs/audio", "msm/adsp/audio_pd";
> +		};
> +
> +		q6afe: q6afe {
> +			compatible = "qcom,q6afe";
> +			reg = <APR_SVC_AFE>;
> +			qcom,protection-domain = "avs/audio", "msm/adsp/audio_pd";
> +			q6afedai: dais {
> +				compatible = "qcom,q6afe-dais";
> +				#sound-dai-cells = <1>;
> +
> +				qi2s@22 {
> +					reg = <22>;
> +					qcom,sd-lines = <3>;
> +				};
> +			};
> +		};
> +
> +		q6asm: q6asm {
> +			compatible = "qcom,q6asm";
> +			reg = <APR_SVC_ASM>;
> +			qcom,protection-domain = "avs/audio", "msm/adsp/audio_pd";
> +			q6asmdai: dais {
> +				compatible = "qcom,q6asm-dais";
> +				#sound-dai-cells = <1>;
> +				iommus = <&apps_smmu 0x1821 0x0>;
> +			};
> +		};
> +
> +		q6adm: q6adm {
> +			compatible = "qcom,q6adm";
> +			reg = <APR_SVC_ADM>;
> +			qcom,protection-domain = "avs/audio", "msm/adsp/audio_pd";

Perhaps an example where not everything is the same. The example shows 
me this isn't needed in DT.

> +			q6routing: routing {
> +				compatible = "qcom,q6adm-routing";
> +				#sound-dai-cells = <0>;
> +			};
> +		};
> +	};
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
Sibi Sankar Feb. 1, 2020, 1:44 p.m. UTC | #4
Hey Rob,
Thanks for the review!

On 1/31/20 8:04 PM, Rob Herring wrote:
> On Mon, Dec 30, 2019 at 10:30:07AM +0530, Sibi Sankar wrote:
>> Add optional "qcom,protection-domain" bindings for APR services. This
>> helps to capture the dependencies between APR services and the PD on
>> which each apr service run.
> 
> This is meaningless to me...

Qualcomm SoCs (starting with MSM8998) allow for multiple protection
domains (PDs) to run on the same Q6 sub-system. This allows for
services like AVS AUDIO to have their own separate address space and
crash/recover without disrupting the other PDs running on the same Q6
ADSP. Add "qcom,protection-domain" bindings to capture the dependencies
between the APR service and the PD on which the apr service runs.

Is ^^ better?

> 
>> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
>> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
>> Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
>> ---
>>   .../devicetree/bindings/soc/qcom/qcom,apr.txt | 59 +++++++++++++++++++
>>   1 file changed, 59 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt
>> index db501269f47b8..f87c0b2a48de4 100644
>> --- a/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt
>> +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,apr.txt
>> @@ -45,6 +45,12 @@ by the individual bindings for the specific service
>>   			12 - Ultrasound stream manager.
>>   			13 - Listen stream manager.
>>   
>> +- qcom,protection-domain
>> +	Usage: optional
>> +	Value type: <stringlist>
>> +	Definition: Must list the protection domain service name and path
>> +		    that the particular apr service has a dependency on.
> 
> How many strings can there be and can you enumerate the possible
> strings?

https://lore.kernel.org/lkml/a19623d4-ab33-d87e-5925-d0411d7479dd@codeaurora.org/

Like I explained in ^^, avs/audio is the only service that the apr
device depends on, and it is known to run only in the
"msm/adsp/audio_pd" service path.

However, there are other service:service_path pairs which will get
documented when I add support for more clients like fastrpc and
ath10k.

> 
>> +
>>   = EXAMPLE
>>   The following example represents a QDSP based sound card on a MSM8996 device
>>   which uses apr as communication between Apps and QDSP.
>> @@ -82,3 +88,56 @@ which uses apr as communication between Apps and QDSP.
>>   			...
>>   		};
>>   	};
>> +
>> += EXAMPLE 2
>> +The following example represents a QDSP based sound card on SDM845 device.
>> +Here the apr services are dependent on "avs/audio" service running on AUDIO
>> +Protection Domain hosted on ADSP remote processor.
>> +
>> +	apr {
>> +		compatible = "qcom,apr-v2";
>> +		qcom,glink-channels = "apr_audio_svc";
>> +		qcom,apr-domain = <APR_DOMAIN_ADSP>;
>> +
>> +		q6core {
>> +			compatible = "qcom,q6core";
>> +			reg = <APR_SVC_ADSP_CORE>;
>> +			qcom,protection-domain = "avs/audio", "msm/adsp/audio_pd";
>> +		};
>> +
>> +		q6afe: q6afe {
>> +			compatible = "qcom,q6afe";
>> +			reg = <APR_SVC_AFE>;
>> +			qcom,protection-domain = "avs/audio", "msm/adsp/audio_pd";
>> +			q6afedai: dais {
>> +				compatible = "qcom,q6afe-dais";
>> +				#sound-dai-cells = <1>;
>> +
>> +				qi2s@22 {
>> +					reg = <22>;
>> +					qcom,sd-lines = <3>;
>> +				};
>> +			};
>> +		};
>> +
>> +		q6asm: q6asm {
>> +			compatible = "qcom,q6asm";
>> +			reg = <APR_SVC_ASM>;
>> +			qcom,protection-domain = "avs/audio", "msm/adsp/audio_pd";
>> +			q6asmdai: dais {
>> +				compatible = "qcom,q6asm-dais";
>> +				#sound-dai-cells = <1>;
>> +				iommus = <&apps_smmu 0x1821 0x0>;
>> +			};
>> +		};
>> +
>> +		q6adm: q6adm {
>> +			compatible = "qcom,q6adm";
>> +			reg = <APR_SVC_ADM>;
>> +			qcom,protection-domain = "avs/audio", "msm/adsp/audio_pd";
> 
> Perhaps an example where not everything is the same. The example shows
> me this isn't needed in DT.

yes will update the example.

> 
>> +			q6routing: routing {
>> +				compatible = "qcom,q6adm-routing";
>> +				#sound-dai-cells = <0>;
>> +			};
>> +		};
>> +	};
>> -- 
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>> a Linux Foundation Collaborative Project
>>