mbox series

[0/3] soundwire: qcom: add pm runtime support

Message ID 20220221104127.15670-1-srinivas.kandagatla@linaro.org
Headers show
Series soundwire: qcom: add pm runtime support | expand

Message

Srinivas Kandagatla Feb. 21, 2022, 10:41 a.m. UTC
This patchset adds pm runtime support to Qualcomm SounWire Controller using
SoundWire Clock Stop and Wake up using Headset events on supported instances and
a bus reset on instances that require full reset.


Tested it on SM8250 MTP and Dragon Board DB845c

--srini


Srinivas Kandagatla (3):
  soundwire: qcom: add runtime pm support
  dt-bindings: soundwire: qcom: document optional wake irq
  soundwire: qcom: add wake up interrupt support

 .../bindings/soundwire/qcom,sdw.txt           |   2 +-
 drivers/soundwire/qcom.c                      | 215 +++++++++++++++++-
 2 files changed, 215 insertions(+), 2 deletions(-)

Comments

Srinivasa Rao Mandadapu Feb. 21, 2022, 1:02 p.m. UTC | #1
Thanks Srini for Your patches!!!

I think runtime pm support in bolero codecs side still pending right?


On 2/21/2022 4:11 PM, Srinivas Kandagatla wrote:
> This patchset adds pm runtime support to Qualcomm SounWire Controller using
> SoundWire Clock Stop and Wake up using Headset events on supported instances and
> a bus reset on instances that require full reset.
>
>
> Tested it on SM8250 MTP and Dragon Board DB845c
>
> --srini
>
>
> Srinivas Kandagatla (3):
>    soundwire: qcom: add runtime pm support
>    dt-bindings: soundwire: qcom: document optional wake irq
>    soundwire: qcom: add wake up interrupt support
>
>   .../bindings/soundwire/qcom,sdw.txt           |   2 +-
>   drivers/soundwire/qcom.c                      | 215 +++++++++++++++++-
>   2 files changed, 215 insertions(+), 2 deletions(-)
>
Pierre-Louis Bossart Feb. 22, 2022, 7:15 p.m. UTC | #2
On 2/21/22 04:41, Srinivas Kandagatla wrote:
> Add support to runtime PM using SoundWire clock stop on supported instances
> and bus reset on instances that require full reset.

This commit message and code are a bit confusing, e.g. you have a
boolean state

ctrl->clk_stop_bus_reset = true;

Does this mean bus reset on exiting clock stop? or just that clock stop
is not supported and bus reset is required with complete re-enumeration.

It would be good to try and explain using SoundWire 1.x terminology what
actions are taken on resume.


> @@ -1267,6 +1305,7 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>  	ctrl->bus.ops = &qcom_swrm_ops;
>  	ctrl->bus.port_ops = &qcom_swrm_port_ops;
>  	ctrl->bus.compute_params = &qcom_swrm_compute_params;
> +	ctrl->bus.clk_stop_timeout = 300;
>  
>  	ret = qcom_swrm_get_port_config(ctrl);
>  	if (ret)
> @@ -1319,6 +1358,21 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>  		 (ctrl->version >> 24) & 0xff, (ctrl->version >> 16) & 0xff,
>  		 ctrl->version & 0xffff);
>  
> +	pm_runtime_set_autosuspend_delay(dev, 3000);
> +	pm_runtime_use_autosuspend(dev);
> +	pm_runtime_mark_last_busy(dev);
> +	pm_runtime_set_active(dev);
> +	pm_runtime_enable(dev);
> +
> +	/* Clk stop is not supported on WSA Soundwire masters */
> +	if (ctrl->version <= 0x01030000) {
> +		ctrl->clk_stop_bus_reset = true;
> +	} else {
> +		ctrl->reg_read(ctrl, SWRM_COMP_MASTER_ID, &val);
> +		if (val == MASTER_ID_WSA)
> +			ctrl->clk_stop_bus_reset = true;
> +	}

I think this means clock_stop_not_supported?

> +static int swrm_runtime_resume(struct device *dev)
> +{
> +	struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(dev);
> +	int ret;
> +
> +	clk_prepare_enable(ctrl->hclk);
> +
> +	if (ctrl->clk_stop_bus_reset) {
> +		reinit_completion(&ctrl->enumeration);
> +		ctrl->reg_write(ctrl, SWRM_COMP_SW_RESET, 0x01);
> +		qcom_swrm_get_device_status(ctrl);

don't you need some sort of delay before checking the controller and
device status? The bus reset sequence takes 4096 bits, that's a non-zero
time.

> +		sdw_handle_slave_status(&ctrl->bus, ctrl->status);
> +		qcom_swrm_init(ctrl);
> +		wait_for_completion_timeout(&ctrl->enumeration,
> +					    msecs_to_jiffies(TIMEOUT_MS));
> +	} else {
> +		ctrl->reg_write(ctrl, SWRM_MCP_BUS_CTRL, SWRM_MCP_BUS_CLK_START);
> +		ctrl->reg_write(ctrl, SWRM_INTERRUPT_CLEAR,
> +			SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET);
> +
> +		ctrl->intr_mask |= SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET;
> +		ctrl->reg_write(ctrl, SWRM_INTERRUPT_MASK_ADDR, ctrl->intr_mask);
> +		ctrl->reg_write(ctrl, SWRM_INTERRUPT_CPU_EN, ctrl->intr_mask);
> +
> +		usleep_range(100, 105);
> +	}
> +
> +	if (!swrm_wait_for_frame_gen_enabled(ctrl))
> +		dev_err(ctrl->dev, "link failed to connect\n");
> +
> +	usleep_range(300, 305);
> +	ret = sdw_bus_exit_clk_stop(&ctrl->bus);
> +	if (ret < 0)
> +		dev_err(ctrl->dev, "bus failed to exit clock stop %d\n", ret);
> +
> +	return 0;
> +}
> +
> +static int __maybe_unused swrm_runtime_suspend(struct device *dev)
> +{
> +	struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(dev);
> +	int ret;
> +
> +	if (!ctrl->clk_stop_bus_reset) {
> +		/* Mask bus clash interrupt */
> +		ctrl->intr_mask &= ~SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET;
> +		ctrl->reg_write(ctrl, SWRM_INTERRUPT_MASK_ADDR, ctrl->intr_mask);
> +		ctrl->reg_write(ctrl, SWRM_INTERRUPT_CPU_EN, ctrl->intr_mask);
> +	}
> +	/* Prepare slaves for clock stop */
> +	ret = sdw_bus_prep_clk_stop(&ctrl->bus);
> +	if (ret < 0) {

if a device has lost sync and reports -ENODATA, you want still want to
go ahead and not prevent the suspend operation from happening.

> +		dev_err(dev, "prepare clock stop failed %d", ret);
> +		return ret;
> +	}
> +
> +	ret = sdw_bus_clk_stop(&ctrl->bus);
> +	if (ret < 0 && ret != -ENODATA) {
> +		dev_err(dev, "bus clock stop failed %d", ret);
> +		return ret;
> +	}
> +
> +	clk_disable_unprepare(ctrl->hclk);
> +
> +	usleep_range(300, 305);
> +
> +	return 0;
> +}
> +
> +static const struct dev_pm_ops swrm_dev_pm_ops = {
> +	SET_RUNTIME_PM_OPS(swrm_runtime_suspend, swrm_runtime_resume, NULL)
> +};
> +
>  static const struct of_device_id qcom_swrm_of_match[] = {
>  	{ .compatible = "qcom,soundwire-v1.3.0", .data = &swrm_v1_3_data },
>  	{ .compatible = "qcom,soundwire-v1.5.1", .data = &swrm_v1_5_data },
> @@ -1359,6 +1506,7 @@ static struct platform_driver qcom_swrm_driver = {
>  	.driver = {
>  		.name	= "qcom-soundwire",
>  		.of_match_table = qcom_swrm_of_match,
> +		.pm = &swrm_dev_pm_ops,
>  	}
>  };
>  module_platform_driver(qcom_swrm_driver);
Pierre-Louis Bossart Feb. 22, 2022, 7:26 p.m. UTC | #3
> +static irqreturn_t qcom_swrm_wake_irq_handler(int irq, void *dev_id)
> +{
> +	struct qcom_swrm_ctrl *swrm = dev_id;
> +	int ret = IRQ_HANDLED;
> +	struct sdw_slave *slave;
> +
> +	clk_prepare_enable(swrm->hclk);
> +
> +	if (swrm->wake_irq > 0) {
> +		if (!irqd_irq_disabled(irq_get_irq_data(swrm->wake_irq)))
> +			disable_irq_nosync(swrm->wake_irq);
> +	}
> +
> +	/*
> +	 * resume all the slaves which must have potentially generated this
> +	 * interrupt, this should also wake the controller at the same time.
> +	 * this is much safer than waking controller directly that will deadlock!
> +	 */
There should be no difference if you first resume the controller and
then attached peripherals, or do a loop where you rely on the pm_runtime
framework.

The notion that there might be a dead-lock is surprising, you would need
to elaborate here.

> +	list_for_each_entry(slave, &swrm->bus.slaves, node) {
> +		ret = pm_runtime_get_sync(&slave->dev);

In my experience, you don't want to blur layers and take references on
the child devices from the parent device. I don't know how many times we
end-up with weird behavior.

we've done something similar on the Intel side but implemented in a less
directive manner.

ret = device_for_each_child(bus->dev, NULL, intel_resume_child_device);

static int intel_resume_child_device(struct device *dev, void *data)
{
[...]	
	ret = pm_request_resume(dev);
	if (ret < 0)
		dev_err(dev, "%s: pm_request_resume failed: %d\n", __func__, ret);

	return ret;
}


> +		if (ret < 0 && ret != -EACCES) {
> +			dev_err_ratelimited(swrm->dev,
> +					    "pm_runtime_get_sync failed in %s, ret %d\n",
> +					    __func__, ret);
> +			pm_runtime_put_noidle(&slave->dev);
> +			ret = IRQ_NONE;
> +			goto err;
> +		}
> +	}
> +
> +	list_for_each_entry(slave, &swrm->bus.slaves, node) {
> +		pm_runtime_mark_last_busy(&slave->dev);
> +		pm_runtime_put_autosuspend(&slave->dev);
> +	}
> +err:
> +	clk_disable_unprepare(swrm->hclk);
> +	return IRQ_HANDLED;
> +}
> +
Srinivas Kandagatla Feb. 22, 2022, 10:52 p.m. UTC | #4
On 22/02/2022 19:26, Pierre-Louis Bossart wrote:
> 
> 
> 
>> +static irqreturn_t qcom_swrm_wake_irq_handler(int irq, void *dev_id)
>> +{
>> +	struct qcom_swrm_ctrl *swrm = dev_id;
>> +	int ret = IRQ_HANDLED;
>> +	struct sdw_slave *slave;
>> +
>> +	clk_prepare_enable(swrm->hclk);
>> +
>> +	if (swrm->wake_irq > 0) {
>> +		if (!irqd_irq_disabled(irq_get_irq_data(swrm->wake_irq)))
>> +			disable_irq_nosync(swrm->wake_irq);
>> +	}
>> +
>> +	/*
>> +	 * resume all the slaves which must have potentially generated this
>> +	 * interrupt, this should also wake the controller at the same time.
>> +	 * this is much safer than waking controller directly that will deadlock!
>> +	 */
> There should be no difference if you first resume the controller and
> then attached peripherals, or do a loop where you rely on the pm_runtime
> framework.
> 
> The notion that there might be a dead-lock is surprising, you would need
> to elaborate here.Issue is, if wakeup interrupt resumes the controller first which can 
trigger an slave pending interrupt (ex: Button press event) in the 
middle of resume that will try to wake the slave device which in turn 
will try to wake parent in the middle of resume resulting in a dead lock.

This was the best way to avoid dead lock.

> 
>> +	list_for_each_entry(slave, &swrm->bus.slaves, node) {
>> +		ret = pm_runtime_get_sync(&slave->dev);
> 
> In my experience, you don't want to blur layers and take references on
> the child devices from the parent device. I don't know how many times we
> end-up with weird behavior.
> 
> we've done something similar on the Intel side but implemented in a less
> directive manner.
thanks, I can take some inspiration from that.


--srini
> 
> ret = device_for_each_child(bus->dev, NULL, intel_resume_child_device);
> 
> static int intel_resume_child_device(struct device *dev, void *data)
> {
> [...]	
> 	ret = pm_request_resume(dev);
> 	if (ret < 0)
> 		dev_err(dev, "%s: pm_request_resume failed: %d\n", __func__, ret);
> 
> 	return ret;
> }
> 
> 
>> +		if (ret < 0 && ret != -EACCES) {
>> +			dev_err_ratelimited(swrm->dev,
>> +					    "pm_runtime_get_sync failed in %s, ret %d\n",
>> +					    __func__, ret);
>> +			pm_runtime_put_noidle(&slave->dev);
>> +			ret = IRQ_NONE;
>> +			goto err;
>> +		}
>> +	}
>> +
>> +	list_for_each_entry(slave, &swrm->bus.slaves, node) {
>> +		pm_runtime_mark_last_busy(&slave->dev);
>> +		pm_runtime_put_autosuspend(&slave->dev);
>> +	}
>> +err:
>> +	clk_disable_unprepare(swrm->hclk);
>> +	return IRQ_HANDLED;
>> +}
>> +
>
Pierre-Louis Bossart Feb. 23, 2022, 12:31 a.m. UTC | #5
On 2/22/22 16:52, Srinivas Kandagatla wrote:
> 
> 
> On 22/02/2022 19:26, Pierre-Louis Bossart wrote:
>>
>>
>>
>>> +static irqreturn_t qcom_swrm_wake_irq_handler(int irq, void *dev_id)
>>> +{
>>> +    struct qcom_swrm_ctrl *swrm = dev_id;
>>> +    int ret = IRQ_HANDLED;
>>> +    struct sdw_slave *slave;
>>> +
>>> +    clk_prepare_enable(swrm->hclk);
>>> +
>>> +    if (swrm->wake_irq > 0) {
>>> +        if (!irqd_irq_disabled(irq_get_irq_data(swrm->wake_irq)))
>>> +            disable_irq_nosync(swrm->wake_irq);
>>> +    }
>>> +
>>> +    /*
>>> +     * resume all the slaves which must have potentially generated this
>>> +     * interrupt, this should also wake the controller at the same
>>> time.
>>> +     * this is much safer than waking controller directly that will
>>> deadlock!
>>> +     */
>> There should be no difference if you first resume the controller and
>> then attached peripherals, or do a loop where you rely on the pm_runtime
>> framework.
>>
>> The notion that there might be a dead-lock is surprising, you would need
>> to elaborate here.Issue is, if wakeup interrupt resumes the controller
>> first which can 
> trigger an slave pending interrupt (ex: Button press event) in the
> middle of resume that will try to wake the slave device which in turn
> will try to wake parent in the middle of resume resulting in a dead lock.
> 
> This was the best way to avoid dead lock.

Not following, sorry. if you use pm_runtime functions and it so happens
that the resume already started, then those routines would wait for the
resume to complete.

In other words, there can be multiple requests to resume, but only the
*first* request will trigger a transition and others will just increase
a refcount.

In addition, the pm_runtime framework guarantees that the peripheral
device can only start resuming when the parent controller device is
fully resumed.

While I am at it, one thing that kept us busy as well is the
relationship between system suspend and pm_runtime suspend. In the
generic case a system suspend will cause a pm_runtime resume before you
can actually start the system suspend, but you might be able to skip
this step. In the Intel case when the controller and its parent device
were suspended we had to pm_runtime resume everything because some
registers were no longer accessible.
Srinivas Kandagatla Feb. 23, 2022, 4:22 p.m. UTC | #6
On 23/02/2022 00:31, Pierre-Louis Bossart wrote:
> 
> 
> On 2/22/22 16:52, Srinivas Kandagatla wrote:
>>
>>
>> On 22/02/2022 19:26, Pierre-Louis Bossart wrote:
>>>
>>>
>>>
>>>> +static irqreturn_t qcom_swrm_wake_irq_handler(int irq, void *dev_id)
>>>> +{
>>>> +    struct qcom_swrm_ctrl *swrm = dev_id;
>>>> +    int ret = IRQ_HANDLED;
>>>> +    struct sdw_slave *slave;
>>>> +
>>>> +    clk_prepare_enable(swrm->hclk);
>>>> +
>>>> +    if (swrm->wake_irq > 0) {
>>>> +        if (!irqd_irq_disabled(irq_get_irq_data(swrm->wake_irq)))
>>>> +            disable_irq_nosync(swrm->wake_irq);
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * resume all the slaves which must have potentially generated this
>>>> +     * interrupt, this should also wake the controller at the same
>>>> time.
>>>> +     * this is much safer than waking controller directly that will
>>>> deadlock!
>>>> +     */
>>> There should be no difference if you first resume the controller and
>>> then attached peripherals, or do a loop where you rely on the pm_runtime
>>> framework.
>>>
>>> The notion that there might be a dead-lock is surprising, you would need
>>> to elaborate here.Issue is, if wakeup interrupt resumes the controller
>>> first which can
>> trigger an slave pending interrupt (ex: Button press event) in the
>> middle of resume that will try to wake the slave device which in turn
>> will try to wake parent in the middle of resume resulting in a dead lock.
>>
>> This was the best way to avoid dead lock.
> 
> Not following, sorry. if you use pm_runtime functions and it so happens
> that the resume already started, then those routines would wait for the
> resume to complete.
yes that is true,

TBH, I was trying to reproduce the issue since morning to collect some 
traces but no luck so far, I hit these issues pretty much rarely. Now 
code has changed since few months back am unable to reproduce this 
anymore. Or it might be just the state of code I had while writing this up.

But when I hit the issue, this is how it looks like:

1. IRQ Wake interrupt resume parent.

2. parent is in middle of resuming

3. Slave Pend interrupt in controller fired up

4. because of (3) child resume is requested and then the parent resume 
blocked on (2) to finish.

5. from (2) we also trying to resume child.

6. (5) is blocked on (4) to finish which is blocked on (2) to finish

we are dead locked. Only way for me to avoid dead lock was to wake the 
child up after IRQ wake interrupts.

here is the stack trace of blocked-tasks from sysrq

root@linaro-gnome:~# [  182.327220] sysrq: Show Blocked State
[  182.331063] task:irq/20-soundwir state:D stack:    0 pid:  445 ppid: 
     2 flags:0x00000008
[  182.339655] Call trace:
[  182.342176]  __switch_to+0x168/0x1b8
[  182.345864]  __schedule+0x2a8/0x880
[  182.349459]  schedule+0x54/0xf0
[  182.352700]  rpm_resume+0xc4/0x550
[  182.356211]  rpm_resume+0x348/0x550
[  182.359805]  rpm_resume+0x348/0x550
[  182.363400]  __pm_runtime_resume+0x48/0xb8
[  182.367616]  sdw_handle_slave_status+0x1f8/0xf80
[  182.372371]  qcom_swrm_irq_handler+0x5c4/0x6f0
[  182.376942]  irq_thread_fn+0x2c/0xa0
[  182.380626]  irq_thread+0x16c/0x288
[  182.384221]  kthread+0x11c/0x128
[  182.387549]  ret_from_fork+0x10/0x20
[  182.391231] task:irq/187-swr_wak state:D stack:    0 pid:  446 ppid: 
     2 flags:0x00000008
[  182.399819] Call trace:
[  182.402339]  __switch_to+0x168/0x1b8
[  182.406019]  __schedule+0x2a8/0x880
[  182.409614]  schedule+0x54/0xf0
[  182.412854]  rpm_resume+0xc4/0x550
[  182.416363]  rpm_resume+0x348/0x550
[  182.419957]  rpm_resume+0x348/0x550
[  182.423552]  __pm_runtime_resume+0x48/0xb8
[  182.427767]  swrm_runtime_resume+0x98/0x3d0
[  182.432079]  pm_generic_runtime_resume+0x2c/0x48
[  182.436832]  __rpm_callback+0x44/0x190
[  182.440693]  rpm_callback+0x6c/0x78
[  182.444289]  rpm_resume+0x2f0/0x550
[  182.447883]  __pm_runtime_resume+0x48/0xb8
[  182.452099]  qcom_swrm_wake_irq_handler+0x20/0x128
[  182.457033]  irq_thread_fn+0x2c/0xa0
[  182.460712]  irq_thread+0x16c/0x288
[  182.464306]  kthread+0x11c/0x128
[  182.467634]  ret_from_fork+0x10/0x20


As am unable to reproduce this issue anymore so I will remove the code 
dealing with slaves directly for now till we are able to really 
reproduce the issue.

> 
> In other words, there can be multiple requests to resume, but only the
> *first* request will trigger a transition and others will just increase
> a refcount.
> 
> In addition, the pm_runtime framework guarantees that the peripheral
> device can only start resuming when the parent controller device is
> fully resumed.
> 
> While I am at it, one thing that kept us busy as well is the
> relationship between system suspend and pm_runtime suspend. In the
> generic case a system suspend will cause a pm_runtime resume before you
> can actually start the system suspend, but you might be able to skip
> this step. In the Intel case when the controller and its parent device
> were suspended we had to pm_runtime resume everything because some
> registers were no longer accessible.
Interesting, thanks for the hints, will keep that in mind.

--srini
> 
>
Srinivas Kandagatla Feb. 23, 2022, 4:36 p.m. UTC | #7
On 22/02/2022 19:15, Pierre-Louis Bossart wrote:
> 
> 
> On 2/21/22 04:41, Srinivas Kandagatla wrote:
>> Add support to runtime PM using SoundWire clock stop on supported instances
>> and bus reset on instances that require full reset.
> 
> This commit message and code are a bit confusing, e.g. you have a
> boolean state
> 
> ctrl->clk_stop_bus_reset = true;
> 
> Does this mean bus reset on exiting clock stop? or just that clock stop
> is not supported and bus reset is required with complete re-enumeration.

WSA instances of SoundWire controllers do not support clk stop and a 
reset of ip is required with complete re-enumeration when a clock is cut 
off during suspend.

> 
> It would be good to try and explain using SoundWire 1.x terminology what
> actions are taken on resume.
> 
There are two cases here.

1> Controller Instance support ClockStop Mode0, we mostly use the 
generic core to do that except in resume path we make sure that we start 
the clock.

2> Controller Instances which that do not support ClockStop, we do soft 
reset of controller along with hard resetting slaves.

> 
>> @@ -1267,6 +1305,7 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>>   	ctrl->bus.ops = &qcom_swrm_ops;
>>   	ctrl->bus.port_ops = &qcom_swrm_port_ops;
>>   	ctrl->bus.compute_params = &qcom_swrm_compute_params;
>> +	ctrl->bus.clk_stop_timeout = 300;
>>   
>>   	ret = qcom_swrm_get_port_config(ctrl);
>>   	if (ret)
>> @@ -1319,6 +1358,21 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>>   		 (ctrl->version >> 24) & 0xff, (ctrl->version >> 16) & 0xff,
>>   		 ctrl->version & 0xffff);
>>   
>> +	pm_runtime_set_autosuspend_delay(dev, 3000);
>> +	pm_runtime_use_autosuspend(dev);
>> +	pm_runtime_mark_last_busy(dev);
>> +	pm_runtime_set_active(dev);
>> +	pm_runtime_enable(dev);
>> +
>> +	/* Clk stop is not supported on WSA Soundwire masters */
>> +	if (ctrl->version <= 0x01030000) {
>> +		ctrl->clk_stop_bus_reset = true;
>> +	} else {
>> +		ctrl->reg_read(ctrl, SWRM_COMP_MASTER_ID, &val);
>> +		if (val == MASTER_ID_WSA)
>> +			ctrl->clk_stop_bus_reset = true;
>> +	}
> 
> I think this means clock_stop_not_supported?

Yes It makes more sense to reword this to clock_stop_not_supported.


> 
>> +static int swrm_runtime_resume(struct device *dev)
>> +{
>> +	struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(dev);
>> +	int ret;
>> +
>> +	clk_prepare_enable(ctrl->hclk);
>> +
>> +	if (ctrl->clk_stop_bus_reset) {
>> +		reinit_completion(&ctrl->enumeration);
>> +		ctrl->reg_write(ctrl, SWRM_COMP_SW_RESET, 0x01);
>> +		qcom_swrm_get_device_status(ctrl);
> 
> don't you need some sort of delay before checking the controller and
> device status? The bus reset sequence takes 4096 bits, that's a non-zero
> time.

This is soft reset not full Bus Reset as WSA slaves have hard reset pins 
that will be toggled as part of suspend-resume.


> 
>> +		sdw_handle_slave_status(&ctrl->bus, ctrl->status);
>> +		qcom_swrm_init(ctrl);
>> +		wait_for_completion_timeout(&ctrl->enumeration,
>> +					    msecs_to_jiffies(TIMEOUT_MS));
>> +	} else {
>> +		ctrl->reg_write(ctrl, SWRM_MCP_BUS_CTRL, SWRM_MCP_BUS_CLK_START);
>> +		ctrl->reg_write(ctrl, SWRM_INTERRUPT_CLEAR,
>> +			SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET);
>> +
>> +		ctrl->intr_mask |= SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET;
>> +		ctrl->reg_write(ctrl, SWRM_INTERRUPT_MASK_ADDR, ctrl->intr_mask);
>> +		ctrl->reg_write(ctrl, SWRM_INTERRUPT_CPU_EN, ctrl->intr_mask);
>> +
>> +		usleep_range(100, 105);
>> +	}
>> +
>> +	if (!swrm_wait_for_frame_gen_enabled(ctrl))
>> +		dev_err(ctrl->dev, "link failed to connect\n");
>> +
>> +	usleep_range(300, 305);
>> +	ret = sdw_bus_exit_clk_stop(&ctrl->bus);
>> +	if (ret < 0)
>> +		dev_err(ctrl->dev, "bus failed to exit clock stop %d\n", ret);
>> +
>> +	return 0;
>> +}
>> +
>> +static int __maybe_unused swrm_runtime_suspend(struct device *dev)
>> +{
>> +	struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(dev);
>> +	int ret;
>> +
>> +	if (!ctrl->clk_stop_bus_reset) {
>> +		/* Mask bus clash interrupt */
>> +		ctrl->intr_mask &= ~SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET;
>> +		ctrl->reg_write(ctrl, SWRM_INTERRUPT_MASK_ADDR, ctrl->intr_mask);
>> +		ctrl->reg_write(ctrl, SWRM_INTERRUPT_CPU_EN, ctrl->intr_mask);
>> +	}
>> +	/* Prepare slaves for clock stop */
>> +	ret = sdw_bus_prep_clk_stop(&ctrl->bus);
>> +	if (ret < 0) {
> 
> if a device has lost sync and reports -ENODATA, you want still want to
> go ahead and not prevent the suspend operation from happening.

okay.


--srini
> 
>> +		dev_err(dev, "prepare clock stop failed %d", ret);
>> +		return ret;
>> +	}
>> +
>> +	ret = sdw_bus_clk_stop(&ctrl->bus);
>> +	if (ret < 0 && ret != -ENODATA) {
>> +		dev_err(dev, "bus clock stop failed %d", ret);
>> +		return ret;
>> +	}
>> +
>> +	clk_disable_unprepare(ctrl->hclk);
>> +
>> +	usleep_range(300, 305);
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct dev_pm_ops swrm_dev_pm_ops = {
>> +	SET_RUNTIME_PM_OPS(swrm_runtime_suspend, swrm_runtime_resume, NULL)
>> +};
>> +
>>   static const struct of_device_id qcom_swrm_of_match[] = {
>>   	{ .compatible = "qcom,soundwire-v1.3.0", .data = &swrm_v1_3_data },
>>   	{ .compatible = "qcom,soundwire-v1.5.1", .data = &swrm_v1_5_data },
>> @@ -1359,6 +1506,7 @@ static struct platform_driver qcom_swrm_driver = {
>>   	.driver = {
>>   		.name	= "qcom-soundwire",
>>   		.of_match_table = qcom_swrm_of_match,
>> +		.pm = &swrm_dev_pm_ops,
>>   	}
>>   };
>>   module_platform_driver(qcom_swrm_driver);
Pierre-Louis Bossart Feb. 23, 2022, 6:14 p.m. UTC | #8
>> Not following, sorry. if you use pm_runtime functions and it so happens
>> that the resume already started, then those routines would wait for the
>> resume to complete.
> yes that is true,
> 
> TBH, I was trying to reproduce the issue since morning to collect some
> traces but no luck so far, I hit these issues pretty much rarely. Now
> code has changed since few months back am unable to reproduce this
> anymore. Or it might be just the state of code I had while writing this up.
> 
> But when I hit the issue, this is how it looks like:
> 
> 1. IRQ Wake interrupt resume parent.
> 
> 2. parent is in middle of resuming
> 
> 3. Slave Pend interrupt in controller fired up
> 
> 4. because of (3) child resume is requested and then the parent resume
> blocked on (2) to finish.
> 
> 5. from (2) we also trying to resume child.
> 
> 6. (5) is blocked on (4) to finish which is blocked on (2) to finish
> 
> we are dead locked. Only way for me to avoid dead lock was to wake the
> child up after IRQ wake interrupts.

Maybe a red-herring, but we're seen cases like this where we called
pm_runtime_get_sync() while resuming, that didn't go so well. I would
look into the use of _no_pm routines if you are already trying to resume.
Pierre-Louis Bossart Feb. 23, 2022, 6:21 p.m. UTC | #9
> There are two cases here.
> 
> 1> Controller Instance support ClockStop Mode0, we mostly use the
> generic core to do that except in resume path we make sure that we start
> the clock.
> 
> 2> Controller Instances which that do not support ClockStop, we do soft
> reset of controller along with hard resetting slaves.

both are fine. we have similar cases defined in sdw_intel.h



>>> +static int swrm_runtime_resume(struct device *dev)
>>> +{
>>> +    struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(dev);
>>> +    int ret;
>>> +
>>> +    clk_prepare_enable(ctrl->hclk);
>>> +
>>> +    if (ctrl->clk_stop_bus_reset) {
>>> +        reinit_completion(&ctrl->enumeration);
>>> +        ctrl->reg_write(ctrl, SWRM_COMP_SW_RESET, 0x01);
>>> +        qcom_swrm_get_device_status(ctrl);
>>
>> don't you need some sort of delay before checking the controller and
>> device status? The bus reset sequence takes 4096 bits, that's a non-zero
>> time.
> 
> This is soft reset not full Bus Reset as WSA slaves have hard reset pins
> that will be toggled as part of suspend-resume.

Above you mentioned the peripherals go through a reset as well, in which
case they can only re-attach on the bus after 16 frames best case - they
need to observe a full cycle of the dynamic sync before changing status.

That's still a non-zero delay (0.3ms for a 48kHz frame rate)

>>
>>> +        sdw_handle_slave_status(&ctrl->bus, ctrl->status);
>>> +        qcom_swrm_init(ctrl);