diff mbox series

[6/6] soundwire: qcom: do not probe devices before bus/link init

Message ID 20230420101617.142225-7-krzysztof.kozlowski@linaro.org
State New
Headers show
Series ASoC/soundwire: qcom: correctly probe devices after link init | expand

Commit Message

Krzysztof Kozlowski April 20, 2023, 10:16 a.m. UTC
Soundwire devices are supposed to be kept in reset state (powered off)
till their probe() or component bind() callbacks.  However if they are
already powered on, then they might enumerate before the master
initializes bus in qcom_swrm_init() leading to occasional errors like:

  qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered
  wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops)
  wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops)
  qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow

The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
1. request_threaded_irq()
2. sdw_bus_master_add() - which will cause probe() and component bind()
   of Soundwire devices, e.g. WCD938x codec drivers.  Device drivers
   might already start accessing their registers.
3. qcom_swrm_init() - which initializes the link/bus and enables
   interrupts.

Any access to device registers at (2) above, will fail because link/bus
is not yet initialized.

However the fix is not as simple as moving qcom_swrm_init() before
sdw_bus_master_add(), because this will cause early interrupt of new
slave attached.  The interrupt handler expects bus master (ctrl->bus.md)
to be allocated, so this would lead to NULL pointer exception.

Rework the init sequence and change the interrupt handler.  The correct
sequence fixing accessing device registers before link init is now:
1. qcom_swrm_init()
2. request_threaded_irq()
3. sdw_bus_master_add()
which still might cause early interrupts, if Soundwire devices are not
in powered off state before their probe.  This early interrupt issue is
fixed by checking if bus master (ctrl->bus.md) was allocated and if not,
scheduling delayed work for enumerating the slave device.  Since we
actually can handle early interrupt now, drop IRQF_TRIGGER_RISING flag
from the interrupt, because it is not really valid and driver should use
flags provided by DTS.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

---

Change context depends on:
https://lore.kernel.org/r/20230209131336.18252-3-srinivas.kandagatla@linaro.org
https://lore.kernel.org/all/20230418095447.577001-1-krzysztof.kozlowski@linaro.org/

Cc: Patrick Lai <quic_plai@quicinc.com>
---
 drivers/soundwire/qcom.c | 89 ++++++++++++++++++++++++++++++++--------
 1 file changed, 72 insertions(+), 17 deletions(-)

Comments

Pierre-Louis Bossart April 20, 2023, 9:37 p.m. UTC | #1
On 4/20/23 05:16, Krzysztof Kozlowski wrote:
> Soundwire devices are supposed to be kept in reset state (powered off)
> till their probe() or component bind() callbacks.  However if they are
> already powered on, then they might enumerate before the master
> initializes bus in qcom_swrm_init() leading to occasional errors like:

The problem statement is really hard to follow.

The peripheral can only be enumerated AFTER
a) the manager starts the bus clock and transmitting PING frames
b) the peripheral detects the sync words for 16 frames in a row.
c) the peripheral reports as Attached in the Device0 slot

That sequence holds whether the manager does the enumeration manually or
relies on hardware-assisted autoenumeration. This is what the spec requires.

So why can't the bus clock start be controlled by the manager driver,
and started once all required initializations are done?

I mean, there's got to be some sort of parent-child hierarchy with
manager first, peripheral(s) second, I don't get how these steps could
be inverted or race.

>   qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered
>   wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops)
>   wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops)
>   qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
> 
> The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
> 1. request_threaded_irq()
> 2. sdw_bus_master_add() - which will cause probe() and component bind()
>    of Soundwire devices, e.g. WCD938x codec drivers.  Device drivers
>    might already start accessing their registers.

not if the bus clock hasn't started...

> 3. qcom_swrm_init() - which initializes the link/bus and enables
>    interrupts.

if you can move the clock start in 3) then problem solved. Why can't
this be done?

> Any access to device registers at (2) above, will fail because link/bus
> is not yet initialized.
> 
> However the fix is not as simple as moving qcom_swrm_init() before
> sdw_bus_master_add(), because this will cause early interrupt of new
> slave attached.  The interrupt handler expects bus master (ctrl->bus.md)
> to be allocated, so this would lead to NULL pointer exception.
> 
> Rework the init sequence and change the interrupt handler.  The correct
> sequence fixing accessing device registers before link init is now:
> 1. qcom_swrm_init()
> 2. request_threaded_irq()
> 3. sdw_bus_master_add()
> which still might cause early interrupts, if Soundwire devices are not
> in powered off state before their probe.  This early interrupt issue is

You'd need to clarify in which step the bus clock starts. In general,
you want to clock started last.

> fixed by checking if bus master (ctrl->bus.md) was allocated and if not,
> scheduling delayed work for enumerating the slave device.  Since we
> actually can handle early interrupt now, drop IRQF_TRIGGER_RISING flag
> from the interrupt, because it is not really valid and driver should use
> flags provided by DTS.
> 
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> 
> ---
> 
> Change context depends on:
> https://lore.kernel.org/r/20230209131336.18252-3-srinivas.kandagatla@linaro.org
> https://lore.kernel.org/all/20230418095447.577001-1-krzysztof.kozlowski@linaro.org/
> 
> Cc: Patrick Lai <quic_plai@quicinc.com>
> ---
>  drivers/soundwire/qcom.c | 89 ++++++++++++++++++++++++++++++++--------
>  1 file changed, 72 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c
> index 679990dc3cc4..802d939ce7aa 100644
> --- a/drivers/soundwire/qcom.c
> +++ b/drivers/soundwire/qcom.c
> @@ -19,6 +19,7 @@
>  #include <linux/slimbus.h>
>  #include <linux/soundwire/sdw.h>
>  #include <linux/soundwire/sdw_registers.h>
> +#include <linux/workqueue.h>
>  #include <sound/pcm_params.h>
>  #include <sound/soc.h>
>  #include "bus.h"
> @@ -187,6 +188,7 @@ struct qcom_swrm_ctrl {
>  #endif
>  	struct completion broadcast;
>  	struct completion enumeration;
> +	struct delayed_work new_slave_work;
>  	/* Port alloc/free lock */
>  	struct mutex port_lock;
>  	struct clk *hclk;
> @@ -606,6 +608,37 @@ static int qcom_swrm_enumerate(struct sdw_bus *bus)
>  	return 0;
>  }
>  
> +static void qcom_swrm_new_slave(struct work_struct *work)
> +{
> +	struct qcom_swrm_ctrl *ctrl = container_of(work, struct qcom_swrm_ctrl,
> +						   new_slave_work.work);
> +
> +	/*
> +	 * All Soundwire slave deviecs are expected to be in reset state (powered down)
> +	 * during sdw_bus_master_add().  The slave device should be brougth

typo/grammar: brought out of reset

> +	 * from reset by its probe() or bind() function, as a result of
> +	 * sdw_bus_master_add().
> +	 * Add a simple check to avoid NULL pointer except on early interrupts.
> +	 * Note that if this condition happens, the slave device will not be
> +	 * enumerated. Its driver should be fixed.

???

The codec driver is NEVER involved in enumeration.

The only thing a codec driver should do is provide a callback to be
notified of a status change for additional initialization, but the
enumeration can be done even in the absence of a codec driver.

The proof in the pudding is that you can 'blacklist' a codec driver and
bind it later, after the hardware is enumerated. You can even unbind a
codec driver and nothing bad will happen (we hardened that sequence last
year).

probe != enumeration != initialization for SoundWire codecs.

Probe and enumeration can happen in any order
Initialization can only happen after both probe and enumeration happened.

> +	 *
> +	 * smp_load_acquire() paired with sdw_master_device_add().
> +	 */
> +	if (!smp_load_acquire(&ctrl->bus.md)) {
> +		dev_err(ctrl->dev,
> +			"Got unexpected, early interrupt, device will not be enumerated\n");
> +		return;
> +	}
> +
> +	clk_prepare_enable(ctrl->hclk);
> +
> +	qcom_swrm_get_device_status(ctrl);
> +	qcom_swrm_enumerate(&ctrl->bus);
> +	sdw_handle_slave_status(&ctrl->bus, ctrl->status);
> +
> +	clk_disable_unprepare(ctrl->hclk);
> +};
> +
>  static irqreturn_t qcom_swrm_wake_irq_handler(int irq, void *dev_id)
>  {
>  	struct qcom_swrm_ctrl *ctrl = dev_id;
> @@ -668,9 +701,17 @@ static irqreturn_t qcom_swrm_irq_handler(int irq, void *dev_id)
>  					dev_dbg(ctrl->dev, "Slave status not changed %x\n",
>  						slave_status);
>  				} else {
> -					qcom_swrm_get_device_status(ctrl);
> -					qcom_swrm_enumerate(&ctrl->bus);
> -					sdw_handle_slave_status(&ctrl->bus, ctrl->status);
> +					unsigned long delay = 0;
> +
> +					/*
> +					 * See qcom_swrm_new_slave() for
> +					 * explanation. smp_load_acquire() paired
> +					 * with sdw_master_device_add().
> +					 */
> +					if (!smp_load_acquire(&ctrl->bus.md))
> +						delay = 10;
> +					schedule_delayed_work(&ctrl->new_slave_work,
> +							      delay);
>  				}
>  				break;
>  			case SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET:
> @@ -780,6 +821,7 @@ static int qcom_swrm_init(struct qcom_swrm_ctrl *ctrl)
>  
>  	ctrl->intr_mask = SWRM_INTERRUPT_STATUS_RMSK;
>  	/* Mask soundwire interrupts */
> +
>  	if (ctrl->version < SWRM_VERSION_2_0_0)
>  		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
>  				SWRM_INTERRUPT_STATUS_RMSK);
> @@ -1485,6 +1527,7 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>  	mutex_init(&ctrl->port_lock);
>  	init_completion(&ctrl->broadcast);
>  	init_completion(&ctrl->enumeration);
> +	INIT_DELAYED_WORK(&ctrl->new_slave_work, qcom_swrm_new_slave);
>  
>  	ctrl->bus.ops = &qcom_swrm_ops;
>  	ctrl->bus.port_ops = &qcom_swrm_port_ops;
> @@ -1514,9 +1557,10 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>  
>  	ctrl->reg_read(ctrl, SWRM_COMP_HW_VERSION, &ctrl->version);
>  
> +	qcom_swrm_init(ctrl);
> +
>  	ret = devm_request_threaded_irq(dev, ctrl->irq, NULL,
>  					qcom_swrm_irq_handler,
> -					IRQF_TRIGGER_RISING |
>  					IRQF_ONESHOT,
>  					"soundwire", ctrl);
>  	if (ret) {
> @@ -1524,18 +1568,6 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>  		goto err_clk;
>  	}
>  
> -	ctrl->wake_irq = of_irq_get(dev->of_node, 1);
> -	if (ctrl->wake_irq > 0) {
> -		ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL,
> -						qcom_swrm_wake_irq_handler,
> -						IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
> -						"swr_wake_irq", ctrl);
> -		if (ret) {
> -			dev_err(dev, "Failed to request soundwire wake irq\n");
> -			goto err_init;
> -		}
> -	}
> -
>  	pm_runtime_set_autosuspend_delay(dev, 3000);
>  	pm_runtime_use_autosuspend(dev);
>  	pm_runtime_mark_last_busy(dev);
> @@ -1549,7 +1581,18 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>  		goto err_clk;
>  	}
>  
> -	qcom_swrm_init(ctrl);
> +	ctrl->wake_irq = of_irq_get(dev->of_node, 1);
> +	if (ctrl->wake_irq > 0) {
> +		ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL,
> +						qcom_swrm_wake_irq_handler,
> +						IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
> +						"swr_wake_irq", ctrl);
> +		if (ret) {
> +			dev_err(dev, "Failed to request soundwire wake irq\n");
> +			goto err_init;
> +		}
> +	}
> +
>  	wait_for_completion_timeout(&ctrl->enumeration,
>  				    msecs_to_jiffies(TIMEOUT_MS));
>  	ret = qcom_swrm_register_dais(ctrl);
> @@ -1589,6 +1632,18 @@ static int qcom_swrm_remove(struct platform_device *pdev)
>  {
>  	struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(&pdev->dev);
>  
> +	/*
> +	 * Mask interrupts to be sure no delayed work can be scheduler after

typo/grammar: scheduled

> +	 * removing Soundwire bus master.
> +	 */
> +	if (ctrl->version < SWRM_VERSION_2_0_0)
> +		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
> +				0);
> +	if (ctrl->mmio)
> +		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_CPU_EN],
> +				0);
> +
> +	cancel_delayed_work_sync(&ctrl->new_slave_work);
>  	sdw_bus_master_delete(&ctrl->bus);
>  	clk_disable_unprepare(ctrl->hclk);

should the last two be inverted? Keeping a clock running while removing
stuff is asking for trouble.
Krzysztof Kozlowski May 1, 2023, 12:24 p.m. UTC | #2
On 20/04/2023 23:37, Pierre-Louis Bossart wrote:
> 
> 
> On 4/20/23 05:16, Krzysztof Kozlowski wrote:
>> Soundwire devices are supposed to be kept in reset state (powered off)
>> till their probe() or component bind() callbacks.  However if they are
>> already powered on, then they might enumerate before the master
>> initializes bus in qcom_swrm_init() leading to occasional errors like:
> 
> The problem statement is really hard to follow.
> 
> The peripheral can only be enumerated AFTER
> a) the manager starts the bus clock and transmitting PING frames
> b) the peripheral detects the sync words for 16 frames in a row.
> c) the peripheral reports as Attached in the Device0 slot
> 
> That sequence holds whether the manager does the enumeration manually or
> relies on hardware-assisted autoenumeration. This is what the spec requires.
> 
> So why can't the bus clock start be controlled by the manager driver,
> and started once all required initializations are done?
> 
> I mean, there's got to be some sort of parent-child hierarchy with
> manager first, peripheral(s) second, I don't get how these steps could
> be inverted or race.
> 
>>   qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered
>>   wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops)
>>   wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops)
>>   qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
>>
>> The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
>> 1. request_threaded_irq()
>> 2. sdw_bus_master_add() - which will cause probe() and component bind()
>>    of Soundwire devices, e.g. WCD938x codec drivers.  Device drivers
>>    might already start accessing their registers.
> 
> not if the bus clock hasn't started...
> 
>> 3. qcom_swrm_init() - which initializes the link/bus and enables
>>    interrupts.
> 
> if you can move the clock start in 3) then problem solved. Why can't
> this be done?

Responding to all your three responses:
The clock is enabled in this 3. qcom_swrm_init(), so the old code to my
knowledge is written exactly how you expect.

However even with stopped clock, the device enumerates at
sdw_bus_master_add(), before anything is enabled.

I also checked the reset values of these registers - clock is off after
reset. Assuming of course I look at correct clock registers... but I
have only one.

> 
>> Any access to device registers at (2) above, will fail because link/bus
>> is not yet initialized.
>>
>> However the fix is not as simple as moving qcom_swrm_init() before
>> sdw_bus_master_add(), because this will cause early interrupt of new
>> slave attached.  The interrupt handler expects bus master (ctrl->bus.md)
>> to be allocated, so this would lead to NULL pointer exception.
>>
>> Rework the init sequence and change the interrupt handler.  The correct
>> sequence fixing accessing device registers before link init is now:
>> 1. qcom_swrm_init()
>> 2. request_threaded_irq()
>> 3. sdw_bus_master_add()
>> which still might cause early interrupts, if Soundwire devices are not
>> in powered off state before their probe.  This early interrupt issue is
> 
> You'd need to clarify in which step the bus clock starts. In general,
> you want to clock started last.

Clock is enabled in qcom_swrm_init() step, but as I wrote above, it
looks like it does not matter for enumeration.

> 
>> fixed by checking if bus master (ctrl->bus.md) was allocated and if not,
>> scheduling delayed work for enumerating the slave device.  Since we
>> actually can handle early interrupt now, drop IRQF_TRIGGER_RISING flag
>> from the interrupt, because it is not really valid and driver should use
>> flags provided by DTS.
>>
>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>>
>> ---
>>
>> Change context depends on:
>> https://lore.kernel.org/r/20230209131336.18252-3-srinivas.kandagatla@linaro.org
>> https://lore.kernel.org/all/20230418095447.577001-1-krzysztof.kozlowski@linaro.org/
>>
>> Cc: Patrick Lai <quic_plai@quicinc.com>
>> ---
>>  drivers/soundwire/qcom.c | 89 ++++++++++++++++++++++++++++++++--------
>>  1 file changed, 72 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c
>> index 679990dc3cc4..802d939ce7aa 100644
>> --- a/drivers/soundwire/qcom.c
>> +++ b/drivers/soundwire/qcom.c
>> @@ -19,6 +19,7 @@
>>  #include <linux/slimbus.h>
>>  #include <linux/soundwire/sdw.h>
>>  #include <linux/soundwire/sdw_registers.h>
>> +#include <linux/workqueue.h>
>>  #include <sound/pcm_params.h>
>>  #include <sound/soc.h>
>>  #include "bus.h"
>> @@ -187,6 +188,7 @@ struct qcom_swrm_ctrl {
>>  #endif
>>  	struct completion broadcast;
>>  	struct completion enumeration;
>> +	struct delayed_work new_slave_work;
>>  	/* Port alloc/free lock */
>>  	struct mutex port_lock;
>>  	struct clk *hclk;
>> @@ -606,6 +608,37 @@ static int qcom_swrm_enumerate(struct sdw_bus *bus)
>>  	return 0;
>>  }
>>  
>> +static void qcom_swrm_new_slave(struct work_struct *work)
>> +{
>> +	struct qcom_swrm_ctrl *ctrl = container_of(work, struct qcom_swrm_ctrl,
>> +						   new_slave_work.work);
>> +
>> +	/*
>> +	 * All Soundwire slave deviecs are expected to be in reset state (powered down)
>> +	 * during sdw_bus_master_add().  The slave device should be brougth
> 
> typo/grammar: brought out of reset

ack

> 
>> +	 * from reset by its probe() or bind() function, as a result of
>> +	 * sdw_bus_master_add().
>> +	 * Add a simple check to avoid NULL pointer except on early interrupts.
>> +	 * Note that if this condition happens, the slave device will not be
>> +	 * enumerated. Its driver should be fixed.
> 
> ???
> 
> The codec driver is NEVER involved in enumeration.

If the device stays in power down, only the driver bind can bring it on.
enumeration won't happen when device is powered down, right?

> 
> The only thing a codec driver should do is provide a callback to be
> notified of a status change for additional initialization, but the
> enumeration can be done even in the absence of a codec driver.
> 
> The proof in the pudding is that you can 'blacklist' a codec driver and
> bind it later, after the hardware is enumerated. You can even unbind a
> codec driver and nothing bad will happen (we hardened that sequence last
> year).
> 
> probe != enumeration != initialization for SoundWire codecs.
> 
> Probe and enumeration can happen in any order
> Initialization can only happen after both probe and enumeration happened.

I am speaking here about component_master_ops->bind() callback.
> 
>> +	 *
>> +	 * smp_load_acquire() paired with sdw_master_device_add().
>> +	 */
>> +	if (!smp_load_acquire(&ctrl->bus.md)) {
>> +		dev_err(ctrl->dev,
>> +			"Got unexpected, early interrupt, device will not be enumerated\n");
>> +		return;
>> +	}
>> +
>> +	clk_prepare_enable(ctrl->hclk);
>> +
>> +	qcom_swrm_get_device_status(ctrl);
>> +	qcom_swrm_enumerate(&ctrl->bus);
>> +	sdw_handle_slave_status(&ctrl->bus, ctrl->status);
>> +
>> +	clk_disable_unprepare(ctrl->hclk);
>> +};
>> +
>>  static irqreturn_t qcom_swrm_wake_irq_handler(int irq, void *dev_id)
>>  {
>>  	struct qcom_swrm_ctrl *ctrl = dev_id;
>> @@ -668,9 +701,17 @@ static irqreturn_t qcom_swrm_irq_handler(int irq, void *dev_id)
>>  					dev_dbg(ctrl->dev, "Slave status not changed %x\n",
>>  						slave_status);
>>  				} else {
>> -					qcom_swrm_get_device_status(ctrl);
>> -					qcom_swrm_enumerate(&ctrl->bus);
>> -					sdw_handle_slave_status(&ctrl->bus, ctrl->status);
>> +					unsigned long delay = 0;
>> +
>> +					/*
>> +					 * See qcom_swrm_new_slave() for
>> +					 * explanation. smp_load_acquire() paired
>> +					 * with sdw_master_device_add().
>> +					 */
>> +					if (!smp_load_acquire(&ctrl->bus.md))
>> +						delay = 10;
>> +					schedule_delayed_work(&ctrl->new_slave_work,
>> +							      delay);
>>  				}
>>  				break;
>>  			case SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET:
>> @@ -780,6 +821,7 @@ static int qcom_swrm_init(struct qcom_swrm_ctrl *ctrl)
>>  
>>  	ctrl->intr_mask = SWRM_INTERRUPT_STATUS_RMSK;
>>  	/* Mask soundwire interrupts */
>> +
>>  	if (ctrl->version < SWRM_VERSION_2_0_0)
>>  		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
>>  				SWRM_INTERRUPT_STATUS_RMSK);
>> @@ -1485,6 +1527,7 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>>  	mutex_init(&ctrl->port_lock);
>>  	init_completion(&ctrl->broadcast);
>>  	init_completion(&ctrl->enumeration);
>> +	INIT_DELAYED_WORK(&ctrl->new_slave_work, qcom_swrm_new_slave);
>>  
>>  	ctrl->bus.ops = &qcom_swrm_ops;
>>  	ctrl->bus.port_ops = &qcom_swrm_port_ops;
>> @@ -1514,9 +1557,10 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>>  
>>  	ctrl->reg_read(ctrl, SWRM_COMP_HW_VERSION, &ctrl->version);
>>  
>> +	qcom_swrm_init(ctrl);
>> +
>>  	ret = devm_request_threaded_irq(dev, ctrl->irq, NULL,
>>  					qcom_swrm_irq_handler,
>> -					IRQF_TRIGGER_RISING |
>>  					IRQF_ONESHOT,
>>  					"soundwire", ctrl);
>>  	if (ret) {
>> @@ -1524,18 +1568,6 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>>  		goto err_clk;
>>  	}
>>  
>> -	ctrl->wake_irq = of_irq_get(dev->of_node, 1);
>> -	if (ctrl->wake_irq > 0) {
>> -		ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL,
>> -						qcom_swrm_wake_irq_handler,
>> -						IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
>> -						"swr_wake_irq", ctrl);
>> -		if (ret) {
>> -			dev_err(dev, "Failed to request soundwire wake irq\n");
>> -			goto err_init;
>> -		}
>> -	}
>> -
>>  	pm_runtime_set_autosuspend_delay(dev, 3000);
>>  	pm_runtime_use_autosuspend(dev);
>>  	pm_runtime_mark_last_busy(dev);
>> @@ -1549,7 +1581,18 @@ static int qcom_swrm_probe(struct platform_device *pdev)
>>  		goto err_clk;
>>  	}
>>  
>> -	qcom_swrm_init(ctrl);
>> +	ctrl->wake_irq = of_irq_get(dev->of_node, 1);
>> +	if (ctrl->wake_irq > 0) {
>> +		ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL,
>> +						qcom_swrm_wake_irq_handler,
>> +						IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
>> +						"swr_wake_irq", ctrl);
>> +		if (ret) {
>> +			dev_err(dev, "Failed to request soundwire wake irq\n");
>> +			goto err_init;
>> +		}
>> +	}
>> +
>>  	wait_for_completion_timeout(&ctrl->enumeration,
>>  				    msecs_to_jiffies(TIMEOUT_MS));
>>  	ret = qcom_swrm_register_dais(ctrl);
>> @@ -1589,6 +1632,18 @@ static int qcom_swrm_remove(struct platform_device *pdev)
>>  {
>>  	struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(&pdev->dev);
>>  
>> +	/*
>> +	 * Mask interrupts to be sure no delayed work can be scheduler after
> 
> typo/grammar: scheduled

ack

> 
>> +	 * removing Soundwire bus master.
>> +	 */
>> +	if (ctrl->version < SWRM_VERSION_2_0_0)
>> +		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
>> +				0);
>> +	if (ctrl->mmio)
>> +		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_CPU_EN],
>> +				0);
>> +
>> +	cancel_delayed_work_sync(&ctrl->new_slave_work);
>>  	sdw_bus_master_delete(&ctrl->bus);
>>  	clk_disable_unprepare(ctrl->hclk);
> 
> should the last two be inverted? Keeping a clock running while removing
> stuff is asking for trouble.

It is a reversed probe(), which is usually correct. Do you expect
probe() like:

	clk_enable
	sdw_bus_master_add

but remove() like:

	clk_disable
	sdw_bus_master_delete

?

Best regards,
Krzysztof
Pierre-Louis Bossart May 1, 2023, 1:43 p.m. UTC | #3
On 5/1/23 07:24, Krzysztof Kozlowski wrote:
> On 20/04/2023 23:37, Pierre-Louis Bossart wrote:
>>
>>
>> On 4/20/23 05:16, Krzysztof Kozlowski wrote:
>>> Soundwire devices are supposed to be kept in reset state (powered off)
>>> till their probe() or component bind() callbacks.  However if they are
>>> already powered on, then they might enumerate before the master
>>> initializes bus in qcom_swrm_init() leading to occasional errors like:
>>
>> The problem statement is really hard to follow.
>>
>> The peripheral can only be enumerated AFTER
>> a) the manager starts the bus clock and transmitting PING frames
>> b) the peripheral detects the sync words for 16 frames in a row.
>> c) the peripheral reports as Attached in the Device0 slot
>>
>> That sequence holds whether the manager does the enumeration manually or
>> relies on hardware-assisted autoenumeration. This is what the spec requires.
>>
>> So why can't the bus clock start be controlled by the manager driver,
>> and started once all required initializations are done?
>>
>> I mean, there's got to be some sort of parent-child hierarchy with
>> manager first, peripheral(s) second, I don't get how these steps could
>> be inverted or race.
>>
>>>   qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered
>>>   wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops)
>>>   wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops)
>>>   qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
>>>
>>> The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
>>> 1. request_threaded_irq()
>>> 2. sdw_bus_master_add() - which will cause probe() and component bind()
>>>    of Soundwire devices, e.g. WCD938x codec drivers.  Device drivers
>>>    might already start accessing their registers.
>>
>> not if the bus clock hasn't started...
>>
>>> 3. qcom_swrm_init() - which initializes the link/bus and enables
>>>    interrupts.
>>
>> if you can move the clock start in 3) then problem solved. Why can't
>> this be done?
> 
> Responding to all your three responses:
> The clock is enabled in this 3. qcom_swrm_init(), so the old code to my
> knowledge is written exactly how you expect.
> 
> However even with stopped clock, the device enumerates at
> sdw_bus_master_add(), before anything is enabled.

Erm, that's not physically possible...

The peripheral can report as attached and be enumerated by the manager,
i.e. assigned a non-zero "Device Number" after the peripheral
synchronizes on 16 frames with valid static and dynamic syncwords. That
can only happen if there is a clock toggling and PING frames transmitted
on the data line.

There's something else at play here.

> I also checked the reset values of these registers - clock is off after
> reset. Assuming of course I look at correct clock registers... but I
> have only one.
> 
>>
>>> Any access to device registers at (2) above, will fail because link/bus
>>> is not yet initialized.
>>>
>>> However the fix is not as simple as moving qcom_swrm_init() before
>>> sdw_bus_master_add(), because this will cause early interrupt of new
>>> slave attached.  The interrupt handler expects bus master (ctrl->bus.md)
>>> to be allocated, so this would lead to NULL pointer exception.
>>>
>>> Rework the init sequence and change the interrupt handler.  The correct
>>> sequence fixing accessing device registers before link init is now:
>>> 1. qcom_swrm_init()
>>> 2. request_threaded_irq()
>>> 3. sdw_bus_master_add()
>>> which still might cause early interrupts, if Soundwire devices are not
>>> in powered off state before their probe.  This early interrupt issue is
>>
>> You'd need to clarify in which step the bus clock starts. In general,
>> you want to clock started last.
> 
> Clock is enabled in qcom_swrm_init() step, but as I wrote above, it
> looks like it does not matter for enumeration.

without a clock you can't have any enumeration.


>>> +	 * from reset by its probe() or bind() function, as a result of
>>> +	 * sdw_bus_master_add().
>>> +	 * Add a simple check to avoid NULL pointer except on early interrupts.
>>> +	 * Note that if this condition happens, the slave device will not be
>>> +	 * enumerated. Its driver should be fixed.
>>
>> ???
>>
>> The codec driver is NEVER involved in enumeration.
> 
> If the device stays in power down, only the driver bind can bring it on.
> enumeration won't happen when device is powered down, right?

The codec driver can indeed control the codec power with sideband links
- i.e. not with SoundWire but gpios/I2C/SPI, etc. - and it could very
well prevent the codec hardware from showing up on the bus until TBD
platform-specific criteria are met.

But that's not really taking part in the enumeration process, rather
gating the enumeration process.

>> The only thing a codec driver should do is provide a callback to be
>> notified of a status change for additional initialization, but the
>> enumeration can be done even in the absence of a codec driver.
>>
>> The proof in the pudding is that you can 'blacklist' a codec driver and
>> bind it later, after the hardware is enumerated. You can even unbind a
>> codec driver and nothing bad will happen (we hardened that sequence last
>> year).
>>
>> probe != enumeration != initialization for SoundWire codecs.
>>
>> Probe and enumeration can happen in any order
>> Initialization can only happen after both probe and enumeration happened.
> 
> I am speaking here about component_master_ops->bind() callback.

That's on the manager side, I really don't see how this is related to
the codec?

>>> +	 * removing Soundwire bus master.
>>> +	 */
>>> +	if (ctrl->version < SWRM_VERSION_2_0_0)
>>> +		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
>>> +				0);
>>> +	if (ctrl->mmio)
>>> +		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_CPU_EN],
>>> +				0);
>>> +
>>> +	cancel_delayed_work_sync(&ctrl->new_slave_work);
>>>  	sdw_bus_master_delete(&ctrl->bus);
>>>  	clk_disable_unprepare(ctrl->hclk);
>>
>> should the last two be inverted? Keeping a clock running while removing
>> stuff is asking for trouble.

actually it doesn't really matter, if the interrupts are disabled first
and you wait for in-flight work to be done. Ignore this comment.
> 
> It is a reversed probe(), which is usually correct. Do you expect
> probe() like:
> 
> 	clk_enable
> 	sdw_bus_master_add

it's likely the other way around,

probe():
sdw_bus_master_add
clk_enable

assuming that clk_enable() starts the bus - not sure it does based on
the answers above.
Pierre-Louis Bossart May 3, 2023, 2:11 p.m. UTC | #4
On 5/3/23 03:00, Krzysztof Kozlowski wrote:
> On 01/05/2023 15:43, Pierre-Louis Bossart wrote:
>>
>>
>> On 5/1/23 07:24, Krzysztof Kozlowski wrote:
>>> On 20/04/2023 23:37, Pierre-Louis Bossart wrote:
>>>>
>>>>
>>>> On 4/20/23 05:16, Krzysztof Kozlowski wrote:
>>>>> Soundwire devices are supposed to be kept in reset state (powered off)
>>>>> till their probe() or component bind() callbacks.  However if they are
>>>>> already powered on, then they might enumerate before the master
>>>>> initializes bus in qcom_swrm_init() leading to occasional errors like:
>>>>
>>>> The problem statement is really hard to follow.
>>>>
>>>> The peripheral can only be enumerated AFTER
>>>> a) the manager starts the bus clock and transmitting PING frames
>>>> b) the peripheral detects the sync words for 16 frames in a row.
>>>> c) the peripheral reports as Attached in the Device0 slot
>>>>
>>>> That sequence holds whether the manager does the enumeration manually or
>>>> relies on hardware-assisted autoenumeration. This is what the spec requires.
>>>>
>>>> So why can't the bus clock start be controlled by the manager driver,
>>>> and started once all required initializations are done?
>>>>
>>>> I mean, there's got to be some sort of parent-child hierarchy with
>>>> manager first, peripheral(s) second, I don't get how these steps could
>>>> be inverted or race.
>>>>
>>>>>   qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered
>>>>>   wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops)
>>>>>   wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops)
>>>>>   qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow
>>>>>
>>>>> The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
>>>>> 1. request_threaded_irq()
>>>>> 2. sdw_bus_master_add() - which will cause probe() and component bind()
>>>>>    of Soundwire devices, e.g. WCD938x codec drivers.  Device drivers
>>>>>    might already start accessing their registers.
>>>>
>>>> not if the bus clock hasn't started...
>>>>
>>>>> 3. qcom_swrm_init() - which initializes the link/bus and enables
>>>>>    interrupts.
>>>>
>>>> if you can move the clock start in 3) then problem solved. Why can't
>>>> this be done?
>>>
>>> Responding to all your three responses:
>>> The clock is enabled in this 3. qcom_swrm_init(), so the old code to my
>>> knowledge is written exactly how you expect.
>>>
>>> However even with stopped clock, the device enumerates at
>>> sdw_bus_master_add(), before anything is enabled.
>>
>> Erm, that's not physically possible...
>>
>> The peripheral can report as attached and be enumerated by the manager,
>> i.e. assigned a non-zero "Device Number" after the peripheral
>> synchronizes on 16 frames with valid static and dynamic syncwords. That
>> can only happen if there is a clock toggling and PING frames transmitted
>> on the data line.
>>
>> There's something else at play here.
> 
> Yes, I think you are right and that "else" is my limited knowledge on
> the entire setup.
> 
> You gave me awesome hint in email before that probe != enumeration !=
> initialization, however the wcd938x sound codec drivers were assuming
> some steps are equal.
> 
> wcd938x comes with three devices on two drivers:
> 1. RX Soundwire device (wcd938x-sdw.c driver)
> 2. TX Soundwire device, which is used as regmap (also wcd938x-sdw.c driver)
> 3. platform device (wcd938x.c driver) - glue and component master,
> actually having most of the code using TX Soundwire device regmap.
> 
> The probe of each RX and TX Soundwire devices added components, but that
> this did not mean devices were enumerated, as you said.
> 
> Considering what Mark said about using regcache (and sync it), I am now
> replacing entire solution with proper regcache handling device
> enumeration after component bind.

I thought you were talking about ASoC "struct snd_soc_component" but no,
that's "struct component_ops" in wcd938x-sdw.c

Now I think I get the scope of the problem, I didn't click on the fact
that it's a platform driver that registers ASoC components, and not the
SoundWire codec driver.

That's certainly more complicated than any other SoundWire-based systems
I've seen so far. It's already difficult to keep track of the SoundWire
peripheral driver .probe(), the ASoC component .probe(), the SoundWire
bus .update_status() callback and now there's
the component .bind() added.

Wow. What could possibly go wrong, eh?
diff mbox series

Patch

diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c
index 679990dc3cc4..802d939ce7aa 100644
--- a/drivers/soundwire/qcom.c
+++ b/drivers/soundwire/qcom.c
@@ -19,6 +19,7 @@ 
 #include <linux/slimbus.h>
 #include <linux/soundwire/sdw.h>
 #include <linux/soundwire/sdw_registers.h>
+#include <linux/workqueue.h>
 #include <sound/pcm_params.h>
 #include <sound/soc.h>
 #include "bus.h"
@@ -187,6 +188,7 @@  struct qcom_swrm_ctrl {
 #endif
 	struct completion broadcast;
 	struct completion enumeration;
+	struct delayed_work new_slave_work;
 	/* Port alloc/free lock */
 	struct mutex port_lock;
 	struct clk *hclk;
@@ -606,6 +608,37 @@  static int qcom_swrm_enumerate(struct sdw_bus *bus)
 	return 0;
 }
 
+static void qcom_swrm_new_slave(struct work_struct *work)
+{
+	struct qcom_swrm_ctrl *ctrl = container_of(work, struct qcom_swrm_ctrl,
+						   new_slave_work.work);
+
+	/*
+	 * All Soundwire slave deviecs are expected to be in reset state (powered down)
+	 * during sdw_bus_master_add().  The slave device should be brougth
+	 * from reset by its probe() or bind() function, as a result of
+	 * sdw_bus_master_add().
+	 * Add a simple check to avoid NULL pointer except on early interrupts.
+	 * Note that if this condition happens, the slave device will not be
+	 * enumerated. Its driver should be fixed.
+	 *
+	 * smp_load_acquire() paired with sdw_master_device_add().
+	 */
+	if (!smp_load_acquire(&ctrl->bus.md)) {
+		dev_err(ctrl->dev,
+			"Got unexpected, early interrupt, device will not be enumerated\n");
+		return;
+	}
+
+	clk_prepare_enable(ctrl->hclk);
+
+	qcom_swrm_get_device_status(ctrl);
+	qcom_swrm_enumerate(&ctrl->bus);
+	sdw_handle_slave_status(&ctrl->bus, ctrl->status);
+
+	clk_disable_unprepare(ctrl->hclk);
+};
+
 static irqreturn_t qcom_swrm_wake_irq_handler(int irq, void *dev_id)
 {
 	struct qcom_swrm_ctrl *ctrl = dev_id;
@@ -668,9 +701,17 @@  static irqreturn_t qcom_swrm_irq_handler(int irq, void *dev_id)
 					dev_dbg(ctrl->dev, "Slave status not changed %x\n",
 						slave_status);
 				} else {
-					qcom_swrm_get_device_status(ctrl);
-					qcom_swrm_enumerate(&ctrl->bus);
-					sdw_handle_slave_status(&ctrl->bus, ctrl->status);
+					unsigned long delay = 0;
+
+					/*
+					 * See qcom_swrm_new_slave() for
+					 * explanation. smp_load_acquire() paired
+					 * with sdw_master_device_add().
+					 */
+					if (!smp_load_acquire(&ctrl->bus.md))
+						delay = 10;
+					schedule_delayed_work(&ctrl->new_slave_work,
+							      delay);
 				}
 				break;
 			case SWRM_INTERRUPT_STATUS_MASTER_CLASH_DET:
@@ -780,6 +821,7 @@  static int qcom_swrm_init(struct qcom_swrm_ctrl *ctrl)
 
 	ctrl->intr_mask = SWRM_INTERRUPT_STATUS_RMSK;
 	/* Mask soundwire interrupts */
+
 	if (ctrl->version < SWRM_VERSION_2_0_0)
 		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
 				SWRM_INTERRUPT_STATUS_RMSK);
@@ -1485,6 +1527,7 @@  static int qcom_swrm_probe(struct platform_device *pdev)
 	mutex_init(&ctrl->port_lock);
 	init_completion(&ctrl->broadcast);
 	init_completion(&ctrl->enumeration);
+	INIT_DELAYED_WORK(&ctrl->new_slave_work, qcom_swrm_new_slave);
 
 	ctrl->bus.ops = &qcom_swrm_ops;
 	ctrl->bus.port_ops = &qcom_swrm_port_ops;
@@ -1514,9 +1557,10 @@  static int qcom_swrm_probe(struct platform_device *pdev)
 
 	ctrl->reg_read(ctrl, SWRM_COMP_HW_VERSION, &ctrl->version);
 
+	qcom_swrm_init(ctrl);
+
 	ret = devm_request_threaded_irq(dev, ctrl->irq, NULL,
 					qcom_swrm_irq_handler,
-					IRQF_TRIGGER_RISING |
 					IRQF_ONESHOT,
 					"soundwire", ctrl);
 	if (ret) {
@@ -1524,18 +1568,6 @@  static int qcom_swrm_probe(struct platform_device *pdev)
 		goto err_clk;
 	}
 
-	ctrl->wake_irq = of_irq_get(dev->of_node, 1);
-	if (ctrl->wake_irq > 0) {
-		ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL,
-						qcom_swrm_wake_irq_handler,
-						IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
-						"swr_wake_irq", ctrl);
-		if (ret) {
-			dev_err(dev, "Failed to request soundwire wake irq\n");
-			goto err_init;
-		}
-	}
-
 	pm_runtime_set_autosuspend_delay(dev, 3000);
 	pm_runtime_use_autosuspend(dev);
 	pm_runtime_mark_last_busy(dev);
@@ -1549,7 +1581,18 @@  static int qcom_swrm_probe(struct platform_device *pdev)
 		goto err_clk;
 	}
 
-	qcom_swrm_init(ctrl);
+	ctrl->wake_irq = of_irq_get(dev->of_node, 1);
+	if (ctrl->wake_irq > 0) {
+		ret = devm_request_threaded_irq(dev, ctrl->wake_irq, NULL,
+						qcom_swrm_wake_irq_handler,
+						IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
+						"swr_wake_irq", ctrl);
+		if (ret) {
+			dev_err(dev, "Failed to request soundwire wake irq\n");
+			goto err_init;
+		}
+	}
+
 	wait_for_completion_timeout(&ctrl->enumeration,
 				    msecs_to_jiffies(TIMEOUT_MS));
 	ret = qcom_swrm_register_dais(ctrl);
@@ -1589,6 +1632,18 @@  static int qcom_swrm_remove(struct platform_device *pdev)
 {
 	struct qcom_swrm_ctrl *ctrl = dev_get_drvdata(&pdev->dev);
 
+	/*
+	 * Mask interrupts to be sure no delayed work can be scheduler after
+	 * removing Soundwire bus master.
+	 */
+	if (ctrl->version < SWRM_VERSION_2_0_0)
+		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_MASK_ADDR],
+				0);
+	if (ctrl->mmio)
+		ctrl->reg_write(ctrl, ctrl->reg_layout[SWRM_REG_INTERRUPT_CPU_EN],
+				0);
+
+	cancel_delayed_work_sync(&ctrl->new_slave_work);
 	sdw_bus_master_delete(&ctrl->bus);
 	clk_disable_unprepare(ctrl->hclk);