mbox series

[0/6] ASoC/soundwire: qcom: correctly probe devices after link init

Message ID 20230420101617.142225-1-krzysztof.kozlowski@linaro.org
Headers show
Series ASoC/soundwire: qcom: correctly probe devices after link init | expand

Message

Krzysztof Kozlowski April 20, 2023, 10:16 a.m. UTC
Hi,

Dependencies
============
1. ASoC codec: changes are independent, however they should rather come the same
   cycle as Soundwire changes, to avoid new warning and small delay.

2. Soundwire: changes (context) depend on:
   https://lore.kernel.org/r/20230209131336.18252-3-srinivas.kandagatla@linaro.org
   https://lore.kernel.org/all/20230418095447.577001-1-krzysztof.kozlowski@linaro.org/

Problems solved
===============
Soundwire devices are supposed to be kept in reset state (powered off)
till their probe() or component bind() callbacks.  However if they are
already powered on, then they might enumerate before the master
initializes bus in qcom_swrm_init() leading to occasional errors like:

  qcom-soundwire 6d30000.soundwire-controller: Qualcomm Soundwire controller v2.0.0 Registered
  wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:4 (ops wcd938x_sdw_component_ops)
  wcd938x_codec audio-codec: bound sdw:0:0217:010d:00:3 (ops wcd938x_sdw_component_ops)
  qcom-soundwire 6ad0000.soundwire-controller: swrm_wait_for_wr_fifo_avail err write overflow

The problem primarily lies in Qualcomm Soundwire controller probe() sequence:
1. request_threaded_irq()
2. sdw_bus_master_add() - which will cause probe() and component bind()
   of Soundwire devices, e.g. WCD938x codec drivers.  Device drivers
   might already start accessing their registers.
3. qcom_swrm_init() - which initializes the link/bus and enables
   interrupts.

Any access to device registers at (2) above, will fail because link/bus
is not yet initialized.

Cc: Patrick Lai <quic_plai@quicinc.com>

Best regards,
Krzysztof

Dmitry Torokhov (1):
  ASoC: wcd938x: switch to using gpiod API

Krzysztof Kozlowski (5):
  ASoC: codecs: wcd938x: Keep device in reset till bind
  ASoC: codecs: wcd938x: Check for enumeration before using TX device
  soundwire: qcom: drop unused struct qcom_swrm_ctrl members
  soudnwire: master: protect concurrecnt check for bus->md
  soundwire: qcom: do not probe devices before bus/link init

 drivers/soundwire/master.c |  7 ++-
 drivers/soundwire/qcom.c   | 92 +++++++++++++++++++++++++++++---------
 sound/soc/codecs/wcd938x.c | 44 +++++++++++-------
 3 files changed, 107 insertions(+), 36 deletions(-)

Comments

Krzysztof Kozlowski April 20, 2023, 12:32 p.m. UTC | #1
On 20/04/2023 14:30, Krzysztof Kozlowski wrote:
> On 20/04/2023 13:58, Mark Brown wrote:
>> On Thu, Apr 20, 2023 at 12:16:12PM +0200, Krzysztof Kozlowski wrote:
>>
>>> -	gpio_direction_output(wcd938x->reset_gpio, 0);
>>> -	/* 20us sleep required after pulling the reset gpio to LOW */
>>> +	gpiod_set_value_cansleep(wcd938x->reset_gpio, 1);
>>> +	/* 20us sleep required after asserting the reset gpio */
>>
>> This is inverting the sense of the GPIO in the API from active low to
>> active high which will mean we're introducing a new reliance on having
>> the signal described as active low in DT.  That's an ABI concern.
> 
> It's bringing it to the correct level. Old code was not respecting the
> DTS thus if such DTS came with inverted design, the driver would not work.
> 
> We were already fixing the upstream DTS users and I thought all of them
> are fixed since long time (half a year) or even correct from the
> beginning. Now I found one more case with incorrect level, which I will fix.

No, my bad - all upstream DTSes are corrected since half year.

> 
>>
>> I remain deeply unconvinced that remapping active low outputs like this
>> in the GPIO API is helping.
> 
> The code is mapping them to correct state. The previous state was
> incorrect and did not allow to handle active high (which can happen).
> This is the effort to make code correct - driver and DTS.

Best regards,
Krzysztof
Mark Brown April 20, 2023, 1 p.m. UTC | #2
On Thu, Apr 20, 2023 at 02:30:17PM +0200, Krzysztof Kozlowski wrote:
> On 20/04/2023 13:58, Mark Brown wrote:
> > On Thu, Apr 20, 2023 at 12:16:12PM +0200, Krzysztof Kozlowski wrote:

> >> -	gpio_direction_output(wcd938x->reset_gpio, 0);
> >> -	/* 20us sleep required after pulling the reset gpio to LOW */
> >> +	gpiod_set_value_cansleep(wcd938x->reset_gpio, 1);
> >> +	/* 20us sleep required after asserting the reset gpio */

> > This is inverting the sense of the GPIO in the API from active low to
> > active high which will mean we're introducing a new reliance on having
> > the signal described as active low in DT.  That's an ABI concern.

> It's bringing it to the correct level. Old code was not respecting the
> DTS thus if such DTS came with inverted design, the driver would not work.

Sure, but OTOH if the user didn't bother specifying as active low it
would work.  I suspect it's more likely that someone missed a flag that
had no practical impact in DT than that someone would add an inverter to
their design.

> We were already fixing the upstream DTS users and I thought all of them
> are fixed since long time (half a year) or even correct from the
> beginning. Now I found one more case with incorrect level, which I will fix.

That's just upstream, what about any downstream users?

> > I remain deeply unconvinced that remapping active low outputs like this
> > in the GPIO API is helping.

> The code is mapping them to correct state. The previous state was
> incorrect and did not allow to handle active high (which can happen).
> This is the effort to make code correct - driver and DTS.

We could handle inversions through an explicit property if that were
needed, that would be a less problematic transition and clearer in the
consumer code.
Krzysztof Kozlowski April 20, 2023, 2:16 p.m. UTC | #3
On 20/04/2023 15:00, Mark Brown wrote:
> On Thu, Apr 20, 2023 at 02:30:17PM +0200, Krzysztof Kozlowski wrote:
>> On 20/04/2023 13:58, Mark Brown wrote:
>>> On Thu, Apr 20, 2023 at 12:16:12PM +0200, Krzysztof Kozlowski wrote:
> 
>>>> -	gpio_direction_output(wcd938x->reset_gpio, 0);
>>>> -	/* 20us sleep required after pulling the reset gpio to LOW */
>>>> +	gpiod_set_value_cansleep(wcd938x->reset_gpio, 1);
>>>> +	/* 20us sleep required after asserting the reset gpio */
> 
>>> This is inverting the sense of the GPIO in the API from active low to
>>> active high which will mean we're introducing a new reliance on having
>>> the signal described as active low in DT.  That's an ABI concern.
> 
>> It's bringing it to the correct level. Old code was not respecting the
>> DTS thus if such DTS came with inverted design, the driver would not work.
> 
> Sure, but OTOH if the user didn't bother specifying as active low it
> would work.  I suspect it's more likely that someone missed a flag that
> had no practical impact in DT than that someone would add an inverter to
> their design.
> 
>> We were already fixing the upstream DTS users and I thought all of them
>> are fixed since long time (half a year) or even correct from the
>> beginning. Now I found one more case with incorrect level, which I will fix.
> 
> That's just upstream, what about any downstream users?

Life of downstream. We all know the drill: merge your DTS or suffer. The
WCD938x codecs are moderately new, so I do not expect many downstream
users. They are in theory possible, because driver was merged in
v5.14-rc1 and for the newest products Qualcomm uses v5.15. Although now
it is v5.15, but the time driver was merged, maybe it was v5.10.

I could rework this patch to provide backwards compatible solution like
I did for WSA:
https://lore.kernel.org/all/20230102114152.297305-4-krzysztof.kozlowski@linaro.org/

There are downsides of it, but as you pointed out - it's actually very
rare to have the signal inverted in hardware.

> 
>>> I remain deeply unconvinced that remapping active low outputs like this
>>> in the GPIO API is helping.
> 
>> The code is mapping them to correct state. The previous state was
>> incorrect and did not allow to handle active high (which can happen).
>> This is the effort to make code correct - driver and DTS.
> 
> We could handle inversions through an explicit property if that were
> needed, that would be a less problematic transition and clearer in the
> consumer code.

I am not sure if it is worth. The DTS is supposed to describe hardware,
so even if reset pin flag was not effective, it is a mistake to describe
it as ACTIVE_HIGH. Do we care about keeping broken code happy? If yes,
then property is the way to go. If partially, then I can add
backwards-compatible approach like I mentioned above.

Best regards,
Krzysztof
Mark Brown April 20, 2023, 4:28 p.m. UTC | #4
On Thu, Apr 20, 2023 at 04:16:59PM +0200, Krzysztof Kozlowski wrote:
> On 20/04/2023 15:00, Mark Brown wrote:

> > That's just upstream, what about any downstream users?

> Life of downstream. We all know the drill: merge your DTS or suffer. The

No, the DT is supposed to be an ABI.  The point in having a domain
specific language with a compiler is to allow device trees to be
distributed independently of the kernel.

> I could rework this patch to provide backwards compatible solution like
> I did for WSA:
> https://lore.kernel.org/all/20230102114152.297305-4-krzysztof.kozlowski@linaro.org/

There we go...

> > We could handle inversions through an explicit property if that were
> > needed, that would be a less problematic transition and clearer in the
> > consumer code.

> I am not sure if it is worth. The DTS is supposed to describe hardware,
> so even if reset pin flag was not effective, it is a mistake to describe
> it as ACTIVE_HIGH. Do we care about keeping broken code happy? If yes,
> then property is the way to go. If partially, then I can add
> backwards-compatible approach like I mentioned above.

It's not just this individual transition, it's the whole thing with
encoding the polarity of the signal at all - it's a layer of abstraction
that feels like it introduces at least as many problems as it solves,
and requiring configuration on every single system integration doesn't
feel like the right choice in general.
Pierre-Louis Bossart April 20, 2023, 4:42 p.m. UTC | #5
typos in commit title...

On 4/20/23 05:16, Krzysztof Kozlowski wrote:
> The Soundwire master controllers might want to check for bus->md

Apologies for being pedantic but 'manager' and 'controller' are
different concepts in SoundWire, see DisCo spec.
It's not a 1:1 mapping, a controller can rely on M managers

> initialization to avoid race between early interrupt and finish of
> sdw_bus_master_add()/sdw_master_device_add().  Such early interrupt can
> happen if Soundwire devices are not powered off during their probe.
> 
> Add a store release barrier, so the Soundwire controllers can safely
> check it in concurrent (e.g. in interrupt) way.

Can you elaborate on the race condition? I am not following what breaks,
and what entity generates the 'early interrupt'.

I am specifically concerned about adding this in common code without any
matching smp_load_acquire() - which is only added in the following patch
for the Qualcomm manager only, but not added for Intel/AMD managers. Is
this not a problem?

> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> 
> ---
> 
> Cc: Patrick Lai <quic_plai@quicinc.com>
> ---
>  drivers/soundwire/master.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/soundwire/master.c b/drivers/soundwire/master.c
> index 9b05c9e25ebe..d5bf13e7e602 100644
> --- a/drivers/soundwire/master.c
> +++ b/drivers/soundwire/master.c
> @@ -161,7 +161,12 @@ int sdw_master_device_add(struct sdw_bus *bus, struct device *parent,
>  	/* add shortcuts to improve code readability/compactness */
>  	md->bus = bus;
>  	bus->dev = &md->dev;
> -	bus->md = md;
> +	/*
> +	 * Make sure the contents of md is stored before storing bus->md.
> +	 * Paired with new slave attached and slave status interrupts
> +	 * on the Soundwire master side.
> +	 */
> +	smp_store_release(&bus->md, md);
>  
>  	pm_runtime_set_autosuspend_delay(&bus->md->dev, SDW_MASTER_SUSPEND_DELAY_MS);
>  	pm_runtime_use_autosuspend(&bus->md->dev);
Krzysztof Kozlowski April 20, 2023, 5:27 p.m. UTC | #6
On 20/04/2023 18:42, Pierre-Louis Bossart wrote:
> typos in commit title...
> 
> On 4/20/23 05:16, Krzysztof Kozlowski wrote:
>> The Soundwire master controllers might want to check for bus->md
> 
> Apologies for being pedantic but 'manager' and 'controller' are
> different concepts in SoundWire, see DisCo spec.
> It's not a 1:1 mapping, a controller can rely on M managers

I wrote master, not manager. For the Qualcomm case one controller is one
master, but in general I try to avoid the master/slave terminology.

> 
>> initialization to avoid race between early interrupt and finish of
>> sdw_bus_master_add()/sdw_master_device_add().  Such early interrupt can
>> happen if Soundwire devices are not powered off during their probe.
>>
>> Add a store release barrier, so the Soundwire controllers can safely
>> check it in concurrent (e.g. in interrupt) way.
> 
> Can you elaborate on the race condition? I am not following what breaks,
> and what entity generates the 'early interrupt'.

The condition is explained in next patch. If you think it's better, I
can squash it with next.

If the condition is still not clear, drop a note in next patch, so I
will elaborate there.

> 
> I am specifically concerned about adding this in common code without any
> matching smp_load_acquire() - which is only added in the following patch
> for the Qualcomm manager only, but not added for Intel/AMD managers. Is
> this not a problem?

Shouldn't be. The barrier just won't be effective for these drivers, but
that should not be a problem, because I also did not add to these
checking bus->md in a concurrent path.

Basically the barrier here is necessary because I want to check bus->md
in Qualcomm master interrupt handler.

Best regards,
Krzysztof
Pierre-Louis Bossart April 20, 2023, 9:13 p.m. UTC | #7
On 4/20/23 12:27, Krzysztof Kozlowski wrote:
> On 20/04/2023 18:42, Pierre-Louis Bossart wrote:
>> typos in commit title...
>>
>> On 4/20/23 05:16, Krzysztof Kozlowski wrote:
>>> The Soundwire master controllers might want to check for bus->md
>>
>> Apologies for being pedantic but 'manager' and 'controller' are
>> different concepts in SoundWire, see DisCo spec.
>> It's not a 1:1 mapping, a controller can rely on M managers
> 
> I wrote master, not manager. For the Qualcomm case one controller is one
> master, but in general I try to avoid the master/slave terminology.

The Soundwire 1.2.1 spec moved away from master/slave and uses
manager/peripheral. It's the same concepts, just different terms. At
some point we'll update the code, it's just been too busy in 2022/2023
to do this replacement. It doesn't hurt to use the new terms.

>>> initialization to avoid race between early interrupt and finish of
>>> sdw_bus_master_add()/sdw_master_device_add().  Such early interrupt can
>>> happen if Soundwire devices are not powered off during their probe.
>>>
>>> Add a store release barrier, so the Soundwire controllers can safely
>>> check it in concurrent (e.g. in interrupt) way.
>>
>> Can you elaborate on the race condition? I am not following what breaks,
>> and what entity generates the 'early interrupt'.
> 
> The condition is explained in next patch. If you think it's better, I
> can squash it with next.
> 
> If the condition is still not clear, drop a note in next patch, so I
> will elaborate there.

will do.

>> I am specifically concerned about adding this in common code without any
>> matching smp_load_acquire() - which is only added in the following patch
>> for the Qualcomm manager only, but not added for Intel/AMD managers. Is
>> this not a problem?
> 
> Shouldn't be. The barrier just won't be effective for these drivers, but
> that should not be a problem, because I also did not add to these
> checking bus->md in a concurrent path.
> 
> Basically the barrier here is necessary because I want to check bus->md
> in Qualcomm master interrupt handler.

I really don't have any understanding or background on what this does.

Is there actually a precedent for this? I mean, dealing with the
device/driver model is already complicated, if now we have to be careful
on when the device pointer is stored it adds a whole new element of
complexity or skillset required to understand the bus operation.

Re-looking at the code, the 'md' variable is allocated in
sdw_master_device_add(), initialized with all kinds of values, used by
device_register() so presumably when you store the value it's stored
somewhere consistent, no?