[0/6] firmware: qcom: scm: Fixes for concurrency

Message ID 20241119-qcom-scm-missing-barriers-and-all-sort-of-srap-v1-0-7056127007a7@linaro.org

Message

Krzysztof Kozlowski Nov. 19, 2024, 6:33 p.m. UTC
The SCM driver looks messy in how it handles concurrency during probe.  The
driver exports an interface which is guarded by the global '__scm' variable,
but it:
1. Lacks proper read barrier (commit adding write barriers mixed up
   READ_ONCE with a read barrier).
2. Lacks barriers or checks for '__scm' in multiple places.
3. Lacks probe error cleanup.
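
Point 1 can be illustrated with a minimal userspace sketch, assuming C11
atomics as a stand-in for the kernel primitives. The names scm_model,
scm_ptr and the scm_model_*() helpers are hypothetical and are not taken
from the driver. The writer publishes the pointer with a release store and
the reader uses an acquire load, which is roughly what pairing
smp_store_release() with smp_load_acquire() gives in the kernel; READ_ONCE()
by itself is not an acquire operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private state. */
struct scm_model {
	int payload;
};

/* Models the global '__scm' pointer: NULL until "probe" has finished. */
static _Atomic(struct scm_model *) scm_ptr;

/* Writer: initialise the object fully, then publish it with release order. */
static void scm_model_publish(struct scm_model *scm, int payload)
{
	assert(scm != NULL);
	scm->payload = payload;
	atomic_store_explicit(&scm_ptr, scm, memory_order_release);
}

/* Reader: the acquire load pairs with the release store in the writer. */
static bool scm_model_is_available(void)
{
	return atomic_load_explicit(&scm_ptr, memory_order_acquire) != NULL;
}

/* Reader that also uses the object; returns -1 if nothing was published. */
static int scm_model_read_payload(void)
{
	struct scm_model *scm =
		atomic_load_explicit(&scm_ptr, memory_order_acquire);

	return scm ? scm->payload : -1;
}
```

With this pairing, a reader that observes a non-NULL pointer also observes
the initialisation done before the pointer was published.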

I fixed a few visible things here, but this was not tested extensively.  I
tried only SM8450.

ARM32 and SC8280xp/X1E platforms would be useful for testing as well.

All the issues here are non-urgent, IOW, they were here for some time
(v6.10-rc1 and earlier).

Best regards,
Krzysztof

---
Krzysztof Kozlowski (6):
      firmware: qcom: scm: Fix missing read barrier in qcom_scm_is_available()
      firmware: qcom: scm: Fix missing read barrier in qcom_scm_get_tzmem_pool()
      firmware: qcom: scm: Handle various probe ordering for qcom_scm_assign_mem()
      [RFC/RFT] firmware: qcom: scm: Cleanup global '__scm' on probe failures
      firmware: qcom: scm: smc: Handle missing SCM device
      firmware: qcom: scm: smc: Narrow 'mempool' variable scope

 drivers/firmware/qcom/qcom_scm-smc.c |  6 +++-
 drivers/firmware/qcom/qcom_scm.c     | 55 +++++++++++++++++++++++++-----------
 2 files changed, 44 insertions(+), 17 deletions(-)
---
base-commit: 414c97c966b69e4a6ea7b32970fa166b2f9b9ef0
change-id: 20241119-qcom-scm-missing-barriers-and-all-sort-of-srap-a25d59074882

Best regards,
-- 
Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

Comments

Bartosz Golaszewski Nov. 20, 2024, 2:07 p.m. UTC | #1
On Tue, Nov 19, 2024 at 7:37 PM Krzysztof Kozlowski
<krzysztof.kozlowski@linaro.org> wrote:
>
> The SCM driver can defer or fail probe, or just load a bit later, so
> callers of qcom_scm_assign_mem() should defer if the device is not ready.
>
> This fixes a theoretical NULL pointer exception, triggered by introducing
> probe deferral in the SCM driver, with call trace:
>
>   qcom_tzmem_alloc+0x70/0x1ac (P)
>   qcom_tzmem_alloc+0x64/0x1ac (L)
>   qcom_scm_assign_mem+0x78/0x194
>   qcom_rmtfs_mem_probe+0x2d4/0x38c
>   platform_probe+0x68/0xc8
>
> Fixes: d82bd359972a ("firmware: scm: Add new SCM call API for switching memory ownership")
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>
> ---
>
> I am not sure about commit introducing it (Fixes tag) thus not Cc-ing
> stable.
> ---
>  drivers/firmware/qcom/qcom_scm.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
> index 5d91b8e22844608f35432f1ba9c08d477d4ff762..93212c8f20ad65ecc44804b00f4b93e3eaaf8d95 100644
> --- a/drivers/firmware/qcom/qcom_scm.c
> +++ b/drivers/firmware/qcom/qcom_scm.c
> @@ -1075,6 +1075,9 @@ int qcom_scm_assign_mem(phys_addr_t mem_addr, size_t mem_sz,
>         int ret, i, b;
>         u64 srcvm_bits = *srcvm;
>
> +       if (!qcom_scm_is_available())
> +               return -EPROBE_DEFER;
> +

Should we be returning -EPROBE_DEFER from functions that are not
necessarily limited to being used in probe()? For instance ath10k uses
it in a workqueue job. I think this is why this driver is probed in
subsys_initcall() rather than module_initcall().

Bart

>         src_sz = hweight64(srcvm_bits) * sizeof(*src);
>         mem_to_map_sz = sizeof(*mem_to_map);
>         dest_sz = dest_cnt * sizeof(*destvm);
>
> --
> 2.43.0
>
>
Krzysztof Kozlowski Nov. 20, 2024, 2:19 p.m. UTC | #2
On 20/11/2024 15:07, Bartosz Golaszewski wrote:
>> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
>> index 5d91b8e22844608f35432f1ba9c08d477d4ff762..93212c8f20ad65ecc44804b00f4b93e3eaaf8d95 100644
>> --- a/drivers/firmware/qcom/qcom_scm.c
>> +++ b/drivers/firmware/qcom/qcom_scm.c
>> @@ -1075,6 +1075,9 @@ int qcom_scm_assign_mem(phys_addr_t mem_addr, size_t mem_sz,
>>         int ret, i, b;
>>         u64 srcvm_bits = *srcvm;
>>
>> +       if (!qcom_scm_is_available())
>> +               return -EPROBE_DEFER;
>> +
> 
> Should we be returning -EPROBE_DEFER from functions that are not
> necessarily limited to being used in probe()? For instance ath10k uses
> it in a workqueue job. I think this is why this driver is probed in
> subsys_initcall() rather than module_initcall().

Uh, good point. To my understanding, every resource-like function can do
it, e.g. clk_get(). Whether drivers call it in probe() or somewhere else
(e.g. some startup call, of which there are plenty in ASoC, or the DMA
device_alloc_chan_resources()) is the responsibility of the
driver/consumer, not the provider of that resource.

With that explanation, returning EPROBE_DEFER is OK, just like returning
anything else (e.g. EINVAL).

Now about this function: it is not exactly a "get a resource" one, but
still the caller might want to call it again later, which is implied by
EPROBE_DEFER. Maybe this should be EAGAIN instead? Just like
power-supply is doing in power_supply_get_property().
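
The practical difference between the two codes for a probe() caller can be
sketched as below. This is a simplified userspace model, not the driver
core itself: classify_probe_result() and enum probe_outcome are
hypothetical names, and the value 517 for EPROBE_DEFER is copied from the
kernel's include/linux/errno.h because userspace <errno.h> does not define
it. The model only captures that -EPROBE_DEFER is the one error that gets
the probe retried later, while -EAGAIN is treated like any other failure.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal error code, absent from userspace <errno.h>. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

enum probe_outcome {
	PROBE_BOUND,	/* probe() succeeded, device is bound */
	PROBE_DEFERRED,	/* probe() will be retried later */
	PROBE_FAILED,	/* probe() failed, no automatic retry */
};

/* Simplified model of how a probe() return value is classified. */
static enum probe_outcome classify_probe_result(int ret)
{
	assert(ret <= 0);	/* 0 or a negative errno */

	if (ret == 0)
		return PROBE_BOUND;
	if (ret == -EPROBE_DEFER)
		return PROBE_DEFERRED;
	return PROBE_FAILED;
}
```

Under this model, a probe() that simply propagates -EAGAIN from a helper
ends up failed rather than deferred.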

Best regards,
Krzysztof
Krzysztof Kozlowski Nov. 20, 2024, 2:21 p.m. UTC | #3
On 20/11/2024 15:19, Krzysztof Kozlowski wrote:
> On 20/11/2024 15:07, Bartosz Golaszewski wrote:
>>> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
>>> index 5d91b8e22844608f35432f1ba9c08d477d4ff762..93212c8f20ad65ecc44804b00f4b93e3eaaf8d95 100644
>>> --- a/drivers/firmware/qcom/qcom_scm.c
>>> +++ b/drivers/firmware/qcom/qcom_scm.c
>>> @@ -1075,6 +1075,9 @@ int qcom_scm_assign_mem(phys_addr_t mem_addr, size_t mem_sz,
>>>         int ret, i, b;
>>>         u64 srcvm_bits = *srcvm;
>>>
>>> +       if (!qcom_scm_is_available())
>>> +               return -EPROBE_DEFER;
>>> +
>>
>> Should we be returning -EPROBE_DEFER from functions that are not
>> necessarily limited to being used in probe()? For instance ath10k uses
>> it in a workqueue job. I think this is why this driver is probed in

One more thing here: qcom_scm_assign_mem() is used in both contexts: probe()
and other cases such as the mentioned workqueue. EAGAIN from probe() would
not result in deferred probe, I think.


>> subsys_initcall() rather than module_initcall().
> Uh, good point. To my understanding, every resource-like function can do
> it, e.g. clk_get(). Whether drivers call it in probe() or somewhere else
> (e.g. some startup call, of which there are plenty in ASoC, or the DMA
> device_alloc_chan_resources()) is the responsibility of the
> driver/consumer, not the provider of that resource.
>
> With that explanation, returning EPROBE_DEFER is OK, just like returning
> anything else (e.g. EINVAL).
>
> Now about this function: it is not exactly a "get a resource" one, but
> still the caller might want to call it again later, which is implied by
> EPROBE_DEFER. Maybe this should be EAGAIN instead? Just like
> power-supply is doing in power_supply_get_property().
> 
Best regards,
Krzysztof
Neil Armstrong Nov. 20, 2024, 4:20 p.m. UTC | #4
On 20/11/2024 12:13, Dmitry Baryshkov wrote:
> On Tue, Nov 19, 2024 at 07:33:16PM +0100, Krzysztof Kozlowski wrote:
>> The SCM driver looks messy in how it handles concurrency during probe.  The
>> driver exports an interface which is guarded by the global '__scm' variable,
>> but it:
>> 1. Lacks proper read barrier (commit adding write barriers mixed up
>>     READ_ONCE with a read barrier).
>> 2. Lacks barriers or checks for '__scm' in multiple places.
>> 3. Lacks probe error cleanup.
>>
>> I fixed a few visible things here, but this was not tested extensively.  I
>> tried only SM8450.
>>
>> ARM32 and SC8280xp/X1E platforms would be useful for testing as well.
> 
> ARM32 devices are present in the lab.

I ran the patchset on our devices and observed no regressions:

arm32: https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/pipelines/116195
arm64(including x1e): https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/pipelines/116201

Neil

> 
>>
>> All the issues here are non-urgent, IOW, they were here for some time
>> (v6.10-rc1 and earlier).
>>
>> Best regards,
>> Krzysztof
>>
>> ---
>> Krzysztof Kozlowski (6):
>>        firmware: qcom: scm: Fix missing read barrier in qcom_scm_is_available()
>>        firmware: qcom: scm: Fix missing read barrier in qcom_scm_get_tzmem_pool()
>>        firmware: qcom: scm: Handle various probe ordering for qcom_scm_assign_mem()
>>        [RFC/RFT] firmware: qcom: scm: Cleanup global '__scm' on probe failures
>>        firmware: qcom: scm: smc: Handle missing SCM device
>>        firmware: qcom: scm: smc: Narrow 'mempool' variable scope
>>
>>   drivers/firmware/qcom/qcom_scm-smc.c |  6 +++-
>>   drivers/firmware/qcom/qcom_scm.c     | 55 +++++++++++++++++++++++++-----------
>>   2 files changed, 44 insertions(+), 17 deletions(-)
>> ---
>> base-commit: 414c97c966b69e4a6ea7b32970fa166b2f9b9ef0
>> change-id: 20241119-qcom-scm-missing-barriers-and-all-sort-of-srap-a25d59074882
>>
>> Best regards,
>> -- 
>> Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>>
>
Bjorn Andersson Nov. 27, 2024, 4:45 a.m. UTC | #5
On Wed, Nov 20, 2024 at 03:19:00PM +0100, Krzysztof Kozlowski wrote:
> On 20/11/2024 15:07, Bartosz Golaszewski wrote:
> >> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
> >> index 5d91b8e22844608f35432f1ba9c08d477d4ff762..93212c8f20ad65ecc44804b00f4b93e3eaaf8d95 100644
> >> --- a/drivers/firmware/qcom/qcom_scm.c
> >> +++ b/drivers/firmware/qcom/qcom_scm.c
> >> @@ -1075,6 +1075,9 @@ int qcom_scm_assign_mem(phys_addr_t mem_addr, size_t mem_sz,
> >>         int ret, i, b;
> >>         u64 srcvm_bits = *srcvm;
> >>
> >> +       if (!qcom_scm_is_available())
> >> +               return -EPROBE_DEFER;
> >> +
> > 
> > Should we be returning -EPROBE_DEFER from functions that are not
> > necessarily limited to being used in probe()? For instance ath10k uses
> > it in a workqueue job. I think this is why this driver is probed in
> > subsys_initcall() rather than module_initcall().
> Uh, good point. To my understanding, every resource-like function can do
> it, e.g. clk_get(). Whether drivers call it in probe() or somewhere else
> (e.g. some startup call, of which there are plenty in ASoC, or the DMA
> device_alloc_chan_resources()) is the responsibility of the
> driver/consumer, not the provider of that resource.
>
> With that explanation, returning EPROBE_DEFER is OK, just like returning
> anything else (e.g. EINVAL).
>
> Now about this function: it is not exactly a "get a resource" one, but
> still the caller might want to call it again later, which is implied by
> EPROBE_DEFER. Maybe this should be EAGAIN instead? Just like
> power-supply is doing in power_supply_get_property().
> 

The return value here will wander up the stack and I'm not convinced
that all callers will handle an EAGAIN in a favourable way.

The way we've dealt with this before is to say that if a client will
call qcom_scm_*() they must call qcom_scm_is_available() during their
initialization and handle the EPROBE_DEFER accordingly.
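
The convention described above can be sketched as a small userspace model.
Everything here is hypothetical: scm_ready_model stands in for the provider
having probed, scm_is_available_model() for qcom_scm_is_available(), and
the client_*_model() functions for a consumer driver. The idea is that the
availability check and the -EPROBE_DEFER return live in the client's
probe(), so later callers (for example a workqueue job) only run once the
provider is known to be ready.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal error code; value from include/linux/errno.h. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical flag: set once the SCM provider has finished probing. */
static bool scm_ready_model;

/* Stand-in for qcom_scm_is_available(). */
static bool scm_is_available_model(void)
{
	return scm_ready_model;
}

/* Hypothetical client state. */
struct client_model {
	bool bound;
};

/* Client probe: defer until the provider is ready, then bind. */
static int client_probe_model(struct client_model *c)
{
	assert(c != NULL);

	if (!scm_is_available_model())
		return -EPROBE_DEFER;

	c->bound = true;
	return 0;
}

/* Later, non-probe work: only reachable after a successful probe. */
static int client_work_model(const struct client_model *c)
{
	assert(c != NULL && c->bound);
	return 0;	/* an SCM call would go here */
}
```

In this arrangement the non-probe path never has to interpret a deferral
code, because it cannot run before the check in probe has passed.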

Regards,
Bjorn

> Best regards,
> Krzysztof