Message ID | 20241119-qcom-scm-missing-barriers-and-all-sort-of-srap-v1-0-7056127007a7@linaro.org |
---|---|
Series | firmware: qcom: scm: Fixes for concurrency |
On Tue, Nov 19, 2024 at 7:37 PM Krzysztof Kozlowski
<krzysztof.kozlowski@linaro.org> wrote:
>
> The SCM driver can defer or fail probe, or just load a bit later so
> callers of qcom_scm_assign_mem() should defer if the device is not ready.
>
> This fixes theoretical NULL pointer exception, triggered via introducing
> probe deferral in SCM driver with call trace:
>
>   qcom_tzmem_alloc+0x70/0x1ac (P)
>   qcom_tzmem_alloc+0x64/0x1ac (L)
>   qcom_scm_assign_mem+0x78/0x194
>   qcom_rmtfs_mem_probe+0x2d4/0x38c
>   platform_probe+0x68/0xc8
>
> Fixes: d82bd359972a ("firmware: scm: Add new SCM call API for switching memory ownership")
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>
> ---
>
> I am not sure about commit introducing it (Fixes tag) thus not Cc-ing
> stable.
> ---
>  drivers/firmware/qcom/qcom_scm.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
> index 5d91b8e22844608f35432f1ba9c08d477d4ff762..93212c8f20ad65ecc44804b00f4b93e3eaaf8d95 100644
> --- a/drivers/firmware/qcom/qcom_scm.c
> +++ b/drivers/firmware/qcom/qcom_scm.c
> @@ -1075,6 +1075,9 @@ int qcom_scm_assign_mem(phys_addr_t mem_addr, size_t mem_sz,
>  	int ret, i, b;
>  	u64 srcvm_bits = *srcvm;
>
> +	if (!qcom_scm_is_available())
> +		return -EPROBE_DEFER;
> +

Should we be returning -EPROBE_DEFER from functions that are not
necessarily limited to being used in probe()? For instance ath10k uses
it in a workqueue job. I think this is why this driver is probed in
subsys_initcall() rather than module_initcall().

Bart

>  	src_sz = hweight64(srcvm_bits) * sizeof(*src);
>  	mem_to_map_sz = sizeof(*mem_to_map);
>  	dest_sz = dest_cnt * sizeof(*destvm);
>
> --
> 2.43.0
>
>
On 20/11/2024 15:07, Bartosz Golaszewski wrote:
>> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
>> index 5d91b8e22844608f35432f1ba9c08d477d4ff762..93212c8f20ad65ecc44804b00f4b93e3eaaf8d95 100644
>> --- a/drivers/firmware/qcom/qcom_scm.c
>> +++ b/drivers/firmware/qcom/qcom_scm.c
>> @@ -1075,6 +1075,9 @@ int qcom_scm_assign_mem(phys_addr_t mem_addr, size_t mem_sz,
>>  	int ret, i, b;
>>  	u64 srcvm_bits = *srcvm;
>>
>> +	if (!qcom_scm_is_available())
>> +		return -EPROBE_DEFER;
>> +
>
> Should we be returning -EPROBE_DEFER from functions that are not
> necessarily limited to being used in probe()? For instance ath10k uses
> it in a workqueue job. I think this is why this driver is probed in
> subsys_initcall() rather than module_initcall().

Uh, good point. To my understanding, every resource like function can do
it, e.g. clk_get. Whether drivers call it in probe() or somewhere else -
e.g. some startup call like there is plenty in the ASoC or DMA
device_alloc_chan_resources() - is responsibility of the
driver/consumer, not the provider of that resource.

With such explanation returning EPROBE_DEFER is ok, just like returning
anything else (e.g. EINVAL).

Now about this function: it is not exactly "get a resource" one, but
still the caller might want to call it again later, which is implied by
EPROBE_DEFER. Maybe this should be EAGAIN instead? Just like
power-supply is doing in power_supply_get_property().

Best regards,
Krzysztof
On 20/11/2024 15:19, Krzysztof Kozlowski wrote:
> On 20/11/2024 15:07, Bartosz Golaszewski wrote:
>>> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
>>> index 5d91b8e22844608f35432f1ba9c08d477d4ff762..93212c8f20ad65ecc44804b00f4b93e3eaaf8d95 100644
>>> --- a/drivers/firmware/qcom/qcom_scm.c
>>> +++ b/drivers/firmware/qcom/qcom_scm.c
>>> @@ -1075,6 +1075,9 @@ int qcom_scm_assign_mem(phys_addr_t mem_addr, size_t mem_sz,
>>>  	int ret, i, b;
>>>  	u64 srcvm_bits = *srcvm;
>>>
>>> +	if (!qcom_scm_is_available())
>>> +		return -EPROBE_DEFER;
>>> +
>>
>> Should we be returning -EPROBE_DEFER from functions that are not
>> necessarily limited to being used in probe()? For instance ath10k uses
>> it in a workqueue job. I think this is why this driver is probed in

One more here: qcom_scm_assign_mem() is used in both contexts: probe()
and some other cases like the mentioned workqueue. EAGAIN for probe()
would not result in a deferred probe, I think.

>> subsys_initcall() rather than module_initcall().
> Uh, good point. To my understanding, every resource like function can do
> it, e.g. clk_get. Whether drivers call it in probe() or somewhere else -
> e.g. some startup call like there is plenty in the ASoC or DMA
> device_alloc_chan_resources() - is responsibility of the
> driver/consumer, not the provider of that resource.
>
> With such explanation returning EPROBE_DEFER is ok, just like returning
> anything else (e.g. EINVAL).
>
> Now about this function: it is not exactly "get a resource" one, but
> still the caller might want to call it again later, which is implied by
> EPROBE_DEFER. Maybe this should be EAGAIN instead? Just like
> power-supply is doing in power_supply_get_property().
>

Best regards,
Krzysztof
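The EAGAIN remark above can be illustrated with a small userspace model. This is a sketch, not kernel code and not part of the patch. It assumes the driver core's usual behaviour: a probe is queued for retry only when probe() returns -EPROBE_DEFER, and any other negative code counts as a plain failure. The names model_assign_mem(), model_driver_core() and scm_ready are hypothetical, and EPROBE_DEFER is defined locally as 517 (the value in include/linux/errno.h) because userspace <errno.h> does not provide it.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal error code; not exported by userspace <errno.h>. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for "the SCM driver has finished probing". */
static bool scm_ready;

/*
 * Hypothetical model of a provider call such as qcom_scm_assign_mem():
 * it succeeds when the provider is ready, otherwise it returns the
 * negative errno the caller of this model asks for.
 */
static int model_assign_mem(int not_ready_errno)
{
	return scm_ready ? 0 : -not_ready_errno;
}

enum probe_outcome { PROBE_BOUND, PROBE_DEFERRED, PROBE_FAILED };

/*
 * Simplified model of how the driver core classifies the value that a
 * consumer's probe() passes up: only -EPROBE_DEFER leads to a retry.
 */
static enum probe_outcome model_driver_core(int probe_ret)
{
	if (probe_ret == 0)
		return PROBE_BOUND;
	if (probe_ret == -EPROBE_DEFER)
		return PROBE_DEFERRED;
	return PROBE_FAILED;
}
```

In this model a consumer whose probe() simply propagates -EAGAIN ends up in PROBE_FAILED, while -EPROBE_DEFER ends up in PROBE_DEFERRED, which is the difference the message points at.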
On 20/11/2024 12:13, Dmitry Baryshkov wrote:
> On Tue, Nov 19, 2024 at 07:33:16PM +0100, Krzysztof Kozlowski wrote:
>> SCM driver looks messy in terms of handling concurrency of probe. The
>> driver exports interface which is guarded by global '__scm' variable
>> but:
>> 1. Lacks proper read barrier (commit adding write barriers mixed up
>>    READ_ONCE with a read barrier).
>> 2. Lacks barriers or checks for '__scm' in multiple places.
>> 3. Lacks probe error cleanup.
>>
>> I fixed here few visible things, but this was not tested extensively. I
>> tried only SM8450.
>>
>> ARM32 and SC8280xp/X1E platforms would be useful for testing as well.
>
> ARM32 devices are present in the lab.

I passed the patchset on our devices, and no regressions observed:
arm32: https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/pipelines/116195
arm64(including x1e): https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/pipelines/116201

Neil

>
>>
>> All the issues here are non-urgent, IOW, they were here for some time
>> (v6.10-rc1 and earlier).
>>
>> Best regards,
>> Krzysztof
>>
>> ---
>> Krzysztof Kozlowski (6):
>>       firmware: qcom: scm: Fix missing read barrier in qcom_scm_is_available()
>>       firmware: qcom: scm: Fix missing read barrier in qcom_scm_get_tzmem_pool()
>>       firmware: qcom: scm: Handle various probe ordering for qcom_scm_assign_mem()
>>       [RFC/RFT] firmware: qcom: scm: Cleanup global '__scm' on probe failures
>>       firmware: qcom: scm: smc: Handle missing SCM device
>>       firmware: qcom: scm: smc: Narrow 'mempool' variable scope
>>
>>  drivers/firmware/qcom/qcom_scm-smc.c |  6 +++-
>>  drivers/firmware/qcom/qcom_scm.c     | 55 +++++++++++++++++++++++++-----------
>>  2 files changed, 44 insertions(+), 17 deletions(-)
>> ---
>> base-commit: 414c97c966b69e4a6ea7b32970fa166b2f9b9ef0
>> change-id: 20241119-qcom-scm-missing-barriers-and-all-sort-of-srap-a25d59074882
>>
>> Best regards,
>> --
>> Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>>
>
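Item 1 of the quoted cover letter (a write barrier needs a matching read barrier, and a once-only load is not one) can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the driver's code: struct scm_model, scm_ptr, model_probe(), model_is_available() and model_get_mempool() are hypothetical names. The release store and acquire load here play the role that smp_store_release() and smp_load_acquire() play in kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private state. */
struct scm_model {
	int mempool_id;
};

static struct scm_model scm_storage;

/* Models the global '__scm' pointer; NULL until probe has finished. */
static _Atomic(struct scm_model *) scm_ptr;

/*
 * Writer side (probe): fill in the object first, then publish the
 * pointer with release ordering so the initialisation cannot be
 * reordered after the pointer becomes visible.
 */
static void model_probe(int mempool_id)
{
	scm_storage.mempool_id = mempool_id;
	atomic_store_explicit(&scm_ptr, &scm_storage, memory_order_release);
}

/*
 * Reader side: the acquire load pairs with the release store above.
 * A relaxed load would only guarantee a single untorn read of the
 * pointer, not that the fields written before publication are visible.
 */
static bool model_is_available(void)
{
	return atomic_load_explicit(&scm_ptr, memory_order_acquire) != NULL;
}

/* Reader that checks the pointer before dereferencing it. */
static int model_get_mempool(void)
{
	struct scm_model *scm =
		atomic_load_explicit(&scm_ptr, memory_order_acquire);

	return scm ? scm->mempool_id : -1;
}
```

The NULL check in model_get_mempool() corresponds to item 2: every exported entry point has to tolerate the pointer not being published yet.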
On Wed, Nov 20, 2024 at 03:19:00PM +0100, Krzysztof Kozlowski wrote:
> On 20/11/2024 15:07, Bartosz Golaszewski wrote:
> >> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
> >> index 5d91b8e22844608f35432f1ba9c08d477d4ff762..93212c8f20ad65ecc44804b00f4b93e3eaaf8d95 100644
> >> --- a/drivers/firmware/qcom/qcom_scm.c
> >> +++ b/drivers/firmware/qcom/qcom_scm.c
> >> @@ -1075,6 +1075,9 @@ int qcom_scm_assign_mem(phys_addr_t mem_addr, size_t mem_sz,
> >>  	int ret, i, b;
> >>  	u64 srcvm_bits = *srcvm;
> >>
> >> +	if (!qcom_scm_is_available())
> >> +		return -EPROBE_DEFER;
> >> +
> >
> > Should we be returning -EPROBE_DEFER from functions that are not
> > necessarily limited to being used in probe()? For instance ath10k uses
> > it in a workqueue job. I think this is why this driver is probed in
> > subsys_initcall() rather than module_initcall().
> Uh, good point. To my understanding, every resource like function can do
> it, e.g. clk_get. Whether drivers call it in probe() or somewhere else -
> e.g. some startup call like there is plenty in the ASoC or DMA
> device_alloc_chan_resources() - is responsibility of the
> driver/consumer, not the provider of that resource.
>
> With such explanation returning EPROBE_DEFER is ok, just like returning
> anything else (e.g. EINVAL).
>
> Now about this function: it is not exactly "get a resource" one, but
> still the caller might want to call it again later, which is implied by
> EPROBE_DEFER. Maybe this should be EAGAIN instead? Just like
> power-supply is doing in power_supply_get_property().
>

The return value here will wander up the stack and I'm not convinced
that all callers will handle an EAGAIN in a favourable way.

The way we've dealt with this before is to say that if a client will
call qcom_scm_*() they must call qcom_scm_is_available() during their
initialization and handle the EPROBE_DEFER accordingly.

Regards,
Bjorn

> Best regards,
> Krzysztof
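The convention described in the reply above (check availability once during client initialization, defer there, and call qcom_scm_*() freely afterwards) might look roughly like the following userspace model. It is a sketch only: struct client_model, client_probe(), client_work(), model_scm_is_available() and scm_available are hypothetical names standing in for a real consumer driver and for qcom_scm_is_available(), and EPROBE_DEFER is defined locally as 517 to match include/linux/errno.h.

```c
#include <stdbool.h>

/* Kernel-internal error code; not exported by userspace <errno.h>. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for the state behind qcom_scm_is_available(). */
static bool scm_available;

static bool model_scm_is_available(void)
{
	return scm_available;
}

/* Hypothetical consumer driver state. */
struct client_model {
	bool probed;
	int work_runs;
};

/*
 * Consumer initialization: the only place that has to deal with the
 * provider not being ready. The driver core retries the probe later.
 */
static int client_probe(struct client_model *c)
{
	if (!model_scm_is_available())
		return -EPROBE_DEFER;

	c->probed = true;
	return 0;
}

/*
 * Later, non-probe context (for example a workqueue job). It can only
 * be scheduled after a successful probe, so the provider is known to
 * be available and no deferral code has to travel up this call stack.
 */
static int client_work(struct client_model *c)
{
	if (!c->probed)
		return -1; /* model-only guard: work must not run before probe */

	c->work_runs++; /* a real driver would call qcom_scm_assign_mem() here */
	return 0;
}
```

The design choice is that deferral is decided in exactly one place, so no caller further up has to interpret -EPROBE_DEFER or -EAGAIN outside probe().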