Message ID | 1599741856-16239-2-git-send-email-mansur@codeaurora.org |
---|---|
State | New |
Headers | show |
Series | [v2,1/3] venus: core: handle race condititon for core ops | expand |
On 2020-09-11 15:40, Stanimir Varbanov wrote: > On 9/10/20 3:44 PM, Mansur Alisha Shaik wrote: >> For core ops we are having only write protect but there >> is no read protect, because of this in multthreading >> and concurrency, one CPU core is reading without wait >> which is causing the NULL pointer dereferece crash. >> >> one such scenario is as show below, where in one CPU >> core, core->ops becoming NULL and in another CPU core >> calling core->ops->session_init(). >> >> CPU: core-7: >> Call trace: >> hfi_session_init+0x180/0x1dc [venus_core] >> vdec_queue_setup+0x9c/0x364 [venus_dec] >> vb2_core_reqbufs+0x1e4/0x368 [videobuf2_common] >> vb2_reqbufs+0x4c/0x64 [videobuf2_v4l2] >> v4l2_m2m_reqbufs+0x50/0x84 [v4l2_mem2mem] >> v4l2_m2m_ioctl_reqbufs+0x2c/0x38 [v4l2_mem2mem] >> v4l_reqbufs+0x4c/0x5c >> __video_do_ioctl+0x2b0/0x39c >> >> CPU: core-0: >> Call trace: >> venus_shutdown+0x98/0xfc [venus_core] >> venus_sys_error_handler+0x64/0x148 [venus_core] >> process_one_work+0x210/0x3d0 >> worker_thread+0x248/0x3f4 >> kthread+0x11c/0x12c >> >> Signed-off-by: Mansur Alisha Shaik <mansur@codeaurora.org> >> Acked-by: Stanimir Varbanov <stanimir.varbanov@linaro.org> >> --- >> Changes in V2: >> - Addressed review comments by stan by validating on top >> - of >> https://lore.kernel.org/patchwork/project/lkml/list/?series=455962 >> >> drivers/media/platform/qcom/venus/hfi.c | 5 ++++- >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/media/platform/qcom/venus/hfi.c >> b/drivers/media/platform/qcom/venus/hfi.c >> index a59022a..3137071 100644 >> --- a/drivers/media/platform/qcom/venus/hfi.c >> +++ b/drivers/media/platform/qcom/venus/hfi.c >> @@ -195,7 +195,7 @@ EXPORT_SYMBOL_GPL(hfi_session_create); >> int hfi_session_init(struct venus_inst *inst, u32 pixfmt) >> { >> struct venus_core *core = inst->core; >> - const struct hfi_ops *ops = core->ops; >> + const struct hfi_ops *ops; >> int ret; >> > > If we are in system error recovery the session_init cannot pass > successfully, so we exit early in the function. > > I'd suggest to make it: > > /* If core shutdown is in progress or we are in system error > recovery, > return an error */ > mutex_lock(&core->lock); > if (!core->ops || core->sys_error) { > mutex_unclock(&core->lock); > return -EIO; > } > mutex_unclock(&core->lock); > Tried above suggestion and ran the failed scenario, I didn't see any issue. Posted new version https://lore.kernel.org/patchwork/project/lkml/list/?series=463091 >> if (inst->state != INST_UNINIT) >> @@ -204,10 +204,13 @@ int hfi_session_init(struct venus_inst *inst, >> u32 pixfmt) >> inst->hfi_codec = to_codec_type(pixfmt); >> reinit_completion(&inst->done); >> >> + mutex_lock(&core->lock); >> + ops = core->ops; >> ret = ops->session_init(inst, inst->session_type, inst->hfi_codec); >> if (ret) >> return ret; >> >> + mutex_unlock(&core->lock); >> ret = wait_session_msg(inst); >> if (ret) >> return ret; >>
diff --git a/drivers/media/platform/qcom/venus/hfi.c b/drivers/media/platform/qcom/venus/hfi.c index a59022a..3137071 100644 --- a/drivers/media/platform/qcom/venus/hfi.c +++ b/drivers/media/platform/qcom/venus/hfi.c @@ -195,7 +195,7 @@ EXPORT_SYMBOL_GPL(hfi_session_create); int hfi_session_init(struct venus_inst *inst, u32 pixfmt) { struct venus_core *core = inst->core; - const struct hfi_ops *ops = core->ops; + const struct hfi_ops *ops; int ret; if (inst->state != INST_UNINIT) @@ -204,10 +204,13 @@ int hfi_session_init(struct venus_inst *inst, u32 pixfmt) inst->hfi_codec = to_codec_type(pixfmt); reinit_completion(&inst->done); + mutex_lock(&core->lock); + ops = core->ops; ret = ops->session_init(inst, inst->session_type, inst->hfi_codec); if (ret) return ret; + mutex_unlock(&core->lock); ret = wait_session_msg(inst); if (ret) return ret;