diff mbox series

[6/7] remoteproc: qcom_q6v5_pas: Add hwspinlock bust on stop

Message ID 20240516-hwspinlock-bust-v1-6-47a90a859238@quicinc.com
State New
Headers show
Series Add support for hwspinlock bust | expand

Commit Message

Chris Lew May 16, 2024, 10:58 p.m. UTC
From: Richard Maina <quic_rmaina@quicinc.com>

When remoteproc goes down unexpectedly this results in a state where any
acquired hwspinlocks will remain locked possibly resulting in deadlock.
In order to ensure all locks are freed we include a call to
hwspin_lock_bust() during remoteproc shutdown.

For qcom_q6v5_pas remoteprocs, each remoteproc has an assigned id that
is used to take the hwspinlock. Remoteproc should use this id to try and
bust the lock on remoteproc stop.

This edge case only occurs with q6v5_pas watchdog crashes. The error
fatal case has handling to clear the hwspinlock before the error fatal
interrupt is triggered.

Signed-off-by: Richard Maina <quic_rmaina@quicinc.com>
Signed-off-by: Chris Lew <quic_clew@quicinc.com>
---
 drivers/remoteproc/qcom_q6v5_pas.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

Comments

Mukesh Ojha May 17, 2024, 7:19 a.m. UTC | #1
On 5/17/2024 4:28 AM, Chris Lew wrote:
> From: Richard Maina <quic_rmaina@quicinc.com>
> 
> When remoteproc goes down unexpectedly this results in a state where any
> acquired hwspinlocks will remain locked possibly resulting in deadlock.
> In order to ensure all locks are freed we include a call to
> hwspin_lock_bust() during remoteproc shutdown.
> 
> For qcom_q6v5_pas remoteprocs, each remoteproc has an assigned id that
> is used to take the hwspinlock. Remoteproc should use this id to try and
> bust the lock on remoteproc stop.
> 
> This edge case only occurs with q6v5_pas watchdog crashes. The error
> fatal case has handling to clear the hwspinlock before the error fatal
> interrupt is triggered.
> 
> Signed-off-by: Richard Maina <quic_rmaina@quicinc.com>
> Signed-off-by: Chris Lew <quic_clew@quicinc.com>
> ---
>   drivers/remoteproc/qcom_q6v5_pas.c | 28 ++++++++++++++++++++++++++++
>   1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
> index 54d8005d40a3..57178fcb9aa3 100644
> --- a/drivers/remoteproc/qcom_q6v5_pas.c
> +++ b/drivers/remoteproc/qcom_q6v5_pas.c
> @@ -10,6 +10,7 @@
>   #include <linux/clk.h>
>   #include <linux/delay.h>
>   #include <linux/firmware.h>
> +#include <linux/hwspinlock.h>
>   #include <linux/interrupt.h>
>   #include <linux/kernel.h>
>   #include <linux/module.h>
> @@ -52,6 +53,7 @@ struct adsp_data {
>   	const char *ssr_name;
>   	const char *sysmon_name;
>   	int ssctl_id;
> +	int hwlock_id;
>   
>   	int region_assign_idx;
>   	int region_assign_count;
> @@ -84,6 +86,9 @@ struct qcom_adsp {
>   	bool decrypt_shutdown;
>   	const char *info_name;
>   
> +	struct hwspinlock *hwlock;
> +	int hwlock_id;
> +
>   	const struct firmware *firmware;
>   	const struct firmware *dtb_firmware;
>   
> @@ -399,6 +404,12 @@ static int adsp_stop(struct rproc *rproc)
>   	if (handover)
>   		qcom_pas_handover(&adsp->q6v5);
>   
> +	if (adsp->hwlock) {
> +		ret = hwspin_lock_bust(adsp->hwlock, adsp->hwlock_id);
> +		if (ret)
> +			dev_info(adsp->dev, "failed to bust hwspinlock\n");
> +	}
> +

What if recovery is disabled, fatal case is clearing the hwspinlock but
wdog case is not, right ?

-Mukesh
Mukesh Ojha May 17, 2024, 7:21 a.m. UTC | #2
On 5/17/2024 4:28 AM, Chris Lew wrote:
> From: Richard Maina <quic_rmaina@quicinc.com>
> 
> When remoteproc goes down unexpectedly this results in a state where any
> acquired hwspinlocks will remain locked possibly resulting in deadlock.
> In order to ensure all locks are freed we include a call to
> hwspin_lock_bust() during remoteproc shutdown.
> 
> For qcom_q6v5_pas remoteprocs, each remoteproc has an assigned id that
> is used to take the hwspinlock. Remoteproc should use this id to try and
> bust the lock on remoteproc stop.
> 
> This edge case only occurs with q6v5_pas watchdog crashes. The error
> fatal case has handling to clear the hwspinlock before the error fatal
> interrupt is triggered.
> 
> Signed-off-by: Richard Maina <quic_rmaina@quicinc.com>
> Signed-off-by: Chris Lew <quic_clew@quicinc.com>
> ---
>   drivers/remoteproc/qcom_q6v5_pas.c | 28 ++++++++++++++++++++++++++++
>   1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
> index 54d8005d40a3..57178fcb9aa3 100644
> --- a/drivers/remoteproc/qcom_q6v5_pas.c
> +++ b/drivers/remoteproc/qcom_q6v5_pas.c
> @@ -10,6 +10,7 @@
>   #include <linux/clk.h>
>   #include <linux/delay.h>
>   #include <linux/firmware.h>
> +#include <linux/hwspinlock.h>
>   #include <linux/interrupt.h>
>   #include <linux/kernel.h>
>   #include <linux/module.h>
> @@ -52,6 +53,7 @@ struct adsp_data {
>   	const char *ssr_name;
>   	const char *sysmon_name;
>   	int ssctl_id;
> +	int hwlock_id;
>   
>   	int region_assign_idx;
>   	int region_assign_count;
> @@ -84,6 +86,9 @@ struct qcom_adsp {
>   	bool decrypt_shutdown;
>   	const char *info_name;
>   
> +	struct hwspinlock *hwlock;
> +	int hwlock_id;
> +
>   	const struct firmware *firmware;
>   	const struct firmware *dtb_firmware;
>   
> @@ -399,6 +404,12 @@ static int adsp_stop(struct rproc *rproc)
>   	if (handover)
>   		qcom_pas_handover(&adsp->q6v5);
>   
> +	if (adsp->hwlock) {
> +		ret = hwspin_lock_bust(adsp->hwlock, adsp->hwlock_id);
> +		if (ret)
> +			dev_info(adsp->dev, "failed to bust hwspinlock\n");
> +	}
> +

As you said above, you seem to cover only wdog case and fatal cases
are already handled at remote;

You are clearing here for both ? is this right understanding ?

-Mukesh
Bryan O'Donoghue May 17, 2024, 9:08 a.m. UTC | #3
On 17/05/2024 00:58, Chris Lew wrote:
> From: Richard Maina <quic_rmaina@quicinc.com>
> 
> When remoteproc goes down unexpectedly this results in a state where any
> acquired hwspinlocks will remain locked possibly resulting in deadlock.
> In order to ensure all locks are freed we include a call to
> hwspin_lock_bust() during remoteproc shutdown.
> 
> For qcom_q6v5_pas remoteprocs, each remoteproc has an assigned id that
> is used to take the hwspinlock. Remoteproc should use this id to try and
> bust the lock on remoteproc stop.
> 
> This edge case only occurs with q6v5_pas watchdog crashes. The error
> fatal case has handling to clear the hwspinlock before the error fatal
> interrupt is triggered.
> 
> Signed-off-by: Richard Maina <quic_rmaina@quicinc.com>
> Signed-off-by: Chris Lew <quic_clew@quicinc.com>
> ---

> +	if (adsp->hwlock) {
> +		ret = hwspin_lock_bust(adsp->hwlock, adsp->hwlock_id);
> +		if (ret)
> +			dev_info(adsp->dev, "failed to bust hwspinlock\n");

qcom_hwspinlock_bust() already prints an error on failure, you're 
printing a second error here.

Choose at most one.

---
bod
Chris Lew May 17, 2024, 5:20 p.m. UTC | #4
On 5/17/2024 2:08 AM, Bryan O'Donoghue wrote:
> On 17/05/2024 00:58, Chris Lew wrote:
>> From: Richard Maina <quic_rmaina@quicinc.com>
>>
>> When remoteproc goes down unexpectedly this results in a state where any
>> acquired hwspinlocks will remain locked possibly resulting in deadlock.
>> In order to ensure all locks are freed we include a call to
>> hwspin_lock_bust() during remoteproc shutdown.
>>
>> For qcom_q6v5_pas remoteprocs, each remoteproc has an assigned id that
>> is used to take the hwspinlock. Remoteproc should use this id to try and
>> bust the lock on remoteproc stop.
>>
>> This edge case only occurs with q6v5_pas watchdog crashes. The error
>> fatal case has handling to clear the hwspinlock before the error fatal
>> interrupt is triggered.
>>
>> Signed-off-by: Richard Maina <quic_rmaina@quicinc.com>
>> Signed-off-by: Chris Lew <quic_clew@quicinc.com>
>> ---
> 
>> +    if (adsp->hwlock) {
>> +        ret = hwspin_lock_bust(adsp->hwlock, adsp->hwlock_id);
>> +        if (ret)
>> +            dev_info(adsp->dev, "failed to bust hwspinlock\n");
> 
> qcom_hwspinlock_bust() already prints an error on failure, you're 
> printing a second error here.
> 
> Choose at most one.
> 

Ack, will remove the error print here and leave the one in 
qcom_hwspinlock_bust()

> ---
> bod
Chris Lew May 17, 2024, 5:25 p.m. UTC | #5
On 5/17/2024 12:21 AM, Mukesh Ojha wrote:
> 
> 
> On 5/17/2024 4:28 AM, Chris Lew wrote:
>> From: Richard Maina <quic_rmaina@quicinc.com>
>>
>> When remoteproc goes down unexpectedly this results in a state where any
>> acquired hwspinlocks will remain locked possibly resulting in deadlock.
>> In order to ensure all locks are freed we include a call to
>> hwspin_lock_bust() during remoteproc shutdown.
>>
>> For qcom_q6v5_pas remoteprocs, each remoteproc has an assigned id that
>> is used to take the hwspinlock. Remoteproc should use this id to try and
>> bust the lock on remoteproc stop.
>>
>> This edge case only occurs with q6v5_pas watchdog crashes. The error
>> fatal case has handling to clear the hwspinlock before the error fatal
>> interrupt is triggered.
>>
>> Signed-off-by: Richard Maina <quic_rmaina@quicinc.com>
>> Signed-off-by: Chris Lew <quic_clew@quicinc.com>
>> ---
>>   drivers/remoteproc/qcom_q6v5_pas.c | 28 ++++++++++++++++++++++++++++
>>   1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
>> b/drivers/remoteproc/qcom_q6v5_pas.c
>> index 54d8005d40a3..57178fcb9aa3 100644
>> --- a/drivers/remoteproc/qcom_q6v5_pas.c
>> +++ b/drivers/remoteproc/qcom_q6v5_pas.c
>> @@ -10,6 +10,7 @@
>>   #include <linux/clk.h>
>>   #include <linux/delay.h>
>>   #include <linux/firmware.h>
>> +#include <linux/hwspinlock.h>
>>   #include <linux/interrupt.h>
>>   #include <linux/kernel.h>
>>   #include <linux/module.h>
>> @@ -52,6 +53,7 @@ struct adsp_data {
>>       const char *ssr_name;
>>       const char *sysmon_name;
>>       int ssctl_id;
>> +    int hwlock_id;
>>       int region_assign_idx;
>>       int region_assign_count;
>> @@ -84,6 +86,9 @@ struct qcom_adsp {
>>       bool decrypt_shutdown;
>>       const char *info_name;
>> +    struct hwspinlock *hwlock;
>> +    int hwlock_id;
>> +
>>       const struct firmware *firmware;
>>       const struct firmware *dtb_firmware;
>> @@ -399,6 +404,12 @@ static int adsp_stop(struct rproc *rproc)
>>       if (handover)
>>           qcom_pas_handover(&adsp->q6v5);
>> +    if (adsp->hwlock) {
>> +        ret = hwspin_lock_bust(adsp->hwlock, adsp->hwlock_id);
>> +        if (ret)
>> +            dev_info(adsp->dev, "failed to bust hwspinlock\n");
>> +    }
>> +
> 
> As you said above, you seem to cover only wdog case and fatal cases
> are already handled at remote;
> 
> You are clearing here for both ? is this right understanding ?
> 

Yes that is correct. While the firmware is able to handle error fatal 
cases, I think it is still the responsibility of the remoteproc driver 
to try and bust the lock on behalf of the rproc in both cases.

The bust will only clear the spinlock if it has the token (specific id 
for that rproc) that is passed into hwspin_lock_bust, so it is safe to 
attempt this even if the value has been cleared by the rproc in the 
error fatal case.

> -Mukesh
Chris Lew May 21, 2024, 9:08 p.m. UTC | #6
On 5/21/2024 10:38 AM, Konrad Dybcio wrote:
> 
> 
> On 5/17/24 00:58, Chris Lew wrote:
>> From: Richard Maina <quic_rmaina@quicinc.com>
>>
>> When remoteproc goes down unexpectedly this results in a state where any
>> acquired hwspinlocks will remain locked possibly resulting in deadlock.
>> In order to ensure all locks are freed we include a call to
>> hwspin_lock_bust() during remoteproc shutdown.
>>
>> For qcom_q6v5_pas remoteprocs, each remoteproc has an assigned id that
>> is used to take the hwspinlock. Remoteproc should use this id to try and
>> bust the lock on remoteproc stop.
>>
>> This edge case only occurs with q6v5_pas watchdog crashes. The error
>> fatal case has handling to clear the hwspinlock before the error fatal
>> interrupt is triggered.
>>
>> Signed-off-by: Richard Maina <quic_rmaina@quicinc.com>
>> Signed-off-by: Chris Lew <quic_clew@quicinc.com>
>> ---
> 
> 
>>   drivers/remoteproc/qcom_q6v5_pas.c | 28 ++++++++++++++++++++++++++++
>>   1 file changed, 28 insertions(+)
>>
>> diff --git a/drivers/remoteproc/qcom_q6v5_pas.c 
>> b/drivers/remoteproc/qcom_q6v5_pas.c
>> index 54d8005d40a3..57178fcb9aa3 100644
>> --- a/drivers/remoteproc/qcom_q6v5_pas.c
>> +++ b/drivers/remoteproc/qcom_q6v5_pas.c
>> @@ -10,6 +10,7 @@
>>   #include <linux/clk.h>
>>   #include <linux/delay.h>
>>   #include <linux/firmware.h>
>> +#include <linux/hwspinlock.h>
>>   #include <linux/interrupt.h>
>>   #include <linux/kernel.h>
>>   #include <linux/module.h>
>> @@ -52,6 +53,7 @@ struct adsp_data {
>>       const char *ssr_name;
>>       const char *sysmon_name;
>>       int ssctl_id;
>> +    int hwlock_id;
>>       int region_assign_idx;
>>       int region_assign_count;
>> @@ -84,6 +86,9 @@ struct qcom_adsp {
>>       bool decrypt_shutdown;
>>       const char *info_name;
>> +    struct hwspinlock *hwlock;
>> +    int hwlock_id;
> 
> IIRC, this is the same one that is passed in the DT.
> 
> Can we get it dynamically from there?
> 

The argument passed in DT is the index of the hwlock in the TCSR mutex 
region. The index determines use of hwlock[0..n]

This id is supposed to be the identifier that is passed into 
hwspin_lock_bust(). The actual value that we would read from 
hwlock[0..n] to see if we need to bust the lock.

Maybe the naming of this variable is confusing. Do you have any 
suggestions to make it clearer? Could call it hwlock_bust_id.

We could also try increasing the #hwlock-cells to 2 and have something 
like <&phandle index bust_id>. To me this seemed odd for clients that 
weren't planning on using the bust_id.

> Konrad
diff mbox series

Patch

diff --git a/drivers/remoteproc/qcom_q6v5_pas.c b/drivers/remoteproc/qcom_q6v5_pas.c
index 54d8005d40a3..57178fcb9aa3 100644
--- a/drivers/remoteproc/qcom_q6v5_pas.c
+++ b/drivers/remoteproc/qcom_q6v5_pas.c
@@ -10,6 +10,7 @@ 
 #include <linux/clk.h>
 #include <linux/delay.h>
 #include <linux/firmware.h>
+#include <linux/hwspinlock.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -52,6 +53,7 @@  struct adsp_data {
 	const char *ssr_name;
 	const char *sysmon_name;
 	int ssctl_id;
+	int hwlock_id;
 
 	int region_assign_idx;
 	int region_assign_count;
@@ -84,6 +86,9 @@  struct qcom_adsp {
 	bool decrypt_shutdown;
 	const char *info_name;
 
+	struct hwspinlock *hwlock;
+	int hwlock_id;
+
 	const struct firmware *firmware;
 	const struct firmware *dtb_firmware;
 
@@ -399,6 +404,12 @@  static int adsp_stop(struct rproc *rproc)
 	if (handover)
 		qcom_pas_handover(&adsp->q6v5);
 
+	if (adsp->hwlock) {
+		ret = hwspin_lock_bust(adsp->hwlock, adsp->hwlock_id);
+		if (ret)
+			dev_info(adsp->dev, "failed to bust hwspinlock\n");
+	}
+
 	return ret;
 }
 
@@ -684,6 +695,7 @@  static int adsp_probe(struct platform_device *pdev)
 	struct rproc *rproc;
 	const char *fw_name, *dtb_fw_name = NULL;
 	const struct rproc_ops *ops = &adsp_ops;
+	int hwlock_idx;
 	int ret;
 
 	desc = of_device_get_match_data(&pdev->dev);
@@ -736,6 +748,17 @@  static int adsp_probe(struct platform_device *pdev)
 		adsp->dtb_firmware_name = dtb_fw_name;
 		adsp->dtb_pas_id = desc->dtb_pas_id;
 	}
+
+	if (desc->hwlock_id) {
+		adsp->hwlock_id = desc->hwlock_id;
+		hwlock_idx = of_hwspin_lock_get_id(pdev->dev.of_node, 0);
+		if (hwlock_idx >= 0) {
+			adsp->hwlock = hwspin_lock_request_specific(hwlock_idx);
+			if (!adsp->hwlock)
+				dev_err(&pdev->dev, "failed to request hwspinlock\n");
+		}
+	}
+
 	platform_set_drvdata(pdev, adsp);
 
 	ret = device_init_wakeup(adsp->dev, true);
@@ -1196,6 +1219,7 @@  static const struct adsp_data sm8550_adsp_resource = {
 	.ssr_name = "lpass",
 	.sysmon_name = "adsp",
 	.ssctl_id = 0x14,
+	.hwlock_id = 3,
 };
 
 static const struct adsp_data sm8550_cdsp_resource = {
@@ -1216,6 +1240,7 @@  static const struct adsp_data sm8550_cdsp_resource = {
 	.ssr_name = "cdsp",
 	.sysmon_name = "cdsp",
 	.ssctl_id = 0x17,
+	.hwlock_id = 6,
 };
 
 static const struct adsp_data sm8550_mpss_resource = {
@@ -1236,6 +1261,7 @@  static const struct adsp_data sm8550_mpss_resource = {
 	.ssr_name = "mpss",
 	.sysmon_name = "modem",
 	.ssctl_id = 0x12,
+	.hwlock_id = 2,
 	.region_assign_idx = 2,
 	.region_assign_count = 1,
 	.region_assign_vmid = QCOM_SCM_VMID_MSS_MSA,
@@ -1275,6 +1301,7 @@  static const struct adsp_data sm8650_cdsp_resource = {
 	.ssr_name = "cdsp",
 	.sysmon_name = "cdsp",
 	.ssctl_id = 0x17,
+	.hwlock_id = 6,
 	.region_assign_idx = 2,
 	.region_assign_count = 1,
 	.region_assign_shared = true,
@@ -1299,6 +1326,7 @@  static const struct adsp_data sm8650_mpss_resource = {
 	.ssr_name = "mpss",
 	.sysmon_name = "modem",
 	.ssctl_id = 0x12,
+	.hwlock_id = 2,
 	.region_assign_idx = 2,
 	.region_assign_count = 3,
 	.region_assign_vmid = QCOM_SCM_VMID_MSS_MSA,