diff mbox series

[v3] crypto: Fix hungtask for PADATA_RESET

Message ID 20230904133341.2528440-1-lujialin4@huawei.com
State Accepted
Commit 8f4f68e788c3a7a696546291258bfa5fdb215523

Commit Message

Lu Jialin Sept. 4, 2023, 1:33 p.m. UTC
We found a hungtask bug in test_aead_vec_cfg as follows:

INFO: task cryptomgr_test:391009 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call trace:
 __switch_to+0x98/0xe0
 __schedule+0x6c4/0xf40
 schedule+0xd8/0x1b4
 schedule_timeout+0x474/0x560
 wait_for_common+0x368/0x4e0
 wait_for_completion+0x20/0x30
 wait_for_completion+0x20/0x30
 test_aead_vec_cfg+0xab4/0xd50
 test_aead+0x144/0x1f0
 alg_test_aead+0xd8/0x1e0
 alg_test+0x634/0x890
 cryptomgr_test+0x40/0x70
 kthread+0x1e0/0x220
 ret_from_fork+0x10/0x18
 Kernel panic - not syncing: hung_task: blocked tasks

When padata_do_parallel() returns 0 or -EBUSY, test_aead_vec_cfg calls
wait_for_completion(&wait->completion). In the normal case,
padata_do_parallel() returns 0 and aead_request_complete() is later called
from pcrypt_aead_serial, which wakes the waiter. But when PADATA_RESET is
set in pinst->flags, padata_do_parallel() returns -EBUSY and
aead_request_complete() is never called. test_aead_vec_cfg then hangs at
wait_for_completion(&wait->completion), which triggers the hung task
report.
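
The hang condition described above can be sketched as a small user-space
model. This is an illustration only, not kernel code: caller_waits(),
pcrypt_ret_before_patch(), caller_hangs() and model_self_check() are
hypothetical names, and the wait rule is assumed to match
crypto_wait_req(), which sleeps on the completion for -EINPROGRESS and
-EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed caller rule (modelled on crypto_wait_req()): sleep on the
 * completion only for -EINPROGRESS or -EBUSY. */
bool caller_waits(int err)
{
	return err == -EINPROGRESS || err == -EBUSY;
}

/* Pre-patch return path of pcrypt_aead_encrypt()/decrypt(); padata_err
 * stands in for the value returned by padata_do_parallel(). */
int pcrypt_ret_before_patch(int padata_err)
{
	if (!padata_err)
		return -EINPROGRESS;
	return padata_err;
}

/* In this model the completion fires only when padata_do_parallel()
 * returned 0. A hang is a wait that no completion will ever end. */
bool caller_hangs(int padata_err)
{
	bool completion_will_fire = (padata_err == 0);

	return caller_waits(pcrypt_ret_before_patch(padata_err)) &&
	       !completion_will_fire;
}

/* Hypothetical self-check of the model. */
void model_self_check(void)
{
	assert(!caller_hangs(0));       /* normal case: waiter is woken */
	assert(caller_hangs(-EBUSY));   /* PADATA_RESET case: waits forever */
	assert(!caller_hangs(-EINVAL)); /* other errors: returned, no wait */
}
```

In this model only the -EBUSY case combines "the caller waits" with "no
completion will fire", which matches the call trace above.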

The problem arises as follows:
(padata_do_parallel)                 |
    rcu_read_lock_bh();              |
    err = -EINVAL;                   |   (padata_replace)
                                     |     pinst->flags |= PADATA_RESET;
    err = -EBUSY                     |
    if (pinst->flags & PADATA_RESET) |
        rcu_read_unlock_bh()         |
        return err

To resolve the problem, we replace the -EBUSY return value with -EAGAIN,
which means that parallel_data is changing and the caller should call
again.
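
As a rough user-space illustration of the effect of this change, the
sketch below models the patched return path. pcrypt_map_err(),
caller_sleeps() and map_self_check() are hypothetical names; padata_err
stands in for the value returned by padata_do_parallel(), and the caller
is assumed to sleep only on -EINPROGRESS or -EBUSY, as crypto_wait_req()
does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Model of the patched return path in pcrypt_aead_encrypt()/decrypt(). */
int pcrypt_map_err(int padata_err)
{
	if (!padata_err)
		return -EINPROGRESS;	/* completion will be signalled later */
	if (padata_err == -EBUSY)
		return -EAGAIN;		/* PADATA_RESET: caller should call again */
	return padata_err;		/* any other error is passed through */
}

/* Assumed caller rule: sleep only on -EINPROGRESS or -EBUSY. */
bool caller_sleeps(int err)
{
	return err == -EINPROGRESS || err == -EBUSY;
}

/* Hypothetical self-check of the model. */
void map_self_check(void)
{
	/* Normal case is unchanged: the caller waits and is woken. */
	assert(caller_sleeps(pcrypt_map_err(0)));
	/* PADATA_RESET case: the caller gets -EAGAIN and does not sleep. */
	assert(pcrypt_map_err(-EBUSY) == -EAGAIN);
	assert(!caller_sleeps(pcrypt_map_err(-EBUSY)));
}
```

Because -EAGAIN is outside the set of codes the caller waits on, the
PADATA_RESET case surfaces as an ordinary error return instead of a
wait that never ends.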

v3:
remove retry and just change the return err.
v2:
introduce padata_try_do_parallel() in pcrypt_aead_encrypt and
pcrypt_aead_decrypt to solve the hungtask.

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Signed-off-by: Guo Zihua <guozihua@huawei.com>
---
 crypto/pcrypt.c | 4 ++++
 kernel/padata.c | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

Comments

Lu Jialin Sept. 6, 2023, 7:49 a.m. UTC | #1
Hi Steffen,

padata_do_parallel is only called by pcrypt_aead_encrypt/decrypt, so
changing the return value in padata_do_parallel and changing it in
pcrypt_aead_encrypt/decrypt have the same effect. Either should be ok.

Thanks.

Herbert, both approaches look right. Which do you prefer?

On 2023/9/5 17:45, Steffen Klassert wrote:
> On Mon, Sep 04, 2023 at 01:33:41PM +0000, Lu Jialin wrote:
>> ---
>>   crypto/pcrypt.c | 4 ++++
>>   kernel/padata.c | 2 +-
>>   2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
>> index 8c1d0ca41213..d0d954fe9d54 100644
>> --- a/crypto/pcrypt.c
>> +++ b/crypto/pcrypt.c
>> @@ -117,6 +117,8 @@ static int pcrypt_aead_encrypt(struct aead_request *req)
>>   	err = padata_do_parallel(ictx->psenc, padata, &ctx->cb_cpu);
>>   	if (!err)
>>   		return -EINPROGRESS;
>> +	if (err == -EBUSY)
>> +		return -EAGAIN;
>>   
>>   	return err;
>>   }
>> @@ -164,6 +166,8 @@ static int pcrypt_aead_decrypt(struct aead_request *req)
>>   	err = padata_do_parallel(ictx->psdec, padata, &ctx->cb_cpu);
>>   	if (!err)
>>   		return -EINPROGRESS;
>> +	if (err == -EBUSY)
>> +		return -EAGAIN;
>>   
>>   	return err;
>>   }
>> diff --git a/kernel/padata.c b/kernel/padata.c
>> index 222d60195de6..81c8183f3176 100644
>> --- a/kernel/padata.c
>> +++ b/kernel/padata.c
>> @@ -202,7 +202,7 @@ int padata_do_parallel(struct padata_shell *ps,
>>   		*cb_cpu = cpu;
>>   	}
>>   
>> -	err =  -EBUSY;
>> +	err = -EBUSY;
> Why not just returning -EAGAIN here directly?
>
>
Herbert Xu Sept. 6, 2023, 7:56 a.m. UTC | #2
On Wed, Sep 06, 2023 at 03:49:30PM +0800, Lu Jialin wrote:
> Hi Steffen,
> 
> padata_do_parallel is only called by pcrypt_aead_encrypt/decrypt, therefore,
> changing in padata_do_parallel and changing in pcrypt_aead_encrypt/decrypt
> have the same effect. Both should be ok.
> 
> Thanks.
> 
> Herbert, the two ways look both right. What is your suggestion?

Either way is fine by me.

Thanks,
Herbert Xu Sept. 15, 2023, 10:43 a.m. UTC | #3
On Mon, Sep 04, 2023 at 01:33:41PM +0000, Lu Jialin wrote:
> We found a hungtask bug in test_aead_vec_cfg as follows:
> 
> INFO: task cryptomgr_test:391009 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Call trace:
>  __switch_to+0x98/0xe0
>  __schedule+0x6c4/0xf40
>  schedule+0xd8/0x1b4
>  schedule_timeout+0x474/0x560
>  wait_for_common+0x368/0x4e0
>  wait_for_completion+0x20/0x30
>  wait_for_completion+0x20/0x30
>  test_aead_vec_cfg+0xab4/0xd50
>  test_aead+0x144/0x1f0
>  alg_test_aead+0xd8/0x1e0
>  alg_test+0x634/0x890
>  cryptomgr_test+0x40/0x70
>  kthread+0x1e0/0x220
>  ret_from_fork+0x10/0x18
>  Kernel panic - not syncing: hung_task: blocked tasks
> 
> For padata_do_parallel, when the return err is 0 or -EBUSY, it will call
> wait_for_completion(&wait->completion) in test_aead_vec_cfg. In normal
> case, aead_request_complete() will be called in pcrypt_aead_serial and the
> return err is 0 for padata_do_parallel. But, when pinst->flags is
> PADATA_RESET, the return err is -EBUSY for padata_do_parallel, and it
> won't call aead_request_complete(). Therefore, test_aead_vec_cfg will
> hung at wait_for_completion(&wait->completion), which will cause
> hungtask.
> 
> The problem comes as following:
> (padata_do_parallel)                 |
>     rcu_read_lock_bh();              |
>     err = -EINVAL;                   |   (padata_replace)
>                                      |     pinst->flags |= PADATA_RESET;
>     err = -EBUSY                     |
>     if (pinst->flags & PADATA_RESET) |
>         rcu_read_unlock_bh()         |
>         return err
> 
> In order to resolve the problem, we replace the return err -EBUSY with
> -EAGAIN, which means parallel_data is changing, and the caller should call
> it again.
> 
> v3:
> remove retry and just change the return err.
> v2:
> introduce padata_try_do_parallel() in pcrypt_aead_encrypt and
> pcrypt_aead_decrypt to solve the hungtask.
> 
> Signed-off-by: Lu Jialin <lujialin4@huawei.com>
> Signed-off-by: Guo Zihua <guozihua@huawei.com>
> ---
>  crypto/pcrypt.c | 4 ++++
>  kernel/padata.c | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)

Patch applied.  Thanks.

Patch

diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index 8c1d0ca41213..d0d954fe9d54 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -117,6 +117,8 @@  static int pcrypt_aead_encrypt(struct aead_request *req)
 	err = padata_do_parallel(ictx->psenc, padata, &ctx->cb_cpu);
 	if (!err)
 		return -EINPROGRESS;
+	if (err == -EBUSY)
+		return -EAGAIN;
 
 	return err;
 }
@@ -164,6 +166,8 @@  static int pcrypt_aead_decrypt(struct aead_request *req)
 	err = padata_do_parallel(ictx->psdec, padata, &ctx->cb_cpu);
 	if (!err)
 		return -EINPROGRESS;
+	if (err == -EBUSY)
+		return -EAGAIN;
 
 	return err;
 }
diff --git a/kernel/padata.c b/kernel/padata.c
index 222d60195de6..81c8183f3176 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -202,7 +202,7 @@  int padata_do_parallel(struct padata_shell *ps,
 		*cb_cpu = cpu;
 	}
 
-	err =  -EBUSY;
+	err = -EBUSY;
 	if ((pinst->flags & PADATA_RESET))
 		goto out;