kdb: Fix the deadlock issue in KDB debugging.

Message ID	20240228025602.3087748-1-liu.yeC@h3c.com
State	New
Headers	show Received: from h3cspam02-ex.h3c.com (smtp.h3c.com [60.191.123.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CBF911D685; Wed, 28 Feb 2024 05:11:35 +0000 (UTC) From: LiuYe <liu.yeC@h3c.com> To: <jason.wessel@windriver.com>, <daniel.thompson@linaro.org> CC: <dianders@chromium.org>, <gregkh@linuxfoundation.org>, <jirislaby@kernel.org>, <kgdb-bugreport@lists.sourceforge.net>, <linux-kernel@vger.kernel.org>, <linux-serial@vger.kernel.org>, <liu.yeC@h3c.com> Subject: [PATCH] kdb: Fix the deadlock issue in KDB debugging. Date: Wed, 28 Feb 2024 10:56:02 +0800 Message-ID: <20240228025602.3087748-1-liu.yeC@h3c.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain
Series	kdb: Fix the deadlock issue in KDB debugging. \| expand kdb: Fix the deadlock issue in KDB debugging.

Liuye Feb. 28, 2024, 2:56 a.m. UTC

master cpu : After executing the go command, a deadlock occurs.
slave cpu: may be performing thread migration,
        acquiring the running queue lock of master CPU.
        Then it was interrupted by kdb NMI and entered the nmi_handler process.
        (nmi_handle-> kgdb_nmicallback-> kgdb_cpu_enter
        while(1){ touch wathcdog}.)

example:
 BUG: spinlock lockup suspected on CPU#0, namex/10450
 lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
 ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
 ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
Call Trace:
 [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
 [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
 [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
 [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
 [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
 [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
 [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
 [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
 [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
 [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
 CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
 Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
  0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
  ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
  000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
  Call Trace:
  <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
  [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
  [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
  [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
  [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
  [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
  [<ffffffff8107b371>] insert_work+0x81/0xc0
  [<ffffffff8107b4e5>] __queue_work+0x135/0x390
  [<ffffffff8107b786>] queue_work_on+0x46/0x90
  [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
  [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
  [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
  [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
  [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
  [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
  [<ffffffff8108304d>] notify_die+0x3d/0x50
  [<ffffffff81017219>] do_int3+0x89/0x120
  [<ffffffff81480fb4>] int3+0x44/0x80

Signed-off-by: LiuYe <liu.yeC@h3c.com>
---
 drivers/tty/serial/kgdboc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--
2.25.1

-------------------------------------------------------------------------------------------------------------------------------------
±¾ÓÊ¼þ¼°Æä¸½¼þº¬ÓÐÐÂ»ªÈý¼¯ÍÅµÄ±£ÃÜÐÅÏ¢£¬½öÏÞÓÚ·¢ËÍ¸øÉÏÃæµØÖ·ÖÐÁÐ³ö
µÄ¸öÈË»òÈº×é¡£½ûÖ¹ÈÎºÎÆäËûÈËÒÔÈÎºÎÐÎÊ½Ê¹ÓÃ£¨°üÀ¨µ«²»ÏÞÓÚÈ«²¿»ò²¿·ÖµØÐ¹Â¶¡¢¸´ÖÆ¡¢
»òÉ¢·¢£©±¾ÓÊ¼þÖÐµÄÐÅÏ¢¡£Èç¹ûÄú´íÊÕÁË±¾ÓÊ¼þ£¬ÇëÄúÁ¢¼´µç»°»òÓÊ¼þÍ¨Öª·¢¼þÈË²¢É¾³ý±¾
ÓÊ¼þ£¡
This e-mail and its attachments contain confidential information from New H3C, which is
intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!

Daniel Thompson Feb. 28, 2024, 12:05 p.m. UTC | #1

On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
> master cpu : After executing the go command, a deadlock occurs.
> slave cpu: may be performing thread migration,
>         acquiring the running queue lock of master CPU.
>         Then it was interrupted by kdb NMI and entered the nmi_handler process.
>         (nmi_handle-> kgdb_nmicallback-> kgdb_cpu_enter
>         while(1){ touch wathcdog}.)

I think this description is a little short and doesn't clearly explain
the cause. How about:

Currently, if kgdboc includes 'kdb', then kgdboc will attempt to
use schedule_work() to provoke a keyboard reset when transitioning out
of the debugger and back to normal operation. This can cause
deadlock because schedule_work() is not NMI-safe.

The stack trace below shows an example of the problem. In this
case the master cpu is not running from NMI but it has parked
the slace CPUs using an NMI and the parked CPUs is holding
spinlocks needed by schedule_work().


> example:
>  BUG: spinlock lockup suspected on CPU#0, namex/10450
>  lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888, .owner_cpu: 1
>  ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
>  ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
> Call Trace:
>  [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0
>  [<ffffffff8147a7fc>] ? schedule+0x3c/0x90
>  [<ffffffff8147e71a>] ? schedule_hrtimeout_range_clock+0x10a/0x120
>  [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10
>  [<ffffffff811c839b>] ? ep_scan_ready_list+0x1db/0x1e0
>  [<ffffffff8147e743>] ? schedule_hrtimeout_range+0x13/0x20
>  [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0
>  [<ffffffff8108c540>] ? wake_up_q+0x70/0x70
>  [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
>  [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
>  CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
>  Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
>   0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
>   ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
>   000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
>   Call Trace:
>   <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
>   [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
>   [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
>   [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
>   [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
>   [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
>   [<ffffffff8107b371>] insert_work+0x81/0xc0
>   [<ffffffff8107b4e5>] __queue_work+0x135/0x390
>   [<ffffffff8107b786>] queue_work_on+0x46/0x90
>   [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
>   [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
>   [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
>   [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
>   [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
>   [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
>   [<ffffffff8108304d>] notify_die+0x3d/0x50
>   [<ffffffff81017219>] do_int3+0x89/0x120
>   [<ffffffff81480fb4>] int3+0x44/0x80
>
> Signed-off-by: LiuYe <liu.yeC@h3c.com>
> ---
>  drivers/tty/serial/kgdboc.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
> index 7ce7bb164..945318ef1 100644
> --- a/drivers/tty/serial/kgdboc.c
> +++ b/drivers/tty/serial/kgdboc.c
> @@ -22,6 +22,9 @@
>  #include <linux/module.h>
>  #include <linux/platform_device.h>
>  #include <linux/serial_core.h>
> +#include <linux/smp.h>
> +
> +#include "../kernel/sched/sched.h"
>
>  #define MAX_CONFIG_LEN         40
>
> @@ -399,7 +402,8 @@ static void kgdboc_post_exp_handler(void)
>                 dbg_restore_graphics = 0;
>                 con_debug_leave();
>         }
> -       kgdboc_restore_input();
> +       if (!raw_spin_is_locked(&(cpu_rq(smp_processor_id())->lock)))
> +               kgdboc_restore_input();

I don't think solving this by access internal scheduler state is the
right approach .

The description I wrote above perhaps already suggests why. The
deadlock occurs because it is unsafe to call schedule_work() from
the debug trap handler. The debug trap handler in your stack trace is not
running from an NMI but it certainly has NMI-like properties. Therefore
a better fix is not to call schedule_work() at all from the debug trap
handler.

Instead we need to use an NMI-safe API such as irq_work_queue() and that
irq_work can call schedule_work() and trigger the keyboard reset.


Daniel.

Liuye March 1, 2024, 3:30 a.m. UTC | #2

>On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
>> master cpu : After executing the go command, a deadlock occurs.
>> slave cpu: may be performing thread migration,
>>         acquiring the running queue lock of master CPU.
>>         Then it was interrupted by kdb NMI and entered the nmi_handler process.
>>         (nmi_handle-> kgdb_nmicallback-> kgdb_cpu_enter
>>         while(1){ touch wathcdog}.)
>
>I think this description is a little short and doesn't clearly explain the cause. How about:
>
>Currently, if kgdboc includes 'kdb', then kgdboc will attempt to use schedule_work() to provoke a keyboard reset when transitioning out of the debugger and back to normal operation. This can cause deadlock because schedule_work() is not NMI-safe.
>
>The stack trace below shows an example of the problem. In this case the master cpu is not running from NMI but it has parked the slace CPUs using an NMI and the parked CPUs is holding spinlocks needed by schedule_work().
>
>

Due to the brevity of the description, there may be some misunderstanding, so a detailed description is provided as follows:

before KDB command “go”：

When a specific key is detected by the serial port, it will trigger kgdb_breakpoint, and the master CPU0 will enter the kdb_main_loop to process user commands in a loop.

kgdb_breakpoint
int3
do_int3
notify_die
atomic_notifier_call_chain
__atomic_notifier_call_chain
notifier_call_chain
kgdb_notify
__kgdb_notify
kgdb_handle_exception
kgdb_cpu_enter (kgdb_roundup_cpus send IPI to other slave CPU)
kdb_stub
kdb_main_loop

slave CPU1, CPU2, CPU3 ... and other CPUs:
Using CPU1 as an example:
Currently holding the running queue lock of the master CPU0 due to load_balance or other reasons, responding to the NMI sent by master CPU0 through kgdb_roundup_cpus. Enter the following stack:
nmi_handle
kgdb_nmicallback
kgdb_cpu_enter (The slave CPU1 will loop touch watchdog and wait for the master CPU0 to exit.)

The above is the state before executing the KDB command "go".

When the user executes the KDB command "go", it will trigger a deadlock.
master CPU0 :
kdb_main_loop return
kdb_stub return
kgdb_cpu_enter
kgdboc_post_exp_handler
queue_work_on
__queue_work
insert_work
wake_up_process
try_to_wake_up
_raw_spin_lock （Acquire the spin lock of master CPU0 rq->lock, but at this time the spin lock of master CPU0 is held by CPU1）

As a result, a deadlock has occurred.

Therefore, when the master CPU0 exits, if the rq->lock of CPU0 is locked, it should not wake up the worker on the system_wq.

>> example:
>>  BUG: spinlock lockup suspected on CPU#0, namex/10450
>>  lock: 0xffff881ffe823980, .magic: dead4ead, .owner: namexx/21888,
>> .owner_cpu: 1
>>  ffff881741d00000 ffff881741c01000 0000000000000000 0000000000000000
>>  ffff881740f58e78 ffff881741cffdd0 ffffffff8147a7fc ffff881740f58f20
>> Call Trace:
>>  [<ffffffff81479e6d>] ? __schedule+0x16d/0xac0  [<ffffffff8147a7fc>] ?
>> schedule+0x3c/0x90  [<ffffffff8147e71a>] ?
>> schedule_hrtimeout_range_clock+0x10a/0x120
>>  [<ffffffff8147d22e>] ? mutex_unlock+0xe/0x10  [<ffffffff811c839b>] ?
>> ep_scan_ready_list+0x1db/0x1e0  [<ffffffff8147e743>] ?
>> schedule_hrtimeout_range+0x13/0x20
>>  [<ffffffff811c864a>] ? ep_poll+0x27a/0x3b0  [<ffffffff8108c540>] ?
>> wake_up_q+0x70/0x70  [<ffffffff811c99a8>] ? SyS_epoll_wait+0xb8/0xd0
>> [<ffffffff8147f296>] ? entry_SYSCALL_64_fastpath+0x12/0x75
>>  CPU: 0 PID: 10450 Comm: namex Tainted: G           O    4.4.65 #1
>>  Hardware name: Insyde Purley/Type2 - Board Product Name1, BIOS 05.21.51.0036 07/19/2019
>>   0000000000000000 ffff881ffe813c10 ffffffff8124e883 ffff881741c01000
>>   ffff881ffe823980 ffff881ffe813c38 ffffffff810a7f7f ffff881ffe823980
>>   000000007d2b7cd0 0000000000000001 ffff881ffe813c68 ffffffff810a80e0
>>   Call Trace:
>>   <#DB>  [<ffffffff8124e883>] dump_stack+0x85/0xc2
>>   [<ffffffff810a7f7f>] spin_dump+0x7f/0x100
>>   [<ffffffff810a80e0>] do_raw_spin_lock+0xa0/0x150
>>   [<ffffffff8147eb55>] _raw_spin_lock+0x15/0x20
>>   [<ffffffff8108c256>] try_to_wake_up+0x176/0x3d0
>>   [<ffffffff8108c4c5>] wake_up_process+0x15/0x20
>>   [<ffffffff8107b371>] insert_work+0x81/0xc0
>>   [<ffffffff8107b4e5>] __queue_work+0x135/0x390
>>   [<ffffffff8107b786>] queue_work_on+0x46/0x90
>>   [<ffffffff81313d28>] kgdboc_post_exp_handler+0x48/0x70
>>   [<ffffffff810ed488>] kgdb_cpu_enter+0x598/0x610
>>   [<ffffffff810ed6e2>] kgdb_handle_exception+0xf2/0x1f0
>>   [<ffffffff81054e21>] __kgdb_notify+0x71/0xd0
>>   [<ffffffff81054eb5>] kgdb_notify+0x35/0x70
>>   [<ffffffff81082e6a>] notifier_call_chain+0x4a/0x70
>>   [<ffffffff8108304d>] notify_die+0x3d/0x50
>>   [<ffffffff81017219>] do_int3+0x89/0x120
>>   [<ffffffff81480fb4>] int3+0x44/0x80
>>
>> Signed-off-by: LiuYe <liu.yeC@h3c.com>
>> ---
>>  drivers/tty/serial/kgdboc.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
>> index 7ce7bb164..945318ef1 100644
>> --- a/drivers/tty/serial/kgdboc.c
>> +++ b/drivers/tty/serial/kgdboc.c
>> @@ -22,6 +22,9 @@
>>  #include <linux/module.h>
>>  #include <linux/platform_device.h>
>>  #include <linux/serial_core.h>
>> +#include <linux/smp.h>
>> +
>> +#include "../kernel/sched/sched.h"
>>
>>  #define MAX_CONFIG_LEN         40
>>
>> @@ -399,7 +402,8 @@ static void kgdboc_post_exp_handler(void)
>>                 dbg_restore_graphics = 0;
>>                 con_debug_leave();
>>         }
>> -       kgdboc_restore_input();
>> +       if (!raw_spin_is_locked(&(cpu_rq(smp_processor_id())->lock)))
>> +               kgdboc_restore_input();
>
>I don't think solving this by access internal scheduler state is the right approach .
>
>The description I wrote above perhaps already suggests why. The deadlock occurs because it is unsafe to call schedule_work() from the debug trap handler. The debug trap handler in your stack trace is not running from an NMI but it certainly has NMI-like properties. Therefore a better fix is not to call schedule_work() at all from the debug trap handler.
>
>Instead we need to use an NMI-safe API such as irq_work_queue() and that irq_work can call schedule_work() and trigger the keyboard reset.
>
>
>Daniel.


-------------------------------------------------------------------------------------------------------------------------------------
本邮件及其附件含有新华三集团的保密信息，仅限于发送给上面地址中列出
的个人或群组。禁止任何其他人以任何形式使用（包括但不限于全部或部分地泄露、复制、
或散发）本邮件中的信息。如果您错收了本邮件，请您立即电话或邮件通知发件人并删除本
邮件！
This e-mail and its attachments contain confidential information from New H3C, which is
intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!

Daniel Thompson March 1, 2024, 10:59 a.m. UTC | #3

On Fri, Mar 01, 2024 at 03:30:25AM +0000, Liuye wrote:
> >On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
> >> master cpu : After executing the go command, a deadlock occurs.
> >> slave cpu: may be performing thread migration, acquiring the
> >> running queue lock of master CPU.  Then it was interrupted by kdb
> >> NMI and entered the nmi_handler process.  (nmi_handle->
> >> kgdb_nmicallback-> kgdb_cpu_enter while(1){ touch wathcdog}.)
> >
> >I think this description is a little short and doesn't clearly
> >explain the cause. How about:
> >
> >Currently, if kgdboc includes 'kdb', then kgdboc will attempt to use
> >schedule_work() to provoke a keyboard reset when transitioning out of
> >the debugger and back to normal operation. This can cause deadlock
> >because schedule_work() is not NMI-safe.
> >
> >The stack trace below shows an example of the problem. In this case
> >the master cpu is not running from NMI but it has parked the slace
> >CPUs using an NMI and the parked CPUs is holding spinlocks needed by
> >schedule_work().
>
> Due to the brevity of the description, there may be some
> misunderstanding, so a detailed description is provided as follows:

So, there is a small mistake in the example description I provided.
After double checking the code it should start slightly differently:
  "Currently, if CONFIG_KDB_KEYBOARD is enabled, then kgdboc will
  attempt to use schedule_work() ...".

However other than that I think it is correct.

The important bit of feedback here is that the patch description should
commence with a description of the bug rather than a description of the
symptom. In this case the bug is kgdboc calls a function that is not
safe to call from this calling context.

It is really useful to describe the symptom as part of the patch
description. However if we focus on the symptom without additional
code review then we can end up with the wrong fix. That is what
happened here. It is unsafe to call schedule_work() and checking
the runqueue locks is insufficient to make it safe because we are
still calling a function from an inappropriate calling context..

Daniel.

Greg Kroah-Hartman March 2, 2024, 8:44 p.m. UTC | #4

On Wed, Feb 28, 2024 at 10:56:02AM +0800, LiuYe wrote:
> This e-mail and its attachments contain confidential information from New H3C, which is
> intended only for the person or entity whose address is listed above. Any use of the
> information contained herein in any way (including, but not limited to, total or partial
> disclosure, reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
> by phone or email immediately and delete it!

Now deleted.

Liuye March 12, 2024, 8:37 a.m. UTC | #5

I know that you said schedule_work is not NMI save, which is the first issue. Perhaps it can be fixed using irq_work_queue. But even if irq_work_queue is used to implement it, there will still be a deadlock problem because slave cpu1 still has not released the running queue lock of master CPU0.

Daniel Thompson March 12, 2024, 9:57 a.m. UTC | #6

On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
> I know that you said schedule_work is not NMI save, which is the first
> issue. Perhaps it can be fixed using irq_work_queue. But even if
> irq_work_queue is used to implement it, there will still be a deadlock
> problem because slave cpu1 still has not released the running queue
> lock of master CPU0.

This doesn't sound right to me. Why do you think CPU1 won't release the
run queue lock?


Daniel.

Liuye March 12, 2024, 10:04 a.m. UTC | #7

>On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> I know that you said schedule_work is not NMI save, which is the first 
>> issue. Perhaps it can be fixed using irq_work_queue. But even if 
>> irq_work_queue is used to implement it, there will still be a deadlock 
>> problem because slave cpu1 still has not released the running queue 
>> lock of master CPU0.
>
>This doesn't sound right to me. Why do you think CPU1 won't release the run queue lock?

In this example, CPU1 is waiting for CPU0 to release dbg_slave_lock.

Daniel Thompson March 12, 2024, 10:24 a.m. UTC | #8

On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
> >> I know that you said schedule_work is not NMI save, which is the first
> >> issue. Perhaps it can be fixed using irq_work_queue. But even if
> >> irq_work_queue is used to implement it, there will still be a deadlock
> >> problem because slave cpu1 still has not released the running queue
> >> lock of master CPU0.
> >
> >This doesn't sound right to me. Why do you think CPU1 won't release
> >the run queue lock?
>
> In this example, CPU1 is waiting for CPU0 to release dbg_slave_lock.

That shouldn't be a problem. CPU0 will have released that lock by
the time the irq work is dispatched.


Daniel.

Liuye March 13, 2024, 1:22 a.m. UTC | #9

>On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
>> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> >> I know that you said schedule_work is not NMI save, which is the 
>> >> first issue. Perhaps it can be fixed using irq_work_queue. But even 
>> >> if irq_work_queue is used to implement it, there will still be a 
>> >> deadlock problem because slave cpu1 still has not released the 
>> >> running queue lock of master CPU0.
>> >
>> >This doesn't sound right to me. Why do you think CPU1 won't release 
>> >the run queue lock?
>>
>> In this example, CPU1 is waiting for CPU0 to release dbg_slave_lock.
>
>That shouldn't be a problem. CPU0 will have released that lock by the time the irq work is dispatched.

Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to be handled, and we are back to the previous issue.

Daniel Thompson March 13, 2024, 2:17 p.m. UTC | #10

On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
> >> >> I know that you said schedule_work is not NMI save, which is the
> >> >> first issue. Perhaps it can be fixed using irq_work_queue. But
> >> >> even if irq_work_queue is used to implement it, there will still
> >> >> be a deadlock problem because slave cpu1 still has not released
> >> >> the running queue lock of master CPU0.
> >> >
> >> >This doesn't sound right to me. Why do you think CPU1 won't
> >> >release the run queue lock?
> >>
> >> In this example, CPU1 is waiting for CPU0 to release
> >> dbg_slave_lock.
> >
> >That shouldn't be a problem. CPU0 will have released that lock by the
> >time the irq work is dispatched.
>
> Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to be
> handled, and we are back to the previous issue.

Sorry but I still don't understand what problem you think can happen
here. What is wrong with calling schedule_work() from the IRQ work
handler?

Both irq_work_queue() and schedule_work() are calls to queue deferred
work. It does not matter when the work is queued (providing we are lock
safe). What matters is when the work is actually executed.

Please can you describe the problem you think exists based on when the
work is executed.

Daniel.

Daniel Thompson March 14, 2024, 1:09 p.m. UTC | #11

On Thu, Mar 14, 2024 at 07:06:22AM +0000, Liuye wrote:
> >On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
> >> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
> >> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
> >> >> >> I know that you said schedule_work is not NMI save, which is
> >> >> >> the first issue. Perhaps it can be fixed using
> >> >> >> irq_work_queue. But even if irq_work_queue is used to
> >> >> >> implement it, there will still be a deadlock problem because
> >> >> >> slave cpu1 still has not released the running queue lock of
> >> >> >> master CPU0.
> >> >> >
> >> >> >This doesn't sound right to me. Why do you think CPU1 won't
> >> >> >release the run queue lock?
> >> >>
> >> >> In this example, CPU1 is waiting for CPU0 to release
> >> >> dbg_slave_lock.
> >> >
> >> >That shouldn't be a problem. CPU0 will have released that lock by
> >> >the time the irq work is dispatched.
> >>
> >> Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to
> >> be handled, and we are back to the previous issue.
> >
> > Sorry but I still don't understand what problem you think can happen
> > here. What is wrong with calling schedule_work() from the IRQ work
> > handler?
> >
> > Both irq_work_queue() and schedule_work() are calls to queue deferred
> > work. It does not matter when the work is queued (providing we are
> > lock safe). What matters is when the work is actually executed.
> >
> > Please can you describe the problem you think exists based on when
> > the work is executed.
>
> CPU0 enters the KDB process when processing serial port interrupts and
> triggers an IPI (NMI) to other CPUs.  After entering a stable state,
> CPU0 is in interrupt context, while other CPUs are in NMI context.
> Before other CPUs enter NMI context, there is a chance to obtain the
> running queue of CPU0.

Focusing on the run queue locks in this analysis is a mistake. Before
the other CPUs enter NMI context there is a chance for them to obtain
*any* locks, including the timer wheel locks.


> At this time, when CPU0 is processing kgdboc_restore_input, calling
> schedule_work, need_more_worker here determines the chance to wake up
> processes on system_wq.
>
> This will cause CPU0 to acquire the running queue lock of this core,
> which is held by other CPUs.  but other CPUs are still in NMI context
> and have not exited because waiting for CPU0 to release the
> dbg_slave_lock after schedule_work.
>
> After thinking about it, the problem is not whether schedule_work is
> NMI safe, but that processes on system_wq should not be awakened
> immediately when schedule_work is called.

I disagree with this conclusion.

The problem *is* that schedue_work() is not NMI-safe.

You cannot solve an NMI safety problem by replacing schedule_work()
with another function that is also not NMI-safe. That simply changes
the locks that need to be taken to provoke a deadlock.


> I replaced schedule_work with schedule_delayed_work, and this solved
> my problem.

This may stop some specific reproduction from taking place but it
does not fix the potential deadlock.

I still believe that using irq_work is the only way to solve this
properly. Please try the following change:

diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index 7ce7bb1640054..161b25ecc5e15 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/platform_device.h>
 #include <linux/serial_core.h>
+#include <linux/irq_work.h>

 #define MAX_CONFIG_LEN		40

@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)

 static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);

+static void kgdboc_queue_restore_input_helper(struct irq_work *unused)
+{
+	schedule_work(&kgdboc_restore_input_work);
+}
+
+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, kgdboc_queue_restore_input_helper);
+
 static void kgdboc_restore_input(void)
 {
 	if (likely(system_state == SYSTEM_RUNNING))
-		schedule_work(&kgdboc_restore_input_work);
+		irq_work_queue(&kgdboc_restore_input_irq_work);
 }

 static int kgdboc_register_kbd(char **cptr)
@@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
 			i--;
 		}
 	}
+	irq_work_sync(&kgdboc_restore_input_irq_work);
 	flush_work(&kgdboc_restore_input_work);
 }
 #else /* ! CONFIG_KDB_KEYBOARD */

Liuye March 15, 2024, 9:59 a.m. UTC | #12

>On Thu, Mar 14, 2024 at 07:06:22AM +0000, Liuye wrote:
>> >On Wed, Mar 13, 2024 at 01:22:17AM +0000, Liuye wrote:
>> >> >On Tue, Mar 12, 2024 at 10:04:54AM +0000, Liuye wrote:
>> >> >> >On Tue, Mar 12, 2024 at 08:37:11AM +0000, Liuye wrote:
>> >> >> >> I know that you said schedule_work is not NMI save, which is 
>> >> >> >> the first issue. Perhaps it can be fixed using 
>> >> >> >> irq_work_queue. But even if irq_work_queue is used to 
>> >> >> >> implement it, there will still be a deadlock problem because 
>> >> >> >> slave cpu1 still has not released the running queue lock of 
>> >> >> >> master CPU0.
>> >> >> >
>> >> >> >This doesn't sound right to me. Why do you think CPU1 won't 
>> >> >> >release the run queue lock?
>> >> >>
>> >> >> In this example, CPU1 is waiting for CPU0 to release 
>> >> >> dbg_slave_lock.
>> >> >
>> >> >That shouldn't be a problem. CPU0 will have released that lock by 
>> >> >the time the irq work is dispatched.
>> >>
>> >> Release dbg_slave_lock in CPU0. Before that, shcedule_work needs to 
>> >> be handled, and we are back to the previous issue.
>> >
>> > Sorry but I still don't understand what problem you think can happen 
>> > here. What is wrong with calling schedule_work() from the IRQ work 
>> > handler?
>> >
>> > Both irq_work_queue() and schedule_work() are calls to queue 
>> > deferred work. It does not matter when the work is queued (providing 
>> > we are lock safe). What matters is when the work is actually executed.
>> >
>> > Please can you describe the problem you think exists based on when 
>> > the work is executed.
>>
>> CPU0 enters the KDB process when processing serial port interrupts and 
>> triggers an IPI (NMI) to other CPUs.  After entering a stable state,
>> CPU0 is in interrupt context, while other CPUs are in NMI context.
>> Before other CPUs enter NMI context, there is a chance to obtain the 
>> running queue of CPU0.
>
>Focusing on the run queue locks in this analysis is a mistake. Before the other CPUs enter NMI context there is a chance for them to obtain
>*any* locks, including the timer wheel locks.

Yes, you are right. I did not consider it comprehensively.

>> At this time, when CPU0 is processing kgdboc_restore_input, calling 
>> schedule_work, need_more_worker here determines the chance to wake up 
>> processes on system_wq.
>>
>> This will cause CPU0 to acquire the running queue lock of this core, 
>> which is held by other CPUs.  but other CPUs are still in NMI context 
>> and have not exited because waiting for CPU0 to release the 
>> dbg_slave_lock after schedule_work.
>>
>> After thinking about it, the problem is not whether schedule_work is 
>> NMI safe, but that processes on system_wq should not be awakened 
>> immediately when schedule_work is called.
>
>I disagree with this conclusion.
>
>The problem *is* that schedue_work() is not NMI-safe.
>
>You cannot solve an NMI safety problem by replacing schedule_work() with another function that is also not NMI-safe. That simply changes the locks that need to be taken to provoke a deadlock.
>
>
>> I replaced schedule_work with schedule_delayed_work, and this solved 
>> my problem.
>
>This may stop some specific reproduction from taking place but it does not fix the potential deadlock.
>
>I still believe that using irq_work is the only way to solve this properly. Please try the following change:

I tried the following patch and it also resolved the issue.
Thank you for your guidance and suggestions. I will organize this issue and resend a patch.

>diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c index 7ce7bb1640054..161b25ecc5e15 100644
>--- a/drivers/tty/serial/kgdboc.c
>+++ b/drivers/tty/serial/kgdboc.c
>@@ -22,6 +22,7 @@
> #include <linux/module.h>
> #include <linux/platform_device.h>
> #include <linux/serial_core.h>
>+#include <linux/irq_work.h>
>
> #define MAX_CONFIG_LEN		40
>
>@@ -99,10 +100,17 @@ static void kgdboc_restore_input_helper(struct work_struct *dummy)
>
> static DECLARE_WORK(kgdboc_restore_input_work, kgdboc_restore_input_helper);
>
>+static void kgdboc_queue_restore_input_helper(struct irq_work *unused) 
>+{
>+	schedule_work(&kgdboc_restore_input_work);
>+}
>+
>+static DEFINE_IRQ_WORK(kgdboc_restore_input_irq_work, 
>+kgdboc_queue_restore_input_helper);
>+
> static void kgdboc_restore_input(void)
> {
> 	if (likely(system_state == SYSTEM_RUNNING))
>-		schedule_work(&kgdboc_restore_input_work);
>+		irq_work_queue(&kgdboc_restore_input_irq_work);
> }
>
> static int kgdboc_register_kbd(char **cptr) @@ -133,6 +141,7 @@ static void kgdboc_unregister_kbd(void)
> 			i--;
> 		}
> 	}
>+	irq_work_sync(&kgdboc_restore_input_irq_work);
> 	flush_work(&kgdboc_restore_input_work);
> }
> #else /* ! CONFIG_KDB_KEYBOARD */

kdb: Fix the deadlock issue in KDB debugging.

Commit Message

Comments

Patch