Message ID | 20230822102606.2821311-1-usama.anjum@collabora.com |
---|---|
State | Superseded |
Headers | show |
Series | [v3] tty/sysrq: replace smp_processor_id() with get_cpu() | expand |
On 8/22/23 6:24 PM, Greg Kroah-Hartman wrote: > On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote: >> The smp_processor_id() shouldn't be called from preemptible code. >> Instead use get_cpu() and put_cpu() which disables preemption in >> addition to getting the processor id. This fixes the following bug: >> >> [ 119.143590] sysrq: Show backtrace of all active CPUs >> [ 119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873 >> [ 119.144586] caller is debug_smp_processor_id+0x20/0x30 >> [ 119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3 >> [ 119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023 >> [ 119.145053] Call trace: >> [ 119.145093] dump_backtrace+0x0/0x1a0 >> [ 119.145122] show_stack+0x18/0x70 >> [ 119.145141] dump_stack+0xc4/0x11c >> [ 119.145159] check_preemption_disabled+0x100/0x110 >> [ 119.145175] debug_smp_processor_id+0x20/0x30 >> [ 119.145195] sysrq_handle_showallcpus+0x20/0xc0 >> [ 119.145211] __handle_sysrq+0x8c/0x1a0 >> [ 119.145227] write_sysrq_trigger+0x94/0x12c >> [ 119.145247] proc_reg_write+0xa8/0xe4 >> [ 119.145266] vfs_write+0xec/0x280 >> [ 119.145282] ksys_write+0x6c/0x100 >> [ 119.145298] __arm64_sys_write+0x20/0x30 >> [ 119.145315] el0_svc_common.constprop.0+0x78/0x1e4 >> [ 119.145332] do_el0_svc+0x24/0x8c >> [ 119.145348] el0_svc+0x10/0x20 >> [ 119.145364] el0_sync_handler+0x134/0x140 >> [ 119.145381] el0_sync+0x180/0x1c0 >> >> Cc: stable@vger.kernel.org >> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")This commit had introduced the smp_processor_id() function in sysrq_handle_showallcpus(). > > How has this never shown up before now? What changed to cause this to > now be triggered? This feels odd that no one has seen this in the past > 20+ years :( Not sure. Probably the combination of reproduction has happened now. The following three conditions are needed for the warning to appear: * Enable CONFIG_DEBUG_PREEMPT * Arch which doesn't define arch_trigger_all_cpu_backtrace such as arm64 * Trigger showallcpu's stack sysrq > > >> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >> --- >> Changes since v2: >> - Add changelog and resend >> >> Changes since v1: >> - Add "Cc: stable@vger.kernel.org" tag >> --- >> drivers/tty/sysrq.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c >> index 23198e3f1461a..6b4a28bcf2f5f 100644 >> --- a/drivers/tty/sysrq.c >> +++ b/drivers/tty/sysrq.c >> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key) >> if (in_hardirq()) >> regs = get_irq_regs(); >> >> - pr_info("CPU%d:\n", smp_processor_id()); >> + pr_info("CPU%d:\n", get_cpu()); >> if (regs) >> show_regs(regs); >> else >> show_stack(NULL, NULL, KERN_INFO); >> >> schedule_work(&sysrq_showallcpus); >> + put_cpu(); > > Why are you putting the cpu _after_ you schedule the work? The sysrq_showallcpus work prints stack traces on all CPUs other than the current CPU. So we are re-enabling preemption after scheduling work from current CPU. So that it doesn't get changed. > > thanks, > > greg k-h
On 8/23/23 4:06 PM, Muhammad Usama Anjum wrote: > On 8/22/23 6:24 PM, Greg Kroah-Hartman wrote: >> On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote: >>> The smp_processor_id() shouldn't be called from preemptible code. >>> Instead use get_cpu() and put_cpu() which disables preemption in >>> addition to getting the processor id. This fixes the following bug: >>> >>> [ 119.143590] sysrq: Show backtrace of all active CPUs >>> [ 119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873 >>> [ 119.144586] caller is debug_smp_processor_id+0x20/0x30 >>> [ 119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3 >>> [ 119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023 >>> [ 119.145053] Call trace: >>> [ 119.145093] dump_backtrace+0x0/0x1a0 >>> [ 119.145122] show_stack+0x18/0x70 >>> [ 119.145141] dump_stack+0xc4/0x11c >>> [ 119.145159] check_preemption_disabled+0x100/0x110 >>> [ 119.145175] debug_smp_processor_id+0x20/0x30 >>> [ 119.145195] sysrq_handle_showallcpus+0x20/0xc0 >>> [ 119.145211] __handle_sysrq+0x8c/0x1a0 >>> [ 119.145227] write_sysrq_trigger+0x94/0x12c >>> [ 119.145247] proc_reg_write+0xa8/0xe4 >>> [ 119.145266] vfs_write+0xec/0x280 >>> [ 119.145282] ksys_write+0x6c/0x100 >>> [ 119.145298] __arm64_sys_write+0x20/0x30 >>> [ 119.145315] el0_svc_common.constprop.0+0x78/0x1e4 >>> [ 119.145332] do_el0_svc+0x24/0x8c >>> [ 119.145348] el0_svc+0x10/0x20 >>> [ 119.145364] el0_sync_handler+0x134/0x140 >>> [ 119.145381] el0_sync+0x180/0x1c0 >>> >>> Cc: stable@vger.kernel.org >>> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")This commit had introduced the smp_processor_id() function in > sysrq_handle_showallcpus(). > >> >> How has this never shown up before now? What changed to cause this to >> now be triggered? This feels odd that no one has seen this in the past >> 20+ years :( > Not sure. Probably the combination of reproduction has happened now. The > following three conditions are needed for the warning to appear: > * Enable CONFIG_DEBUG_PREEMPT > * Arch which doesn't define arch_trigger_all_cpu_backtrace such as arm64 > * Trigger showallcpu's stack sysrqAny thoughts? > >> >> >>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >>> --- >>> Changes since v2: >>> - Add changelog and resend >>> >>> Changes since v1: >>> - Add "Cc: stable@vger.kernel.org" tag >>> --- >>> drivers/tty/sysrq.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c >>> index 23198e3f1461a..6b4a28bcf2f5f 100644 >>> --- a/drivers/tty/sysrq.c >>> +++ b/drivers/tty/sysrq.c >>> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key) >>> if (in_hardirq()) >>> regs = get_irq_regs(); >>> >>> - pr_info("CPU%d:\n", smp_processor_id()); >>> + pr_info("CPU%d:\n", get_cpu()); >>> if (regs) >>> show_regs(regs); >>> else >>> show_stack(NULL, NULL, KERN_INFO); >>> >>> schedule_work(&sysrq_showallcpus); >>> + put_cpu(); >> >> Why are you putting the cpu _after_ you schedule the work? > The sysrq_showallcpus work prints stack traces on all CPUs other than the > current CPU. So we are re-enabling preemption after scheduling work from > current CPU. So that it doesn't get changed. >> >> thanks, >> >> greg k-h >
On 8/23/23 4:06 PM, Muhammad Usama Anjum wrote: > On 8/22/23 6:24 PM, Greg Kroah-Hartman wrote: >> On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote: >>> The smp_processor_id() shouldn't be called from preemptible code. >>> Instead use get_cpu() and put_cpu() which disables preemption in >>> addition to getting the processor id. This fixes the following bug: >>> >>> [ 119.143590] sysrq: Show backtrace of all active CPUs >>> [ 119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873 >>> [ 119.144586] caller is debug_smp_processor_id+0x20/0x30 >>> [ 119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3 >>> [ 119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023 >>> [ 119.145053] Call trace: >>> [ 119.145093] dump_backtrace+0x0/0x1a0 >>> [ 119.145122] show_stack+0x18/0x70 >>> [ 119.145141] dump_stack+0xc4/0x11c >>> [ 119.145159] check_preemption_disabled+0x100/0x110 >>> [ 119.145175] debug_smp_processor_id+0x20/0x30 >>> [ 119.145195] sysrq_handle_showallcpus+0x20/0xc0 >>> [ 119.145211] __handle_sysrq+0x8c/0x1a0 >>> [ 119.145227] write_sysrq_trigger+0x94/0x12c >>> [ 119.145247] proc_reg_write+0xa8/0xe4 >>> [ 119.145266] vfs_write+0xec/0x280 >>> [ 119.145282] ksys_write+0x6c/0x100 >>> [ 119.145298] __arm64_sys_write+0x20/0x30 >>> [ 119.145315] el0_svc_common.constprop.0+0x78/0x1e4 >>> [ 119.145332] do_el0_svc+0x24/0x8c >>> [ 119.145348] el0_svc+0x10/0x20 >>> [ 119.145364] el0_sync_handler+0x134/0x140 >>> [ 119.145381] el0_sync+0x180/0x1c0 >>> >>> Cc: stable@vger.kernel.org >>> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")This commit had introduced the smp_processor_id() function in > sysrq_handle_showallcpus(). > >> >> How has this never shown up before now? What changed to cause this to >> now be triggered? This feels odd that no one has seen this in the past >> 20+ years :( > Not sure. Probably the combination of reproduction has happened now. The > following three conditions are needed for the warning to appear: > * Enable CONFIG_DEBUG_PREEMPT > * Arch which doesn't define arch_trigger_all_cpu_backtrace such as arm64 > * Trigger showallcpu's stack sysrq Any thoughts about the patch? > >> >> >>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >>> --- >>> Changes since v2: >>> - Add changelog and resend >>> >>> Changes since v1: >>> - Add "Cc: stable@vger.kernel.org" tag >>> --- >>> drivers/tty/sysrq.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c >>> index 23198e3f1461a..6b4a28bcf2f5f 100644 >>> --- a/drivers/tty/sysrq.c >>> +++ b/drivers/tty/sysrq.c >>> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key) >>> if (in_hardirq()) >>> regs = get_irq_regs(); >>> >>> - pr_info("CPU%d:\n", smp_processor_id()); >>> + pr_info("CPU%d:\n", get_cpu()); >>> if (regs) >>> show_regs(regs); >>> else >>> show_stack(NULL, NULL, KERN_INFO); >>> >>> schedule_work(&sysrq_showallcpus); >>> + put_cpu(); >> >> Why are you putting the cpu _after_ you schedule the work? > The sysrq_showallcpus work prints stack traces on all CPUs other than the > current CPU. So we are re-enabling preemption after scheduling work from > current CPU. So that it doesn't get changed. >> >> thanks, >> >> greg k-h >
On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote: > The smp_processor_id() shouldn't be called from preemptible code. > Instead use get_cpu() and put_cpu() which disables preemption in > addition to getting the processor id. This fixes the following bug: > > [ 119.143590] sysrq: Show backtrace of all active CPUs > [ 119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873 > [ 119.144586] caller is debug_smp_processor_id+0x20/0x30 > [ 119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3 > [ 119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023 > [ 119.145053] Call trace: > [ 119.145093] dump_backtrace+0x0/0x1a0 > [ 119.145122] show_stack+0x18/0x70 > [ 119.145141] dump_stack+0xc4/0x11c > [ 119.145159] check_preemption_disabled+0x100/0x110 > [ 119.145175] debug_smp_processor_id+0x20/0x30 > [ 119.145195] sysrq_handle_showallcpus+0x20/0xc0 > [ 119.145211] __handle_sysrq+0x8c/0x1a0 > [ 119.145227] write_sysrq_trigger+0x94/0x12c > [ 119.145247] proc_reg_write+0xa8/0xe4 > [ 119.145266] vfs_write+0xec/0x280 > [ 119.145282] ksys_write+0x6c/0x100 > [ 119.145298] __arm64_sys_write+0x20/0x30 > [ 119.145315] el0_svc_common.constprop.0+0x78/0x1e4 > [ 119.145332] do_el0_svc+0x24/0x8c > [ 119.145348] el0_svc+0x10/0x20 > [ 119.145364] el0_sync_handler+0x134/0x140 > [ 119.145381] el0_sync+0x180/0x1c0 > > Cc: stable@vger.kernel.org > Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback") > Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> > --- > Changes since v2: > - Add changelog and resend > > Changes since v1: > - Add "Cc: stable@vger.kernel.org" tag > --- > drivers/tty/sysrq.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c > index 23198e3f1461a..6b4a28bcf2f5f 100644 > --- a/drivers/tty/sysrq.c > +++ b/drivers/tty/sysrq.c > @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key) > if (in_hardirq()) > regs = get_irq_regs(); > > - pr_info("CPU%d:\n", smp_processor_id()); > + pr_info("CPU%d:\n", get_cpu()); Why not call put_cpu() right here? > if (regs) > show_regs(regs); > else > show_stack(NULL, NULL, KERN_INFO); > > schedule_work(&sysrq_showallcpus); > + put_cpu(); Why wait so long here after you have scheduled work? Please drop the cpu reference right away, you don't need to hold it for this length of time, right? thanks, greg k-h
On 9/19/23 12:13 PM, Greg Kroah-Hartman wrote: > On Tue, Sep 19, 2023 at 07:52:42AM +0200, Jiri Slaby wrote: >> On 18. 09. 23, 10:10, Greg Kroah-Hartman wrote: >>> On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote: >>>> The smp_processor_id() shouldn't be called from preemptible code. >>>> Instead use get_cpu() and put_cpu() which disables preemption in >>>> addition to getting the processor id. This fixes the following bug: >>>> >>>> [ 119.143590] sysrq: Show backtrace of all active CPUs >>>> [ 119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873 >>>> [ 119.144586] caller is debug_smp_processor_id+0x20/0x30 >>>> [ 119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3 >>>> [ 119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023 >>>> [ 119.145053] Call trace: >>>> [ 119.145093] dump_backtrace+0x0/0x1a0 >>>> [ 119.145122] show_stack+0x18/0x70 >>>> [ 119.145141] dump_stack+0xc4/0x11c >>>> [ 119.145159] check_preemption_disabled+0x100/0x110 >>>> [ 119.145175] debug_smp_processor_id+0x20/0x30 >>>> [ 119.145195] sysrq_handle_showallcpus+0x20/0xc0 >>>> [ 119.145211] __handle_sysrq+0x8c/0x1a0 >>>> [ 119.145227] write_sysrq_trigger+0x94/0x12c >>>> [ 119.145247] proc_reg_write+0xa8/0xe4 >>>> [ 119.145266] vfs_write+0xec/0x280 >>>> [ 119.145282] ksys_write+0x6c/0x100 >>>> [ 119.145298] __arm64_sys_write+0x20/0x30 >>>> [ 119.145315] el0_svc_common.constprop.0+0x78/0x1e4 >>>> [ 119.145332] do_el0_svc+0x24/0x8c >>>> [ 119.145348] el0_svc+0x10/0x20 >>>> [ 119.145364] el0_sync_handler+0x134/0x140 >>>> [ 119.145381] el0_sync+0x180/0x1c0 >>>> >>>> Cc: stable@vger.kernel.org >>>> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback") >>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> >>>> --- >>>> Changes since v2: >>>> - Add changelog and resend >>>> >>>> Changes since v1: >>>> - Add "Cc: stable@vger.kernel.org" tag >>>> --- >>>> drivers/tty/sysrq.c | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c >>>> index 23198e3f1461a..6b4a28bcf2f5f 100644 >>>> --- a/drivers/tty/sysrq.c >>>> +++ b/drivers/tty/sysrq.c >>>> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key) >>>> if (in_hardirq()) >>>> regs = get_irq_regs(); >>>> - pr_info("CPU%d:\n", smp_processor_id()); >>>> + pr_info("CPU%d:\n", get_cpu()); >>> >>> Why not call put_cpu() right here? >>> >>>> if (regs) >>>> show_regs(regs); >>>> else >>>> show_stack(NULL, NULL, KERN_INFO); >>>> schedule_work(&sysrq_showallcpus); >>>> + put_cpu(); >>> >>> Why wait so long here after you have scheduled work? Please drop the >>> cpu reference right away, you don't need to hold it for this length of >>> time, right? >> >> As I understand it, this way, schedule_work() will queue the work on the >> "gotten" (current) CPU. So sysrq_showregs_othercpus() will really dump other >> than the "gotten" cpu. > > Ok, that makes a bit more sense, but that's not what the code does > today, have people seen the regs dumped from the wrong cpu in the past? > >> If that is the case, it indeed should have been described in the commit log. Thanks for review. I'll add the explanation in the commit log and send again. > > Agreed. > > thanks for the review, > > greg k-h
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c index 23198e3f1461a..6b4a28bcf2f5f 100644 --- a/drivers/tty/sysrq.c +++ b/drivers/tty/sysrq.c @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key) if (in_hardirq()) regs = get_irq_regs(); - pr_info("CPU%d:\n", smp_processor_id()); + pr_info("CPU%d:\n", get_cpu()); if (regs) show_regs(regs); else show_stack(NULL, NULL, KERN_INFO); schedule_work(&sysrq_showallcpus); + put_cpu(); } }
The smp_processor_id() shouldn't be called from preemptible code. Instead use get_cpu() and put_cpu() which disables preemption in addition to getting the processor id. This fixes the following bug: [ 119.143590] sysrq: Show backtrace of all active CPUs [ 119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873 [ 119.144586] caller is debug_smp_processor_id+0x20/0x30 [ 119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3 [ 119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023 [ 119.145053] Call trace: [ 119.145093] dump_backtrace+0x0/0x1a0 [ 119.145122] show_stack+0x18/0x70 [ 119.145141] dump_stack+0xc4/0x11c [ 119.145159] check_preemption_disabled+0x100/0x110 [ 119.145175] debug_smp_processor_id+0x20/0x30 [ 119.145195] sysrq_handle_showallcpus+0x20/0xc0 [ 119.145211] __handle_sysrq+0x8c/0x1a0 [ 119.145227] write_sysrq_trigger+0x94/0x12c [ 119.145247] proc_reg_write+0xa8/0xe4 [ 119.145266] vfs_write+0xec/0x280 [ 119.145282] ksys_write+0x6c/0x100 [ 119.145298] __arm64_sys_write+0x20/0x30 [ 119.145315] el0_svc_common.constprop.0+0x78/0x1e4 [ 119.145332] do_el0_svc+0x24/0x8c [ 119.145348] el0_svc+0x10/0x20 [ 119.145364] el0_sync_handler+0x134/0x140 [ 119.145381] el0_sync+0x180/0x1c0 Cc: stable@vger.kernel.org Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback") Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> --- Changes since v2: - Add changelog and resend Changes since v1: - Add "Cc: stable@vger.kernel.org" tag --- drivers/tty/sysrq.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)