diff mbox series

[v3] tty/sysrq: replace smp_processor_id() with get_cpu()

Message ID 20230822102606.2821311-1-usama.anjum@collabora.com
State Superseded
Headers show
Series [v3] tty/sysrq: replace smp_processor_id() with get_cpu() | expand

Commit Message

Muhammad Usama Anjum Aug. 22, 2023, 10:26 a.m. UTC
The smp_processor_id() shouldn't be called from preemptible code.
Instead use get_cpu() and put_cpu() which disables preemption in
addition to getting the processor id. This fixes the following bug:

[  119.143590] sysrq: Show backtrace of all active CPUs
[  119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873
[  119.144586] caller is debug_smp_processor_id+0x20/0x30
[  119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3
[  119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023
[  119.145053] Call trace:
[  119.145093]  dump_backtrace+0x0/0x1a0
[  119.145122]  show_stack+0x18/0x70
[  119.145141]  dump_stack+0xc4/0x11c
[  119.145159]  check_preemption_disabled+0x100/0x110
[  119.145175]  debug_smp_processor_id+0x20/0x30
[  119.145195]  sysrq_handle_showallcpus+0x20/0xc0
[  119.145211]  __handle_sysrq+0x8c/0x1a0
[  119.145227]  write_sysrq_trigger+0x94/0x12c
[  119.145247]  proc_reg_write+0xa8/0xe4
[  119.145266]  vfs_write+0xec/0x280
[  119.145282]  ksys_write+0x6c/0x100
[  119.145298]  __arm64_sys_write+0x20/0x30
[  119.145315]  el0_svc_common.constprop.0+0x78/0x1e4
[  119.145332]  do_el0_svc+0x24/0x8c
[  119.145348]  el0_svc+0x10/0x20
[  119.145364]  el0_sync_handler+0x134/0x140
[  119.145381]  el0_sync+0x180/0x1c0

Cc: stable@vger.kernel.org
Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
Changes since v2:
- Add changelog and resend

Changes since v1:
- Add "Cc: stable@vger.kernel.org" tag
---
 drivers/tty/sysrq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Muhammad Usama Anjum Aug. 23, 2023, 11:06 a.m. UTC | #1
On 8/22/23 6:24 PM, Greg Kroah-Hartman wrote:
> On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote:
>> The smp_processor_id() shouldn't be called from preemptible code.
>> Instead use get_cpu() and put_cpu() which disables preemption in
>> addition to getting the processor id. This fixes the following bug:
>>
>> [  119.143590] sysrq: Show backtrace of all active CPUs
>> [  119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873
>> [  119.144586] caller is debug_smp_processor_id+0x20/0x30
>> [  119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3
>> [  119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023
>> [  119.145053] Call trace:
>> [  119.145093]  dump_backtrace+0x0/0x1a0
>> [  119.145122]  show_stack+0x18/0x70
>> [  119.145141]  dump_stack+0xc4/0x11c
>> [  119.145159]  check_preemption_disabled+0x100/0x110
>> [  119.145175]  debug_smp_processor_id+0x20/0x30
>> [  119.145195]  sysrq_handle_showallcpus+0x20/0xc0
>> [  119.145211]  __handle_sysrq+0x8c/0x1a0
>> [  119.145227]  write_sysrq_trigger+0x94/0x12c
>> [  119.145247]  proc_reg_write+0xa8/0xe4
>> [  119.145266]  vfs_write+0xec/0x280
>> [  119.145282]  ksys_write+0x6c/0x100
>> [  119.145298]  __arm64_sys_write+0x20/0x30
>> [  119.145315]  el0_svc_common.constprop.0+0x78/0x1e4
>> [  119.145332]  do_el0_svc+0x24/0x8c
>> [  119.145348]  el0_svc+0x10/0x20
>> [  119.145364]  el0_sync_handler+0x134/0x140
>> [  119.145381]  el0_sync+0x180/0x1c0
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")This commit had introduced the smp_processor_id() function in
sysrq_handle_showallcpus().

> 
> How has this never shown up before now?  What changed to cause this to
> now be triggered?  This feels odd that no one has seen this in the past
> 20+ years :(
Not sure. Probably the combination of reproduction has happened now. The
following three conditions are needed for the warning to appear:
* Enable CONFIG_DEBUG_PREEMPT
* Arch which doesn't define arch_trigger_all_cpu_backtrace such as arm64
* Trigger showallcpu's stack sysrq

> 
> 
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>> ---
>> Changes since v2:
>> - Add changelog and resend
>>
>> Changes since v1:
>> - Add "Cc: stable@vger.kernel.org" tag
>> ---
>>  drivers/tty/sysrq.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
>> index 23198e3f1461a..6b4a28bcf2f5f 100644
>> --- a/drivers/tty/sysrq.c
>> +++ b/drivers/tty/sysrq.c
>> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key)
>>  		if (in_hardirq())
>>  			regs = get_irq_regs();
>>  
>> -		pr_info("CPU%d:\n", smp_processor_id());
>> +		pr_info("CPU%d:\n", get_cpu());
>>  		if (regs)
>>  			show_regs(regs);
>>  		else
>>  			show_stack(NULL, NULL, KERN_INFO);
>>  
>>  		schedule_work(&sysrq_showallcpus);
>> +		put_cpu();
> 
> Why are you putting the cpu _after_ you schedule the work?
The sysrq_showallcpus work prints stack traces on all CPUs other than the
current CPU. So we are re-enabling preemption after scheduling work from
current CPU. So that it doesn't get changed.
> 
> thanks,
> 
> greg k-h
Muhammad Usama Anjum Aug. 30, 2023, 7:34 a.m. UTC | #2
On 8/23/23 4:06 PM, Muhammad Usama Anjum wrote:
> On 8/22/23 6:24 PM, Greg Kroah-Hartman wrote:
>> On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote:
>>> The smp_processor_id() shouldn't be called from preemptible code.
>>> Instead use get_cpu() and put_cpu() which disables preemption in
>>> addition to getting the processor id. This fixes the following bug:
>>>
>>> [  119.143590] sysrq: Show backtrace of all active CPUs
>>> [  119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873
>>> [  119.144586] caller is debug_smp_processor_id+0x20/0x30
>>> [  119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3
>>> [  119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023
>>> [  119.145053] Call trace:
>>> [  119.145093]  dump_backtrace+0x0/0x1a0
>>> [  119.145122]  show_stack+0x18/0x70
>>> [  119.145141]  dump_stack+0xc4/0x11c
>>> [  119.145159]  check_preemption_disabled+0x100/0x110
>>> [  119.145175]  debug_smp_processor_id+0x20/0x30
>>> [  119.145195]  sysrq_handle_showallcpus+0x20/0xc0
>>> [  119.145211]  __handle_sysrq+0x8c/0x1a0
>>> [  119.145227]  write_sysrq_trigger+0x94/0x12c
>>> [  119.145247]  proc_reg_write+0xa8/0xe4
>>> [  119.145266]  vfs_write+0xec/0x280
>>> [  119.145282]  ksys_write+0x6c/0x100
>>> [  119.145298]  __arm64_sys_write+0x20/0x30
>>> [  119.145315]  el0_svc_common.constprop.0+0x78/0x1e4
>>> [  119.145332]  do_el0_svc+0x24/0x8c
>>> [  119.145348]  el0_svc+0x10/0x20
>>> [  119.145364]  el0_sync_handler+0x134/0x140
>>> [  119.145381]  el0_sync+0x180/0x1c0
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")This commit had introduced the smp_processor_id() function in
> sysrq_handle_showallcpus().
> 
>>
>> How has this never shown up before now?  What changed to cause this to
>> now be triggered?  This feels odd that no one has seen this in the past
>> 20+ years :(
> Not sure. Probably the combination of reproduction has happened now. The
> following three conditions are needed for the warning to appear:
> * Enable CONFIG_DEBUG_PREEMPT
> * Arch which doesn't define arch_trigger_all_cpu_backtrace such as arm64
> * Trigger showallcpu's stack sysrqAny thoughts?

> 
>>
>>
>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>> ---
>>> Changes since v2:
>>> - Add changelog and resend
>>>
>>> Changes since v1:
>>> - Add "Cc: stable@vger.kernel.org" tag
>>> ---
>>>  drivers/tty/sysrq.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
>>> index 23198e3f1461a..6b4a28bcf2f5f 100644
>>> --- a/drivers/tty/sysrq.c
>>> +++ b/drivers/tty/sysrq.c
>>> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key)
>>>  		if (in_hardirq())
>>>  			regs = get_irq_regs();
>>>  
>>> -		pr_info("CPU%d:\n", smp_processor_id());
>>> +		pr_info("CPU%d:\n", get_cpu());
>>>  		if (regs)
>>>  			show_regs(regs);
>>>  		else
>>>  			show_stack(NULL, NULL, KERN_INFO);
>>>  
>>>  		schedule_work(&sysrq_showallcpus);
>>> +		put_cpu();
>>
>> Why are you putting the cpu _after_ you schedule the work?
> The sysrq_showallcpus work prints stack traces on all CPUs other than the
> current CPU. So we are re-enabling preemption after scheduling work from
> current CPU. So that it doesn't get changed.
>>
>> thanks,
>>
>> greg k-h
>
Muhammad Usama Anjum Sept. 13, 2023, 5:30 a.m. UTC | #3
On 8/23/23 4:06 PM, Muhammad Usama Anjum wrote:
> On 8/22/23 6:24 PM, Greg Kroah-Hartman wrote:
>> On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote:
>>> The smp_processor_id() shouldn't be called from preemptible code.
>>> Instead use get_cpu() and put_cpu() which disables preemption in
>>> addition to getting the processor id. This fixes the following bug:
>>>
>>> [  119.143590] sysrq: Show backtrace of all active CPUs
>>> [  119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873
>>> [  119.144586] caller is debug_smp_processor_id+0x20/0x30
>>> [  119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3
>>> [  119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023
>>> [  119.145053] Call trace:
>>> [  119.145093]  dump_backtrace+0x0/0x1a0
>>> [  119.145122]  show_stack+0x18/0x70
>>> [  119.145141]  dump_stack+0xc4/0x11c
>>> [  119.145159]  check_preemption_disabled+0x100/0x110
>>> [  119.145175]  debug_smp_processor_id+0x20/0x30
>>> [  119.145195]  sysrq_handle_showallcpus+0x20/0xc0
>>> [  119.145211]  __handle_sysrq+0x8c/0x1a0
>>> [  119.145227]  write_sysrq_trigger+0x94/0x12c
>>> [  119.145247]  proc_reg_write+0xa8/0xe4
>>> [  119.145266]  vfs_write+0xec/0x280
>>> [  119.145282]  ksys_write+0x6c/0x100
>>> [  119.145298]  __arm64_sys_write+0x20/0x30
>>> [  119.145315]  el0_svc_common.constprop.0+0x78/0x1e4
>>> [  119.145332]  do_el0_svc+0x24/0x8c
>>> [  119.145348]  el0_svc+0x10/0x20
>>> [  119.145364]  el0_sync_handler+0x134/0x140
>>> [  119.145381]  el0_sync+0x180/0x1c0
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")This commit had introduced the smp_processor_id() function in
> sysrq_handle_showallcpus().
> 
>>
>> How has this never shown up before now?  What changed to cause this to
>> now be triggered?  This feels odd that no one has seen this in the past
>> 20+ years :(
> Not sure. Probably the combination of reproduction has happened now. The
> following three conditions are needed for the warning to appear:
> * Enable CONFIG_DEBUG_PREEMPT
> * Arch which doesn't define arch_trigger_all_cpu_backtrace such as arm64
> * Trigger showallcpu's stack sysrq

Any thoughts about the patch?

> 
>>
>>
>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>> ---
>>> Changes since v2:
>>> - Add changelog and resend
>>>
>>> Changes since v1:
>>> - Add "Cc: stable@vger.kernel.org" tag
>>> ---
>>>  drivers/tty/sysrq.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
>>> index 23198e3f1461a..6b4a28bcf2f5f 100644
>>> --- a/drivers/tty/sysrq.c
>>> +++ b/drivers/tty/sysrq.c
>>> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key)
>>>  		if (in_hardirq())
>>>  			regs = get_irq_regs();
>>>  
>>> -		pr_info("CPU%d:\n", smp_processor_id());
>>> +		pr_info("CPU%d:\n", get_cpu());
>>>  		if (regs)
>>>  			show_regs(regs);
>>>  		else
>>>  			show_stack(NULL, NULL, KERN_INFO);
>>>  
>>>  		schedule_work(&sysrq_showallcpus);
>>> +		put_cpu();
>>
>> Why are you putting the cpu _after_ you schedule the work?
> The sysrq_showallcpus work prints stack traces on all CPUs other than the
> current CPU. So we are re-enabling preemption after scheduling work from
> current CPU. So that it doesn't get changed.
>>
>> thanks,
>>
>> greg k-h
>
Greg KH Sept. 18, 2023, 8:10 a.m. UTC | #4
On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote:
> The smp_processor_id() shouldn't be called from preemptible code.
> Instead use get_cpu() and put_cpu() which disables preemption in
> addition to getting the processor id. This fixes the following bug:
> 
> [  119.143590] sysrq: Show backtrace of all active CPUs
> [  119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873
> [  119.144586] caller is debug_smp_processor_id+0x20/0x30
> [  119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3
> [  119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023
> [  119.145053] Call trace:
> [  119.145093]  dump_backtrace+0x0/0x1a0
> [  119.145122]  show_stack+0x18/0x70
> [  119.145141]  dump_stack+0xc4/0x11c
> [  119.145159]  check_preemption_disabled+0x100/0x110
> [  119.145175]  debug_smp_processor_id+0x20/0x30
> [  119.145195]  sysrq_handle_showallcpus+0x20/0xc0
> [  119.145211]  __handle_sysrq+0x8c/0x1a0
> [  119.145227]  write_sysrq_trigger+0x94/0x12c
> [  119.145247]  proc_reg_write+0xa8/0xe4
> [  119.145266]  vfs_write+0xec/0x280
> [  119.145282]  ksys_write+0x6c/0x100
> [  119.145298]  __arm64_sys_write+0x20/0x30
> [  119.145315]  el0_svc_common.constprop.0+0x78/0x1e4
> [  119.145332]  do_el0_svc+0x24/0x8c
> [  119.145348]  el0_svc+0x10/0x20
> [  119.145364]  el0_sync_handler+0x134/0x140
> [  119.145381]  el0_sync+0x180/0x1c0
> 
> Cc: stable@vger.kernel.org
> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> ---
> Changes since v2:
> - Add changelog and resend
> 
> Changes since v1:
> - Add "Cc: stable@vger.kernel.org" tag
> ---
>  drivers/tty/sysrq.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> index 23198e3f1461a..6b4a28bcf2f5f 100644
> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key)
>  		if (in_hardirq())
>  			regs = get_irq_regs();
>  
> -		pr_info("CPU%d:\n", smp_processor_id());
> +		pr_info("CPU%d:\n", get_cpu());

Why not call put_cpu() right here?

>  		if (regs)
>  			show_regs(regs);
>  		else
>  			show_stack(NULL, NULL, KERN_INFO);
>  
>  		schedule_work(&sysrq_showallcpus);
> +		put_cpu();

Why wait so long here after you have scheduled work?  Please drop the
cpu reference right away, you don't need to hold it for this length of
time, right?

thanks,

greg k-h
Muhammad Usama Anjum Oct. 9, 2023, 4:01 p.m. UTC | #5
On 9/19/23 12:13 PM, Greg Kroah-Hartman wrote:
> On Tue, Sep 19, 2023 at 07:52:42AM +0200, Jiri Slaby wrote:
>> On 18. 09. 23, 10:10, Greg Kroah-Hartman wrote:
>>> On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote:
>>>> The smp_processor_id() shouldn't be called from preemptible code.
>>>> Instead use get_cpu() and put_cpu() which disables preemption in
>>>> addition to getting the processor id. This fixes the following bug:
>>>>
>>>> [  119.143590] sysrq: Show backtrace of all active CPUs
>>>> [  119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873
>>>> [  119.144586] caller is debug_smp_processor_id+0x20/0x30
>>>> [  119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3
>>>> [  119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023
>>>> [  119.145053] Call trace:
>>>> [  119.145093]  dump_backtrace+0x0/0x1a0
>>>> [  119.145122]  show_stack+0x18/0x70
>>>> [  119.145141]  dump_stack+0xc4/0x11c
>>>> [  119.145159]  check_preemption_disabled+0x100/0x110
>>>> [  119.145175]  debug_smp_processor_id+0x20/0x30
>>>> [  119.145195]  sysrq_handle_showallcpus+0x20/0xc0
>>>> [  119.145211]  __handle_sysrq+0x8c/0x1a0
>>>> [  119.145227]  write_sysrq_trigger+0x94/0x12c
>>>> [  119.145247]  proc_reg_write+0xa8/0xe4
>>>> [  119.145266]  vfs_write+0xec/0x280
>>>> [  119.145282]  ksys_write+0x6c/0x100
>>>> [  119.145298]  __arm64_sys_write+0x20/0x30
>>>> [  119.145315]  el0_svc_common.constprop.0+0x78/0x1e4
>>>> [  119.145332]  do_el0_svc+0x24/0x8c
>>>> [  119.145348]  el0_svc+0x10/0x20
>>>> [  119.145364]  el0_sync_handler+0x134/0x140
>>>> [  119.145381]  el0_sync+0x180/0x1c0
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")
>>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>>> ---
>>>> Changes since v2:
>>>> - Add changelog and resend
>>>>
>>>> Changes since v1:
>>>> - Add "Cc: stable@vger.kernel.org" tag
>>>> ---
>>>>   drivers/tty/sysrq.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
>>>> index 23198e3f1461a..6b4a28bcf2f5f 100644
>>>> --- a/drivers/tty/sysrq.c
>>>> +++ b/drivers/tty/sysrq.c
>>>> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key)
>>>>   		if (in_hardirq())
>>>>   			regs = get_irq_regs();
>>>> -		pr_info("CPU%d:\n", smp_processor_id());
>>>> +		pr_info("CPU%d:\n", get_cpu());
>>>
>>> Why not call put_cpu() right here?
>>>
>>>>   		if (regs)
>>>>   			show_regs(regs);
>>>>   		else
>>>>   			show_stack(NULL, NULL, KERN_INFO);
>>>>   		schedule_work(&sysrq_showallcpus);
>>>> +		put_cpu();
>>>
>>> Why wait so long here after you have scheduled work?  Please drop the
>>> cpu reference right away, you don't need to hold it for this length of
>>> time, right?
>>
>> As I understand it, this way, schedule_work() will queue the work on the
>> "gotten" (current) CPU. So sysrq_showregs_othercpus() will really dump other
>> than the "gotten" cpu.
> 
> Ok, that makes a bit more sense, but that's not what the code does
> today, have people seen the regs dumped from the wrong cpu in the past?
> 
>> If that is the case, it indeed should have been described in the commit log.
Thanks for review. I'll add the explanation in the commit log and send again.

> 
> Agreed.
> 
> thanks for the review,
> 
> greg k-h
diff mbox series

Patch

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 23198e3f1461a..6b4a28bcf2f5f 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -262,13 +262,14 @@  static void sysrq_handle_showallcpus(u8 key)
 		if (in_hardirq())
 			regs = get_irq_regs();
 
-		pr_info("CPU%d:\n", smp_processor_id());
+		pr_info("CPU%d:\n", get_cpu());
 		if (regs)
 			show_regs(regs);
 		else
 			show_stack(NULL, NULL, KERN_INFO);
 
 		schedule_work(&sysrq_showallcpus);
+		put_cpu();
 	}
 }