
[5.10.162-rt78] Restore initialization of wake_q_sleeper.next in fork.c

Message ID 20230320193731.GA36840@zipoli.concurrent-rt.com
State New
Series [5.10.162-rt78] Restore initialization of wake_q_sleeper.next in fork.c

Commit Message

Joe Korty March 20, 2023, 7:37 p.m. UTC
In the transition from 5.10.158-rt77 to 5.10.162-rt78,
the initialization of task_struct::wake_q_sleeper.next
was dropped.  Restore it.

This appears to be a problem only in 5.10.  5.15 does not
have wake_q_sleeper; 4.19 does have it, and its
initialization there is still present.

The 5.10.162-rt78 patch that damaged fork.c is:

   0170-locking-rtmutex-add-sleeping-lock-implementation.patch

I do not have a simple test that brings out this problem.
My test consists of a shell script and eight binaries; the
binaries are all written in Ada.  strace shows that it does
a few thousand forks in rapid succession.  One of the forks
stalls, and no fork after that one returns.  Eventually the
122-second stall is reported, and a large number of threads
are shown waiting for tasklist_lock, either in do_exit() or
in copy_process().  The kernel .config has PREEMPT_RT and
many debug features enabled, lockdep included.

Signed-off-by: Joe Korty <joe.korty@concurrent-rt.com>
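The Ada test is not available.  As a hypothetical starting point only (this is not the reporter's test and is not known to reproduce the stall), a C loop that forks thousands of short-lived children back to back exercises the same fork/exit/wait path that strace showed.  The function name fork_storm() and its return convention are illustrative assumptions.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fork storm: create `count` children back to back; each
 * child exits immediately and is reaped before the next fork. Returns
 * how many children were reaped with exit status 0, or -1 if fork()
 * fails. */
static int fork_storm(int count)
{
	int reaped = 0;

	for (int i = 0; i < count; i++) {
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0)
			_exit(0);	/* child: leave at once, skipping atexit/stdio flush */

		int status;
		if (waitpid(pid, &status, 0) == pid &&
		    WIFEXITED(status) && WEXITSTATUS(status) == 0)
			reaped++;
	}
	return reaped;
}
```

For example, calling fork_storm(5000) from a main() run under `strace -f` approximates "a few thousand forks in rapid succession".  The loop is deliberately sequential; the real test may fork from several processes at once, which this sketch does not attempt.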

Comments

Luis Claudio R. Goncalves March 20, 2023, 8 p.m. UTC | #1
On Mon, Mar 20, 2023 at 03:37:31PM -0400, Joe Korty wrote:
> [...]

Joe, thank you for investigating that problem and for writing a patch.

Earlier today Steffen Dirkwinkel sent a similar patch:

    https://lore.kernel.org/all/20230320080347.32434-1-linux@steffen.cc/

Would you mind giving your ACK to his patch? I have that patch queued for
my next build already.

Thank you,
Luis
 
Joe Korty March 20, 2023, 8:04 p.m. UTC | #2
On Mon, Mar 20, 2023 at 05:00:13PM -0300, Luis Claudio R. Goncalves wrote:
> On Mon, Mar 20, 2023 at 03:37:31PM -0400, Joe Korty wrote:
> > [...]
> 
> Joe, thank you for investigating that problem and for writing a patch.
> 
> Earlier today Steffen Dirkwinkel sent a similar patch:
> 
>     https://lore.kernel.org/all/20230320080347.32434-1-linux@steffen.cc/
> 
> Would you mind giving your ACK to his patch? I have that patch queued for
> my next build already.

Acked-by: Joe Korty <joe.korty@concurrent-rt.com>

Patch

Index: b/kernel/fork.c
===================================================================
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -960,6 +960,7 @@ static struct task_struct *dup_task_stru
 	tsk->splice_pipe = NULL;
 	tsk->task_frag.page = NULL;
 	tsk->wake_q.next = NULL;
+	tsk->wake_q_sleeper.next = NULL;
 	tsk->pf_io_worker = NULL;
 
 	account_kernel_stack(tsk, 1);
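The thread does not spell out why the missing line leads to a hang.  One plausible mechanism, consistent with the symptoms but not confirmed in the thread, is sketched below as a single-threaded userspace model rather than kernel code.  dup_task_struct() starts the child as a copy of the parent, and mainline's __wake_q_add() treats a node whose next pointer is already non-NULL as "already queued" and skips it.  If the parent's wake_q_sleeper happened to be on a wake queue at the moment of the copy, the child would inherit a stale next pointer and its sleeper wakeups would be silently dropped.  The struct layouts, the non-atomic add, and the helper names wake_q_add_node() and dup_task() are simplifying assumptions; only WAKE_Q_TAIL and the NULL-means-unqueued convention mirror the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of the kernel's struct wake_q_node: a NULL ->next
 * means "not on any wake queue". */
struct wake_q_node {
	struct wake_q_node *next;
};

/* End-of-queue sentinel, same value as the kernel's WAKE_Q_TAIL. */
#define WAKE_Q_TAIL ((struct wake_q_node *)0x01)

struct wake_q_head {
	struct wake_q_node *first;
	struct wake_q_node **lastp;
};

/* Hypothetical cut-down task_struct holding only the two wake_q nodes. */
struct task {
	struct wake_q_node wake_q;
	struct wake_q_node wake_q_sleeper;
};

static void wake_q_init(struct wake_q_head *head)
{
	head->first = WAKE_Q_TAIL;
	head->lastp = &head->first;
}

/* Models __wake_q_add(): claim the node by moving ->next from NULL to
 * WAKE_Q_TAIL (the kernel uses cmpxchg; this model is single-threaded).
 * A non-NULL ->next is taken to mean "already queued", so the add is
 * skipped and this wakeup never happens. */
static bool wake_q_add_node(struct wake_q_head *head, struct wake_q_node *node)
{
	if (node->next != NULL)
		return false;
	node->next = WAKE_Q_TAIL;
	*head->lastp = node;
	head->lastp = &node->next;
	return true;
}

/* Models dup_task_struct(): the child begins as a byte copy of the
 * parent, then per-task fields are reset. reset_sleeper selects whether
 * the assignment restored by the patch is present. */
static void dup_task(struct task *child, const struct task *parent,
		     bool reset_sleeper)
{
	memcpy(child, parent, sizeof(*child));
	child->wake_q.next = NULL;
	if (reset_sleeper)
		child->wake_q_sleeper.next = NULL;
}
```

In this model, queueing the parent's wake_q_sleeper and then calling dup_task(&child, &parent, false) leaves the child's node looking queued, so wake_q_add_node() on it returns false.  Passing true, which corresponds to the restored `tsk->wake_q_sleeper.next = NULL;`, lets the add succeed.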