Message ID | 1592cc2b0418a0512c83898dbef0b1c9722e8645.1618310545.git.asml.silence@gmail.com |
---|---|
State | Accepted |
Commit | c7d95613c7d6e003969722a290397b8271bdad17 |
Series | io_uring: fix early sqd_list removal sqpoll hangs |
On 13/04/2021 11:43, Pavel Begunkov wrote:
> [ 245.463317] INFO: task iou-sqp-1374:1377 blocked for more than 122 seconds.
> [ 245.463334] task:iou-sqp-1374 state:D flags:0x00004000
> [ 245.463345] Call Trace:
> [ 245.463352] __schedule+0x36b/0x950
> [ 245.463376] schedule+0x68/0xe0
> [ 245.463385] __io_uring_cancel+0xfb/0x1a0
> [ 245.463407] do_exit+0xc0/0xb40
> [ 245.463423] io_sq_thread+0x49b/0x710
> [ 245.463445] ret_from_fork+0x22/0x30
>
> It happens when sqpoll forgot to run park_task_work and goes to exit,
> then exiting user may remove ctx from sqd_list, and so corresponding
> io_sq_thread() -> io_uring_cancel_sqpoll() won't be executed. Hopefully
> it just stucks in do_exit() in this case.

fwiw, it's actually a 5.12 problem and I have a reliable enough way to reproduce it.

> Cc: stable@vger.kernel.org
> Reported-by: Joakim Hassila <joj@mac.com>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  fs/io_uring.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index cadd7a65a7f4..f390914666b1 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6817,6 +6817,9 @@ static int io_sq_thread(void *data)
>  	current->flags |= PF_NO_SETAFFINITY;
>
>  	mutex_lock(&sqd->lock);
> +	/* a user may had exited before the thread wstarted */
> +	io_run_task_work_head(&sqd->park_task_work);
> +
>  	while (!test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state)) {
>  		int ret;
>  		bool cap_entries, sqt_spin, needs_sched;
> @@ -6833,10 +6836,10 @@ static int io_sq_thread(void *data)
>  			}
>  			cond_resched();
>  			mutex_lock(&sqd->lock);
> -			if (did_sig)
> -				break;
>  			io_run_task_work();
>  			io_run_task_work_head(&sqd->park_task_work);
> +			if (did_sig)
> +				break;
>  			timeout = jiffies + sqd->sq_thread_idle;
>  			continue;
>  		}
>

--
Pavel Begunkov
On 4/13/21 4:43 AM, Pavel Begunkov wrote:
> [ 245.463317] INFO: task iou-sqp-1374:1377 blocked for more than 122 seconds.
> [ 245.463334] task:iou-sqp-1374 state:D flags:0x00004000
> [ 245.463345] Call Trace:
> [ 245.463352] __schedule+0x36b/0x950
> [ 245.463376] schedule+0x68/0xe0
> [ 245.463385] __io_uring_cancel+0xfb/0x1a0
> [ 245.463407] do_exit+0xc0/0xb40
> [ 245.463423] io_sq_thread+0x49b/0x710
> [ 245.463445] ret_from_fork+0x22/0x30
>
> It happens when sqpoll forgot to run park_task_work and goes to exit,
> then exiting user may remove ctx from sqd_list, and so corresponding
> io_sq_thread() -> io_uring_cancel_sqpoll() won't be executed. Hopefully
> it just stucks in do_exit() in this case.

Added for 5.12, thanks.

--
Jens Axboe
diff --git a/fs/io_uring.c b/fs/io_uring.c
index cadd7a65a7f4..f390914666b1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6817,6 +6817,9 @@ static int io_sq_thread(void *data)
 	current->flags |= PF_NO_SETAFFINITY;
 
 	mutex_lock(&sqd->lock);
+	/* a user may had exited before the thread wstarted */
+	io_run_task_work_head(&sqd->park_task_work);
+
 	while (!test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state)) {
 		int ret;
 		bool cap_entries, sqt_spin, needs_sched;
@@ -6833,10 +6836,10 @@ static int io_sq_thread(void *data)
 			}
 			cond_resched();
 			mutex_lock(&sqd->lock);
-			if (did_sig)
-				break;
 			io_run_task_work();
 			io_run_task_work_head(&sqd->park_task_work);
+			if (did_sig)
+				break;
 			timeout = jiffies + sqd->sq_thread_idle;
 			continue;
 		}
[ 245.463317] INFO: task iou-sqp-1374:1377 blocked for more than 122 seconds.
[ 245.463334] task:iou-sqp-1374 state:D flags:0x00004000
[ 245.463345] Call Trace:
[ 245.463352] __schedule+0x36b/0x950
[ 245.463376] schedule+0x68/0xe0
[ 245.463385] __io_uring_cancel+0xfb/0x1a0
[ 245.463407] do_exit+0xc0/0xb40
[ 245.463423] io_sq_thread+0x49b/0x710
[ 245.463445] ret_from_fork+0x22/0x30

It happens when sqpoll forgot to run park_task_work and goes to exit,
then exiting user may remove ctx from sqd_list, and so corresponding
io_sq_thread() -> io_uring_cancel_sqpoll() won't be executed. Hopefully
it just stucks in do_exit() in this case.

Cc: stable@vger.kernel.org
Reported-by: Joakim Hassila <joj@mac.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
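
The ordering issue described in the commit message can be illustrated with a small userspace sketch. This is not kernel code and not the patch itself: `work_queue`, `queue_work`, `run_pending`, `cancel_ctx`, `iteration_old` and `iteration_new` are hypothetical names invented for the illustration, with `run_pending` loosely standing in for `io_run_task_work_head(&sqd->park_task_work)` and `did_sig` mirroring the flag in the diff. It only shows why checking the exit condition before draining queued work can leave that work stranded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for sqd->park_task_work: a small callback queue. */
#define MAX_WORK 8

struct work_queue {
	void (*fn[MAX_WORK])(int *);
	size_t len;
};

static void queue_work(struct work_queue *q, void (*fn)(int *))
{
	assert(q->len < MAX_WORK);
	q->fn[q->len++] = fn;
}

/* Run and clear everything that is currently queued. */
static void run_pending(struct work_queue *q, int *cancelled)
{
	for (size_t i = 0; i < q->len; i++)
		q->fn[i](cancelled);
	q->len = 0;
}

/* Example work item: records one completed cancellation. */
static void cancel_ctx(int *cancelled)
{
	(*cancelled)++;
}

/*
 * Old ordering: the exit check comes first, so on a signal the thread
 * leaves the loop while queued work is still pending.
 * Returns true if the thread should exit.
 */
static bool iteration_old(struct work_queue *q, bool did_sig, int *cancelled)
{
	if (did_sig)
		return true;
	run_pending(q, cancelled);
	return false;
}

/*
 * New ordering: drain queued work first, then honour the exit request,
 * so nothing is left behind when the thread goes to exit.
 */
static bool iteration_new(struct work_queue *q, bool did_sig, int *cancelled)
{
	run_pending(q, cancelled);
	return did_sig;
}
```

With one item queued and `did_sig` set, `iteration_old` asks the caller to exit while the item is still in the queue, whereas `iteration_new` runs it before asking to exit. The first hunk of the patch applies the same idea before the loop is entered at all, covering the case where the user exits before the thread has started.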