Message ID | 20250327150406.138736-1-senozhatsky@chromium.org |
---|---|
State | Superseded |
Headers | show |
Series | [PATCHv3] thunderbolt: do not double dequeue a request | expand |
On (25/03/28 00:03), Sergey Senozhatsky wrote: > Some of our devices crash in tb_cfg_request_dequeue(): > > general protection fault, probably for non-canonical address 0xdead000000000122 > > CPU: 6 PID: 91007 Comm: kworker/6:2 Tainted: G U W 6.6.65 > RIP: 0010:tb_cfg_request_dequeue+0x2d/0xa0 > Call Trace: > <TASK> > ? tb_cfg_request_dequeue+0x2d/0xa0 > tb_cfg_request_work+0x33/0x80 > worker_thread+0x386/0x8f0 > kthread+0xed/0x110 > ret_from_fork+0x38/0x50 > ret_from_fork_asm+0x1b/0x30 > > The circumstances are unclear, however, the theory is that > tb_cfg_request_work() can be scheduled twice for a request: > first time via frame.callback from ring_work() and second > time from tb_cfg_request(). Both times kworkers will execute > tb_cfg_request_dequeue(), which results in double list_del() > from the ctl->request_queue (the list poison deference hints > at it: 0xdead000000000122). > > Do not dequeue requests that don't have TB_CFG_REQUEST_ACTIVE > bit set. Mika, as was discussed in [1] thread we rolled out the fix to our fleet and we don't see the crashes anymore. So it's tested and verified. [1] https://lore.kernel.org/linux-kernel/20250327145543.GC3152277@black.fi.intel.com > --- > > v3: tweaked commit message > > drivers/thunderbolt/ctl.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c > index cd15e84c47f4..1db2e951b53f 100644 > --- a/drivers/thunderbolt/ctl.c > +++ b/drivers/thunderbolt/ctl.c > @@ -151,6 +151,11 @@ static void tb_cfg_request_dequeue(struct tb_cfg_request *req) > struct tb_ctl *ctl = req->ctl; > > mutex_lock(&ctl->request_queue_lock); > + if (!test_bit(TB_CFG_REQUEST_ACTIVE, &req->flags)) { > + mutex_unlock(&ctl->request_queue_lock); > + return; > + } > + > list_del(&req->list); > clear_bit(TB_CFG_REQUEST_ACTIVE, &req->flags); > if (test_bit(TB_CFG_REQUEST_CANCELED, &req->flags)) > -- > 2.49.0.395.g12beb8f557-goog >
On Thu, May 08, 2025 at 12:47:18PM +0900, Sergey Senozhatsky wrote: > On (25/03/28 00:03), Sergey Senozhatsky wrote: > > Some of our devices crash in tb_cfg_request_dequeue(): > > > > general protection fault, probably for non-canonical address 0xdead000000000122 > > > > CPU: 6 PID: 91007 Comm: kworker/6:2 Tainted: G U W 6.6.65 > > RIP: 0010:tb_cfg_request_dequeue+0x2d/0xa0 > > Call Trace: > > <TASK> > > ? tb_cfg_request_dequeue+0x2d/0xa0 > > tb_cfg_request_work+0x33/0x80 > > worker_thread+0x386/0x8f0 > > kthread+0xed/0x110 > > ret_from_fork+0x38/0x50 > > ret_from_fork_asm+0x1b/0x30 > > > > The circumstances are unclear, however, the theory is that > > tb_cfg_request_work() can be scheduled twice for a request: > > first time via frame.callback from ring_work() and second > > time from tb_cfg_request(). Both times kworkers will execute > > tb_cfg_request_dequeue(), which results in double list_del() > > from the ctl->request_queue (the list poison deference hints > > at it: 0xdead000000000122). > > > > Do not dequeue requests that don't have TB_CFG_REQUEST_ACTIVE > > bit set. > > Mika, as was discussed in [1] thread we rolled out the fix to > our fleet and we don't see the crashes anymore. So it's tested > and verified. Cool, thanks! Applied to thunderbolt.git/fixes.
diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c index cd15e84c47f4..1db2e951b53f 100644 --- a/drivers/thunderbolt/ctl.c +++ b/drivers/thunderbolt/ctl.c @@ -151,6 +151,11 @@ static void tb_cfg_request_dequeue(struct tb_cfg_request *req) struct tb_ctl *ctl = req->ctl; mutex_lock(&ctl->request_queue_lock); + if (!test_bit(TB_CFG_REQUEST_ACTIVE, &req->flags)) { + mutex_unlock(&ctl->request_queue_lock); + return; + } + list_del(&req->list); clear_bit(TB_CFG_REQUEST_ACTIVE, &req->flags); if (test_bit(TB_CFG_REQUEST_CANCELED, &req->flags))
Some of our devices crash in tb_cfg_request_dequeue(): general protection fault, probably for non-canonical address 0xdead000000000122 CPU: 6 PID: 91007 Comm: kworker/6:2 Tainted: G U W 6.6.65 RIP: 0010:tb_cfg_request_dequeue+0x2d/0xa0 Call Trace: <TASK> ? tb_cfg_request_dequeue+0x2d/0xa0 tb_cfg_request_work+0x33/0x80 worker_thread+0x386/0x8f0 kthread+0xed/0x110 ret_from_fork+0x38/0x50 ret_from_fork_asm+0x1b/0x30 The circumstances are unclear, however, the theory is that tb_cfg_request_work() can be scheduled twice for a request: first time via frame.callback from ring_work() and second time from tb_cfg_request(). Both times kworkers will execute tb_cfg_request_dequeue(), which results in double list_del() from the ctl->request_queue (the list poison deference hints at it: 0xdead000000000122). Do not dequeue requests that don't have TB_CFG_REQUEST_ACTIVE bit set. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: stable@vger.kernel.org --- v3: tweaked commit message drivers/thunderbolt/ctl.c | 5 +++++ 1 file changed, 5 insertions(+)