Message ID | 20240415113436.3261042-1-vschneid@redhat.com
---|---
Series | tcp/dcpp: Un-pin tw_timer
Apologies for the delayed reply, I was away for most of last week;

On 16/04/24 17:01, Eric Dumazet wrote:
> On Mon, Apr 15, 2024 at 4:33 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>
>> On 15/04/24 14:35, Eric Dumazet wrote:
>> > On Mon, Apr 15, 2024 at 1:34 PM Valentin Schneider <vschneid@redhat.com> wrote:
>> >> v4 -> v5
>> >> ++++++++
>> >>
>> >> o Rebased against latest Linus' tree
>> >> o Converted tw_timer into a delayed work following Jakub's bug report on v4
>> >>   http://lore.kernel.org/r/20240411100536.224fa1e7@kernel.org
>> >
>> > What was the issue again ?
>> >
>> > Please explain precisely why it was fundamentally tied to the use of
>> > timers (and this was not possible to fix the issue without
>> > adding work queues and more dependencies to TCP stack)
>>
>> In v4 I added the use of the ehash lock to serialize arming the timewait
>> timer vs destroying it (inet_twsk_schedule() vs inet_twsk_deschedule_put()).
>>
>> Unfortunately, holding a lock both in a timer callback and in the context
>> in which it is destroyed is invalid. AIUI the issue is as follows:
>>
>>   CPUx                              CPUy
>>   spin_lock(foo);
>>                                     <timer fires>
>>                                     call_timer_fn()
>>                                       spin_lock(foo) // blocks
>>   timer_shutdown_sync()
>>     __timer_delete_sync()
>>       __try_to_del_timer_sync() // looped as long as timer is running
>>   <deadlock>
>>
>> In our case, we had in v4:
>>
>>   inet_twsk_deschedule_put()
>>     spin_lock(ehash_lock);
>>                                     tw_timer_handler()
>>                                       inet_twsk_kill()
>>                                         spin_lock(ehash_lock);
>>     __inet_twsk_kill();
>>       timer_shutdown_sync(&tw->tw_timer);
>>
>> The fix here is to move the timer deletion to a non-timer
>> context. Workqueues fit the bill, and as the tw_timer_handler() would just queue
>> a work item, I converted it to a delayed_work.
>
> I do not like this delayed work approach.
>
> Adding more dependencies to the TCP stack is not very nice from a
> maintainer point of view.
>
> Why couldn't you call timer_shutdown_sync() before grabbing the lock ?

We need the timer_shutdown_sync() and mod_timer() of tw->tw_timer to be
serialized in some way. If they aren't, we have the following race:

  tcp_time_wait()
    inet_twsk_hashdance()
                                    inet_twsk_deschedule_put()
                                      // Returns 0 because not pending, but prevents future arming
                                      timer_shutdown_sync()
    inet_twsk_schedule()
      // Returns 0 as if timer had been successfully armed
      mod_timer()

This means inet_twsk_deschedule_put() doesn't end up calling
inet_twsk_kill() (because the timer wasn't pending when it got shut down),
but inet_twsk_schedule() doesn't arm it either despite the hashdance()
having updated the refcounts.

If we leave the deschedule as a del_timer_sync(), the timer ends up armed
in inet_twsk_schedule(), but that means waiting for the timer to fire to
clean up the resources despite having called inet_twsk_deschedule_put().
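The deadlock described above reduces to the following self-contained sketch. The foo_* names are purely illustrative (they do not appear in the timewait code); the only point is the pattern of holding a lock around timer_shutdown_sync() while the timer callback takes the same lock:

#include <linux/spinlock.h>
#include <linux/timer.h>

/* Illustrative names only; not taken from the timewait code. */
static DEFINE_SPINLOCK(foo_lock);
static struct timer_list foo_timer;

static void foo_timer_fn(struct timer_list *t)
{
	spin_lock(&foo_lock);	/* spins while another CPU holds foo_lock */
	/* ... teardown work running from the timer callback ... */
	spin_unlock(&foo_lock);
}

static void foo_init(void)
{
	timer_setup(&foo_timer, foo_timer_fn, 0);
}

static void foo_destroy(void)
{
	spin_lock(&foo_lock);
	/*
	 * If foo_timer_fn() is already running on another CPU and spinning
	 * on foo_lock, timer_shutdown_sync() waits for the callback to
	 * finish while this CPU keeps holding the lock the callback needs:
	 * neither side can make progress.
	 */
	timer_shutdown_sync(&foo_timer);
	spin_unlock(&foo_lock);
}

In the v4 call chains quoted above, the role of foo_lock is played by the ehash lock, foo_destroy() corresponds to inet_twsk_deschedule_put()/__inet_twsk_kill(), and foo_timer_fn() to tw_timer_handler()/inet_twsk_kill().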
Hi,

On 22/04/24 16:31, Valentin Schneider wrote:
> Apologies for the delayed reply, I was away for most of last week;
>
> On 16/04/24 17:01, Eric Dumazet wrote:
>> On Mon, Apr 15, 2024 at 4:33 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>>
>>> On 15/04/24 14:35, Eric Dumazet wrote:
>>> > On Mon, Apr 15, 2024 at 1:34 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>> >> v4 -> v5
>>> >> ++++++++
>>> >>
>>> >> o Rebased against latest Linus' tree
>>> >> o Converted tw_timer into a delayed work following Jakub's bug report on v4
>>> >>   http://lore.kernel.org/r/20240411100536.224fa1e7@kernel.org
>>> >
>>> > What was the issue again ?
>>> >
>>> > Please explain precisely why it was fundamentally tied to the use of
>>> > timers (and this was not possible to fix the issue without
>>> > adding work queues and more dependencies to TCP stack)
>>>
>>> In v4 I added the use of the ehash lock to serialize arming the timewait
>>> timer vs destroying it (inet_twsk_schedule() vs inet_twsk_deschedule_put()).
>>>
>>> Unfortunately, holding a lock both in a timer callback and in the context
>>> in which it is destroyed is invalid. AIUI the issue is as follows:
>>>
>>>   CPUx                              CPUy
>>>   spin_lock(foo);
>>>                                     <timer fires>
>>>                                     call_timer_fn()
>>>                                       spin_lock(foo) // blocks
>>>   timer_shutdown_sync()
>>>     __timer_delete_sync()
>>>       __try_to_del_timer_sync() // looped as long as timer is running
>>>   <deadlock>
>>>
>>> In our case, we had in v4:
>>>
>>>   inet_twsk_deschedule_put()
>>>     spin_lock(ehash_lock);
>>>                                     tw_timer_handler()
>>>                                       inet_twsk_kill()
>>>                                         spin_lock(ehash_lock);
>>>     __inet_twsk_kill();
>>>       timer_shutdown_sync(&tw->tw_timer);
>>>
>>> The fix here is to move the timer deletion to a non-timer
>>> context. Workqueues fit the bill, and as the tw_timer_handler() would just queue
>>> a work item, I converted it to a delayed_work.

Does this explanation make sense? This is the reasoning that drove me to
involve workqueues. I'm open to suggestions on alternative approaches.
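To make the "converted it to a delayed_work" step concrete, here is a minimal sketch of that shape, with made-up tw_like/tw_expire_* names. It is not the actual patch and deliberately leaves out how arming and cancellation are serialized against each other, which is the part the series has to get right:

#include <linux/kernel.h>
#include <linux/workqueue.h>

/* Made-up structure for illustration; not the real inet_timewait_sock. */
struct tw_like {
	struct delayed_work	expire_work;	/* replaces struct timer_list */
	/* ... hash linkage, refcounts, etc. ... */
};

/* Runs from a workqueue worker (process context), not a timer callback. */
static void tw_expire_workfn(struct work_struct *work)
{
	struct tw_like *tw = container_of(to_delayed_work(work),
					  struct tw_like, expire_work);

	/* the teardown that used to live in the timer handler goes here */
	(void)tw;
}

static void tw_like_init(struct tw_like *tw)
{
	INIT_DELAYED_WORK(&tw->expire_work, tw_expire_workfn);
}

/* Arm the expiry, in the role inet_twsk_schedule() plays. */
static void tw_like_schedule(struct tw_like *tw, unsigned long timeo)
{
	schedule_delayed_work(&tw->expire_work, timeo);
}

/* Tear down early, in the role inet_twsk_deschedule_put() plays. */
static void tw_like_deschedule(struct tw_like *tw)
{
	/*
	 * The usual rule still applies: do not call this while holding a
	 * lock that tw_expire_workfn() also takes. The property the
	 * conversion buys is only that the expiry logic no longer runs
	 * from a timer callback, so its deletion happens in process
	 * context rather than racing a softirq timer handler.
	 */
	cancel_delayed_work_sync(&tw->expire_work);
}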