Message ID | 1604660094-123959-1-git-send-email-zhengchuan@huawei.com |
---|---|
State | New |
Headers | show |
Series | [v2] migration/multifd: close TLS channel before socket finalize | expand |
I think i have found it why. When we create tls client in migration_tls_client_create(), we reference tioc->master. As for main migration thread, it will do dereference after migration_channel_connect in socket_outgoing_migration(). As for non-TLS migration, it will do another reference in qemu_fopen_channel_output(ioc) of migration_channel_connect(). In a conclusion, we need to dereference the underlying QIOChannelSocket after tls handshake for multifd-TLS channel. The fix patch is sent and waiting for review. https://www.mail-archive.com/qemu-devel@nongnu.org/msg759110.html On 2020/11/10 19:56, Zheng Chuan wrote: > > > On 2020/11/10 19:01, Daniel P. Berrangé wrote: >> On Tue, Nov 10, 2020 at 06:45:45PM +0800, Zheng Chuan wrote: >>> >>> >>> On 2020/11/10 18:12, Daniel P. Berrangé wrote: >>>> On Fri, Nov 06, 2020 at 06:54:54PM +0800, Chuan Zheng wrote: >>>>> Since we now support tls multifd, when we cancel migration, the TLS >>>>> sockets will be left as CLOSE-WAIT On Src which results in socket >>>>> leak. >>>>> Fix it by closing TLS channel before socket finalize. >>>>> >>>>> Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> >>>>> --- >>>>> migration/multifd.c | 14 ++++++++++++++ >>>>> 1 file changed, 14 insertions(+) >>>>> >>>>> diff --git a/migration/multifd.c b/migration/multifd.c >>>>> index 68b171f..a6838dc 100644 >>>>> --- a/migration/multifd.c >>>>> +++ b/migration/multifd.c >>>>> @@ -523,6 +523,19 @@ static void multifd_send_terminate_threads(Error *err) >>>>> } >>>>> } >>>>> >>>>> +static void multifd_tls_socket_close(QIOChannel *ioc, Error *err) >>>>> +{ >>>>> + if (ioc && >>>>> + object_dynamic_cast(OBJECT(ioc), >>>>> + TYPE_QIO_CHANNEL_TLS)) { >>>>> + /* >>>>> + * TLS channel is special, we need close it before >>>>> + * socket finalize. >>>>> + */ >>>>> + qio_channel_close(ioc, &err); >>>>> + } >>>>> +} >>>> >>>> This doesn't feel quite right to me. Calling qio_channel_close will close >>>> both the TLS layer, and the underlying QIOChannelSocket. If the latter >>>> is safe to do, then we don't need the object_dynamic_cast() check, we can >>>> do it unconditionally whether we're using TLS or not. >>>> >>>> Having said that, I'm not sure if we actually want to be using >>>> qio_channel_close or not ? >>>> >>>> I would have expected that there is already code somewhere else in the >>>> migration layer that is closing these multifd channels, but I can't >>>> actually find where that happens right now. Assuming that code does >>>> exist though, qio_channel_shutdown(ioc, BOTH) feels like the right >>>> answer to unblock waiting I/O ops. >>>> >>> Hi, Daniel. >>> Actually, I have tried to use qio_channel_shutdown at the same place, >>> but it seems not work right. >>> the socket connection is closed by observing through 'ss' command but >>> the socket fds in /proc/$(qemu pid)/fd are still residual. >>> >>> The underlying QIOChannelSocket will be closed by >>> qio_channel_socket_finalize() through object_unref(QIOChannel) later >>> in socket_send_channel_destroy(), >>> does that means it is safe to close both of TLS and tcp socket? >> >> Hmm, that makes me even more confused, because the object_unref >> should be calling qio_channel_close() already. >> >> eg with your patch we have: >> >> multifd_tls_socket_close(p->c, NULL); >> -> qio_channel_close(p->c) >> -> qio_channel_tls_close(p->c) >> -> qio_channel_close(master) >> >> socket_send_channel_destroy(p->c) >> -> object_unref(p->c) >> -> qio_channel_tls_socket_finalize(p->c) >> -> object_unref(master) >> -> qio_channel_close(master) >> >> so the multifd_tls_socket_close should not be doing anything >> at all *assuming* we releasing the last reference in our >> object_unref call. >> >> Given what you describe, I think we are *not* releasing the >> last reference. There is an active reference being held >> somewhere else, and that is preventing the QIOChannelSocket >> from being freed and thus the socket remains open. >> >> If that extra active reference is a bug, then we have a memory >> leak of the QIOChannelSocket object, that needs fixing somewhere. >> >> If that extra active reference is intentional, then we do indeed >> need to explicitly close the socket. That is possibly better >> handled by putting a qio_channel_close() call into the >> socket_send_channel_destroy() method. >> >> I wonder if we're leaking a reference to the underlying QIOChannelSocket, >> when we create the QIOChannelTLS wrapper ? That could explain a problem >> that only happens when using TLS. >> > Aha, you are right! > The QIOChannelSocket is added by an extra reference. > > Thread 1 "qemu-system-aar" hit Breakpoint 1, socket_send_channel_destroy ( > send=0xaaaabea527f0) at migration/socket.c:44 > 44 migration/socket.c: No such file or directory. > (gdb) p *((QIOChannelTLS)*send)->master > $5 = {parent = {class = 0xaaaabc690c90, free = 0xffff9bd16c40 <g_free>, > properties = 0xffff61a04de0, ref = 2, parent = 0x0}, features = 2, name = 0x0, > ctx = 0x0, read_coroutine = 0x0, write_coroutine = 0x0} > (gdb) p (QIOChannelTLS)*send > $6 = {parent = {parent = {class = 0xaaaabc6430c0, free = 0xffff9bd16c40 <g_free>, > properties = 0xffff61a04f00, ref = 1, parent = 0x0}, features = 2, > name = 0xaaaabdd81290 "multifd-tls-outgoing", ctx = 0x0, read_coroutine = 0x0, > write_coroutine = 0x0}, master = 0xaaaabe350e90, session = 0xaaaabcf1ead0, shutdown = 0} > (gdb) p *((QIOChannelTLS)*send)->master > $7 = {parent = {class = 0xaaaabc690c90, free = 0xffff9bd16c40 <g_free>, > properties = 0xffff61a04de0, ref = 2, parent = 0x0}, features = 2, name = 0x0, > ctx = 0x0, read_coroutine = 0x0, write_coroutine = 0x0} > > I'll make it further to see where we do this thing... > >> Regards, >> Daniel >> > -- Regards. Chuan
diff --git a/migration/multifd.c b/migration/multifd.c index 68b171f..a6838dc 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -523,6 +523,19 @@ static void multifd_send_terminate_threads(Error *err) } } +static void multifd_tls_socket_close(QIOChannel *ioc, Error *err) +{ + if (ioc && + object_dynamic_cast(OBJECT(ioc), + TYPE_QIO_CHANNEL_TLS)) { + /* + * TLS channel is special, we need close it before + * socket finalize. + */ + qio_channel_close(ioc, &err); + } +} + void multifd_save_cleanup(void) { int i; @@ -542,6 +555,7 @@ void multifd_save_cleanup(void) MultiFDSendParams *p = &multifd_send_state->params[i]; Error *local_err = NULL; + multifd_tls_socket_close(p->c, NULL); socket_send_channel_destroy(p->c); p->c = NULL; qemu_mutex_destroy(&p->mutex);
Since we now support tls multifd, when we cancel migration, the TLS sockets will be left as CLOSE-WAIT On Src which results in socket leak. Fix it by closing TLS channel before socket finalize. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> --- migration/multifd.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)