diff mbox series

[v2] migration/multifd: close TLS channel before socket finalize

Message ID 1604660094-123959-1-git-send-email-zhengchuan@huawei.com
State New
Headers show
Series [v2] migration/multifd: close TLS channel before socket finalize | expand

Commit Message

Zheng Chuan Nov. 6, 2020, 10:54 a.m. UTC
Since we now support tls multifd, when we cancel migration, the TLS
sockets will be left as CLOSE-WAIT On Src which results in socket
leak.
Fix it by closing TLS channel before socket finalize.

Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
---
 migration/multifd.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Zheng Chuan Nov. 11, 2020, 7:07 a.m. UTC | #1
I think i have found it why.

When we create tls client in migration_tls_client_create(), we reference tioc->master.
As for main migration thread, it will do dereference after migration_channel_connect in socket_outgoing_migration().
As for non-TLS migration, it will do another reference in qemu_fopen_channel_output(ioc) of migration_channel_connect().

In a conclusion, we need to dereference the underlying QIOChannelSocket after tls handshake for multifd-TLS channel.
The fix patch is sent and waiting for review.
https://www.mail-archive.com/qemu-devel@nongnu.org/msg759110.html

On 2020/11/10 19:56, Zheng Chuan wrote:
> 

> 

> On 2020/11/10 19:01, Daniel P. Berrangé wrote:

>> On Tue, Nov 10, 2020 at 06:45:45PM +0800, Zheng Chuan wrote:

>>>

>>>

>>> On 2020/11/10 18:12, Daniel P. Berrangé wrote:

>>>> On Fri, Nov 06, 2020 at 06:54:54PM +0800, Chuan Zheng wrote:

>>>>> Since we now support tls multifd, when we cancel migration, the TLS

>>>>> sockets will be left as CLOSE-WAIT On Src which results in socket

>>>>> leak.

>>>>> Fix it by closing TLS channel before socket finalize.

>>>>>

>>>>> Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>

>>>>> ---

>>>>>  migration/multifd.c | 14 ++++++++++++++

>>>>>  1 file changed, 14 insertions(+)

>>>>>

>>>>> diff --git a/migration/multifd.c b/migration/multifd.c

>>>>> index 68b171f..a6838dc 100644

>>>>> --- a/migration/multifd.c

>>>>> +++ b/migration/multifd.c

>>>>> @@ -523,6 +523,19 @@ static void multifd_send_terminate_threads(Error *err)

>>>>>      }

>>>>>  }

>>>>>  

>>>>> +static void multifd_tls_socket_close(QIOChannel *ioc, Error *err)

>>>>> +{

>>>>> +    if (ioc &&

>>>>> +        object_dynamic_cast(OBJECT(ioc),

>>>>> +                            TYPE_QIO_CHANNEL_TLS)) {

>>>>> +        /*

>>>>> +         * TLS channel is special, we need close it before

>>>>> +         * socket finalize.

>>>>> +         */

>>>>> +        qio_channel_close(ioc, &err);

>>>>> +    }

>>>>> +}

>>>>

>>>> This doesn't feel quite right to me.  Calling qio_channel_close will close

>>>> both the TLS layer, and the underlying QIOChannelSocket. If the latter

>>>> is safe to do, then we don't need the object_dynamic_cast() check, we can

>>>> do it unconditionally whether we're using TLS or not.

>>>>

>>>> Having said that, I'm not sure if we actually want to be using

>>>> qio_channel_close or not ?

>>>>

>>>> I would have expected that there is already code somewhere else in the

>>>> migration layer that is closing these multifd channels, but I can't

>>>> actually find where that happens right now.  Assuming that code does

>>>> exist though, qio_channel_shutdown(ioc, BOTH) feels like the right

>>>> answer to unblock waiting I/O ops.

>>>>

>>> Hi, Daniel.

>>> Actually, I have tried to use qio_channel_shutdown at the same place,

>>> but it seems not work right.

>>> the socket connection is closed by observing through 'ss' command but

>>> the socket fds in /proc/$(qemu pid)/fd are still residual.

>>>

>>> The underlying QIOChannelSocket will be closed by

>>> qio_channel_socket_finalize() through object_unref(QIOChannel) later

>>> in socket_send_channel_destroy(),

>>> does that means it is safe to close both of TLS and tcp socket?

>>

>> Hmm, that makes me even more confused, because the object_unref

>> should be calling qio_channel_close() already.

>>

>> eg with your patch we have:

>>

>>        multifd_tls_socket_close(p->c, NULL);

>>            -> qio_channel_close(p->c)

>> 	        -> qio_channel_tls_close(p->c)

>>                      -> qio_channel_close(master)

>>

>>        socket_send_channel_destroy(p->c)

>>            -> object_unref(p->c)

>> 	         -> qio_channel_tls_socket_finalize(p->c)

>> 		      -> object_unref(master)

>> 		              -> qio_channel_close(master)

>>

>> so the multifd_tls_socket_close should not be doing anything

>> at all *assuming* we releasing the last reference in our

>> object_unref call.

>>

>> Given what you describe, I think we are *not* releasing the

>> last reference. There is an active reference being held

>> somewhere else, and that is preventing the QIOChannelSocket

>> from being freed and thus the socket remains open.

>>

>> If that extra active reference is a bug, then we have a memory

>> leak of the QIOChannelSocket object, that needs fixing somewhere.

>>

>> If that extra active reference is intentional, then we do indeed

>> need to explicitly close the socket. That is possibly better

>> handled by putting a qio_channel_close() call into the

>> socket_send_channel_destroy() method.

>>

>> I wonder if we're leaking a reference to the underlying QIOChannelSocket,

>> when we create the QIOChannelTLS wrapper ? That could explain a problem

>> that only happens when using TLS.

>>

> Aha, you are right!

> The QIOChannelSocket is added by an extra reference.

> 

> Thread 1 "qemu-system-aar" hit Breakpoint 1, socket_send_channel_destroy (

>     send=0xaaaabea527f0) at migration/socket.c:44

> 44	migration/socket.c: No such file or directory.

> (gdb) p *((QIOChannelTLS)*send)->master

> $5 = {parent = {class = 0xaaaabc690c90, free = 0xffff9bd16c40 <g_free>,

>     properties = 0xffff61a04de0, ref = 2, parent = 0x0}, features = 2, name = 0x0,

>   ctx = 0x0, read_coroutine = 0x0, write_coroutine = 0x0}

> (gdb) p (QIOChannelTLS)*send

> $6 = {parent = {parent = {class = 0xaaaabc6430c0, free = 0xffff9bd16c40 <g_free>,

>       properties = 0xffff61a04f00, ref = 1, parent = 0x0}, features = 2,

>     name = 0xaaaabdd81290 "multifd-tls-outgoing", ctx = 0x0, read_coroutine = 0x0,

>     write_coroutine = 0x0}, master = 0xaaaabe350e90, session = 0xaaaabcf1ead0, shutdown = 0}

> (gdb) p *((QIOChannelTLS)*send)->master

> $7 = {parent = {class = 0xaaaabc690c90, free = 0xffff9bd16c40 <g_free>,

>     properties = 0xffff61a04de0, ref = 2, parent = 0x0}, features = 2, name = 0x0,

>   ctx = 0x0, read_coroutine = 0x0, write_coroutine = 0x0}

> 

> I'll make it further to see where we do this thing...

> 

>> Regards,

>> Daniel

>>

> 


-- 
Regards.
Chuan
diff mbox series

Patch

diff --git a/migration/multifd.c b/migration/multifd.c
index 68b171f..a6838dc 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -523,6 +523,19 @@  static void multifd_send_terminate_threads(Error *err)
     }
 }
 
+static void multifd_tls_socket_close(QIOChannel *ioc, Error *err)
+{
+    if (ioc &&
+        object_dynamic_cast(OBJECT(ioc),
+                            TYPE_QIO_CHANNEL_TLS)) {
+        /*
+         * TLS channel is special, we need close it before
+         * socket finalize.
+         */
+        qio_channel_close(ioc, &err);
+    }
+}
+
 void multifd_save_cleanup(void)
 {
     int i;
@@ -542,6 +555,7 @@  void multifd_save_cleanup(void)
         MultiFDSendParams *p = &multifd_send_state->params[i];
         Error *local_err = NULL;
 
+        multifd_tls_socket_close(p->c, NULL);
         socket_send_channel_destroy(p->c);
         p->c = NULL;
         qemu_mutex_destroy(&p->mutex);