
lxc container startup error and RFC patch

Message ID 699f3b58-af98-124a-1cf4-daccd103897f@redhat.com

Commit Message

Cole Robinson July 29, 2021, 9:28 p.m. UTC
Hi all,

I'm seeing LXC container startup failures. This is with libvirt git on a
Fedora 34 host with systemd-248.6-1.fc34.x86_64 (I didn't confirm with
other versions). Reproducer:

sudo virt-install --connect lxc:/// --name test-container --memory 128 \
    --boot init=/bin/sh

Starting install...
ERROR    error from service: GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 2145047 does not belong to any known machine

libvirt 7.0.0 works but 7.1.0+ does not. The root error seems to predate 
that, showing up in syslog, but commit 9c1693eff made it fatal:

commit 9c1693eff427661616ce1bd2795688f87288a412
Author: Pavel Hrdina <phrdina@redhat.com>
Date:   Fri Feb 5 16:17:35 2021 +0100

    vircgroup: use DBus call to systemd for some APIs

The error comes from virSystemdGetMachineByPID. The PID that shows up in 
the above error message does not match the leader PID as reported by 
machinectl. This change fixes the error but I don't know if it's correct 
or if it has other implications:



Maybe something else is broken elsewhere. We clearly try to add both
PIDs to the systemd machine, but virSystemdGetMachineByPID fails to
match the non-leader PID, which is the one that the LXC driver knows
about.
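
For anyone debugging this, one thing worth comparing is the cgroup
membership of the two PIDs (the controller and the container init). The
sketch below is an illustration only, not libvirt code: find_cgroup_path
is a hypothetical helper that parses text in the /proc/<pid>/cgroup
format (hierarchy-ID:controller-list:path) and returns the path for one
hierarchy. It assumes, without confirmation from this thread, that
machined resolves a non-leader PID through its cgroup, so a PID sitting
outside the machine scope in the hierarchy systemd tracks would plausibly
produce NoMachineForPID.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libvirt or systemd).
 *
 * Scan text in /proc/<pid>/cgroup format and copy the cgroup path of the
 * line whose controller-list field is exactly `controllers` into `out`.
 * Use "name=systemd" for the v1 systemd hierarchy and "" for the v2 line.
 * Returns 0 on success, -1 if no line matches or `out` is too small.
 */
int find_cgroup_path(const char *text, const char *controllers,
                     char *out, size_t outlen)
{
    size_t want = strlen(controllers);
    const char *line = text;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t linelen = eol ? (size_t)(eol - line) : strlen(line);

        /* Locate the two ':' separators on this line. */
        const char *c1 = memchr(line, ':', linelen);
        const char *c2 = c1
            ? memchr(c1 + 1, ':', linelen - (size_t)(c1 + 1 - line))
            : NULL;

        if (c2 && (size_t)(c2 - (c1 + 1)) == want &&
            memcmp(c1 + 1, controllers, want) == 0) {
            size_t pathlen = linelen - (size_t)(c2 + 1 - line);

            if (pathlen >= outlen)
                return -1;
            memcpy(out, c2 + 1, pathlen);
            out[pathlen] = '\0';
            return 0;
        }

        line = eol ? eol + 1 : NULL;
    }

    return -1;
}
```

Feeding it the contents of /proc/<pid>/cgroup for both PIDs, once with
"name=systemd" and once with "", shows whether they ended up in the same
scope in each hierarchy.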

Thoughts?
Can anyone else reproduce?

Thanks,
Cole

Comments

Jim Fehlig Aug. 2, 2021, 8:20 p.m. UTC | #1
On 7/29/21 3:28 PM, Cole Robinson wrote:
> Hi all,
>
> I'm seeing LXC container startup failures. This is with libvirt git,
> fedora 34 host with systemd-248.6-1.fc34.x86_64 (I didn't confirm with
> other versions). Reproducer:


From my experience it's more related to cgroups. It works with V2-only but
doesn't work with V1 or hybrid.
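
To check which of those setups a host is in, something like the following
could be used. It is a rough sketch, not taken from libvirt: the function
and enum names are made up for illustration, and it assumes the
conventional systemd mount layout (cgroup2 on /sys/fs/cgroup for V2-only,
a tmpfs there with cgroup2 on /sys/fs/cgroup/unified for hybrid).

```c
#include <sys/vfs.h>

/* Filesystem magic numbers; defined here in case <linux/magic.h> is absent. */
#ifndef CGROUP2_SUPER_MAGIC
# define CGROUP2_SUPER_MAGIC 0x63677270
#endif
#ifndef TMPFS_MAGIC
# define TMPFS_MAGIC 0x01021994
#endif

/* Illustrative names only; these are not libvirt or systemd identifiers. */
enum cgroup_mode {
    CGROUP_MODE_UNKNOWN = 0,
    CGROUP_MODE_LEGACY,   /* V1 only */
    CGROUP_MODE_HYBRID,   /* V1 controllers plus a cgroup2 mount */
    CGROUP_MODE_UNIFIED,  /* V2 only */
};

/* Return 1 if `path` is the root of a filesystem with the given magic. */
int path_has_fs_magic(const char *path, unsigned long magic)
{
    struct statfs fs;

    if (statfs(path, &fs) < 0)
        return 0;
    return (unsigned long)fs.f_type == magic;
}

/* Guess the cgroup setup from the mount layout (assumed, see above). */
enum cgroup_mode detect_cgroup_mode(void)
{
    if (path_has_fs_magic("/sys/fs/cgroup", CGROUP2_SUPER_MAGIC))
        return CGROUP_MODE_UNIFIED;
    if (path_has_fs_magic("/sys/fs/cgroup/unified", CGROUP2_SUPER_MAGIC))
        return CGROUP_MODE_HYBRID;
    if (path_has_fs_magic("/sys/fs/cgroup", TMPFS_MAGIC))
        return CGROUP_MODE_LEGACY;
    return CGROUP_MODE_UNKNOWN;
}

const char *cgroup_mode_name(enum cgroup_mode mode)
{
    switch (mode) {
    case CGROUP_MODE_LEGACY:  return "legacy";
    case CGROUP_MODE_HYBRID:  return "hybrid";
    case CGROUP_MODE_UNIFIED: return "unified";
    default:                  return "unknown";
    }
}
```

Printing cgroup_mode_name(detect_cgroup_mode()) on the affected host would
be a quick way to include the cgroup setup in a bug report.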

> sudo virt-install --connect lxc:/// --name test-container --memory 128
> --boot init=/bin/sh
>
> Starting install...
> ERROR    error from service:
> GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 2145047 does
> not belong to any known machine
>
> libvirt 7.0.0 works but 7.1.0+ does not. The root error seems to predate
> that, showing up in syslog, but commit 9c1693eff made it fatal:
>
> commit 9c1693eff427661616ce1bd2795688f87288a412
> Author: Pavel Hrdina <phrdina@redhat.com>
> Date:   Fri Feb 5 16:17:35 2021 +0100
>
>      vircgroup: use DBus call to systemd for some APIs
>
> The error comes from virSystemdGetMachineByPID. The PID that shows up in
> the above error message does not match the leader PID as reported by
> machinectl. This change fixes the error but I don't know if it's correct
> or if it has other implications:


I'm not familiar enough with the driver to review your change with confidence.

> diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
> index 066e013ed4..54ecb1316b 100644
> --- a/src/lxc/lxc_controller.c
> +++ b/src/lxc/lxc_controller.c
> @@ -866,12 +866,12 @@ static int virLXCControllerSetupCgroupLimits(virLXCController *ctrl)
>       nodeset = virDomainNumatuneGetNodeset(ctrl->def->numa, auto_nodeset, -1);
>
>       if (!(ctrl->cgroup = virLXCCgroupCreate(ctrl->def,
> -                                            ctrl->initpid,
> +                                            getpid(),
>                                               ctrl->nnicindexes,
>                                               ctrl->nicindexes)))
>           goto cleanup;
>
> -    if (virCgroupAddMachineProcess(ctrl->cgroup, getpid()) < 0)
> +    if (virCgroupAddMachineProcess(ctrl->cgroup, ctrl->initpid) < 0)
>           goto cleanup;
>
>       /* Add all qemu-nbd tasks to the cgroup */
>
>
> Maybe something else isn't working elsewhere. Clearly we try to add both
> pids to the systemd machine, but virSystemdGetMachineByPID is not
> working to match the non-leader pid, which is the one that the LXC
> driver knows about.
>
> Thoughts?
> Can anyone else reproduce?


https://gitlab.com/libvirt/libvirt/-/issues/182

Regards,
Jim

Michal Prívozník Aug. 3, 2021, 5:36 a.m. UTC | #2
On 8/2/21 10:20 PM, Jim Fehlig wrote:
> On 7/29/21 3:28 PM, Cole Robinson wrote:
>> Hi all,
>>
>> I'm seeing LXC container startup failures. This is with libvirt git,
>> fedora 34 host with systemd-248.6-1.fc34.x86_64 (I didn't confirm with
>> other versions). Reproducer:
>
> From my experience its more related to cgroups. Works with V2-only,
> doesn't work with V1 or hybrid.


Ah, that's the missing piece! I tried to reproduce on my Fedora VMs but
all of them are fully switched to v2. Thanks Jim, I'll give it another try.

>
>> sudo virt-install --connect lxc:/// --name test-container --memory 128
>> --boot init=/bin/sh
>>
>> Starting install...
>> ERROR    error from service:
>> GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 2145047 does
>> not belong to any known machine
>>
>> libvirt 7.0.0 works but 7.1.0+ does not. The root error seems to predate
>> that, showing up in syslog, but commit 9c1693eff made it fatal:
>>
>> commit 9c1693eff427661616ce1bd2795688f87288a412
>> Author: Pavel Hrdina <phrdina@redhat.com>
>> Date:   Fri Feb 5 16:17:35 2021 +0100
>>
>>      vircgroup: use DBus call to systemd for some APIs
>>
>> The error comes from virSystemdGetMachineByPID. The PID that shows up in
>> the above error message does not match the leader PID as reported by
>> machinectl. This change fixes the error but I don't know if it's correct
>> or if it has other implications:
>
> I'm not familiar enough with the driver to review your change with
> confidence.


I'll do the review.

>
>> diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
>> index 066e013ed4..54ecb1316b 100644
>> --- a/src/lxc/lxc_controller.c
>> +++ b/src/lxc/lxc_controller.c
>> @@ -866,12 +866,12 @@ static int virLXCControllerSetupCgroupLimits(virLXCController *ctrl)
>>       nodeset = virDomainNumatuneGetNodeset(ctrl->def->numa, auto_nodeset, -1);
>>
>>       if (!(ctrl->cgroup = virLXCCgroupCreate(ctrl->def,
>> -                                            ctrl->initpid,
>> +                                            getpid(),
>>                                               ctrl->nnicindexes,
>>                                               ctrl->nicindexes)))
>>           goto cleanup;
>>
>> -    if (virCgroupAddMachineProcess(ctrl->cgroup, getpid()) < 0)
>> +    if (virCgroupAddMachineProcess(ctrl->cgroup, ctrl->initpid) < 0)
>>           goto cleanup;
>>
>>       /* Add all qemu-nbd tasks to the cgroup */
>>
>>
>> Maybe something else isn't working elsewhere. Clearly we try to add both
>> pids to the systemd machine, but virSystemdGetMachineByPID is not
>> working to match the non-leader pid, which is the one that the LXC
>> driver knows about.
>>
>> Thoughts?
>> Can anyone else reproduce?
>
> https://gitlab.com/libvirt/libvirt/-/issues/182


Thanks for filing the issue.

Michal

Patch

diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 066e013ed4..54ecb1316b 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -866,12 +866,12 @@  static int virLXCControllerSetupCgroupLimits(virLXCController *ctrl)
     nodeset = virDomainNumatuneGetNodeset(ctrl->def->numa, auto_nodeset, -1);
 
     if (!(ctrl->cgroup = virLXCCgroupCreate(ctrl->def,
-                                            ctrl->initpid,
+                                            getpid(),
                                             ctrl->nnicindexes,
                                             ctrl->nicindexes)))
         goto cleanup;
 
-    if (virCgroupAddMachineProcess(ctrl->cgroup, getpid()) < 0)
+    if (virCgroupAddMachineProcess(ctrl->cgroup, ctrl->initpid) < 0)
         goto cleanup;
 
     /* Add all qemu-nbd tasks to the cgroup */