Message ID | 699f3b58-af98-124a-1cf4-daccd103897f@redhat.com |
---|---|
State | New |
Series | lxc container startup error and RFC patch |
On 7/29/21 3:28 PM, Cole Robinson wrote:
> Hi all,
>
> I'm seeing LXC container startup failures. This is with libvirt git,
> fedora 34 host with systemd-248.6-1.fc34.x86_64 (I didn't confirm with
> other versions). Reproducer:

From my experience it's more related to cgroups. Works with V2-only,
doesn't work with V1 or hybrid.

> sudo virt-install --connect lxc:/// --name test-container --memory 128
> --boot init=/bin/sh
>
> Starting install...
> ERROR error from service:
> GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 2145047 does
> not belong to any known machine
>
> libvirt 7.0.0 works but 7.1.0+ does not. The root error seems to predate
> that, showing up in syslog, but commit 9c1693eff made it fatal:
>
> commit 9c1693eff427661616ce1bd2795688f87288a412
> Author: Pavel Hrdina <phrdina@redhat.com>
> Date: Fri Feb 5 16:17:35 2021 +0100
>
>     vircgroup: use DBus call to systemd for some APIs
>
> The error comes from virSystemdGetMachineByPID. The PID that shows up in
> the above error message does not match the leader PID as reported by
> machinectl. This change fixes the error but I don't know if it's correct
> or if it has other implications:

I'm not familiar enough with the driver to review your change with
confidence.

> diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
> index 066e013ed4..54ecb1316b 100644
> --- a/src/lxc/lxc_controller.c
> +++ b/src/lxc/lxc_controller.c
> @@ -866,12 +866,12 @@ static int virLXCControllerSetupCgroupLimits(virLXCController *ctrl)
>      nodeset = virDomainNumatuneGetNodeset(ctrl->def->numa, auto_nodeset, -1);
>
>      if (!(ctrl->cgroup = virLXCCgroupCreate(ctrl->def,
> -                                            ctrl->initpid,
> +                                            getpid(),
>                                              ctrl->nnicindexes,
>                                              ctrl->nicindexes)))
>          goto cleanup;
>
> -    if (virCgroupAddMachineProcess(ctrl->cgroup, getpid()) < 0)
> +    if (virCgroupAddMachineProcess(ctrl->cgroup, ctrl->initpid) < 0)
>          goto cleanup;
>
>      /* Add all qemu-nbd tasks to the cgroup */
>
>
> Maybe something else isn't working elsewhere. Clearly we try to add both
> pids to the systemd machine, but virSystemdGetMachineByPID is not
> working to match the non-leader pid, which is the one that the LXC
> driver knows about.
>
> Thoughts?
> Can anyone else reproduce?

https://gitlab.com/libvirt/libvirt/-/issues/182

Regards,
Jim
On 8/2/21 10:20 PM, Jim Fehlig wrote:
> On 7/29/21 3:28 PM, Cole Robinson wrote:
>> Hi all,
>>
>> I'm seeing LXC container startup failures. This is with libvirt git,
>> fedora 34 host with systemd-248.6-1.fc34.x86_64 (I didn't confirm with
>> other versions). Reproducer:
>
> From my experience it's more related to cgroups. Works with V2-only,
> doesn't work with V1 or hybrid.

Ah, that's the missing piece! I tried to reproduce on my fedora VMs but
all of them are fully switched to v2. Thanks Jim, I'll give it another
try.

>
>> sudo virt-install --connect lxc:/// --name test-container --memory 128
>> --boot init=/bin/sh
>>
>> Starting install...
>> ERROR error from service:
>> GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 2145047 does
>> not belong to any known machine
>>
>> libvirt 7.0.0 works but 7.1.0+ does not. The root error seems to predate
>> that, showing up in syslog, but commit 9c1693eff made it fatal:
>>
>> commit 9c1693eff427661616ce1bd2795688f87288a412
>> Author: Pavel Hrdina <phrdina@redhat.com>
>> Date: Fri Feb 5 16:17:35 2021 +0100
>>
>>     vircgroup: use DBus call to systemd for some APIs
>>
>> The error comes from virSystemdGetMachineByPID. The PID that shows up in
>> the above error message does not match the leader PID as reported by
>> machinectl. This change fixes the error but I don't know if it's correct
>> or if it has other implications:
>
> I'm not familiar enough with the driver to review your change with
> confidence.

I'll do the review.

>
>> diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
>> index 066e013ed4..54ecb1316b 100644
>> --- a/src/lxc/lxc_controller.c
>> +++ b/src/lxc/lxc_controller.c
>> @@ -866,12 +866,12 @@ static int
>> virLXCControllerSetupCgroupLimits(virLXCController *ctrl)
>>      nodeset = virDomainNumatuneGetNodeset(ctrl->def->numa,
>> auto_nodeset, -1);
>>      if (!(ctrl->cgroup = virLXCCgroupCreate(ctrl->def,
>> -                                            ctrl->initpid,
>> +                                            getpid(),
>>                                              ctrl->nnicindexes,
>>                                              ctrl->nicindexes)))
>>          goto cleanup;
>> -    if (virCgroupAddMachineProcess(ctrl->cgroup, getpid()) < 0)
>> +    if (virCgroupAddMachineProcess(ctrl->cgroup, ctrl->initpid) < 0)
>>          goto cleanup;
>>      /* Add all qemu-nbd tasks to the cgroup */
>>
>>
>> Maybe something else isn't working elsewhere. Clearly we try to add both
>> pids to the systemd machine, but virSystemdGetMachineByPID is not
>> working to match the non-leader pid, which is the one that the LXC
>> driver knows about.
>>
>> Thoughts?
>> Can anyone else reproduce?
>
> https://gitlab.com/libvirt/libvirt/-/issues/182

Thanks for filing the issue.

Michal
diff --git a/src/lxc/lxc_controller.c b/src/lxc/lxc_controller.c
index 066e013ed4..54ecb1316b 100644
--- a/src/lxc/lxc_controller.c
+++ b/src/lxc/lxc_controller.c
@@ -866,12 +866,12 @@ static int virLXCControllerSetupCgroupLimits(virLXCController *ctrl)
     nodeset = virDomainNumatuneGetNodeset(ctrl->def->numa, auto_nodeset, -1);
 
     if (!(ctrl->cgroup = virLXCCgroupCreate(ctrl->def,
-                                            ctrl->initpid,
+                                            getpid(),
                                             ctrl->nnicindexes,
                                             ctrl->nicindexes)))
         goto cleanup;
 
-    if (virCgroupAddMachineProcess(ctrl->cgroup, getpid()) < 0)
+    if (virCgroupAddMachineProcess(ctrl->cgroup, ctrl->initpid) < 0)
         goto cleanup;
 
     /* Add all qemu-nbd tasks to the cgroup */