[Xen-devel] vnc=1 / pvgrub / close fb: backend at /local/domain/0/backend/vfb/xx/0

Message ID alpine.DEB.2.02.1410221313370.876@kaball.uk.xensource.com
State New

Commit Message

Stefano Stabellini Oct. 22, 2014, 12:13 p.m. UTC
On Wed, 22 Oct 2014, Ian Campbell wrote:
> On Wed, 2014-10-22 at 12:57 +0100, Stefano Stabellini wrote:
> > On Wed, 22 Oct 2014, Ian Campbell wrote:
> > > On Wed, 2014-10-22 at 11:59 +0200, Samuel Thibault wrote:
> > > > Ian Campbell, le Wed 22 Oct 2014 10:00:36 +0100, a écrit :
> > > > > On Wed, 2014-10-22 at 08:24 +1100, Steven Haigh wrote:
> > > > > > As a side note to this - if I use pygrub as a bootloader vs using
> > > > > > pvgrub, then VNC works perfectly.
> > > > > > 
> > > > > > So, what options exist to make pvgrub behave properly for booting with
> > > > > > VNC enabled?
> > > > > 
> > > > > ISTR (vaguely) that way back when the backends needed to be modified to
> > > > > cope with kexec (which is effectively what pvgrub does) by not exiting
> > > > > when the frontend disconnects, instead sticking around waiting for a new
> > > > > frontend, this relates somehow to the "online" key in xenstore.
> > > > > 
> > > > > Perhaps the pvfb backend never got that treatment, which would explain
> > > > > #2? 
> > > > 
> > > > Probably, yes.
> > > 
> > > Adding Stefano and Anthony, since the backend in this case is in qemu.
> > > 
> > > When the frontend disconnects and the online node == 1 then the backend
> > > is supposed to go from Closed back to InitWait and wait for a new
> > > connection, as opposed to shutting down. This is needed for kexec (which
> > > pvgrub uses).
> > > 
> > > I can see some handling of the online node in hw/xen/xen_backend.c but
> > > it doesn't look like it would do what is needed here. I also don't see
> > > any handling in either hw/block/xen_disk.c or hw/display/xenfb.c. Which
> > > makes me suspect that as well as pvfb not working with kexec/pvgrub
> > > neither does the qdisk backend, which would be unfortunate.
> > 
> > Looking at the code in xen_backend.c, it seems that on XenbusStateClosed
> > xen_backend is going to try to reset to XenbusStateInitialising, unless
> > the frontend state is XenbusStateInitialising (no idea why).  See:
> > xen_be_try_reset and xen_be_check_state.
> > 
> > Maybe it should go to XenbusStateInitWait instead?
> 
> Possibly?
> 
> Doesn't xen_be_check_state do that though, i.e. once you hit
> XenbusStateInitialising you have:
>         case XenbusStateInitialising:
>             rc = xen_be_try_init(xendev);
> which will push on to XenbusStateInitWait?
> 
> There's quite a few xen_be_printf surrounding these state transitions,
> which ought to be printed at level >= 2. How can Steven control the
> loglevel and where would they go (/var/log/xen/qemu-dm-$domname.log?)


I think that this should do:

Comments

Steven Haigh Oct. 22, 2014, 3:23 p.m. UTC | #1
On 22/10/2014 11:13 PM, Stefano Stabellini wrote:
> On Wed, 22 Oct 2014, Ian Campbell wrote:
>> On Wed, 2014-10-22 at 12:57 +0100, Stefano Stabellini wrote:
>>> On Wed, 22 Oct 2014, Ian Campbell wrote:
>>>> On Wed, 2014-10-22 at 11:59 +0200, Samuel Thibault wrote:
>>>>> Ian Campbell, le Wed 22 Oct 2014 10:00:36 +0100, a écrit :
>>>>>> On Wed, 2014-10-22 at 08:24 +1100, Steven Haigh wrote:
>>>>>>> As a side note to this - if I use pygrub as a bootloader vs using
>>>>>>> pvgrub, then VNC works perfectly.
>>>>>>>
>>>>>>> So, what options exist to make pvgrub behave properly for booting with
>>>>>>> VNC enabled?
>>>>>>
>>>>>> ISTR (vaguely) that way back when the backends needed to be modified to
>>>>>> cope with kexec (which is effectively what pvgrub does) by not exiting
>>>>>> when the frontend disconnects, instead sticking around waiting for a new
>>>>>> frontend, this relates somehow to the "online" key in xenstore.
>>>>>>
>>>>>> Perhaps the pvfb backend never got that treatment, which would explain
>>>>>> #2? 
>>>>>
>>>>> Probably, yes.
>>>>
>>>> Adding Stefano and Anthony, since the backend in this case is in qemu.
>>>>
>>>> When the frontend disconnects and the online node == 1 then the backend
>>>> is supposed to go from Closed back to InitWait and wait for a new
>>>> connection, as opposed to shutting down. This is needed for kexec (which
>>>> pvgrub uses).
>>>>
>>>> I can see some handling of the online node in hw/xen/xen_backend.c but
>>>> it doesn't look like it would do what is needed here. I also don't see
>>>> any handling in either hw/block/xen_disk.c or hw/display/xenfb.c. Which
>>>> makes me suspect that as well as pvfb not working with kexec/pvgrub
>>>> neither does the qdisk backend, which would be unfortunate.
>>>
>>> Looking at the code in xen_backend.c, it seems that on XenbusStateClosed
>>> xen_backend is going to try to reset to XenbusStateInitialising, unless
>>> the frontend state is XenbusStateInitialising (no idea why).  See:
>>> xen_be_try_reset and xen_be_check_state.
>>>
>>> Maybe it should go to XenbusStateInitWait instead?
>>
>> Possibly?
>>
>> Doesn't xen_be_check_state do that though, i.e. once you hit
>> XenbusStateInitialising you have:
>>         case XenbusStateInitialising:
>>             rc = xen_be_try_init(xendev);
>> which will push on to XenbusStateInitWait?
>>
>> There's quite a few xen_be_printf surrounding these state transitions,
>> which ought to be printed at level >= 2. How can Steven control the
>> loglevel and where would they go (/var/log/xen/qemu-dm-$domname.log?)
> 
> 
> I think that this should do:
> 
> 
> diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
> index b2cb22b..d1d5d8e 100644
> --- a/hw/xen/xen_backend.c
> +++ b/hw/xen/xen_backend.c
> @@ -50,7 +50,7 @@ const char *xen_protocol;
>  
>  /* private */
>  static QTAILQ_HEAD(XenDeviceHead, XenDevice) xendevs = QTAILQ_HEAD_INITIALIZER(xendevs);
> -static int debug = 0;
> +static int debug = 9;
>  
>  /* ------------------------------------------------------------- */
>  
> 
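
The one-line patch above raises the file-scope `debug` variable from 0 to
9, so that messages such as the level-2 state-transition prints mentioned
earlier in the thread become visible. A minimal sketch of that kind of
level gating, assuming a message is emitted only when its level does not
exceed the debug setting. The names `debug_level`, `be_should_log` and
`be_log` are hypothetical, and this is not the actual xen_be_printf code:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the file-scope `debug` variable in the patch. */
int debug_level = 0;

/* Assumed rule: a message is shown when its level is <= debug_level. */
int be_should_log(int msg_level)
{
    assert(msg_level >= 0);
    return msg_level <= debug_level;
}

/* Hypothetical logger: returns 1 if the message was printed, 0 if dropped. */
int be_log(int msg_level, const char *fmt, ...)
{
    va_list ap;

    if (!be_should_log(msg_level))
        return 0;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}
```

Under this assumption, a level-2 message is dropped while `debug_level` is
0 and goes to stderr once it is set to 9.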

I applied this patch and posted testing packages....

For completeness, this is the DomU config:
---------- DomU Config -----------
name            = "dev.vm"
memory          = 8192
vcpus           = 6
cpus            = "1-7"
disk            = [ 'phy:/dev/vg_hosting/dev.vm,xvda,w',
'file:/root/SL-65-x86_64-2013-12-05-boot.iso,xvdd:cdrom,r' ]
vif             = [ 'mac=20:34:01:36:00:42, vifname=vif.dev, bridge=br0' ]
kernel          = "/usr/lib/xen/boot/pv-grub-x86_64.gz"
extra           = "(hd0)/boot/grub/grub.conf"
#bootloader     = "pygrub"

vfb             = [ 'type=vnc, vnclisten=203.4.136.1, vncdisplay=2' ]

on_poweroff     = 'destroy'
on_reboot       = 'restart'
on_crash        = 'restart'
----------------------------------

Output using pv-grub:
Xen Minimal OS!
  start_info: 0x19ac000(VA)
    nr_pages: 0x200000
  shared_inf: 0xa5d0a000(MA)
     pt_base: 0x19af000(VA)
nr_pt_frames: 0x11
    mfn_list: 0x9ac000(VA)
   mod_start: 0x0(VA)
     mod_len: 0
       flags: 0x0
    cmd_line: (hd0)/boot/grub/grub.conf
  stack:      0x96b100-0x98b100
MM: Init
      _text: 0x0(VA)
     _etext: 0x7c814(VA)
   _erodata: 0x98000(VA)
     _edata: 0x9dd00(VA)
stack start: 0x96b100(VA)
       _end: 0x9ab700(VA)
  start_pfn: 19c3
    max_pfn: 200000
Mapping memory range 0x1c00000 - 0x200000000
setting 0x0-0x98000 readonly
skipped 0x1000
MM: Initialise page allocator for 29bc000(29bc000)-200000000(200000000)
MM: done
Demand map pfns at 200001000-2200001000.
Heap resides at 2200002000-4200002000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0x200001000.
Initialising scheduler
Thread "Idle": pointer: 0x2200002050, stack: 0x3a10000
Thread "xenstore": pointer: 0x2200002800, stack: 0x3a20000
xenbus initialised on irq 1 mfn 0x3f46d1
Thread "shutdown": pointer: 0x2200002fb0, stack: 0x3a30000
Dummy main: start_info=0x98b200
Thread "main": pointer: 0x2200003760, stack: 0x3a40000
"main" "(hd0)/boot/grub/grub.conf"
vbd 51712 is hd0
******************* BLKFRONT for device/vbd/51712 **********


Shutting down ()
Shutdown requested: 3
Thread "shutdown" exited.
backend at /local/domain/0/backend/vbd/20/51712
125829120 sectors of 512 bytes
**************************
vbd 51760 is hd1
******************* BLKFRONT for device/vbd/51760 **********


backend at /local/domain/0/backend/qdisk/20/51760
Failed to read /local/domain/0/backend/qdisk/20/51760/feature-barrier.
436224 sectors of 512 bytes
**************************
Thread "kbdfront": pointer: 0x2200004580, stack: 0x3a30000
******************* FBFRONT for device/vfb/0 **********


******************* KBDFRONT for device/vkbd/0 **********


backend at /local/domain/0/backend/vkbd/20/0
backend at /local/domain/0/backend/vfb/20/0
/local/domain/0/backend/vkbd/20/0 connected
************************** KBDFRONT
Thread "kbdfront" exited.
/local/domain/0/backend/vfb/20/0 connected
************************** FBFRONT
((( Hit enter to boot a grub entry )))
Thread "kbdfront close": pointer: 0x2200004580, stack: 0x3a30000
close fb: backend at /local/domain/0/backend/vfb/21/0
close kbd: backend at /local/domain/0/backend/vkbd/21/0
  Booting 'Scientific Linux (3.14.21-1.el6xen.x86_64)'

root (hd0)
 Filesystem type is ext2fs, using whole disk
kernel /boot/vmlinuz-3.14.21-1.el6xen.x86_64 ro root=/dev/xvda rd_NO_LUKS rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto console=hvc0
Thread "kbdfront close" exited.
initrd /boot/initramfs-3.14.21-1.el6xen.x86_64.img

============= Init TPM Front ================
Tpmfront:Error Unable to read device/vtpm/0/backend-id during tpmfront
initialization! error = ENOENT
Tpmfront:Info Shutting down tpmfront
close blk: backend=/local/domain/0/backend/vbd/21/51712
node=device/vbd/51712
close blk: backend=/local/domain/0/backend/qdisk/21/51760
node=device/vbd/51760

----------------------------------

This gives a VNC display on port 5902 (vncdisplay=2), and the system
waits at the grub prompt (i.e. the timeout is never reached). From this
point on I don't get a console in either VNC or via 'xl console dev.vm'.

On another note, I noticed this within the Dom0 kernel dmesg:
device vif.dev entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): vif.dev: link is not ready
xen-blkback:ring-ref 2047, event-channel 4, protocol 1 (x86_64-abi)
qemu-system-i38[3956]: segfault at 0 ip           (null) sp
00007fffb4573638 error 4
xen-blkback:backend/vbd/21/51712: prepare for reconnect
br0: port 8(vif.dev) entered disabled state

I also noticed that if I pass console=tty0 on the grub command line in
"(hd0)/boot/grub/grub.conf" - then I get the expected console - however
the grub menu timeout still fails - almost as if a keypress has been
registered and cancelled the timeout...

For example, a 'sort of' working grub.conf for the DomU that hangs at
grub, but when manually selected, works as expected:
default=0
timeout=1
splashimage=(hd0)/boot/grub/splash.xpm.gz

title Scientific Linux (3.14.21-1.el6xen.x86_64)
        root (hd0)
        kernel /boot/vmlinuz-3.14.21-1.el6xen.x86_64 ro root=/dev/xvda rd_NO_LUKS rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto console=tty0
        initrd /boot/initramfs-3.14.21-1.el6xen.x86_64.img

Ian Campbell Oct. 22, 2014, 3:40 p.m. UTC | #2
On Thu, 2014-10-23 at 02:23 +1100, Steven Haigh wrote:

> Output using pv-grub:

Can you also post the qemu logs please (under /var/log/xen somewhere I
think).

> qemu-system-i38[3956]: segfault at 0 ip           (null) sp
> 00007fffb4573638 error 4

That might be a smoking gun. Is there a core dump and/or could you try
and run qemu under gdb?

Ian.

Steven Haigh Oct. 22, 2014, 3:53 p.m. UTC | #3
On 23/10/2014 2:40 AM, Ian Campbell wrote:
> On Thu, 2014-10-23 at 02:23 +1100, Steven Haigh wrote:
> 
>> Output using pv-grub:
> 
> Can you also post the qemu logs please (under /var/log/xen somewhere I
> think).

I get very little out of this:
-rw-r--r--  1 root root    0 Oct 23 02:45 qemu-dm-dev.vm.log
-rw-r--r--  1 root root    0 Oct 23 02:44 xen-hotplug.log
-rw-r--r--  1 root root   55 Oct 23 02:45 xl-dev.vm.log
[root@dom0 xen]# cat xl-dev.vm.log
Waiting for domain dev.vm (domid 36) to die [pid 6970]

That's it :\

>> qemu-system-i38[3956]: segfault at 0 ip           (null) sp
>> 00007fffb4573638 error 4
> 
> That might be a smoking gun. Is there a core dump and/or could you try
> and run qemu under gdb?

Any hints on doing this? I can't say I'm a gdb guru.... I can't find any
core dumps anywhere so that's not really helpful...

Ian Campbell Oct. 23, 2014, 8:21 a.m. UTC | #4
On Thu, 2014-10-23 at 02:53 +1100, Steven Haigh wrote:
> On 23/10/2014 2:40 AM, Ian Campbell wrote:
> > On Thu, 2014-10-23 at 02:23 +1100, Steven Haigh wrote:
> > 
> >> Output using pv-grub:
> > 
> > Can you also post the qemu logs please (under /var/log/xen somewhere I
> > think).
> 
> I get very little out of this:
> -rw-r--r--  1 root root    0 Oct 23 02:45 qemu-dm-dev.vm.log
> -rw-r--r--  1 root root    0 Oct 23 02:44 xen-hotplug.log
> -rw-r--r--  1 root root   55 Oct 23 02:45 xl-dev.vm.log
> [root@dom0 xen]# cat xl-dev.vm.log
> Waiting for domain dev.vm (domid 36) to die [pid 6970]
> 
> That's it :\

:-/ indeed.

> >> qemu-system-i38[3956]: segfault at 0 ip           (null) sp
> >> 00007fffb4573638 error 4
> > 
> > That might be a smoking gun. Is there a core dump and/or could you try
> > and run qemu under gdb?
> 
> Any hints on doing this? I can't say I'm a gdb guru.... I can't find any
> core dumps anywhere so that's not really helpful...

Fiddling with ulimit might cause core dumps to be created. If not then

https://lists.gnu.org/archive/html/qemu-devel/2014-04/msg00302.html
https://lists.gnu.org/archive/html/qemu-devel/2011-12/msg02575.html

have some hints on running qemu via gdbserver. I've also had luck by
configuring the guest with a device model which is a script that dumps
its args to a file ("echo $@ > /tmp/qemu.args") and then sleeps for an
hour. In another terminal you can then run (fairly quickly, before xl
times out) something like:
         # gdb /path/to/qemu
         (gdb) run [the content of that file]
or possibly even
        # gdb --args /path/to/qemu `cat /tmp/qemu.args`
        (gdb) run

After it crashes, the "bt" command will print a backtrace.

Ian.
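
The stand-in device model described above only needs to record its
arguments and then wait. The same idea can be sketched in C rather than as
a shell script. The helper name `dump_args` is hypothetical, and like the
`echo $@` one-liner it does not preserve quoting of arguments that contain
spaces:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write argv[1..argc-1] to `path` on one line, separated by spaces,
 * mirroring `echo $@ > path`. Returns 0 on success, -1 on error.
 */
int dump_args(const char *path, int argc, char **argv)
{
    assert(path != NULL && argc >= 1);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    for (int i = 1; i < argc; i++) {
        /* Put a single space between arguments, none before the first. */
        if (fprintf(f, "%s%s", i > 1 ? " " : "", argv[i]) < 0) {
            fclose(f);
            return -1;
        }
    }

    if (fputc('\n', f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A `main` that calls `dump_args("/tmp/qemu.args", argc, argv)` and then
`sleep(3600)` would play the role of the script, leaving time to start the
real qemu under gdb with the recorded arguments.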

Patch

diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
index b2cb22b..d1d5d8e 100644
--- a/hw/xen/xen_backend.c
+++ b/hw/xen/xen_backend.c
@@ -50,7 +50,7 @@  const char *xen_protocol;
 
 /* private */
 static QTAILQ_HEAD(XenDeviceHead, XenDevice) xendevs = QTAILQ_HEAD_INITIALIZER(xendevs);
-static int debug = 0;
+static int debug = 9;
 
 /* ------------------------------------------------------------- */
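
For reference, the reconnect behaviour discussed at the start of the
thread (with the "online" node set to 1, a backend that has reached Closed
should return to InitWait and wait for a new frontend rather than shut
down) can be sketched as a small decision function. The state values
follow the XenbusState numbering in Xen's public io/xenbus.h. The enum and
function names are local to this sketch, which does not reproduce the
actual xen_be_check_state or xen_be_try_reset logic:

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the relevant states; values match XenbusState numbering. */
enum be_state {
    BE_INITIALISING = 1,
    BE_INIT_WAIT    = 2,
    BE_INITIALISED  = 3,
    BE_CONNECTED    = 4,
    BE_CLOSING      = 5,
    BE_CLOSED       = 6
};

/*
 * Hypothetical decision taken once the frontend has disconnected:
 *   backend Closed and online == 1 -> InitWait (wait for a new frontend)
 *   backend Closed and online == 0 -> stay Closed (backend may be torn down)
 * Any other backend state is returned unchanged.
 */
enum be_state be_after_frontend_close(enum be_state be, bool online)
{
    assert(be >= BE_INITIALISING && be <= BE_CLOSED);

    if (be == BE_CLOSED && online)
        return BE_INIT_WAIT;
    return be;
}
```

This is the transition kexec, and therefore pvgrub, relies on: the first
frontend (pvgrub) closes the device and the booted kernel then connects as
a new frontend.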