diff mbox

[Xen-devel] gross qemu behavior

Message ID alpine.DEB.2.02.1403281751000.20764@kaball.uk.xensource.com
State New
Headers show

Commit Message

Stefano Stabellini March 28, 2014, 5:52 p.m. UTC
On Fri, 28 Mar 2014, Stefano Stabellini wrote:
> CC'ing Paolo, hoping that he has a better idea on how to solve this
> problem.
> 
> 
> On Fri, 28 Mar 2014, Jan Beulich wrote:
> > Hi,
> > 
> > so while doing all that EPT work I naturally also happened to look more
> > closely at the EPT table dumps, spotting an odd range of 16 pages
> > outside any supposedly populated address range. This range only
> > exists when guest memory doesn't extend past (by default) 0xf0000000
> > (the start of MMIO, i.e. normally the frame buffer). After spending quite
> > a bit of time I finally figured that this must be a left over of the Cirrus
> > VGA ROM, and I would have thought that this
> > 
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -1976,6 +1976,9 @@ static int pci_add_option_rom(PCIDevice 
> >      }
> >  
> >      pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
> > +    memory_region_add_subregion_overlap(pdev->bus->address_space_mem,
> > +                                        pdev->rom.ram_addr, &pdev->rom, 1);
> > +    memory_region_del_subregion(pdev->bus->address_space_mem, &pdev->rom);
> >  
> >      return 0;
> >  }
> > 
> > should fix it. It does appear to work as far generic qemu is concerned,
> > but once looking at the Xen backend I had to conclude that this just
> > can't work: For one, xen_add_to_physmap() and
> > xen_remove_from_physmap() are _documented_ (in a comment) to
> > only be capable of a single region (VRAM). And the latter - even worse -
> > is implemented with a call to xc_domain_add_to_physmap(), completely
> > contrary to its name.
> 
> xen_add_to_physmap and xen_remove_from_physmap are just to deal with the
> VRAM in their current implementation.
> 
> 
> > Instrumenting xen_region_{add,del}(), I can see that all regions get
> > properly reported to the Xen backend, just that it doesn't handle them
> > (this is with above patch in place):
> > 
> > xra(fee00000,100000)
> > xra(fec00000,1000)
> > xra(fed00000,400)
> > xra(80000000,10000)
> > xrd(80000000,10000)
> > xra(f0000000,800000)
> > xra(f1000000,400000)
> > xra(f2000000,1000000)
> > xra(f3010000,4000)
> > xra(f3014000,1000)
> > xra(f3015000,3000)
> > xra(f3018000,1000)
> > xra(f3000000,10000)
> > xrd(f3000000,10000)
> > xrd(f0000000,800000)
> > xra(f0000000,800000)
> > mapping vram to f0000000 - f0800000
> > 
> > Having wasted enough time getting to this point, I'd like to ask you
> > to advise a proper fix for this. We definitely shouldn't be leaving
> > stuff sitting at arbitrary positions in the physical address space of
> > the guest. And the fact that the range gets removed (from Xen's
> > perspective, but not from qemu's) when RAM extends beyond
> > 0xf0000000 (due to it being replaced with what is actually
> > intended to be there) makes me wonder what would happen if the
> > ROM got enabled by the guest.
> 
> This is a thorny issue, fixing this behavior is not going to be trivial:
> 
> - The hypervisor/libxc does not currently expose a
>   xc_domain_remove_from_physmap function.
> 
> - QEMU works by allocating memory regions at the end of the guest
>   physmap and then moving them at the right place.
> 
> - QEMU can destroy a memory region and in that case we could free the
>   memory and remove it from the physmap, however that is NOT what QEMU
>   does with the vga ROM. In that case it calls
>   memory_region_del_subregion, so we can't be sure that the ROM won't be
>   mapped again, therefore we cannot free it. We need to move it
>   somewhere else, hence the problem.
> 
> 
> But fortunately we don't actually need to add the VGA ROM to the guest
> physmap for it to work, QEMU can trap and emulate. In fact even today we
> are not mapping it at the right place anyway, see xen_set_memory:
> 
>     if (add) {
>         if (!memory_region_is_rom(section->mr)) {
>             xen_add_to_physmap(state, start_addr, size,
>                                section->mr, section->offset_within_region);
>         } else {
> 
> 
> So the only solution I can see right now is:
> 
> - avoid allocating guest memory for the VGA ROM
> That means that at the beginning of xen_ram_alloc we need to realize
> that the memory region we are dealing with is the VGA ROM memory region
> and avoid calling xc_domain_populate_physmap_exact for it.
> 
> - call g_malloc instead
> Simply use g_malloc to allocate QEMU memory for the VGA ROM,
> keep track of the allocation in a data structure internal to xen-all.c.
> 
> - make sure that qemu_get_ram_ptr can deal with the different allocation
> Now that the VGA ROM is QEMU memory, we need to make sure that
> qemu_get_ram_ptr returns the right pointer for it.
> 
> 
> This is all very fiddly and hackish, but I can't see a better way of
> solving the issue.


Given that I feel that the explanation is not very clear, I am appending
a proof of concept patch. It is obviously horrible, I am by no means
suggesting it should be applied.

Comments

Paolo Bonzini March 28, 2014, 6:01 p.m. UTC | #1
Il 28/03/2014 18:52, Stefano Stabellini ha scritto:
>> This is a thorny issue, fixing this behavior is not going to be trivial:
>>
>> - The hypervisor/libxc does not currently expose a
>>   xc_domain_remove_from_physmap function.
>>
>> - QEMU works by allocating memory regions at the end of the guest
>>   physmap and then moving them at the right place.
>>
>> - QEMU can destroy a memory region and in that case we could free the
>>   memory and remove it from the physmap, however that is NOT what QEMU
>>   does with the vga ROM. In that case it calls
>>   memory_region_del_subregion, so we can't be sure that the ROM won't be
>>   mapped again, therefore we cannot free it. We need to move it
>>   somewhere else, hence the problem.

Right; QEMU cannot know either if the ROM will be mapped again (examples 
include "cd /sys/bus/pci/devices/0000:0:03.0 && echo 1 > rom && cat rom" 
or a warm reset).

>> But fortunately we don't actually need to add the VGA ROM to the guest
>> physmap for it to work, QEMU can trap and emulate. In fact even today we
>> are not mapping it at the right place anyway, see xen_set_memory:

But how can you execute from the VGA ROM then?  Also, how do you migrate 
its contents?  And how is VGA different from say an iPXE ROM?

It would be nice if QEMU could just special case pc.ram (which has 
block->offset == 0), and use the normal method to allocate other RAM 
regions.  But I'm afraid that would require some changes in the Xen 
toolstack as well (for migration, for example) and I'm not sure how you 
could execute from PCI ROM BARs.

Paolo
Stefano Stabellini March 28, 2014, 6:30 p.m. UTC | #2
On Fri, 28 Mar 2014, Paolo Bonzini wrote:
> Il 28/03/2014 18:52, Stefano Stabellini ha scritto:
> > > This is a thorny issue, fixing this behavior is not going to be trivial:
> > > 
> > > - The hypervisor/libxc does not currently expose a
> > >   xc_domain_remove_from_physmap function.
> > > 
> > > - QEMU works by allocating memory regions at the end of the guest
> > >   physmap and then moving them at the right place.
> > > 
> > > - QEMU can destroy a memory region and in that case we could free the
> > >   memory and remove it from the physmap, however that is NOT what QEMU
> > >   does with the vga ROM. In that case it calls
> > >   memory_region_del_subregion, so we can't be sure that the ROM won't be
> > >   mapped again, therefore we cannot free it. We need to move it
> > >   somewhere else, hence the problem.
> 
> Right; QEMU cannot know either if the ROM will be mapped again (examples
> include "cd /sys/bus/pci/devices/0000:0:03.0 && echo 1 > rom && cat rom" or a
> warm reset).
> 
> > > But fortunately we don't actually need to add the VGA ROM to the guest
> > > physmap for it to work, QEMU can trap and emulate. In fact even today we
> > > are not mapping it at the right place anyway, see xen_set_memory:
> 
> But how can you execute from the VGA ROM then?

I don't know, I guess we don't? In that case why does it work today?


> Also, how do you migrate its contents?

That would also not work. We would have to re-initialize it in QEMU on
the receiving end.


> And how is VGA different from say an iPXE ROM?

iPXE is read into memory by hvmloader.


> It would be nice if QEMU could just special case pc.ram (which has
> block->offset == 0), and use the normal method to allocate other RAM regions.
> But I'm afraid that would require some changes in the Xen toolstack as well
> (for migration, for example) and I'm not sure how you could execute from PCI
> ROM BARs.
> 
> Paolo
>
Paolo Bonzini March 29, 2014, 7:31 a.m. UTC | #3
Il 28/03/2014 19:30, Stefano Stabellini ha scritto:
> On Fri, 28 Mar 2014, Paolo Bonzini wrote:
>> Il 28/03/2014 18:52, Stefano Stabellini ha scritto:
>>>> This is a thorny issue, fixing this behavior is not going to be trivial:
>>>>
>>>> - The hypervisor/libxc does not currently expose a
>>>>   xc_domain_remove_from_physmap function.
>>>>
>>>> - QEMU works by allocating memory regions at the end of the guest
>>>>   physmap and then moving them at the right place.
>>>>
>>>> - QEMU can destroy a memory region and in that case we could free the
>>>>   memory and remove it from the physmap, however that is NOT what QEMU
>>>>   does with the vga ROM. In that case it calls
>>>>   memory_region_del_subregion, so we can't be sure that the ROM won't be
>>>>   mapped again, therefore we cannot free it. We need to move it
>>>>   somewhere else, hence the problem.
>>
>> Right; QEMU cannot know either if the ROM will be mapped again (examples
>> include "cd /sys/bus/pci/devices/0000:0:03.0 && echo 1 > rom && cat rom" or a
>> warm reset).
>>
>>>> But fortunately we don't actually need to add the VGA ROM to the guest
>>>> physmap for it to work, QEMU can trap and emulate. In fact even today we
>>>> are not mapping it at the right place anyway, see xen_set_memory:
>>
>> But how can you execute from the VGA ROM then?
>
> I don't know, I guess we don't? In that case why does it work today?

Right, the ROM is copied down to low memory by firmware (hvmloader?).

>> Also, how do you migrate its contents?
>
> That would also not work. We would have to re-initialize it in QEMU on
> the receiving end.

That is problematic.  It would mean that a system reset after migration 
may auto-upgrade some parts of the firmware.

Paolo
Fabio Fantoni March 30, 2014, 7:57 a.m. UTC | #4
2014-03-29 8:31 GMT+01:00 Paolo Bonzini <pbonzini@redhat.com>:

> Il 28/03/2014 19:30, Stefano Stabellini ha scritto:
>
>  On Fri, 28 Mar 2014, Paolo Bonzini wrote:
>>
>>> Il 28/03/2014 18:52, Stefano Stabellini ha scritto:
>>>
>>>> This is a thorny issue, fixing this behavior is not going to be trivial:
>>>>>
>>>>> - The hypervisor/libxc does not currently expose a
>>>>>   xc_domain_remove_from_physmap function.
>>>>>
>>>>> - QEMU works by allocating memory regions at the end of the guest
>>>>>   physmap and then moving them at the right place.
>>>>>
>>>>> - QEMU can destroy a memory region and in that case we could free the
>>>>>   memory and remove it from the physmap, however that is NOT what QEMU
>>>>>   does with the vga ROM. In that case it calls
>>>>>   memory_region_del_subregion, so we can't be sure that the ROM won't
>>>>> be
>>>>>   mapped again, therefore we cannot free it. We need to move it
>>>>>   somewhere else, hence the problem.
>>>>>
>>>>
>>> Right; QEMU cannot know either if the ROM will be mapped again (examples
>>> include "cd /sys/bus/pci/devices/0000:0:03.0 && echo 1 > rom && cat
>>> rom" or a
>>> warm reset).
>>>
>>>  But fortunately we don't actually need to add the VGA ROM to the guest
>>>>> physmap for it to work, QEMU can trap and emulate. In fact even today
>>>>> we
>>>>> are not mapping it at the right place anyway, see xen_set_memory:
>>>>>
>>>>
>>> But how can you execute from the VGA ROM then?
>>>
>>
>> I don't know, I guess we don't? In that case why does it work today?
>>
>
> Right, the ROM is copied down to low memory by firmware (hvmloader?).


Only vgabios and other rom of qemu traditional are include and loaded by
hvmloader.
Time ago when I was trying to solve some problems with the emulated vgas I came
to doubt that the vgabios of qemu upstream were not loaded or used
correctly.
Someone had told me that they were loaded automatically from qemu when you use
the qemu upstream.
Unfortunately I do not have enough knowledge and are not able to find
exactly the problems or things missing in xen to solve problems with
the emulated
vgas.
I did a lot of tests, comparing with kvm using same qemu parameters used
with xen showed almost always higher video performance on kvm and qxl was
not working on xen but showing too few errors/details in logs that I
posted long
ago, unfortunately no answers.
It seemed to me from what little I knew that was not allocated or
usedcorrectly all
the ram or one or more regions (having memory errors in logs) and / or not
being loaded or used properly the vgabios.
Seems that in this thread you are probably trying to solve problems
including the ones I found.

Last mail of my qxl tests for example is this:
http://lists.xen.org/archives/html/xen-devel/2013-12/msg00758.html
And the memory error on domU logs of this test was:

ioremap error for 0xfc001000-0xfc002000, requested 0x10, got 0x0

There was also another test maybe 2 years ago for which data have made
me doubt the proper loading or use of vgabios stdvga with xen and qemu
upstream but unfortunately can not find it now.


I will try to help with test and post results/details if I can.
For example some posts ago I see Stabellini patch that seems about load of
vgabios and other roms, should be tested?

Thanks for any reply and sorry for my bad english.


>
>
>  Also, how do you migrate its contents?
>>>
>>
>> That would also not work. We would have to re-initialize it in QEMU on
>> the receiving end.
>>
>
> That is problematic.  It would mean that a system reset after migration
> may auto-upgrade some parts of the firmware.
>
>
> Paolo
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>
diff mbox

Patch

diff --git a/exec.c b/exec.c
index 91513c6..bdecc70 100644
--- a/exec.c
+++ b/exec.c
@@ -1453,6 +1453,7 @@  void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
    It should not be used for general purpose DMA.
    Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
  */
+extern uint8_t* vga_rom;
 void *qemu_get_ram_ptr(ram_addr_t addr)
 {
     RAMBlock *block = qemu_get_ram_block(addr);
@@ -1462,7 +1463,9 @@  void *qemu_get_ram_ptr(ram_addr_t addr)
          * because we don't want to map the entire memory in QEMU.
          * In that case just map until the end of the page.
          */
-        if (block->offset == 0) {
+        if (!strcmp(block->mr->name,"cirrus_vga.rom")) {
+            return vga_rom;
+        } else if (block->offset == 0) {
             return xen_map_cache(addr, 0, 0);
         } else if (block->host == NULL) {
             block->host =
diff --git a/xen-all.c b/xen-all.c
index ba34739..6211946 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -101,6 +101,8 @@  typedef struct XenIOState {
     Notifier wakeup;
 } XenIOState;
 
+uint8_t* vga_rom;
+
 /* Xen specific function for piix pci */
 
 int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num)
@@ -217,6 +219,11 @@  void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
         return;
     }
 
+    if (!strcmp(mr->name,"cirrus_vga.rom")) {
+        vga_rom = g_malloc(size);
+        return;
+    }
+
     trace_xen_ram_alloc(ram_addr, size);
 
     nr_pfn = size >> TARGET_PAGE_BITS;