
[v2] KVM: Specify byte order for KVM_EXIT_MMIO

Message ID 1390606782-14724-1-git-send-email-christoffer.dall@linaro.org
State New

Commit Message

Christoffer Dall Jan. 24, 2014, 11:39 p.m. UTC
The KVM API documentation is not clear about the semantics of the data
field on the mmio struct on the kvm_run struct.

This has become problematic when supporting ARM guests on big-endian
host systems with guests of both endianness types, because it is unclear
how the data should be exported to user space.

This should not break with existing implementations as all supported
existing implementations of known user space applications (QEMU and
kvmtools for virtio) only support default endianness of the
architectures on the host side.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
Changes [v1 - v2]:
 - s/host kernel should/host user space should/

 Documentation/virtual/kvm/api.txt | 5 +++++
 1 file changed, 5 insertions(+)

Comments

vkamensky Jan. 25, 2014, 12:05 a.m. UTC | #1
On 24 January 2014 15:51, Scott Wood <scottwood@freescale.com> wrote:
> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>> The KVM API documentation is not clear about the semantics of the data
>> field on the mmio struct on the kvm_run struct.
>>
>> This has become problematic when supporting ARM guests on big-endian
>> host systems with guests of both endianness types, because it is unclear
>> how the data should be exported to user space.
>>
>> This should not break with existing implementations as all supported
>> existing implementations of known user space applications (QEMU and
>> kvmtools for virtio) only support default endianness of the
>> architectures on the host side.
>>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Alexander Graf <agraf@suse.de>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> ---
>> Changes [v1 - v2]:
>>  - s/host kernel should/host user space should/
>>
>>  Documentation/virtual/kvm/api.txt | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 366bf4b..6dbd68c 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>  by kvm.  The 'data' member contains the written data if 'is_write' is
>>  true, and should be filled by application code otherwise.
>>
>> +The 'data' member byte order is host kernel native endianness, regardless of
>> +the endianness of the guest, and represents the the value as it would go on the
>> +bus in real hardware.  The host user space should always be able to do:
>> +<type> val = *((<type> *)mmio.data).
>
> Host userspace should be able to do that with what results?  It would
> only produce a directly usable value if host endianness is the same as
> the emulated device's endianness.
>
> I'm not sure that "host kernel native endianness" is an accurate way of
> describing what currently happens.  Regardless of host or guest
> endianness, the guest should be swapping the value as necessary to
> ensure that the value that goes on the (real or emulated) bus is the
> same.
>
> Plus, if it were really "as it would go on the bus" the value wouldn't
> necessarily be left justified within data[], depending on how the bus
> works.
>
> How about a wording like this:
>
>   The 'data' member contains, in its first 'len' bytes, the value as it
>   would appear if the guest had accessed memory rather than I/O.

Scott, yes! Thank you so much! I completely agree.
You managed to express elegantly in two lines what
I failed to communicate in hundreds of lines of email
over the last few days on this and another thread.

- Victor

> -Scott
>
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
Peter Maydell Jan. 25, 2014, 12:24 a.m. UTC | #2
On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>> The KVM API documentation is not clear about the semantics of the data
>> field on the mmio struct on the kvm_run struct.
>>
>> This has become problematic when supporting ARM guests on big-endian
>> host systems with guests of both endianness types, because it is unclear
>> how the data should be exported to user space.
>>
>> This should not break with existing implementations as all supported
>> existing implementations of known user space applications (QEMU and
>> kvmtools for virtio) only support default endianness of the
>> architectures on the host side.
>>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Cc: Alexander Graf <agraf@suse.de>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> ---
>> Changes [v1 - v2]:
>>  - s/host kernel should/host user space should/
>>
>>  Documentation/virtual/kvm/api.txt | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index 366bf4b..6dbd68c 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>  by kvm.  The 'data' member contains the written data if 'is_write' is
>>  true, and should be filled by application code otherwise.
>>
>> +The 'data' member byte order is host kernel native endianness, regardless of
>> +the endianness of the guest, and represents the the value as it would go on the
>> +bus in real hardware.  The host user space should always be able to do:
>> +<type> val = *((<type> *)mmio.data).
>
> Host userspace should be able to do that with what results?  It would
> only produce a directly usable value if host endianness is the same as
> the emulated device's endianness.

With the result that it gets the value the CPU has sent out on
the bus as the memory transaction. Obviously if what userspace
is emulating is a bus which has a byteswapping bridge or if it's
being helpful to device emulation by providing "here's the value
even though you think you're wired up backwards" then it needs
to byteswap.

> I'm not sure that "host kernel native endianness" is an accurate way of
> describing what currently happens.  Regardless of host or guest
> endianness, the guest should be swapping the value as necessary to
> ensure that the value that goes on the (real or emulated) bus is the
> same.

I don't know why you're bringing the guest in here. Whether
the guest chooses to byteswap or not is IMHO not relevant.
What KVM and userspace need to combine to achieve is that
whatever the guest happens to have done causes the same result
it would on the real hardware. Whether the guest sends out a
write of 0x12345678 because it wrote 0x12345678 directly or
because it started with 0x87654321 and issued a byte-reverse
instruction doesn't matter.

> Plus, if it were really "as it would go on the bus" the value wouldn't
> necessarily be left justified within data[], depending on how the bus
> works.

The point is that the value in data[] is not "as it would go on the bus",
but the value you get out by treating it as a host-native-endianness
value of the relevant size left-justified within data[] is the value as
it would go on the bus.
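[Editorial sketch] Peter's description can be written out in C. This is a hedged illustration, not kernel or QEMU code; `bus_value` and its parameters are hypothetical names standing in for the kvm_run mmio fields:

```c
#include <stdint.h>
#include <string.h>

/* Recover the memory-transaction value by treating the first 'len'
 * bytes of data[] as a host-native-endianness integer of that size,
 * left-justified in the buffer, as described above. */
static uint64_t bus_value(const uint8_t *data, unsigned int len)
{
    uint64_t val = 0;

    switch (len) {
    case 1:
        val = data[0];
        break;
    case 2: {
        uint16_t v;
        memcpy(&v, data, sizeof(v));   /* host-native load */
        val = v;
        break;
    }
    case 4: {
        uint32_t v;
        memcpy(&v, data, sizeof(v));
        val = v;
        break;
    }
    case 8:
        memcpy(&val, data, sizeof(val));
        break;
    }
    return val;
}
```

The memcpy form is equivalent to the `*((<type> *)mmio.data)` cast in the patch text, while staying well-defined regardless of alignment.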

> How about a wording like this:
>
>   The 'data' member contains, in its first 'len' bytes, the value as it
>   would appear if the guest had accessed memory rather than I/O.

I think this is confusing, because now userspace authors have
to figure out how to get back to "value X of size Y at address Z"
by interpreting this text... Can you write out the equivalent of
Christoffer's text "here's how you get the memory transaction
value" for what you want?

(Also, value as it would appear to who?)

I think your wording implies that the order of bytes in data[] depends
on the guest CPU's "usual byte order", i.e. the order for which the CPU
does not do a byte-lane-swap (LE for ARM, BE for PPC),
and it would mean it would come out differently from
my/Alex/Christoffer's proposal if the host kernel was the opposite
endianness from that "usual" order.

Finally, I think it's a bit confusing in that "as if the guest had
accessed memory" is assigning implicit semantics to memory
in the emulated system, when memory is actually kind of outside
KVM's purview because it's not part of the CPU.

thanks
-- PMM
Christoffer Dall Jan. 25, 2014, 1:56 a.m. UTC | #3
On Sat, Jan 25, 2014 at 12:24:08AM +0000, Peter Maydell wrote:
> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
> > On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
> >> The KVM API documentation is not clear about the semantics of the data
> >> field on the mmio struct on the kvm_run struct.
> >>
> >> This has become problematic when supporting ARM guests on big-endian
> >> host systems with guests of both endianness types, because it is unclear
> >> how the data should be exported to user space.
> >>
> >> This should not break with existing implementations as all supported
> >> existing implementations of known user space applications (QEMU and
> >> kvmtools for virtio) only support default endianness of the
> >> architectures on the host side.
> >>
> >> Cc: Marc Zyngier <marc.zyngier@arm.com>
> >> Cc: Peter Maydell <peter.maydell@linaro.org>
> >> Cc: Alexander Graf <agraf@suse.de>
> >> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >> ---
> >> Changes [v1 - v2]:
> >>  - s/host kernel should/host user space should/
> >>
> >>  Documentation/virtual/kvm/api.txt | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> index 366bf4b..6dbd68c 100644
> >> --- a/Documentation/virtual/kvm/api.txt
> >> +++ b/Documentation/virtual/kvm/api.txt
> >> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
> >>  by kvm.  The 'data' member contains the written data if 'is_write' is
> >>  true, and should be filled by application code otherwise.
> >>
> >> +The 'data' member byte order is host kernel native endianness, regardless of
> >> +the endianness of the guest, and represents the the value as it would go on the
> >> +bus in real hardware.  The host user space should always be able to do:
> >> +<type> val = *((<type> *)mmio.data).
> >
> > Host userspace should be able to do that with what results?  It would
> > only produce a directly usable value if host endianness is the same as
> > the emulated device's endianness.
> 
> With the result that it gets the value the CPU has sent out on
> the bus as the memory transaction. Obviously if what userspace
> is emulating is a bus which has a byteswapping bridge or if it's
> being helpful to device emulation by providing "here's the value
> even though you think you're wired up backwards" then it needs
> to byteswap.
> 

Yes, because KVM emulates the CPU, its interface should emulate the
external interface of the CPU, not CPU-internal bits.

The value is directly usable to emulate the bus to attached devices in
user space.

> > I'm not sure that "host kernel native endianness" is an accurate way of
> > describing what currently happens.  Regardless of host or guest
> > endianness, the guest should be swapping the value as necessary to
> > ensure that the value that goes on the (real or emulated) bus is the
> > same.
> 
> I don't know why you're bringing the guest in here. Whether
> the guest chooses to byteswap or not is IMHO not relevant.
> What KVM and userspace need to combine to achieve is that
> whatever the guest happens to have done causes the same result
> it would on the real hardware. Whether the guest sends out a
> write of 0x12345678 because it wrote 0x12345678 directly or
> because it started with 0x87654321 and issued a byte-reverse
> instruction doesn't matter.
> 
> > Plus, if it were really "as it would go on the bus" the value wouldn't
> > necessarily be left justified within data[], depending on how the bus
> > works.
> 
> The point is that the value in data[] is not "as it would go on the bus",
> but the value you get out by treating it as a host-native-endianness
> value of the relevant size left-justified within data[] is the value as
> it would go on the bus.
> 
> > How about a wording like this:
> >
> >   The 'data' member contains, in its first 'len' bytes, the value as it
> >   would appear if the guest had accessed memory rather than I/O.
> 
> I think this is confusing, because now userspace authors have
> to figure out how to get back to "value X of size Y at address Z"
> by interpreting this text... Can you write out the equivalent of
> Christoffer's text "here's how you get the memory transaction
> value" for what you want?
> 
> (Also, value as it would appear to who?)
> 
> I think your wording implies that the order of bytes in data[] depend
> on the guest CPU "usual byte order", ie the order which the CPU
> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
> and it would mean it would come out differently from
> my/Alex/Christoffer's proposal if the host kernel was the opposite
> endianness from that "usual" order.
> 
> Finally, I think it's a bit confusing in that "as if the guest had
> accessed memory" is assigning implicit semantics to memory
> in the emulated system, when memory is actually kind of outside
> KVM's purview because it's not part of the CPU.
> 

The "as if the guest had accessed memory", would imply that user space
needs to look at the settings of the CPU (in the case of ARM the E-bit)
to understand which value the CPU intended for the memory transaction.

However, I'm fine with rewording the patch to something like:

  The 'data' member contains, in its first 'len' bytes, the value of the
  memory transaction in host kernel native endianness, regardless of the
  endianness of the guest.  Host user space should always be able to
  directly do the following to emulate the bus and attached peripherals:
  <type> val = *((<type> *)mmio.data)

if people find this more clear.
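[Editorial sketch] Both directions of that contract can be illustrated with a local stand-in struct. The field names follow the real kvm_run mmio payload, but the struct definition here is for illustration only:

```c
#include <stdint.h>

/* Local stand-in for the kvm_run mmio exit payload; the real struct
 * lives in <linux/kvm.h>. */
struct mmio_exit {
    uint64_t phys_addr;
    uint8_t  data[8];
    uint32_t len;
    uint8_t  is_write;
};

/* Guest store (is_write == 1): userspace reads the transaction value
 * with a plain host-native dereference, per the wording above. */
static uint32_t mmio_get_val32(const struct mmio_exit *mmio)
{
    return *(const uint32_t *)mmio->data;
}

/* Guest load (is_write == 0): userspace fills data[] the same way,
 * storing the emulated device's value in host-native byte order. */
static void mmio_put_val32(struct mmio_exit *mmio, uint32_t val)
{
    *(uint32_t *)mmio->data = val;
}
```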

-Christoffer
Christoffer Dall Jan. 25, 2014, 2:34 a.m. UTC | #4
On Sat, Jan 25, 2014 at 03:15:35AM +0100, Alexander Graf wrote:
> 
> On 25.01.2014, at 02:58, Scott Wood <scottwood@freescale.com> wrote:
> 
> > On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
> >> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
> >>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
> >>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>>> index 366bf4b..6dbd68c 100644
> >>>> --- a/Documentation/virtual/kvm/api.txt
> >>>> +++ b/Documentation/virtual/kvm/api.txt
> >>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
> >>>> by kvm.  The 'data' member contains the written data if 'is_write' is
> >>>> true, and should be filled by application code otherwise.
> >>>> 
> >>>> +The 'data' member byte order is host kernel native endianness, regardless of
> >>>> +the endianness of the guest, and represents the the value as it would go on the
> >>>> +bus in real hardware.  The host user space should always be able to do:
> >>>> +<type> val = *((<type> *)mmio.data).
> >>> 
> >>> Host userspace should be able to do that with what results?  It would
> >>> only produce a directly usable value if host endianness is the same as
> >>> the emulated device's endianness.
> >> 
> >> With the result that it gets the value the CPU has sent out on
> >> the bus as the memory transaction.
> > 
> > Doesn't that assume the host kernel endianness is the same as the bus
> > (or rather, that the host CPU would not swap such an access before it
> > hits the bus)?
> > 
> > If you take the same hardware and boot a little endian host kernel one
> > day, and a big endian host kernel the next, the bus doesn't change, and
> > neither should the bytewise (assuming address invariance) contents of
> > data[].  How data[] would look when read as a larger integer would of
> > course change -- but that's due to how you're reading it.
> > 
> > It's clear to say that a value in memory has been stored there in host
> > endianness when the value is as you would want to see it in a CPU
> > register, but it's less clear when you talk about it relative to values
> > on a bus.  It's harder to correlate that to something that is software
> > visible.
> > 
> > I don't think there's any actual technical difference between your
> > wording and mine when each wording is properly interpreted, but I
> > suspect my wording is less likely to be misinterpreted (I could be
> > wrong).
> > 
> >> Obviously if what userspace
> >> is emulating is a bus which has a byteswapping bridge or if it's
> >> being helpful to device emulation by providing "here's the value
> >> even though you think you're wired up backwards" then it needs
> >> to byteswap.
> > 
> > Whether the emulated bus has "a byteswapping bridge" doesn't sound like
> > something that depends on the endianness that the host CPU is currently
> > running in.
> > 
> >>> How about a wording like this:
> >>> 
> >>>  The 'data' member contains, in its first 'len' bytes, the value as it
> >>>  would appear if the guest had accessed memory rather than I/O.
> >> 
> >> I think this is confusing, because now userspace authors have
> >> to figure out how to get back to "value X of size Y at address Z"
> >> by interpreting this text... Can you write out the equivalent of
> >> Christoffer's text "here's how you get the memory transaction
> >> value" for what you want?
> > 
> > Userspace swaps the value if and only if userspace's endianness differs
> > from the endianness with which the device interprets the data
> > (regardless of whether said interpretation is considered natural or
> > swapped relative to the way the bus is documented).  It's similar to how
> > userspace would handle emulating DMA.
> > 
> > KVM swaps the value if and only if the endianness of the guest access
> > differs from that of the host, i.e. if it would have done swapping when
> > emulating an ordinary memory access.
> > 
> >> (Also, value as it would appear to who?)
> > 
> > As it would appear to anyone.  It works because data[] actually is
> > memory.  Any difference in how data appears based on the reader's
> > context would already be reflected when the reader performs the load.
> > 
> >> I think your wording implies that the order of bytes in data[] depend
> >> on the guest CPU "usual byte order", ie the order which the CPU
> >> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
> >> and it would mean it would come out differently from
> >> my/Alex/Christoffer's proposal if the host kernel was the opposite
> >> endianness from that "usual" order.
> > 
> > It doesn't depend on "usual" anything.  The only thing it implicitly
> > says about guest byte order is that it's KVM's job to implement any
> > swapping if the endianness of the guest access is different from the
> > endianness of the host kernel access (whether it's due to the guest's
> > mode, the way a page is mapped, the instruction used, etc).
> > 
> >> Finally, I think it's a bit confusing in that "as if the guest had
> >> accessed memory" is assigning implicit semantics to memory
> >> in the emulated system, when memory is actually kind of outside
> >> KVM's purview because it's not part of the CPU.
> > 
> > That's sort of the point.  It defines it in a way that is independent of
> > the CPU, and thus independent of what endianness the CPU operates in.
> 
> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
> 
> your proposal:
> 
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
> 
> -> ldw_p() will give us the correct value to work with
> 
> current proposal:
> 
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
> 
> -> *(uint32_t*)data will give us the correct value to work with
> 
> 
> There are pros and cons for both approaches.
> 
> Pro approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to have awareness of the "default endianness".
> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
> 
> Obviously both approaches would work as long as they're properly defined :).
> 
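[Editorial sketch] The two access idioms named above can be written out as follows. `ldl_be_p` is a simplified stand-in for QEMU's target-endianness load helpers (shown for a big-endian target, as in the PPC example); `host_native_load` is the `*(uint32_t*)data` idiom written with memcpy:

```c
#include <stdint.h>
#include <string.h>

/* Approach 1 idiom: load data[] in the target's byte order.  A
 * simplified stand-in for QEMU's ld*_p helpers, assuming a big-endian
 * target as in the PPC example above. */
static uint32_t ldl_be_p(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Approach 2 idiom: a plain host-native load; the userspace side never
 * needs to know anyone's endianness. */
static uint32_t host_native_load(const uint8_t *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```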
Just to clarify: with approach 2, existing supported QEMU configurations
of BE/LE on both ARM and PPC still work; it is only for future
mixed-endian support that we need to modify QEMU, right?

-Christoffer
vkamensky Jan. 25, 2014, 2:37 a.m. UTC | #5
On 24 January 2014 18:15, Alexander Graf <agraf@suse.de> wrote:
>
> On 25.01.2014, at 02:58, Scott Wood <scottwood@freescale.com> wrote:
>
>> On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
>>> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
>>>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>> index 366bf4b..6dbd68c 100644
>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>>>> by kvm.  The 'data' member contains the written data if 'is_write' is
>>>>> true, and should be filled by application code otherwise.
>>>>>
>>>>> +The 'data' member byte order is host kernel native endianness, regardless of
>>>>> +the endianness of the guest, and represents the the value as it would go on the
>>>>> +bus in real hardware.  The host user space should always be able to do:
>>>>> +<type> val = *((<type> *)mmio.data).
>>>>
>>>> Host userspace should be able to do that with what results?  It would
>>>> only produce a directly usable value if host endianness is the same as
>>>> the emulated device's endianness.
>>>
>>> With the result that it gets the value the CPU has sent out on
>>> the bus as the memory transaction.
>>
>> Doesn't that assume the host kernel endianness is the same as the bus
>> (or rather, that the host CPU would not swap such an access before it
>> hits the bus)?
>>
>> If you take the same hardware and boot a little endian host kernel one
>> day, and a big endian host kernel the next, the bus doesn't change, and
>> neither should the bytewise (assuming address invariance) contents of
>> data[].  How data[] would look when read as a larger integer would of
>> course change -- but that's due to how you're reading it.
>>
>> It's clear to say that a value in memory has been stored there in host
>> endianness when the value is as you would want to see it in a CPU
>> register, but it's less clear when you talk about it relative to values
>> on a bus.  It's harder to correlate that to something that is software
>> visible.
>>
>> I don't think there's any actual technical difference between your
>> wording and mine when each wording is properly interpreted, but I
>> suspect my wording is less likely to be misinterpreted (I could be
>> wrong).
>>
>>> Obviously if what userspace
>>> is emulating is a bus which has a byteswapping bridge or if it's
>>> being helpful to device emulation by providing "here's the value
>>> even though you think you're wired up backwards" then it needs
>>> to byteswap.
>>
>> Whether the emulated bus has "a byteswapping bridge" doesn't sound like
>> something that depends on the endianness that the host CPU is currently
>> running in.
>>
>>>> How about a wording like this:
>>>>
>>>>  The 'data' member contains, in its first 'len' bytes, the value as it
>>>>  would appear if the guest had accessed memory rather than I/O.
>>>
>>> I think this is confusing, because now userspace authors have
>>> to figure out how to get back to "value X of size Y at address Z"
>>> by interpreting this text... Can you write out the equivalent of
>>> Christoffer's text "here's how you get the memory transaction
>>> value" for what you want?
>>
>> Userspace swaps the value if and only if userspace's endianness differs
>> from the endianness with which the device interprets the data
>> (regardless of whether said interpretation is considered natural or
>> swapped relative to the way the bus is documented).  It's similar to how
>> userspace would handle emulating DMA.
>>
>> KVM swaps the value if and only if the endianness of the guest access
>> differs from that of the host, i.e. if it would have done swapping when
>> emulating an ordinary memory access.
>>
>>> (Also, value as it would appear to who?)
>>
>> As it would appear to anyone.  It works because data[] actually is
>> memory.  Any difference in how data appears based on the reader's
>> context would already be reflected when the reader performs the load.
>>
>>> I think your wording implies that the order of bytes in data[] depend
>>> on the guest CPU "usual byte order", ie the order which the CPU
>>> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
>>> and it would mean it would come out differently from
>>> my/Alex/Christoffer's proposal if the host kernel was the opposite
>>> endianness from that "usual" order.
>>
>> It doesn't depend on "usual" anything.  The only thing it implicitly
>> says about guest byte order is that it's KVM's job to implement any
>> swapping if the endianness of the guest access is different from the
>> endianness of the host kernel access (whether it's due to the guest's
>> mode, the way a page is mapped, the instruction used, etc).
>>
>>> Finally, I think it's a bit confusing in that "as if the guest had
>>> accessed memory" is assigning implicit semantics to memory
>>> in the emulated system, when memory is actually kind of outside
>>> KVM's purview because it's not part of the CPU.
>>
>> That's sort of the point.  It defines it in a way that is independent of
>> the CPU, and thus independent of what endianness the CPU operates in.
>
> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>
> your proposal:
>
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }

Actually it is { 0x01, 0x02, 0x03, 0x04 } in all cases.
First of all, it does not depend on host endianness, because in
Scott's case the layout is defined in terms where a KVM host is not
even present.

If you take real h/w (no emulation) and put memory in place of the I/O
region, and the same driver does the same access operation on the same
device, then byte-wise it will always look the same. In one case the
driver just writes the word directly; when the driver operates in the
opposite endianness it swaps the word first and then writes it.
Byte-wise it is the same memory contents all the time, and the device
sees the same value in both cases.

Look at the writel, readl, readl_relaxed, etc. accessor functions. A
clean endian-agnostic driver just uses those accessors, and they handle
any byteswap needed under the covers.
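[Editorial sketch] Victor's point about the accessors can be modeled with a simplified writel() for a little-endian bus. `my_cpu_to_le32` and `my_writel` are local illustrations, not the kernel's implementations:

```c
#include <stdint.h>

/* Simplified model of cpu_to_le32(): byte-swap only when the kernel
 * runs big-endian, so the value ends up in bus (little-endian) order. */
static inline uint32_t my_cpu_to_le32(uint32_t v)
{
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return __builtin_bswap32(v);
#else
    return v;
#endif
}

/* Simplified model of writel(): convert to bus order, then store the
 * raw bytes to the device register.  Byte-wise, the device sees the
 * same bytes whether the kernel is big- or little-endian. */
static inline void my_writel(uint32_t val, volatile uint32_t *reg)
{
    *reg = my_cpu_to_le32(val);
}
```

The assertion below holds on a host of either endianness, which is exactly the address-invariance point being made.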

Thanks,
Victor

> -> ldw_p() will give us the correct value to work with
>
> current proposal:
>
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>
> -> *(uint32_t*)data will give us the correct value to work with
>
>
> There are pros and cons for both approaches.
>
> Pro approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to have awareness of the "default endianness".
> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
>
> Obviously both approaches would work as long as they're properly defined :).
>
>
> Alex
>
>
vkamensky Jan. 25, 2014, 3:36 p.m. UTC | #6
On 25 January 2014 01:20, Alexander Graf <agraf@suse.de> wrote:
>
>
>> Am 25.01.2014 um 03:37 schrieb Victor Kamensky <victor.kamensky@linaro.org>:
>>
>>> On 24 January 2014 18:15, Alexander Graf <agraf@suse.de> wrote:
>>>
>>>> On 25.01.2014, at 02:58, Scott Wood <scottwood@freescale.com> wrote:
>>>>
>>>>> On Sat, 2014-01-25 at 00:24 +0000, Peter Maydell wrote:
>>>>>> On 24 January 2014 23:51, Scott Wood <scottwood@freescale.com> wrote:
>>>>>>> On Fri, 2014-01-24 at 15:39 -0800, Christoffer Dall wrote:
>>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>>> index 366bf4b..6dbd68c 100644
>>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>>> @@ -2565,6 +2565,11 @@ executed a memory-mapped I/O instruction which could not be satisfied
>>>>>>> by kvm.  The 'data' member contains the written data if 'is_write' is
>>>>>>> true, and should be filled by application code otherwise.
>>>>>>>
>>>>>>> +The 'data' member byte order is host kernel native endianness, regardless of
>>>>>>> +the endianness of the guest, and represents the the value as it would go on the
>>>>>>> +bus in real hardware.  The host user space should always be able to do:
>>>>>>> +<type> val = *((<type> *)mmio.data).
>>>>>>
>>>>>> Host userspace should be able to do that with what results?  It would
>>>>>> only produce a directly usable value if host endianness is the same as
>>>>>> the emulated device's endianness.
>>>>>
>>>>> With the result that it gets the value the CPU has sent out on
>>>>> the bus as the memory transaction.
>>>>
>>>> Doesn't that assume the host kernel endianness is the same as the bus
>>>> (or rather, that the host CPU would not swap such an access before it
>>>> hits the bus)?
>>>>
>>>> If you take the same hardware and boot a little endian host kernel one
>>>> day, and a big endian host kernel the next, the bus doesn't change, and
>>>> neither should the bytewise (assuming address invariance) contents of
>>>> data[].  How data[] would look when read as a larger integer would of
>>>> course change -- but that's due to how you're reading it.
>>>>
>>>> It's clear to say that a value in memory has been stored there in host
>>>> endianness when the value is as you would want to see it in a CPU
>>>> register, but it's less clear when you talk about it relative to values
>>>> on a bus.  It's harder to correlate that to something that is software
>>>> visible.
>>>>
>>>> I don't think there's any actual technical difference between your
>>>> wording and mine when each wording is properly interpreted, but I
>>>> suspect my wording is less likely to be misinterpreted (I could be
>>>> wrong).
>>>>
>>>>> Obviously if what userspace
>>>>> is emulating is a bus which has a byteswapping bridge or if it's
>>>>> being helpful to device emulation by providing "here's the value
>>>>> even though you think you're wired up backwards" then it needs
>>>>> to byteswap.
>>>>
>>>> Whether the emulated bus has "a byteswapping bridge" doesn't sound like
>>>> something that depends on the endianness that the host CPU is currently
>>>> running in.
>>>>
>>>>>> How about a wording like this:
>>>>>>
>>>>>> The 'data' member contains, in its first 'len' bytes, the value as it
>>>>>> would appear if the guest had accessed memory rather than I/O.
>>>>>
>>>>> I think this is confusing, because now userspace authors have
>>>>> to figure out how to get back to "value X of size Y at address Z"
>>>>> by interpreting this text... Can you write out the equivalent of
>>>>> Christoffer's text "here's how you get the memory transaction
>>>>> value" for what you want?
>>>>
>>>> Userspace swaps the value if and only if userspace's endianness differs
>>>> from the endianness with which the device interprets the data
>>>> (regardless of whether said interpretation is considered natural or
>>>> swapped relative to the way the bus is documented).  It's similar to how
>>>> userspace would handle emulating DMA.
>>>>
>>>> KVM swaps the value if and only if the endianness of the guest access
>>>> differs from that of the host, i.e. if it would have done swapping when
>>>> emulating an ordinary memory access.
>>>>
>>>>> (Also, value as it would appear to who?)
>>>>
>>>> As it would appear to anyone.  It works because data[] actually is
>>>> memory.  Any difference in how data appears based on the reader's
>>>> context would already be reflected when the reader performs the load.
>>>>
>>>>> I think your wording implies that the order of bytes in data[] depend
>>>>> on the guest CPU "usual byte order", ie the order which the CPU
>>>>> does not do a byte-lane-swap for (LE for ARM, BE for PPC),
>>>>> and it would mean it would come out differently from
>>>>> my/Alex/Christoffer's proposal if the host kernel was the opposite
>>>>> endianness from that "usual" order.
>>>>
>>>> It doesn't depend on "usual" anything.  The only thing it implicitly
>>>> says about guest byte order is that it's KVM's job to implement any
>>>> swapping if the endianness of the guest access is different from the
>>>> endianness of the host kernel access (whether it's due to the guest's
>>>> mode, the way a page is mapped, the instruction used, etc).
>>>>
>>>>> Finally, I think it's a bit confusing in that "as if the guest had
>>>>> accessed memory" is assigning implicit semantics to memory
>>>>> in the emulated system, when memory is actually kind of outside
>>>>> KVM's purview because it's not part of the CPU.
>>>>
>>>> That's sort of the point.  It defines it in a way that is independent of
>>>> the CPU, and thus independent of what endianness the CPU operates in.
>>>
>>> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>>>
>>> your proposal:
>>>
>>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>  BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>  LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>
>> Actually it is { 0x01, 0x02, 0x03, 0x04 } in all cases.
>> First of all, it does not depend on host endianness, because in
>> Scott's case the contents are defined in terms where a KVM host
>> is not even present.
>>
>> If you take real h/w (no emulation), put memory in place of the
>> I/O device, and have the same driver perform the same access
>> operation on the same device, then byte-wise it will always look
>> the same. In one case the driver just writes the word; when the
>> driver runs in the opposite endianness, it swaps the word first
>> and then writes it. Byte-wise the memory contents are the same
>> either way, and the device sees the same value in both cases.
>>
>> Look at the writel, readl, readl_relaxed, etc. functions... A
>> clean endian-agnostic driver just uses those access functions,
>> and they handle the byteswap, if one is needed, under
>> the cover.
>
> You're thinking Linux again. Of course Linux does things on
> its side to compensate for endian swaps on the bus.

Anyone who wants to work with the same h/w in an
endian-agnostic way will have to do those endian swaps
in one of the cases.

> It even
> does it on BE PPC if you access devices through swizzling
> buses, but we don't care as hypervisor. All we know in kvm is:
>
>   - instruction used for access
>   -> originating register (value)
>   -> access size (len)
>   -> virtual address (which we translate to phys)
>   - guest endianness mode

Do you agree that in the above, if you just change the guest
endianness, the device will see a different (byteswapped) value,
so as far as the device is concerned these are completely different
transactions with different effects? Why would we compare them?
If you want the same transaction with the same effect on the
device, then when the guest switches endianness it needs to add
or drop a byteswap in order to keep that effect.

Thanks,
Victor

> So my example below was trying to show what happens
> when value=0x01020304, len=4 with different host/guest endian
> combinations.
>
>
> Alex
>
>>
>> Thanks,
>> Victor
>>
>>> -> ldw_p() will give us the correct value to work with
>>>
>>> current proposal:
>>>
>>>  BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>  LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>  BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>>  LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>
>>> -> *(uint32_t*)data will give us the correct value to work with
>>>
>>>
>>> There are pros and cons for both approaches.
>>>
>>> Pro approach 1 is that it fits the way data[] is read today, so no QEMU changes are required. However, it means that user space needs to have awareness of the "default endianness".
>>> With approach 2 you don't care about endianness at all anymore - you just get a payload that the host process can read in.
>>>
>>> Obviously both approaches would work as long as they're properly defined :).
>>>
>>>
>>> Alex
>>>
>>>
>>> _______________________________________________
>>> kvmarm mailing list
>>> kvmarm@lists.cs.columbia.edu
>>> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
Peter Maydell Jan. 25, 2014, 4:23 p.m. UTC | #7
On 25 January 2014 02:15, Alexander Graf <agraf@suse.de> wrote:
> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>
> your proposal:
>
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>
> -> ldw_p() will give us the correct value to work with
>
> current proposal:
>
>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>
> -> *(uint32_t*)data will give us the correct value to work with

For completeness, ditto, ARM:
Scott's proposal (you end up with the same values in
the data array as for PPC, so userspace has to know the
"default" endianness so it can do a byteswap if the
process endianness doesn't match it; on QEMU
ldl_p() handles this for us, as you say):

   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }

current proposal:

   BE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
   LE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }

The values in the data array are different than on
PPC, reflecting the fact that the "default" endianness
is different; userspace does

 -> *(uint32_t*)data will give us the correct value to work with

regardless of what the guest CPU arch is.

> There are pros and cons for both approaches.
>
> Pro approach 1 is that it fits the way data[] is read today,
> so no QEMU changes are required. However, it means
> that user space needs to have awareness of the
> "default endianness".
> With approach 2 you don't care about endianness at all
> anymore - you just get a payload that the host process
> can read in.
>
> Obviously both approaches would work as long as they're
> properly defined :).

Agreed with all of that.

thanks
-- PMM
Christoffer Dall Jan. 25, 2014, 6:31 p.m. UTC | #8
On Sat, Jan 25, 2014 at 04:23:00PM +0000, Peter Maydell wrote:
> On 25 January 2014 02:15, Alexander Graf <agraf@suse.de> wrote:
> > Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
> >
> > your proposal:
> >
> >   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
> >   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
> >   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
> >   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
> >
> > -> ldw_p() will give us the correct value to work with
> >
> > current proposal:
> >
> >   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
> >   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
> >   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
> >   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
> >
> > -> *(uint32_t*)data will give us the correct value to work with
> 
> For completeness, ditto, ARM:
> Scott's proposal (you end up with the same values in
> the data array as for PPC, so userspace has to know the
> "default" endianness so it can do a byteswap if the
> process endianness doesn't match it; on QEMU
> ldl_p() handles this for us, as you say):
> 
>    BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>    LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>    BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>    LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
> 
> current proposal:
> 
>    BE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>    LE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>    BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>    LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
> 
> The values in the data array are different than on
> PPC, reflecting the fact that the "default" endianness
> is different; userspace does
> 
>  -> *(uint32_t*)data will give us the correct value to work with
> 
> regardless of what the guest CPU arch is.
> 
> > There are pros and cons for both approaches.
> >
> > Pro approach 1 is that it fits the way data[] is read today,
> > so no QEMU changes are required. However, it means
> > that user space needs to have awareness of the
> > "default endianness".
> > With approach 2 you don't care about endianness at all
> > anymore - you just get a payload that the host process
> > can read in.
> >
> > Obviously both approaches would work as long as they're
> > properly defined :).
> 
> Agreed with all of that.
> 
Agreed as well.

How do we settle on one versus the other?

And if we agree on the current proposal (from Alex/Peter/Me) is the
current suggested wording ok, or can someone suggest a better wording?

We could for example include the matrices as above for all known
architectures, but that does seem overly verbose.

-Christoffer
vkamensky Jan. 26, 2014, 3:46 a.m. UTC | #9
On 25 January 2014 10:31, Christoffer Dall <christoffer.dall@linaro.org> wrote:
> On Sat, Jan 25, 2014 at 04:23:00PM +0000, Peter Maydell wrote:
>> On 25 January 2014 02:15, Alexander Graf <agraf@suse.de> wrote:
>> > Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like

Alex, for clarification purposes, could you please tell
us which instructions the BE guest and the LE guest run in the
cases below. Suppose the r0 register holds the device address,
and the r1 register holds the word 0x01020304. And suppose we
have regular memory instead of I/O at the r0 address. What do
you expect 'unsigned char data[4]' to look like if the address
of 'data' is r0?

For this, for a moment, I think we can forget about KVM and
just consider a BE guest and an LE guest running on regular
h/w.

Next we will consider the cases below, where the same
BE guest and LE guest run on top of a BE-host or LE-host
KVM in the different proposal combinations: effectively,
we define the non-virtual examples first and then consider
how they work under KVM.

I was thinking that you had the following in mind:

BE guest:
   stw r1, 0, r0
i.e. just write r1 at the r0 address

LE guest:
   stwbrx r1, 0, r0
stwbrx: word-byteswap the r1 value and write it at the r0 address

which would produce, at the r0 address,
data[] = {0x01, 0x02, 0x03, 0x04}
in both of the above cases, and have the same effect
on the device if its register is mapped at the r0 address.

But judging by your other response in the previous email, I
think you have something different in mind. Please
describe it.

Thanks,
Victor

>> > your proposal:
>> >
>> >   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>> >   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>> >   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>> >   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>> >
>> > -> ldw_p() will give us the correct value to work with
>> >
>> > current proposal:
>> >
>> >   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>> >   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>> >   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>> >   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>> >
>> > -> *(uint32_t*)data will give us the correct value to work with
>>
>> For completeness, ditto, ARM:
>> Scott's proposal (you end up with the same values in
>> the data array as for PPC, so userspace has to know the
>> "default" endianness so it can do a byteswap if the
>> process endianness doesn't match it; on QEMU
>> ldl_p() handles this for us, as you say):
>>
>>    BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>    LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>    BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>    LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>
>> current proposal:
>>
>>    BE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>    LE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>    BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>    LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>
>> The values in the data array are different than on
>> PPC, reflecting the fact that the "default" endianness
>> is different; userspace does
>>
>>  -> *(uint32_t*)data will give us the correct value to work with
>>
>> regardless of what the guest CPU arch is.
>>
>> > There are pros and cons for both approaches.
>> >
>> > Pro approach 1 is that it fits the way data[] is read today,
>> > so no QEMU changes are required. However, it means
>> > that user space needs to have awareness of the
>> > "default endianness".
>> > With approach 2 you don't care about endianness at all
>> > anymore - you just get a payload that the host process
>> > can read in.
>> >
>> > Obviously both approaches would work as long as they're
>> > properly defined :).
>>
>> Agreed with all of that.
>>
> Agreed as well.
>
> How do we settle on one versus the other?
>
> And if we agree on the current proposal (from Alex/Peter/Me) is the
> current suggested wording ok, or can someone suggest a better wording?
>
> We could for example include the matrices as above for all known
> architectures, but that does seem overly verbose.
>
> -Christoffer
vkamensky Jan. 26, 2014, 5:43 a.m. UTC | #10
On 25 January 2014 19:46, Victor Kamensky <victor.kamensky@linaro.org> wrote:
> On 25 January 2014 10:31, Christoffer Dall <christoffer.dall@linaro.org> wrote:
>> On Sat, Jan 25, 2014 at 04:23:00PM +0000, Peter Maydell wrote:
>>> On 25 January 2014 02:15, Alexander Graf <agraf@suse.de> wrote:
>>> > Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>
> Alex, could you please for the clarification purpose, tell
> us what instructions BE guest and LE guest run in below
> cases. Suppose r0 register has device address, and r1
> register has word as 0x01020304. And suppose we have
> regular memory instead of io at r0 address. How do you
> see 'unsigned char data[4]' will look like if 'data'
> address is at r0.
>
> For this, for a moment, I think, we can forget about KVM,
> just consider BE guest and LE guest running on regular
> h/w.
>
> Next we will consider cases as below when the same
> BE guest and LE guest runs on top of BE host or LE
> host KVM in different proposals combinations. Effectively
> defining non-virtual examples first and consider how they
> will work under KVM.
>
> I was thinking that that you consider
>
> BE guest:
>    stw r1, 0, r0
> just write r1 at r0 address
>
> LE guest
>    stwbrx r1, 0, r0
> stwbrx - word byteswap r1 value and write it at r0 address
>
> which would produce at r0 address
> data[] = {0x01, 0x02, 0x03, 0x04}
> for both above cases, and have the same effect
> on device if its register is mapped at r0 address.
>
> But judging by your other response in previous email, I
> think you have something different in mind. Please
> describe it.

Alex, I reread your tables again. I think that just
'stw r1, 0, r0' for both LE and BE will match them. As
far as data[] is concerned, in BE it will be data[] = {0x01,
0x02, 0x03, 0x04}, and in LE it will be data[] = {0x04,
0x03, 0x02, 0x01}.

Do we agree that these are two different memory
transactions, as far as the device is concerned?

> Thanks,
> Victor
>
>>> > your proposal:
>>> >
>>> >   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>> >   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>> >   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>> >   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>> >
>>> > -> ldw_p() will give us the correct value to work with
>>> >
>>> > current proposal:
>>> >
>>> >   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>> >   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>> >   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>> >   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>> >
>>> > -> *(uint32_t*)data will give us the correct value to work with
>>>
>>> For completeness, ditto, ARM:
>>> Scott's proposal (you end up with the same values in
>>> the data array as for PPC, so userspace has to know the
>>> "default" endianness so it can do a byteswap

Actually, userland needs to manage the mismatch between
the simulated device's native internal data and the device's
endianness. If the native endianness mismatches the device's,
qemu needs to do a byteswap; otherwise it just copies the
value into mmio.data as a byte array.
In qemu there is already infrastructure that handles
that. My ARM BE KVM patches use Scott's mmio.data[]
definition, and I did not have to do anything in qemu
to make that work.

>>> if the
>>> process endianness doesn't match it; on QEMU
>>> ldl_p() handles this for us, as you say):
>>>
>>>    BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>    LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>    BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>    LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }

Peter, thank you for adding the ARM case. It really
helps to see the difference between the definitions.

Don't you agree that Scott's definition has the advantage
that its interpretation does not depend on the
CPU type?

It is always the same for the same 'device access
operation', guest endianness, and host endianness. Consider
'stw r1, 0, r0' in a BE guest and 'stwbrx r1, 0, r0' in an
LE guest, which are the same memory transaction as far as
the device is concerned. The table using Scott's
definition will then always look like

BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
LE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
BE guest, LE host: { 0x01, 0x02, 0x03, 0x04 }
LE guest, LE host: { 0x01, 0x02, 0x03, 0x04 }

If ARM worked with this device to do the same
thing, its table would look exactly the same.

>>>
>>> current proposal:
>>>
>>>    BE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>>    LE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>>    BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>>    LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>>
>>> The values in the data array are different than on
>>> PPC, reflecting the fact that the "default" endianness
>>> is different; userspace does
>>>
>>>  -> *(uint32_t*)data will give us the correct value to work with
>>>
>>> regardless of what the guest CPU arch is.
>>>
>>> > There are pros and cons for both approaches.
>>> >
>>> > Pro approach 1 is that it fits the way data[] is read today,
>>> > so no QEMU changes are required. However, it means
>>> > that user space needs to have awareness of the
>>> > "default endianness".
>>> > With approach 2 you don't care about endianness at all
>>> > anymore - you just get a payload that the host process
>>> > can read in.
>>> >
>>> > Obviously both approaches would work as long as they're
>>> > properly defined :).
>>>
>>> Agreed with all of that.
>>>
>> Agreed as well.
>>
>> How do we settle on one versus the other?

Not surprisingly, I vote for Scott's proposal :). The ARM BE KVM
patches previously posted use this definition of mmio.data[].

The interpretation of Scott's definition does not depend on the CPU
type. It is much simpler. It does not rely on a not-very-well-defined
notion like "real bus", and it does not use the word 'endianness', a
byteswap, or a cast example, which IMHO makes it more robust.

Thanks,
Victor

>>
>> And if we agree on the current proposal (from Alex/Peter/Me) is the
>> current suggested wording ok, or can someone suggest a better wording?
>>
>> We could for example include the matrices as above for all known
>> architectures, but that does seem overly verbose.
>>
>> -Christoffer
Peter Maydell Jan. 27, 2014, 9:42 a.m. UTC | #11
On 27 January 2014 07:52, Alexander Graf <agraf@suse.de> wrote:
>
> On 26.01.2014, at 06:43, Victor Kamensky <victor.kamensky@linaro.org> wrote:
>> Scott's definition interpretation does not depend on CPU type.
>> It is much simpler. It does not use notion not very well defined
>> like "real bus". it does not use word 'endianness', byteswap,
>> cast example, which IMHO makes it more robust.
>
> The more I think about it the more I tend to agree.

Personally I find Scott's definition rather more confusing,
at least in wording, and it does require QEMU to depend
on the CPU type. (It took me a fair bit of thinking to confirm
that it really was different to the proposal in Christoffer's
patch and not just a rewording, for example). Anyway, would
somebody like to formulate a proposed wording change to
the API doc for that version? (For preference, one which
doesn't try to describe it in terms of "as if the guest wrote
to memory", given that that is IMHO ambiguous and
confusing.)

thanks
-- PMM
Peter Maydell Jan. 28, 2014, 8:55 a.m. UTC | #12
On 28 January 2014 01:59, Scott Wood <scottwood@freescale.com> wrote:
> On Sat, 2014-01-25 at 03:15 +0100, Alexander Graf wrote:
>> Ok, let's go through the combinations for a 32-bit write of 0x01020304 on PPC and what data[] looks like
>>
>> your proposal:
>>
>>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>   BE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>   LE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>
>> -> ldw_p() will give us the correct value to work with
>
> Do you mean ldl_p(), which just does a memcpy()?  How is that different
> from *(uint32_t*)data, other than complying with C99 aliasing rules?
>
> What you get from doing ldl_p() is the value as interpreted by the host
> endianness.  If the memory region endianness is different from the host
> endianness, you need to swap.  adjust_endianness() sort of does this,
> but it's using TARGET_WORDS_BIGENDIAN instead of the host endianness
> which will break if they are not the same.  It works for TCG because it
> passes the MMIO value in target endianness rather than host --
> effectively the TARGET_WORDS_BIGENDIAN usages in memory.c and
> include/exec/softmmu_template.h cancel each other.
>
>> current proposal:
>>
>>   BE guest, BE host: { 0x01, 0x02, 0x03, 0x04 }
>>   LE guest, BE host: { 0x04, 0x03, 0x02, 0x01 }
>>   BE guest, LE host:  { 0x04, 0x03, 0x02, 0x01 }
>>   LE guest, LE host:  { 0x01, 0x02, 0x03, 0x04 }
>>
>> -> *(uint32_t*)data will give us the correct value to work with
>
> *(uint32_t*)data in the "LE guest, LE host" case will get you 0x04030201
> -- how is that the correct value to work with?  Did you make this table
> with the assumption that big-endian interpretation is inherently "the
> correct value" on PPC?

So for ARM, if you put the CPU in big endian mode (by setting
CPSR.E) then the architecture defines that what happens here
is that before any access to the bus the CPU does a byte lane
swap of the data. So a register value of 0x04030201 sent
as a 32 bit store appears on the bus as 0x01020304. If the CPU
is in little endian mode no swapping happens and values in
registers go direct to the bus.

The assumption we made here is that PPC is similar except
that it does no swapping on BE, and swaps on LE. Is this wrong?
(I did look through the PPC architecture docs but couldn't find
anything that clearly stated what happened.)

> I suppose you could define one endianness as "natural" for a particular
> platform, and the other as swapped, but I don't think that's a useful
> way to look at it except for things like virtio that are defined as
> being native-endian (and I don't think KVM_EXIT_MMIO is the right layer
> to try to sort that problem out, as it just introduces confusion for
> non-virtio MMIO).

As I say above, for ARM (v7, BE8) considering one endianness as
'natural' is exactly how the architecture works. (We don't need to
care about BE32, fortunately, as it is obsolete: it used to work kind
of similarly except that it was word invariant and worked by XORing
the address with 1 or 3 when the CPU was in BE mode...)

> Unfortunately, that appears to currently be how QEMU works, so either
> QEMU needs to be fixed in a way that is compatible with currently
> working scenarios (e.g. by changing adjust_endianness() to use host
> endianness and changing the TCG path to match), or it needs to be
> explicitly defined in the API what assumptions are being made regarding
> the default endianness on each architecture.  I believe these
> assumptions are present but implicit in the current proposal (if Alex's
> intepretation of it matches what was intended).
>
>> With approach 2 you don't care about endianness at all anymore - you
>> just get a payload that the host process can read in.
>
> You always need to care about endianness -- something has to result in
> PCI registers appearing swapped compared to e500 CCSR registers.

That's the result of some bridge doing a byte lane swap, isn't it?
Obviously swapping bridges need to be modeled (and in QEMU
we currently do that rather badly, by marking the device endpoints
as having an endianness and inferring the existence of a swap)
but that's not related to the initial CPU-to-bus interface.

thanks
-- PMM
diff mbox

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 366bf4b..6dbd68c 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2565,6 +2565,11 @@  executed a memory-mapped I/O instruction which could not be satisfied
 by kvm.  The 'data' member contains the written data if 'is_write' is
 true, and should be filled by application code otherwise.
 
+The 'data' member byte order is host kernel native endianness, regardless of
+the endianness of the guest, and represents the value as it would go on the
+bus in real hardware.  The host user space should always be able to do:
+<type> val = *((<type> *)mmio.data).
+
 NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR,
       KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding
 operations are complete (and guest state is consistent) only after userspace