diff mbox

[1/2] ARM: kvm: define PAGE_S2_DEVICE as read-only by default

Message ID 1410603462-28900-1-git-send-email-ard.biesheuvel@linaro.org
State Superseded
Headers show

Commit Message

Ard Biesheuvel Sept. 13, 2014, 10:17 a.m. UTC
Now that we support read-only memslots, we need to make sure that
pass-through device mappings are not mapped writable if the guest
has requested them to be read-only. The existing implementation
already honours this by calling kvm_set_s2pte_writable() on the new
pte in case of writable mappings, so all we need to do is define
the default pgprot_t value used for devices to be PTE_S2_RDONLY.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/include/asm/pgtable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Marc Zyngier Sept. 13, 2014, 10:41 a.m. UTC | #1
Hi Ard,

On 2014-09-13 11:17, Ard Biesheuvel wrote:
> Now that we support read-only memslots, we need to make sure that
> pass-through device mappings are not mapped writable if the guest
> has requested them to be read-only. The existing implementation
> already honours this by calling kvm_set_s2pte_writable() on the new
> pte in case of writable mappings, so all we need to do is define
> the default pgprot_t value used for devices to be PTE_S2_RDONLY.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

I feel very uncomfortable with this change. Why would we map a device 
RO? Is that only for completeness sake?

Note that we also use PAGE_S2_DEVICE for things that are not mapped 
through a memslot, such as the GIC.

Thanks,

         M.

> ---
>  arch/arm/include/asm/pgtable.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm/include/asm/pgtable.h 
> b/arch/arm/include/asm/pgtable.h
> index 01baef07cd0c..92b2fbe18868 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -100,7 +100,7 @@ extern pgprot_t		pgprot_s2_device;
>  #define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_HYP)
>  #define PAGE_HYP_DEVICE		_MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
>  #define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
> -#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDWR)
> +#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY)
>
>  #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY |
> L_PTE_XN | L_PTE_NONE)
>  #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | 
> L_PTE_XN)
Ard Biesheuvel Sept. 13, 2014, 11:15 a.m. UTC | #2
On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote:
> Hi Ard,
>
> On 2014-09-13 11:17, Ard Biesheuvel wrote:
>>
>> Now that we support read-only memslots, we need to make sure that
>> pass-through device mappings are not mapped writable if the guest
>> has requested them to be read-only. The existing implementation
>> already honours this by calling kvm_set_s2pte_writable() on the new
>> pte in case of writable mappings, so all we need to do is define
>> the default pgprot_t value used for devices to be PTE_S2_RDONLY.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
>
> I feel very uncomfortable with this change. Why would we map a device RO? Is
> that only for completeness sake?
>

We would map a device RO so that QEMU (or whatever is managing KVM)
can emulate the writes. I don't have a clear cut use case, to be
honest, but setting up a writable mapping for a memslot that was
explicitly set up as read-only seems wrong in any case.

Note that the particular problem I was seeing was primarily caused by
kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled
this particular issue as well.

> Note that we also use PAGE_S2_DEVICE for things that are not mapped through
> a memslot, such as the GIC.
>

Yes, and I realize now that this breaks it.
My apologies: I have an additional patch locally that sets up MMIO
ranges in one go instead of faulting them in one page at a time as we
do now, and there the read-write case is handled correctly in
kvm_phys_addr_ioremap(). However, I thought it was better to send
these out separately first, but apparently not.

So if we can agree on whether or not MMIO backed mappings should be
read-write even if the memslot says no, I will follow up with a proper
series if there are still changes required.
Christoffer Dall Sept. 13, 2014, 5:06 p.m. UTC | #3
On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote:
> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > Hi Ard,
> >
> > On 2014-09-13 11:17, Ard Biesheuvel wrote:
> >>
> >> Now that we support read-only memslots, we need to make sure that
> >> pass-through device mappings are not mapped writable if the guest
> >> has requested them to be read-only. The existing implementation
> >> already honours this by calling kvm_set_s2pte_writable() on the new
> >> pte in case of writable mappings, so all we need to do is define
> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >
> >
> > I feel very uncomfortable with this change. Why would we map a device RO? Is
> > that only for completeness sake?
> >
> 
> We would map a device RO so that QEMU (or whatever is managing KVM)
> can emulate the writes. I don't have a clear cut use case, to be
> honest, but setting up a writable mapping for a memslot that was
> explicitly set up as read-only seems wrong in any case.

Agreed, if it doesn't ever make sense to do so, then we should return an
error to user space if userspace attempts such a configuration.  The
current code is just weird.

> 
> Note that the particular problem I was seeing was primarily caused by
> kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled
> this particular issue as well.
> 
> > Note that we also use PAGE_S2_DEVICE for things that are not mapped through
> > a memslot, such as the GIC.
> >
> 
> Yes, and I realize now that this breaks it.
> My apologies: I have an additional patch locally that sets up MMIO
> ranges in one go instead of faulting them in one page at a time as we
> do now, and there the read-write case is handled correctly in
> kvm_phys_addr_ioremap(). However, I thought it was better to send
> these out separately first, but apparently not.

I think it is better to change this separately, and then add the ioremap
stuff.  However, you need to change all places that call PAGE_S2_DEVICE
and expect a RDWR memory region, this happens to be only
kvm_phys_addr_ioremap() for now.

> 
> So if we can agree on whether or not MMIO backed mappings should be
> read-write even if the memslot says no, I will follow up with a proper
> series if there are still changes required.
> 

I guess it could be theoretically useful to have read-only device memory
regions, and I can't think of why it would violate the architecture.

That said, I don't have any more clear use cases in mind, and we
definitely shouldn't just silently ignore the read-only flag from user
space and make the region writeable.  If we don't want to allow this
behavior, we can return an error in kvm_arch_create_memslot(), which
will cause the KVM_CREATE_USER_MEMORY_REGION ioctl to return -ENOMEM.

Thanks,
-Christoffer
Ard Biesheuvel Sept. 14, 2014, 4:49 a.m. UTC | #4
On 13 September 2014 19:06, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote:
>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> > Hi Ard,
>> >
>> > On 2014-09-13 11:17, Ard Biesheuvel wrote:
>> >>
>> >> Now that we support read-only memslots, we need to make sure that
>> >> pass-through device mappings are not mapped writable if the guest
>> >> has requested them to be read-only. The existing implementation
>> >> already honours this by calling kvm_set_s2pte_writable() on the new
>> >> pte in case of writable mappings, so all we need to do is define
>> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY.
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >
>> >
>> > I feel very uncomfortable with this change. Why would we map a device RO? Is
>> > that only for completeness sake?
>> >
>>
>> We would map a device RO so that QEMU (or whatever is managing KVM)
>> can emulate the writes. I don't have a clear cut use case, to be
>> honest, but setting up a writable mapping for a memslot that was
>> explicitly set up as read-only seems wrong in any case.
>
> Agreed, if it doesn't ever make sense to do so, then we should return an
> error to user space if userspace attempts such a configuration.  The
> current code is just weird.
>
>>
>> Note that the particular problem I was seeing was primarily caused by
>> kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled
>> this particular issue as well.
>>
>> > Note that we also use PAGE_S2_DEVICE for things that are not mapped through
>> > a memslot, such as the GIC.
>> >
>>
>> Yes, and I realize now that this breaks it.
>> My apologies: I have an additional patch locally that sets up MMIO
>> ranges in one go instead of faulting them in one page at a time as we
>> do now, and there the read-write case is handled correctly in
>> kvm_phys_addr_ioremap(). However, I thought it was better to send
>> these out separately first, but apparently not.
>
> I think it is better to change this separately, and then add the ioremap
> stuff.  However, you need to change all places that call PAGE_S2_DEVICE
> and expect a RDWR memory region, this happens to be only
> kvm_phys_addr_ioremap() for now.
>
>>
>> So if we can agree on whether or not MMIO backed mappings should be
>> read-write even if the memslot says no, I will follow up with a proper
>> series if there are still changes required.
>>
>
> I guess it could be theoretically useful to have read-only device memory
> regions, and I can't think of why it would violate the architecture.
>

We have to handle it either way. But after reading D4.5.3 (Table
D4-40) of the ARM ARM, I am wondering why we needed patch b88657674d39
"ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the
first place, and we could just revert that patch and everything would
work as expected. (In short, D4.5.3 says that MT_DEVICE trumps
MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't
matter what memtype the S2 translation specifies)

> That said, I don't have any more clear use cases in mind, and we
> definitely shouldn't just silently ignore the read-only flag from user
> space and make the region writeable.  If we don't want to allow this
> behavior, we can return an error in kvm_arch_create_memslot(), which
> will cause the KVM_CREATE_USER_MEMORY_REGION ioctl to return -ENOMEM.
>

Well, I am not sure how easy it is to find out beforehand (i.e., at
ioctl time) what the nature of the backing is, and you have to deal
with hva_to_pfn() potentially returning a VM_PFNMAP pfn or
PageReserved pages anyway.
So just mapping everything as MT_NORMAL actually seems like the sanest
thing to do, so imo we should revert the patch. The only other
question I had is whether it would be better to map a MMIO region in
one go, but we can discuss that separately.
Marc Zyngier Sept. 14, 2014, 9:09 a.m. UTC | #5
On 2014-09-14 05:49, Ard Biesheuvel wrote:
> On 13 September 2014 19:06, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
>> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote:
>>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> 
>>> wrote:
>>> > Hi Ard,
>>> >
>>> > On 2014-09-13 11:17, Ard Biesheuvel wrote:
>>> >>
>>> >> Now that we support read-only memslots, we need to make sure 
>>> that
>>> >> pass-through device mappings are not mapped writable if the 
>>> guest
>>> >> has requested them to be read-only. The existing implementation
>>> >> already honours this by calling kvm_set_s2pte_writable() on the 
>>> new
>>> >> pte in case of writable mappings, so all we need to do is define
>>> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY.
>>> >>
>>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>> >
>>> >
>>> > I feel very uncomfortable with this change. Why would we map a 
>>> device RO? Is
>>> > that only for completeness sake?
>>> >
>>>
>>> We would map a device RO so that QEMU (or whatever is managing KVM)
>>> can emulate the writes. I don't have a clear cut use case, to be
>>> honest, but setting up a writable mapping for a memslot that was
>>> explicitly set up as read-only seems wrong in any case.
>>
>> Agreed, if it doesn't ever make sense to do so, then we should 
>> return an
>> error to user space if userspace attempts such a configuration.  The
>> current code is just weird.
>>
>>>
>>> Note that the particular problem I was seeing was primarily caused 
>>> by
>>> kvm_is_mmio_pfn()'s false positive on the zero page, but it 
>>> unveiled
>>> this particular issue as well.
>>>
>>> > Note that we also use PAGE_S2_DEVICE for things that are not 
>>> mapped through
>>> > a memslot, such as the GIC.
>>> >
>>>
>>> Yes, and I realize now that this breaks it.
>>> My apologies: I have an additional patch locally that sets up MMIO
>>> ranges in one go instead of faulting them in one page at a time as 
>>> we
>>> do now, and there the read-write case is handled correctly in
>>> kvm_phys_addr_ioremap(). However, I thought it was better to send
>>> these out separately first, but apparently not.
>>
>> I think it is better to change this separately, and then add the 
>> ioremap
>> stuff.  However, you need to change all places that call 
>> PAGE_S2_DEVICE
>> and expect a RDWR memory region, this happens to be only
>> kvm_phys_addr_ioremap() for now.
>>
>>>
>>> So if we can agree on whether or not MMIO backed mappings should be
>>> read-write even if the memslot says no, I will follow up with a 
>>> proper
>>> series if there are still changes required.
>>>
>>
>> I guess it could be theoretically useful to have read-only device 
>> memory
>> regions, and I can't think of why it would violate the architecture.
>>
>
> We have to handle it either way. But after reading D4.5.3 (Table
> D4-40) of the ARM ARM, I am wondering why we needed patch 
> b88657674d39
> "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the
> first place, and we could just revert that patch and everything would
> work as expected. (In short, D4.5.3 says that MT_DEVICE trumps
> MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't
> matter what memtype the S2 translation specifies)

We've been there before:
https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/004420.html

>> That said, I don't have any more clear use cases in mind, and we
>> definitely shouldn't just silently ignore the read-only flag from 
>> user
>> space and make the region writeable.  If we don't want to allow this
>> behavior, we can return an error in kvm_arch_create_memslot(), which
>> will cause the KVM_CREATE_USER_MEMORY_REGION ioctl to return 
>> -ENOMEM.
>>
>
> Well, I am not sure how easy it is to find out beforehand (i.e., at
> ioctl time) what the nature of the backing is, and you have to deal
> with hva_to_pfn() potentially returning a VM_PFNMAP pfn or
> PageReserved pages anyway.
> So just mapping everything as MT_NORMAL actually seems like the 
> sanest
> thing to do, so imo we should revert the patch. The only other
> question I had is whether it would be better to map a MMIO region in
> one go, but we can discuss that separately.

Aside from the MT_NORMAL thing, the only saving we'd get by dynamically 
maping MMIO regions would be the page tables. Not very useful in my 
opinion.

Thanks,

         M.
Ard Biesheuvel Sept. 14, 2014, 9:43 a.m. UTC | #6
On 14 September 2014 11:09, Marc Zyngier <maz@misterjones.org> wrote:
> On 2014-09-14 05:49, Ard Biesheuvel wrote:
>>
>> On 13 September 2014 19:06, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>>>
>>> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote:
>>>>
>>>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>> > Hi Ard,
>>>> >
>>>> > On 2014-09-13 11:17, Ard Biesheuvel wrote:
>>>> >>
>>>> >> Now that we support read-only memslots, we need to make sure that
>>>> >> pass-through device mappings are not mapped writable if the guest
>>>> >> has requested them to be read-only. The existing implementation
>>>> >> already honours this by calling kvm_set_s2pte_writable() on the new
>>>> >> pte in case of writable mappings, so all we need to do is define
>>>> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY.
>>>> >>
>>>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>> >
>>>> >
>>>> > I feel very uncomfortable with this change. Why would we map a device
>>>> > RO? Is
>>>> > that only for completeness sake?
>>>> >
>>>>
>>>> We would map a device RO so that QEMU (or whatever is managing KVM)
>>>> can emulate the writes. I don't have a clear cut use case, to be
>>>> honest, but setting up a writable mapping for a memslot that was
>>>> explicitly set up as read-only seems wrong in any case.
>>>
>>>
>>> Agreed, if it doesn't ever make sense to do so, then we should return an
>>> error to user space if userspace attempts such a configuration.  The
>>> current code is just weird.
>>>
>>>>
>>>> Note that the particular problem I was seeing was primarily caused by
>>>> kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled
>>>> this particular issue as well.
>>>>
>>>> > Note that we also use PAGE_S2_DEVICE for things that are not mapped
>>>> > through
>>>> > a memslot, such as the GIC.
>>>> >
>>>>
>>>> Yes, and I realize now that this breaks it.
>>>> My apologies: I have an additional patch locally that sets up MMIO
>>>> ranges in one go instead of faulting them in one page at a time as we
>>>> do now, and there the read-write case is handled correctly in
>>>> kvm_phys_addr_ioremap(). However, I thought it was better to send
>>>> these out separately first, but apparently not.
>>>
>>>
>>> I think it is better to change this separately, and then add the ioremap
>>> stuff.  However, you need to change all places that call PAGE_S2_DEVICE
>>> and expect a RDWR memory region, this happens to be only
>>> kvm_phys_addr_ioremap() for now.
>>>
>>>>
>>>> So if we can agree on whether or not MMIO backed mappings should be
>>>> read-write even if the memslot says no, I will follow up with a proper
>>>> series if there are still changes required.
>>>>
>>>
>>> I guess it could be theoretically useful to have read-only device memory
>>> regions, and I can't think of why it would violate the architecture.
>>>
>>
>> We have to handle it either way. But after reading D4.5.3 (Table
>> D4-40) of the ARM ARM, I am wondering why we needed patch b88657674d39
>> "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the
>> first place, and we could just revert that patch and everything would
>> work as expected. (In short, D4.5.3 says that MT_DEVICE trumps
>> MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't
>> matter what memtype the S2 translation specifies)
>
>
> We've been there before:
> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/004420.html
>

Ah right. So why did those patches not make it in?

Any way, in this case, we have to choose between either dropping the
special case for kvm_is_mmio_pfn(), or fix it honor the writable.
My vote would be to get rid of it.

>>> That said, I don't have any more clear use cases in mind, and we
>>> definitely shouldn't just silently ignore the read-only flag from user
>>> space and make the region writeable.  If we don't want to allow this
>>> behavior, we can return an error in kvm_arch_create_memslot(), which
>>> will cause the KVM_CREATE_USER_MEMORY_REGION ioctl to return -ENOMEM.
>>>
>>
>> Well, I am not sure how easy it is to find out beforehand (i.e., at
>> ioctl time) what the nature of the backing is, and you have to deal
>> with hva_to_pfn() potentially returning a VM_PFNMAP pfn or
>> PageReserved pages anyway.
>> So just mapping everything as MT_NORMAL actually seems like the sanest
>> thing to do, so imo we should revert the patch. The only other
>> question I had is whether it would be better to map a MMIO region in
>> one go, but we can discuss that separately.
>
>
> Aside from the MT_NORMAL thing, the only saving we'd get by dynamically
> maping MMIO regions would be the page tables. Not very useful in my opinion.
>

OK, so you agree faulting it in entirely upon the first abort is a
sane thing to do then.
So 1 patch to change that and 1 to revert the PAGE_S2_DEVICE thing then?
Ard Biesheuvel Sept. 14, 2014, 10:57 p.m. UTC | #7
On 14 September 2014 11:43, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 14 September 2014 11:09, Marc Zyngier <maz@misterjones.org> wrote:
>> On 2014-09-14 05:49, Ard Biesheuvel wrote:
>>>
>>> On 13 September 2014 19:06, Christoffer Dall
>>> <christoffer.dall@linaro.org> wrote:
>>>>
>>>> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote:
>>>>>
>>>>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>>> > Hi Ard,
>>>>> >
>>>>> > On 2014-09-13 11:17, Ard Biesheuvel wrote:
>>>>> >>
>>>>> >> Now that we support read-only memslots, we need to make sure that
>>>>> >> pass-through device mappings are not mapped writable if the guest
>>>>> >> has requested them to be read-only. The existing implementation
>>>>> >> already honours this by calling kvm_set_s2pte_writable() on the new
>>>>> >> pte in case of writable mappings, so all we need to do is define
>>>>> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY.
>>>>> >>
>>>>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>> >
>>>>> >
>>>>> > I feel very uncomfortable with this change. Why would we map a device
>>>>> > RO? Is
>>>>> > that only for completeness sake?
>>>>> >
>>>>>
>>>>> We would map a device RO so that QEMU (or whatever is managing KVM)
>>>>> can emulate the writes. I don't have a clear cut use case, to be
>>>>> honest, but setting up a writable mapping for a memslot that was
>>>>> explicitly set up as read-only seems wrong in any case.
>>>>
>>>>
>>>> Agreed, if it doesn't ever make sense to do so, then we should return an
>>>> error to user space if userspace attempts such a configuration.  The
>>>> current code is just weird.
>>>>
>>>>>
>>>>> Note that the particular problem I was seeing was primarily caused by
>>>>> kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled
>>>>> this particular issue as well.
>>>>>
>>>>> > Note that we also use PAGE_S2_DEVICE for things that are not mapped
>>>>> > through
>>>>> > a memslot, such as the GIC.
>>>>> >
>>>>>
>>>>> Yes, and I realize now that this breaks it.
>>>>> My apologies: I have an additional patch locally that sets up MMIO
>>>>> ranges in one go instead of faulting them in one page at a time as we
>>>>> do now, and there the read-write case is handled correctly in
>>>>> kvm_phys_addr_ioremap(). However, I thought it was better to send
>>>>> these out separately first, but apparently not.
>>>>
>>>>
>>>> I think it is better to change this separately, and then add the ioremap
>>>> stuff.  However, you need to change all places that call PAGE_S2_DEVICE
>>>> and expect a RDWR memory region, this happens to be only
>>>> kvm_phys_addr_ioremap() for now.
>>>>
>>>>>
>>>>> So if we can agree on whether or not MMIO backed mappings should be
>>>>> read-write even if the memslot says no, I will follow up with a proper
>>>>> series if there are still changes required.
>>>>>
>>>>
>>>> I guess it could be theoretically useful to have read-only device memory
>>>> regions, and I can't think of why it would violate the architecture.
>>>>
>>>
>>> We have to handle it either way. But after reading D4.5.3 (Table
>>> D4-40) of the ARM ARM, I am wondering why we needed patch b88657674d39
>>> "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the
>>> first place, and we could just revert that patch and everything would
>>> work as expected. (In short, D4.5.3 says that MT_DEVICE trumps
>>> MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't
>>> matter what memtype the S2 translation specifies)
>>
>>
>> We've been there before:
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/004420.html
>>
>
> Ah right. So why did those patches not make it in?
>

Never mind. I read the whole thread this time.

So, in summary, there is a concern that a malicious guest may request
a cachable mapping for a device range, in an attempt to manipulate the
VGIC or other device memory of another VM.
I think that concern only applies to writable mappings, so perhaps we
should just change

if (kvm_is_mmio_pfn(pfn))

to

if (kvm_is_mmio_pfn(pfn) && writable)

and be done with it (which is coincidentally the very first naive fix
I suggested for the issue i was seeing)
That way, we never map read-only MMIO regions writable, and rely on
the MT_DEVICE trumps MT_NORMAL rule to ensure the guest reads to those
regions are uncached.
(Wouldn't hurt to add a comment to explain it, I suppose)
Peter Maydell Sept. 15, 2014, 3:37 a.m. UTC | #8
On 14 September 2014 15:57, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> So, in summary, there is a concern that a malicious guest may request
> a cachable mapping for a device range, in an attempt to manipulate the
> VGIC or other device memory of another VM.
> I think that concern only applies to writable mappings

I think it also applies to read-only mappings, because it would
still be permitting the guest to set up a situation with mismatched
memory attributes with potentially unpleasant effects for the
other guest (which no longer gets the guarantees it should
get from the fact it has mapped the VGIC as Device memory).

thanks
-- PMM
Mario Smarduch Sept. 15, 2014, 7:41 p.m. UTC | #9
I've been working around the edges of this discussion, and
maybe be little unclear on this.

But the manuals say intersection of Stage1/Stage2 permissions are
used. If guest sets stage1 to read-only then setting stage 2
to read-only or read-write should have no impact. So why
should stage 2 RW be changed?

- Mario

On 09/14/2014 03:57 PM, Ard Biesheuvel wrote:
> On 14 September 2014 11:43, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 14 September 2014 11:09, Marc Zyngier <maz@misterjones.org> wrote:
>>> On 2014-09-14 05:49, Ard Biesheuvel wrote:
>>>>
>>>> On 13 September 2014 19:06, Christoffer Dall
>>>> <christoffer.dall@linaro.org> wrote:
>>>>>
>>>>> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote:
>>>>>>
>>>>>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>>>>> Hi Ard,
>>>>>>>
>>>>>>> On 2014-09-13 11:17, Ard Biesheuvel wrote:
>>>>>>>>
>>>>>>>> Now that we support read-only memslots, we need to make sure that
>>>>>>>> pass-through device mappings are not mapped writable if the guest
>>>>>>>> has requested them to be read-only. The existing implementation
>>>>>>>> already honours this by calling kvm_set_s2pte_writable() on the new
>>>>>>>> pte in case of writable mappings, so all we need to do is define
>>>>>>>> the default pgprot_t value used for devices to be PTE_S2_RDONLY.
>>>>>>>>
>>>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>>>>
>>>>>>>
>>>>>>> I feel very uncomfortable with this change. Why would we map a device
>>>>>>> RO? Is
>>>>>>> that only for completeness sake?
>>>>>>>
>>>>>>
>>>>>> We would map a device RO so that QEMU (or whatever is managing KVM)
>>>>>> can emulate the writes. I don't have a clear cut use case, to be
>>>>>> honest, but setting up a writable mapping for a memslot that was
>>>>>> explicitly set up as read-only seems wrong in any case.
>>>>>
>>>>>
>>>>> Agreed, if it doesn't ever make sense to do so, then we should return an
>>>>> error to user space if userspace attempts such a configuration.  The
>>>>> current code is just weird.
>>>>>
>>>>>>
>>>>>> Note that the particular problem I was seeing was primarily caused by
>>>>>> kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled
>>>>>> this particular issue as well.
>>>>>>
>>>>>>> Note that we also use PAGE_S2_DEVICE for things that are not mapped
>>>>>>> through
>>>>>>> a memslot, such as the GIC.
>>>>>>>
>>>>>>
>>>>>> Yes, and I realize now that this breaks it.
>>>>>> My apologies: I have an additional patch locally that sets up MMIO
>>>>>> ranges in one go instead of faulting them in one page at a time as we
>>>>>> do now, and there the read-write case is handled correctly in
>>>>>> kvm_phys_addr_ioremap(). However, I thought it was better to send
>>>>>> these out separately first, but apparently not.
>>>>>
>>>>>
>>>>> I think it is better to change this separately, and then add the ioremap
>>>>> stuff.  However, you need to change all places that call PAGE_S2_DEVICE
>>>>> and expect a RDWR memory region, this happens to be only
>>>>> kvm_phys_addr_ioremap() for now.
>>>>>
>>>>>>
>>>>>> So if we can agree on whether or not MMIO backed mappings should be
>>>>>> read-write even if the memslot says no, I will follow up with a proper
>>>>>> series if there are still changes required.
>>>>>>
>>>>>
>>>>> I guess it could be theoretically useful to have read-only device memory
>>>>> regions, and I can't think of why it would violate the architecture.
>>>>>
>>>>
>>>> We have to handle it either way. But after reading D4.5.3 (Table
>>>> D4-40) of the ARM ARM, I am wondering why we needed patch b88657674d39
>>>> "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the
>>>> first place, and we could just revert that patch and everything would
>>>> work as expected. (In short, D4.5.3 says that MT_DEVICE trumps
>>>> MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't
>>>> matter what memtype the S2 translation specifies)
>>>
>>>
>>> We've been there before:
>>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/004420.html
>>>
>>
>> Ah right. So why did those patches not make it in?
>>
> 
> Never mind. I read the whole thread this time.
> 
> So, in summary, there is a concern that a malicious guest may request
> a cachable mapping for a device range, in an attempt to manipulate the
> VGIC or other device memory of another VM.
> I think that concern only applies to writable mappings, so perhaps we
> should just change
> 
> if (kvm_is_mmio_pfn(pfn))
> 
> to
> 
> if (kvm_is_mmio_pfn(pfn) && writable)
> 
> and be done with it (which is coincidentally the very first naive fix
> I suggested for the issue i was seeing)
> That way, we never map read-only MMIO regions writable, and rely on
> the MT_DEVICE trumps MT_NORMAL rule to ensure the guest reads to those
> regions are uncached.
> (Wouldn't hurt to add a comment to explain it, I suppose)
>
Peter Maydell Sept. 15, 2014, 7:45 p.m. UTC | #10
On 15 September 2014 12:41, Mario Smarduch <m.smarduch@samsung.com> wrote:
> I've been working around the edges of this discussion, and
> maybe be little unclear on this.
>
> But the manuals say intersection of Stage1/Stage2 permissions are
> used. If guest sets stage1 to read-only then setting stage 2
> to read-only or read-write should have no impact. So why
> should stage 2 RW be changed?

Read versus read-write isn't relevant. The point is
that you mustn't let the guest set up a Normal memory
mapping of the same bit of hardware (phys. address
space) that some other guest or the host has set up
as Device mapping.

-- PMM
Mario Smarduch Sept. 17, 2014, 7:19 p.m. UTC | #11
On 09/14/2014 03:57 PM, Ard Biesheuvel wrote:

> 
> Never mind. I read the whole thread this time.
> 
> So, in summary, there is a concern that a malicious guest may request
> a cachable mapping for a device range, in an attempt to manipulate the
> VGIC or other device memory of another VM.
> I think that concern only applies to writable mappings, so perhaps we
> should just change
> 
> if (kvm_is_mmio_pfn(pfn))
> 
> to
> 
> if (kvm_is_mmio_pfn(pfn) && writable)
Hi Ard,

What if the device passed through is read-only like maybe IPMI
sensors.

> 
> and be done with it (which is coincidentally the very first naive fix
> I suggested for the issue i was seeing)
> That way, we never map read-only MMIO regions writable, and rely on
> the MT_DEVICE trumps MT_NORMAL rule to ensure the guest reads to those
> regions are uncached.
> (Wouldn't hurt to add a comment to explain it, I suppose)
>
diff mbox

Patch

diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 01baef07cd0c..92b2fbe18868 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -100,7 +100,7 @@  extern pgprot_t		pgprot_s2_device;
 #define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_HYP)
 #define PAGE_HYP_DEVICE		_MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
 #define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
-#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDWR)
+#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)