Message ID | 1410603462-28900-1-git-send-email-ard.biesheuvel@linaro.org |
---|---|
State | Superseded |
Headers | show |
Hi Ard, On 2014-09-13 11:17, Ard Biesheuvel wrote: > Now that we support read-only memslots, we need to make sure that > pass-through device mappings are not mapped writable if the guest > has requested them to be read-only. The existing implementation > already honours this by calling kvm_set_s2pte_writable() on the new > pte in case of writable mappings, so all we need to do is define > the default pgprot_t value used for devices to be PTE_S2_RDONLY. > > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> I feel very uncomfortable with this change. Why would we map a device RO? Is that only for completeness sake? Note that we also use PAGE_S2_DEVICE for things that are not mapped through a memslot, such as the GIC. Thanks, M. > --- > arch/arm/include/asm/pgtable.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm/include/asm/pgtable.h > b/arch/arm/include/asm/pgtable.h > index 01baef07cd0c..92b2fbe18868 100644 > --- a/arch/arm/include/asm/pgtable.h > +++ b/arch/arm/include/asm/pgtable.h > @@ -100,7 +100,7 @@ extern pgprot_t pgprot_s2_device; > #define PAGE_HYP _MOD_PROT(pgprot_kernel, L_PTE_HYP) > #define PAGE_HYP_DEVICE _MOD_PROT(pgprot_hyp_device, L_PTE_HYP) > #define PAGE_S2 _MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY) > -#define PAGE_S2_DEVICE _MOD_PROT(pgprot_s2_device, L_PTE_S2_RDWR) > +#define PAGE_S2_DEVICE _MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY) > > #define __PAGE_NONE __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | > L_PTE_XN | L_PTE_NONE) > #define __PAGE_SHARED __pgprot(_L_PTE_DEFAULT | L_PTE_USER | > L_PTE_XN)
On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote: > Hi Ard, > > On 2014-09-13 11:17, Ard Biesheuvel wrote: >> >> Now that we support read-only memslots, we need to make sure that >> pass-through device mappings are not mapped writable if the guest >> has requested them to be read-only. The existing implementation >> already honours this by calling kvm_set_s2pte_writable() on the new >> pte in case of writable mappings, so all we need to do is define >> the default pgprot_t value used for devices to be PTE_S2_RDONLY. >> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > > I feel very uncomfortable with this change. Why would we map a device RO? Is > that only for completeness sake? > We would map a device RO so that QEMU (or whatever is managing KVM) can emulate the writes. I don't have a clear cut use case, to be honest, but setting up a writable mapping for a memslot that was explicitly set up as read-only seems wrong in any case. Note that the particular problem I was seeing was primarily caused by kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled this particular issue as well. > Note that we also use PAGE_S2_DEVICE for things that are not mapped through > a memslot, such as the GIC. > Yes, and I realize now that this breaks it. My apologies: I have an additional patch locally that sets up MMIO ranges in one go instead of faulting them in one page at a time as we do now, and there the read-write case is handled correctly in kvm_phys_addr_ioremap(). However, I thought it was better to send these out separately first, but apparently not. So if we can agree on whether or not MMIO backed mappings should be read-write even if the memslot says no, I will follow up with a proper series if there are still changes required.
On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote: > On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote: > > Hi Ard, > > > > On 2014-09-13 11:17, Ard Biesheuvel wrote: > >> > >> Now that we support read-only memslots, we need to make sure that > >> pass-through device mappings are not mapped writable if the guest > >> has requested them to be read-only. The existing implementation > >> already honours this by calling kvm_set_s2pte_writable() on the new > >> pte in case of writable mappings, so all we need to do is define > >> the default pgprot_t value used for devices to be PTE_S2_RDONLY. > >> > >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > > > > > I feel very uncomfortable with this change. Why would we map a device RO? Is > > that only for completeness sake? > > > > We would map a device RO so that QEMU (or whatever is managing KVM) > can emulate the writes. I don't have a clear cut use case, to be > honest, but setting up a writable mapping for a memslot that was > explicitly set up as read-only seems wrong in any case. Agreed, if it doesn't ever make sense to do so, then we should return an error to user space if userspace attempts such a configuration. The current code is just weird. > > Note that the particular problem I was seeing was primarily caused by > kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled > this particular issue as well. > > > Note that we also use PAGE_S2_DEVICE for things that are not mapped through > > a memslot, such as the GIC. > > > > Yes, and I realize now that this breaks it. > My apologies: I have an additional patch locally that sets up MMIO > ranges in one go instead of faulting them in one page at a time as we > do now, and there the read-write case is handled correctly in > kvm_phys_addr_ioremap(). However, I thought it was better to send > these out separately first, but apparently not. I think it is better to change this separately, and then add the ioremap stuff. However, you need to change all places that call PAGE_S2_DEVICE and expect a RDWR memory region, this happens to be only kvm_phys_addr_ioremap() for now. > > So if we can agree on whether or not MMIO backed mappings should be > read-write even if the memslot says no, I will follow up with a proper > series if there are still changes required. > I guess it could be theoretically useful to have read-only device memory regions, and I can't think of why it would violate the architecture. That said, I don't have any more clear use cases in mind, and we definitely shouldn't just silently ignore the read-only flag from user space and make the region writeable. If we don't want to allow this behavior, we can return an error in kvm_arch_create_memslot(), which will cause the KVM_CREATE_USER_MEMORY_REGION ioctl to return -ENOMEM. Thanks, -Christoffer
On 13 September 2014 19:06, Christoffer Dall <christoffer.dall@linaro.org> wrote: > On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote: >> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote: >> > Hi Ard, >> > >> > On 2014-09-13 11:17, Ard Biesheuvel wrote: >> >> >> >> Now that we support read-only memslots, we need to make sure that >> >> pass-through device mappings are not mapped writable if the guest >> >> has requested them to be read-only. The existing implementation >> >> already honours this by calling kvm_set_s2pte_writable() on the new >> >> pte in case of writable mappings, so all we need to do is define >> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY. >> >> >> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> >> > >> > >> > I feel very uncomfortable with this change. Why would we map a device RO? Is >> > that only for completeness sake? >> > >> >> We would map a device RO so that QEMU (or whatever is managing KVM) >> can emulate the writes. I don't have a clear cut use case, to be >> honest, but setting up a writable mapping for a memslot that was >> explicitly set up as read-only seems wrong in any case. > > Agreed, if it doesn't ever make sense to do so, then we should return an > error to user space if userspace attempts such a configuration. The > current code is just weird. > >> >> Note that the particular problem I was seeing was primarily caused by >> kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled >> this particular issue as well. >> >> > Note that we also use PAGE_S2_DEVICE for things that are not mapped through >> > a memslot, such as the GIC. >> > >> >> Yes, and I realize now that this breaks it. >> My apologies: I have an additional patch locally that sets up MMIO >> ranges in one go instead of faulting them in one page at a time as we >> do now, and there the read-write case is handled correctly in >> kvm_phys_addr_ioremap(). However, I thought it was better to send >> these out separately first, but apparently not. > > I think it is better to change this separately, and then add the ioremap > stuff. However, you need to change all places that call PAGE_S2_DEVICE > and expect a RDWR memory region, this happens to be only > kvm_phys_addr_ioremap() for now. > >> >> So if we can agree on whether or not MMIO backed mappings should be >> read-write even if the memslot says no, I will follow up with a proper >> series if there are still changes required. >> > > I guess it could be theoretically useful to have read-only device memory > regions, and I can't think of why it would violate the architecture. > We have to handle it either way. But after reading D4.5.3 (Table D4-40) of the ARM ARM, I am wondering why we needed patch b88657674d39 "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the first place, and we could just revert that patch and everything would work as expected. (In short, D4.5.3 says that MT_DEVICE trumps MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't matter what memtype the S2 translation specifies) > That said, I don't have any more clear use cases in mind, and we > definitely shouldn't just silently ignore the read-only flag from user > space and make the region writeable. If we don't want to allow this > behavior, we can return an error in kvm_arch_create_memslot(), which > will cause the KVM_CREATE_USER_MEMORY_REGION ioctl to return -ENOMEM. > Well, I am not sure how easy it is to find out beforehand (i.e., at ioctl time) what the nature of the backing is, and you have to deal with hva_to_pfn() potentially returning a VM_PFNMAP pfn or PageReserved pages anyway. So just mapping everything as MT_NORMAL actually seems like the sanest thing to do, so imo we should revert the patch. The only other question I had is whether it would be better to map a MMIO region in one go, but we can discuss that separately.
On 2014-09-14 05:49, Ard Biesheuvel wrote: > On 13 September 2014 19:06, Christoffer Dall > <christoffer.dall@linaro.org> wrote: >> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote: >>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> >>> wrote: >>> > Hi Ard, >>> > >>> > On 2014-09-13 11:17, Ard Biesheuvel wrote: >>> >> >>> >> Now that we support read-only memslots, we need to make sure >>> that >>> >> pass-through device mappings are not mapped writable if the >>> guest >>> >> has requested them to be read-only. The existing implementation >>> >> already honours this by calling kvm_set_s2pte_writable() on the >>> new >>> >> pte in case of writable mappings, so all we need to do is define >>> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY. >>> >> >>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> >>> > >>> > >>> > I feel very uncomfortable with this change. Why would we map a >>> device RO? Is >>> > that only for completeness sake? >>> > >>> >>> We would map a device RO so that QEMU (or whatever is managing KVM) >>> can emulate the writes. I don't have a clear cut use case, to be >>> honest, but setting up a writable mapping for a memslot that was >>> explicitly set up as read-only seems wrong in any case. >> >> Agreed, if it doesn't ever make sense to do so, then we should >> return an >> error to user space if userspace attempts such a configuration. The >> current code is just weird. >> >>> >>> Note that the particular problem I was seeing was primarily caused >>> by >>> kvm_is_mmio_pfn()'s false positive on the zero page, but it >>> unveiled >>> this particular issue as well. >>> >>> > Note that we also use PAGE_S2_DEVICE for things that are not >>> mapped through >>> > a memslot, such as the GIC. >>> > >>> >>> Yes, and I realize now that this breaks it. >>> My apologies: I have an additional patch locally that sets up MMIO >>> ranges in one go instead of faulting them in one page at a time as >>> we >>> do now, and there the read-write case is handled correctly in >>> kvm_phys_addr_ioremap(). However, I thought it was better to send >>> these out separately first, but apparently not. >> >> I think it is better to change this separately, and then add the >> ioremap >> stuff. However, you need to change all places that call >> PAGE_S2_DEVICE >> and expect a RDWR memory region, this happens to be only >> kvm_phys_addr_ioremap() for now. >> >>> >>> So if we can agree on whether or not MMIO backed mappings should be >>> read-write even if the memslot says no, I will follow up with a >>> proper >>> series if there are still changes required. >>> >> >> I guess it could be theoretically useful to have read-only device >> memory >> regions, and I can't think of why it would violate the architecture. >> > > We have to handle it either way. But after reading D4.5.3 (Table > D4-40) of the ARM ARM, I am wondering why we needed patch > b88657674d39 > "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the > first place, and we could just revert that patch and everything would > work as expected. (In short, D4.5.3 says that MT_DEVICE trumps > MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't > matter what memtype the S2 translation specifies) We've been there before: https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/004420.html >> That said, I don't have any more clear use cases in mind, and we >> definitely shouldn't just silently ignore the read-only flag from >> user >> space and make the region writeable. If we don't want to allow this >> behavior, we can return an error in kvm_arch_create_memslot(), which >> will cause the KVM_CREATE_USER_MEMORY_REGION ioctl to return >> -ENOMEM. >> > > Well, I am not sure how easy it is to find out beforehand (i.e., at > ioctl time) what the nature of the backing is, and you have to deal > with hva_to_pfn() potentially returning a VM_PFNMAP pfn or > PageReserved pages anyway. > So just mapping everything as MT_NORMAL actually seems like the > sanest > thing to do, so imo we should revert the patch. The only other > question I had is whether it would be better to map a MMIO region in > one go, but we can discuss that separately. Aside from the MT_NORMAL thing, the only saving we'd get by dynamically maping MMIO regions would be the page tables. Not very useful in my opinion. Thanks, M.
On 14 September 2014 11:09, Marc Zyngier <maz@misterjones.org> wrote: > On 2014-09-14 05:49, Ard Biesheuvel wrote: >> >> On 13 September 2014 19:06, Christoffer Dall >> <christoffer.dall@linaro.org> wrote: >>> >>> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote: >>>> >>>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote: >>>> > Hi Ard, >>>> > >>>> > On 2014-09-13 11:17, Ard Biesheuvel wrote: >>>> >> >>>> >> Now that we support read-only memslots, we need to make sure that >>>> >> pass-through device mappings are not mapped writable if the guest >>>> >> has requested them to be read-only. The existing implementation >>>> >> already honours this by calling kvm_set_s2pte_writable() on the new >>>> >> pte in case of writable mappings, so all we need to do is define >>>> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY. >>>> >> >>>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> >>>> > >>>> > >>>> > I feel very uncomfortable with this change. Why would we map a device >>>> > RO? Is >>>> > that only for completeness sake? >>>> > >>>> >>>> We would map a device RO so that QEMU (or whatever is managing KVM) >>>> can emulate the writes. I don't have a clear cut use case, to be >>>> honest, but setting up a writable mapping for a memslot that was >>>> explicitly set up as read-only seems wrong in any case. >>> >>> >>> Agreed, if it doesn't ever make sense to do so, then we should return an >>> error to user space if userspace attempts such a configuration. The >>> current code is just weird. >>> >>>> >>>> Note that the particular problem I was seeing was primarily caused by >>>> kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled >>>> this particular issue as well. >>>> >>>> > Note that we also use PAGE_S2_DEVICE for things that are not mapped >>>> > through >>>> > a memslot, such as the GIC. >>>> > >>>> >>>> Yes, and I realize now that this breaks it. >>>> My apologies: I have an additional patch locally that sets up MMIO >>>> ranges in one go instead of faulting them in one page at a time as we >>>> do now, and there the read-write case is handled correctly in >>>> kvm_phys_addr_ioremap(). However, I thought it was better to send >>>> these out separately first, but apparently not. >>> >>> >>> I think it is better to change this separately, and then add the ioremap >>> stuff. However, you need to change all places that call PAGE_S2_DEVICE >>> and expect a RDWR memory region, this happens to be only >>> kvm_phys_addr_ioremap() for now. >>> >>>> >>>> So if we can agree on whether or not MMIO backed mappings should be >>>> read-write even if the memslot says no, I will follow up with a proper >>>> series if there are still changes required. >>>> >>> >>> I guess it could be theoretically useful to have read-only device memory >>> regions, and I can't think of why it would violate the architecture. >>> >> >> We have to handle it either way. But after reading D4.5.3 (Table >> D4-40) of the ARM ARM, I am wondering why we needed patch b88657674d39 >> "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the >> first place, and we could just revert that patch and everything would >> work as expected. (In short, D4.5.3 says that MT_DEVICE trumps >> MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't >> matter what memtype the S2 translation specifies) > > > We've been there before: > https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/004420.html > Ah right. So why did those patches not make it in? Any way, in this case, we have to choose between either dropping the special case for kvm_is_mmio_pfn(), or fix it honor the writable. My vote would be to get rid of it. >>> That said, I don't have any more clear use cases in mind, and we >>> definitely shouldn't just silently ignore the read-only flag from user >>> space and make the region writeable. If we don't want to allow this >>> behavior, we can return an error in kvm_arch_create_memslot(), which >>> will cause the KVM_CREATE_USER_MEMORY_REGION ioctl to return -ENOMEM. >>> >> >> Well, I am not sure how easy it is to find out beforehand (i.e., at >> ioctl time) what the nature of the backing is, and you have to deal >> with hva_to_pfn() potentially returning a VM_PFNMAP pfn or >> PageReserved pages anyway. >> So just mapping everything as MT_NORMAL actually seems like the sanest >> thing to do, so imo we should revert the patch. The only other >> question I had is whether it would be better to map a MMIO region in >> one go, but we can discuss that separately. > > > Aside from the MT_NORMAL thing, the only saving we'd get by dynamically > maping MMIO regions would be the page tables. Not very useful in my opinion. > OK, so you agree faulting it in entirely upon the first abort is a sane thing to do then. So 1 patch to change that and 1 to revert the PAGE_S2_DEVICE thing then?
On 14 September 2014 11:43, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 14 September 2014 11:09, Marc Zyngier <maz@misterjones.org> wrote: >> On 2014-09-14 05:49, Ard Biesheuvel wrote: >>> >>> On 13 September 2014 19:06, Christoffer Dall >>> <christoffer.dall@linaro.org> wrote: >>>> >>>> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote: >>>>> >>>>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote: >>>>> > Hi Ard, >>>>> > >>>>> > On 2014-09-13 11:17, Ard Biesheuvel wrote: >>>>> >> >>>>> >> Now that we support read-only memslots, we need to make sure that >>>>> >> pass-through device mappings are not mapped writable if the guest >>>>> >> has requested them to be read-only. The existing implementation >>>>> >> already honours this by calling kvm_set_s2pte_writable() on the new >>>>> >> pte in case of writable mappings, so all we need to do is define >>>>> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY. >>>>> >> >>>>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> >>>>> > >>>>> > >>>>> > I feel very uncomfortable with this change. Why would we map a device >>>>> > RO? Is >>>>> > that only for completeness sake? >>>>> > >>>>> >>>>> We would map a device RO so that QEMU (or whatever is managing KVM) >>>>> can emulate the writes. I don't have a clear cut use case, to be >>>>> honest, but setting up a writable mapping for a memslot that was >>>>> explicitly set up as read-only seems wrong in any case. >>>> >>>> >>>> Agreed, if it doesn't ever make sense to do so, then we should return an >>>> error to user space if userspace attempts such a configuration. The >>>> current code is just weird. >>>> >>>>> >>>>> Note that the particular problem I was seeing was primarily caused by >>>>> kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled >>>>> this particular issue as well. >>>>> >>>>> > Note that we also use PAGE_S2_DEVICE for things that are not mapped >>>>> > through >>>>> > a memslot, such as the GIC. >>>>> > >>>>> >>>>> Yes, and I realize now that this breaks it. >>>>> My apologies: I have an additional patch locally that sets up MMIO >>>>> ranges in one go instead of faulting them in one page at a time as we >>>>> do now, and there the read-write case is handled correctly in >>>>> kvm_phys_addr_ioremap(). However, I thought it was better to send >>>>> these out separately first, but apparently not. >>>> >>>> >>>> I think it is better to change this separately, and then add the ioremap >>>> stuff. However, you need to change all places that call PAGE_S2_DEVICE >>>> and expect a RDWR memory region, this happens to be only >>>> kvm_phys_addr_ioremap() for now. >>>> >>>>> >>>>> So if we can agree on whether or not MMIO backed mappings should be >>>>> read-write even if the memslot says no, I will follow up with a proper >>>>> series if there are still changes required. >>>>> >>>> >>>> I guess it could be theoretically useful to have read-only device memory >>>> regions, and I can't think of why it would violate the architecture. >>>> >>> >>> We have to handle it either way. But after reading D4.5.3 (Table >>> D4-40) of the ARM ARM, I am wondering why we needed patch b88657674d39 >>> "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the >>> first place, and we could just revert that patch and everything would >>> work as expected. (In short, D4.5.3 says that MT_DEVICE trumps >>> MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't >>> matter what memtype the S2 translation specifies) >> >> >> We've been there before: >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/004420.html >> > > Ah right. So why did those patches not make it in? > Never mind. I read the whole thread this time. So, in summary, there is a concern that a malicious guest may request a cachable mapping for a device range, in an attempt to manipulate the VGIC or other device memory of another VM. I think that concern only applies to writable mappings, so perhaps we should just change if (kvm_is_mmio_pfn(pfn)) to if (kvm_is_mmio_pfn(pfn) && writable) and be done with it (which is coincidentally the very first naive fix I suggested for the issue i was seeing) That way, we never map read-only MMIO regions writable, and rely on the MT_DEVICE trumps MT_NORMAL rule to ensure the guest reads to those regions are uncached. (Wouldn't hurt to add a comment to explain it, I suppose)
On 14 September 2014 15:57, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > So, in summary, there is a concern that a malicious guest may request > a cachable mapping for a device range, in an attempt to manipulate the > VGIC or other device memory of another VM. > I think that concern only applies to writable mappings I think it also applies to read-only mappings, because it would still be permitting the guest to set up a situation with mismatched memory attributes with potentially unpleasant effects for the other guest (which no longer gets the guarantees it should get from the fact it has mapped the VGIC as Device memory). thanks -- PMM
I've been working around the edges of this discussion, and maybe be little unclear on this. But the manuals say intersection of Stage1/Stage2 permissions are used. If guest sets stage1 to read-only then setting stage 2 to read-only or read-write should have no impact. So why should stage 2 RW be changed? - Mario On 09/14/2014 03:57 PM, Ard Biesheuvel wrote: > On 14 September 2014 11:43, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: >> On 14 September 2014 11:09, Marc Zyngier <maz@misterjones.org> wrote: >>> On 2014-09-14 05:49, Ard Biesheuvel wrote: >>>> >>>> On 13 September 2014 19:06, Christoffer Dall >>>> <christoffer.dall@linaro.org> wrote: >>>>> >>>>> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote: >>>>>> >>>>>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier@arm.com> wrote: >>>>>>> Hi Ard, >>>>>>> >>>>>>> On 2014-09-13 11:17, Ard Biesheuvel wrote: >>>>>>>> >>>>>>>> Now that we support read-only memslots, we need to make sure that >>>>>>>> pass-through device mappings are not mapped writable if the guest >>>>>>>> has requested them to be read-only. The existing implementation >>>>>>>> already honours this by calling kvm_set_s2pte_writable() on the new >>>>>>>> pte in case of writable mappings, so all we need to do is define >>>>>>>> the default pgprot_t value used for devices to be PTE_S2_RDONLY. >>>>>>>> >>>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> >>>>>>> >>>>>>> >>>>>>> I feel very uncomfortable with this change. Why would we map a device >>>>>>> RO? Is >>>>>>> that only for completeness sake? >>>>>>> >>>>>> >>>>>> We would map a device RO so that QEMU (or whatever is managing KVM) >>>>>> can emulate the writes. I don't have a clear cut use case, to be >>>>>> honest, but setting up a writable mapping for a memslot that was >>>>>> explicitly set up as read-only seems wrong in any case. >>>>> >>>>> >>>>> Agreed, if it doesn't ever make sense to do so, then we should return an >>>>> error to user space if userspace attempts such a configuration. The >>>>> current code is just weird. >>>>> >>>>>> >>>>>> Note that the particular problem I was seeing was primarily caused by >>>>>> kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled >>>>>> this particular issue as well. >>>>>> >>>>>>> Note that we also use PAGE_S2_DEVICE for things that are not mapped >>>>>>> through >>>>>>> a memslot, such as the GIC. >>>>>>> >>>>>> >>>>>> Yes, and I realize now that this breaks it. >>>>>> My apologies: I have an additional patch locally that sets up MMIO >>>>>> ranges in one go instead of faulting them in one page at a time as we >>>>>> do now, and there the read-write case is handled correctly in >>>>>> kvm_phys_addr_ioremap(). However, I thought it was better to send >>>>>> these out separately first, but apparently not. >>>>> >>>>> >>>>> I think it is better to change this separately, and then add the ioremap >>>>> stuff. However, you need to change all places that call PAGE_S2_DEVICE >>>>> and expect a RDWR memory region, this happens to be only >>>>> kvm_phys_addr_ioremap() for now. >>>>> >>>>>> >>>>>> So if we can agree on whether or not MMIO backed mappings should be >>>>>> read-write even if the memslot says no, I will follow up with a proper >>>>>> series if there are still changes required. >>>>>> >>>>> >>>>> I guess it could be theoretically useful to have read-only device memory >>>>> regions, and I can't think of why it would violate the architecture. >>>>> >>>> >>>> We have to handle it either way. But after reading D4.5.3 (Table >>>> D4-40) of the ARM ARM, I am wondering why we needed patch b88657674d39 >>>> "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the >>>> first place, and we could just revert that patch and everything would >>>> work as expected. (In short, D4.5.3 says that MT_DEVICE trumps >>>> MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't >>>> matter what memtype the S2 translation specifies) >>> >>> >>> We've been there before: >>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/004420.html >>> >> >> Ah right. So why did those patches not make it in? >> > > Never mind. I read the whole thread this time. > > So, in summary, there is a concern that a malicious guest may request > a cachable mapping for a device range, in an attempt to manipulate the > VGIC or other device memory of another VM. > I think that concern only applies to writable mappings, so perhaps we > should just change > > if (kvm_is_mmio_pfn(pfn)) > > to > > if (kvm_is_mmio_pfn(pfn) && writable) > > and be done with it (which is coincidentally the very first naive fix > I suggested for the issue i was seeing) > That way, we never map read-only MMIO regions writable, and rely on > the MT_DEVICE trumps MT_NORMAL rule to ensure the guest reads to those > regions are uncached. > (Wouldn't hurt to add a comment to explain it, I suppose) >
On 15 September 2014 12:41, Mario Smarduch <m.smarduch@samsung.com> wrote: > I've been working around the edges of this discussion, and > maybe be little unclear on this. > > But the manuals say intersection of Stage1/Stage2 permissions are > used. If guest sets stage1 to read-only then setting stage 2 > to read-only or read-write should have no impact. So why > should stage 2 RW be changed? Read versus read-write isn't relevant. The point is that you mustn't let the guest set up a Normal memory mapping of the same bit of hardware (phys. address space) that some other guest or the host has set up as Device mapping. -- PMM
On 09/14/2014 03:57 PM, Ard Biesheuvel wrote: > > Never mind. I read the whole thread this time. > > So, in summary, there is a concern that a malicious guest may request > a cachable mapping for a device range, in an attempt to manipulate the > VGIC or other device memory of another VM. > I think that concern only applies to writable mappings, so perhaps we > should just change > > if (kvm_is_mmio_pfn(pfn)) > > to > > if (kvm_is_mmio_pfn(pfn) && writable) Hi Ard, What if the device passed through is read-only like maybe IPMI sensors. > > and be done with it (which is coincidentally the very first naive fix > I suggested for the issue i was seeing) > That way, we never map read-only MMIO regions writable, and rely on > the MT_DEVICE trumps MT_NORMAL rule to ensure the guest reads to those > regions are uncached. > (Wouldn't hurt to add a comment to explain it, I suppose) >
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index 01baef07cd0c..92b2fbe18868 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -100,7 +100,7 @@ extern pgprot_t pgprot_s2_device; #define PAGE_HYP _MOD_PROT(pgprot_kernel, L_PTE_HYP) #define PAGE_HYP_DEVICE _MOD_PROT(pgprot_hyp_device, L_PTE_HYP) #define PAGE_S2 _MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY) -#define PAGE_S2_DEVICE _MOD_PROT(pgprot_s2_device, L_PTE_S2_RDWR) +#define PAGE_S2_DEVICE _MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY) #define __PAGE_NONE __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE) #define __PAGE_SHARED __pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
Now that we support read-only memslots, we need to make sure that pass-through device mappings are not mapped writable if the guest has requested them to be read-only. The existing implementation already honours this by calling kvm_set_s2pte_writable() on the new pte in case of writable mappings, so all we need to do is define the default pgprot_t value used for devices to be PTE_S2_RDONLY. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> --- arch/arm/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)