[v3,0/6] add non-strict mode support for arm-smmu-v3

Message ID	1531376312-2192-1-git-send-email-thunder.leizhen@huawei.com
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; From: Zhen Lei <thunder.leizhen@huawei.com> To: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>, Robin Murphy <robin.murphy@arm.com>, Will Deacon <will.deacon@arm.com>, Joerg Roedel <joro@8bytes.org>, linux-arm-kernel <linux-arm-kernel@lists.infradead.org>, iommu <iommu@lists.linux-foundation.org>, linux-kernel <linux-kernel@vger.kernel.org> CC: Zhen Lei <thunder.leizhen@huawei.com> Subject: [PATCH v3 0/6] add non-strict mode support for arm-smmu-v3 Date: Thu, 12 Jul 2018 14:18:26 +0800 Message-ID: <1531376312-2192-1-git-send-email-thunder.leizhen@huawei.com> MIME-Version: 1.0 Content-Type: text/plain Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk
Series	add non-strict mode support for arm-smmu-v3 \| expand [v3,0/6] add non-strict mode support for arm-smmu-v3 [v3,1/6] iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook [v3,2/6] iommu/dma: add support for non-strict mode [v3,3/6] iommu/amd: use default branch to deal with all non-supported capabilities [v3,4/6] iommu/io-pgtable-arm: add support for non-strict mode [v3,5/6] iommu/arm-smmu-v3: add support for non-strict mode [v3,6/6] iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"

Zhen Lei July 12, 2018, 6:18 a.m. UTC

v2 -> v3:
Add a bootup option "iommu_strict_mode" to make the manager can choose which
mode to be used. The first 5 patches have not changed.
+	iommu_strict_mode=	[arm-smmu-v3]
+		0 - strict mode (default)
+		1 - non-strict mode

v1 -> v2:
Use the lowest bit of the io_pgtable_ops.unmap's iova parameter to pass the strict mode:
0, IOMMU_STRICT;
1, IOMMU_NON_STRICT;
Treat 0 as IOMMU_STRICT, so that the unmap operation can compatible with
other IOMMUs which still use strict mode. In other words, this patch series
will not impact other IOMMU drivers. I tried add a new quirk IO_PGTABLE_QUIRK_NON_STRICT
in io_pgtable_cfg.quirks, but it can not pass the strict mode of the domain from SMMUv3
driver to io-pgtable module. 

Add a new member domain_non_strict in struct iommu_dma_cookie, this member will only be
initialized when the related domain and IOMMU driver support non-strict mode.

v1:
In common, a IOMMU unmap operation follow the below steps:
1. remove the mapping in page table of the specified iova range
2. execute tlbi command to invalid the mapping which is cached in TLB
3. wait for the above tlbi operation to be finished
4. free the IOVA resource
5. free the physical memory resource

This maybe a problem when unmap is very frequently, the combination of tlbi
and wait operation will consume a lot of time. A feasible method is put off
tlbi and iova-free operation, when accumulating to a certain number or
reaching a specified time, execute only one tlbi_all command to clean up
TLB, then free the backup IOVAs. Mark as non-strict mode.

But it must be noted that, although the mapping has already been removed in
the page table, it maybe still exist in TLB. And the freed physical memory
may also be reused for others. So a attacker can persistent access to memory
based on the just freed IOVA, to obtain sensible data or corrupt memory. So
the VFIO should always choose the strict mode.

Some may consider put off physical memory free also, that will still follow
strict mode. But for the map_sg cases, the memory allocation is not controlled
by IOMMU APIs, so it is not enforceable.

Fortunately, Intel and AMD have already applied the non-strict mode, and put
queue_iova() operation into the common file dma-iommu.c., and my work is based
on it. The difference is that arm-smmu-v3 driver will call IOMMU common APIs to
unmap, but Intel and AMD IOMMU drivers are not.

Below is the performance data of strict vs non-strict for NVMe device:
Randomly Read  IOPS: 146K(strict) vs 573K(non-strict)
Randomly Write IOPS: 143K(strict) vs 513K(non-strict)


Zhen Lei (6):
  iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook
  iommu/dma: add support for non-strict mode
  iommu/amd: use default branch to deal with all non-supported
    capabilities
  iommu/io-pgtable-arm: add support for non-strict mode
  iommu/arm-smmu-v3: add support for non-strict mode
  iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"

 Documentation/admin-guide/kernel-parameters.txt | 12 +++++++
 drivers/iommu/amd_iommu.c                       |  4 +--
 drivers/iommu/arm-smmu-v3.c                     | 42 +++++++++++++++++++++++--
 drivers/iommu/dma-iommu.c                       | 25 +++++++++++++++
 drivers/iommu/io-pgtable-arm.c                  | 23 ++++++++------
 include/linux/iommu.h                           |  7 +++++
 6 files changed, 98 insertions(+), 15 deletions(-)

-- 
1.8.3

Robin Murphy July 24, 2018, 9:51 p.m. UTC | #1

On 2018-07-12 7:18 AM, Zhen Lei wrote:
> v2 -> v3:

> Add a bootup option "iommu_strict_mode" to make the manager can choose which

> mode to be used. The first 5 patches have not changed.

> +	iommu_strict_mode=	[arm-smmu-v3]

> +		0 - strict mode (default)

> +		1 - non-strict mode

> 

> v1 -> v2:

> Use the lowest bit of the io_pgtable_ops.unmap's iova parameter to pass the strict mode:

> 0, IOMMU_STRICT;

> 1, IOMMU_NON_STRICT;

> Treat 0 as IOMMU_STRICT, so that the unmap operation can compatible with

> other IOMMUs which still use strict mode. In other words, this patch series

> will not impact other IOMMU drivers. I tried add a new quirk IO_PGTABLE_QUIRK_NON_STRICT

> in io_pgtable_cfg.quirks, but it can not pass the strict mode of the domain from SMMUv3

> driver to io-pgtable module.


What exactly is the issue there? We don't have any problem with other 
quirks like NO_DMA, and as I said before, by the time we're allocating 
the io-pgtable in arm_smmu_domain_finalise() we already know everything 
there is to know about the domain.

> Add a new member domain_non_strict in struct iommu_dma_cookie, this member will only be

> initialized when the related domain and IOMMU driver support non-strict mode.

> 

> v1:

> In common, a IOMMU unmap operation follow the below steps:

> 1. remove the mapping in page table of the specified iova range

> 2. execute tlbi command to invalid the mapping which is cached in TLB

> 3. wait for the above tlbi operation to be finished

> 4. free the IOVA resource

> 5. free the physical memory resource

> 

> This maybe a problem when unmap is very frequently, the combination of tlbi

> and wait operation will consume a lot of time. A feasible method is put off

> tlbi and iova-free operation, when accumulating to a certain number or

> reaching a specified time, execute only one tlbi_all command to clean up

> TLB, then free the backup IOVAs. Mark as non-strict mode.

> 

> But it must be noted that, although the mapping has already been removed in

> the page table, it maybe still exist in TLB. And the freed physical memory

> may also be reused for others. So a attacker can persistent access to memory

> based on the just freed IOVA, to obtain sensible data or corrupt memory. So

> the VFIO should always choose the strict mode.

> 

> Some may consider put off physical memory free also, that will still follow

> strict mode. But for the map_sg cases, the memory allocation is not controlled

> by IOMMU APIs, so it is not enforceable.

> 

> Fortunately, Intel and AMD have already applied the non-strict mode, and put

> queue_iova() operation into the common file dma-iommu.c., and my work is based

> on it. The difference is that arm-smmu-v3 driver will call IOMMU common APIs to

> unmap, but Intel and AMD IOMMU drivers are not.

> 

> Below is the performance data of strict vs non-strict for NVMe device:

> Randomly Read  IOPS: 146K(strict) vs 573K(non-strict)

> Randomly Write IOPS: 143K(strict) vs 513K(non-strict)


How does that compare to passthrough performance? One thing I'm not 
entirely clear about is what the realistic use-case for this is - even 
if invalidation were infinitely fast, enabling translation still 
typically has a fair impact on overall system performance in terms of 
latency, power, memory bandwidth, etc., so I can't help wonder what 
devices exist today for which performance is critical and robustness* is 
unimportant, yet have crippled addressing capabilities such that they 
can't just use passthrough.

Robin.


* I don't want to say "security" here, since I'm actually a lot less 
concerned about the theoretical malicious endpoint/wild write scenarios 
than the the much more straightforward malfunctioning device and/or 
buggy driver causing use-after-free style memory corruption. Also, I'm 
sick of the word "security"...

> 

> Zhen Lei (6):

>    iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook

>    iommu/dma: add support for non-strict mode

>    iommu/amd: use default branch to deal with all non-supported

>      capabilities

>    iommu/io-pgtable-arm: add support for non-strict mode

>    iommu/arm-smmu-v3: add support for non-strict mode

>    iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"

> 

>   Documentation/admin-guide/kernel-parameters.txt | 12 +++++++

>   drivers/iommu/amd_iommu.c                       |  4 +--

>   drivers/iommu/arm-smmu-v3.c                     | 42 +++++++++++++++++++++++--

>   drivers/iommu/dma-iommu.c                       | 25 +++++++++++++++

>   drivers/iommu/io-pgtable-arm.c                  | 23 ++++++++------

>   include/linux/iommu.h                           |  7 +++++

>   6 files changed, 98 insertions(+), 15 deletions(-)

>

Zhen Lei July 26, 2018, 3:44 a.m. UTC | #2

On 2018/7/25 5:51, Robin Murphy wrote:
> On 2018-07-12 7:18 AM, Zhen Lei wrote:

>> v2 -> v3:

>> Add a bootup option "iommu_strict_mode" to make the manager can choose which

>> mode to be used. The first 5 patches have not changed.

>> +    iommu_strict_mode=    [arm-smmu-v3]

>> +        0 - strict mode (default)

>> +        1 - non-strict mode

>>

>> v1 -> v2:

>> Use the lowest bit of the io_pgtable_ops.unmap's iova parameter to pass the strict mode:

>> 0, IOMMU_STRICT;

>> 1, IOMMU_NON_STRICT;

>> Treat 0 as IOMMU_STRICT, so that the unmap operation can compatible with

>> other IOMMUs which still use strict mode. In other words, this patch series

>> will not impact other IOMMU drivers. I tried add a new quirk IO_PGTABLE_QUIRK_NON_STRICT

>> in io_pgtable_cfg.quirks, but it can not pass the strict mode of the domain from SMMUv3

>> driver to io-pgtable module.

> 

> What exactly is the issue there? We don't have any problem with other quirks like NO_DMA, and as I said before, by the time we're allocating the io-pgtable in arm_smmu_domain_finalise() we already know everything there is to know about the domain.


Because userspace can map/unamp and start devices to access memory through VFIO.
So that, the attacker can:
1. alloc memory
2. map
3. unmap
4. free memory
5. repeatedly accesssing the just freed memory base on the just unmapped iova,
   this attack may success if the freed memory is reused by others and the mapping still staying in TLB

But if only root user can use VFIO, this is an unnecessary worry. Then both normal and VFIO will use the
same strict mode, so that the new quirk IO_PGTABLE_QUIRK_NON_STRICT can easily be applied.

> 

>> Add a new member domain_non_strict in struct iommu_dma_cookie, this member will only be

>> initialized when the related domain and IOMMU driver support non-strict mode.

>>

>> v1:

>> In common, a IOMMU unmap operation follow the below steps:

>> 1. remove the mapping in page table of the specified iova range

>> 2. execute tlbi command to invalid the mapping which is cached in TLB

>> 3. wait for the above tlbi operation to be finished

>> 4. free the IOVA resource

>> 5. free the physical memory resource

>>

>> This maybe a problem when unmap is very frequently, the combination of tlbi

>> and wait operation will consume a lot of time. A feasible method is put off

>> tlbi and iova-free operation, when accumulating to a certain number or

>> reaching a specified time, execute only one tlbi_all command to clean up

>> TLB, then free the backup IOVAs. Mark as non-strict mode.

>>

>> But it must be noted that, although the mapping has already been removed in

>> the page table, it maybe still exist in TLB. And the freed physical memory

>> may also be reused for others. So a attacker can persistent access to memory

>> based on the just freed IOVA, to obtain sensible data or corrupt memory. So

>> the VFIO should always choose the strict mode.

>>

>> Some may consider put off physical memory free also, that will still follow

>> strict mode. But for the map_sg cases, the memory allocation is not controlled

>> by IOMMU APIs, so it is not enforceable.

>>

>> Fortunately, Intel and AMD have already applied the non-strict mode, and put

>> queue_iova() operation into the common file dma-iommu.c., and my work is based

>> on it. The difference is that arm-smmu-v3 driver will call IOMMU common APIs to

>> unmap, but Intel and AMD IOMMU drivers are not.

>>

>> Below is the performance data of strict vs non-strict for NVMe device:

>> Randomly Read  IOPS: 146K(strict) vs 573K(non-strict)

>> Randomly Write IOPS: 143K(strict) vs 513K(non-strict)

> 

> How does that compare to passthrough performance? One thing I'm not entirely clear about is what the realistic use-case for this is - even if invalidation were infinitely fast, enabling translation still typically has a fair impact on overall system performance in terms of latency, power, memory bandwidth, etc., so I can't help wonder what devices exist today for which performance is critical and robustness* is unimportant, yet have crippled addressing capabilities such that they can't just use passthrough.

I have no passthrough performance data yet, I will ask my team to do it. But we have tested the Global bypass:
Randomly Read IOPS: 744K, and Randomly Write IOPS: is the same to non-strict.

I'm also not clear. But I think in most cases, the system does not need to run at full capacity, but the system
should have that ability. For example, a system's daily load may only 30-50%, but the load may increase to 80%+
on festival day.

Passthrough is not enough to support VFIO, and virtualization need the later.

> 

> Robin.

> 

> 

> * I don't want to say "security" here, since I'm actually a lot less concerned about the theoretical malicious endpoint/wild write scenarios than the the much more straightforward malfunctioning device and/or buggy driver causing use-after-free style memory corruption. Also, I'm sick of the word "security"...


OK，We really have no need to consider buggy devices.

> 

>>

>> Zhen Lei (6):

>>    iommu/arm-smmu-v3: fix the implementation of flush_iotlb_all hook

>>    iommu/dma: add support for non-strict mode

>>    iommu/amd: use default branch to deal with all non-supported

>>      capabilities

>>    iommu/io-pgtable-arm: add support for non-strict mode

>>    iommu/arm-smmu-v3: add support for non-strict mode

>>    iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"

>>

>>   Documentation/admin-guide/kernel-parameters.txt | 12 +++++++

>>   drivers/iommu/amd_iommu.c                       |  4 +--

>>   drivers/iommu/arm-smmu-v3.c                     | 42 +++++++++++++++++++++++--

>>   drivers/iommu/dma-iommu.c                       | 25 +++++++++++++++

>>   drivers/iommu/io-pgtable-arm.c                  | 23 ++++++++------

>>   include/linux/iommu.h                           |  7 +++++

>>   6 files changed, 98 insertions(+), 15 deletions(-)

>>

> 

> .

> 


-- 
Thanks!
BestRegards

Robin Murphy July 26, 2018, 2:16 p.m. UTC | #3

On 2018-07-26 4:44 AM, Leizhen (ThunderTown) wrote:
> 

> 

> On 2018/7/25 5:51, Robin Murphy wrote:

>> On 2018-07-12 7:18 AM, Zhen Lei wrote:

>>> v2 -> v3: Add a bootup option "iommu_strict_mode" to make the

>>> manager can choose which mode to be used. The first 5 patches

>>> have not changed. +    iommu_strict_mode=    [arm-smmu-v3] +

>>> 0 - strict mode (default) +        1 - non-strict mode

>>> 

>>> v1 -> v2: Use the lowest bit of the io_pgtable_ops.unmap's iova

>>> parameter to pass the strict mode: 0, IOMMU_STRICT; 1,

>>> IOMMU_NON_STRICT; Treat 0 as IOMMU_STRICT, so that the unmap

>>> operation can compatible with other IOMMUs which still use strict

>>> mode. In other words, this patch series will not impact other

>>> IOMMU drivers. I tried add a new quirk

>>> IO_PGTABLE_QUIRK_NON_STRICT in io_pgtable_cfg.quirks, but it can

>>> not pass the strict mode of the domain from SMMUv3 driver to

>>> io-pgtable module.

>> 

>> What exactly is the issue there? We don't have any problem with

>> other quirks like NO_DMA, and as I said before, by the time we're

>> allocating the io-pgtable in arm_smmu_domain_finalise() we already

>> know everything there is to know about the domain.

> 

> Because userspace can map/unamp and start devices to access memory

> through VFIO. So that, the attacker can: 1. alloc memory 2. map 3.

> unmap 4. free memory 5. repeatedly accesssing the just freed memory

> base on the just unmapped iova, this attack may success if the freed

> memory is reused by others and the mapping still staying in TLB


Right, but that's why we don't set non-strict mode on unmanaged domains; 
what I was asking about was specifically why "it can not pass the strict 
mode of the domain from SMMUv3 driver to io-pgtable module", because we 
don't get anywhere near io-pgtable until we already know whether the 
domain in question can allow lazy unmaps or not.

> But if only root user can use VFIO, this is an unnecessary worry.

> Then both normal and VFIO will use the same strict mode, so that the

> new quirk IO_PGTABLE_QUIRK_NON_STRICT can easily be applied.

> 

>> 

>>> Add a new member domain_non_strict in struct iommu_dma_cookie,

>>> this member will only be initialized when the related domain and

>>> IOMMU driver support non-strict mode.

>>> 

>>> v1: In common, a IOMMU unmap operation follow the below steps: 1.

>>> remove the mapping in page table of the specified iova range 2.

>>> execute tlbi command to invalid the mapping which is cached in

>>> TLB 3. wait for the above tlbi operation to be finished 4. free

>>> the IOVA resource 5. free the physical memory resource

>>> 

>>> This maybe a problem when unmap is very frequently, the

>>> combination of tlbi and wait operation will consume a lot of

>>> time. A feasible method is put off tlbi and iova-free operation,

>>> when accumulating to a certain number or reaching a specified

>>> time, execute only one tlbi_all command to clean up TLB, then

>>> free the backup IOVAs. Mark as non-strict mode.

>>> 

>>> But it must be noted that, although the mapping has already been

>>> removed in the page table, it maybe still exist in TLB. And the

>>> freed physical memory may also be reused for others. So a

>>> attacker can persistent access to memory based on the just freed

>>> IOVA, to obtain sensible data or corrupt memory. So the VFIO

>>> should always choose the strict mode.

>>> 

>>> Some may consider put off physical memory free also, that will

>>> still follow strict mode. But for the map_sg cases, the memory

>>> allocation is not controlled by IOMMU APIs, so it is not

>>> enforceable.

>>> 

>>> Fortunately, Intel and AMD have already applied the non-strict

>>> mode, and put queue_iova() operation into the common file

>>> dma-iommu.c., and my work is based on it. The difference is that

>>> arm-smmu-v3 driver will call IOMMU common APIs to unmap, but

>>> Intel and AMD IOMMU drivers are not.

>>> 

>>> Below is the performance data of strict vs non-strict for NVMe

>>> device: Randomly Read  IOPS: 146K(strict) vs 573K(non-strict) 

>>> Randomly Write IOPS: 143K(strict) vs 513K(non-strict)

>> 

>> How does that compare to passthrough performance? One thing I'm not

>> entirely clear about is what the realistic use-case for this is -

>> even if invalidation were infinitely fast, enabling translation

>> still typically has a fair impact on overall system performance in

>> terms of latency, power, memory bandwidth, etc., so I can't help

>> wonder what devices exist today for which performance is critical

>> and robustness* is unimportant, yet have crippled addressing

>> capabilities such that they can't just use passthrough.

> I have no passthrough performance data yet, I will ask my team to do

> it. But we have tested the Global bypass: Randomly Read IOPS: 744K,

> and Randomly Write IOPS: is the same to non-strict.

> 

> I'm also not clear. But I think in most cases, the system does not

> need to run at full capacity, but the system should have that

> ability. For example, a system's daily load may only 30-50%, but the

> load may increase to 80%+ on festival day.

> 

> Passthrough is not enough to support VFIO, and virtualization need

> the later.


Huh? The whole point of passthrough mode is that the IOMMU can still be 
used for VFIO, but without imposing the overhead of dynamic mapping on 
host DMA.

Robin.

>> * I don't want to say "security" here, since I'm actually a lot

>> less concerned about the theoretical malicious endpoint/wild write

>> scenarios than the the much more straightforward malfunctioning

>> device and/or buggy driver causing use-after-free style memory

>> corruption. Also, I'm sick of the word "security"...

> 

> OK，We really have no need to consider buggy devices.

> 

>> 

>>> 

>>> Zhen Lei (6): iommu/arm-smmu-v3: fix the implementation of

>>> flush_iotlb_all hook iommu/dma: add support for non-strict mode 

>>> iommu/amd: use default branch to deal with all non-supported 

>>> capabilities iommu/io-pgtable-arm: add support for non-strict

>>> mode iommu/arm-smmu-v3: add support for non-strict mode 

>>> iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"

>>> 

>>> Documentation/admin-guide/kernel-parameters.txt | 12 +++++++ 

>>> drivers/iommu/amd_iommu.c                       |  4 +-- 

>>> drivers/iommu/arm-smmu-v3.c                     | 42

>>> +++++++++++++++++++++++-- drivers/iommu/dma-iommu.c

>>> | 25 +++++++++++++++ drivers/iommu/io-pgtable-arm.c

>>> | 23 ++++++++------ include/linux/iommu.h

>>> |  7 +++++ 6 files changed, 98 insertions(+), 15 deletions(-)

>>> 

>> 

>> .

>> 

>

Zhen Lei July 27, 2018, 2:49 a.m. UTC | #4

On 2018/7/26 22:16, Robin Murphy wrote:
> On 2018-07-26 4:44 AM, Leizhen (ThunderTown) wrote:

>>

>>

>> On 2018/7/25 5:51, Robin Murphy wrote:

>>> On 2018-07-12 7:18 AM, Zhen Lei wrote:

>>>> v2 -> v3: Add a bootup option "iommu_strict_mode" to make the

>>>> manager can choose which mode to be used. The first 5 patches

>>>> have not changed. +    iommu_strict_mode=    [arm-smmu-v3] +

>>>> 0 - strict mode (default) +        1 - non-strict mode

>>>>

>>>> v1 -> v2: Use the lowest bit of the io_pgtable_ops.unmap's iova

>>>> parameter to pass the strict mode: 0, IOMMU_STRICT; 1,

>>>> IOMMU_NON_STRICT; Treat 0 as IOMMU_STRICT, so that the unmap

>>>> operation can compatible with other IOMMUs which still use strict

>>>> mode. In other words, this patch series will not impact other

>>>> IOMMU drivers. I tried add a new quirk

>>>> IO_PGTABLE_QUIRK_NON_STRICT in io_pgtable_cfg.quirks, but it can

>>>> not pass the strict mode of the domain from SMMUv3 driver to

>>>> io-pgtable module.

>>>

>>> What exactly is the issue there? We don't have any problem with

>>> other quirks like NO_DMA, and as I said before, by the time we're

>>> allocating the io-pgtable in arm_smmu_domain_finalise() we already

>>> know everything there is to know about the domain.

>>

>> Because userspace can map/unamp and start devices to access memory

>> through VFIO. So that, the attacker can: 1. alloc memory 2. map 3.

>> unmap 4. free memory 5. repeatedly accesssing the just freed memory

>> base on the just unmapped iova, this attack may success if the freed

>> memory is reused by others and the mapping still staying in TLB

> 

> Right, but that's why we don't set non-strict mode on unmanaged domains; what I was asking about was specifically why "it can not pass the strict mode of the domain from SMMUv3 driver to io-pgtable module", because we don't get anywhere near io-pgtable until we already know whether the domain in question can allow lazy unmaps or not.


Sorry, it's my fault. I've all through mistaken that "data->iop.cfg" is shared by all domains until you mentioned me again.

> 

>> But if only root user can use VFIO, this is an unnecessary worry.

>> Then both normal and VFIO will use the same strict mode, so that the

>> new quirk IO_PGTABLE_QUIRK_NON_STRICT can easily be applied.

>>

>>>

>>>> Add a new member domain_non_strict in struct iommu_dma_cookie,

>>>> this member will only be initialized when the related domain and

>>>> IOMMU driver support non-strict mode.

>>>>

>>>> v1: In common, a IOMMU unmap operation follow the below steps: 1.

>>>> remove the mapping in page table of the specified iova range 2.

>>>> execute tlbi command to invalid the mapping which is cached in

>>>> TLB 3. wait for the above tlbi operation to be finished 4. free

>>>> the IOVA resource 5. free the physical memory resource

>>>>

>>>> This maybe a problem when unmap is very frequently, the

>>>> combination of tlbi and wait operation will consume a lot of

>>>> time. A feasible method is put off tlbi and iova-free operation,

>>>> when accumulating to a certain number or reaching a specified

>>>> time, execute only one tlbi_all command to clean up TLB, then

>>>> free the backup IOVAs. Mark as non-strict mode.

>>>>

>>>> But it must be noted that, although the mapping has already been

>>>> removed in the page table, it maybe still exist in TLB. And the

>>>> freed physical memory may also be reused for others. So a

>>>> attacker can persistent access to memory based on the just freed

>>>> IOVA, to obtain sensible data or corrupt memory. So the VFIO

>>>> should always choose the strict mode.

>>>>

>>>> Some may consider put off physical memory free also, that will

>>>> still follow strict mode. But for the map_sg cases, the memory

>>>> allocation is not controlled by IOMMU APIs, so it is not

>>>> enforceable.

>>>>

>>>> Fortunately, Intel and AMD have already applied the non-strict

>>>> mode, and put queue_iova() operation into the common file

>>>> dma-iommu.c., and my work is based on it. The difference is that

>>>> arm-smmu-v3 driver will call IOMMU common APIs to unmap, but

>>>> Intel and AMD IOMMU drivers are not.

>>>>

>>>> Below is the performance data of strict vs non-strict for NVMe

>>>> device: Randomly Read  IOPS: 146K(strict) vs 573K(non-strict) Randomly Write IOPS: 143K(strict) vs 513K(non-strict)

>>>

>>> How does that compare to passthrough performance? One thing I'm not

>>> entirely clear about is what the realistic use-case for this is -

>>> even if invalidation were infinitely fast, enabling translation

>>> still typically has a fair impact on overall system performance in

>>> terms of latency, power, memory bandwidth, etc., so I can't help

>>> wonder what devices exist today for which performance is critical

>>> and robustness* is unimportant, yet have crippled addressing

>>> capabilities such that they can't just use passthrough.

>> I have no passthrough performance data yet, I will ask my team to do

>> it. But we have tested the Global bypass: Randomly Read IOPS: 744K,

>> and Randomly Write IOPS: is the same to non-strict.

>>

>> I'm also not clear. But I think in most cases, the system does not

>> need to run at full capacity, but the system should have that

>> ability. For example, a system's daily load may only 30-50%, but the

>> load may increase to 80%+ on festival day.

>>

>> Passthrough is not enough to support VFIO, and virtualization need

>> the later.

> 

> Huh? The whole point of passthrough mode is that the IOMMU can still be used for VFIO, but without imposing the overhead of dynamic mapping on host DMA.


I said that from my experience. Userspace do not known the PA, so I think the user can not fill dma_map.iova correctly.

	/* Allocate some space and setup a DMA mapping */
	dma_map.vaddr = (__u64)mmap(0, 0x1000, PROT_READ | PROT_WRITE,
			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
	dma_map.size = 0x1000;
	dma_map.iova = 0x2f00000000UL;						/* user specified */
	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;

	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);

Further more, dma_map is also suitable for iommu_map_sg usage scenario.

> 

> Robin.

> 

>>> * I don't want to say "security" here, since I'm actually a lot

>>> less concerned about the theoretical malicious endpoint/wild write

>>> scenarios than the the much more straightforward malfunctioning

>>> device and/or buggy driver causing use-after-free style memory

>>> corruption. Also, I'm sick of the word "security"...

>>

>> OK，We really have no need to consider buggy devices.

>>

>>>

>>>>

>>>> Zhen Lei (6): iommu/arm-smmu-v3: fix the implementation of

>>>> flush_iotlb_all hook iommu/dma: add support for non-strict mode iommu/amd: use default branch to deal with all non-supported capabilities iommu/io-pgtable-arm: add support for non-strict

>>>> mode iommu/arm-smmu-v3: add support for non-strict mode iommu/arm-smmu-v3: add bootup option "iommu_strict_mode"

>>>>

>>>> Documentation/admin-guide/kernel-parameters.txt | 12 +++++++ drivers/iommu/amd_iommu.c                       |  4 +-- drivers/iommu/arm-smmu-v3.c                     | 42

>>>> +++++++++++++++++++++++-- drivers/iommu/dma-iommu.c

>>>> | 25 +++++++++++++++ drivers/iommu/io-pgtable-arm.c

>>>> | 23 ++++++++------ include/linux/iommu.h

>>>> |  7 +++++ 6 files changed, 98 insertions(+), 15 deletions(-)

>>>>

>>>

>>> .

>>>

>>

> 

> .

> 


-- 
Thanks!
BestRegards

Will Deacon July 27, 2018, 9:37 a.m. UTC | #5

On Fri, Jul 27, 2018 at 10:49:59AM +0800, Leizhen (ThunderTown) wrote:
> On 2018/7/26 22:16, Robin Murphy wrote:

> > On 2018-07-26 4:44 AM, Leizhen (ThunderTown) wrote:

> >> Passthrough is not enough to support VFIO, and virtualization need

> >> the later.

> > 

> > Huh? The whole point of passthrough mode is that the IOMMU can still be

> > used for VFIO, but without imposing the overhead of dynamic mapping on

> > host DMA.

> 

> I said that from my experience. Userspace do not known the PA, so I think

> the user can not fill dma_map.iova correctly.

> 

> 	/* Allocate some space and setup a DMA mapping */

> 	dma_map.vaddr = (__u64)mmap(0, 0x1000, PROT_READ | PROT_WRITE,

> 			     MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);

> 	dma_map.size = 0x1000;

> 	dma_map.iova = 0x2f00000000UL;						/* user specified */

> 	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;

> 

> 	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);


Hmm, I'm pretty sure that's not the case. When passthrough is configured
via iommu.passthrough, it only applies to the default domain and therefore
won't affect the unmanaged domain that VFIO manages explicitly. So VFIO
will continue to use translation and userspace can't pass whatever it likes
for the iova.

Will

[v3,0/6] add non-strict mode support for arm-smmu-v3

Message

Comments