[v7,03/10] iommu: introduce a reserved iova cookie

Message ID 1461084994-2355-4-git-send-email-eric.auger@linaro.org
State New
Headers show

Commit Message

Auger Eric April 19, 2016, 4:56 p.m.
This patch introduces some new fields in the iommu_domain struct,
dedicated to reserved iova management.

In a similar way as DMA mapping IOVA window, we need to store
information related to a reserved IOVA window.

The reserved_iova_cookie will store the reserved iova_domain
handle. An RB tree indexed by physical address is introduced to
store the host physical addresses bound to reserved IOVAs.

Those physical addresses will correspond to MSI frame base
addresses, also referred to as doorbells. Their number should be
quite limited per domain.

Also a spin_lock is introduced to protect accesses to the iova_domain
and RB tree. The choice of a spin_lock is driven by the fact the RB
tree will need to be accessed in MSI controller code not allowed to
sleep.

Signed-off-by: Eric Auger <eric.auger@linaro.org>


---
v5 -> v6:
- initialize reserved_binding_list
- use a spinlock instead of a mutex
---
 drivers/iommu/iommu.c | 2 ++
 include/linux/iommu.h | 6 ++++++
 2 files changed, 8 insertions(+)

-- 
1.9.1

Comments

Auger Eric April 20, 2016, 4:14 p.m. | #1
Hi Robin,
On 04/20/2016 02:55 PM, Robin Murphy wrote:
> On 19/04/16 17:56, Eric Auger wrote:

>> This patch introduces some new fields in the iommu_domain struct,

>> dedicated to reserved iova management.

>>

>> In a similar way as DMA mapping IOVA window, we need to store

>> information related to a reserved IOVA window.

>>

>> The reserved_iova_cookie will store the reserved iova_domain

>> handle. An RB tree indexed by physical address is introduced to

>> store the host physical addresses bound to reserved IOVAs.

>>

>> Those physical addresses will correspond to MSI frame base

>> addresses, also referred to as doorbells. Their number should be

>> quite limited per domain.

>>

>> Also a spin_lock is introduced to protect accesses to the iova_domain

>> and RB tree. The choice of a spin_lock is driven by the fact the RB

>> tree will need to be accessed in MSI controller code not allowed to

>> sleep.

>>

>> Signed-off-by: Eric Auger <eric.auger@linaro.org>

>>

>> ---

>> v5 -> v6:

>> - initialize reserved_binding_list

>> - use a spinlock instead of a mutex

>> ---

>>   drivers/iommu/iommu.c | 2 ++

>>   include/linux/iommu.h | 6 ++++++

>>   2 files changed, 8 insertions(+)

>>

>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c

>> index b9df141..f70ef3b 100644

>> --- a/drivers/iommu/iommu.c

>> +++ b/drivers/iommu/iommu.c

>> @@ -1073,6 +1073,8 @@ static struct iommu_domain

>> *__iommu_domain_alloc(struct bus_type *bus,

>>

>>       domain->ops  = bus->iommu_ops;

>>       domain->type = type;

>> +    spin_lock_init(&domain->reserved_lock);

>> +    domain->reserved_binding_list = RB_ROOT;

>>

>>       return domain;

>>   }

>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h

>> index b3e8c5b..60999db 100644

>> --- a/include/linux/iommu.h

>> +++ b/include/linux/iommu.h

>> @@ -24,6 +24,7 @@

>>   #include <linux/of.h>

>>   #include <linux/types.h>

>>   #include <linux/scatterlist.h>

>> +#include <linux/spinlock.h>

>>   #include <trace/events/iommu.h>

>>

>>   #define IOMMU_READ    (1 << 0)

>> @@ -83,6 +84,11 @@ struct iommu_domain {

>>       void *handler_token;

>>       struct iommu_domain_geometry geometry;

>>       void *iova_cookie;

>> +    void *reserved_iova_cookie;

> 

> Why exactly do we need this? From your description, it's for the user of

> the domain to keep track of IOVA allocations in, but then that's

> precisely what the iova_cookie exists for.


I was not sure whether both APIs could not be used concurrently, hence a
separate cookie. If we only consider MSI mapping use case I guess we are
either with a DMA domain or with a domain for VFIO and I would agree
with you, ie. we can reuse the same cookie.
> 

>> +    /* rb tree indexed by PA, for reserved bindings only */

>> +    struct rb_root reserved_binding_list;

> 

> Nit: that's more puzzling than helpful - "reserved binding" is

> particularly vague and nondescript, and makes me think of anything but

> MSI descriptors.

my heart is torn between advised genericity and MSI use case. My natural
short-sighted inclination would head me for an MSI mapping dedicated API
but I am following advices. As discussed with Alex there are
implementation details pretty related to MSI problematics I think (the
fact we store the "bindings" in an rb-tree/list, locking)

If Marc & Alex I can retarget this API to be less generic.

 Plus it's called a list but isn't a list (that said,
> given that we'd typically only expect a handful of entries, and lookups

> are hardly going to be a performance-critical bottleneck, would a simple

> list not suffice?)

I fully agree on that point. An rb-tree is overkill today for MSI use
case. Again if we were to use this API for anything else, this may
change the decision. But sure we can refactor afterwards upon needs. TBH
the rb-tree is inherited from vfio_iommu_type1 dma tree where that code
was originally located.
> 

>> +    /* protects reserved cookie and rbtree manipulation */

>> +    spinlock_t reserved_lock;

> 

> A cookie is an opaque structure, so any locking it needs would normally

> be hidden within. If on the other hand it's not meant to be opaque at

> this level, then it should probably be something more specific than a

> void * (if at all, as above).

agreed

Thanks

Eric
> 

> Robin.

> 

>>   };

>>

>>   enum iommu_cap {

>>

>
Auger Eric April 22, 2016, 1:02 p.m. | #2
Hi Robin,
On 04/22/2016 02:36 PM, Robin Murphy wrote:
> On 20/04/16 17:14, Eric Auger wrote:

>> Hi Robin,

>> On 04/20/2016 02:55 PM, Robin Murphy wrote:

>>> On 19/04/16 17:56, Eric Auger wrote:

>>>> This patch introduces some new fields in the iommu_domain struct,

>>>> dedicated to reserved iova management.

>>>>

>>>> In a similar way as DMA mapping IOVA window, we need to store

>>>> information related to a reserved IOVA window.

>>>>

>>>> The reserved_iova_cookie will store the reserved iova_domain

>>>> handle. An RB tree indexed by physical address is introduced to

>>>> store the host physical addresses bound to reserved IOVAs.

>>>>

>>>> Those physical addresses will correspond to MSI frame base

>>>> addresses, also referred to as doorbells. Their number should be

>>>> quite limited per domain.

>>>>

>>>> Also a spin_lock is introduced to protect accesses to the iova_domain

>>>> and RB tree. The choice of a spin_lock is driven by the fact the RB

>>>> tree will need to be accessed in MSI controller code not allowed to

>>>> sleep.

>>>>

>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>

>>>>

>>>> ---

>>>> v5 -> v6:

>>>> - initialize reserved_binding_list

>>>> - use a spinlock instead of a mutex

>>>> ---

>>>>    drivers/iommu/iommu.c | 2 ++

>>>>    include/linux/iommu.h | 6 ++++++

>>>>    2 files changed, 8 insertions(+)

>>>>

>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c

>>>> index b9df141..f70ef3b 100644

>>>> --- a/drivers/iommu/iommu.c

>>>> +++ b/drivers/iommu/iommu.c

>>>> @@ -1073,6 +1073,8 @@ static struct iommu_domain

>>>> *__iommu_domain_alloc(struct bus_type *bus,

>>>>

>>>>        domain->ops  = bus->iommu_ops;

>>>>        domain->type = type;

>>>> +    spin_lock_init(&domain->reserved_lock);

>>>> +    domain->reserved_binding_list = RB_ROOT;

>>>>

>>>>        return domain;

>>>>    }

>>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h

>>>> index b3e8c5b..60999db 100644

>>>> --- a/include/linux/iommu.h

>>>> +++ b/include/linux/iommu.h

>>>> @@ -24,6 +24,7 @@

>>>>    #include <linux/of.h>

>>>>    #include <linux/types.h>

>>>>    #include <linux/scatterlist.h>

>>>> +#include <linux/spinlock.h>

>>>>    #include <trace/events/iommu.h>

>>>>

>>>>    #define IOMMU_READ    (1 << 0)

>>>> @@ -83,6 +84,11 @@ struct iommu_domain {

>>>>        void *handler_token;

>>>>        struct iommu_domain_geometry geometry;

>>>>        void *iova_cookie;

>>>> +    void *reserved_iova_cookie;

>>>

>>> Why exactly do we need this? From your description, it's for the user of

>>> the domain to keep track of IOVA allocations in, but then that's

>>> precisely what the iova_cookie exists for.

>>

>> I was not sure whether both APIs could not be used concurrently, hence a

>> separate cookie. If we only consider MSI mapping use case I guess we are

>> either with a DMA domain or with a domain for VFIO and I would agree

>> with you, ie. we can reuse the same cookie.

> 

> Unless somebody cooks up some paravirtualised monstrosity where the

> guest driver somehow uses the host kernel's DMA mapping ops (thankfully,

> I'm not sure how that would even be possible), then they should always

> be mutually exclusive.


OK thanks
> 

> (That said, I should probably add a sanity check to

> iommu_dma_put_cookie() to ensure it only touches the cookies of

> IOMMU_DOMAIN_DMA domains...)

> 

>>>> +    /* rb tree indexed by PA, for reserved bindings only */

>>>> +    struct rb_root reserved_binding_list;

>>>

>>> Nit: that's more puzzling than helpful - "reserved binding" is

>>> particularly vague and nondescript, and makes me think of anything but

>>> MSI descriptors.

>> my heart is torn between advised genericity and MSI use case. My natural

>> short-sighted inclination would head me for an MSI mapping dedicated API

>> but I am following advices. As discussed with Alex there are

>> implementation details pretty related to MSI problematics I think (the

>> fact we store the "bindings" in an rb-tree/list, locking)

>>

>> If Marc & Alex I can retarget this API to be less generic.

>>

>>   Plus it's called a list but isn't a list (that said,

>>> given that we'd typically only expect a handful of entries, and lookups

>>> are hardly going to be a performance-critical bottleneck, would a simple

>>> list not suffice?)

>> I fully agree on that point. An rb-tree is overkill today for MSI use

>> case. Again if we were to use this API for anything else, this may

>> change the decision. But sure we can refactor afterwards upon needs. TBH

>> the rb-tree is inherited from vfio_iommu_type1 dma tree where that code

>> was originally located.

> 

> Thinking some more, how feasible would it be to handle the IOVA

> management aspect within the existing tree, i.e. extend struct vfio_dma

> so an entry can represent different types of thing - DMA pages, MSI

> pages, arbitrary reservations - and link to more implementation-specific

> data (e.g. a refcounted MSI descriptor stored elsewhere in the domain)

> as necessary?

it is feasible and  was approximately done there at the early ages of
the series:
https://lkml.org/lkml/2016/1/28/803 &
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016607.html

Then with the intent of doing something reusable the trend was to put it
in the iommu instead of vfio_iommu_typeX.

I am currently trying to make the "msi-iommu api" implemented upon the
dma-iommu api based on the simplifications that we discussed:
- reuse iova_cookie for iova domain
- add an opaque msi_cookie that hides the msi doorbell list + its spinlock
- simplify locking by making sure the msi domain cannot disappear before
the iommu domain destruction

If you agree I would suggest to wait and see the outcome of this new
design and make a shared decision based on that code? Should be
available next week.

Best Regards

Eric

> 

> Robin.

> 

>>>

>>>> +    /* protects reserved cookie and rbtree manipulation */

>>>> +    spinlock_t reserved_lock;

>>>

>>> A cookie is an opaque structure, so any locking it needs would normally

>>> be hidden within. If on the other hand it's not meant to be opaque at

>>> this level, then it should probably be something more specific than a

>>> void * (if at all, as above).

>> agreed

>>

>> Thanks

>>

>> Eric

>>>

>>> Robin.

>>>

>>>>    };

>>>>

>>>>    enum iommu_cap {

>>>>

>>>

>>

>
Auger Eric April 22, 2016, 2:53 p.m. | #3
Hi Robin,
On 04/22/2016 03:02 PM, Eric Auger wrote:
> Hi Robin,

> On 04/22/2016 02:36 PM, Robin Murphy wrote:

>> On 20/04/16 17:14, Eric Auger wrote:

>>> Hi Robin,

>>> On 04/20/2016 02:55 PM, Robin Murphy wrote:

>>>> On 19/04/16 17:56, Eric Auger wrote:

>>>>> This patch introduces some new fields in the iommu_domain struct,

>>>>> dedicated to reserved iova management.

>>>>>

>>>>> In a similar way as DMA mapping IOVA window, we need to store

>>>>> information related to a reserved IOVA window.

>>>>>

>>>>> The reserved_iova_cookie will store the reserved iova_domain

>>>>> handle. An RB tree indexed by physical address is introduced to

>>>>> store the host physical addresses bound to reserved IOVAs.

>>>>>

>>>>> Those physical addresses will correspond to MSI frame base

>>>>> addresses, also referred to as doorbells. Their number should be

>>>>> quite limited per domain.

>>>>>

>>>>> Also a spin_lock is introduced to protect accesses to the iova_domain

>>>>> and RB tree. The choice of a spin_lock is driven by the fact the RB

>>>>> tree will need to be accessed in MSI controller code not allowed to

>>>>> sleep.

>>>>>

>>>>> Signed-off-by: Eric Auger <eric.auger@linaro.org>

>>>>>

>>>>> ---

>>>>> v5 -> v6:

>>>>> - initialize reserved_binding_list

>>>>> - use a spinlock instead of a mutex

>>>>> ---

>>>>>    drivers/iommu/iommu.c | 2 ++

>>>>>    include/linux/iommu.h | 6 ++++++

>>>>>    2 files changed, 8 insertions(+)

>>>>>

>>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c

>>>>> index b9df141..f70ef3b 100644

>>>>> --- a/drivers/iommu/iommu.c

>>>>> +++ b/drivers/iommu/iommu.c

>>>>> @@ -1073,6 +1073,8 @@ static struct iommu_domain

>>>>> *__iommu_domain_alloc(struct bus_type *bus,

>>>>>

>>>>>        domain->ops  = bus->iommu_ops;

>>>>>        domain->type = type;

>>>>> +    spin_lock_init(&domain->reserved_lock);

>>>>> +    domain->reserved_binding_list = RB_ROOT;

>>>>>

>>>>>        return domain;

>>>>>    }

>>>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h

>>>>> index b3e8c5b..60999db 100644

>>>>> --- a/include/linux/iommu.h

>>>>> +++ b/include/linux/iommu.h

>>>>> @@ -24,6 +24,7 @@

>>>>>    #include <linux/of.h>

>>>>>    #include <linux/types.h>

>>>>>    #include <linux/scatterlist.h>

>>>>> +#include <linux/spinlock.h>

>>>>>    #include <trace/events/iommu.h>

>>>>>

>>>>>    #define IOMMU_READ    (1 << 0)

>>>>> @@ -83,6 +84,11 @@ struct iommu_domain {

>>>>>        void *handler_token;

>>>>>        struct iommu_domain_geometry geometry;

>>>>>        void *iova_cookie;

>>>>> +    void *reserved_iova_cookie;

>>>>

>>>> Why exactly do we need this? From your description, it's for the user of

>>>> the domain to keep track of IOVA allocations in, but then that's

>>>> precisely what the iova_cookie exists for.

>>>

>>> I was not sure whether both APIs could not be used concurrently, hence a

>>> separate cookie. If we only consider MSI mapping use case I guess we are

>>> either with a DMA domain or with a domain for VFIO and I would agree

>>> with you, ie. we can reuse the same cookie.

>>

>> Unless somebody cooks up some paravirtualised monstrosity where the

>> guest driver somehow uses the host kernel's DMA mapping ops (thankfully,

>> I'm not sure how that would even be possible), then they should always

>> be mutually exclusive.

> 

> OK thanks

>>

>> (That said, I should probably add a sanity check to

>> iommu_dma_put_cookie() to ensure it only touches the cookies of

>> IOMMU_DOMAIN_DMA domains...)

>>

>>>>> +    /* rb tree indexed by PA, for reserved bindings only */

>>>>> +    struct rb_root reserved_binding_list;

>>>>

>>>> Nit: that's more puzzling than helpful - "reserved binding" is

>>>> particularly vague and nondescript, and makes me think of anything but

>>>> MSI descriptors.

>>> my heart is torn between advised genericity and MSI use case. My natural

>>> short-sighted inclination would head me for an MSI mapping dedicated API

>>> but I am following advices. As discussed with Alex there are

>>> implementation details pretty related to MSI problematics I think (the

>>> fact we store the "bindings" in an rb-tree/list, locking)

>>>

>>> If Marc & Alex I can retarget this API to be less generic.

>>>

>>>   Plus it's called a list but isn't a list (that said,

>>>> given that we'd typically only expect a handful of entries, and lookups

>>>> are hardly going to be a performance-critical bottleneck, would a simple

>>>> list not suffice?)

>>> I fully agree on that point. An rb-tree is overkill today for MSI use

>>> case. Again if we were to use this API for anything else, this may

>>> change the decision. But sure we can refactor afterwards upon needs. TBH

>>> the rb-tree is inherited from vfio_iommu_type1 dma tree where that code

>>> was originally located.

>>

>> Thinking some more, how feasible would it be to handle the IOVA

>> management aspect within the existing tree, i.e. extend struct vfio_dma

>> so an entry can represent different types of thing - DMA pages, MSI

>> pages, arbitrary reservations - and link to more implementation-specific

>> data (e.g. a refcounted MSI descriptor stored elsewhere in the domain)

>> as necessary?

> it is feasible and  was approximately done there at the early ages of

> the series:

> https://lkml.org/lkml/2016/1/28/803 &

> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016607.html


Forgot to mention that locking mechanism currently used in
vfio_iommu_type1 is based on mutex. Hence it is not compatible with MSI
layer requirements where msi_domain_(de)activate and msi_set_affinity
must be atomic.

in msi case I think an rcu list might be quite appropriate.

Best Regards

Eric
> 

> Then with the intent of doing something reusable the trend was to put it

> in the iommu instead of vfio_iommu_typeX.

> 

> I am currently trying to make the "msi-iommu api" implemented upon the

> dma-iommu api based on the simplifications that we discussed:

> - reuse iova_cookie for iova domain

> - add an opaque msi_cookie that hides the msi doorbell list + its spinlock

> - simplify locking by making sure the msi domain cannot disappear before

> the iommu domain destruction

> 

> If you agree I would suggest to wait and see the outcome of this new

> design and make a shared decision based on that code? Should be

> available next week.

> 

> Best Regards

> 

> Eric

> 

>>

>> Robin.

>>

>>>>

>>>>> +    /* protects reserved cookie and rbtree manipulation */

>>>>> +    spinlock_t reserved_lock;

>>>>

>>>> A cookie is an opaque structure, so any locking it needs would normally

>>>> be hidden within. If on the other hand it's not meant to be opaque at

>>>> this level, then it should probably be something more specific than a

>>>> void * (if at all, as above).

>>> agreed

>>>

>>> Thanks

>>>

>>> Eric

>>>>

>>>> Robin.

>>>>

>>>>>    };

>>>>>

>>>>>    enum iommu_cap {

>>>>>

>>>>

>>>

>>

>

Patch

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b9df141..f70ef3b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1073,6 +1073,8 @@  static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
 
 	domain->ops  = bus->iommu_ops;
 	domain->type = type;
+	spin_lock_init(&domain->reserved_lock);
+	domain->reserved_binding_list = RB_ROOT;
 
 	return domain;
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b3e8c5b..60999db 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -24,6 +24,7 @@ 
 #include <linux/of.h>
 #include <linux/types.h>
 #include <linux/scatterlist.h>
+#include <linux/spinlock.h>
 #include <trace/events/iommu.h>
 
 #define IOMMU_READ	(1 << 0)
@@ -83,6 +84,11 @@  struct iommu_domain {
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
 	void *iova_cookie;
+	void *reserved_iova_cookie;
+	/* rb tree indexed by PA, for reserved bindings only */
+	struct rb_root reserved_binding_list;
+	/* protects reserved cookie and rbtree manipulation */
+	spinlock_t reserved_lock;
 };
 
 enum iommu_cap {