
[Xen-devel,v6,6/8] xen/arm: introduce GNTTABOP_cache_flush

Message ID 1413470755-30991-6-git-send-email-stefano.stabellini@eu.citrix.com
State New

Commit Message

Stefano Stabellini Oct. 16, 2014, 2:45 p.m. UTC
Introduce a new hypercall to perform a cache maintenance operation on
behalf of the guest. The argument is a machine address and a size. The
implementation checks that the memory range is owned by the guest or the
guest has been granted access to it by another domain.

Introduce grant_map_exists: an internal grant table function to check
whether an mfn has been granted to a given domain on a target grant
table.

As grant_map_exists loops over all the guest grant table entries, limit
DEFAULT_MAX_NR_GRANT_FRAMES to 10 to cap the loop to 5000 iterations
max. Warn if the user sets max_grant_frames higher than 10.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---

Changes in v6:
- set DEFAULT_MAX_NR_GRANT_FRAMES to 10;
- warn if max_grant_frames > 10.

Changes in v5:
- make mfn unsigned long;
- remove unhelpful error message;
- handle errors returned by cache maintenance functions.

Changes in v4:
- ASSERT(spin_is_locked);
- return instead of break in grant_map_exists;
- pass a pointer to __gnttab_cache_flush;
- code style;
- unsigned int iterator in gnttab_cache_flush;
- return ret instead of -ret;
- cflush.offset >= PAGE_SIZE return -EINVAL.

Changes in v3:
- reduce the time the grant_table lock is held;
- fix warning message;
- s/EFAULT/EPERM;
- s/llx/PRIx64;
- check offset and size independently before checking their sum;
- rcu_lock_current_domain cannot fail;
- s/ENOSYS/EOPNOTSUPP;
- use clean_and_invalidate_xen_dcache_va_range to do both operations at
once;
- fold grant_map_exists in this patch;
- support "count" argument;
- make correspondent changes to compat/grant_table.c;
- introduce GNTTAB_CACHE_SOURCE_GREF to select the type of input in the
union;
- rename size field to length;
- make length and offset uint16_t;
- only take spin_lock if d != owner.

Changes in v2:
- do not check for mfn_to_page errors;
- take a reference to the page;
- replace printk with gdprintk;
- split long line;
- remove out label;
- move rcu_lock_current_domain down before the loop.
- move the hypercall to GNTTABOP;
- take a spin_lock before calling grant_map_exists.
---
 xen/common/compat/grant_table.c  |    8 +++
 xen/common/grant_table.c         |  137 ++++++++++++++++++++++++++++++++++++++
 xen/include/public/grant_table.h |   20 ++++++
 xen/include/xen/grant_table.h    |    2 +-
 xen/include/xlat.lst             |    1 +
 5 files changed, 167 insertions(+), 1 deletion(-)

Comments

David Vrabel Oct. 16, 2014, 5:03 p.m. UTC | #1
On 16/10/14 15:45, Stefano Stabellini wrote:
> Introduce a new hypercall to perform cache maintenance operation on
> behalf of the guest. The argument is a machine address and a size. The
> implementation checks that the memory range is owned by the guest or the
> guest has been granted access to it by another domain.
> 
> Introduce grant_map_exists: an internal grant table function to check
> whether an mfn has been granted to a given domain on a target grant
> table.
> 
> As grant_map_exists loops over all the guest grant table entries, limit
> DEFAULT_MAX_NR_GRANT_FRAMES to 10 to cap the loop to 5000 iterations
> max. Warn if the user sets max_grant_frames higher than 10.

No.  This is much too low.

A netfront with 4 queues wants 4 * 2 * 256 = 2048 grant references. So
this limit would only allow for two VIFs which is completely unacceptable.

blkfront would be similarly constrained.

I think you're going to have to add continuations somehow or you are
going to have to abandon this approach and use the SWIOTLB in the guest.

David
Stefano Stabellini Oct. 16, 2014, 5:29 p.m. UTC | #2
On Thu, 16 Oct 2014, David Vrabel wrote:
> On 16/10/14 15:45, Stefano Stabellini wrote:
> > Introduce a new hypercall to perform cache maintenance operation on
> > behalf of the guest. The argument is a machine address and a size. The
> > implementation checks that the memory range is owned by the guest or the
> > guest has been granted access to it by another domain.
> > 
> > Introduce grant_map_exists: an internal grant table function to check
> > whether an mfn has been granted to a given domain on a target grant
> > table.
> > 
> > As grant_map_exists loops over all the guest grant table entries, limit
> > DEFAULT_MAX_NR_GRANT_FRAMES to 10 to cap the loop to 5000 iterations
> > max. Warn if the user sets max_grant_frames higher than 10.
> 
> No.  This is much too low.
> 
> A netfront with 4 queues wants 4 * 2 * 256 = 2048 grant references. So
> this limit would only allow for two VIFs which is completely unacceptable.
> 
> blkfront would be similarly constrained.

10 is too low, that is a good point, thanks!


> I think you're going to have to add continuations somehow or you are
> going to have abandon this approach and use the SWIOTLB in the guest.

Actually the latest version already supports continuations (even though
I admit I didn't explicitly test it).

In any case the default max (32 frames) means 16000 iterations, that
would take around 32000ns on a not very modern system. I don't think it is
that bad. And of course it is just the theoretical worst case.

Maybe we should leave the default max as is?
David Vrabel Oct. 16, 2014, 5:47 p.m. UTC | #3
On 16/10/14 18:29, Stefano Stabellini wrote:
> On Thu, 16 Oct 2014, David Vrabel wrote:
>> On 16/10/14 15:45, Stefano Stabellini wrote:
>>> Introduce a new hypercall to perform cache maintenance operation on
>>> behalf of the guest. The argument is a machine address and a size. The
>>> implementation checks that the memory range is owned by the guest or the
>>> guest has been granted access to it by another domain.
>>>
>>> Introduce grant_map_exists: an internal grant table function to check
>>> whether an mfn has been granted to a given domain on a target grant
>>> table.
>>>
>>> As grant_map_exists loops over all the guest grant table entries, limit
>>> DEFAULT_MAX_NR_GRANT_FRAMES to 10 to cap the loop to 5000 iterations
>>> max. Warn if the user sets max_grant_frames higher than 10.
>>
>> No.  This is much too low.
>>
>> A netfront with 4 queues wants 4 * 2 * 256 = 2048 grant references. So
>> this limit would only allow for two VIFs which is completely unacceptable.
>>
>> blkfront would be similarly constrained.
> 
> 10 is too low, that is a good point, thanks!
> 
> 
>> I think you're going to have to add continuations somehow or you are
>> going to have abandon this approach and use the SWIOTLB in the guest.
> 
> Actually the latest version already supports continuations (even though
> I admit I didn't explicitly test it).

I meant pre-empting the hypercall when it is in the middle of iterating
through the grant table.  I think it would be safe to drop the grant
table lock and resume the scan part way through since the relevant grant
should not change during the cache flush call (if it does then the flush
will at worst safely fail).

> In any case the default max (32 frames) means 16000 iterations, that
> would take around 32000ns on a not very modern system. I don't think is
> that bad. And of course it is just the theoretical worst case.
> 
> Maybe we should leave the default max as is?

At a bare minimum, yes.  But I really don't think we should be /adding/
hard scalability limits to Xen -- we should be removing them!

David
Jan Beulich Oct. 17, 2014, 9:44 a.m. UTC | #4
>>> On 16.10.14 at 19:29, <stefano.stabellini@eu.citrix.com> wrote:
> In any case the default max (32 frames) means 16000 iterations, that
> would take around 32000ns on a not very modern system. I don't think is
> that bad. And of course it is just the theoretical worst case.

2ns per iteration when there are 3 conditionals in the loop body
seems rather optimistic to me.

Jan
Stefano Stabellini Oct. 17, 2014, 10:01 a.m. UTC | #5
On Thu, 16 Oct 2014, David Vrabel wrote:
> On 16/10/14 18:29, Stefano Stabellini wrote:
> > On Thu, 16 Oct 2014, David Vrabel wrote:
> >> On 16/10/14 15:45, Stefano Stabellini wrote:
> >>> Introduce a new hypercall to perform cache maintenance operation on
> >>> behalf of the guest. The argument is a machine address and a size. The
> >>> implementation checks that the memory range is owned by the guest or the
> >>> guest has been granted access to it by another domain.
> >>>
> >>> Introduce grant_map_exists: an internal grant table function to check
> >>> whether an mfn has been granted to a given domain on a target grant
> >>> table.
> >>>
> >>> As grant_map_exists loops over all the guest grant table entries, limit
> >>> DEFAULT_MAX_NR_GRANT_FRAMES to 10 to cap the loop to 5000 iterations
> >>> max. Warn if the user sets max_grant_frames higher than 10.
> >>
> >> No.  This is much too low.
> >>
> >> A netfront with 4 queues wants 4 * 2 * 256 = 2048 grant references. So
> >> this limit would only allow for two VIFs which is completely unacceptable.
> >>
> >> blkfront would be similarly constrained.
> > 
> > 10 is too low, that is a good point, thanks!
> > 
> > 
> >> I think you're going to have to add continuations somehow or you are
> >> going to have abandon this approach and use the SWIOTLB in the guest.
> > 
> > Actually the latest version already supports continuations (even though
> > I admit I didn't explicitly test it).
> 
> I meant pre-empting the hypercall when it is in the middle of iterating
> through the grant table.  I think it would be safe to drop the grant
> table lock and resume the scan part way through since the relevant grant
> should not change during the cache flush call (if it does then the flush
> wiil at worst safely fail).

Dropping the lock and taking it again every 1000 iterations is doable
and beneficial.
However a real continuation is difficult because we already have a
continuation on the count hypercall argument.
Jan Beulich Oct. 17, 2014, 10:23 a.m. UTC | #6
>>> On 17.10.14 at 12:01, <stefano.stabellini@eu.citrix.com> wrote:
> On Thu, 16 Oct 2014, David Vrabel wrote:
>> On 16/10/14 18:29, Stefano Stabellini wrote:
>> > On Thu, 16 Oct 2014, David Vrabel wrote:
>> >> On 16/10/14 15:45, Stefano Stabellini wrote:
>> >>> Introduce a new hypercall to perform cache maintenance operation on
>> >>> behalf of the guest. The argument is a machine address and a size. The
>> >>> implementation checks that the memory range is owned by the guest or the
>> >>> guest has been granted access to it by another domain.
>> >>>
>> >>> Introduce grant_map_exists: an internal grant table function to check
>> >>> whether an mfn has been granted to a given domain on a target grant
>> >>> table.
>> >>>
>> >>> As grant_map_exists loops over all the guest grant table entries, limit
>> >>> DEFAULT_MAX_NR_GRANT_FRAMES to 10 to cap the loop to 5000 iterations
>> >>> max. Warn if the user sets max_grant_frames higher than 10.
>> >>
>> >> No.  This is much too low.
>> >>
>> >> A netfront with 4 queues wants 4 * 2 * 256 = 2048 grant references. So
>> >> this limit would only allow for two VIFs which is completely unacceptable.
>> >>
>> >> blkfront would be similarly constrained.
>> > 
>> > 10 is too low, that is a good point, thanks!
>> > 
>> > 
>> >> I think you're going to have to add continuations somehow or you are
>> >> going to have abandon this approach and use the SWIOTLB in the guest.
>> > 
>> > Actually the latest version already supports continuations (even though
>> > I admit I didn't explicitly test it).
>> 
>> I meant pre-empting the hypercall when it is in the middle of iterating
>> through the grant table.  I think it would be safe to drop the grant
>> table lock and resume the scan part way through since the relevant grant
>> should not change during the cache flush call (if it does then the flush
>> wiil at worst safely fail).
> 
> Dropping the lock and taking it again every 1000 iterations is doable
> and beneficial.
> However a real continuation is difficult because we already have a
> continuation on the count hypercall argument.

We've got the upper bits of cmd left to do further encoding of
continuations. If you split every 1024 iterations and used the top
16 bits of cmd, that would leave ample room for future GNTTABOP-s
and still allow handling table sizes up to 2^26 entries. We could
even deal with the full 2^32 range (removing the need to
enforce some upper limit) if we reserved only 10 bits for the actual
cmd encoding, which should still be plenty.

Jan

Patch

diff --git a/xen/common/compat/grant_table.c b/xen/common/compat/grant_table.c
index 2dc1e44..5130c35 100644
--- a/xen/common/compat/grant_table.c
+++ b/xen/common/compat/grant_table.c
@@ -51,6 +51,10 @@  CHECK_gnttab_get_version;
 CHECK_gnttab_swap_grant_ref;
 #undef xen_gnttab_swap_grant_ref
 
+#define xen_gnttab_cache_flush gnttab_cache_flush
+CHECK_gnttab_cache_flush;
+#undef xen_gnttab_cache_flush
+
 int compat_grant_table_op(unsigned int cmd,
                           XEN_GUEST_HANDLE_PARAM(void) cmp_uop,
                           unsigned int count)
@@ -106,6 +110,10 @@  int compat_grant_table_op(unsigned int cmd,
     CASE(swap_grant_ref);
 #endif
 
+#ifndef CHECK_gnttab_cache_flush
+    CASE(cache_flush);
+#endif
+
 #undef CASE
     default:
         return do_grant_table_op(cmd, cmp_uop, count);
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 345be82..2124444 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -489,6 +489,34 @@  static int _set_status(unsigned gt_version,
         return _set_status_v2(domid, readonly, mapflag, shah, act, status);
 }
 
+static bool_t grant_map_exists(const struct domain *ld,
+                        struct grant_table *rgt,
+                        unsigned long mfn)
+{
+    const struct active_grant_entry *act;
+    grant_ref_t ref;
+
+    ASSERT(spin_is_locked(&rgt->lock));
+
+    for ( ref = 0; ref != nr_grant_entries(rgt); ref++ )
+    {
+        act = &active_entry(rgt, ref);
+
+        if ( !act->pin )
+            continue;
+
+        if ( act->domid != ld->domain_id )
+            continue;
+
+        if ( act->frame != mfn )
+            continue;
+
+        return 1;
+    }
+
+    return 0;
+}
+
 static void mapcount(
     struct grant_table *lgt, struct domain *rd, unsigned long mfn,
     unsigned int *wrc, unsigned int *rdc)
@@ -2487,6 +2515,97 @@  gnttab_swap_grant_ref(XEN_GUEST_HANDLE_PARAM(gnttab_swap_grant_ref_t) uop,
     return 0;
 }
 
+static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush)
+{
+    struct domain *d, *owner;
+    struct page_info *page;
+    unsigned long mfn;
+    void *v;
+    int ret = 0;
+
+    if ( (cflush->offset >= PAGE_SIZE) ||
+         (cflush->length > PAGE_SIZE) ||
+         (cflush->offset + cflush->length > PAGE_SIZE) )
+        return -EINVAL;
+
+    if ( cflush->length == 0 || cflush->op == 0 )
+        return 0;
+
+    /* currently unimplemented */
+    if ( cflush->op & GNTTAB_CACHE_SOURCE_GREF )
+        return -EOPNOTSUPP;
+
+    d = rcu_lock_current_domain();
+    mfn = cflush->a.dev_bus_addr >> PAGE_SHIFT;
+
+    if ( !mfn_valid(mfn) )
+    {
+        rcu_unlock_domain(d);
+        return -EINVAL;
+    }
+
+    page = mfn_to_page(mfn);
+    owner = page_get_owner_and_reference(page);
+    if ( !owner )
+    {
+        rcu_unlock_domain(d);
+        return -EPERM;
+    }
+
+    if ( d != owner )
+    {
+        spin_lock(&owner->grant_table->lock);
+
+        if ( !grant_map_exists(d, owner->grant_table, mfn) )
+        {
+            spin_unlock(&owner->grant_table->lock);
+            rcu_unlock_domain(d);
+            put_page(page);
+            return -EINVAL;
+        }
+    }
+
+    v = map_domain_page(mfn);
+    v += cflush->offset;
+
+    if ( (cflush->op & GNTTAB_CACHE_INVAL) && (cflush->op & GNTTAB_CACHE_CLEAN) )
+        ret = clean_and_invalidate_dcache_va_range(v, cflush->length);
+    else if ( cflush->op & GNTTAB_CACHE_INVAL )
+        ret = invalidate_dcache_va_range(v, cflush->length);
+    else if ( cflush->op & GNTTAB_CACHE_CLEAN )
+        ret = clean_dcache_va_range(v, cflush->length);
+
+    if ( d != owner )
+        spin_unlock(&owner->grant_table->lock);
+    unmap_domain_page(v);
+    put_page(page);
+
+    return ret;
+}
+
+static long
+gnttab_cache_flush(XEN_GUEST_HANDLE_PARAM(gnttab_cache_flush_t) uop,
+                      unsigned int count)
+{
+    int ret;
+    unsigned int i;
+    gnttab_cache_flush_t op;
+
+    for ( i = 0; i < count; i++ )
+    {
+        if ( i && hypercall_preempt_check() )
+            return i;
+        if ( unlikely(__copy_from_guest(&op, uop, 1)) )
+            return -EFAULT;
+        ret = __gnttab_cache_flush(&op);
+        if ( ret < 0 )
+            return ret;
+        guest_handle_add_offset(uop, 1);
+    }
+    return 0;
+}
+
+
 long
 do_grant_table_op(
     unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) uop, unsigned int count)
@@ -2616,6 +2735,20 @@  do_grant_table_op(
         }
         break;
     }
+    case GNTTABOP_cache_flush:
+    {
+        XEN_GUEST_HANDLE_PARAM(gnttab_cache_flush_t) cflush =
+            guest_handle_cast(uop, gnttab_cache_flush_t);
+        if ( unlikely(!guest_handle_okay(cflush, count)) )
+            goto out;
+        rc = gnttab_cache_flush(cflush, count);
+        if ( rc > 0 )
+        {
+            guest_handle_add_offset(cflush, rc);
+            uop = guest_handle_cast(cflush, void);
+        }
+        break;
+    }
     default:
         rc = -ENOSYS;
         break;
@@ -2953,6 +3086,10 @@  static int __init gnttab_usage_init(void)
     if ( !max_maptrack_frames )
         max_maptrack_frames = MAX_MAPTRACK_TO_GRANTS_RATIO * max_grant_frames;
 
+    if ( max_grant_frames > DEFAULT_MAX_NR_GRANT_FRAMES )
+        printk(XENLOG_WARNING "A high gnttab_max_frames can seriously slow down the hypervisor.\n"
+                "It is recommended to set gnttab_max_frames to %d max.\n", DEFAULT_MAX_NR_GRANT_FRAMES);
+
     register_keyhandler('g', &gnttab_usage_print_all_keyhandler);
     return 0;
 }
diff --git a/xen/include/public/grant_table.h b/xen/include/public/grant_table.h
index b8a3d6c..20d4e77 100644
--- a/xen/include/public/grant_table.h
+++ b/xen/include/public/grant_table.h
@@ -309,6 +309,7 @@  typedef uint16_t grant_status_t;
 #define GNTTABOP_get_status_frames    9
 #define GNTTABOP_get_version          10
 #define GNTTABOP_swap_grant_ref	      11
+#define GNTTABOP_cache_flush	      12
 #endif /* __XEN_INTERFACE_VERSION__ */
 /* ` } */
 
@@ -574,6 +575,25 @@  struct gnttab_swap_grant_ref {
 typedef struct gnttab_swap_grant_ref gnttab_swap_grant_ref_t;
 DEFINE_XEN_GUEST_HANDLE(gnttab_swap_grant_ref_t);
 
+/*
+ * Issue one or more cache maintenance operations on a portion of a
+ * page granted to the calling domain by a foreign domain.
+ */
+struct gnttab_cache_flush {
+    union {
+        uint64_t dev_bus_addr;
+        grant_ref_t ref;
+    } a;
+    uint16_t offset; /* offset from start of grant */
+    uint16_t length; /* size within the grant */
+#define GNTTAB_CACHE_CLEAN          (1<<0)
+#define GNTTAB_CACHE_INVAL          (1<<1)
+#define GNTTAB_CACHE_SOURCE_GREF    (1<<31)
+    uint32_t op;
+};
+typedef struct gnttab_cache_flush gnttab_cache_flush_t;
+DEFINE_XEN_GUEST_HANDLE(gnttab_cache_flush_t);
+
 #endif /* __XEN_INTERFACE_VERSION__ */
 
 /*
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 9f3a1f3..77328cd 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -47,7 +47,7 @@ 
 
 #ifndef DEFAULT_MAX_NR_GRANT_FRAMES /* to allow arch to override */
 /* Default maximum size of a grant table. [POLICY] */
-#define DEFAULT_MAX_NR_GRANT_FRAMES   32
+#define DEFAULT_MAX_NR_GRANT_FRAMES   10
 #endif
 #ifndef max_nr_grant_frames /* to allow arch to override */
 /* The maximum size of a grant table. */
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 234b668..9ce9fee 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -51,6 +51,7 @@ 
 ?       grant_entry_header              grant_table.h
 ?	grant_entry_v2			grant_table.h
 ?	gnttab_swap_grant_ref		grant_table.h
+?	gnttab_cache_flush		grant_table.h
 ?	kexec_exec			kexec.h
 !	kexec_image			kexec.h
 !	kexec_range			kexec.h