diff mbox series

[PROTOTYPE,4/6] memory: Extend ram_block_discard_(require|disable) by two discard types

Message ID 20200924160423.106747-5-david@redhat.com
State New
Series virtio-mem: vfio support

Commit Message

David Hildenbrand Sept. 24, 2020, 4:04 p.m. UTC
We want to separate two cases:
- ballooning drivers do random discards on random guest memory (e.g.,
  virtio-balloon) - uncoordinated discards
- paravirtualized memory devices do discards at a well-known granularity,
  and always know which blocks are currently accessible or inaccessible to
  the guest - coordinated discards

This will be required to get virtio-mem + vfio running - vfio still
wants to block random memory ballooning.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 exec.c                | 109 ++++++++++++++++++++++++++++++++++--------
 include/exec/memory.h |  36 ++++++++++++--
 2 files changed, 121 insertions(+), 24 deletions(-)

Comments

Peter Xu Oct. 20, 2020, 7:17 p.m. UTC | #1
On Thu, Sep 24, 2020 at 06:04:21PM +0200, David Hildenbrand wrote:
> We want to separate two cases:
> - ballooning drivers do random discards on random guest memory (e.g.,
>   virtio-balloon) - uncoordinated discards
> - paravirtualized memory devices do discards at a well-known granularity,
>   and always know which blocks are currently accessible or inaccessible to
>   the guest - coordinated discards
> 
> This will be required to get virtio-mem + vfio running - vfio still
> wants to block random memory ballooning.
> 
> [...]
> 
> diff --git a/exec.c b/exec.c
> index e34b602bdf..83098e9230 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -4098,52 +4098,121 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root)
>   * If positive, discarding RAM is disabled. If negative, discarding RAM is
>   * required to work and cannot be disabled.
>   */
> -static int ram_block_discard_disabled;
> +static int uncoordinated_discard_disabled;
> +static int coordinated_discard_disabled;

Instead of duplicating the code, how about making it an array?

Btw, IIUC these flags do not need atomic operations at all, because all callers
should hold the BQL and are mostly called during machine start/reset.  Even if
not, I think we could also use a mutex, which might make things simpler.  No
strong opinion, though.
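Peter's array idea could be sketched roughly as follows (a hypothetical illustration, not the actual patch; the names `discard_type_disable`/`discard_type_require` are made up here, and plain integers replace the atomics under the assumption that all callers hold the BQL):

```c
#include <errno.h>
#include <stdbool.h>

/* One counter per discard type instead of two named globals.
 * Positive: discarding is disabled; negative: discarding is required. */
typedef enum RamDiscardType {
    RAM_DISCARD_UNCOORDINATED, /* e.g., virtio-balloon */
    RAM_DISCARD_COORDINATED,   /* e.g., virtio-mem */
    RAM_DISCARD_TYPE_MAX,
} RamDiscardType;

static int discard_disabled[RAM_DISCARD_TYPE_MAX];

int discard_type_disable(RamDiscardType type, bool state)
{
    int *counter = &discard_disabled[type];

    if (!state) {
        (*counter)--;
        return 0;
    }
    if (*counter < 0) {
        /* Someone already requires discards of this type to work. */
        return -EBUSY;
    }
    (*counter)++;
    return 0;
}

int discard_type_require(RamDiscardType type, bool state)
{
    int *counter = &discard_disabled[type];

    if (!state) {
        (*counter)++;
        return 0;
    }
    if (*counter > 0) {
        /* Someone already disabled discards of this type. */
        return -EBUSY;
    }
    (*counter)--;
    return 0;
}
```

A vfio-like user would call `discard_type_disable(RAM_DISCARD_UNCOORDINATED, true)` once at realize time and undo it on unrealize; the shared helper removes the duplication between the two named counters.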

-- 
Peter Xu
David Hildenbrand Oct. 20, 2020, 7:58 p.m. UTC | #2
On 20.10.20 21:17, Peter Xu wrote:
> On Thu, Sep 24, 2020 at 06:04:21PM +0200, David Hildenbrand wrote:
>> We want to separate two cases:
>> - ballooning drivers do random discards on random guest memory (e.g.,
>>   virtio-balloon) - uncoordinated discards
>> - paravirtualized memory devices do discards at a well-known granularity,
>>   and always know which blocks are currently accessible or inaccessible to
>>   the guest - coordinated discards
>>
>> [...]
> 
> Instead of duplicating the code, how about making it an array?
> 
> Btw, IIUC these flags do not need atomic operations at all, because all callers
> should hold the BQL and are mostly called during machine start/reset.  Even if
> not, I think we could also use a mutex, which might make things simpler.  No
> strong opinion, though.

I remember there were some !BQL users (but I might be confusing it with
postcopy code that once used to inhibit the balloon without BQL). Will
double-check. Simplifying it is certainly a good idea.

(we want to be able to check from virtio-balloon code repeatedly without
taking a mutex over and over again :) )

Thanks!

-- 
Thanks,

David / dhildenb
Peter Xu Oct. 20, 2020, 8:49 p.m. UTC | #3
On Tue, Oct 20, 2020 at 09:58:34PM +0200, David Hildenbrand wrote:
> I remember there were some !BQL users (but I might be confusing it with
> postcopy code that once used to inhibit the balloon without BQL). Will
> double-check. Simplifying it is certainly a good idea.
> 
> (we want to be able to check from virtio-balloon code repeatedly without
> taking a mutex over and over again :) )

Right.  Again, I've no strong opinion, so feel free to keep the code as you
wish.  However, if we go the other way (either BQL or another mutex), IMHO we
can simply read the flag directly without taking the mutex: the mutex only
needs to protect write concurrency, and that should be enough.  Considering
that this flag should rarely change and should never flip sign (e.g., positive
would never go negative, and vice versa), a READ_ONCE() would be safe enough
on the read side (e.g., balloon).
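The scheme described here - a mutex for the rare writers, a lock-free load for frequent readers such as virtio-balloon - might look like this in portable C11 (an illustrative sketch, not QEMU code; `atomic_load_explicit` with `memory_order_relaxed` plays the role the kernel's READ_ONCE() plays in the discussion):

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Writers serialize on a mutex; the hot read path takes no lock.  The flag
 * never flips sign (positive = disabled, negative = required), so a
 * tear-free relaxed load is all the read side needs. */
static pthread_mutex_t discard_lock = PTHREAD_MUTEX_INITIALIZER;
static _Atomic int discard_disabled;

int discard_disable(bool state)
{
    int ret = 0;

    pthread_mutex_lock(&discard_lock);
    if (!state) {
        atomic_fetch_sub_explicit(&discard_disabled, 1, memory_order_relaxed);
    } else if (atomic_load_explicit(&discard_disabled,
                                    memory_order_relaxed) < 0) {
        ret = -EBUSY; /* someone requires discards to work */
    } else {
        atomic_fetch_add_explicit(&discard_disabled, 1, memory_order_relaxed);
    }
    pthread_mutex_unlock(&discard_lock);
    return ret;
}

/* Lock-free read side, safe to call repeatedly from e.g. virtio-balloon. */
bool discard_is_disabled(void)
{
    return atomic_load_explicit(&discard_disabled, memory_order_relaxed) > 0;
}
```

With the mutex guarding all writes, the cmpxchg() loop from the patch becomes a plain check-then-increment, as Peter suggests.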

Thanks,

-- 
Peter Xu
Peter Xu Oct. 20, 2020, 9:30 p.m. UTC | #4
On Tue, Oct 20, 2020 at 04:49:29PM -0400, Peter Xu wrote:
> On Tue, Oct 20, 2020 at 09:58:34PM +0200, David Hildenbrand wrote:
> > I remember there were some !BQL users (but I might be confusing it with
> > postcopy code that once used to inhibit the balloon without BQL). Will
> > double-check. Simplifying it is certainly a good idea.
> > 
> > (we want to be able to check from virtio-balloon code repeatedly without
> > taking a mutex over and over again :) )
> 
> Right.  Again, I've no strong opinion, so feel free to keep the code as you
> wish.  However, if we go the other way (either BQL or another mutex), IMHO we
> can simply read the flag directly without taking the mutex: the mutex only
> needs to protect write concurrency, and that should be enough.  Considering
> that this flag should rarely change and should never flip sign (e.g., positive
> would never go negative, and vice versa), a READ_ONCE() would be safe enough
> on the read side (e.g., balloon).


Btw, what we've discussed is all about serialization of the flag.  I'm also
thinking about whether we can make the flag clearer on what it means.  Frankly
speaking, at first glance it confused me quite a bit.

IMHO what we may want is not a complicated signed counter of "disablement",
but two simple counters: "who provides the ability to discard pages", and
"who won't work if we _might_ discard pages".  I'm thinking, what if we split
the "disable" counter into two:

  - ram_discard_providers ("who allows ram discard"): balloon, virtio-mem

  - ram_discard_opposers ("who forbids ram discard"): vfio, rdma, ...

The major benefit is that the counters would always be >= 0, as a usual
counter should be.  Each device/caller would only ever increase its own
counter.  We also wouldn't need the cmpxchg() loops, since the check becomes
easier - just read the other side's counter to know whether we should fail
now.

So after this patch introduces "coordinated discards", the counters would grow
into four (we can still define some arrays):

        |---------------------------+------------+-----------|
        | counters                  | provider   | opposer   |
        |---------------------------+------------+-----------|
        | RAM_DISCARD_COORDINATED   | virtio-mem | rdma, ... |
        | RAM_DISCARD_UNCOORDINATED | balloon    | vfio      |
        |---------------------------+------------+-----------|

Some examples: vfio only needs to make sure there are no UNCOORDINATED
providers, while rdma needs to make sure there are no COORDINATED or
UNCOORDINATED providers.  The check helper could simply use a similar ANY
bitmask as introduced in the current patch.

Not sure whether this may help.
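A minimal sketch of this provider/opposer split might look as follows (names and signatures are hypothetical, not from any posted patch; serialization, e.g. via the BQL, is assumed, so plain integers suffice):

```c
#include <errno.h>
#include <stdbool.h>

enum {
    RAM_DISCARD_COORDINATED,   /* provider: virtio-mem; opposer: rdma, ... */
    RAM_DISCARD_UNCOORDINATED, /* provider: balloon;    opposer: vfio */
    RAM_DISCARD_TYPE_MAX,
};

/* Both counters are always >= 0; each side only increments its own. */
static int providers[RAM_DISCARD_TYPE_MAX]; /* who may discard */
static int opposers[RAM_DISCARD_TYPE_MAX];  /* who forbids discards */

/* A provider (e.g., virtio-balloon) registers; fails if anyone opposes. */
int discard_provider_register(int type)
{
    if (opposers[type] > 0) {
        return -EBUSY;
    }
    providers[type]++;
    return 0;
}

/* An opposer (e.g., vfio) registers against a bitmask of types. */
int discard_opposer_register(unsigned type_mask)
{
    for (int t = 0; t < RAM_DISCARD_TYPE_MAX; t++) {
        if ((type_mask & (1u << t)) && providers[t] > 0) {
            return -EBUSY; /* a provider of this type already exists */
        }
    }
    for (int t = 0; t < RAM_DISCARD_TYPE_MAX; t++) {
        if (type_mask & (1u << t)) {
            opposers[t]++;
        }
    }
    return 0;
}
```

Because each check only reads the opposite side's counter, no cmpxchg() loop is needed and the counters keep their intuitive "number of registered users" meaning.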

Thanks,

-- 
Peter Xu

Patch

diff --git a/exec.c b/exec.c
index e34b602bdf..83098e9230 100644
--- a/exec.c
+++ b/exec.c
@@ -4098,52 +4098,121 @@  void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root)
  * If positive, discarding RAM is disabled. If negative, discarding RAM is
  * required to work and cannot be disabled.
  */
-static int ram_block_discard_disabled;
+static int uncoordinated_discard_disabled;
+static int coordinated_discard_disabled;
 
-int ram_block_discard_disable(bool state)
+static int __ram_block_discard_disable(int *counter)
 {
     int old;
 
-    if (!state) {
-        atomic_dec(&ram_block_discard_disabled);
-        return 0;
-    }
-
     do {
-        old = atomic_read(&ram_block_discard_disabled);
+        old = atomic_read(counter);
         if (old < 0) {
             return -EBUSY;
         }
-    } while (atomic_cmpxchg(&ram_block_discard_disabled, old, old + 1) != old);
+    } while (atomic_cmpxchg(counter, old, old + 1) != old);
+
     return 0;
 }
 
-int ram_block_discard_require(bool state)
+int ram_block_discard_type_disable(RamBlockDiscardType type, bool state)
 {
-    int old;
+    int ret;
 
-    if (!state) {
-        atomic_inc(&ram_block_discard_disabled);
-        return 0;
+    if (type & RAM_BLOCK_DISCARD_T_UNCOORDINATED) {
+        if (!state) {
+            atomic_dec(&uncoordinated_discard_disabled);
+        } else {
+            ret = __ram_block_discard_disable(&uncoordinated_discard_disabled);
+            if (ret) {
+                return ret;
+            }
+        }
     }
+    if (type & RAM_BLOCK_DISCARD_T_COORDINATED) {
+        if (!state) {
+            atomic_dec(&coordinated_discard_disabled);
+        } else {
+            ret = __ram_block_discard_disable(&coordinated_discard_disabled);
+            if (ret) {
+                /* Rollback the previous change. */
+                if (type & RAM_BLOCK_DISCARD_T_UNCOORDINATED) {
+                    atomic_dec(&uncoordinated_discard_disabled);
+                }
+                return ret;
+            }
+        }
+    }
+    return 0;
+}
+
+static int __ram_block_discard_require(int *counter)
+{
+    int old;
 
     do {
-        old = atomic_read(&ram_block_discard_disabled);
+        old = atomic_read(counter);
         if (old > 0) {
             return -EBUSY;
         }
-    } while (atomic_cmpxchg(&ram_block_discard_disabled, old, old - 1) != old);
+    } while (atomic_cmpxchg(counter, old, old - 1) != old);
+
+    return 0;
+}
+
+int ram_block_discard_type_require(RamBlockDiscardType type, bool state)
+{
+    int ret;
+
+    if (type & RAM_BLOCK_DISCARD_T_UNCOORDINATED) {
+        if (!state) {
+            atomic_inc(&uncoordinated_discard_disabled);
+        } else {
+            ret = __ram_block_discard_require(&uncoordinated_discard_disabled);
+            if (ret) {
+                return ret;
+            }
+        }
+    }
+    if (type & RAM_BLOCK_DISCARD_T_COORDINATED) {
+        if (!state) {
+            atomic_inc(&coordinated_discard_disabled);
+        } else {
+            ret = __ram_block_discard_require(&coordinated_discard_disabled);
+            if (ret) {
+                /* Rollback the previous change. */
+                if (type & RAM_BLOCK_DISCARD_T_UNCOORDINATED) {
+                    atomic_inc(&uncoordinated_discard_disabled);
+                }
+                return ret;
+            }
+        }
+    }
     return 0;
 }
 
-bool ram_block_discard_is_disabled(void)
+bool ram_block_discard_type_is_disabled(RamBlockDiscardType type)
 {
-    return atomic_read(&ram_block_discard_disabled) > 0;
+    if (type & RAM_BLOCK_DISCARD_T_UNCOORDINATED &&
+        atomic_read(&uncoordinated_discard_disabled) > 0) {
+        return true;
+    } else if (type & RAM_BLOCK_DISCARD_T_COORDINATED &&
+               atomic_read(&coordinated_discard_disabled) > 0) {
+        return true;
+    }
+    return false;
 }
 
-bool ram_block_discard_is_required(void)
+bool ram_block_discard_type_is_required(RamBlockDiscardType type)
 {
-    return atomic_read(&ram_block_discard_disabled) < 0;
+    if (type & RAM_BLOCK_DISCARD_T_UNCOORDINATED &&
+        atomic_read(&uncoordinated_discard_disabled) < 0) {
+        return true;
+    } else if (type & RAM_BLOCK_DISCARD_T_COORDINATED &&
+               atomic_read(&coordinated_discard_disabled) < 0) {
+        return true;
+    }
+    return false;
 }
 
 #endif
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2931ead730..3169ebc3d9 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2588,6 +2588,18 @@  static inline MemOp devend_memop(enum device_endian end)
 }
 #endif
 
+typedef enum RamBlockDiscardType {
+    /* Uncoordinated discards (e.g., virtio-balloon) */
+    RAM_BLOCK_DISCARD_T_UNCOORDINATED = 1,
+    /*
+     * Coordinated discards on selected memory regions (e.g., virtio-mem via
+     * SparseRamNotifier).
+     */
+    RAM_BLOCK_DISCARD_T_COORDINATED =   2,
+    /* Any type of discards */
+    RAM_BLOCK_DISCARD_T_ANY =           3,
+} RamBlockDiscardType;
+
 /*
  * Inhibit technologies that require discarding of pages in RAM blocks, e.g.,
  * to manage the actual amount of memory consumed by the VM (then, the memory
@@ -2609,7 +2621,11 @@  static inline MemOp devend_memop(enum device_endian end)
  * Returns 0 if successful. Returns -EBUSY if a technology that relies on
  * discards to work reliably is active.
  */
-int ram_block_discard_disable(bool state);
+int ram_block_discard_type_disable(RamBlockDiscardType type, bool state);
+static inline int ram_block_discard_disable(bool state)
+{
+    return ram_block_discard_type_disable(RAM_BLOCK_DISCARD_T_ANY, state);
+}
 
 /*
  * Inhibit technologies that disable discarding of pages in RAM blocks.
@@ -2617,17 +2633,29 @@  int ram_block_discard_disable(bool state);
  * Returns 0 if successful. Returns -EBUSY if discards are already set to
  * broken.
  */
-int ram_block_discard_require(bool state);
+int ram_block_discard_type_require(RamBlockDiscardType type, bool state);
+static inline int ram_block_discard_require(bool state)
+{
+    return ram_block_discard_type_require(RAM_BLOCK_DISCARD_T_ANY, state);
+}
 
 /*
  * Test if discarding of memory in ram blocks is disabled.
  */
-bool ram_block_discard_is_disabled(void);
+bool ram_block_discard_type_is_disabled(RamBlockDiscardType type);
+static inline bool ram_block_discard_is_disabled(void)
+{
+    return ram_block_discard_type_is_disabled(RAM_BLOCK_DISCARD_T_ANY);
+}
 
 /*
  * Test if discarding of memory in ram blocks is required to work reliably.
  */
-bool ram_block_discard_is_required(void);
+bool ram_block_discard_type_is_required(RamBlockDiscardType type);
+static inline bool ram_block_discard_is_required(void)
+{
+    return ram_block_discard_type_is_required(RAM_BLOCK_DISCARD_T_ANY);
+}
 
 #endif
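To illustrate the semantics the patch introduces, here is a self-contained sketch that mirrors the counter logic above with plain integers (BQL assumed; names are shortened from the patch's `ram_block_discard_type_*` API). It shows the key property of the series: a device can require coordinated discards (virtio-mem) while another blocks only uncoordinated ones (vfio), whereas a legacy disable of all types fails:

```c
#include <errno.h>
#include <stdbool.h>

enum {
    DISCARD_T_UNCOORDINATED = 1, /* e.g., virtio-balloon */
    DISCARD_T_COORDINATED   = 2, /* e.g., virtio-mem */
    DISCARD_T_ANY           = 3,
};

/* Positive: disabled; negative: required (same convention as the patch). */
static int uncoordinated_disabled;
static int coordinated_disabled;

static int counter_disable(int *counter)
{
    if (*counter < 0) {
        return -EBUSY; /* someone requires discards of this type */
    }
    (*counter)++;
    return 0;
}

static int counter_require(int *counter)
{
    if (*counter > 0) {
        return -EBUSY; /* someone disabled discards of this type */
    }
    (*counter)--;
    return 0;
}

int discard_type_disable(int type, bool state)
{
    if (type & DISCARD_T_UNCOORDINATED) {
        if (!state) {
            uncoordinated_disabled--;
        } else if (counter_disable(&uncoordinated_disabled)) {
            return -EBUSY;
        }
    }
    if (type & DISCARD_T_COORDINATED) {
        if (!state) {
            coordinated_disabled--;
        } else if (counter_disable(&coordinated_disabled)) {
            /* Roll back the first half, as the patch does. */
            if (type & DISCARD_T_UNCOORDINATED) {
                uncoordinated_disabled--;
            }
            return -EBUSY;
        }
    }
    return 0;
}

int discard_type_require(int type, bool state)
{
    if (type & DISCARD_T_UNCOORDINATED) {
        if (!state) {
            uncoordinated_disabled++;
        } else if (counter_require(&uncoordinated_disabled)) {
            return -EBUSY;
        }
    }
    if (type & DISCARD_T_COORDINATED) {
        if (!state) {
            coordinated_disabled++;
        } else if (counter_require(&coordinated_disabled)) {
            if (type & DISCARD_T_UNCOORDINATED) {
                uncoordinated_disabled++;
            }
            return -EBUSY;
        }
    }
    return 0;
}
```

Note the rollback path: a failed disable/require of DISCARD_T_ANY must undo the half that already succeeded, which is exactly the `/* Rollback the previous change. */` branch in the patch.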