diff mbox series

[v2,06/12] accel/tcg: better handle memory constrained systems

Message ID 20200722062902.24509-7-alex.bennee@linaro.org
State New
Headers show
Series candidate fixes for 5.1-rc1 (testing, semihosting, OOM tcg, x86 fpu) | expand

Commit Message

Alex Bennée July 22, 2020, 6:28 a.m. UTC
It turns out there are some 64 bit systems that have relatively low
amounts of physical memory available to them (typically CI systems).
Even with swap available, a 1GB translation buffer that fills up can
put the machine under increased memory pressure. Detect these low
memory situations and reduce tb_size appropriately.

Fixes: 600e17b261
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Cc: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
Message-Id: <20200717105139.25293-6-alex.bennee@linaro.org>

---
v2
  - /4 to /8 as suggested by Christian
---
 accel/tcg/translate-all.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.20.1

Comments

Richard Henderson July 22, 2020, 3:57 p.m. UTC | #1
On 7/21/20 11:28 PM, Alex Bennée wrote:
> +        size_t phys_mem = qemu_get_host_physmem();
> +        if (phys_mem > 0 && phys_mem < (2 * DEFAULT_CODE_GEN_BUFFER_SIZE)) {
> +            tb_size = phys_mem / 8;
> +        } else {
> +            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
> +        }

I don't understand the 2 * DEFAULT part.

Does this make more sense as

    if (phys_mem == 0) {
        tb_size = default;
    } else {
        tb_size = MIN(default, phys_mem / 8);
    }

?


r~
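Made concrete, Richard's suggested alternative could look like the following standalone sketch; the buffer-size constant and the MIN macro here are illustrative stand-ins, not the actual translate-all.c definitions:

```c
#include <assert.h>
#include <stddef.h>

#define MiB ((size_t)1024 * 1024)
/* Illustrative stand-in for the 64-bit host default discussed in the thread. */
#define DEFAULT_CODE_GEN_BUFFER_SIZE ((size_t)1024 * MiB)
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Richard's suggestion: never exceed the default, and taper to
 * phys_mem / 8 on smaller hosts. phys_mem == 0 means the host
 * memory size could not be determined. */
static size_t tb_size_from_physmem(size_t phys_mem)
{
    if (phys_mem == 0) {
        return DEFAULT_CODE_GEN_BUFFER_SIZE;
    }
    return MIN(DEFAULT_CODE_GEN_BUFFER_SIZE, phys_mem / 8);
}
```

Unlike the patch's `2 * DEFAULT` threshold, this version also shrinks the buffer on hosts with between 2x and 8x the default (e.g. a 4 GiB host would get 512 MiB rather than the full 1 GiB).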
Alex Bennée July 22, 2020, 4:29 p.m. UTC | #2
Richard Henderson <richard.henderson@linaro.org> writes:

> On 7/21/20 11:28 PM, Alex Bennée wrote:
>> +        size_t phys_mem = qemu_get_host_physmem();
>> +        if (phys_mem > 0 && phys_mem < (2 * DEFAULT_CODE_GEN_BUFFER_SIZE)) {
>> +            tb_size = phys_mem / 8;
>> +        } else {
>> +            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
>> +        }
>
> I don't understand the 2 * DEFAULT part.


I figured once you had at least twice as much memory you could use the
full amount but...


> Does this make more sense as
>
>     if (phys_mem == 0) {
>         tb_size = default;
>     } else {
>         tb_size = MIN(default, phys_mem / 8);
>     }


This is probably a less aggressive tapering off, which still doesn't
affect my 32GB dev machine ;-)

-- 
Alex Bennée
Daniel P. Berrangé July 22, 2020, 4:44 p.m. UTC | #3
On Wed, Jul 22, 2020 at 05:29:46PM +0100, Alex Bennée wrote:
> Richard Henderson <richard.henderson@linaro.org> writes:
>
> > On 7/21/20 11:28 PM, Alex Bennée wrote:
> >> +        size_t phys_mem = qemu_get_host_physmem();
> >> +        if (phys_mem > 0 && phys_mem < (2 * DEFAULT_CODE_GEN_BUFFER_SIZE)) {
> >> +            tb_size = phys_mem / 8;
> >> +        } else {
> >> +            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
> >> +        }
> >
> > I don't understand the 2 * DEFAULT part.
>
> I figured once you had at least twice as much memory you could use the
> full amount but...
>
> > Does this make more sense as
> >
> >     if (phys_mem == 0) {
> >         tb_size = default;
> >     } else {
> >         tb_size = MIN(default, phys_mem / 8);
> >     }
>
> This is probably a less aggressive tapering off, which still doesn't
> affect my 32GB dev machine ;-)


I still feel like this logic of looking at physmem is doomed, because
it makes the assumption that all of physical RAM is theoretically
available to the user, and this isn't the case if running inside a
container or cgroup with a memory cap set.

I don't really have any good answer here, but assuming we can use
1 GB for a cache just doesn't seem like a good idea, especially if
users are running multiple VMs in parallel.

OpenStack uses TCG in a lot of their CI infrastructure for example
and runs multiple VMs. If there's 4 VMs, that's another 4 GB of
RAM usage just silently added on top of the explicit -m value.

I wouldn't be surprised if this pushes CI into OOM, even without
containers or cgroups being involved, as they have plenty of other
services consuming RAM in the CI VMs.

The commit 600e17b261555c56a048781b8dd5ba3985650013 talks about
minimizing codegen cache flushes, but doesn't mention the real-world
performance impact of eliminating those flushes.

Presumably this makes the guest OS boot faster, but what are the before
and after times? And what is the time like for values in between the
original 32 MB and the new 1 GB? Can we get a value that is
*significantly* smaller than 1 GB but still gives some useful benefit?
What would 128 MB be like compared to the original 32 MB?
Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
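Daniel's point about container caps can be illustrated with a short sketch. The cgroup-v2 `memory.max` file and its "max" sentinel are real kernel conventions, but the helper functions below are hypothetical illustrations, not QEMU code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse the contents of a cgroup-v2 memory.max file: either "max"
 * (no cap set) or a byte count. Returns 0 when there is no cap.
 * Illustrative helper, not part of QEMU. */
static unsigned long long parse_memory_max(const char *contents)
{
    if (strncmp(contents, "max", 3) == 0) {
        return 0;
    }
    return strtoull(contents, NULL, 10);
}

/* A sizing heuristic that trusted host physmem alone would miss a
 * cgroup cap like this one. */
static unsigned long long effective_mem(unsigned long long phys_mem,
                                        const char *memory_max_contents)
{
    unsigned long long cap = parse_memory_max(memory_max_contents);
    if (cap != 0 && cap < phys_mem) {
        return cap;
    }
    return phys_mem;
}
```

In a real program the string would be read from `/sys/fs/cgroup/memory.max` (when running under cgroup v2 with a memory controller enabled); the parsing is separated out here so the logic is testable.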
Richard Henderson July 22, 2020, 7:02 p.m. UTC | #4
On 7/22/20 9:44 AM, Daniel P. Berrangé wrote:
> OpenStack uses TCG in a lot of their CI infrastructure for example
> and runs multiple VMs. If there's 4 VMs, that's another 4 GB of
> RAM usage just silently added on top of the explicit -m value.
>
> I wouldn't be surprised if this pushes CI into OOM, even without
> containers or cgroups being involved, as they have plenty of other
> services consuming RAM in the CI VMs.


I would hope that CI would also supply a -tb_size to go along with that -m
value.  Because we really can't guess on their behalf.

> The commit 600e17b261555c56a048781b8dd5ba3985650013 talks about
> minimizing codegen cache flushes, but doesn't mention the real-world
> performance impact of eliminating those flushes.


Somewhere on the mailing list was this info.  It was so dreadfully slow it was
*really* noticeable.  Timeouts everywhere.

>
> Presumably this makes the guest OS boot faster, but what are the before
> and after times? And what is the time like for values in between the
> original 32 MB and the new 1 GB?


But it wasn't "the original 32 MB".
It was the original "ram_size / 4", until that broke due to argument parsing
ordering.

I don't know what CI usually uses, but I usually use at least -m 4G, sometimes
more.  What's the libvirt default?


r~
Daniel P. Berrangé July 23, 2020, 9 a.m. UTC | #5
On Wed, Jul 22, 2020 at 12:02:59PM -0700, Richard Henderson wrote:
> On 7/22/20 9:44 AM, Daniel P. Berrangé wrote:
> > OpenStack uses TCG in a lot of their CI infrastructure for example
> > and runs multiple VMs. If there's 4 VMs, that's another 4 GB of
> > RAM usage just silently added on top of the explicit -m value.
> >
> > I wouldn't be surprised if this pushes CI into OOM, even without
> > containers or cgroups being involved, as they have plenty of other
> > services consuming RAM in the CI VMs.
>
> I would hope that CI would also supply a -tb_size to go along with that -m
> value.  Because we really can't guess on their behalf.


I've never even seen mention of the -tb_size argument before myself, nor
seen anyone else use it, and libvirt doesn't set it, so I don't think
this is a valid assumption.


> > The commit 600e17b261555c56a048781b8dd5ba3985650013 talks about
> > minimizing codegen cache flushes, but doesn't mention the real-world
> > performance impact of eliminating those flushes.
>
> Somewhere on the mailing list was this info.  It was so dreadfully slow it was
> *really* noticeable.  Timeouts everywhere.
>
> > Presumably this makes the guest OS boot faster, but what are the before
> > and after times? And what is the time like for values in between the
> > original 32 MB and the new 1 GB?
>
> But it wasn't "the original 32 MB".
> It was the original "ram_size / 4", until that broke due to argument parsing
> ordering.


Hmm, 600e17b261555c56a048781b8dd5ba3985650013 says it was 32 MB as the
default in its commit message, which seems to match the code doing

 #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)


> I don't know what CI usually uses, but I usually use at least -m 4G, sometimes
> more.  What's the libvirt default?


There's no default memory size - it's up to whoever/whatever creates the
VMs to choose how much RAM is given.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Alex Bennée July 23, 2020, 9:22 a.m. UTC | #6
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Jul 22, 2020 at 12:02:59PM -0700, Richard Henderson wrote:
>> On 7/22/20 9:44 AM, Daniel P. Berrangé wrote:
>> > OpenStack uses TCG in a lot of their CI infrastructure for example
>> > and runs multiple VMs. If there's 4 VMs, that's another 4 GB of
>> > RAM usage just silently added on top of the explicit -m value.
>> >
>> > I wouldn't be surprised if this pushes CI into OOM, even without
>> > containers or cgroups being involved, as they have plenty of other
>> > services consuming RAM in the CI VMs.
>>
>> I would hope that CI would also supply a -tb_size to go along with that -m
>> value.  Because we really can't guess on their behalf.
>
> I've never even seen mention of the -tb_size argument before myself, nor
> seen anyone else use it, and libvirt doesn't set it, so I don't think
> this is a valid assumption.
>
>> > The commit 600e17b261555c56a048781b8dd5ba3985650013 talks about
>> > minimizing codegen cache flushes, but doesn't mention the real-world
>> > performance impact of eliminating those flushes.
>>
>> Somewhere on the mailing list was this info.  It was so dreadfully slow it was
>> *really* noticeable.  Timeouts everywhere.
>>
>> > Presumably this makes the guest OS boot faster, but what are the before
>> > and after times? And what is the time like for values in between the
>> > original 32 MB and the new 1 GB?
>>
>> But it wasn't "the original 32 MB".
>> It was the original "ram_size / 4", until that broke due to argument parsing
>> ordering.
>
> Hmm, 600e17b261555c56a048781b8dd5ba3985650013 says it was 32 MB as the
> default in its commit message, which seems to match the code doing
>
>  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)


You need to look earlier in the series (see the tag pull-tcg-20200228):

  47a2def4533a2807e48954abd50b32ecb1aaf29a

When the argument ordering broke the guest ram_size heuristic we started
getting reports of performance regressions because we fell back to that
size. Before then it was always based on guest RAM size, within the
min/max bounds set by those defines.

>> I don't know what CI usually uses, but I usually use at least -m 4G, sometimes
>> more.  What's the libvirt default?
>
> There's no default memory size - it's up to whoever/whatever creates the
> VMs to choose how much RAM is given.
>
> Regards,
> Daniel



-- 
Alex Bennée
Daniel P. Berrangé July 23, 2020, 9:31 a.m. UTC | #7
On Thu, Jul 23, 2020 at 10:22:25AM +0100, Alex Bennée wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > On Wed, Jul 22, 2020 at 12:02:59PM -0700, Richard Henderson wrote:
> >> On 7/22/20 9:44 AM, Daniel P. Berrangé wrote:
> >> > OpenStack uses TCG in a lot of their CI infrastructure for example
> >> > and runs multiple VMs. If there's 4 VMs, that's another 4 GB of
> >> > RAM usage just silently added on top of the explicit -m value.
> >> >
> >> > I wouldn't be surprised if this pushes CI into OOM, even without
> >> > containers or cgroups being involved, as they have plenty of other
> >> > services consuming RAM in the CI VMs.
> >>
> >> I would hope that CI would also supply a -tb_size to go along with that -m
> >> value.  Because we really can't guess on their behalf.
> >
> > I've never even seen mention of the -tb_size argument before myself, nor
> > seen anyone else use it, and libvirt doesn't set it, so I don't think
> > this is a valid assumption.
> >
> >> > The commit 600e17b261555c56a048781b8dd5ba3985650013 talks about
> >> > minimizing codegen cache flushes, but doesn't mention the real-world
> >> > performance impact of eliminating those flushes.
> >>
> >> Somewhere on the mailing list was this info.  It was so dreadfully slow it was
> >> *really* noticeable.  Timeouts everywhere.
> >>
> >> > Presumably this makes the guest OS boot faster, but what are the before
> >> > and after times? And what is the time like for values in between the
> >> > original 32 MB and the new 1 GB?
> >>
> >> But it wasn't "the original 32 MB".
> >> It was the original "ram_size / 4", until that broke due to argument parsing
> >> ordering.
> >
> > Hmm, 600e17b261555c56a048781b8dd5ba3985650013 says it was 32 MB as the
> > default in its commit message, which seems to match the code doing
> >
> >  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
>
> You need to look earlier in the series (see the tag pull-tcg-20200228):
>
>   47a2def4533a2807e48954abd50b32ecb1aaf29a
>
> When the argument ordering broke the guest ram_size heuristic we started
> getting reports of performance regressions because we fell back to that
> size. Before then it was always based on guest RAM size, within the
> min/max bounds set by those defines.


Ah, I see. That's a shame, as something based on guest RAM size feels like
a much safer bet for a default heuristic than basing it on host RAM size.

I'd probably say that the original commit which changed the argument
processing is flawed, and could/should be fixed.

The problem that commit was trying to solve was validation of the value
passed to -m. In fixing that, it also moved the parsing. The key problem
here is that we need to do parsing and validating at different points in
the startup procedure.  IOW, we need to split the logic, not simply move
the CLI parsing to the place that makes validation work.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Alex Bennée July 23, 2020, 10:06 a.m. UTC | #8
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Jul 23, 2020 at 10:22:25AM +0100, Alex Bennée wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>
>> > On Wed, Jul 22, 2020 at 12:02:59PM -0700, Richard Henderson wrote:
>> >> On 7/22/20 9:44 AM, Daniel P. Berrangé wrote:
>> >> > OpenStack uses TCG in a lot of their CI infrastructure for example
>> >> > and runs multiple VMs. If there's 4 VMs, that's another 4 GB of
>> >> > RAM usage just silently added on top of the explicit -m value.
>> >> >
>> >> > I wouldn't be surprised if this pushes CI into OOM, even without
>> >> > containers or cgroups being involved, as they have plenty of other
>> >> > services consuming RAM in the CI VMs.
>> >>
>> >> I would hope that CI would also supply a -tb_size to go along with that -m
>> >> value.  Because we really can't guess on their behalf.
>> >
>> > I've never even seen mention of the -tb_size argument before myself, nor
>> > seen anyone else use it, and libvirt doesn't set it, so I don't think
>> > this is a valid assumption.
>> >
>> >> > The commit 600e17b261555c56a048781b8dd5ba3985650013 talks about
>> >> > minimizing codegen cache flushes, but doesn't mention the real-world
>> >> > performance impact of eliminating those flushes.
>> >>
>> >> Somewhere on the mailing list was this info.  It was so dreadfully slow it was
>> >> *really* noticeable.  Timeouts everywhere.
>> >>
>> >> > Presumably this makes the guest OS boot faster, but what are the before
>> >> > and after times? And what is the time like for values in between the
>> >> > original 32 MB and the new 1 GB?
>> >>
>> >> But it wasn't "the original 32 MB".
>> >> It was the original "ram_size / 4", until that broke due to argument parsing
>> >> ordering.
>> >
>> > Hmm, 600e17b261555c56a048781b8dd5ba3985650013 says it was 32 MB as the
>> > default in its commit message, which seems to match the code doing
>> >
>> >  #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
>>
>> You need to look earlier in the series (see the tag pull-tcg-20200228):
>>
>>   47a2def4533a2807e48954abd50b32ecb1aaf29a
>>
>> When the argument ordering broke the guest ram_size heuristic we started
>> getting reports of performance regressions because we fell back to that
>> size. Before then it was always based on guest RAM size, within the
>> min/max bounds set by those defines.
>
> Ah, I see. That's a shame, as something based on guest RAM size feels like
> a much safer bet for a default heuristic than basing it on host RAM size.


It was a poor heuristic because the amount of code generation space you
need really depends on the amount of code being executed, and that is
determined more by workload than by RAM size. You may have 4GB of RAM
running a single program with a large block cache, or 128MB of RAM but
constantly swapping code from a block store, which triggers a
re-translation every time.

Also, as the translation cache is mmap'ed, it doesn't all have to get
used. Having spare cache isn't too wasteful.

> I'd probably say that the original commit which changed the argument

> processing is flawed, and could/should be fixed.


I'd say not - we are not trying to replace/fix the original heuristic
but to introduce a new one to finesse behaviour on relatively resource
constrained machines. Nothing we do can cope with the full range of
potential QEMU invocations people might make. For those cases the user
will have to look at the workload and tweak the tb-size control. The
default was chosen to make the "common" case of running a single guest
on a user's desktop work at a reasonable performance level. You'll see
we make that distinction in the comments between system emulation and,
for example, linux-user, where it's much more reasonable to expect
multiple QEMU invocations.

> The problem that commit was trying to solve was validation of the value
> passed to -m. In fixing that, it also moved the parsing. The key problem
> here is that we need to do parsing and validating at different points in
> the startup procedure.  IOW, we need to split the logic, not simply move
> the CLI parsing to the place that makes validation work.
>
> Regards,
> Daniel



-- 
Alex Bennée

Patch

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 2afa46bd2b1..3fe40ec1710 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -976,7 +976,12 @@  static inline size_t size_code_gen_buffer(size_t tb_size)
 {
     /* Size the buffer.  */
     if (tb_size == 0) {
-        tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
+        size_t phys_mem = qemu_get_host_physmem();
+        if (phys_mem > 0 && phys_mem < (2 * DEFAULT_CODE_GEN_BUFFER_SIZE)) {
+            tb_size = phys_mem / 8;
+        } else {
+            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
+        }
     }
     if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
         tb_size = MIN_CODE_GEN_BUFFER_SIZE;
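Putting the patched branch together with the surrounding clamps, the sizing logic can be sketched as a standalone function. This is a hypothetical variant that takes phys_mem as a parameter instead of calling qemu_get_host_physmem(), and the MIN/MAX/DEFAULT constants are placeholders — the real values in translate-all.c are host-dependent:

```c
#include <assert.h>
#include <stddef.h>

#define MiB ((size_t)1024 * 1024)
/* Placeholder bounds; the real constants in translate-all.c vary by host. */
#define MIN_CODE_GEN_BUFFER_SIZE     ((size_t)1 * MiB)
#define MAX_CODE_GEN_BUFFER_SIZE     ((size_t)2048 * MiB)
#define DEFAULT_CODE_GEN_BUFFER_SIZE ((size_t)1024 * MiB)

/* Mirror of the patched size_code_gen_buffer(): use the default when
 * tb_size is unset, taper to phys_mem / 8 on small hosts, then clamp
 * to the min/max bounds. phys_mem == 0 means "unknown". */
static size_t size_code_gen_buffer_sketch(size_t tb_size, size_t phys_mem)
{
    if (tb_size == 0) {
        if (phys_mem > 0 && phys_mem < 2 * DEFAULT_CODE_GEN_BUFFER_SIZE) {
            tb_size = phys_mem / 8;
        } else {
            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
        }
    }
    if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
        tb_size = MIN_CODE_GEN_BUFFER_SIZE;
    }
    if (tb_size > MAX_CODE_GEN_BUFFER_SIZE) {
        tb_size = MAX_CODE_GEN_BUFFER_SIZE;
    }
    return tb_size;
}
```

With these placeholder values, a host with 512 MiB of RAM (below the 2 GiB threshold) would get a 64 MiB buffer instead of the full 1 GiB default, while an explicit -tb_size is passed through subject only to the clamps.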