diff mbox series

[v1,5/5] accel/tcg: better handle memory constrained systems

Message ID 20200717105139.25293-6-alex.bennee@linaro.org
State Superseded
Headers show
Series candidate fixes for 5.1-rc1 (shippable, semihosting, OOM tcg) | expand

Commit Message

Alex Bennée July 17, 2020, 10:51 a.m. UTC
It turns out there are some 64 bit systems that have relatively low
amounts of physical memory available to them (typically CI system).
Even with swapping available a 1GB translation buffer that fills up
can put the machine under increased memory pressure. Detect these low
memory situations and reduce tb_size appropriately.

Fixes: 600e17b261
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Cc: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 accel/tcg/translate-all.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.20.1

Comments

Christian Ehrhardt July 17, 2020, 2:23 p.m. UTC | #1
On Fri, Jul 17, 2020 at 12:51 PM Alex Bennée <alex.bennee@linaro.org> wrote:

> It turns out there are some 64 bit systems that have relatively low

> amounts of physical memory available to them (typically CI system).

> Even with swapping available a 1GB translation buffer that fills up

> can put the machine under increased memory pressure. Detect these low

> memory situations and reduce tb_size appropriately.

>

> Fixes: 600e17b261

> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> Cc: BALATON Zoltan <balaton@eik.bme.hu>

> Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>

> ---

>  accel/tcg/translate-all.c | 7 ++++++-

>  1 file changed, 6 insertions(+), 1 deletion(-)

>

> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c

> index 2afa46bd2b1..2ff0ba6d19b 100644

> --- a/accel/tcg/translate-all.c

> +++ b/accel/tcg/translate-all.c

> @@ -976,7 +976,12 @@ static inline size_t size_code_gen_buffer(size_t

> tb_size)

>  {

>      /* Size the buffer.  */

>      if (tb_size == 0) {

> -        tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;

> +        size_t phys_mem = qemu_get_host_physmem();

> +        if (phys_mem > 0 && phys_mem < (2 *

> DEFAULT_CODE_GEN_BUFFER_SIZE)) {

> +            tb_size = phys_mem / 4;

>


In my experiments I've found that /8 more closely matches the former
behavior
on small hosts while at the same time not affecting common large hosts.


> +        } else {

> +            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;

> +        }

>      }

>      if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {

>          tb_size = MIN_CODE_GEN_BUFFER_SIZE;

> --

> 2.20.1

>

>


-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd
<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jul 17, 2020 at 12:51 PM Alex Bennée &lt;<a href="mailto:alex.bennee@linaro.org">alex.bennee@linaro.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">It turns out there are some 64 bit systems that have relatively low<br>
amounts of physical memory available to them (typically CI system).<br>
Even with swapping available a 1GB translation buffer that fills up<br>
can put the machine under increased memory pressure. Detect these low<br>
memory situations and reduce tb_size appropriately.<br>
<br>
Fixes: 600e17b261<br>
Signed-off-by: Alex Bennée &lt;<a href="mailto:alex.bennee@linaro.org" target="_blank">alex.bennee@linaro.org</a>&gt;<br>

Cc: BALATON Zoltan &lt;<a href="mailto:balaton@eik.bme.hu" target="_blank">balaton@eik.bme.hu</a>&gt;<br>
Cc: Christian Ehrhardt &lt;<a href="mailto:christian.ehrhardt@canonical.com" target="_blank">christian.ehrhardt@canonical.com</a>&gt;<br>
---<br>
 accel/tcg/translate-all.c | 7 ++++++-<br>
 1 file changed, 6 insertions(+), 1 deletion(-)<br>
<br>
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c<br>
index 2afa46bd2b1..2ff0ba6d19b 100644<br>
--- a/accel/tcg/translate-all.c<br>
+++ b/accel/tcg/translate-all.c<br>
@@ -976,7 +976,12 @@ static inline size_t size_code_gen_buffer(size_t tb_size)<br>
 {<br>
     /* Size the buffer.  */<br>
     if (tb_size == 0) {<br>
-        tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;<br>
+        size_t phys_mem = qemu_get_host_physmem();<br>
+        if (phys_mem &gt; 0 &amp;&amp; phys_mem &lt; (2 * DEFAULT_CODE_GEN_BUFFER_SIZE)) {<br>
+            tb_size = phys_mem / 4;<br></blockquote><div><br></div><div>In my experiments I&#39;ve found that /8 more closely matches the former behavior</div><div>on small hosts while at the same time not affecting common large hosts.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+        } else {<br>
+            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;<br>
+        }<br>
     }<br>
     if (tb_size &lt; MIN_CODE_GEN_BUFFER_SIZE) {<br>
         tb_size = MIN_CODE_GEN_BUFFER_SIZE;<br>
-- <br>
2.20.1<br>
<br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature">Christian Ehrhardt<br>Staff Engineer, Ubuntu Server<br>Canonical Ltd</div></div>
Daniel P. Berrangé July 17, 2020, 2:39 p.m. UTC | #2
On Fri, Jul 17, 2020 at 11:51:39AM +0100, Alex Bennée wrote:
> It turns out there are some 64 bit systems that have relatively low

> amounts of physical memory available to them (typically CI system).

> Even with swapping available a 1GB translation buffer that fills up

> can put the machine under increased memory pressure. Detect these low

> memory situations and reduce tb_size appropriately.

> 

> Fixes: 600e17b261

> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> Cc: BALATON Zoltan <balaton@eik.bme.hu>

> Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>

> ---

>  accel/tcg/translate-all.c | 7 ++++++-

>  1 file changed, 6 insertions(+), 1 deletion(-)

> 

> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c

> index 2afa46bd2b1..2ff0ba6d19b 100644

> --- a/accel/tcg/translate-all.c

> +++ b/accel/tcg/translate-all.c

> @@ -976,7 +976,12 @@ static inline size_t size_code_gen_buffer(size_t tb_size)

>  {

>      /* Size the buffer.  */

>      if (tb_size == 0) {

> -        tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;

> +        size_t phys_mem = qemu_get_host_physmem();

> +        if (phys_mem > 0 && phys_mem < (2 * DEFAULT_CODE_GEN_BUFFER_SIZE)) {

> +            tb_size = phys_mem / 4;

> +        } else {

> +            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;

> +        }


I'm not convinced this is going to work when running QEMU inside a
container environment with RAM cap, because the physmem level is
completely unrelated to the RAM the container is permitted to actually
use in practice. ie host has 32 GB of RAM, but the container QEMU is
in only has 1 GB permitted.

I don't have much of a better suggestion, as I don't think we want
to get into reading the cgroups memory limits. It does feel like the
assumption we can blindly use a 1GB cache though is invalid even
with this patch applied.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Alex Bennée July 17, 2020, 2:55 p.m. UTC | #3
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Jul 17, 2020 at 11:51:39AM +0100, Alex Bennée wrote:

>> It turns out there are some 64 bit systems that have relatively low

>> amounts of physical memory available to them (typically CI system).

>> Even with swapping available a 1GB translation buffer that fills up

>> can put the machine under increased memory pressure. Detect these low

>> memory situations and reduce tb_size appropriately.

>> 

>> Fixes: 600e17b261

>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>> Cc: BALATON Zoltan <balaton@eik.bme.hu>

>> Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>

>> ---

>>  accel/tcg/translate-all.c | 7 ++++++-

>>  1 file changed, 6 insertions(+), 1 deletion(-)

>> 

>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c

>> index 2afa46bd2b1..2ff0ba6d19b 100644

>> --- a/accel/tcg/translate-all.c

>> +++ b/accel/tcg/translate-all.c

>> @@ -976,7 +976,12 @@ static inline size_t size_code_gen_buffer(size_t tb_size)

>>  {

>>      /* Size the buffer.  */

>>      if (tb_size == 0) {

>> -        tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;

>> +        size_t phys_mem = qemu_get_host_physmem();

>> +        if (phys_mem > 0 && phys_mem < (2 * DEFAULT_CODE_GEN_BUFFER_SIZE)) {

>> +            tb_size = phys_mem / 4;

>> +        } else {

>> +            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;

>> +        }

>

> I'm not convinced this is going to work when running QEMU inside a

> container environment with RAM cap, because the physmem level is

> completely unrelated to the RAM the container is permitted to actually

> use in practice. ie host has 32 GB of RAM, but the container QEMU is

> in only has 1 GB permitted.


What will happen when the mmap happens? Will a capped container limit
the attempted mmap? I would hope the container case at least gave
different feedback than a "silent" OOM.

> I don't have much of a better suggestion, as I don't think we want

> to get into reading the cgroups memory limits. It does feel like the

> assumption we can blindly use a 1GB cache though is invalid even

> with this patch applied.

>

> Regards,

> Daniel



-- 
Alex Bennée
Daniel P. Berrangé July 17, 2020, 3 p.m. UTC | #4
On Fri, Jul 17, 2020 at 03:55:15PM +0100, Alex Bennée wrote:
> 

> Daniel P. Berrangé <berrange@redhat.com> writes:

> 

> > On Fri, Jul 17, 2020 at 11:51:39AM +0100, Alex Bennée wrote:

> >> It turns out there are some 64 bit systems that have relatively low

> >> amounts of physical memory available to them (typically CI system).

> >> Even with swapping available a 1GB translation buffer that fills up

> >> can put the machine under increased memory pressure. Detect these low

> >> memory situations and reduce tb_size appropriately.

> >> 

> >> Fixes: 600e17b261

> >> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> >> Cc: BALATON Zoltan <balaton@eik.bme.hu>

> >> Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>

> >> ---

> >>  accel/tcg/translate-all.c | 7 ++++++-

> >>  1 file changed, 6 insertions(+), 1 deletion(-)

> >> 

> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c

> >> index 2afa46bd2b1..2ff0ba6d19b 100644

> >> --- a/accel/tcg/translate-all.c

> >> +++ b/accel/tcg/translate-all.c

> >> @@ -976,7 +976,12 @@ static inline size_t size_code_gen_buffer(size_t tb_size)

> >>  {

> >>      /* Size the buffer.  */

> >>      if (tb_size == 0) {

> >> -        tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;

> >> +        size_t phys_mem = qemu_get_host_physmem();

> >> +        if (phys_mem > 0 && phys_mem < (2 * DEFAULT_CODE_GEN_BUFFER_SIZE)) {

> >> +            tb_size = phys_mem / 4;

> >> +        } else {

> >> +            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;

> >> +        }

> >

> > I'm not convinced this is going to work when running QEMU inside a

> > container environment with RAM cap, because the physmem level is

> > completely unrelated to the RAM the container is permitted to actually

> > use in practice. ie host has 32 GB of RAM, but the container QEMU is

> > in only has 1 GB permitted.

> 

> What will happen when the mmap happens? Will a capped container limit

> the attempted mmap? I would hope the container case at least gave

> different feedback than a "silent" OOM.


IIRC it should trigger the OOM killer on process(s) within the container.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
diff mbox series

Patch

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 2afa46bd2b1..2ff0ba6d19b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -976,7 +976,12 @@  static inline size_t size_code_gen_buffer(size_t tb_size)
 {
     /* Size the buffer.  */
     if (tb_size == 0) {
-        tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
+        size_t phys_mem = qemu_get_host_physmem();
+        if (phys_mem > 0 && phys_mem < (2 * DEFAULT_CODE_GEN_BUFFER_SIZE)) {
+            tb_size = phys_mem / 4;
+        } else {
+            tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
+        }
     }
     if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
         tb_size = MIN_CODE_GEN_BUFFER_SIZE;