Performance regression using KVM/ARM

Message ID 57194CEF.3040202@redhat.com
State New

Commit Message

Laszlo Ersek April 21, 2016, 9:58 p.m. UTC
On 04/21/16 18:23, Christoffer Dall wrote:
> Hi,
>
> Commit 9fac18f (oslib: allocate PROT_NONE pages on top of RAM,
> 2015-09-10) had the unfortunate side effect that memory slots registered
> with KVM no longer contain a userspace address that is aligned to a 2M
> boundary, causing the use of THP to fail in the kernel.
>
> I fail to see where in the QEMU code we should be asking for a 2M
> alignment of our memory region.  Can someone help pointing me to the
> right place to fix this or suggest a patch?
>
> This causes a performance regression of hackbench on KVM/ARM of about 62%
> compared to the workload running with THP.
>
> We have verified that this is indeed the cause of the failure by adding
> various prints to QEMU and the kernel, but unfortunately my QEMU
> knowledge is not sufficient for me to fix it myself.
>
> Any help would be much appreciated!


Can you please test the attached series?

(Note that I'm only interested in solving this problem as a productive
distraction, so if the patches don't work, or require a lot of massaging
for merging, I'll just drop them (or, preferably, give them to someone
else).)

Thanks
Laszlo

Comments

Christoffer Dall April 22, 2016, 10:02 a.m. UTC | #1
Hi Laszlo,

On Thu, Apr 21, 2016 at 11:58:07PM +0200, Laszlo Ersek wrote:
> On 04/21/16 18:23, Christoffer Dall wrote:
> > Hi,
> >
> > Commit 9fac18f (oslib: allocate PROT_NONE pages on top of RAM,
> > 2015-09-10) had the unfortunate side effect that memory slots registered
> > with KVM no longer contain a userspace address that is aligned to a 2M
> > boundary, causing the use of THP to fail in the kernel.
> >
> > I fail to see where in the QEMU code we should be asking for a 2M
> > alignment of our memory region.  Can someone help pointing me to the
> > right place to fix this or suggest a patch?
> >
> > This causes a performance regression of hackbench on KVM/ARM of about 62%
> > compared to the workload running with THP.
> >
> > We have verified that this is indeed the cause of the failure by adding
> > various prints to QEMU and the kernel, but unfortunately my QEMU
> > knowledge is not sufficient for me to fix it myself.
> >
> > Any help would be much appreciated!
>
> Can you please test the attached series?
>
> (Note that I'm only interested in solving this problem as a productive
> distraction, so if the patches don't work, or require a lot of massaging
> for merging, I'll just drop them (or, preferably, give them to someone
> else).)
>

I like your procrastination methods!

Unfortunately this fix wasn't the right one either.

-Christoffer
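
For readers following the thread, the condition in the original report (a userspace address aligned to a 2M boundary) can be illustrated with a small check. This is only a sketch: the names HUGEPAGE_2M, is_aligned and bytes_to_next_boundary are hypothetical helpers invented for illustration, not QEMU or kernel code, and the sketch assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: the 2M boundary mentioned in the report. */
#define HUGEPAGE_2M ((uint64_t)2 * 1024 * 1024)

/* True if addr is a multiple of align (align must be a power of two). */
static bool is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Distance from addr up to the next multiple of align (0 if already aligned). */
static uint64_t bytes_to_next_boundary(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (align - (addr & (align - 1))) & (align - 1);
}
```

For example, an address of 0x40000000 passes the check, while the same address shifted by a single 4 KiB page (0x40001000) does not and sits 0x1ff000 bytes below the next 2M boundary.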

Patch

From 8e7cd9425417189f5fc894039a8af956ca2e19dd Mon Sep 17 00:00:00 2001
From: Laszlo Ersek <lersek@redhat.com>
Date: Thu, 21 Apr 2016 22:19:16 +0200
Subject: [PATCH 3/3] util/mmap-alloc: preserve size alignment with guard pages
 on ARM

Commit 9fac18f03a90 ("oslib: allocate PROT_NONE pages on top of RAM")
introduced a guard page after the user-requested area.

(Commit 794e8f301a17 ("exec: factor out duplicate mmap code") factored out
this logic, preserving the behavior of 9fac18f03a90.)

Christoffer reports that 9fac18f03a90 renders the KVM/ARM performance
optimization added in 2e07b297e0b4 ("oslib-posix: Align to permit
transparent hugepages on ARM Linux") ineffective, because the single guard
page makes the size of the region unaligned, preventing the application of
THP.

Restore 2e07b297e0b4 to working state by aligning the full area size --
consisting of user requested and guard pages -- on ARM.

Ref: http://thread.gmane.org/gmane.comp.emulators.qemu/407833
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 util/mmap-alloc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 41e36f74d7be..153a586cec63 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -41,7 +41,11 @@  size_t qemu_fd_getpagesize(int fd)
 
 static size_t size_with_guard_pages(size_t size, size_t align)
 {
+#if defined(__arm__)
+    return QEMU_ALIGN_UP(size + getpagesize(), align);
+#else
     return size + getpagesize();
+#endif
 }
 
 void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
-- 
1.8.3.1
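
The size arithmetic in the patch above can be tried in isolation. The following is a minimal standalone sketch, not the QEMU code itself: align_up is a hypothetical stand-in for QEMU_ALIGN_UP, the page size is passed as a parameter instead of calling getpagesize(), and the function names are invented for illustration. Note that, per the reply in this thread, this change did not turn out to resolve the reported regression, so the sketch only shows what the patch computes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU_ALIGN_UP: round n up to a multiple of align. */
static uint64_t align_up(uint64_t n, uint64_t align)
{
    assert(align != 0);
    return (n + align - 1) / align * align;
}

/* Pre-patch behavior: the requested size plus one guard page. */
static uint64_t guarded_size_unaligned(uint64_t size, uint64_t page)
{
    return size + page;
}

/* Patched behavior on ARM: the same total, rounded up to the alignment. */
static uint64_t guarded_size_aligned(uint64_t size, uint64_t page,
                                     uint64_t align)
{
    return align_up(size + page, align);
}
```

With a 1 GiB request, a 4 KiB page and a 2 MiB alignment, the unpatched total is 1 GiB + 4 KiB, which is not a multiple of 2 MiB; the patched total is rounded up to 1 GiB + 2 MiB.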