From patchwork Wed Feb 1 18:34:38 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 93082
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Subject: [PATCH 2/2] nptl: Invert the mmap/mprotect logic on allocated stacks (BZ#18988)
Date: Wed, 1 Feb 2017 16:34:38 -0200
Message-Id: <1485974078-12152-2-git-send-email-adhemerval.zanella@linaro.org>
In-Reply-To: <1485974078-12152-1-git-send-email-adhemerval.zanella@linaro.org>
References: <1485974078-12152-1-git-send-email-adhemerval.zanella@linaro.org>

The current allocate_stack logic for creating stacks is to first mmap
all the required memory with the desired protection and then mprotect
the guard area with PROT_NONE if required.
Although it works as expected, it pessimizes the allocation because it
requires the kernel to actually increase the commit charge (which counts
against the physical/swap memory available to the system).

The only issue is how to actually check this change, since the side
effects are really Linux specific, and accounting for them would require
kernel-specific tests that parse system-wide information.  On the kernel
I checked, /proc/self/statm does not show any meaningful difference in
virtual memory size and/or RSS before and after thread creation.  I
could only see really meaningful information by checking the system-wide
/proc/meminfo across thread creation: MemFree, MemAvailable, and
Committed_AS show a large difference without the patch.  I think trying
to use this kind of information in a testcase is fragile.

The BZ#18988 report shows that the committed pages are easily seen with
mlockall (MCL_FUTURE) (which locks all pages that become mapped in the
process); however, a more straightforward testcase shows that
pthread_create can be faster with this patch:

--
#include <pthread.h>
#include <stddef.h>

static const int inner_count = 256;
static const int outer_count = 128;

/* Attributes shared by every pthread_create call below.  */
static pthread_attr_t a;

static void *thread1(void *arg)
{
  return NULL;
}

static void *sleeper(void *arg)
{
  pthread_t ts[inner_count];
  for (int i = 0; i < inner_count; i++)
    pthread_create (&ts[i], &a, thread1, NULL);
  for (int i = 0; i < inner_count; i++)
    pthread_join (ts[i], NULL);

  return NULL;
}

int main(void)
{
  pthread_attr_init(&a);
  pthread_attr_setguardsize(&a, 1<<20);
  pthread_attr_setstacksize(&a, 1134592);

  pthread_t ts[outer_count];
  for (int i = 0; i < outer_count; i++)
    pthread_create(&ts[i], &a, sleeper, NULL);
  for (int i = 0; i < outer_count; i++)
    pthread_join(ts[i], NULL);
  return 0;
}
--

On x86_64 (4.4.0-45-generic, gcc 5.4.0), running this small test
without the patch I see:

$ time ./test

real    0m3.647s
user    0m0.080s
sys     0m11.836s

While with the patch I see:

$ time ./test

real    0m0.696s
user    0m0.040s
sys     0m1.152s

So I added a pthread_create benchtest (thread_create)
which checks the thread creation latency.  As with the simple test
above, I saw improvements in thread creation on all architectures where
I tested the change.

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
arm-linux-gnueabihf, and powerpc64le-linux-gnu.

        [BZ #18988]
        * benchtests/thread_create-inputs: New file.
        * benchtests/thread_create-source.c: Likewise.
        * support/xpthread_attr_setguardsize.c: Likewise.
        * support/Makefile (libsupport-routines): Add
        xpthread_attr_setguardsize object.
        * support/xthread.h: Add xpthread_attr_setguardsize prototype.
        * benchtests/Makefile (bench-pthread): Add thread_create.
        * nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and
        then mprotect the required area.
---
 ChangeLog                            | 11 +++++++
 benchtests/Makefile                  |  2 +-
 benchtests/thread_create-inputs      | 14 +++++++++
 benchtests/thread_create-source.c    | 58 ++++++++++++++++++++++++++++++++++++
 nptl/allocatestack.c                 | 35 +++++++++++++++++++++-
 support/Makefile                     |  1 +
 support/xpthread_attr_setguardsize.c | 26 ++++++++++++++++
 support/xthread.h                    |  2 ++
 8 files changed, 147 insertions(+), 2 deletions(-)
 create mode 100644 benchtests/thread_create-inputs
 create mode 100644 benchtests/thread_create-source.c
 create mode 100644 support/xpthread_attr_setguardsize.c

-- 
2.7.4

diff --git a/benchtests/Makefile b/benchtests/Makefile
index 81edf8a..6535373 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -25,7 +25,7 @@ bench-math := acos acosh asin asinh atan atanh cos cosh exp exp2 log log2 \
               modf pow rint sin sincos sinh sqrt tan tanh fmin fmax fminf \
               fmaxf
 
-bench-pthread := pthread_once
+bench-pthread := pthread_once thread_create
 
 bench-string := ffs ffsll
 
diff --git a/benchtests/thread_create-inputs b/benchtests/thread_create-inputs
new file mode 100644
index 0000000..e3ca03b
--- /dev/null
+++ b/benchtests/thread_create-inputs
@@ -0,0 +1,14 @@
+## args: int:size_t:size_t
+## init: thread_create_init
+## includes: pthread.h
+## include-sources: thread_create-source.c
+
+## name: stack=1024,guard=1
+32, 1024, 1
+## name: stack=1024,guard=2
+32, 1024, 2
+
+## name: stack=2048,guard=1
+32, 2048, 1
+## name: stack=2048,guard=2
+32, 2048, 2
diff --git a/benchtests/thread_create-source.c b/benchtests/thread_create-source.c
new file mode 100644
index 0000000..74e7777
--- /dev/null
+++ b/benchtests/thread_create-source.c
@@ -0,0 +1,58 @@
+/* Measure pthread_create thread creation with different stack
+   and guard sizes.
+
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+
+static size_t pgsize;
+
+static void
+thread_create_init (void)
+{
+  pgsize = sysconf (_SC_PAGESIZE);
+}
+
+static void *
+thread_dummy (void *arg)
+{
+  return NULL;
+}
+
+static void
+thread_create (int nthreads, size_t stacksize, size_t guardsize)
+{
+  pthread_attr_t attr;
+  xpthread_attr_init (&attr);
+
+  stacksize = stacksize * pgsize;
+  guardsize = guardsize * pgsize;
+
+  xpthread_attr_setstacksize (&attr, stacksize);
+  xpthread_attr_setguardsize (&attr, guardsize);
+
+  pthread_t ts[nthreads];
+
+  for (int i = 0; i < nthreads; i++)
+    ts[i] = xpthread_create (&attr, thread_dummy, NULL);
+
+  for (int i = 0; i < nthreads; i++)
+    xpthread_join (ts[i]);
+}
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index e52c698..25c5698 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -490,7 +490,13 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
           size += pagesize_m1 + 1;
 #endif
 
-          mem = mmap (NULL, size, prot,
+          /* If a guard page is required, avoid committing memory by first
+             allocate with PROT_NONE and then reserve with required permission
+             excluding the guard page.  */
+          int prot_mmap = PROT_NONE;
+          if (__glibc_unlikely (guardsize == 0))
+            prot_mmap = prot;
+          mem = mmap (NULL, size, prot_mmap,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
 
           if (__glibc_unlikely (mem == MAP_FAILED))
@@ -510,9 +516,36 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
                             - TLS_PRE_TCB_SIZE);
 #endif
 
+          /* Now mprotect the required region excluding the guard area.  */
+          if (__glibc_likely (prot_mmap == PROT_NONE))
+            {
+              char *mprotstart = mem;
+              size_t mprotsize = size;
+#ifdef NEED_SEPARATE_REGISTER_STACK
+              mprotstart += (((size - guardsize) / 2) & ~pagesize_m1)
+                            + guardsize;
+              mprotsize -= mprotstart - (char*) mem;
+#elif _STACK_GROWS_DOWN
+              mprotstart += guardsize;
+              mprotsize -= guardsize;
+#elif _STACK_GROWS_UP
+              char *guard = (char *) (((uintptr_t) pd - guardsize)
+                                      & ~pagesize_m1);
+              mprotsize -= guard - mprotstart;
+#endif
+              if (mprotect (mprotstart, mprotsize, prot) != 0)
+                {
+                  munmap (mem, size);
+                  return errno;
+                }
+            }
+
           /* Remember the stack-related values.  */
           pd->stackblock = mem;
           pd->stackblock_size = size;
+          /* Update guardsize for newly allocated guardsize to avoid
+             an mprotect in guard resize below.  */
+          pd->guardsize = guardsize;
 
           /* We allocated the first block thread-specific data array.
              This address will not change for the lifetime of this
diff --git a/support/Makefile b/support/Makefile
index 2ace559..c0a443f 100644
--- a/support/Makefile
+++ b/support/Makefile
@@ -68,6 +68,7 @@ libsupport-routines = \
   xpthread_attr_init \
   xpthread_attr_setdetachstate \
   xpthread_attr_setstacksize \
+  xpthread_attr_setguardsize \
   xpthread_barrier_destroy \
   xpthread_barrier_init \
   xpthread_barrier_wait \
diff --git a/support/xpthread_attr_setguardsize.c b/support/xpthread_attr_setguardsize.c
new file mode 100644
index 0000000..35fed5d
--- /dev/null
+++ b/support/xpthread_attr_setguardsize.c
@@ -0,0 +1,26 @@
+/* pthread_attr_setguardsize with error checking.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+
+void
+xpthread_attr_setguardsize (pthread_attr_t *attr, size_t guardsize)
+{
+  xpthread_check_return ("pthread_attr_setguardize",
+                         pthread_attr_setguardsize (attr, guardsize));
+}
diff --git a/support/xthread.h b/support/xthread.h
index 6dd7e70..3552a73 100644
--- a/support/xthread.h
+++ b/support/xthread.h
@@ -67,6 +67,8 @@ void xpthread_attr_setdetachstate (pthread_attr_t *attr,
                                    int detachstate);
 void xpthread_attr_setstacksize (pthread_attr_t *attr,
                                  size_t stacksize);
+void xpthread_attr_setguardsize (pthread_attr_t *attr,
+                                 size_t guardsize);
 
 /* This function returns non-zero if pthread_barrier_wait returned
    PTHREAD_BARRIER_SERIAL_THREAD.  */
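
For anyone who wants to try the allocation order from the
nptl/allocatestack.c hunks outside of glibc, here is a minimal
standalone sketch.  It is not the glibc code: reserve_stack and
reserve_stack_demo are hypothetical names, only the simple
downward-growing stack layout is covered, both sizes are assumed to be
page multiples with the guard smaller than the total, and MAP_STACK is
left out for portability.

```c
#ifndef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1      /* Expose MAP_ANONYMOUS on glibc.  */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not part of glibc: map SIZE bytes for a
   downward-growing stack whose lowest GUARDSIZE bytes stay inaccessible.
   Assumes both values are multiples of the page size and
   GUARDSIZE < SIZE.  Returns the base of the mapping, or NULL.  */
static void *
reserve_stack (size_t size, size_t guardsize)
{
  int prot = PROT_READ | PROT_WRITE;

  /* With a guard, map everything PROT_NONE first so the guard pages are
     never mapped writable; without one, map read/write directly.  */
  int prot_mmap = guardsize == 0 ? prot : PROT_NONE;

  char *mem = mmap (NULL, size, prot_mmap,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return NULL;

  /* Open up only the usable region, which sits above the guard.  */
  if (prot_mmap == PROT_NONE
      && mprotect (mem + guardsize, size - guardsize, prot) != 0)
    {
      munmap (mem, size);
      return NULL;
    }
  return mem;
}

/* Hypothetical self-check: returns 1 if the first and last usable bytes
   can be written and read back, 0 otherwise.  */
static int
reserve_stack_demo (void)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  size_t size = 16 * page;
  size_t guard = page;

  char *mem = reserve_stack (size, guard);
  if (mem == NULL)
    return 0;

  mem[guard] = 1;               /* First byte above the guard.  */
  mem[size - 1] = 2;            /* Last byte of the mapping.  */
  int ok = mem[guard] == 1 && mem[size - 1] == 2;

  munmap (mem, size);
  return ok;
}
```

Calling reserve_stack_demo () from a small main should return 1 on a
Linux system; touching any byte below mem + guard is expected to fault,
since that range keeps the PROT_NONE protection from the initial mmap.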