From patchwork Thu Jan 16 16:54:59 2025
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 857927
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: Lorenzo Stoakes, Cristian Rodríguez
Subject: [PATCH v2] nptl: Add support for setup guard pages with MADV_GUARD_INSTALL
Date: Thu, 16 Jan 2025 13:54:59 -0300
Message-ID: <20250116165557.2289386-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.43.0

Linux 6.13 (662df3e5c3766) added a lightweight way to define guard areas
through the madvise syscall.  Instead of mprotecting the guard region
with PROT_NONE, userland can madvise the same area with a special flag,
and the kernel ensures that accessing the area will trigger a SIGSEGV
(as with a PROT_NONE mapping).

The madvise way has the advantage of lower kernel memory consumption for
the process page table (one less VMA per guard area), and slightly less
kernel contention (also due to fewer VMAs being tracked).

pthread_create allocates a new thread stack in one of two ways: if a
guard area is set (the default), it allocates the required memory range
with PROT_NONE and then mprotects the usable stack area.  Otherwise, if
a guard page is not set, it allocates the region with the required
flags.

With MADV_GUARD_INSTALL support, the stack region is allocated with the
required flags and the guard region is then installed with madvise.  If
the kernel does not support it, the usual way is used instead (and
MADV_GUARD_INSTALL is disabled for future stack creations).

The stack allocation strategy is recorded in the pthread struct, and it
is used in case the guard region needs to be resized.  To avoid an extra
field, 'user_stack' is repurposed and renamed to 'stack_mode'.

This patch also adds a proper test for the pthread guard.

I checked on x86_64, aarch64, powerpc64le, and hppa with kernel
6.13.0-rc7.

Changes from v1:
* Fixed MADV_GUARD_INSTALL on _STACK_GROWS_UP ABIs.
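For reference, the strategy can be illustrated with a small standalone
program (a sketch only, not part of the patch; the fallback define
mirrors the bits/mman-linux.h hunk below, and the guard placement
follows the _STACK_GROWS_DOWN layout):

/* Standalone illustration: allocate a stack-like mapping with its final
   protection flags, try to install a guard page with MADV_GUARD_INSTALL
   (Linux 6.13+), and fall back to mprotect/PROT_NONE on older kernels,
   which is the same strategy the patch applies in allocatestack.c.  */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_GUARD_INSTALL
# define MADV_GUARD_INSTALL 102   /* Value added by the patch below.  */
#endif

int
main (void)
{
  long pagesz = sysconf (_SC_PAGESIZE);
  size_t size = 8 * pagesz;

  /* With madvise guards no separate PROT_NONE VMA is needed, so the
     whole range is mapped with the final protection flags up front.  */
  char *mem = mmap (NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return 1;

  /* Guard at the lowest address, as for a _STACK_GROWS_DOWN stack.  */
  if (madvise (mem, pagesz, MADV_GUARD_INSTALL) == 0)
    puts ("guard installed with MADV_GUARD_INSTALL");
  /* A failure (EINVAL) means the kernel predates the guard advice; use
     the traditional PROT_NONE mapping instead.  */
  else if (mprotect (mem, pagesz, PROT_NONE) == 0)
    puts ("guard installed with PROT_NONE fallback");
  else
    return 1;

  /* In either mode, touching mem[0] now raises SIGSEGV.  */
  munmap (mem, size);
  return 0;
}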
---
 nptl/Makefile                             |   1 +
 nptl/TODO-testing                         |   4 -
 nptl/allocatestack.c                      | 263 ++++++++++-----
 nptl/descr.h                              |   8 +-
 nptl/nptl-stack.c                         |   2 +-
 nptl/pthread_create.c                     |   2 +-
 nptl/tst-guard1.c                         | 369 ++++++++++++++++++++++
 sysdeps/nptl/dl-tls_init_tp.c             |   2 +-
 sysdeps/nptl/fork.h                       |   2 +-
 sysdeps/unix/sysv/linux/bits/mman-linux.h |   2 +
 10 files changed, 560 insertions(+), 95 deletions(-)
 create mode 100644 nptl/tst-guard1.c

diff --git a/nptl/Makefile b/nptl/Makefile
index b7c63999a3..b04e25cd0d 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -289,6 +289,7 @@ tests = \
   tst-dlsym1 \
   tst-exec4 \
   tst-exec5 \
+  tst-guard1 \
   tst-initializers1 \
   tst-initializers1-c11 \
   tst-initializers1-c89 \
diff --git a/nptl/TODO-testing b/nptl/TODO-testing
index f50d2ceb51..46ebf3bc5c 100644
--- a/nptl/TODO-testing
+++ b/nptl/TODO-testing
@@ -1,7 +1,3 @@
-pthread_attr_setguardsize
-
-  test effectiveness
-
 pthread_attr_[sg]etschedparam
 
   what to test?
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 9c1a72bcf0..e2c9ac8143 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -146,10 +146,37 @@ get_cached_stack (size_t *sizep, void **memp)
   return result;
 }
 
+/* Assume support for MADV_GUARD_INSTALL; setup_stack_prot will disable it
+   and fall back to ALLOCATE_GUARD_PROT_NONE if the madvise call fails.  */
+static int allocate_stack_mode = ALLOCATE_GUARD_MADV_GUARD;
+
+static inline int stack_prot (void)
+{
+  return (PROT_READ | PROT_WRITE
+          | ((GL(dl_stack_flags) & PF_X) ? PROT_EXEC : 0));
+}
+
+static void *
+allocate_thread_stack (size_t size, size_t guardsize)
+{
+  /* MADV_GUARD_INSTALL does not require an additional PROT_NONE mapping.  */
+  int prot = stack_prot ();
+
+  if (atomic_load_relaxed (&allocate_stack_mode) == ALLOCATE_GUARD_PROT_NONE)
+    /* If a guard page is required, avoid committing memory by first
+       allocating with PROT_NONE and then reserving with the required
+       permissions, excluding the guard page.  */
+    prot = guardsize == 0 ? prot : PROT_NONE;
+
+  return __mmap (NULL, size, prot, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK,
+                 -1, 0);
+}
+
+
 /* Return the guard page position on allocated stack.  */
 static inline char *
 __attribute ((always_inline))
-guard_position (void *mem, size_t size, size_t guardsize, struct pthread *pd,
+guard_position (void *mem, size_t size, size_t guardsize, const struct pthread *pd,
                 size_t pagesize_m1)
 {
 #if _STACK_GROWS_DOWN
@@ -159,27 +186,131 @@ guard_position (void *mem, size_t size, size_t guardsize, struct pthread *pd,
 #endif
 }
 
-/* Based on stack allocated with PROT_NONE, setup the required portions with
-   'prot' flags based on the guard page position.  */
-static inline int
-setup_stack_prot (char *mem, size_t size, char *guard, size_t guardsize,
-                  const int prot)
+/* Set up the MEM thread stack of SIZE bytes with the required protection
+   flags, along with a guard area of GUARDSIZE bytes.  It first tries
+   MADV_GUARD_INSTALL, and then falls back to setting up the guard area
+   using the extra PROT_NONE mapping.  Update PD with the type of guard
+   area setup.  */
+static inline bool
+setup_stack_prot (char *mem, size_t size, struct pthread *pd,
+                  size_t guardsize, size_t pagesize_m1)
 {
-  char *guardend = guard + guardsize;
+  if (__glibc_unlikely (guardsize == 0))
+    return true;
+
+  char *guard = guard_position (mem, size, guardsize, pd, pagesize_m1);
+  if (atomic_load_relaxed (&allocate_stack_mode) == ALLOCATE_GUARD_MADV_GUARD)
+    {
+      if (__madvise (guard, guardsize, MADV_GUARD_INSTALL) == 0)
+        {
+          pd->stack_mode = ALLOCATE_GUARD_MADV_GUARD;
+          return true;
+        }
+
+      /* If madvise fails it means the kernel does not support the guard
+         advice (we assume that the syscall is available, the guard is
+         page-aligned, and the length is non-negative).  The stack already
+         has the expected protection flags, so it just needs to PROT_NONE
+         the guard area.  */
+      atomic_store_relaxed (&allocate_stack_mode, ALLOCATE_GUARD_PROT_NONE);
+      if (__mprotect (guard, guardsize, PROT_NONE) != 0)
+        return false;
+    }
+  else
+    {
+      const int prot = stack_prot ();
+      char *guardend = guard + guardsize;
 #if _STACK_GROWS_DOWN
-  /* As defined at guard_position, for architectures with downward stack
-     the guard page is always at start of the allocated area.  */
-  if (__mprotect (guardend, size - guardsize, prot) != 0)
-    return errno;
+      /* As defined at guard_position, for architectures with downward stack
+         the guard page is always at start of the allocated area.  */
+      if (__mprotect (guardend, size - guardsize, prot) != 0)
+        return false;
 #else
-  size_t mprots1 = (uintptr_t) guard - (uintptr_t) mem;
-  if (__mprotect (mem, mprots1, prot) != 0)
-    return errno;
-  size_t mprots2 = ((uintptr_t) mem + size) - (uintptr_t) guardend;
-  if (__mprotect (guardend, mprots2, prot) != 0)
-    return errno;
+      size_t mprots1 = (uintptr_t) guard - (uintptr_t) mem;
+      if (__mprotect (mem, mprots1, prot) != 0)
+        return false;
+      size_t mprots2 = ((uintptr_t) mem + size) - (uintptr_t) guardend;
+      if (__mprotect (guardend, mprots2, prot) != 0)
+        return false;
 #endif
-  return 0;
+    }
+
+  pd->stack_mode = ALLOCATE_GUARD_PROT_NONE;
+  return true;
+}
+
+/* Update the guard area of the thread stack MEM of size SIZE with the new
+   GUARDSIZE.  It uses the method defined by the PD stack_mode.  */
+static inline bool
+adjust_stack_prot (char *mem, size_t size, const struct pthread *pd,
+                   size_t guardsize, size_t pagesize_m1)
+{
+  /* The required guard area is larger than the current one.  For
+     _STACK_GROWS_DOWN it means the guard should increase as:
+
+     |guard|stack---------------------------------|
+     |new guard--|stack---------------------------|
+
+     while for _STACK_GROWS_UP:
+
+     |stack---------------------------|guard|-----|
+     |stack--------------------|new guard---|-----|
+
+     Both madvise and mprotect allow overlapping the required region,
+     so use the new guard placement with the new size.  */
+  if (guardsize > pd->guardsize)
+    {
+      char *guard = guard_position (mem, size, guardsize, pd, pagesize_m1);
+      if (pd->stack_mode == ALLOCATE_GUARD_MADV_GUARD)
+        return __madvise (guard, guardsize, MADV_GUARD_INSTALL) == 0;
+      else if (pd->stack_mode == ALLOCATE_GUARD_PROT_NONE)
+        return __mprotect (guard, guardsize, PROT_NONE) == 0;
+    }
+  /* The current guard area is larger than the required one.  For
+     _STACK_GROWS_DOWN it means changing the guard as:
+
+     |guard-------|stack-------------------------|
+     |new guard|stack----------------------------|
+
+     And for _STACK_GROWS_UP:
+
+     |stack---------------------|guard-------|---|
+     |stack------------------------|new guard|---|
+
+     For ALLOCATE_GUARD_MADV_GUARD it means removing the slack area
+     (the disjoint region between guard and new guard), while for
+     ALLOCATE_GUARD_PROT_NONE it requires mprotecting it with the stack
+     protection flags.  */
+  else if (pd->guardsize > guardsize)
+    {
+      size_t slacksize = pd->guardsize - guardsize;
+      if (pd->stack_mode == ALLOCATE_GUARD_MADV_GUARD)
+        {
+          void *slack =
+#if _STACK_GROWS_DOWN
+            mem + guardsize;
+#else
+            guard_position (mem, size, pd->guardsize, pd, pagesize_m1);
+#endif
+          return __madvise (slack, slacksize, MADV_GUARD_REMOVE) == 0;
+        }
+      else if (pd->stack_mode == ALLOCATE_GUARD_PROT_NONE)
+        {
+          const int prot = stack_prot ();
+#if _STACK_GROWS_DOWN
+          return __mprotect (mem + guardsize, slacksize, prot) == 0;
+#else
+          char *new_guard = (char *)(((uintptr_t) pd - guardsize)
+                                     & ~pagesize_m1);
+          char *old_guard = (char *)(((uintptr_t) pd - pd->guardsize)
+                                     & ~pagesize_m1);
+          /* The guard size difference might be > 0, but once rounded
+             to the nearest page the size difference might be zero.  */
+          if (new_guard > old_guard
+              && __mprotect (old_guard, new_guard - old_guard, prot) != 0)
+            return false;
+#endif
+        }
+    }
+  return true;
 }
 
 /* Mark the memory of the stack as usable to the kernel.  It frees everything
@@ -291,7 +422,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 
       /* This is a user-provided stack.  It will not be queued in the
          stack cache nor will the memory (except the TLS memory) be freed.  */
-      pd->user_stack = true;
+      pd->stack_mode = ALLOCATE_GUARD_USER;
 
       /* This is at least the second thread.  */
       pd->header.multiple_threads = 1;
@@ -325,10 +456,7 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
       /* Allocate some anonymous memory.  If possible use the cache.  */
       size_t guardsize;
       size_t reported_guardsize;
-      size_t reqsize;
       void *mem;
-      const int prot = (PROT_READ | PROT_WRITE
-                        | ((GL(dl_stack_flags) & PF_X) ? PROT_EXEC : 0));
 
       /* Adjust the stack size for alignment.  */
       size &= ~tls_static_align_m1;
@@ -358,16 +486,10 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
         return EINVAL;
 
       /* Try to get a stack from the cache.  */
-      reqsize = size;
       pd = get_cached_stack (&size, &mem);
       if (pd == NULL)
         {
-          /* If a guard page is required, avoid committing memory by first
-             allocate with PROT_NONE and then reserve with required permission
-             excluding the guard page.  */
-          mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
-                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
-
+          mem = allocate_thread_stack (size, guardsize);
           if (__glibc_unlikely (mem == MAP_FAILED))
             return errno;
 
@@ -394,15 +516,10 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 #endif
 
           /* Now mprotect the required region excluding the guard area.  */
-          if (__glibc_likely (guardsize > 0))
+          if (!setup_stack_prot (mem, size, pd, guardsize, pagesize_m1))
             {
-              char *guard = guard_position (mem, size, guardsize, pd,
-                                            pagesize_m1);
-              if (setup_stack_prot (mem, size, guard, guardsize, prot) != 0)
-                {
-                  __munmap (mem, size);
-                  return errno;
-                }
+              __munmap (mem, size);
+              return errno;
             }
 
           /* Remember the stack-related values.  */
@@ -456,59 +573,31 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
             which will be read next.  */
        }
 
-      /* Create or resize the guard area if necessary.  */
-      if (__glibc_unlikely (guardsize > pd->guardsize))
+      /* Create or resize the guard area if necessary on an already
+         allocated stack.  */
+      if (!adjust_stack_prot (mem, size, pd, guardsize, pagesize_m1))
        {
-          char *guard = guard_position (mem, size, guardsize, pd,
-                                        pagesize_m1);
-          if (__mprotect (guard, guardsize, PROT_NONE) != 0)
-            {
-            mprot_error:
-              lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE);
-
-              /* Remove the thread from the list.  */
-              __nptl_stack_list_del (&pd->list);
+          lll_lock (GL (dl_stack_cache_lock), LLL_PRIVATE);
 
-              lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
+          /* Remove the thread from the list.  */
+          __nptl_stack_list_del (&pd->list);
 
-              /* Get rid of the TLS block we allocated.  */
-              _dl_deallocate_tls (TLS_TPADJ (pd), false);
+          lll_unlock (GL (dl_stack_cache_lock), LLL_PRIVATE);
 
-              /* Free the stack memory regardless of whether the size
-                 of the cache is over the limit or not.  If this piece
-                 of memory caused problems we better do not use it
-                 anymore.  Uh, and we ignore possible errors.  There
-                 is nothing we could do.  */
-              (void) __munmap (mem, size);
+          /* Get rid of the TLS block we allocated.  */
+          _dl_deallocate_tls (TLS_TPADJ (pd), false);
 
-              return errno;
-            }
+          /* Free the stack memory regardless of whether the size
+             of the cache is over the limit or not.  If this piece
+             of memory caused problems we better do not use it
+             anymore.  Uh, and we ignore possible errors.  There
+             is nothing we could do.  */
+          (void) __munmap (mem, size);
 
-          pd->guardsize = guardsize;
+          return errno;
        }
-      else if (__builtin_expect (pd->guardsize - guardsize > size - reqsize,
-                                 0))
-        {
-          /* The old guard area is too large.  */
-
-#if _STACK_GROWS_DOWN
-          if (__mprotect ((char *) mem + guardsize, pd->guardsize - guardsize,
-                          prot) != 0)
-            goto mprot_error;
-#elif _STACK_GROWS_UP
-          char *new_guard = (char *)(((uintptr_t) pd - guardsize)
-                                     & ~pagesize_m1);
-          char *old_guard = (char *)(((uintptr_t) pd - pd->guardsize)
-                                     & ~pagesize_m1);
-          /* The guard size difference might be > 0, but once rounded
-             to the nearest page the size difference might be zero.  */
-          if (new_guard > old_guard
-              && __mprotect (old_guard, new_guard - old_guard, prot) != 0)
-            goto mprot_error;
-#endif
-          pd->guardsize = guardsize;
-        }
+      pd->guardsize = guardsize;
 
       /* The pthread_getattr_np() calls need to get passed the size
          requested in the attribute, regardless of how large the
         actually used guardsize is.  */
@@ -568,19 +657,21 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
 static void
 name_stack_maps (struct pthread *pd, bool set)
 {
+  size_t adjust = pd->stack_mode == ALLOCATE_GUARD_PROT_NONE ?
+                  pd->guardsize : 0;
 #if _STACK_GROWS_DOWN
-  void *stack = pd->stackblock + pd->guardsize;
+  void *stack = pd->stackblock + adjust;
 #else
   void *stack = pd->stackblock;
 #endif
-  size_t stacksize = pd->stackblock_size - pd->guardsize;
+  size_t stacksize = pd->stackblock_size - adjust;
 
   if (!set)
-    __set_vma_name (stack, stacksize, NULL);
+    __set_vma_name (stack, stacksize, " glibc: unused stack");
   else
     {
       unsigned int tid = pd->tid;
-      if (pd->user_stack)
+      if (pd->stack_mode == ALLOCATE_GUARD_USER)
        SET_STACK_NAME (" glibc: pthread user stack: ", stack, stacksize, tid);
       else
        SET_STACK_NAME (" glibc: pthread stack: ", stack, stacksize, tid);
diff --git a/nptl/descr.h b/nptl/descr.h
index d0d30929e2..9c1ed54c56 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -125,6 +125,12 @@ struct priority_protection_data
   unsigned int priomap[];
 };
 
+enum allocate_stack_mode_t
+{
+  ALLOCATE_GUARD_MADV_GUARD = 0,
+  ALLOCATE_GUARD_PROT_NONE = 1,
+  ALLOCATE_GUARD_USER = 2,
+};
 
 /* Thread descriptor data structure.  */
 struct pthread
@@ -324,7 +330,7 @@ struct pthread
   bool report_events;
 
   /* True if the user provided the stack.  */
-  bool user_stack;
+  enum allocate_stack_mode_t stack_mode;
 
   /* True if thread must stop at startup time.  */
   bool stopped_start;
diff --git a/nptl/nptl-stack.c b/nptl/nptl-stack.c
index 503357f25d..c049c5133c 100644
--- a/nptl/nptl-stack.c
+++ b/nptl/nptl-stack.c
@@ -120,7 +120,7 @@ __nptl_deallocate_stack (struct pthread *pd)
      not reset the 'used' flag in the 'tid' field.  This is done by
      the kernel.  If no thread has been created yet this field is
      still zero.  */
-  if (__glibc_likely (! pd->user_stack))
+  if (__glibc_likely (pd->stack_mode != ALLOCATE_GUARD_USER))
     (void) queue_stack (pd);
   else
     /* Free the memory associated with the ELF TLS.  */
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 01e8a86980..0808f2e628 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -554,7 +554,7 @@ start_thread (void *arg)
      to avoid creating a new free-state block during thread release.  */
   __getrandom_vdso_release (pd);
 
-  if (!pd->user_stack)
+  if (pd->stack_mode != ALLOCATE_GUARD_USER)
     advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
                         pd->guardsize);
 
diff --git a/nptl/tst-guard1.c b/nptl/tst-guard1.c
new file mode 100644
index 0000000000..18df7ff301
--- /dev/null
+++ b/nptl/tst-guard1.c
@@ -0,0 +1,369 @@
+/* Basic tests for pthread guard area.
+   Copyright (C) 2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <array_length.h>
+#include <pthread.h>
+#include <pthreaddef.h>
+#include <setjmp.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <support/check.h>
+#include <support/support.h>
+#include <support/test-driver.h>
+#include <support/xsignal.h>
+#include <support/xthread.h>
+#include <support/xunistd.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+static long int pagesz;
+
+/* To check if the guard region is inaccessible, the thread tries
+   reads/writes on it and checks if a SIGSEGV is generated.  */
+
+static volatile sig_atomic_t signal_jump_set;
+static sigjmp_buf signal_jmp_buf;
+
+static void
+sigsegv_handler (int sig)
+{
+  if (signal_jump_set == 0)
+    return;
+
+  siglongjmp (signal_jmp_buf, sig);
+}
+
+static bool
+try_access_buf (char *ptr, bool write)
+{
+  signal_jump_set = true;
+
+  bool failed = sigsetjmp (signal_jmp_buf, 0) != 0;
+  if (!failed)
+    {
+      if (write)
+        *(volatile char *)(ptr) = 'x';
+      else
+        *(volatile char *)(ptr);
+    }
+
+  signal_jump_set = false;
+  return !failed;
+}
+
+static bool
+try_read_buf (char *ptr)
+{
+  return try_access_buf (ptr, false);
+}
+
+static bool
+try_write_buf (char *ptr)
+{
+  return try_access_buf (ptr, true);
+}
+
+static bool
+try_read_write_buf (char *ptr)
+{
+  return try_read_buf (ptr) && try_write_buf (ptr);
+}
+
+
+/* Return the guard region of the current thread (it only makes sense on
+   a thread created by pthread_create).  */
+
+struct stack_t
+{
+  char *stack;
+  size_t stacksize;
+  char *guard;
+  size_t guardsize;
+};
+
+static inline size_t
+adjust_stacksize (size_t stacksize)
+{
+  /* For some ABIs, the guard page position depends on the thread
+     descriptor, which in turn relies on the required static TLS.  The
+     only supported _STACK_GROWS_UP ABI, hppa, defines TLS_DTV_AT_TP,
+     and it is not straightforward to calculate the guard region with
+     the current pthread APIs.  So to get a correct stack size, assume
+     an extra page after the guard area.  */
+#if _STACK_GROWS_DOWN
+  return stacksize;
+#elif _STACK_GROWS_UP
+  return stacksize - pagesz;
+#endif
+}
+
+struct stack_t
+get_current_stack_info (void)
+{
+  pthread_attr_t attr;
+  TEST_VERIFY_EXIT (pthread_getattr_np (pthread_self (), &attr) == 0);
+  void *stack;
+  size_t stacksize;
+  TEST_VERIFY_EXIT (pthread_attr_getstack (&attr, &stack, &stacksize) == 0);
+  size_t guardsize;
+  TEST_VERIFY_EXIT (pthread_attr_getguardsize (&attr, &guardsize) == 0);
+  /* The guardsize is reported as the current page size, although it might
+     be adjusted to a larger value (aarch64 for instance).  */
+  if (guardsize != 0 && guardsize < ARCH_MIN_GUARD_SIZE)
+    guardsize = ARCH_MIN_GUARD_SIZE;
+
+#if _STACK_GROWS_DOWN
+  void *guard = guardsize ? stack - guardsize : 0;
+#elif _STACK_GROWS_UP
+  stacksize = adjust_stacksize (stacksize);
+  void *guard = guardsize ? stack + stacksize : 0;
+#endif
+
+  pthread_attr_destroy (&attr);
+
+  return (struct stack_t) { stack, stacksize, guard, guardsize };
+}
+
+struct thread_args_t
+{
+  size_t stacksize;
+  size_t guardsize;
+};
+
+struct thread_args_t
+get_thread_args (const pthread_attr_t *attr)
+{
+  size_t stacksize;
+  size_t guardsize;
+
+  TEST_COMPARE (pthread_attr_getstacksize (attr, &stacksize), 0);
+  TEST_COMPARE (pthread_attr_getguardsize (attr, &guardsize), 0);
+  if (guardsize < ARCH_MIN_GUARD_SIZE)
+    guardsize = ARCH_MIN_GUARD_SIZE;
+
+  return (struct thread_args_t) { stacksize, guardsize };
+}
+
+static void
+set_thread_args (pthread_attr_t *attr, const struct thread_args_t *args)
+{
+  xpthread_attr_setstacksize (attr, args->stacksize);
+  xpthread_attr_setguardsize (attr, args->guardsize);
+}
+
+static void *
+tf (void *closure)
+{
+  struct thread_args_t *args = closure;
+
+  struct stack_t s = get_current_stack_info ();
+  if (test_verbose)
+    printf ("debug: [tid=%jd] stack = { .stack=%p, stacksize=%#zx, guard=%p, "
+            "guardsize=%#zx }\n",
+            (intmax_t) gettid (),
+            s.stack,
+            s.stacksize,
+            s.guard,
+            s.guardsize);
+
+  if (args != NULL)
+    {
+      TEST_COMPARE (adjust_stacksize (args->stacksize), s.stacksize);
+      TEST_COMPARE (args->guardsize, s.guardsize);
+    }
+
+  /* Ensure we can access the stack area.  */
+  TEST_COMPARE (try_read_write_buf (s.stack), true);
+  TEST_COMPARE (try_read_write_buf (&s.stack[s.stacksize / 2]), true);
+  TEST_COMPARE (try_read_write_buf (&s.stack[s.stacksize - 1]), true);
+
+  /* Check if accessing the guard area results in SIGSEGV.  */
+  if (s.guardsize > 0)
+    {
+      TEST_COMPARE (try_read_write_buf (s.guard), false);
+      TEST_COMPARE (try_read_write_buf (&s.guard[s.guardsize / 2]), false);
+      TEST_COMPARE (try_read_write_buf (&s.guard[s.guardsize] - 1), false);
+    }
+
+  return NULL;
+}
+
+/* Test 1: caller-provided stack without guard.  */
+static void
+do_test1 (void)
+{
+  pthread_attr_t attr;
+  xpthread_attr_init (&attr);
+
+  size_t stacksize = support_small_thread_stack_size ();
+  void *stack = xmmap (0,
+                       stacksize,
+                       PROT_READ | PROT_WRITE,
+                       MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK,
+                       -1);
+  xpthread_attr_setstack (&attr, stack, stacksize);
+  xpthread_attr_setguardsize (&attr, 0);
+
+  struct thread_args_t args = { stacksize, 0 };
+  pthread_t t = xpthread_create (&attr, tf, &args);
+  void *status = xpthread_join (t);
+  TEST_VERIFY (status == 0);
+
+  xpthread_attr_destroy (&attr);
+  xmunmap (stack, stacksize);
+}
+
+/* Test 2: same as test 1, but with a guard area.  */
+static void
+do_test2 (void)
+{
+  pthread_attr_t attr;
+  xpthread_attr_init (&attr);
+
+  size_t stacksize = support_small_thread_stack_size ();
+  void *stack = xmmap (0,
+                       stacksize,
+                       PROT_READ | PROT_WRITE,
+                       MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK,
+                       -1);
+  xpthread_attr_setstack (&attr, stack, stacksize);
+  xpthread_attr_setguardsize (&attr, pagesz);
+
+  struct thread_args_t args = { stacksize, 0 };
+  pthread_t t = xpthread_create (&attr, tf, &args);
+  void *status = xpthread_join (t);
+  TEST_VERIFY (status == 0);
+
+  xpthread_attr_destroy (&attr);
+  xmunmap (stack, stacksize);
+}
+
+/* Test 3: pthread_create with default values.  */
+static void
+do_test3 (void)
+{
+  pthread_t t = xpthread_create (NULL, tf, NULL);
+  void *status = xpthread_join (t);
+  TEST_VERIFY (status == 0);
+}
+
+/* Test 4: pthread_create without a guard area.  */
+static void
+do_test4 (void)
+{
+  pthread_attr_t attr;
+  xpthread_attr_init (&attr);
+  struct thread_args_t args = get_thread_args (&attr);
+  args.stacksize += args.guardsize;
+  args.guardsize = 0;
+  set_thread_args (&attr, &args);
+
+  pthread_t t = xpthread_create (&attr, tf, &args);
+  void *status = xpthread_join (t);
+  TEST_VERIFY (status == 0);
+
+  xpthread_attr_destroy (&attr);
+}
+
+/* Test 5: pthread_create with non-default stack and guard size values.  */
+static void
+do_test5 (void)
+{
+  pthread_attr_t attr;
+  xpthread_attr_init (&attr);
+  struct thread_args_t args = get_thread_args (&attr);
+  args.guardsize += pagesz;
+  args.stacksize += pagesz;
+  set_thread_args (&attr, &args);
+
+  pthread_t t = xpthread_create (&attr, tf, &args);
+  void *status = xpthread_join (t);
+  TEST_VERIFY (status == 0);
+
+  xpthread_attr_destroy (&attr);
+}
+
+/* Test 6: thread with the required size (stack + guard) that matches
+   test 3, but with a larger guard area.  pthread_create will need to
+   increase the guard area.  */
+static void
+do_test6 (void)
+{
+  pthread_attr_t attr;
+  xpthread_attr_init (&attr);
+  struct thread_args_t args = get_thread_args (&attr);
+  args.guardsize += pagesz;
+  args.stacksize -= pagesz;
+  set_thread_args (&attr, &args);
+
+  pthread_t t = xpthread_create (&attr, tf, &args);
+  void *status = xpthread_join (t);
+  TEST_VERIFY (status == 0);
+
+  xpthread_attr_destroy (&attr);
+}
+
+/* Test 7: pthread_create with default values; the required size matches
+   the one from tests 3 and 6 (but with a reduced guard area).
+   pthread_create should use the cached stack from the previous tests,
+   but it will require reducing the guard area.  */
+static void
+do_test7 (void)
+{
+  pthread_t t = xpthread_create (NULL, tf, NULL);
+  void *status = xpthread_join (t);
+  TEST_VERIFY (status == 0);
+}
+
+static int
+do_test (void)
+{
+  pagesz = sysconf (_SC_PAGESIZE);
+
+  {
+    struct sigaction sa = {
+      .sa_handler = sigsegv_handler,
+      .sa_flags = SA_NODEFER,
+    };
+    sigemptyset (&sa.sa_mask);
+    xsigaction (SIGSEGV, &sa, NULL);
+    /* Some systems generate SIGBUS when accessing the guard area if it
+       is set up with madvise.  */
+    xsigaction (SIGBUS, &sa, NULL);
+  }
+
+  static const struct {
+    const char *descr;
+    void (*test)(void);
+  } tests[] = {
+    { "user provided stack without guard", do_test1 },
+    { "user provided stack with guard", do_test2 },
+    { "default attribute", do_test3 },
+    { "default attribute without guard", do_test4 },
+    { "non default stack and guard sizes", do_test5 },
+    { "reused stack with larger guard", do_test6 },
+    { "reused stack with smaller guard", do_test7 },
+  };
+
+  for (int i = 0; i < array_length (tests); i++)
+    {
+      printf ("debug: test%01d: %s\n", i, tests[i].descr);
+      tests[i].test ();
+    }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/sysdeps/nptl/dl-tls_init_tp.c b/sysdeps/nptl/dl-tls_init_tp.c
index c57738e9f3..20cc9202ec 100644
--- a/sysdeps/nptl/dl-tls_init_tp.c
+++ b/sysdeps/nptl/dl-tls_init_tp.c
@@ -72,7 +72,7 @@ __tls_init_tp (void)
   /* Early initialization of the TCB.  */
   pd->tid = INTERNAL_SYSCALL_CALL (set_tid_address, &pd->tid);
   THREAD_SETMEM (pd, specific[0], &pd->specific_1stblock[0]);
-  THREAD_SETMEM (pd, user_stack, true);
+  THREAD_SETMEM (pd, stack_mode, ALLOCATE_GUARD_USER);
 
   /* Before initializing GL (dl_stack_user), the debugger could not
     find us and had to set __nptl_initial_report_events.  Propagate
diff --git a/sysdeps/nptl/fork.h b/sysdeps/nptl/fork.h
index 6156af79e1..3c79179437 100644
--- a/sysdeps/nptl/fork.h
+++ b/sysdeps/nptl/fork.h
@@ -155,7 +155,7 @@ reclaim_stacks (void)
   INIT_LIST_HEAD (&GL (dl_stack_used));
   INIT_LIST_HEAD (&GL (dl_stack_user));
 
-  if (__glibc_unlikely (THREAD_GETMEM (self, user_stack)))
+  if (__glibc_unlikely (self->stack_mode == ALLOCATE_GUARD_USER))
     list_add (&self->list, &GL (dl_stack_user));
   else
     list_add (&self->list, &GL (dl_stack_used));
diff --git a/sysdeps/unix/sysv/linux/bits/mman-linux.h b/sysdeps/unix/sysv/linux/bits/mman-linux.h
index 8e072eb4cd..fe0496d802 100644
--- a/sysdeps/unix/sysv/linux/bits/mman-linux.h
+++ b/sysdeps/unix/sysv/linux/bits/mman-linux.h
@@ -113,6 +113,8 @@
                                                    locked pages too.  */
 # define MADV_COLLAPSE    25   /* Synchronous hugepage collapse.  */
 # define MADV_HWPOISON    100  /* Poison a page for testing.  */
+# define MADV_GUARD_INSTALL 102        /* Fatal signal on access to range.  */
+# define MADV_GUARD_REMOVE  103        /* Unguard range.  */
 #endif
 
 /* The POSIX people had to invent similar names for the same things.  */
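
The guard resize logic in adjust_stack_prot can likewise be illustrated
with a small sketch (again not part of the patch, and it requires a
6.13+ kernel; the constants are the bits/mman-linux.h values above).
As the in-code comment notes, the guard advice may overlap an existing
guard, so growing installs the larger guard in place, and shrinking
only removes the slack:

/* Grow and then shrink a madvise-based guard area, mirroring the
   ALLOCATE_GUARD_MADV_GUARD branches of adjust_stack_prot for the
   _STACK_GROWS_DOWN layout (guard at the lowest address).  */
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_GUARD_INSTALL
# define MADV_GUARD_INSTALL 102
#endif
#ifndef MADV_GUARD_REMOVE
# define MADV_GUARD_REMOVE 103
#endif

int
main (void)
{
  long pagesz = sysconf (_SC_PAGESIZE);
  size_t size = 16 * pagesz;
  char *mem = mmap (NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return 1;

  /* Initial one-page guard.  */
  if (madvise (mem, pagesz, MADV_GUARD_INSTALL) != 0)
    return 1;

  /* Growing: the new, larger guard is installed over the old one,
     since the advice may overlap an already guarded region.  */
  if (madvise (mem, 2 * pagesz, MADV_GUARD_INSTALL) != 0)
    return 1;

  /* Shrinking back to one page: only the slack (the page no longer
     part of the guard) is unguarded.  */
  if (madvise (mem + pagesz, pagesz, MADV_GUARD_REMOVE) != 0)
    return 1;

  munmap (mem, size);
  return 0;
}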