From patchwork Tue Jun 24 19:33:54 2025
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 899581
Date: Tue, 24 Jun 2025 12:33:54 -0700
In-Reply-To: <20250624193359.3865351-1-surenb@google.com>
References: <20250624193359.3865351-1-surenb@google.com>
Message-ID: <20250624193359.3865351-3-surenb@google.com>
Subject: [PATCH v5 2/7] selftests/proc: extend /proc/pid/maps tearing test
 to include vma resizing
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com,
 vbabka@suse.cz, peterx@redhat.com, jannh@google.com, hannes@cmpxchg.org,
 mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org,
 adobriyan@gmail.com, brauner@kernel.org, josef@toxicpanda.com,
 yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org,
 osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com,
 christophe.leroy@csgroup.eu, tjmercier@google.com, kaleshsingh@google.com,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org, surenb@google.com

Test that /proc/pid/maps does not report unexpected holes in the address
space while a vma at the edge of the page is being concurrently remapped.
The remapping shrinks and then re-expands the vma from under the reader;
we should always see either the shrunk or the expanded (original) version
of the vma.
Signed-off-by: Suren Baghdasaryan
---
 tools/testing/selftests/proc/proc-pid-vm.c | 83 ++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/tools/testing/selftests/proc/proc-pid-vm.c b/tools/testing/selftests/proc/proc-pid-vm.c
index 6e3f06376a1f..39842e4ec45f 100644
--- a/tools/testing/selftests/proc/proc-pid-vm.c
+++ b/tools/testing/selftests/proc/proc-pid-vm.c
@@ -583,6 +583,86 @@ static void test_maps_tearing_from_split(int maps_fd,
 	signal_state(mod_info, TEST_DONE);
 }
 
+static inline void shrink_vma(const struct vma_modifier_info *mod_info)
+{
+	assert(mremap(mod_info->addr, page_size * 3, page_size, 0) != MAP_FAILED);
+}
+
+static inline void expand_vma(const struct vma_modifier_info *mod_info)
+{
+	assert(mremap(mod_info->addr, page_size, page_size * 3, 0) != MAP_FAILED);
+}
+
+static inline void check_shrink_result(struct line_content *mod_last_line,
+				       struct line_content *mod_first_line,
+				       struct line_content *restored_last_line,
+				       struct line_content *restored_first_line)
+{
+	/* Make sure only the last vma of the first page is changing */
+	assert(strcmp(mod_last_line->text, restored_last_line->text) != 0);
+	assert(strcmp(mod_first_line->text, restored_first_line->text) == 0);
+}
+
+static void test_maps_tearing_from_resize(int maps_fd,
+					  struct vma_modifier_info *mod_info,
+					  struct page_content *page1,
+					  struct page_content *page2,
+					  struct line_content *last_line,
+					  struct line_content *first_line)
+{
+	struct line_content shrunk_last_line;
+	struct line_content shrunk_first_line;
+	struct line_content restored_last_line;
+	struct line_content restored_first_line;
+
+	wait_for_state(mod_info, SETUP_READY);
+
+	/* re-read the file to avoid using stale data from previous test */
+	read_boundary_lines(maps_fd, page1, page2, last_line, first_line);
+
+	mod_info->vma_modify = shrink_vma;
+	mod_info->vma_restore = expand_vma;
+	mod_info->vma_mod_check = check_shrink_result;
+
+	capture_mod_pattern(maps_fd, mod_info, page1, page2, last_line, first_line,
+			    &shrunk_last_line, &shrunk_first_line,
+			    &restored_last_line, &restored_first_line);
+
+	/* Now start concurrent modifications for test_duration_sec */
+	signal_state(mod_info, TEST_READY);
+
+	struct line_content new_last_line;
+	struct line_content new_first_line;
+	struct timespec start_ts, end_ts;
+
+	clock_gettime(CLOCK_MONOTONIC_COARSE, &start_ts);
+	do {
+		read_boundary_lines(maps_fd, page1, page2, &new_last_line, &new_first_line);
+
+		/* Check if we read vmas after shrinking it */
+		if (!strcmp(new_last_line.text, shrunk_last_line.text)) {
+			/*
+			 * The vmas should be consistent with shrunk results,
+			 * however if the vma was concurrently restored, it
+			 * can be reported twice (first as shrunk one, then
+			 * as restored one) because we found it as the next vma
+			 * again. In that case new first line will be the same
+			 * as the last restored line.
+			 */
+			assert(!strcmp(new_first_line.text, shrunk_first_line.text) ||
+			       !strcmp(new_first_line.text, restored_last_line.text));
+		} else {
+			/* The vmas should be consistent with the original/restored state */
+			assert(!strcmp(new_last_line.text, restored_last_line.text) &&
+			       !strcmp(new_first_line.text, restored_first_line.text));
+		}
+		clock_gettime(CLOCK_MONOTONIC_COARSE, &end_ts);
+	} while (end_ts.tv_sec - start_ts.tv_sec < test_duration_sec);
+
+	/* Signal the modifier thread to stop and wait until it exits */
+	signal_state(mod_info, TEST_DONE);
+}
+
 static int test_maps_tearing(void)
 {
 	struct vma_modifier_info *mod_info;
@@ -674,6 +754,9 @@ static int test_maps_tearing(void)
 	test_maps_tearing_from_split(maps_fd, mod_info, &page1, &page2,
 				     &last_line, &first_line);
 
+	test_maps_tearing_from_resize(maps_fd, mod_info, &page1, &page2,
+				      &last_line, &first_line);
+
 	stop_vma_modifier(mod_info);
 
 	free(page2.data);
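For reference, the shrink/expand cycle that shrink_vma() and expand_vma()
drive can be reproduced standalone. The following is a minimal illustrative
sketch, not part of the patch; it assumes a freshly mapped 3-page anonymous
region, so the in-place re-expansion succeeds because the range freed by
the shrink is still vacant:

/*
 * Illustrative sketch of the shrink/expand cycle performed by the
 * modifier thread (all names below are local to this sketch).
 */
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *addr = mmap(NULL, page_size * 3, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	assert(addr != MAP_FAILED);
	/* Shrink the 3-page vma to 1 page in place, as shrink_vma() does */
	assert(mremap(addr, page_size * 3, page_size, 0) != MAP_FAILED);
	/* Grow it back to 3 pages in place, as expand_vma() does */
	assert(mremap(addr, page_size, page_size * 3, 0) != MAP_FAILED);
	return 0;
}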
From patchwork Tue Jun 24 19:33:56 2025
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 899580
Date: Tue, 24 Jun 2025 12:33:56 -0700
In-Reply-To: <20250624193359.3865351-1-surenb@google.com>
References: <20250624193359.3865351-1-surenb@google.com>
Message-ID: <20250624193359.3865351-5-surenb@google.com>
Subject: [PATCH v5 4/7] selftests/proc: test PROCMAP_QUERY ioctl while vma
 is concurrently modified
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com,
 vbabka@suse.cz, peterx@redhat.com, jannh@google.com, hannes@cmpxchg.org,
 mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org,
 adobriyan@gmail.com, brauner@kernel.org, josef@toxicpanda.com,
 yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org,
 osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com,
 christophe.leroy@csgroup.eu, tjmercier@google.com, kaleshsingh@google.com,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org, surenb@google.com

Extend the /proc/pid/maps tearing test to verify PROCMAP_QUERY ioctl
operation correctness while the vma is being concurrently modified.
Signed-off-by: Suren Baghdasaryan
---
 tools/testing/selftests/proc/proc-pid-vm.c | 60 ++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/tools/testing/selftests/proc/proc-pid-vm.c b/tools/testing/selftests/proc/proc-pid-vm.c
index 1aef2db7e893..b582f40851fb 100644
--- a/tools/testing/selftests/proc/proc-pid-vm.c
+++ b/tools/testing/selftests/proc/proc-pid-vm.c
@@ -486,6 +486,21 @@ static void capture_mod_pattern(int maps_fd,
 	assert(strcmp(restored_first_line->text, first_line->text) == 0);
 }
 
+static void query_addr_at(int maps_fd, void *addr,
+			  unsigned long *vma_start, unsigned long *vma_end)
+{
+	struct procmap_query q;
+
+	memset(&q, 0, sizeof(q));
+	q.size = sizeof(q);
+	/* Find the vma at the given address */
+	q.query_addr = (unsigned long long)addr;
+	q.query_flags = 0;
+	assert(!ioctl(maps_fd, PROCMAP_QUERY, &q));
+	*vma_start = q.vma_start;
+	*vma_end = q.vma_end;
+}
+
 static inline void split_vma(const struct vma_modifier_info *mod_info)
 {
 	assert(mmap(mod_info->addr, page_size, mod_info->prot | PROT_EXEC,
@@ -546,6 +561,8 @@ static void test_maps_tearing_from_split(int maps_fd,
 	do {
 		bool last_line_changed;
 		bool first_line_changed;
+		unsigned long vma_start;
+		unsigned long vma_end;
 
 		read_boundary_lines(maps_fd, page1, page2, &new_last_line, &new_first_line);
 
@@ -576,6 +593,19 @@ static void test_maps_tearing_from_split(int maps_fd,
 		first_line_changed = strcmp(new_first_line.text, first_line->text) != 0;
 		assert(last_line_changed == first_line_changed);
 
+		/* Check if PROCMAP_QUERY ioctl() finds the right VMA */
+		query_addr_at(maps_fd, mod_info->addr + page_size,
+			      &vma_start, &vma_end);
+		/*
+		 * The vma at the split address can be either the same as
+		 * original one (if read before the split) or the same as the
+		 * first line in the second page (if read after the split).
+		 */
+		assert((vma_start == last_line->start_addr &&
+			vma_end == last_line->end_addr) ||
+		       (vma_start == split_first_line.start_addr &&
+			vma_end == split_first_line.end_addr));
+
 		clock_gettime(CLOCK_MONOTONIC_COARSE, &end_ts);
 	} while (end_ts.tv_sec - start_ts.tv_sec < test_duration_sec);
 
@@ -637,6 +667,9 @@ static void test_maps_tearing_from_resize(int maps_fd,
 
 	clock_gettime(CLOCK_MONOTONIC_COARSE, &start_ts);
 	do {
+		unsigned long vma_start;
+		unsigned long vma_end;
+
 		read_boundary_lines(maps_fd, page1, page2, &new_last_line, &new_first_line);
 
 		/* Check if we read vmas after shrinking it */
@@ -656,6 +689,17 @@
 			assert(!strcmp(new_last_line.text, restored_last_line.text) &&
 			       !strcmp(new_first_line.text, restored_first_line.text));
 		}
+
+		/* Check if PROCMAP_QUERY ioctl() finds the right VMA */
+		query_addr_at(maps_fd, mod_info->addr, &vma_start, &vma_end);
+		/*
+		 * The vma should stay at the same address and have either the
+		 * original size of 3 pages or 1 page if read after shrinking.
+		 */
+		assert(vma_start == last_line->start_addr &&
+		       (vma_end - vma_start == page_size * 3 ||
+			vma_end - vma_start == page_size));
+
 		clock_gettime(CLOCK_MONOTONIC_COARSE, &end_ts);
 	} while (end_ts.tv_sec - start_ts.tv_sec < test_duration_sec);
 
@@ -726,6 +770,9 @@ static void test_maps_tearing_from_remap(int maps_fd,
 
 	clock_gettime(CLOCK_MONOTONIC_COARSE, &start_ts);
 	do {
+		unsigned long vma_start;
+		unsigned long vma_end;
+
 		read_boundary_lines(maps_fd, page1, page2, &new_last_line, &new_first_line);
 
 		/* Check if we read vmas after remapping it */
@@ -745,6 +792,19 @@ static void test_maps_tearing_from_remap(int maps_fd,
 			assert(!strcmp(new_last_line.text, restored_last_line.text) &&
 			       !strcmp(new_first_line.text, restored_first_line.text));
 		}
+
+		/* Check if PROCMAP_QUERY ioctl() finds the right VMA */
+		query_addr_at(maps_fd, mod_info->addr + page_size, &vma_start, &vma_end);
+		/*
+		 * The vma should either stay at the same address and have the
+		 * original size of 3 pages or we should find the remapped vma
+		 * at the remap destination address with size of 1 page.
+		 */
+		assert((vma_start == last_line->start_addr &&
+			vma_end - vma_start == page_size * 3) ||
+		       (vma_start == last_line->start_addr + page_size &&
+			vma_end - vma_start == page_size));
+
 		clock_gettime(CLOCK_MONOTONIC_COARSE, &end_ts);
 	} while (end_ts.tv_sec - start_ts.tv_sec < test_duration_sec);
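As background: PROCMAP_QUERY and struct procmap_query come from
<linux/fs.h> (available since v6.11), and the test's query_addr_at()
wraps exactly this ioctl. Below is a minimal illustrative sketch of the
same lookup driven from userspace; it is not part of the patch and keeps
error handling to a bare minimum:

/*
 * Illustrative sketch: ask the kernel which vma covers a given address,
 * using the same PROCMAP_QUERY call the test issues via query_addr_at().
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* struct procmap_query, PROCMAP_QUERY */

int main(void)
{
	struct procmap_query q;
	int maps_fd = open("/proc/self/maps", O_RDONLY);

	if (maps_fd < 0)
		return 1;
	memset(&q, 0, sizeof(q));
	q.size = sizeof(q);
	/* query_flags == 0: find the vma covering query_addr */
	q.query_addr = (unsigned long long)(unsigned long)&q;
	q.query_flags = 0;
	if (ioctl(maps_fd, PROCMAP_QUERY, &q) == 0)
		printf("vma: %llx-%llx\n",
		       (unsigned long long)q.vma_start,
		       (unsigned long long)q.vma_end);
	return 0;
}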
From patchwork Tue Jun 24 19:33:58 2025
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 899579
Date: Tue, 24 Jun 2025 12:33:58 -0700
In-Reply-To: <20250624193359.3865351-1-surenb@google.com>
References: <20250624193359.3865351-1-surenb@google.com>
Message-ID: <20250624193359.3865351-7-surenb@google.com>
Subject: [PATCH v5 6/7] mm/maps: read proc/pid/maps under per-vma lock
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, david@redhat.com,
 vbabka@suse.cz, peterx@redhat.com, jannh@google.com, hannes@cmpxchg.org,
 mhocko@kernel.org, paulmck@kernel.org, shuah@kernel.org,
 adobriyan@gmail.com, brauner@kernel.org, josef@toxicpanda.com,
 yebin10@huawei.com, linux@weissschuh.net, willy@infradead.org,
 osalvador@suse.de, andrii@kernel.org, ryan.roberts@arm.com,
 christophe.leroy@csgroup.eu, tjmercier@google.com, kaleshsingh@google.com,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kselftest@vger.kernel.org, surenb@google.com

With maple_tree supporting vma tree traversal under RCU and per-vma
locks, /proc/pid/maps can be read while holding individual vma locks
instead of locking the entire address space. A completely lockless
approach (walking the vma tree under RCU) would be quite complex, the
main issue being that get_vma_name() uses callbacks which might not work
correctly with a stable vma copy and require the original (unstable)
vma - see special_mapping_name() for an example.
When per-vma lock acquisition fails, we take the mmap_lock for reading,
lock the vma, release the mmap_lock and continue. This fallback to the
mmap read lock guarantees that the reader makes forward progress even
under lock contention. It will interfere with the writer, but only for a
very short time while we are acquiring the per-vma lock, and only when
there was contention on the vma the reader is interested in. We shouldn't
see repeated fallbacks to mmap read locks in practice, as that would
require a very unlikely series of lock contentions (for instance due to
repeated vma split operations). However, even if this did somehow happen,
we would still make progress.

One case requiring special handling is when a vma changes between the
time it was found and the time it got locked. A problematic case would be
if the vma got shrunk so that its start moved higher in the address space
and a new vma was installed at the beginning:

reader found:               |--------VMA A--------|
VMA is modified:            |-VMA B-|----VMA A----|
reader locks modified VMA A
reader reports VMA A:       | gap   |----VMA A----|

This would result in reporting a gap in the address space that does not
exist. To prevent this we retry the lookup after locking the vma, but
only when we identify a gap and detect that the address space was changed
after we found the vma.

This change is designed to reduce mmap_lock contention and prevent a
process reading /proc/pid/maps files (often a low priority task, such as
monitoring/data collection services) from blocking address space updates.

Note that this change has a userspace-visible disadvantage: it allows for
sub-page data tearing as opposed to the previous mechanism, where data
tearing could happen only between pages of generated output data. Since
current userspace considers data tearing between pages to be acceptable,
we assume it will be able to handle sub-page data tearing as well.

Signed-off-by: Suren Baghdasaryan
---
 fs/proc/internal.h        |   5 ++
 fs/proc/task_mmu.c        | 123 ++++++++++++++++++++++++++++++++++----
 include/linux/mmap_lock.h |  11 ++++
 mm/mmap_lock.c            |  88 +++++++++++++++++++++++++++
 4 files changed, 217 insertions(+), 10 deletions(-)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 3d48ffe72583..7c235451c5ea 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -384,6 +384,11 @@ struct proc_maps_private {
 	struct task_struct *task;
 	struct mm_struct *mm;
 	struct vma_iterator iter;
+	loff_t last_pos;
+#ifdef CONFIG_PER_VMA_LOCK
+	bool mmap_locked;
+	struct vm_area_struct *locked_vma;
+#endif
 #ifdef CONFIG_NUMA
 	struct mempolicy *task_mempolicy;
 #endif
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 751479eb128f..33171afb5364 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -127,21 +127,118 @@ static void release_task_mempolicy(struct proc_maps_private *priv)
 }
 #endif
 
-static struct vm_area_struct *proc_get_vma(struct proc_maps_private *priv,
-					   loff_t *ppos)
+#ifdef CONFIG_PER_VMA_LOCK
+
+static void unlock_vma(struct proc_maps_private *priv)
 {
-	struct vm_area_struct *vma = vma_next(&priv->iter);
+	if (priv->locked_vma) {
+		vma_end_read(priv->locked_vma);
+		priv->locked_vma = NULL;
+	}
+}
+
+static const struct seq_operations proc_pid_maps_op;
+
+static inline bool lock_vma_range(struct seq_file *m,
+				  struct proc_maps_private *priv)
+{
+	/*
+	 * smaps and numa_maps perform page table walk, therefore require
+	 * mmap_lock but maps can be read with locking just the vma.
+	 */
+	if (m->op != &proc_pid_maps_op) {
+		if (mmap_read_lock_killable(priv->mm))
+			return false;
+		priv->mmap_locked = true;
+	} else {
+		rcu_read_lock();
+		priv->locked_vma = NULL;
+		priv->mmap_locked = false;
+	}
+
+	return true;
+}
+
+static inline void unlock_vma_range(struct proc_maps_private *priv)
+{
+	if (priv->mmap_locked) {
+		mmap_read_unlock(priv->mm);
+	} else {
+		unlock_vma(priv);
+		rcu_read_unlock();
+	}
+}
+
+static struct vm_area_struct *get_next_vma(struct proc_maps_private *priv,
+					   loff_t last_pos)
+{
+	struct vm_area_struct *vma;
+
+	if (priv->mmap_locked)
+		return vma_next(&priv->iter);
+
+	unlock_vma(priv);
+	vma = lock_next_vma(priv->mm, &priv->iter, last_pos);
+	if (!IS_ERR_OR_NULL(vma))
+		priv->locked_vma = vma;
+
+	return vma;
+}
+
+#else /* CONFIG_PER_VMA_LOCK */
+
+static inline bool lock_vma_range(struct seq_file *m,
+				  struct proc_maps_private *priv)
+{
+	return mmap_read_lock_killable(priv->mm) == 0;
+}
+
+static inline void unlock_vma_range(struct proc_maps_private *priv)
+{
+	mmap_read_unlock(priv->mm);
+}
+
+static struct vm_area_struct *get_next_vma(struct proc_maps_private *priv,
+					   loff_t last_pos)
+{
+	return vma_next(&priv->iter);
+}
+
+#endif /* CONFIG_PER_VMA_LOCK */
+
+static struct vm_area_struct *proc_get_vma(struct seq_file *m, loff_t *ppos)
+{
+	struct proc_maps_private *priv = m->private;
+	struct vm_area_struct *vma;
+
+	vma = get_next_vma(priv, *ppos);
+	/* EINTR is possible */
+	if (IS_ERR(vma))
+		return vma;
+
+	/* Store previous position to be able to restart if needed */
+	priv->last_pos = *ppos;
 	if (vma) {
-		*ppos = vma->vm_start;
+		/*
+		 * Track the end of the reported vma to ensure position changes
+		 * even if previous vma was merged with the next vma and we
+		 * found the extended vma with the same vm_start.
+		 */
+		*ppos = vma->vm_end;
 	} else {
-		*ppos = -2UL;
+		*ppos = -2UL; /* -2 indicates gate vma */
 		vma = get_gate_vma(priv->mm);
 	}
 
 	return vma;
 }
 
+static inline bool is_sentinel_pos(unsigned long pos)
+{
+	return pos == -1UL || pos == -2UL;
+}
+
 static void *m_start(struct seq_file *m, loff_t *ppos)
 {
 	struct proc_maps_private *priv = m->private;
@@ -163,28 +260,34 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
 		return NULL;
 	}
 
-	if (mmap_read_lock_killable(mm)) {
+	if (!lock_vma_range(m, priv)) {
 		mmput(mm);
 		put_task_struct(priv->task);
 		priv->task = NULL;
 		return ERR_PTR(-EINTR);
 	}
 
+	/*
+	 * Reset current position if last_addr was set before
+	 * and it's not a sentinel.
+	 */
+	if (last_addr > 0 && !is_sentinel_pos(last_addr))
+		*ppos = last_addr = priv->last_pos;
+
 	vma_iter_init(&priv->iter, mm, last_addr);
 	hold_task_mempolicy(priv);
 	if (last_addr == -2UL)
 		return get_gate_vma(mm);
 
-	return proc_get_vma(priv, ppos);
+	return proc_get_vma(m, ppos);
 }
 
 static void *m_next(struct seq_file *m, void *v, loff_t *ppos)
 {
 	if (*ppos == -2UL) {
-		*ppos = -1UL;
+		*ppos = -1UL; /* -1 indicates no more vmas */
 		return NULL;
 	}
-	return proc_get_vma(m->private, ppos);
+	return proc_get_vma(m, ppos);
 }
 
 static void m_stop(struct seq_file *m, void *v)
@@ -196,7 +299,7 @@ static void m_stop(struct seq_file *m, void *v)
 		return;
 
 	release_task_mempolicy(priv);
-	mmap_read_unlock(mm);
+	unlock_vma_range(priv);
 	mmput(mm);
 	put_task_struct(priv->task);
 	priv->task = NULL;
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 5da384bd0a26..1f4f44951abe 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -309,6 +309,17 @@ void vma_mark_detached(struct vm_area_struct *vma);
 struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 					  unsigned long address);
 
+/*
+ * Locks the next vma pointed to by the iterator. Confirms that the locked
+ * vma has not been modified and will retry under mmap_lock protection if
+ * modification was detected. Should be called from an RCU read section.
+ * Returns either a valid locked VMA, NULL if there are no more VMAs, or
+ * -EINTR if the process was interrupted.
+ */
+struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
+				     struct vma_iterator *iter,
+				     unsigned long address);
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline void mm_lock_seqcount_init(struct mm_struct *mm) {}
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 5f725cc67334..ed0e5e2171cd 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -178,6 +178,94 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	count_vm_vma_lock_event(VMA_LOCK_ABORT);
 	return NULL;
 }
+
+static struct vm_area_struct *lock_vma_under_mmap_lock(struct mm_struct *mm,
+							struct vma_iterator *iter,
+							unsigned long address)
+{
+	struct vm_area_struct *vma;
+	int ret;
+
+	ret = mmap_read_lock_killable(mm);
+	if (ret)
+		return ERR_PTR(ret);
+
+	/* Lookup the vma at the last position again under mmap_read_lock */
+	vma_iter_init(iter, mm, address);
+	vma = vma_next(iter);
+	if (vma)
+		vma_start_read_locked(vma);
+
+	mmap_read_unlock(mm);
+
+	return vma;
+}
+
+struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
+				     struct vma_iterator *iter,
+				     unsigned long address)
+{
+	struct vm_area_struct *vma;
+	unsigned int mm_wr_seq;
+	bool mmap_unlocked;
+
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "no rcu read lock held");
+retry:
+	/* Start mmap_lock speculation in case we need to verify the vma later */
+	mmap_unlocked = mmap_lock_speculate_try_begin(mm, &mm_wr_seq);
+	vma = vma_next(iter);
+	if (!vma)
+		return NULL;
+
+	vma = vma_start_read(mm, vma);
+
+	if (IS_ERR_OR_NULL(vma)) {
+		/*
+		 * Retry immediately if the vma gets detached from under us.
+		 * Infinite loop should not happen because the vma we find will
+		 * have to be constantly knocked out from under us.
+		 */
+		if (PTR_ERR(vma) == -EAGAIN) {
+			vma_iter_init(iter, mm, address);
+			goto retry;
+		}
+
+		goto out;
+	}
+
+	/*
+	 * Verify the vma we locked belongs to the same address space and it's
+	 * not behind the last search position.
+	 */
+	if (unlikely(vma->vm_mm != mm || address >= vma->vm_end))
+		goto out_unlock;
+
+	/*
+	 * vma can be ahead of the last search position but we need to verify
+	 * it was not shrunk after we found it and another vma has not been
+	 * installed ahead of it. Otherwise we might observe a gap that should
+	 * not be there.
+	 */
+	if (address < vma->vm_start) {
+		/* Verify only if the address space might have changed since vma lookup. */
+		if (!mmap_unlocked || mmap_lock_speculate_retry(mm, mm_wr_seq)) {
+			vma_iter_init(iter, mm, address);
+			if (vma != vma_next(iter))
+				goto out_unlock;
+		}
+	}
+
+	return vma;
+
+out_unlock:
+	vma_end_read(vma);
+out:
+	rcu_read_unlock();
+	vma = lock_vma_under_mmap_lock(mm, iter, address);
+	rcu_read_lock();
+
+	return vma;
+}
 #endif /* CONFIG_PER_VMA_LOCK */
 
 #ifdef CONFIG_LOCK_MM_AND_FIND_VMA
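The mmap_lock_speculate_try_begin()/mmap_lock_speculate_retry() pair used
by lock_next_vma() follows the classic sequence-count speculation pattern:
sample a writer sequence number before the lookup, and re-validate the
result only if a writer may have run in between. A rough userspace
rendition of that pattern, illustrative only (the real helpers live in
include/linux/mmap_lock.h and operate on the mm's write sequence count):

#include <stdatomic.h>
#include <stdbool.h>

/* Writers bump the count before and after each modification */
struct mm_seq { _Atomic unsigned int wr_seq; };

static void write_begin(struct mm_seq *s) { atomic_fetch_add(&s->wr_seq, 1); }
static void write_end(struct mm_seq *s)   { atomic_fetch_add(&s->wr_seq, 1); }

/* Reader: speculation can begin only when no writer is active (even count) */
static bool speculate_try_begin(struct mm_seq *s, unsigned int *seq)
{
	*seq = atomic_load_explicit(&s->wr_seq, memory_order_acquire);
	return (*seq & 1) == 0;
}

/* Reader: true means a writer may have raced us, so re-validate the result */
static bool speculate_retry(struct mm_seq *s, unsigned int seq)
{
	return atomic_load_explicit(&s->wr_seq, memory_order_acquire) != seq;
}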