From patchwork Sat Mar 13 05:07:12 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 400073
Date: Fri, 12 Mar 2021 21:07:12 -0800
From: Andrew Morton
Subject: [patch 04/29] mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
Message-ID: <20210313050712.incikM-Od%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: Mike Rapoport
Subject: mm/page_alloc.c: refactor initialization of struct page for holes in memory layout

There could be struct pages that are not backed by actual physical
memory.  This can happen when the actual memory bank is not a multiple of
SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.

Such pages are currently initialized using the init_unavailable_mem()
function that iterates through PFNs in holes in memblock.memory and, if
there is a struct page corresponding to a PFN, sets the fields of this
page to default values and marks the page Reserved.

init_unavailable_mem() does not take into account the zone and node the
page belongs to and sets both zone and node links in struct page to zero.

Before commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock
regions rather that check each PFN") the holes inside a zone were
re-initialized during memmap_init() and got their zone/node links right.
However, after that commit nothing updates the struct pages representing
such holes.

On a system that has firmware reserved holes in a zone above ZONE_DMA,
for instance in a configuration like this:

	# grep -A1 E820 /proc/iomem
	7a17b000-7a216fff : Unknown E820 type
	7a217000-7bffffff : System RAM

the unset zone link in struct page will trigger

	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

in set_pfnblock_flags_mask() when it is called with a struct page from a
range other than E820_TYPE_RAM, because there are pages in the range of
ZONE_DMA32 but the unset zone link in struct page makes them appear as
part of ZONE_DMA.

Interleave initialization of the unavailable pages with the normal
initialization of the memory map, so that zone and node information is
properly set on struct pages that are not backed by actual memory.

With this change the pages for holes inside a zone will get proper
zone/node links, and the pages that are not spanned by any node will get
links to the adjacent zone/node.  The holes between nodes will be
prepended to the zone/node above the hole, and the trailing pages in the
last section will be appended to the zone/node below.

[akpm@linux-foundation.org: don't initialize static to zero, use %llu for u64]
Link: https://lkml.kernel.org/r/20210225224351.7356-2-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport
Reported-by: Qian Cai
Reported-by: Andrea Arcangeli
Reviewed-by: Baoquan He
Acked-by: Vlastimil Babka
Reviewed-by: David Hildenbrand
Cc: Borislav Petkov
Cc: Chris Wilson
Cc: "H. Peter Anvin"
Cc: Łukasz Majczak
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Michal Hocko
Cc: "Sarvela, Tomi P"
Cc: Thomas Gleixner
Cc:
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c | 158 +++++++++++++++++++++-------------------
 1 file changed, 75 insertions(+), 83 deletions(-)

--- a/mm/page_alloc.c~mm-page_allocc-refactor-initialization-of-struct-page-for-holes-in-memory-layout
+++ a/mm/page_alloc.c
@@ -6259,12 +6259,65 @@ static void __meminit zone_init_free_lis
 	}
 }
 
+#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
+/*
+ * Only struct pages that correspond to ranges defined by memblock.memory
+ * are zeroed and initialized by going through __init_single_page() during
+ * memmap_init_zone().
+ *
+ * But, there could be struct pages that correspond to holes in
+ * memblock.memory. This can happen because of the following reasons:
+ *  - physical memory bank size is not necessarily the exact multiple of the
+ *    arbitrary section size
+ *  - early reserved memory may not be listed in memblock.memory
+ *  - memory layouts defined with memmap= kernel parameter may not align
+ *    nicely with memmap sections
+ *
+ * Explicitly initialize those struct pages so that:
+ *  - PG_Reserved is set
+ *  - zone and node links point to zone and node that span the page if the
+ *    hole is in the middle of a zone
+ *  - zone and node links point to adjacent zone/node if the hole falls on
+ *    the zone boundary; the pages in such holes will be prepended to the
+ *    zone/node above the hole except for the trailing pages in the last
+ *    section that will be appended to the zone/node below.
+ */
+static u64 __meminit init_unavailable_range(unsigned long spfn,
+					    unsigned long epfn,
+					    int zone, int node)
+{
+	unsigned long pfn;
+	u64 pgcnt = 0;
+
+	for (pfn = spfn; pfn < epfn; pfn++) {
+		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
+			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
+				+ pageblock_nr_pages - 1;
+			continue;
+		}
+		__init_single_page(pfn_to_page(pfn), pfn, zone, node);
+		__SetPageReserved(pfn_to_page(pfn));
+		pgcnt++;
+	}
+
+	return pgcnt;
+}
+#else
+static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
+					 int zone, int node)
+{
+	return 0;
+}
+#endif
+
 void __meminit __weak memmap_init_zone(struct zone *zone)
 {
 	unsigned long zone_start_pfn = zone->zone_start_pfn;
 	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
 	int i, nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+	static unsigned long hole_pfn;
 	unsigned long start_pfn, end_pfn;
+	u64 pgcnt = 0;
 
 	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
 		start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
@@ -6274,7 +6327,29 @@ void __meminit __weak memmap_init_zone(s
 		memmap_init_range(end_pfn - start_pfn, nid,
 				  zone_id, start_pfn, zone_end_pfn,
 				  MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
+
+		if (hole_pfn < start_pfn)
+			pgcnt += init_unavailable_range(hole_pfn, start_pfn,
+							zone_id, nid);
+		hole_pfn = end_pfn;
 	}
+
+#ifdef CONFIG_SPARSEMEM
+	/*
+	 * Initialize the hole in the range [zone_end_pfn, section_end].
+	 * If zone boundary falls in the middle of a section, this hole
+	 * will be re-initialized during the call to this function for the
+	 * higher zone.
+	 */
+	end_pfn = round_up(zone_end_pfn, PAGES_PER_SECTION);
+	if (hole_pfn < end_pfn)
+		pgcnt += init_unavailable_range(hole_pfn, end_pfn,
+						zone_id, nid);
+#endif
+
+	if (pgcnt)
+		pr_info("  %s zone: %llu pages in unavailable ranges\n",
+			zone->name, pgcnt);
 }
 
 static int zone_batchsize(struct zone *zone)
@@ -7071,88 +7146,6 @@ void __init free_area_init_memoryless_no
 	free_area_init_node(nid);
 }
 
-#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
-/*
- * Initialize all valid struct pages in the range [spfn, epfn) and mark them
- * PageReserved(). Return the number of struct pages that were initialized.
- */
-static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
-{
-	unsigned long pfn;
-	u64 pgcnt = 0;
-
-	for (pfn = spfn; pfn < epfn; pfn++) {
-		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
-			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
-				+ pageblock_nr_pages - 1;
-			continue;
-		}
-		/*
-		 * Use a fake node/zone (0) for now. Some of these pages
-		 * (in memblock.reserved but not in memblock.memory) will
-		 * get re-initialized via reserve_bootmem_region() later.
-		 */
-		__init_single_page(pfn_to_page(pfn), pfn, 0, 0);
-		__SetPageReserved(pfn_to_page(pfn));
-		pgcnt++;
-	}
-
-	return pgcnt;
-}
-
-/*
- * Only struct pages that are backed by physical memory are zeroed and
- * initialized by going through __init_single_page(). But, there are some
- * struct pages which are reserved in memblock allocator and their fields
- * may be accessed (for example page_to_pfn() on some configuration accesses
- * flags). We must explicitly initialize those struct pages.
- *
- * This function also addresses a similar issue where struct pages are left
- * uninitialized because the physical address range is not covered by
- * memblock.memory or memblock.reserved.
- * That could happen when memblock layout is manually configured via
- * memmap=, or when the highest physical address (max_pfn) does not end
- * on a section boundary.
- */
-static void __init init_unavailable_mem(void)
-{
-	phys_addr_t start, end;
-	u64 i, pgcnt;
-	phys_addr_t next = 0;
-
-	/*
-	 * Loop through unavailable ranges not covered by memblock.memory.
-	 */
-	pgcnt = 0;
-	for_each_mem_range(i, &start, &end) {
-		if (next < start)
-			pgcnt += init_unavailable_range(PFN_DOWN(next),
-							PFN_UP(start));
-		next = end;
-	}
-
-	/*
-	 * Early sections always have a fully populated memmap for the whole
-	 * section - see pfn_valid(). If the last section has holes at the
-	 * end and that section is marked "online", the memmap will be
-	 * considered initialized. Make sure that memmap has a well defined
-	 * state.
-	 */
-	pgcnt += init_unavailable_range(PFN_DOWN(next),
-					round_up(max_pfn, PAGES_PER_SECTION));
-
-	/*
-	 * Struct pages that do not have backing memory. This could be because
-	 * firmware is using some of this memory, or for some other reasons.
-	 */
-	if (pgcnt)
-		pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
-}
-#else
-static inline void __init init_unavailable_mem(void)
-{
-}
-#endif /* !CONFIG_FLAT_NODE_MEM_MAP */
-
 #if MAX_NUMNODES > 1
 /*
  * Figure out the number of possible node ids.
@@ -7576,7 +7569,6 @@ void __init free_area_init(unsigned long
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();
-	init_unavailable_mem();
 	for_each_online_node(nid) {
 		pg_data_t *pgdat = NODE_DATA(nid);
 		free_area_init_node(nid);
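To make the hole-walk above concrete, here is a minimal userspace sketch
of its skip logic (illustrative only, not the kernel code: pfn_valid() is
stubbed out and the pageblock size is an assumed 2MB with 4KB pages).
Blocks whose memmap does not exist are skipped a whole pageblock at a
time; every other PFN would be initialized with the zone/node of the
enclosing zone:

	#include <stdint.h>
	#include <stdio.h>

	#define PAGEBLOCK_NR_PAGES 512UL	/* assumed: 2MB pageblocks, 4KB pages */

	/* Stand-in for the kernel's pfn_valid(); here PFNs below 0x1000 have a memmap. */
	static int pfn_valid_stub(unsigned long pfn)
	{
		return pfn < 0x1000;
	}

	/* Count pages that would be initialized in [spfn, epfn), skipping whole
	 * pageblocks without a memmap - mirrors init_unavailable_range(). */
	static uint64_t walk_unavailable(unsigned long spfn, unsigned long epfn)
	{
		uint64_t pgcnt = 0;
		unsigned long pfn;

		for (pfn = spfn; pfn < epfn; pfn++) {
			unsigned long block = pfn & ~(PAGEBLOCK_NR_PAGES - 1);

			if (!pfn_valid_stub(block)) {
				pfn = block + PAGEBLOCK_NR_PAGES - 1;	/* skip the block */
				continue;
			}
			pgcnt++;	/* kernel would __init_single_page() + __SetPageReserved() */
		}
		return pgcnt;
	}

	int main(void)
	{
		printf("%llu pages would be initialized\n",
		       (unsigned long long)walk_unavailable(0xf00, 0x1100));
		return 0;
	}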
From patchwork Sat Mar 13 05:07:37 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 400397
Date: Fri, 12 Mar 2021 21:07:37 -0800
From: Andrew Morton
Subject: [patch 11/29] mm/highmem.c: fix zero_user_segments() with start > end
Message-ID: <20210313050737.mOQhR0vUf%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: OGAWA Hirofumi
Subject: mm/highmem.c: fix zero_user_segments() with start > end

zero_user_segments() is used from __block_write_begin_int(), for example
like this:

	zero_user_segments(page, 4096, 1024, 512, 918);

But the new zero_user_segments() implementation for HIGHMEM +
TRANSPARENT_HUGEPAGE doesn't handle the "start > end" case correctly and
hits BUG_ON().  (We could fix __block_write_begin_int() instead, but that
code is old and has multiple users.)

It also calls kmap_atomic() unnecessarily when start == end == 0.

Link: https://lkml.kernel.org/r/87v9ab60r4.fsf@mail.parknet.co.jp
Fixes: 0060ef3b4e6d ("mm: support THPs in zero_user_segments")
Signed-off-by: OGAWA Hirofumi
Cc: Matthew Wilcox
Cc:
Signed-off-by: Andrew Morton
---

 mm/highmem.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

--- a/mm/highmem.c~fix-zero_user_segments-with-start-end
+++ a/mm/highmem.c
@@ -368,20 +368,24 @@ void zero_user_segments(struct page *pag
 
 	BUG_ON(end1 > page_size(page) || end2 > page_size(page));
 
+	if (start1 >= end1)
+		start1 = end1 = 0;
+	if (start2 >= end2)
+		start2 = end2 = 0;
+
 	for (i = 0; i < compound_nr(page); i++) {
 		void *kaddr = NULL;
 
-		if (start1 < PAGE_SIZE || start2 < PAGE_SIZE)
-			kaddr = kmap_atomic(page + i);
-
 		if (start1 >= PAGE_SIZE) {
 			start1 -= PAGE_SIZE;
 			end1 -= PAGE_SIZE;
 		} else {
 			unsigned this_end = min_t(unsigned, end1, PAGE_SIZE);
 
-			if (end1 > start1)
+			if (end1 > start1) {
+				kaddr = kmap_atomic(page + i);
 				memset(kaddr + start1, 0, this_end - start1);
+			}
 			end1 -= this_end;
 			start1 = 0;
 		}
@@ -392,8 +396,11 @@ void zero_user_segments(struct page *pag
 		} else {
 			unsigned this_end = min_t(unsigned, end2, PAGE_SIZE);
 
-			if (end2 > start2)
+			if (end2 > start2) {
+				if (!kaddr)
+					kaddr = kmap_atomic(page + i);
 				memset(kaddr + start2, 0, this_end - start2);
+			}
 			end2 -= this_end;
 			start2 = 0;
 		}
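The effect of the added clamping can be seen in a small standalone model
(illustrative only, using the same segment values as the
__block_write_begin_int() example above): an inverted segment is
normalized to the empty segment instead of underflowing inside the
per-page loop.

	#include <stdio.h>

	/* Model of the per-segment clamp added above: a segment with
	 * start >= end is treated as the empty segment [0, 0). */
	static void clamp_segment(unsigned *start, unsigned *end)
	{
		if (*start >= *end)
			*start = *end = 0;
	}

	int main(void)
	{
		/* (start1, end1) = (4096, 1024) is inverted; (512, 918) is not. */
		unsigned start1 = 4096, end1 = 1024, start2 = 512, end2 = 918;

		clamp_segment(&start1, &end1);
		clamp_segment(&start2, &end2);
		printf("seg1 [%u,%u) seg2 [%u,%u)\n", start1, end1, start2, end2);
		return 0;	/* prints: seg1 [0,0) seg2 [512,918) */
	}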
From patchwork Sat Mar 13 05:07:41 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 400071
Date: Fri, 12 Mar 2021 21:07:41 -0800
From: Andrew Morton
Subject: [patch 12/29] binfmt_misc: fix possible deadlock in bm_register_write
Message-ID: <20210313050741.ypAY0ENiS%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: Lior Ribak
Subject: binfmt_misc: fix possible deadlock in bm_register_write

There is a deadlock in bm_register_write:

First, at the beginning of the function, a lock is taken on the
binfmt_misc root inode with inode_lock(d_inode(root)).

Then, if the user used the MISC_FMT_OPEN_FILE flag, the function calls
open_exec() on the user-provided interpreter.  open_exec() performs a
path lookup, and if the path lookup process includes the root of
binfmt_misc, it will try to take a shared lock on its inode again.  The
inode is already locked, so the code gets stuck in a deadlock.

To reproduce the bug:

  $ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" > /proc/sys/fs/binfmt_misc/register

Backtrace of where the lock occurs (#5):

  #0  schedule () at ./arch/x86/include/asm/current.h:15
  #1  0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=, state=state@entry=2) at kernel/locking/rwsem.c:992
  #2  0xffffffff81b5150a in __down_read_common (state=2, sem=) at kernel/locking/rwsem.c:1213
  #3  __down_read (sem=) at kernel/locking/rwsem.c:1222
  #4  down_read (sem=) at kernel/locking/rwsem.c:1355
  #5  0xffffffff811ee22a in inode_lock_shared (inode=) at ./include/linux/fs.h:783
  #6  open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177
  #7  path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366
  #8  0xffffffff811efe1c in do_filp_open (dfd=, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396
  #9  0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=, flags@entry=0) at fs/exec.c:913
  #10 0xffffffff811e4a92 in open_exec (name=) at fs/exec.c:948
  #11 0xffffffff8124aa84 in bm_register_write (file=, buffer=, count=19, ppos=) at fs/binfmt_misc.c:682
  #12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF ", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603
  #13 0xffffffff811defda in ksys_write (fd=, buf=0xa758d0 ":iiiii:E::ii::i:CF ", count=19) at fs/read_write.c:658
  #14 0xffffffff81b49813 in do_syscall_64 (nr=, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46
  #15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120

To solve the issue, the open_exec() call is moved to before the lock is
taken by bm_register_write.

Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com
Fixes: 948b701a607f1 ("binfmt_misc: add persistent opened binary handler for containers")
Signed-off-by: Lior Ribak
Acked-by: Helge Deller
Cc: Al Viro
Cc:
Signed-off-by: Andrew Morton
---

 fs/binfmt_misc.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

--- a/fs/binfmt_misc.c~binfmt_misc-fix-possible-deadlock-in-bm_register_write
+++ a/fs/binfmt_misc.c
@@ -649,12 +649,24 @@ static ssize_t bm_register_write(struct
 	struct super_block *sb = file_inode(file)->i_sb;
 	struct dentry *root = sb->s_root, *dentry;
 	int err = 0;
+	struct file *f = NULL;
 
 	e = create_entry(buffer, count);
 
 	if (IS_ERR(e))
 		return PTR_ERR(e);
 
+	if (e->flags & MISC_FMT_OPEN_FILE) {
+		f = open_exec(e->interpreter);
+		if (IS_ERR(f)) {
+			pr_notice("register: failed to install interpreter file %s\n",
+				  e->interpreter);
+			kfree(e);
+			return PTR_ERR(f);
+		}
+		e->interp_file = f;
+	}
+
 	inode_lock(d_inode(root));
 	dentry = lookup_one_len(e->name, root, strlen(e->name));
 	err = PTR_ERR(dentry);
@@ -678,21 +690,6 @@ static ssize_t bm_register_write(struct
 		goto out2;
 	}
 
-	if (e->flags & MISC_FMT_OPEN_FILE) {
-		struct file *f;
-
-		f = open_exec(e->interpreter);
-		if (IS_ERR(f)) {
-			err = PTR_ERR(f);
-			pr_notice("register: failed to install interpreter file %s\n", e->interpreter);
-			simple_release_fs(&bm_mnt, &entry_count);
-			iput(inode);
-			inode = NULL;
-			goto out2;
-		}
-		e->interp_file = f;
-	}
-
 	e->dentry = dget(dentry);
 	inode->i_private = e;
 	inode->i_fop = &bm_entry_operations;
@@ -709,6 +706,8 @@ out:
 	inode_unlock(d_inode(root));
 
 	if (err) {
+		if (f)
+			filp_close(f, NULL);
 		kfree(e);
 		return err;
 	}
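The fix is purely an ordering change.  A toy userspace model (a plain
flag standing in for the root inode rwsem; not the kernel code) shows why
the old order self-deadlocks when the interpreter path points back into
binfmt_misc:

	#include <stdio.h>

	/* Toy model: root_locked stands in for the binfmt_misc root inode
	 * lock, open_interpreter() for the path walk done by open_exec(). */
	static int root_locked;

	static int lock_root(void)
	{
		if (root_locked)
			return -1;	/* in the kernel this would block forever */
		root_locked = 1;
		return 0;
	}

	static void unlock_root(void) { root_locked = 0; }

	/* A path walk through /proc/sys/fs/binfmt_misc/... needs the root lock. */
	static int open_interpreter(void)
	{
		if (lock_root())
			return -1;	/* the deadlock in the old code */
		unlock_root();
		return 0;
	}

	int main(void)
	{
		/* Old order: lock, then open -> deadlock. */
		lock_root();
		if (open_interpreter() < 0)
			printf("old order: deadlock\n");
		unlock_root();

		/* Fixed order: open first, then lock. */
		if (open_interpreter() == 0 && lock_root() == 0)
			printf("new order: ok\n");
		unlock_root();
		return 0;
	}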
From patchwork Sat Mar 13 05:07:47 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 399595
Date: Fri, 12 Mar 2021 21:07:47 -0800
From: Andrew Morton
Subject: [patch 14/29] linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
Message-ID: <20210313050747.ZSth0fs8-%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: Arnd Bergmann
Subject: linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*

Separating compiler-clang.h from compiler-gcc.h inadvertently dropped the
definitions of the three HAVE_BUILTIN_BSWAP macros, which requires
falling back to the open-coded version and hoping that the compiler
detects it.

Since all versions of clang support the __builtin_bswap interfaces, add
back the flags and have the headers pick these up automatically.

This results in a 4% improvement of compilation speed for arm defconfig.

Note: it might also be worth revisiting which architectures set
CONFIG_ARCH_USE_BUILTIN_BSWAP for one compiler or the other.  Today this
is set on six architectures (arm32, csky, mips, powerpc, s390, x86),
while another ten architectures define custom helpers (alpha, arc, ia64,
m68k, mips, nios2, parisc, sh, sparc, xtensa), and the rest (arm64,
h8300, hexagon, microblaze, nds32, openrisc, riscv) just get the
unoptimized version and rely on the compiler to detect it.

A long time ago, the compiler builtins were architecture specific, but
nowadays, all compilers that are able to build the kernel have correct
implementations of them, though some may not be as optimized as the
inline asm versions.
The patch that dropped the optimization landed in v4.19, so as discussed
it would be fairly safe to backport this revert to the 4.19/5.4/5.10
stable kernels, but there is a remaining risk for regressions, and it has
no known side-effects besides compile speed.

Link: https://lkml.kernel.org/r/20210226161151.2629097-1-arnd@kernel.org
Link: https://lore.kernel.org/lkml/20210225164513.3667778-1-arnd@kernel.org/
Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Signed-off-by: Arnd Bergmann
Reviewed-by: Nathan Chancellor
Reviewed-by: Kees Cook
Acked-by: Miguel Ojeda
Acked-by: Nick Desaulniers
Acked-by: Luc Van Oostenryck
Cc: Masahiro Yamada
Cc: Nick Hu
Cc: Greentime Hu
Cc: Vincent Chen
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Cc: Albert Ou
Cc: Guo Ren
Cc: Randy Dunlap
Cc: Sami Tolvanen
Cc: Marco Elver
Cc: Arvind Sankar
Cc:
Signed-off-by: Andrew Morton
---

 include/linux/compiler-clang.h | 6 ++++++
 1 file changed, 6 insertions(+)

--- a/include/linux/compiler-clang.h~linux-compiler-clangh-define-have_builtin_bswap
+++ a/include/linux/compiler-clang.h
@@ -31,6 +31,12 @@
 #define __no_sanitize_thread
 #endif
 
+#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP)
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#define __HAVE_BUILTIN_BSWAP16__
+#endif /* CONFIG_ARCH_USE_BUILTIN_BSWAP */
+
 #if __has_feature(undefined_behavior_sanitizer)
 /* GCC does not have __SANITIZE_UNDEFINED__ */
 #define __no_sanitize_undefined \
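For reference, this is roughly what the three macros gate (a reduced
sketch of the swab machinery, not a verbatim copy of the uapi headers):
with __HAVE_BUILTIN_BSWAP32__ defined, the swap compiles directly to the
builtin; otherwise the compiler has to pattern-match the open-coded form.

	#include <stdint.h>
	#include <stdio.h>

	static inline uint32_t swab32(uint32_t x)
	{
	#ifdef __HAVE_BUILTIN_BSWAP32__
		return __builtin_bswap32(x);	/* a single bswap instruction on x86 */
	#else
		return ((x & 0x000000ffU) << 24) |	/* open-coded fallback the  */
		       ((x & 0x0000ff00U) <<  8) |	/* compiler must recognize  */
		       ((x & 0x00ff0000U) >>  8) |	/* before it can optimize   */
		       ((x & 0xff000000U) >> 24);
	#endif
	}

	int main(void)
	{
		printf("%08x\n", swab32(0x12345678));	/* prints 78563412 */
		return 0;
	}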
From patchwork Sat Mar 13 05:08:06 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 400396
Date: Fri, 12 Mar 2021 21:08:06 -0800
From: Andrew Morton
Subject: [patch 19/29] mm/madvise: replace ptrace attach requirement for process_madvise
Message-ID: <20210313050806.o83g5fWX9%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: Suren Baghdasaryan
Subject: mm/madvise: replace ptrace attach requirement for process_madvise

process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process.  It effectively removes the security boundary between the two
processes (in one direction).  Granting ptrace attach capability even to
a system process is considered dangerous since it creates an attack
surface.  This severely limits the usage of this API.

The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data
is physically located (and therefore, how fast it can be accessed).

What we want is the ability for one process to influence another process
in order to optimize performance across the entire system while leaving
the security boundary intact.

Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ and
CAP_SYS_NICE: PTRACE_MODE_READ to prevent leaking ASLR metadata, and
CAP_SYS_NICE for influencing process performance.

Link: https://lkml.kernel.org/r/20210303185807.2160264-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan
Reviewed-by: Kees Cook
Acked-by: Minchan Kim
Acked-by: David Rientjes
Cc: Jann Horn
Cc: Jeff Vander Stoep
Cc: Michal Hocko
Cc: Shakeel Butt
Cc: Tim Murray
Cc: Florian Weimer
Cc: Oleg Nesterov
Cc: James Morris
Cc: [5.10+]
Signed-off-by: Andrew Morton
---

 mm/madvise.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/mm/madvise.c~mm-madvise-replace-ptrace-attach-requirement-for-process_madvise
+++ a/mm/madvise.c
@@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pi
 		goto release_task;
 	}
 
-	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
+	/* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
+	mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
 	if (IS_ERR_OR_NULL(mm)) {
 		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
 		goto release_task;
 	}
 
+	/*
+	 * Require CAP_SYS_NICE for influencing process performance. Note that
+	 * only non-destructive hints are currently supported.
+	 */
+	if (!capable(CAP_SYS_NICE)) {
+		ret = -EPERM;
+		goto release_mm;
+	}
+
 	total_len = iov_iter_count(&iter);
 
 	while (iov_iter_count(&iter)) {
@@ -1218,6 +1228,7 @@ SYSCALL_DEFINE5(process_madvise, int, pi
 	if (ret == 0)
 		ret = total_len - iov_iter_count(&iter);
 
+release_mm:
 	mmput(mm);
 release_task:
 	put_task_struct(task);
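From userspace, a caller now needs PTRACE_MODE_READ access plus
CAP_SYS_NICE rather than full attach rights.  A minimal sketch of an
invocation (assuming a kernel that provides the syscall and no libc
wrapper; the syscall number fallback, target pid, and address range below
are placeholders):

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <sys/types.h>
	#include <sys/uio.h>
	#include <unistd.h>

	#ifndef __NR_process_madvise
	#define __NR_process_madvise 440	/* x86-64; check your headers */
	#endif
	#ifndef MADV_COLD
	#define MADV_COLD 20
	#endif

	int main(void)
	{
		pid_t target = 1234;	/* placeholder pid */
		int pidfd = syscall(SYS_pidfd_open, target, 0);
		struct iovec iov = {
			.iov_base = (void *)0x7f0000000000,	/* range in *target's* address space */
			.iov_len  = 1 << 20,
		};

		/* Hint the target's range as cold; with this patch the caller
		 * needs CAP_SYS_NICE plus PTRACE_MODE_READ on the target. */
		if (syscall(__NR_process_madvise, pidfd, &iov, 1, MADV_COLD, 0) < 0)
			perror("process_madvise");
		return 0;
	}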
From patchwork Sat Mar 13 05:08:10 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 400395
Date: Fri, 12 Mar 2021 21:08:10 -0800
From: Andrew Morton
Subject: [patch 20/29] kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
Message-ID: <20210313050810.cdPfxpuDx%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: Andrey Konovalov
Subject: kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC

Currently, kasan_free_nondeferred_pages()->kasan_free_pages() is called
after debug_pagealloc_unmap_pages().  This causes a crash when
debug_pagealloc is enabled, as HW_TAGS KASAN can't set tags on an
unmapped page.

This patch puts kasan_free_nondeferred_pages() before
debug_pagealloc_unmap_pages() and arch_free_page(), which can also make
the page unavailable.
Link: https://lkml.kernel.org/r/24cd7db274090f0e5bc3adcdc7399243668e3171.1614987311.git.andreyknvl@google.com
Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
Signed-off-by: Andrey Konovalov
Cc: Catalin Marinas
Cc: Will Deacon
Cc: Vincenzo Frascino
Cc: Dmitry Vyukov
Cc: Andrey Ryabinin
Cc: Alexander Potapenko
Cc: Marco Elver
Cc: Peter Collingbourne
Cc: Evgenii Stepanov
Cc: Branislav Rankov
Cc: Kevin Brodsky
Cc: Christoph Hellwig
Cc:
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/mm/page_alloc.c~kasan-mm-fix-crash-with-hw_tags-and-debug_pagealloc
+++ a/mm/page_alloc.c
@@ -1282,6 +1282,12 @@ static __always_inline bool free_pages_p
 	kernel_poison_pages(page, 1 << order);
 
 	/*
+	 * With hardware tag-based KASAN, memory tags must be set before the
+	 * page becomes unavailable via debug_pagealloc or arch_free_page.
+	 */
+	kasan_free_nondeferred_pages(page, order);
+
+	/*
 	 * arch_free_page() can make the page's contents inaccessible. s390
 	 * does this. So nothing which can access the page's contents should
 	 * happen after this.
@@ -1290,8 +1296,6 @@ static __always_inline bool free_pages_p
 
 	debug_pagealloc_unmap_pages(page, 1 << order);
 
-	kasan_free_nondeferred_pages(page, order);
-
 	return true;
 }
From patchwork Sat Mar 13 05:08:13 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 400069
Date: Fri, 12 Mar 2021 21:08:13 -0800
From: Andrew Morton
Subject: [patch 21/29] kasan: fix KASAN_STACK dependency for HW_TAGS
Message-ID: <20210313050813.Gybpfo-3O%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: Andrey Konovalov
Subject: kasan: fix KASAN_STACK dependency for HW_TAGS

There's a runtime failure when running an HW_TAGS-enabled kernel built
with GCC on hardware that doesn't support MTE.  GCC-built kernels always
have CONFIG_KASAN_STACK enabled, even though stack instrumentation isn't
supported by HW_TAGS.  Having that config enabled causes KASAN to issue
MTE-only instructions to unpoison kernel stacks, which causes the
failure.

Fix the issue by disallowing CONFIG_KASAN_STACK when HW_TAGS is used.

(The commit that introduced CONFIG_KASAN_HW_TAGS specified a proper
dependency for CONFIG_KASAN_STACK_ENABLE but not for CONFIG_KASAN_STACK.)

Link: https://lkml.kernel.org/r/59e75426241dbb5611277758c8d4d6f5f9298dac.1615215441.git.andreyknvl@google.com
Fixes: 6a63a63ff1ac ("kasan: introduce CONFIG_KASAN_HW_TAGS")
Signed-off-by: Andrey Konovalov
Reported-by: Catalin Marinas
Cc:
Cc: Will Deacon
Cc: Vincenzo Frascino
Cc: Dmitry Vyukov
Cc: Andrey Ryabinin
Cc: Alexander Potapenko
Cc: Marco Elver
Cc: Peter Collingbourne
Cc: Evgenii Stepanov
Cc: Branislav Rankov
Cc: Kevin Brodsky
Signed-off-by: Andrew Morton
---

 lib/Kconfig.kasan | 1 +
 1 file changed, 1 insertion(+)

--- a/lib/Kconfig.kasan~kasan-fix-kasan_stack-dependency-for-hw_tags
+++ a/lib/Kconfig.kasan
@@ -156,6 +156,7 @@ config KASAN_STACK_ENABLE
 
 config KASAN_STACK
 	int
+	depends on KASAN_GENERIC || KASAN_SW_TAGS
 	default 1 if KASAN_STACK_ENABLE || CC_IS_GCC
 	default 0
From patchwork Sat Mar 13 05:08:17 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 400394
Date: Fri, 12 Mar 2021 21:08:17 -0800
From: Andrew Morton
Subject: [patch 22/29] mm/userfaultfd: fix memory corruption due to writeprotect
Message-ID: <20210313050817.0WOtpAOpA%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: Nadav Amit
Subject: mm/userfaultfd: fix memory corruption due to writeprotect

The userfaultfd self-test fails occasionally, indicating a memory
corruption.

Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy().  Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the
time of the copy and the time in which the PTE is flushed.

To make matters worse, memory-unprotection using userfaultfd also poses a
problem.  Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects a memory region,
it unintentionally *clears* the RW-bit if it was already set.  Note that
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.

The scenario that happens in selftests/vm/userfaultfd is as follows:

  cpu0                          cpu1                    cpu2
  ----                          ----                    ----
                                                        [ Writable PTE
                                                          cached in TLB ]
  userfaultfd_writeprotect()
  [ write-*unprotect* ]
  mwriteprotect_range()
  mmap_read_lock()
  change_protection()
  change_protection_range()
  ...
  change_pte_range()
  [ *clear* "write"-bit ]
  [ defer TLB flushes ]
                                [ page-fault ]
                                ...
                                wp_page_copy()
                                 cow_user_page()
                                  [ copy page ]
                                                        [ write to old
                                                          page ]
                                ...
                                 set_pte_at_notify()

A similar scenario can happen:

  cpu0                    cpu1                    cpu2            cpu3
  ----                    ----                    ----            ----
                                                  [ Writable PTE
                                                    cached in TLB ]
  userfaultfd_writeprotect()
  [ write-protect ]
  [ deferred TLB flush ]
                          userfaultfd_writeprotect()
                          [ write-unprotect ]
                          [ deferred TLB flush ]
                                                  [ page-fault ]
                                                  wp_page_copy()
                                                   cow_user_page()
                                                    [ copy page ]
                                                                  [ write to
                                                                    page ]
                                                  set_pte_at_notify()

This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit").  Yet, as Yu Zhao pointed out, these races became
apparent since commit 09854ba94c6a ("mm: do_wp_page() simplification")
which made wp_page_copy() more likely to take place, specifically if
page_count(page) > 1.

To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.

Further optimizations will follow to avoid unnecessary PTE
write-protection and TLB flushes during uffd-write-unprotect.
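For context, the write-unprotect operation involved in these races is
driven from userspace roughly as follows (an abbreviated sketch of the
uffd-wp API usage, assuming the range was registered with
UFFDIO_REGISTER_MODE_WP; not code from this patch):

	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>
	#include <stddef.h>

	/* Write-protect or write-unprotect [addr, addr+len) on a
	 * uffd-registered VMA. Unprotecting (wp = 0) is the operation that,
	 * before this patch, could race with a concurrent deferred TLB
	 * flush and a COW fault. */
	static int uffd_wp_range(int uffd, void *addr, size_t len, int wp)
	{
		struct uffdio_writeprotect prms = {
			.range = { .start = (unsigned long)addr, .len = len },
			.mode  = wp ? UFFDIO_WRITEPROTECT_MODE_WP : 0,
		};

		return ioctl(uffd, UFFDIO_WRITEPROTECT, &prms);
	}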
Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit
Suggested-by: Yu Zhao
Reviewed-by: Peter Xu
Tested-by: Peter Xu
Cc: Andrea Arcangeli
Cc: Andy Lutomirski
Cc: Pavel Emelyanov
Cc: Mike Kravetz
Cc: Mike Rapoport
Cc: Minchan Kim
Cc: Will Deacon
Cc: Peter Zijlstra
Cc: [5.9+]
Signed-off-by: Andrew Morton
---

 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

--- a/mm/memory.c~mm-userfaultfd-fix-memory-corruption-due-to-writeprotect
+++ a/mm/memory.c
@@ -3097,6 +3097,14 @@ static vm_fault_t do_wp_page(struct vm_f
 		return handle_userfault(vmf, VM_UFFD_WP);
 	}
 
+	/*
+	 * Userfaultfd write-protect can defer flushes. Ensure the TLB
+	 * is flushed in this case before copying.
+	 */
+	if (unlikely(userfaultfd_wp(vmf->vma) &&
+		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
+		flush_tlb_page(vmf->vma, vmf->address);
+
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*

From patchwork Sat Mar 13 05:08:20 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 400070
Date: Fri, 12 Mar 2021 21:08:20 -0800
From: Andrew Morton
Subject: [patch 23/29] mm, hwpoison: do not lock page again when me_huge_page() successfully recovers
Message-ID: <20210313050820.EoPGLS3gj%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: Naoya Horiguchi
Subject: mm, hwpoison: do not lock page again when me_huge_page() successfully recovers

Currently me_huge_page() temporarily unlocks the page to perform some
actions, then locks it
again later.  My testcase (which calls hard-offline on some tail page in
a hugetlb page, then accesses an address in the hugetlb range) showed
that the page allocation code detects this page lock on the buddy page
and prints a "BUG: Bad page state" message.

check_new_page_bad() does not consider a page with __PG_HWPOISON as bad,
so this flag works as a kind of filter, but the filtering doesn't work in
this case because the "bad page" is not the actual hwpoisoned page.

This patch drops the second page lock to fix the issue.

Link: https://lkml.kernel.org/r/20210304064437.962442-1-nao.horiguchi@gmail.com
Fixes: 78bb920344b8 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
Signed-off-by: Naoya Horiguchi
Reviewed-by: Oscar Salvador
Cc: Michal Hocko
Cc: Tony Luck
Cc: "Aneesh Kumar K.V"
Cc:
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/memory-failure.c~mm-hwpoison-do-not-lock-page-again-when-me_huge_page-successfully-recovers
+++ a/mm/memory-failure.c
@@ -834,7 +834,6 @@ static int me_huge_page(struct page *p,
 			page_ref_inc(p);
 			res = MF_RECOVERED;
 		}
-		lock_page(hpage);
 	}
 
 	return res;
@@ -1290,7 +1289,8 @@ static int memory_failure_hugetlb(unsign
 	res = identify_page_state(pfn, p, page_flags);
 out:
-	unlock_page(head);
+	if (PageLocked(head))
+		unlock_page(head);
 	return res;
 }
From patchwork Sat Mar 13 05:08:30 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 400068
Date: Fri, 12 Mar 2021 21:08:30 -0800
From: Andrew Morton
Subject: [patch 26/29] mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
Message-ID: <20210313050830.AEb8zbkl_%akpm@linux-foundation.org>
In-Reply-To: <20210312210632.9b7d62973d72a56fb13c7a03@linux-foundation.org>

From: Zhou Guanghui
Subject: mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument

Rename mem_cgroup_split_huge_fixup to split_page_memcg and explicitly
pass in a page number argument.  In this way, the interface name is more
generic and can be used by potential users.  In addition, the complete
memcg info (memcg pointer and flags) needs to be set on the tail pages.

Link: https://lkml.kernel.org/r/20210304074053.65527-2-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui
Acked-by: Johannes Weiner
Reviewed-by: Zi Yan
Reviewed-by: Shakeel Butt
Acked-by: Michal Hocko
Cc: Hugh Dickins
Cc: "Kirill A. Shutemov"
Cc: Nicholas Piggin
Cc: Kefeng Wang
Cc: Hanjun Guo
Cc: Tianhong Ding
Cc: Weilong Chen
Cc: Rui Xiang
Cc:
Signed-off-by: Andrew Morton
---

 include/linux/memcontrol.h | 6 ++----
 mm/huge_memory.c           | 2 +-
 mm/memcontrol.c            | 15 ++++++---------
 3 files changed, 9 insertions(+), 14 deletions(-)

--- a/include/linux/memcontrol.h~mm-memcg-rename-mem_cgroup_split_huge_fixup-to-split_page_memcg
+++ a/include/linux/memcontrol.h
@@ -1061,9 +1061,7 @@ static inline void memcg_memory_event_mm
 	rcu_read_unlock();
 }
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-void mem_cgroup_split_huge_fixup(struct page *head);
-#endif
+void split_page_memcg(struct page *head, unsigned int nr);
 
 #else /* CONFIG_MEMCG */
 
@@ -1400,7 +1398,7 @@ unsigned long mem_cgroup_soft_limit_recl
 	return 0;
 }
 
-static inline void mem_cgroup_split_huge_fixup(struct page *head)
+static inline void split_page_memcg(struct page *head, unsigned int nr)
 {
 }
 
--- a/mm/huge_memory.c~mm-memcg-rename-mem_cgroup_split_huge_fixup-to-split_page_memcg
+++ a/mm/huge_memory.c
@@ -2467,7 +2467,7 @@ static void __split_huge_page(struct pag
 	int i;
 
 	/* complete memcg works before add pages to LRU */
-	mem_cgroup_split_huge_fixup(head);
+	split_page_memcg(head, nr);
 
 	if (PageAnon(head) && PageSwapCache(head)) {
 		swp_entry_t entry = { .val = page_private(head) };

--- a/mm/memcontrol.c~mm-memcg-rename-mem_cgroup_split_huge_fixup-to-split_page_memcg
+++ a/mm/memcontrol.c
@@ -3287,24 +3287,21 @@ void obj_cgroup_uncharge(struct obj_cgro
 
 #endif /* CONFIG_MEMCG_KMEM */
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 /*
- * Because page_memcg(head) is not set on compound tails, set it now.
+ * Because page_memcg(head) is not set on tails, set it now.
From patchwork Sat Mar 13 05:08:33 2021
From: Andrew Morton
Date: Fri, 12 Mar 2021 21:08:33 -0800
Subject: [patch 27/29] mm/memcg: set memcg when splitting page

From: Zhou Guanghui
Subject: mm/memcg: set memcg when splitting page

As described in the split_page() comment, the sub-pages of a non-compound
high-order page must be freed individually.  If only the first sub-page
carries a valid memcg, the tail pages cannot be uncharged when they are
freed.

For example, when alloc_pages_exact() is used to allocate 1MB of contiguous
physical memory, 2MB is charged (kmemcg is enabled and __GFP_ACCOUNT is
set).
When make_alloc_exact() frees the unused 1MB and free_pages_exact() frees
the requested 1MB, only 4KB (one page) is actually uncharged.  Therefore
the memcg of the tail pages needs to be set when splitting a page.

Michal: there are at least two explicit users of __GFP_ACCOUNT with
alloc_pages_exact() added recently.  See 7efe8ef274024 ("KVM: arm64:
Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT") and c419621873713
("KVM: s390: Add memcg accounting to KVM allocations"), so this is not
just a theoretical issue.

Link: https://lkml.kernel.org/r/20210304074053.65527-3-zhouguanghui1@huawei.com
Signed-off-by: Zhou Guanghui
Acked-by: Johannes Weiner
Reviewed-by: Zi Yan
Reviewed-by: Shakeel Butt
Acked-by: Michal Hocko
Cc: Hanjun Guo
Cc: Hugh Dickins
Cc: Kefeng Wang
Cc: "Kirill A. Shutemov"
Cc: Nicholas Piggin
Cc: Rui Xiang
Cc: Tianhong Ding
Cc: Weilong Chen
Cc:
Signed-off-by: Andrew Morton
---

 mm/page_alloc.c |    1 +
 1 file changed, 1 insertion(+)

--- a/mm/page_alloc.c~mm-memcg-set-memcg-when-split-page
+++ a/mm/page_alloc.c
@@ -3314,6 +3314,7 @@ void split_page(struct page *page, unsig
 	for (i = 1; i < (1 << order); i++)
 		set_page_refcounted(page + i);
 	split_page_owner(page, 1 << order);
+	split_page_memcg(page, 1 << order);
 }
 EXPORT_SYMBOL_GPL(split_page);
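To make the example concrete, here is a tiny userspace C calculation of
the charge that used to leak.  The sizes mirror the 1MB/2MB example above,
and a 4KB PAGE_SIZE is assumed.

#include <stdio.h>

#define PAGE_SIZE 4096UL
#define MB (1024UL * 1024UL)

int main(void)
{
	unsigned long charged = 2 * MB;		/* whole block charged */
	unsigned long npages  = charged / PAGE_SIZE;

	/* Before the fix only the head page carried memcg_data, so the
	 * page-by-page frees uncharged a single 4KB page. */
	unsigned long uncharged_old = 1 * PAGE_SIZE;

	/* With split_page_memcg() called from split_page(), every
	 * sub-page is tagged and every per-page free uncharges it. */
	unsigned long uncharged_new = npages * PAGE_SIZE;

	printf("leaked charge before fix: %lu bytes\n",
	       charged - uncharged_old);
	printf("leaked charge after fix:  %lu bytes\n",
	       charged - uncharged_new);
	return 0;
}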
From patchwork Sat Mar 13 05:08:38 2021
From: Andrew Morton
Date: Fri, 12 Mar 2021 21:08:38 -0800
Subject: [patch 28/29] zram: fix return value on writeback_store

From: Minchan Kim
Subject: zram: fix return value on writeback_store

writeback_store's return value is overwritten by submit_bio_wait's return
value, so when no IO error occurred writeback_store returns zero instead of
the number of bytes consumed.  The write syscall from userspace then sees
zero as its return value, which can make the process stall, retrying the
write over and over until it appears to succeed.

Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org
Fixes: 3b82a051c101 ("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store")
Signed-off-by: Minchan Kim
Cc: Sergey Senozhatsky
Cc: Colin Ian King
Cc: John Dias
Cc:
Signed-off-by: Andrew Morton
---

 drivers/block/zram/zram_drv.c |   11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

--- a/drivers/block/zram/zram_drv.c~zram-fix-return-value-on-writeback_store
+++ a/drivers/block/zram/zram_drv.c
@@ -627,7 +627,7 @@ static ssize_t writeback_store(struct de
 	struct bio_vec bio_vec;
 	struct page *page;
 	ssize_t ret = len;
-	int mode;
+	int mode, err;
 	unsigned long blk_idx = 0;

 	if (sysfs_streq(buf, "idle"))
@@ -728,12 +728,17 @@ static ssize_t writeback_store(struct de
 		 * XXX: A single page IO would be inefficient for write
 		 * but it would be not bad as starter.
 		 */
-		ret = submit_bio_wait(&bio);
-		if (ret) {
+		err = submit_bio_wait(&bio);
+		if (err) {
 			zram_slot_lock(zram, index);
 			zram_clear_flag(zram, index, ZRAM_UNDER_WB);
 			zram_clear_flag(zram, index, ZRAM_IDLE);
 			zram_slot_unlock(zram, index);
+			/*
+			 * Return the last IO error if any IO failed;
+			 * otherwise ret keeps the full length.
+			 */
+			ret = err;
 			continue;
 		}
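The fix is a common sysfs-store pattern: keep the store's return value (the
consumed length) separate from the per-IO status so a successful IO cannot
clobber it.  A minimal userspace C sketch of the pattern follows, with
submit_one() as a hypothetical stand-in for submit_bio_wait().

#include <stdio.h>

/* Pretend the third IO fails with -EIO (-5). */
static int submit_one(int i) { return (i == 2) ? -5 : 0; }

static long writeback_store_model(long len)
{
	long ret = len;		/* default: report the full length */
	int i, err;

	for (i = 0; i < 4; i++) {
		err = submit_one(i);
		if (err) {
			ret = err;	/* remember the last IO error */
			continue;	/* but keep writing the rest */
		}
		/* the success path never touches ret */
	}
	return ret;
}

int main(void)
{
	printf("result: %ld (expect -5, the recorded error)\n",
	       writeback_store_model(4096));
	return 0;
}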
From patchwork Sat Mar 13 05:08:41 2021
From: Andrew Morton
Date: Fri, 12 Mar 2021 21:08:41 -0800
Subject: [patch 29/29] zram: fix broken page writeback

From: Minchan Kim
Subject: zram: fix broken page writeback

commit 0d8359620d9b ("zram: support page writeback") introduced two
problems.  It overwrites writeback_store's return value with kstrtol's
return value, which makes the return value zero, so userspace sees zero
from the write syscall even though the data was written back successfully.
It also breaks the index handling in the loop: the index is no longer
incremented, so only the first block index can be written back, meaning the
user cannot write back all idle pages in the zram device and loses the
memory-saving opportunity.

This patch fixes both issues.

Link: https://lkml.kernel.org/r/20210312173949.2197662-2-minchan@kernel.org
Fixes: 0d8359620d9b ("zram: support page writeback")
Signed-off-by: Minchan Kim
Reported-by: Amos Bianchi
Cc: Sergey Senozhatsky
Cc: John Dias
Cc:
Signed-off-by: Andrew Morton
---

 drivers/block/zram/zram_drv.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/block/zram/zram_drv.c~zram-fix-broken-page-writeback
+++ a/drivers/block/zram/zram_drv.c
@@ -638,8 +638,8 @@ static ssize_t writeback_store(struct de
 	if (strncmp(buf, PAGE_WB_SIG, sizeof(PAGE_WB_SIG) - 1))
 		return -EINVAL;

-	ret = kstrtol(buf + sizeof(PAGE_WB_SIG) - 1, 10, &index);
-	if (ret || index >= nr_pages)
+	if (kstrtol(buf + sizeof(PAGE_WB_SIG) - 1, 10, &index) ||
+	    index >= nr_pages)
 		return -EINVAL;

 	nr_pages = 1;
@@ -663,7 +663,7 @@ static ssize_t writeback_store(struct de
 		goto release_init_lock;
 	}

-	while (nr_pages--) {
+	for (; nr_pages != 0; index++, nr_pages--) {
 		struct bio_vec bvec;

 		bvec.bv_page = page;