From patchwork Wed Feb 24 20:00:30 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 387157
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE,
	SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8BD51C433E9
	for ; Wed, 24 Feb 2021 20:02:15 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 5218564F1B
	for ; Wed, 24 Feb 2021 20:02:15 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S234995AbhBXUCM (ORCPT ); Wed, 24 Feb 2021 15:02:12 -0500
Received: from mail.kernel.org ([198.145.29.99]:55460 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S234917AbhBXUCI (ORCPT ); Wed, 24 Feb 2021 15:02:08 -0500
Received: by mail.kernel.org (Postfix) with ESMTPSA id B13FF64F08;
	Wed, 24 Feb 2021 20:00:31 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1614196832;
	bh=lgDhecVyjLcC8CX2QsFWfn7GWsa4VUqLSi3Jc9aifws=;
	h=Date:From:To:Subject:In-Reply-To:From;
	b=HsgIJOJKi0RhtDiBvgO9IlUsy/qvs9dild4oSVsfCTPAAOkapyl1h/kh+1MMfNE+M
	EoC44MGoL6+qCb3P/PqQvMjB72z1oKdmARXOJiCZyWEmw27s0FrRxD7VRPFvmoE6G1
	LixUS8uPgKHGqBvR5nXfHploTd+IRFqiqf//Y5S8=
Date: Wed, 24 Feb 2021 12:00:30 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, anton@tuxera.com, linux-mm@kvack.org,
	mm-commits@vger.kernel.org, rkovhaev@gmail.com, stable@vger.kernel.org,
	torvalds@linux-foundation.org
Subject: [patch 007/173] ntfs: check for valid standard information attribute
Message-ID: <20210224200030.F8x-RJnAx%akpm@linux-foundation.org>
In-Reply-To: <20210224115824.1e289a6895087f10c41dd8d6@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
List-ID:
X-Mailing-List: stable@vger.kernel.org

From: Rustam Kovhaev
Subject: ntfs: check for valid standard information attribute

Mounting a corrupted filesystem with NTFS resulted in a kernel crash.

We should check for valid STANDARD_INFORMATION attribute offset and length
before trying to access it.

Link: https://lkml.kernel.org/r/20210217155930.1506815-1-rkovhaev@gmail.com
Link: https://syzkaller.appspot.com/bug?extid=c584225dabdea2f71969
Signed-off-by: Rustam Kovhaev
Reported-by: syzbot+c584225dabdea2f71969@syzkaller.appspotmail.com
Tested-by: syzbot+c584225dabdea2f71969@syzkaller.appspotmail.com
Acked-by: Anton Altaparmakov
Cc:
Signed-off-by: Andrew Morton
---

 fs/ntfs/inode.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/fs/ntfs/inode.c~ntfs-check-for-valid-standard-information-attribute
+++ a/fs/ntfs/inode.c
@@ -629,6 +629,12 @@ static int ntfs_read_locked_inode(struct
 	}
 	a = ctx->attr;
 	/* Get the standard information attribute value. */
+	if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset)
+			+ le32_to_cpu(a->data.resident.value_length) >
+			(u8 *)ctx->mrec + vol->mft_record_size) {
+		ntfs_error(vi->i_sb, "Corrupt standard information attribute in inode.");
+		goto unm_err_out;
+	}
 	si = (STANDARD_INFORMATION*)((u8*)a +
 			le16_to_cpu(a->data.resident.value_offset));
 

From patchwork Wed Feb 24 20:04:19 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 387656
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE,
	SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C0594C433DB
	for ; Wed, 24 Feb 2021 20:06:19 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 8705364F14
	for ; Wed, 24 Feb 2021 20:06:19 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S233674AbhBXUGQ (ORCPT ); Wed, 24 Feb 2021 15:06:16 -0500
Received: from mail.kernel.org ([198.145.29.99]:56778 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S235120AbhBXUFg (ORCPT ); Wed, 24 Feb 2021 15:05:36 -0500
Received: by mail.kernel.org (Postfix) with ESMTPSA id 99A5B64F32;
	Wed, 24 Feb 2021 20:04:19 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1614197060;
	bh=b8z0jFYGhZJXgwfpm3qd1Qnj3LqjaHi5HU4vzHcmA/4=;
	h=Date:From:To:Subject:In-Reply-To:From;
	b=IKlOtRgJqR4hrwoFFiTiea8tUcTonCvTXKqzlR36HWl3pVc8oQDuxnFuarMG5wrVx
	wCvh/VP1XcTMCad9gSOmfnn0NJ1WjoKc3RviocJGf3x+S2R+EE2aT5PSjM+Z2yeXdh
	UvkGKUmMia0KYZKfwznkgiSKFRYtdqibrnJAdbXI=
Date: Wed, 24 Feb 2021 12:04:19 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, hannes@cmpxchg.org, linux-mm@kvack.org,
	mhocko@suse.com, mm-commits@vger.kernel.org, shakeelb@google.com,
	songmuchun@bytedance.com, stable@vger.kernel.org,
	torvalds@linux-foundation.org, vdavydov.dev@gmail.com
Subject: [patch 073/173] mm: memcontrol: fix swap undercounting in cgroup2
Message-ID: <20210224200419.SKAB6Vl7M%akpm@linux-foundation.org>
In-Reply-To: <20210224115824.1e289a6895087f10c41dd8d6@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
List-ID:
X-Mailing-List: stable@vger.kernel.org

From: Muchun Song
Subject: mm: memcontrol: fix swap undercounting in cgroup2

When pages are swapped in, the VM may retain the swap copy to avoid
repeated writes in the future. It's also retained if shared pages are
faulted back in some processes, but not in others. During that time we
have an in-memory copy of the page, as well as an on-swap copy. Cgroup1
and cgroup2 handle these overlapping lifetimes slightly differently due to
the nature of how they account memory and swap:

Cgroup1 has a unified memory+swap counter that tracks a data page
regardless of whether it's in-core or swapped out. On swapin, we transfer
the charge from the swap entry to the newly allocated swapcache page, even
though the swap entry might stick around for a while. That's why we have
a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge().

Cgroup2 tracks memory and swap as separate, independent resources and thus
has split memory and swap counters. On swapin, we charge the newly
allocated swapcache page as memory, while the swap slot in turn must
remain charged to the swap counter as long as it's allocated, too.
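
The two accounting schemes described above can be sketched as a toy model.
This is an illustration only, not kernel code: struct toy_memcg and the
toy_*() helpers are hypothetical names, and a single page with a single
swap slot stands in for the real per-page state.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of one page and its swap slot. All names here are
 * hypothetical; this is not the kernel's struct mem_cgroup.
 */
struct toy_memcg {
	bool unified_memsw;	/* true: cgroup1-style memory+swap counter */
	long memory;		/* pages charged as memory */
	long swap;		/* cgroup2 only: swap slots charged */
	long memsw;		/* cgroup1 only: combined memory+swap charge */
	bool slot_charged;	/* the swap slot currently holds a charge */
};

/* Charge a freshly allocated page. */
void toy_charge_page(struct toy_memcg *cg)
{
	cg->memory++;
	if (cg->unified_memsw)
		cg->memsw++;
}

/* Swap the page out: the memory charge is dropped, the slot is charged. */
void toy_swapout(struct toy_memcg *cg)
{
	assert(cg->memory > 0);
	cg->memory--;
	cg->slot_charged = true;
	/* cgroup1: memsw is left alone, the slot inherits that charge. */
	if (!cg->unified_memsw)
		cg->swap++;
}

/* Swap the page back in while the swap slot is retained. */
void toy_swapin(struct toy_memcg *cg)
{
	toy_charge_page(cg);
	/*
	 * Only the unified counter finishes a charge transfer here.
	 * With split counters the slot stays charged until it is freed.
	 */
	if (cg->unified_memsw && cg->slot_charged) {
		cg->memsw--;
		cg->slot_charged = false;
	}
}

/* Free the swap slot, dropping whatever charge it still holds. */
void toy_free_slot(struct toy_memcg *cg)
{
	if (!cg->slot_charged)
		return;
	if (cg->unified_memsw)
		cg->memsw--;
	else
		cg->swap--;
	cg->slot_charged = false;
}
```

In this model, uncharging the slot in toy_swapin() regardless of
unified_memsw would leave a retained slot uncounted in the split-counter
case, which is the kind of undercounting the patch addresses.
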

The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: make
swap tracking an integral part of memory control"), because it
accidentally removed the do_memsw_account() check in the branch inside
mem_cgroup_charge() that was supposed to tell the difference between the
charge transfer in cgroup1 and the separate counters in cgroup2.

As a result, cgroup2 currently undercounts retained swap to varying
degrees: swap slots are cached up to 50% of the configured limit or total
available swap space; partially faulted back shared pages are only limited
by physical capacity. This in turn allows cgroups to significantly
overconsume their allotted swap space.

Add the do_memsw_account() check back to fix this problem.

Link: https://lkml.kernel.org/r/20210217153237.92484-1-songmuchun@bytedance.com
Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control")
Signed-off-by: Muchun Song
Acked-by: Johannes Weiner
Reviewed-by: Shakeel Butt
Acked-by: Michal Hocko
Cc: Vladimir Davydov
Cc: [5.8+]
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

--- a/mm/memcontrol.c~mm-memcontrol-fix-swap-undercounting-in-cgroup2
+++ a/mm/memcontrol.c
@@ -6748,7 +6748,19 @@ int mem_cgroup_charge(struct page *page,
 	memcg_check_events(memcg, page);
 	local_irq_enable();
 
-	if (PageSwapCache(page)) {
+	/*
+	 * Cgroup1's unified memory+swap counter has been charged with the
+	 * new swapcache page, finish the transfer by uncharging the swap
+	 * slot. The swap slot would also get uncharged when it dies, but
+	 * it can stick around indefinitely and we'd count the page twice
+	 * the entire time.
+	 *
+	 * Cgroup2 has separate resource counters for memory and swap,
+	 * so this is a non-issue here. Memory and swap charge lifetimes
+	 * correspond 1:1 to page and swap slot lifetimes: we charge the
+	 * page to memory here, and uncharge swap when the slot is freed.
+	 */
+	if (do_memsw_account() && PageSwapCache(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		/*
 		 * The swap entry might not get freed for a long time,

From patchwork Wed Feb 24 20:04:22 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 387156
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE,
	SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B0304C433E0
	for ; Wed, 24 Feb 2021 20:06:20 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 622F664F11
	for ; Wed, 24 Feb 2021 20:06:20 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S234364AbhBXUGS (ORCPT ); Wed, 24 Feb 2021 15:06:18 -0500
Received: from mail.kernel.org ([198.145.29.99]:56140 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S234906AbhBXUFv (ORCPT ); Wed, 24 Feb 2021 15:05:51 -0500
Received: by mail.kernel.org (Postfix) with ESMTPSA id 31F4764F11;
	Wed, 24 Feb 2021 20:04:23 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1614197063;
	bh=G31EBYnGYWA+9ziOs13ui7hLFTA9Ds66VuzLUR1+Iv4=;
	h=Date:From:To:Subject:In-Reply-To:From;
	b=si5bUio5hFqsEjKRig7n4vmR60gzC42liNYSxQ6OQc3feQk003GyrD0TmiSURD3DK
	NqjymUnrfzPZHB1usvHOdEf13vscSZfoeV6hQINEKMuno0e5H0/KG4D1aFDvMsr/Te
	MmT+prNuBlA4QG+t3+FVRVtGntzQenx+rTuRsyWI=
Date: Wed, 24 Feb 2021 12:04:22 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, guro@fb.com,
	hannes@cmpxchg.org, linux-mm@kvack.org, mhocko@kernel.org,
	mm-commits@vger.kernel.org, shakeelb@google.com,
	songmuchun@bytedance.com, stable@vger.kernel.org,
	torvalds@linux-foundation.org, vdavydov.dev@gmail.com
Subject: [patch 074/173] mm: memcontrol: fix get_active_memcg return value
Message-ID: <20210224200422.mMBuJs4L_%akpm@linux-foundation.org>
In-Reply-To: <20210224115824.1e289a6895087f10c41dd8d6@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
List-ID:
X-Mailing-List: stable@vger.kernel.org

From: Muchun Song
Subject: mm: memcontrol: fix get_active_memcg return value

We use a global percpu int_active_memcg variable to store the remote memcg
when we are in the interrupt context. But get_active_memcg always returns
the current->active_memcg or root_mem_cgroup. The remote memcg (set in
the interrupt context) is ignored. This is not what we want. So fix it.

Link: https://lkml.kernel.org/r/20210223091101.42150-1-songmuchun@bytedance.com
Fixes: 37d5985c003d ("mm: kmem: prepare remote memcg charging infra for interrupt contexts")
Signed-off-by: Muchun Song
Reviewed-by: Shakeel Butt
Reviewed-by: Roman Gushchin
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc:
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c |   10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

--- a/mm/memcontrol.c~mm-memcontrol-fix-get_active_memcg-return-value
+++ a/mm/memcontrol.c
@@ -1061,13 +1061,9 @@ static __always_inline struct mem_cgroup
 
 	rcu_read_lock();
 	memcg = active_memcg();
-	if (memcg) {
-		/* current->active_memcg must hold a ref. */
-		if (WARN_ON_ONCE(!css_tryget(&memcg->css)))
-			memcg = root_mem_cgroup;
-		else
-			memcg = current->active_memcg;
-	}
+	/* remote memcg must hold a ref. */
+	if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
+		memcg = root_mem_cgroup;
 	rcu_read_unlock();
 
 	return memcg;

From patchwork Wed Feb 24 20:07:50 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 387655
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE,
	SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 467FBC433E6
	for ; Wed, 24 Feb 2021 20:08:41 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 0991664F25
	for ; Wed, 24 Feb 2021 20:08:40 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S235368AbhBXUId (ORCPT ); Wed, 24 Feb 2021 15:08:33 -0500
Received: from mail.kernel.org ([198.145.29.99]:59764 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S235372AbhBXUIG (ORCPT ); Wed, 24 Feb 2021 15:08:06 -0500
Received: by mail.kernel.org (Postfix) with ESMTPSA id 9964764E60;
	Wed, 24 Feb 2021 20:07:50 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1614197271;
	bh=/PowuA+liQXKosAi5ro+HSU9tErdU/1suZR4xdRManE=;
	h=Date:From:To:Subject:In-Reply-To:From;
	b=jvOjdcmzKa5aBOYZ+fyNyBW2hJVjNkQRdaXh/D8Ntd5ODyhnrYVOoJZCPIgYkJoDf
	FRxeEIVvTk/8EqZvTBtX1BG7fxOuBTsl2KR9+Q9jHRU8oWrLD3xK5z17aBkWM3CFt6
	trWjNffznm0i3evCunxpUI93ScH6chRbhfWhMKjo=
Date: Wed, 24 Feb 2021 12:07:50 -0800
From: Andrew Morton
To: aarcange@redhat.com, akpm@linux-foundation.org, dbueso@suse.de,
	joao.m.martins@oracle.com, kirill.shutemov@linux.intel.com,
	linux-mm@kvack.org, mike.kravetz@oracle.com,
	mm-commits@vger.kernel.org, osalvador@suse.de, stable@vger.kernel.org,
	torvalds@linux-foundation.org, willy@infradead.org, ziy@nvidia.com
Subject: [patch 130/173] hugetlb: fix update_and_free_page contig page struct assumption
Message-ID: <20210224200750.Dfh-Co_Ux%akpm@linux-foundation.org>
In-Reply-To: <20210224115824.1e289a6895087f10c41dd8d6@linux-foundation.org>
User-Agent: s-nail v14.8.16
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: stable@vger.kernel.org

From: Mike Kravetz
Subject: hugetlb: fix update_and_free_page contig page struct assumption

page structs are not guaranteed to be contiguous for gigantic pages. The
routine update_and_free_page can encounter a gigantic page, yet it assumes
page structs are contiguous when setting page flags in subpages.

If update_and_free_page encounters non-contiguous page structs, we can see
“BUG: Bad page state in process …” errors.

Non-contiguous page structs are generally not an issue. However, they can
exist with a specific kernel configuration and hotplug operations. For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where
the gigantic page will be allocated. Zi Yan outlined steps to reproduce
here [1].

[1] https://lore.kernel.org/linux-mm/16F7C58B-4D79-41C5-9B64-A1A1628F4AF2@nvidia.com/

Link: https://lkml.kernel.org/r/20210217184926.33567-1-mike.kravetz@oracle.com
Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
Signed-off-by: Zi Yan
Signed-off-by: Mike Kravetz
Cc: Zi Yan
Cc: Davidlohr Bueso
Cc: "Kirill A . Shutemov"
Cc: Andrea Arcangeli
Cc: Matthew Wilcox
Cc: Oscar Salvador
Cc: Joao Martins
Cc:
Signed-off-by: Andrew Morton
---

 mm/hugetlb.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/mm/hugetlb.c~hugetlb-fix-update_and_free_page-contig-page-struct-assumption
+++ a/mm/hugetlb.c
@@ -1321,14 +1321,16 @@ static inline void destroy_compound_giga
 static void update_and_free_page(struct hstate *h, struct page *page)
 {
 	int i;
+	struct page *subpage = page;
 
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
 	h->nr_huge_pages--;
 	h->nr_huge_pages_node[page_to_nid(page)]--;
-	for (i = 0; i < pages_per_huge_page(h); i++) {
-		page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
+	for (i = 0; i < pages_per_huge_page(h);
+	     i++, subpage = mem_map_next(subpage, page, i)) {
+		subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
 				1 << PG_referenced | 1 << PG_dirty |
 				1 << PG_active | 1 << PG_private |
 				1 << PG_writeback);

From patchwork Wed Feb 24 20:07:54 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 387155
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE,
	SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id F3FA5C43381
	for ; Wed, 24 Feb 2021 20:09:19 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id B6EDE64F2A
	for ; Wed, 24 Feb 2021 20:09:19 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S235166AbhBXUJR (ORCPT ); Wed, 24 Feb
	2021 15:09:17 -0500
Received: from mail.kernel.org ([198.145.29.99]:60432 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S233143AbhBXUIg (ORCPT ); Wed, 24 Feb 2021 15:08:36 -0500
Received: by mail.kernel.org (Postfix) with ESMTPSA id 389E364E09;
	Wed, 24 Feb 2021 20:07:55 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1614197275;
	bh=br2SndbgWMiTdshaOJ96Bl/IjnRBOH6TI6vIRuk13hA=;
	h=Date:From:To:Subject:In-Reply-To:From;
	b=j4coavo/7wIDbnXSiGAKMsdOw27MuUSL3vUcUJzYlO/u6RGStE0PdlKVWTpfr31Vv
	ANtks2OJDILcCxp2ugUvB8JDW958pEt5FAaPluYVH8jto6wNkdiKq6bkQpCGsvsUJH
	E9SGLDvWoIGvr+THUgAo1a5adCKoUUbaiYU2eMzU=
Date: Wed, 24 Feb 2021 12:07:54 -0800
From: Andrew Morton
To: aarcange@redhat.com, akpm@linux-foundation.org, dbueso@suse.de,
	joao.m.martins@oracle.com, kirill.shutemov@linux.intel.com,
	linux-mm@kvack.org, mike.kravetz@oracle.com,
	mm-commits@vger.kernel.org, osalvador@suse.de, stable@vger.kernel.org,
	torvalds@linux-foundation.org, willy@infradead.org, ziy@nvidia.com
Subject: [patch 131/173] hugetlb: fix copy_huge_page_from_user contig page struct assumption
Message-ID: <20210224200754.-vZcYjb2n%akpm@linux-foundation.org>
In-Reply-To: <20210224115824.1e289a6895087f10c41dd8d6@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
List-ID:
X-Mailing-List: stable@vger.kernel.org

From: Mike Kravetz
Subject: hugetlb: fix copy_huge_page_from_user contig page struct assumption

page structs are not guaranteed to be contiguous for gigantic pages. The
routine copy_huge_page_from_user can encounter gigantic pages, yet it
assumes page structs are contiguous when copying pages from user space.

Since page structs for the target gigantic page are not contiguous, the
data copied from user space could overwrite other pages not associated
with the gigantic page and cause data corruption.

Non-contiguous page structs are generally not an issue.
However, they can exist with a specific kernel configuration and hotplug
operations. For example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where
the gigantic page will be allocated.

Link: https://lkml.kernel.org/r/20210217184926.33567-2-mike.kravetz@oracle.com
Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mike Kravetz
Cc: Zi Yan
Cc: Davidlohr Bueso
Cc: "Kirill A . Shutemov"
Cc: Andrea Arcangeli
Cc: Matthew Wilcox
Cc: Oscar Salvador
Cc: Joao Martins
Cc:
Signed-off-by: Andrew Morton
---

 mm/memory.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

--- a/mm/memory.c~hugetlb-fix-copy_huge_page_from_user-contig-page-struct-assumption
+++ a/mm/memory.c
@@ -5177,17 +5177,19 @@ long copy_huge_page_from_user(struct pag
 	void *page_kaddr;
 	unsigned long i, rc = 0;
 	unsigned long ret_val = pages_per_huge_page * PAGE_SIZE;
+	struct page *subpage = dst_page;
 
-	for (i = 0; i < pages_per_huge_page; i++) {
+	for (i = 0; i < pages_per_huge_page;
+	     i++, subpage = mem_map_next(subpage, dst_page, i)) {
 		if (allow_pagefault)
-			page_kaddr = kmap(dst_page + i);
+			page_kaddr = kmap(subpage);
 		else
-			page_kaddr = kmap_atomic(dst_page + i);
+			page_kaddr = kmap_atomic(subpage);
 		rc = copy_from_user(page_kaddr,
 				(const void __user *)(src + i * PAGE_SIZE),
 				PAGE_SIZE);
 		if (allow_pagefault)
-			kunmap(dst_page + i);
+			kunmap(subpage);
 		else
 			kunmap_atomic(page_kaddr);
 

From patchwork Wed Feb 24 20:09:15 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 387654
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_CR_TRAILER, INCLUDES_PATCH,
	MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED
	autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 3F680C433E0
	for ; Wed, 24 Feb 2021 20:11:12 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 03B1260202
	for ; Wed, 24 Feb 2021 20:11:11 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S234570AbhBXUKk (ORCPT ); Wed, 24 Feb 2021 15:10:40 -0500
Received: from mail.kernel.org ([198.145.29.99]:33400 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S235385AbhBXUJ5 (ORCPT ); Wed, 24 Feb 2021 15:09:57 -0500
Received: by mail.kernel.org (Postfix) with ESMTPSA id DDE8564F23;
	Wed, 24 Feb 2021 20:09:15 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1614197356;
	bh=6uZAIWo00IIlfca439cq6f0bs6LY/39RlcishU7qFJI=;
	h=Date:From:To:Subject:In-Reply-To:From;
	b=S9O5QbeZKjn956bTm527aEuDSv+5HUVYmBTRmRGq6PUGt892WCBjgHCNgiA/lFZtS
	8hBD/I4ICCbHqovlrvcQAnEaHvIcNeA8BEQXD2xoTtmwkFl8Ls99ivazAKQOp06Qpt
	EQKJ39Uizh8yDk4GrrJIGU3Xsjfv+A+IptZe1PKU=
Date: Wed, 24 Feb 2021 12:09:15 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, alex.shi@linux.alibaba.com,
	ben.widawsky@intel.com, cai@lca.pw, cl@linux.com,
	dan.j.williams@intel.com, dave.hansen@linux.intel.com, dwagner@suse.de,
	linux-mm@kvack.org, mm-commits@vger.kernel.org, osalvador@suse.de,
	rientjes@google.com, stable@vger.kernel.org, tobin@kernel.org,
	torvalds@linux-foundation.org, ying.huang@intel.com
Subject: [patch 152/173] mm/vmscan: restore zone_reclaim_mode ABI
Message-ID: <20210224200915.lASg4FukJ%akpm@linux-foundation.org>
In-Reply-To: <20210224115824.1e289a6895087f10c41dd8d6@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
List-ID:
X-Mailing-List: stable@vger.kernel.org

From: Dave Hansen
Subject: mm/vmscan: restore zone_reclaim_mode ABI

I went to go add a new RECLAIM_* mode for the zone_reclaim_mode sysctl.
Like a good kernel developer, I also went to go update the documentation.
I noticed that the bits in the documentation didn't match the bits in the
#defines.

The VM never explicitly checks the RECLAIM_ZONE bit. The bit is, however,
implicitly checked when checking 'node_reclaim_mode==0'. The RECLAIM_ZONE
#define was removed in a cleanup. That, by itself, is fine.

But, when the bit was removed (bit 0) the _other_ bit locations also got
changed. That's not OK because the bit values are documented to mean one
specific thing. Users surely do not expect the meaning to change from
kernel to kernel.

The end result is that if someone had a script that did:

	sysctl vm.zone_reclaim_mode=1

it would have gone from enabling node reclaim for clean unmapped pages to
writing out pages during node reclaim after the commit in question.
That's not great.

Put the bits back the way they were and add a comment so something like
this is a bit harder to do again. Update the documentation to make it
clear that the first bit is ignored.

Link: https://lkml.kernel.org/r/20210219172555.FF0CDF23@viggo.jf.intel.com
Signed-off-by: Dave Hansen
Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
Reviewed-by: Ben Widawsky
Reviewed-by: Oscar Salvador
Acked-by: David Rientjes
Acked-by: Christoph Lameter
Cc: Alex Shi
Cc: Daniel Wagner
Cc: "Tobin C. Harding"
Cc: Christoph Lameter
Cc: Andrew Morton
Cc: Huang Ying
Cc: Dan Williams
Cc: Qian Cai
Cc:
Signed-off-by: Andrew Morton
---

 Documentation/admin-guide/sysctl/vm.rst |   10 +++++-----
 mm/vmscan.c                             |    9 +++++++--
 2 files changed, 12 insertions(+), 7 deletions(-)

--- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-zone_reclaim_mode-abi
+++ a/Documentation/admin-guide/sysctl/vm.rst
@@ -983,11 +983,11 @@ that benefit from having their data cach
 left disabled as the caching effect is likely to be more important than
 data locality.
 
-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction. The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
+Consider enabling one or more zone_reclaim mode bits if it's known that the
+workload is partitioned such that each partition fits within a NUMA node
+and that accessing remote memory would cause a measurable performance
+reduction. The page allocator will take additional actions before
+allocating off node pages.
 
 Allowing zone reclaim to write out pages stops processes that are
 writing large amounts of data from dirtying pages on other nodes. Zone
--- a/mm/vmscan.c~mm-vmscan-restore-zone_reclaim_mode-abi
+++ a/mm/vmscan.c
@@ -4085,8 +4085,13 @@ module_init(kswapd_init)
  */
 int node_reclaim_mode __read_mostly;
 
-#define RECLAIM_WRITE (1<<0)	/* Writeout pages during reclaim */
-#define RECLAIM_UNMAP (1<<1)	/* Unmap pages during reclaim */
+/*
+ * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
+ * ABI. New bits are OK, but existing bits can never change.
+ */
+#define RECLAIM_ZONE  (1<<0)	/* Run shrink_inactive_list on the zone */
+#define RECLAIM_WRITE (1<<1)	/* Writeout pages during reclaim */
+#define RECLAIM_UNMAP (1<<2)	/* Unmap pages during reclaim */
 
 /*
  * Priority for NODE_RECLAIM. This determines the fraction of pages

From patchwork Wed Feb 24 20:09:39 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 387154
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-15.7 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS,
	INCLUDES_CR_TRAILER, INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE,
	SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 51B44C433E9
	for ; Wed, 24 Feb 2021 20:12:08 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 1AEEC60202
	for ; Wed, 24 Feb 2021 20:12:08 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S235420AbhBXULl (ORCPT ); Wed, 24 Feb 2021 15:11:41 -0500
Received: from mail.kernel.org ([198.145.29.99]:33776 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S235427AbhBXUKV (ORCPT ); Wed, 24 Feb 2021 15:10:21 -0500
Received: by mail.kernel.org (Postfix) with ESMTPSA id 3626E64F08;
	Wed, 24 Feb 2021 20:09:40 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1614197380;
	bh=VV6ZulOy2kH1v4Md6bTtg1bw8UcpYGu+3f/5IWxm+AE=;
	h=Date:From:To:Subject:In-Reply-To:From;
	b=vP6hVi4+mA48JEBv/XbAVy3giNWWIgob9CiX0fd69Xjersrv61MOqia/yl3pJuNIL
	ov0ujnwWJevafn16RetSlB83DYQ9WUQDUDoEhFyp5bG5e/svxnI5FFqrPSqaBdhVkV
	IjUbInV6VzzHt2CJwuS0LuiFORhbebSny6ud0Z6o=
Date: Wed, 24 Feb 2021 12:09:39 -0800
From: Andrew Morton
To: aarcange@redhat.com, akpm@linux-foundation.org, david@redhat.com,
	linux-mm@kvack.org, mgorman@techsingularity.net, mhocko@kernel.org,
	mm-commits@vger.kernel.org, rientjes@google.com, rppt@kernel.org,
	stable@vger.kernel.org, torvalds@linux-foundation.org, vbabka@suse.cz
Subject: [patch 159/173] mm, compaction: make fast_isolate_freepages() stay within zone
Message-ID: <20210224200939.yLPGAqLE6%akpm@linux-foundation.org>
In-Reply-To: <20210224115824.1e289a6895087f10c41dd8d6@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
List-ID:
X-Mailing-List: stable@vger.kernel.org

From: Vlastimil Babka
Subject: mm, compaction: make fast_isolate_freepages() stay within zone

Compaction always operates on pages from a single given zone when
isolating both pages to migrate and freepages. Pageblock boundaries are
intersected with zone boundaries to be safe in case zone starts or ends in
the middle of pageblock. The use of pageblock_pfn_to_page() protects
against non-contiguous pageblocks.

The functions fast_isolate_freepages() and fast_isolate_around() don't
currently protect the fast freepage isolation thoroughly enough against
these corner cases, and can result in freepage isolation operating outside
of zone boundaries:

- in fast_isolate_freepages() if we get a pfn from the first pageblock of
  a zone that starts in the middle of that pageblock, 'highest' can be a
  pfn outside of the zone.
  If we fail to isolate anything in this function, we may then call
  fast_isolate_around() on a pfn outside of the zone and there
  effectively do a set_pageblock_skip(page_to_pfn(highest)) which may
  currently hit a VM_BUG_ON() in some configurations

- fast_isolate_around() checks only the zone end boundary and not
  beginning, nor that the pageblock is contiguous (with
  pageblock_pfn_to_page()) so it's possible that we end up calling
  isolate_freepages_block() on a range of pfn's from two different zones
  and end up e.g. isolating freepages under the wrong zone's lock.

This patch should fix the above issues.

Link: https://lkml.kernel.org/r/20210217173300.6394-1-vbabka@suse.cz
Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Vlastimil Babka
Acked-by: David Rientjes
Acked-by: Mel Gorman
Cc: Andrea Arcangeli
Cc: David Hildenbrand
Cc: Michal Hocko
Cc: Mike Rapoport
Cc:
Signed-off-by: Andrew Morton
---

 mm/compaction.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

--- a/mm/compaction.c~mm-compaction-make-fast_isolate_freepages-stay-within-zone
+++ a/mm/compaction.c
@@ -1284,7 +1284,7 @@ static void
 fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
 {
 	unsigned long start_pfn, end_pfn;
-	struct page *page = pfn_to_page(pfn);
+	struct page *page;
 
 	/* Do not search around if there are enough pages already */
 	if (cc->nr_freepages >= cc->nr_migratepages)
@@ -1295,8 +1295,12 @@ fast_isolate_around(struct compact_contr
 		return;
 
 	/* Pageblock boundaries */
-	start_pfn = pageblock_start_pfn(pfn);
-	end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone)) - 1;
+	start_pfn = max(pageblock_start_pfn(pfn), cc->zone->zone_start_pfn);
+	end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone));
+
+	page = pageblock_pfn_to_page(start_pfn, end_pfn, cc->zone);
+	if (!page)
+		return;
 
 	/* Scan before */
 	if (start_pfn != pfn) {
@@ -1398,7 +1402,8 @@ fast_isolate_freepages(struct compact_co
 			pfn = page_to_pfn(freepage);
 
 			if (pfn >= highest)
-				highest = pageblock_start_pfn(pfn);
+				highest = max(pageblock_start_pfn(pfn),
+					      cc->zone->zone_start_pfn);
 
 			if (pfn >= low_pfn) {
 				cc->fast_search_fail = 0;
@@ -1468,7 +1473,8 @@ fast_isolate_freepages(struct compact_co
 		} else {
 			if (cc->direct_compaction && pfn_valid(min_pfn)) {
 				page = pageblock_pfn_to_page(min_pfn,
-					pageblock_end_pfn(min_pfn),
+					min(pageblock_end_pfn(min_pfn),
+					    zone_end_pfn(cc->zone)),
 					cc->zone);
 				cc->free_pfn = min_pfn;
 			}
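
The boundary clamping that the compaction patch above applies can be
sketched outside the kernel as follows. This is a minimal userspace
illustration under stated assumptions, not the kernel implementation:
struct toy_zone, struct toy_range, the toy_*() helpers and the fixed
pageblock order of 9 are all hypothetical, and the end pfn is treated as
exclusive.

```c
#include <assert.h>

/* Hypothetical pageblock size for the sketch: 2^9 = 512 pages. */
#define TOY_PAGEBLOCK_ORDER	9UL
#define TOY_PAGEBLOCK_NR_PAGES	(1UL << TOY_PAGEBLOCK_ORDER)

/* A zone covering the pfn range [start_pfn, end_pfn). */
struct toy_zone {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* A scan range [start_pfn, end_pfn). */
struct toy_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* First pfn of the pageblock containing pfn. */
static unsigned long toy_pageblock_start(unsigned long pfn)
{
	return pfn & ~(TOY_PAGEBLOCK_NR_PAGES - 1);
}

/* One past the last pfn of the pageblock containing pfn. */
static unsigned long toy_pageblock_end(unsigned long pfn)
{
	return toy_pageblock_start(pfn) + TOY_PAGEBLOCK_NR_PAGES;
}

/*
 * Intersect the pageblock around pfn with the zone, so that a zone
 * starting or ending in the middle of a pageblock never yields a
 * range that reaches into a neighbouring zone.
 */
struct toy_range toy_clamp_pageblock(const struct toy_zone *zone,
				     unsigned long pfn)
{
	struct toy_range r;

	assert(zone->start_pfn < zone->end_pfn);
	assert(pfn >= zone->start_pfn && pfn < zone->end_pfn);

	r.start_pfn = toy_pageblock_start(pfn);
	if (r.start_pfn < zone->start_pfn)
		r.start_pfn = zone->start_pfn;	/* max() with zone start */

	r.end_pfn = toy_pageblock_end(pfn);
	if (r.end_pfn > zone->end_pfn)
		r.end_pfn = zone->end_pfn;	/* min() with zone end */

	return r;
}
```

Clamping both ends mirrors the max()/min() pairs added in the diff; the
sketch does not model the additional pageblock_pfn_to_page() contiguity
check.
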