From patchwork Sun Jan 24 05:00:57 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 370172
Date: Sat, 23 Jan 2021 21:00:57 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, bhe@redhat.com, bp@alien8.de, cai@lca.pw,
 david@redhat.com, hpa@zytor.com, linux-mm@kvack.org, mgorman@suse.de,
 mhocko@kernel.org, mingo@redhat.com, mm-commits@vger.kernel.org,
 rppt@linux.ibm.com, stable@vger.kernel.org, tglx@linutronix.de,
 torvalds@linux-foundation.org, vbabka@suse.cz
Subject: [patch 01/19] x86/setup: don't remove E820_TYPE_RAM for pfn 0
Message-ID: <20210124050057.NwJ_nb3-V%akpm@linux-foundation.org>
In-Reply-To: <20210123210029.a7c704d0cab204683e41ad10@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Mike Rapoport
Subject: x86/setup: don't remove E820_TYPE_RAM for pfn 0

Patch series "mm: fix initialization of struct page for holes in memory
layout", v3.

Commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
rather that check each PFN") exposed several issues with the memory map
initialization, and these patches fix those issues.

Initially there were crashes during compaction that Qian Cai reported
back in April [1].  It seemed back then that the problem was fixed, but a
few weeks ago Andrea Arcangeli hit the same bug [2], and there was an
additional discussion at [3].

[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
[2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
[3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foundation.org

This patch (of 2):

The first 4KB of memory is a BIOS-owned area and, to keep the kernel from
allocating it, it was not listed in the e820 tables as memory.  As a
result, pfn 0 was never recognised by the generic memory management and
is part of neither node 0 nor ZONE_DMA.
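For reference, the zone-containment test that matters here is essentially
zone_spans_pfn() (a simplified sketch modelled on include/linux/mmzone.h,
with a reduced struct zone for illustration; not the verbatim kernel
code):

    /* Reduced to the two fields the containment check needs. */
    struct zone {
        unsigned long zone_start_pfn;
        unsigned long spanned_pages;
    };

    /* A pfn below the zone's first pfn is not spanned by the zone. */
    static bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
    {
        unsigned long zone_end_pfn = zone->zone_start_pfn +
                                     zone->spanned_pages;

        return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn;
    }

With pfn 0 carved out of the e820 RAM ranges, ZONE_DMA starts at pfn 1,
so this check fails for pfn 0 in every zone, which is exactly what the
VM_BUG_ON_PAGE() quoted below asserts against.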
If set_pfnblock_flags_mask() were ever called for the pageblock
corresponding to the first 2MB of memory, having pfn 0 outside of
ZONE_DMA would trigger

	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

Along with reserving the first 4KB in the e820 tables, the first several
pages are reserved with memblock in several places during setup_arch().
These reservations are enough to ensure that the kernel does not touch
the BIOS area, so it is not necessary to remove E820_TYPE_RAM for pfn 0.

Remove the e820 table update that changes the type of pfn 0 and move the
comment describing why it was done to trim_low_memory_range(), which
reserves the beginning of memory.

Link: https://lkml.kernel.org/r/20210111194017.22696-2-rppt@kernel.org
Signed-off-by: Mike Rapoport
Cc: Baoquan He
Cc: Borislav Petkov
Cc: David Hildenbrand
Cc: "H. Peter Anvin"
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Michal Hocko
Cc: Qian Cai
Cc: Thomas Gleixner
Cc: Vlastimil Babka
Cc:
Signed-off-by: Andrew Morton
---

 arch/x86/kernel/setup.c |   20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/setup.c~x86-setup-dont-remove-e820_type_ram-for-pfn-0
+++ a/arch/x86/kernel/setup.c
@@ -661,17 +661,6 @@ static void __init trim_platform_memory_
 static void __init trim_bios_range(void)
 {
 	/*
-	 * A special case is the first 4Kb of memory;
-	 * This is a BIOS owned area, not kernel ram, but generally
-	 * not listed as such in the E820 table.
-	 *
-	 * This typically reserves additional memory (64KiB by default)
-	 * since some BIOSes are known to corrupt low memory.  See the
-	 * Kconfig help text for X86_RESERVE_LOW.
-	 */
-	e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED);
-
-	/*
 	 * special case: Some BIOSes report the PC BIOS
 	 * area (640Kb -> 1Mb) as RAM even though it is not.
 	 * take them out.
@@ -728,6 +717,15 @@ early_param("reservelow", parse_reservel
 static void __init trim_low_memory_range(void)
 {
+	/*
+	 * A special case is the first 4Kb of memory;
+	 * This is a BIOS owned area, not kernel ram, but generally
+	 * not listed as such in the E820 table.
+	 *
+	 * This typically reserves additional memory (64KiB by default)
+	 * since some BIOSes are known to corrupt low memory.  See the
+	 * Kconfig help text for X86_RESERVE_LOW.
+	 */
 	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
 }

From patchwork Sun Jan 24 05:01:07 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 370171
Date: Sat, 23 Jan 2021 21:01:07 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, guro@fb.com, hannes@cmpxchg.org,
 imran.f.khan@oracle.com, linux-mm@kvack.org, mkoutny@suse.com,
 mm-commits@vger.kernel.org, shakeelb@google.com, stable@vger.kernel.org,
 torvalds@linux-foundation.org
Subject: [patch 03/19] mm: memcg/slab: optimize objcg stock draining
Message-ID: <20210124050107.0rcfKNymq%akpm@linux-foundation.org>
In-Reply-To: <20210123210029.a7c704d0cab204683e41ad10@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Roman Gushchin
Subject: mm: memcg/slab: optimize objcg stock draining

Imran Khan reported a 16% regression in hackbench results caused by
commit f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects
instead of pages").  The regression is noticeable in the case of
consecutive allocations of several relatively large slab objects, e.g.
skbs.  As soon as the amount of stocked bytes exceeds PAGE_SIZE,
drain_obj_stock() and __memcg_kmem_uncharge() are called, which leads to
a number of atomic operations in page_counter_uncharge().
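The expensive part is page_counter_uncharge() itself, which walks every
ancestor counter and performs an atomic update at each level of the
cgroup hierarchy.  Roughly (a simplified sketch of mm/page_counter.c, not
the verbatim source):

    /* Simplified: one atomic RMW per cgroup level for every uncharge. */
    void page_counter_uncharge(struct page_counter *counter,
                               unsigned long nr_pages)
    {
        struct page_counter *c;

        for (c = counter; c; c = c->parent)
            atomic_long_sub(nr_pages, &c->usage);   /* page_counter_cancel() */
    }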
The corresponding call graph is below (provided by Imran Khan):

  |__alloc_skb
  |    |
  |    |__kmalloc_reserve.isra.61
  |    |    |
  |    |    |__kmalloc_node_track_caller
  |    |    |    |
  |    |    |    |slab_pre_alloc_hook.constprop.88
  |    |    |         obj_cgroup_charge
  |    |    |    |    |
  |    |    |    |    |__memcg_kmem_charge
  |    |    |    |    |    |
  |    |    |    |    |    |page_counter_try_charge
  |    |    |    |    |
  |    |    |    |    |refill_obj_stock
  |    |    |    |    |    |
  |    |    |    |    |    |drain_obj_stock.isra.68
  |    |    |    |    |    |    |
  |    |    |    |    |    |    |__memcg_kmem_uncharge
  |    |    |    |    |    |    |    |
  |    |    |    |    |    |    |    |page_counter_uncharge
  |    |    |    |    |    |    |    |    |
  |    |    |    |    |    |    |    |    |page_counter_cancel
  |    |    |    |
  |    |    |    |__slab_alloc
  |    |    |    |    |
  |    |    |    |    |___slab_alloc
  |    |    |    |    |
  |    |    |    |slab_post_alloc_hook

Instead of directly uncharging the accounted kernel memory, it's possible
to refill the generic page-sized per-cpu stock.  This is a much faster
operation, especially on the default hierarchy.  As a bonus,
__memcg_kmem_uncharge_page() will also get faster, so the freeing of
page-sized kernel allocations (e.g. large kmallocs) will become faster.

A similar change was done earlier for socket memory by commit
475d0487a2ad ("mm: memcontrol: use per-cpu stocks for socket memory
uncharging").

Link: https://lkml.kernel.org/r/20210106042239.2860107-1-guro@fb.com
Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages")
Signed-off-by: Roman Gushchin
Reported-by: Imran Khan
Tested-by: Imran Khan
Reviewed-by: Shakeel Butt
Reviewed-by: Michal Koutný
Cc: Michal Koutný
Cc: Johannes Weiner
Cc:
Signed-off-by: Andrew Morton
---

 mm/memcontrol.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/mm/memcontrol.c~mm-memcg-slab-optimize-objcg-stock-draining
+++ a/mm/memcontrol.c
@@ -3115,9 +3115,7 @@ void __memcg_kmem_uncharge(struct mem_cg
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
 		page_counter_uncharge(&memcg->kmem, nr_pages);
 
-	page_counter_uncharge(&memcg->memory, nr_pages);
-	if (do_memsw_account())
-		page_counter_uncharge(&memcg->memsw, nr_pages);
+	refill_stock(memcg, nr_pages);
 }
 
 /**
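The speedup comes from the fact that refill_stock() touches only a
per-cpu structure in the common case and falls back to the atomic page
counters only when the cached batch overflows.  A simplified sketch of
the helper as it looked around this kernel version (irq handling and css
reference management omitted; see mm/memcontrol.c for the real thing):

    /* Simplified: park uncharged pages in the per-cpu stock. */
    static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
    {
        struct memcg_stock_pcp *stock = this_cpu_ptr(&memcg_stock);

        if (stock->cached != memcg)     /* a different memcg is cached? */
            drain_stock(stock);         /* flush it first */
        stock->cached = memcg;
        stock->nr_pages += nr_pages;

        if (stock->nr_pages > MEMCG_CHARGE_BATCH)
            drain_stock(stock);         /* only now hit the atomic counters */
    }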
From patchwork Sun Jan 24 05:01:15 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 370170
Date: Sat, 23 Jan 2021 21:01:15 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, guro@fb.com, hannes@cmpxchg.org,
 linux-mm@kvack.org, mhocko@kernel.org, mm-commits@vger.kernel.org,
 shakeelb@google.com, shy828301@gmail.com, songmuchun@bytedance.com,
 stable@vger.kernel.org, torvalds@linux-foundation.org
Subject: [patch 05/19] mm: fix numa stats for thp migration
Message-ID: <20210124050115.pcRcPJgcb%akpm@linux-foundation.org>
In-Reply-To: <20210123210029.a7c704d0cab204683e41ad10@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Shakeel Butt
Subject: mm: fix numa stats for thp migration

Currently the kernel does not correctly update the NUMA stats for
NR_FILE_PAGES and NR_SHMEM on THP migration.  Fix that.

For NR_FILE_DIRTY and NR_ZONE_WRITE_PENDING there is currently no need to
handle THP migration, since the kernel does not yet have write support
for file THP, but to be more future-proof this patch adds THP support for
those stats as well.

Link: https://lkml.kernel.org/r/20210108155813.2914586-2-shakeelb@google.com
Fixes: e71769ae52609 ("mm: enable thp migration for shmem thp")
Signed-off-by: Shakeel Butt
Acked-by: Yang Shi
Reviewed-by: Roman Gushchin
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Muchun Song
Cc:
Signed-off-by: Andrew Morton
---

 mm/migrate.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

--- a/mm/migrate.c~mm-fix-numa-stats-for-thp-migration
+++ a/mm/migrate.c
@@ -402,6 +402,7 @@ int migrate_page_move_mapping(struct add
 	struct zone *oldzone, *newzone;
 	int dirty;
 	int expected_count = expected_page_refs(mapping, page) + extra_count;
+	int nr = thp_nr_pages(page);
 
 	if (!mapping) {
 		/* Anonymous page without mapping */
@@ -437,7 +438,7 @@ int migrate_page_move_mapping(struct add
 	 */
 	newpage->index = page->index;
 	newpage->mapping = page->mapping;
-	page_ref_add(newpage, thp_nr_pages(page)); /* add cache reference */
+	page_ref_add(newpage, nr); /* add cache reference */
 	if (PageSwapBacked(page)) {
 		__SetPageSwapBacked(newpage);
 		if (PageSwapCache(page)) {
@@ -459,7 +460,7 @@ int migrate_page_move_mapping(struct add
 	if (PageTransHuge(page)) {
 		int i;
 
-		for (i = 1; i < HPAGE_PMD_NR; i++) {
+		for (i = 1; i < nr; i++) {
 			xas_next(&xas);
 			xas_store(&xas, newpage);
 		}
@@ -470,7 +471,7 @@ int migrate_page_move_mapping(struct add
 	 * to one less reference.
 	 * We know this isn't the last reference.
 	 */
-	page_ref_unfreeze(page, expected_count - thp_nr_pages(page));
+	page_ref_unfreeze(page, expected_count - nr);
 	xas_unlock(&xas);
 
 	/* Leave irq disabled to prevent preemption while updating stats */
@@ -493,17 +494,17 @@ int migrate_page_move_mapping(struct add
 		old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
 		new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
 
-		__dec_lruvec_state(old_lruvec, NR_FILE_PAGES);
-		__inc_lruvec_state(new_lruvec, NR_FILE_PAGES);
+		__mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr);
+		__mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr);
 		if (PageSwapBacked(page) && !PageSwapCache(page)) {
-			__dec_lruvec_state(old_lruvec, NR_SHMEM);
-			__inc_lruvec_state(new_lruvec, NR_SHMEM);
+			__mod_lruvec_state(old_lruvec, NR_SHMEM, -nr);
+			__mod_lruvec_state(new_lruvec, NR_SHMEM, nr);
 		}
 		if (dirty && mapping_can_writeback(mapping)) {
-			__dec_lruvec_state(old_lruvec, NR_FILE_DIRTY);
-			__dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING);
-			__inc_lruvec_state(new_lruvec, NR_FILE_DIRTY);
-			__inc_zone_state(newzone, NR_ZONE_WRITE_PENDING);
+			__mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr);
+			__mod_zone_page_state(oldzone, NR_ZONE_WRITE_PENDING, -nr);
+			__mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
+			__mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
 		}
 	}
 	local_irq_enable();

From patchwork Sun Jan 24 05:01:52 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 370169
Date: Sat, 23 Jan 2021 21:01:52 -0800
From: Andrew Morton
To: akpm@linux-foundation.org, cai@lca.pw, dan.j.williams@intel.com,
 david@redhat.com, linux-mm@kvack.org, mhocko@suse.com,
 mm-commits@vger.kernel.org, naoya.horiguchi@nec.com, osalvador@suse.de,
 stable@vger.kernel.org, torvalds@linux-foundation.org
Subject: [patch 13/19] mm: fix page reference leak in soft_offline_page()
Message-ID: <20210124050152.mIDl3p6sw%akpm@linux-foundation.org>
In-Reply-To: <20210123210029.a7c704d0cab204683e41ad10@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Dan Williams
Subject: mm: fix page reference leak in soft_offline_page()

The conversion to move pfn_to_online_page() internal to
soft_offline_page() missed that the get_user_pages() reference taken by
the madvise() path needs to be dropped when pfn_to_online_page() fails.

Note that the direct sysfs path to soft_offline_page() does not perform a
get_user_pages() lookup.

When soft_offline_page() is handed a pfn_valid() && !pfn_to_online_page()
pfn, the kernel hangs at dax-device shutdown due to a leaked reference.

Link: https://lkml.kernel.org/r/161058501210.1840162.8108917599181157327.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: feec24a6139d ("mm, soft-offline: convert parameter to pfn")
Signed-off-by: Dan Williams
Reviewed-by: David Hildenbrand
Reviewed-by: Oscar Salvador
Reviewed-by: Naoya Horiguchi
Cc: Michal Hocko
Cc: Qian Cai
Cc:
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |   20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

--- a/mm/memory-failure.c~mm-fix-page-reference-leak-in-soft_offline_page
+++ a/mm/memory-failure.c
@@ -1885,6 +1885,12 @@ static int soft_offline_free_page(struct
 	return rc;
 }
 
+static void put_ref_page(struct page *page)
+{
+	if (page)
+		put_page(page);
+}
+
 /**
  * soft_offline_page - Soft offline a page.
  * @pfn: pfn to soft-offline
@@ -1910,20 +1916,26 @@ static int soft_offline_free_page(struct
 int soft_offline_page(unsigned long pfn, int flags)
 {
 	int ret;
-	struct page *page;
 	bool try_again = true;
+	struct page *page, *ref_page = NULL;
+
+	WARN_ON_ONCE(!pfn_valid(pfn) && (flags & MF_COUNT_INCREASED));
 
 	if (!pfn_valid(pfn))
 		return -ENXIO;
+	if (flags & MF_COUNT_INCREASED)
+		ref_page = pfn_to_page(pfn);
+
 	/* Only online pages can be soft-offlined (esp., not ZONE_DEVICE). */
 	page = pfn_to_online_page(pfn);
-	if (!page)
+	if (!page) {
+		put_ref_page(ref_page);
 		return -EIO;
+	}
 
 	if (PageHWPoison(page)) {
 		pr_info("%s: %#lx page already poisoned\n", __func__, pfn);
-		if (flags & MF_COUNT_INCREASED)
-			put_page(page);
+		put_ref_page(ref_page);
 		return 0;
 	}
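To summarize the ownership rule the fix enforces (a condensed sketch of
the patched function above, for illustration only; not the complete
code):

    int soft_offline_page(unsigned long pfn, int flags)
    {
        struct page *ref_page = NULL;

        /*
         * The madvise(MADV_SOFT_OFFLINE) caller passes MF_COUNT_INCREASED:
         * it already took a get_user_pages() reference, which this
         * function now owns and must drop on every early return.
         */
        if (flags & MF_COUNT_INCREASED)
            ref_page = pfn_to_page(pfn);

        if (!pfn_to_online_page(pfn)) {
            put_ref_page(ref_page);     /* this put was missing before */
            return -EIO;
        }

        /* ... rest of the function as in the diff above ... */
    }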