From patchwork Fri Feb 5 02:32:03 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 377623
Date: Thu, 04 Feb 2021 18:32:03 -0800
From: Andrew Morton
Subject: [patch 01/18] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
Message-ID: <20210205023203.MmSvWEbBn%akpm@linux-foundation.org>
In-Reply-To: <20210204183135.e123f0d6027529f2cf500cf2@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Muchun Song
Subject: mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page

If a new HugeTLB page is allocated during fallocate(), it is not marked
as active (via set_page_huge_active()), which makes a later
isolate_huge_page() fail when the page migration code tries to move that
page. Such a failure is unexpected and wrong.

Only set_page_huge_active() is exported; clear_page_huge_active() stays
static because it has no external users.
Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com
Fixes: 70c3547e36f5 ("hugetlbfs: add hugetlbfs_fallocate()")
Signed-off-by: Muchun Song
Acked-by: Michal Hocko
Reviewed-by: Mike Kravetz
Reviewed-by: Oscar Salvador
Cc: David Hildenbrand
Cc: Yang Shi
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 fs/hugetlbfs/inode.c    |    3 ++-
 include/linux/hugetlb.h |    2 ++
 mm/hugetlb.c            |    2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

--- a/fs/hugetlbfs/inode.c~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/fs/hugetlbfs/inode.c
@@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct f
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 
+		set_page_huge_active(page);
 		/*
 		 * unlock_page because locked by add_to_page_cache()
-		 * page_put due to reference from alloc_huge_page()
+		 * put_page() due to reference from alloc_huge_page()
 		 */
 		unlock_page(page);
 		put_page(page);
--- a/include/linux/hugetlb.h~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/include/linux/hugetlb.h
@@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot
 }
 #endif
 
+void set_page_huge_active(struct page *page);
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
--- a/mm/hugetlb.c~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/mm/hugetlb.c
@@ -1349,7 +1349,7 @@ bool page_huge_active(struct page *page)
 }
 
 /* never called for tail page */
-static void set_page_huge_active(struct page *page)
+void set_page_huge_active(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
 	SetPagePrivate(&page[1]);
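
The ordering that the fs/hugetlbfs/inode.c hunk establishes can be
reduced to a tiny stand-alone sketch. Everything below is an
illustrative stand-in, not kernel code; the point is only that the page
is marked active before the lock and the allocation reference are
dropped, so a concurrent isolate_huge_page() can succeed:

    /* Minimal userspace sketch of the fixed publish sequence in
     * hugetlbfs_fallocate().  All names are stand-ins. */
    #include <stdbool.h>
    #include <stdio.h>

    struct page { bool locked; bool huge_active; int refcount; };

    static void set_page_huge_active(struct page *p) { p->huge_active = true; }
    static void unlock_page(struct page *p)          { p->locked = false; }
    static void put_page(struct page *p)             { p->refcount--; }

    static void publish_fallocated_page(struct page *page)
    {
            set_page_huge_active(page);     /* fixed: activate first */
            unlock_page(page);              /* locked by add_to_page_cache() */
            put_page(page);                 /* ref from alloc_huge_page() */
    }

    int main(void)
    {
            struct page p = { .locked = true, .refcount = 1 };
            publish_fallocated_page(&p);
            printf("active=%d locked=%d ref=%d\n",
                   p.huge_active, p.locked, p.refcount);
            return 0;
    }
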
From patchwork Fri Feb 5 02:32:06 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 378096
Date: Thu, 04 Feb 2021 18:32:06 -0800
From: Andrew Morton
Subject: [patch 02/18] mm: hugetlb: fix a race between freeing and dissolving the page
Message-ID: <20210205023206.qxjmf4Zwv%akpm@linux-foundation.org>
In-Reply-To: <20210204183135.e123f0d6027529f2cf500cf2@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Muchun Song
Subject: mm: hugetlb: fix a race between freeing and dissolving the page

There is a race condition between __free_huge_page() and
dissolve_free_huge_page():

CPU0:                                   CPU1:

// page_count(page) == 1
put_page(page)
  __free_huge_page(page)
                                        dissolve_free_huge_page(page)
                                          spin_lock(&hugetlb_lock)
                                          // PageHuge(page) && !page_count(page)
                                          update_and_free_page(page)
                                          // page is freed to the buddy
                                          spin_unlock(&hugetlb_lock)
    spin_lock(&hugetlb_lock)
    clear_page_huge_active(page)
    enqueue_huge_page(page)
    // It is wrong, the page is already freed
    spin_unlock(&hugetlb_lock)

The race window is between put_page() and dissolve_free_huge_page():
dissolve_free_huge_page() can free the page to the buddy allocator while
__free_huge_page() is still in flight, after which __free_huge_page()
re-enqueues and corrupts page(s) already owned by the buddy allocator.
Make sure that the page is already on the free list when it is
dissolved.

Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
Reviewed-by: Oscar Salvador
Acked-by: Michal Hocko
Cc: David Hildenbrand
Cc: Yang Shi
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/hugetlb.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

--- a/mm/hugetlb.c~mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page
+++ a/mm/hugetlb.c
@@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
 static int num_fault_mutexes;
 struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
+static inline bool PageHugeFreed(struct page *head)
+{
+	return page_private(head + 4) == -1UL;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+	set_page_private(head + 4, -1UL);
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+	set_page_private(head + 4, 0);
+}
+
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 
@@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hst
 	list_move(&page->lru, &h->hugepage_freelists[nid]);
 	h->free_huge_pages++;
 	h->free_huge_pages_node[nid]++;
+	SetPageHugeFreed(page);
 }
 
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_no
 
 	list_move(&page->lru, &h->hugepage_activelist);
 	set_page_refcounted(page);
+	ClearPageHugeFreed(page);
 	h->free_huge_pages--;
 	h->free_huge_pages_node[nid]--;
 	return page;
@@ -1505,6 +1522,7 @@ static void prep_new_huge_page(struct hs
 	spin_lock(&hugetlb_lock);
 	h->nr_huge_pages++;
 	h->nr_huge_pages_node[nid]++;
+	ClearPageHugeFreed(page);
 	spin_unlock(&hugetlb_lock);
 }
 
@@ -1755,6 +1773,7 @@ int dissolve_free_huge_page(struct page
 {
 	int rc = -EBUSY;
 
+retry:
 	/* Not to disrupt normal path by vainly holding hugetlb_lock */
 	if (!PageHuge(page))
 		return 0;
@@ -1771,6 +1790,26 @@ int dissolve_free_huge_page(struct page
 		int nid = page_to_nid(head);
 		if (h->free_huge_pages - h->resv_huge_pages == 0)
 			goto out;
+
+		/*
+		 * We should make sure that the page is already on the free list
+		 * when it is dissolved.
+		 */
+		if (unlikely(!PageHugeFreed(head))) {
+			spin_unlock(&hugetlb_lock);
+			cond_resched();
+
+			/*
+			 * Theoretically, we should return -EBUSY when we
+			 * encounter this race. In fact, we have a chance
+			 * to successfully dissolve the page if we do a
+			 * retry. Because the race window is quite small.
+			 * If we seize this opportunity, it is an optimization
+			 * for increasing the success rate of dissolving page.
+			 */
+			goto retry;
+		}
+
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,
 		 * which makes any subpages rather than the error page reusable.
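
The retry in dissolve_free_huge_page() is the interesting part of this
patch: instead of returning -EBUSY during the small window in which the
page has left the hugetlb allocator but is not yet on the free list, it
drops the lock, yields, and retries. A minimal userspace sketch of that
pattern, with pthread stand-ins for hugetlb_lock and cond_resched():

    #include <pthread.h>
    #include <sched.h>
    #include <stdbool.h>

    static pthread_mutex_t hugetlb_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool page_on_freelist;       /* set once the free path finishes */

    static int dissolve_free_huge_page(void)
    {
    retry:
            pthread_mutex_lock(&hugetlb_lock);
            if (!page_on_freelist) {
                    /* Race window: the free path has not enqueued the
                     * page yet.  The window is small, so retrying usually
                     * succeeds where -EBUSY would needlessly fail. */
                    pthread_mutex_unlock(&hugetlb_lock);
                    sched_yield();      /* stand-in for cond_resched() */
                    goto retry;
            }
            /* ... actually dissolve the page here ... */
            pthread_mutex_unlock(&hugetlb_lock);
            return 0;
    }

    static void *free_path(void *arg)
    {
            (void)arg;
            pthread_mutex_lock(&hugetlb_lock);
            page_on_freelist = true;    /* enqueue_huge_page() equivalent */
            pthread_mutex_unlock(&hugetlb_lock);
            return NULL;
    }

    int main(void)
    {
            pthread_t t;
            pthread_create(&t, NULL, free_path, NULL);
            dissolve_free_huge_page();
            pthread_join(t, NULL);
            return 0;
    }
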
From patchwork Fri Feb 5 02:32:10 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 378098
Date: Thu, 04 Feb 2021 18:32:10 -0800
From: Andrew Morton
Subject: [patch 03/18] mm: hugetlb: fix a race between isolating and freeing page
Message-ID: <20210205023210.CbSNsXPxg%akpm@linux-foundation.org>
In-Reply-To: <20210204183135.e123f0d6027529f2cf500cf2@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Muchun Song
Subject: mm: hugetlb: fix a race between isolating and freeing page

There is a race between isolate_huge_page() and __free_huge_page():

CPU0:                                   CPU1:

if (PageHuge(page))
                                        put_page(page)
                                          __free_huge_page(page)
                                              spin_lock(&hugetlb_lock)
                                              update_and_free_page(page)
                                                set_compound_page_dtor(page,
                                                  NULL_COMPOUND_DTOR)
                                              spin_unlock(&hugetlb_lock)
  isolate_huge_page(page)
    // trigger BUG_ON
    VM_BUG_ON_PAGE(!PageHead(page), page)
    spin_lock(&hugetlb_lock)
    page_huge_active(page)
      // trigger BUG_ON
      VM_BUG_ON_PAGE(!PageHuge(page), page)
    spin_unlock(&hugetlb_lock)

If CPU0 isolates a HugeTLB page while CPU1 concurrently frees it to the
buddy allocator, CPU0 can trigger one of the BUG_ONs above, because the
page has already been freed.

Link: https://lkml.kernel.org/r/20210115124942.46403-5-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
Acked-by: Michal Hocko
Reviewed-by: Oscar Salvador
Cc: David Hildenbrand
Cc: Yang Shi
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/hugetlb.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/hugetlb.c~mm-hugetlb-fix-a-race-between-isolating-and-freeing-page
+++ a/mm/hugetlb.c
@@ -5594,9 +5594,9 @@ bool isolate_huge_page(struct page *page
 {
 	bool ret = true;
 
-	VM_BUG_ON_PAGE(!PageHead(page), page);
 	spin_lock(&hugetlb_lock);
-	if (!page_huge_active(page) || !get_page_unless_zero(page)) {
+	if (!PageHeadHuge(page) || !page_huge_active(page) ||
+	    !get_page_unless_zero(page)) {
 		ret = false;
 		goto unlock;
 	}
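
Reduced to its essentials, the fix moves every validity check under
hugetlb_lock, the same lock the free path takes, so the checks can no
longer race with __free_huge_page(). A stand-alone sketch with
illustrative stand-ins for the kernel helpers:

    #include <stdbool.h>
    #include <pthread.h>

    struct page { bool head_huge; bool active; int refcount; };
    static pthread_mutex_t hugetlb_lock = PTHREAD_MUTEX_INITIALIZER;

    static bool get_page_unless_zero(struct page *p)
    {
            if (p->refcount == 0)
                    return false;
            p->refcount++;
            return true;
    }

    static bool isolate_huge_page(struct page *page)
    {
            bool ret = true;

            /* Before the fix, a VM_BUG_ON_PAGE(!PageHead(page)) ran here,
             * outside the lock, where the page could already be freed. */
            pthread_mutex_lock(&hugetlb_lock);
            if (!page->head_huge || !page->active ||
                !get_page_unless_zero(page))
                    ret = false;
            pthread_mutex_unlock(&hugetlb_lock);
            return ret;
    }

    int main(void)
    {
            struct page freed = { 0 };  /* already freed: refcount 0 */
            return isolate_huge_page(&freed) ? 1 : 0;  /* safely refuses */
    }
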
From patchwork Fri Feb 5 02:32:13 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 378097
Date: Thu, 04 Feb 2021 18:32:13 -0800
From: Andrew Morton
Subject: [patch 04/18] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Message-ID: <20210205023213._BtpJL1s-%akpm@linux-foundation.org>
In-Reply-To: <20210204183135.e123f0d6027529f2cf500cf2@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Muchun Song
Subject: mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active

page_huge_active() can be called from scan_movable_pages(), which does
not hold a reference count on the HugeTLB page, so the page can be freed
in parallel. That triggers the BUG_ON in page_huge_active() when
CONFIG_DEBUG_VM is enabled. Just remove the VM_BUG_ON_PAGE.

Link: https://lkml.kernel.org/r/20210115124942.46403-6-songmuchun@bytedance.com
Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
Acked-by: Michal Hocko
Reviewed-by: Oscar Salvador
Cc: David Hildenbrand
Cc: Yang Shi
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/hugetlb.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/mm/hugetlb.c~mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active
+++ a/mm/hugetlb.c
@@ -1361,8 +1361,7 @@ struct hstate *size_to_hstate(unsigned l
  */
 bool page_huge_active(struct page *page)
 {
-	VM_BUG_ON_PAGE(!PageHuge(page), page);
-	return PageHead(page) && PagePrivate(&page[1]);
+	return PageHeadHuge(page) && PagePrivate(&page[1]);
 }
 
 /* never called for tail page */
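
The resulting page_huge_active() is a plain predicate. A minimal sketch
of the after-the-fix shape, with stand-in types rather than the
kernel's:

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-ins: head_huge models PageHeadHuge(), tail_private models
     * PagePrivate(&page[1]). */
    struct page { bool head_huge; bool tail_private; };

    /* No VM_BUG_ON_PAGE: calling this on a page that may be freed in
     * parallel reports a state instead of crashing a CONFIG_DEBUG_VM
     * kernel. */
    static bool page_huge_active(const struct page *page)
    {
            return page->head_huge && page->tail_private;
    }

    int main(void)
    {
            struct page p = { .head_huge = true, .tail_private = true };
            printf("active=%d\n", page_huge_active(&p));
            return 0;
    }
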
From patchwork Fri Feb 5 02:32:20 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 377622
Date: Thu, 04 Feb 2021 18:32:20 -0800
From: Andrew Morton
Subject: [patch 06/18] mm, compaction: move high_pfn to the for loop scope
Message-ID: <20210205023220.Bma4nzI32%akpm@linux-foundation.org>
In-Reply-To: <20210204183135.e123f0d6027529f2cf500cf2@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Rokudo Yan
Subject: mm, compaction: move high_pfn to the for loop scope

In fast_isolate_freepages(), high_pfn is used as a fallback when no
preferred page (PFN >= low_pfn) is found. But high_pfn is not reset
before searching each free area, so the fallback freepage may come from
a free area searched earlier. move_freelist_head(freelist, freepage)
then operates on a mismatched freelist, with unexpected behavior (e.g.
corrupting the MOVABLE freelist):

Unable to handle kernel paging request at virtual address dead000000000200
Mem abort info:
  ESR = 0x96000044
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000044
  CM = 0, WnR = 1
[dead000000000200] address between user and kernel address ranges

-000|list_cut_before(inline)
-000|move_freelist_head(inline)
-000|fast_isolate_freepages(inline)
-000|isolate_freepages(inline)
-000|compaction_alloc(?, ?)
-001|unmap_and_move(inline)
-001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
-002|__read_once_size(inline)
-002|static_key_count(inline)
-002|static_key_false(inline)
-002|trace_mm_compaction_migratepages(inline)
-002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
-003|kcompactd_do_work(inline)
-003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
-004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
-005|ret_from_fork(asm)
---|end of frame

The issue was reported on a smartphone product with 6GB RAM and 3GB zram
as swap device.

This patch fixes the issue by resetting high_pfn before searching each
free area, which ensures that the freepage and the freelist match when
move_freelist_head() is called in fast_isolate_freepages().

Link: http://lkml.kernel.org/r/20190118175136.31341-12-mgorman@techsingularity.net
Link: https://lkml.kernel.org/r/20210112094720.1238444-1-wu-yan@tcl.com
Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Rokudo Yan
Acked-by: Mel Gorman
Acked-by: Vlastimil Babka
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/compaction.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/mm/compaction.c~mm-compaction-move-high_pfn-to-the-for-loop-scope
+++ a/mm/compaction.c
@@ -1342,7 +1342,7 @@ fast_isolate_freepages(struct compact_co
 {
 	unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
 	unsigned int nr_scanned = 0;
-	unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
+	unsigned long low_pfn, min_pfn, highest = 0;
 	unsigned long nr_isolated = 0;
 	unsigned long distance;
 	struct page *page = NULL;
@@ -1387,6 +1387,7 @@ fast_isolate_freepages(struct compact_co
 		struct page *freepage;
 		unsigned long flags;
 		unsigned int order_scanned = 0;
+		unsigned long high_pfn = 0;
 
 		if (!area->nr_free)
 			continue;
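
Stripped of the compaction context, this is an ordinary C scoping bug: a
per-iteration candidate declared outside the loop keeps a match found in
an earlier iteration. A stand-alone demonstration of the buggy and fixed
shapes (unrelated to the kernel code; the numbers are arbitrary):

    #include <stdio.h>

    int main(void)
    {
            int data[3][2] = { {5, 7}, {0, 0}, {2, 9} };

            /* Buggy shape: the candidate declared outside the loop keeps
             * the match from area 0 when area 1 has no match at all. */
            int stale_candidate = 0;
            for (int area = 0; area < 3; area++) {
                    for (int i = 0; i < 2; i++)
                            if (data[area][i] > 4)
                                    stale_candidate = data[area][i];
                    printf("area %d buggy candidate: %d\n",
                           area, stale_candidate);
            }

            /* Fixed shape: reset per iteration, as the patch does by
             * moving high_pfn into the for-loop scope. */
            for (int area = 0; area < 3; area++) {
                    int candidate = 0;  /* reset for every free area */
                    for (int i = 0; i < 2; i++)
                            if (data[area][i] > 4)
                                    candidate = data[area][i];
                    printf("area %d fixed candidate: %d\n", area, candidate);
            }
            return 0;
    }
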
From patchwork Fri Feb 5 02:32:24 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 378095
Date: Thu, 04 Feb 2021 18:32:24 -0800
From: Andrew Morton
Subject: [patch 07/18] mm/vmalloc: separate put pages and flush VM flags
Message-ID: <20210205023224.fy2bx0YLU%akpm@linux-foundation.org>
In-Reply-To: <20210204183135.e123f0d6027529f2cf500cf2@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Rick Edgecombe
Subject: mm/vmalloc: separate put pages and flush VM flags

When VM_MAP_PUT_PAGES was added, it was defined with the same value as
VM_FLUSH_RESET_PERMS. This doesn't seem to cause any big functional
problems other than some excess flushing for VM_MAP_PUT_PAGES
allocations.

Redefine VM_MAP_PUT_PAGES to have its own value. Also, rearrange things
so flags are less likely to be missed in the future.

Link: https://lkml.kernel.org/r/20210122233706.9304-1-rick.p.edgecombe@intel.com
Fixes: b944afc9d64d ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
Signed-off-by: Rick Edgecombe
Suggested-by: Matthew Wilcox
Cc: Miaohe Lin
Cc: Christoph Hellwig
Cc: Daniel Axtens
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 include/linux/vmalloc.h |    9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

--- a/include/linux/vmalloc.h~mm-vmalloc-separate-put-pages-and-flush-vm-flags
+++ a/include/linux/vmalloc.h
@@ -24,7 +24,8 @@ struct notifier_block;		/* in notifier.h
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040	/* don't add guard page */
 #define VM_KASAN		0x00000080	/* has allocated kasan shadow memory */
-#define VM_MAP_PUT_PAGES	0x00000100	/* put pages and free array in vfree */
+#define VM_FLUSH_RESET_PERMS	0x00000100	/* reset direct map and flush TLB on unmap, can't be freed in atomic context */
+#define VM_MAP_PUT_PAGES	0x00000200	/* put pages and free array in vfree */
 
 /*
  * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
@@ -37,12 +38,6 @@ struct notifier_block;		/* in notifier.h
  * determine which allocations need the module shadow freed.
  */
 
-/*
- * Memory with VM_FLUSH_RESET_PERMS cannot be freed in an interrupt or with
- * vfree_atomic().
- */
-#define VM_FLUSH_RESET_PERMS	0x00000100	/* Reset direct map and flush TLB on unmap */
-
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
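
A cheap way to catch this class of bug at compile time is a static
assertion that the flag bits do not overlap. The check below is our
suggestion, not part of the patch; the values are the ones defined after
the fix:

    #include <assert.h>

    /* Values as defined after the patch. */
    #define VM_KASAN                0x00000080
    #define VM_FLUSH_RESET_PERMS    0x00000100
    #define VM_MAP_PUT_PAGES        0x00000200

    /* Before the patch, VM_MAP_PUT_PAGES was also 0x00000100 and this
     * would have failed to compile, catching the collision immediately. */
    static_assert((VM_FLUSH_RESET_PERMS & VM_MAP_PUT_PAGES) == 0,
                  "vm flag bits must not overlap");

    int main(void) { return 0; }
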
From patchwork Fri Feb 5 02:32:31 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 377620
Date: Thu, 04 Feb 2021 18:32:31 -0800
From: Andrew Morton
Subject: [patch 09/18] mm: thp: fix MADV_REMOVE deadlock on shmem THP
Message-ID: <20210205023231.ne7UDbPxT%akpm@linux-foundation.org>
In-Reply-To: <20210204183135.e123f0d6027529f2cf500cf2@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Hugh Dickins
Subject: mm: thp: fix MADV_REMOVE deadlock on shmem THP

Sergey reported a deadlock between kswapd correctly doing its usual
lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and
madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing
down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page).

This happened when shmem_fallocate(punch hole)'s unmap_mapping_range()
reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock
could occur when partially truncating a mapped huge tmpfs file, or using
fallocate(FALLOC_FL_PUNCH_HOLE) on it.

__split_huge_pmd()'s page lock was added in 5.8, to make sure that any
concurrent use of reuse_swap_page() (holding page lock) could not catch
the anon THP's mapcounts and swapcounts while they were being split.

Fortunately, reuse_swap_page() is never applied to a shmem or file THP
(not even by khugepaged, which checks PageSwapCache before calling), and
anonymous THPs are never created in shmem or file areas: so
__split_huge_pmd()'s page lock can only be necessary for anonymous THPs,
on which there is no risk of deadlock with i_mmap_rwsem.

Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils
Fixes: c444eb564fb1 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()")
Signed-off-by: Hugh Dickins
Reported-by: Sergey Senozhatsky
Reviewed-by: Andrea Arcangeli
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/huge_memory.c |   37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

--- a/mm/huge_memory.c~mm-thp-fix-madv_remove-deadlock-on-shmem-thp
+++ a/mm/huge_memory.c
@@ -2202,7 +2202,7 @@ void __split_huge_pmd(struct vm_area_str
 {
 	spinlock_t *ptl;
 	struct mmu_notifier_range range;
-	bool was_locked = false;
+	bool do_unlock_page = false;
 	pmd_t _pmd;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
@@ -2218,7 +2218,6 @@ void __split_huge_pmd(struct vm_area_str
 	VM_BUG_ON(freeze && !page);
 	if (page) {
 		VM_WARN_ON_ONCE(!PageLocked(page));
-		was_locked = true;
 		if (page != pmd_page(*pmd))
 			goto out;
 	}
@@ -2227,19 +2226,29 @@ repeat:
 	if (pmd_trans_huge(*pmd)) {
 		if (!page) {
 			page = pmd_page(*pmd);
-			if (unlikely(!trylock_page(page))) {
-				get_page(page);
-				_pmd = *pmd;
-				spin_unlock(ptl);
-				lock_page(page);
-				spin_lock(ptl);
-				if (unlikely(!pmd_same(*pmd, _pmd))) {
-					unlock_page(page);
+			/*
+			 * An anonymous page must be locked, to ensure that a
+			 * concurrent reuse_swap_page() sees stable mapcount;
+			 * but reuse_swap_page() is not used on shmem or file,
+			 * and page lock must not be taken when zap_pmd_range()
+			 * calls __split_huge_pmd() while i_mmap_lock is held.
+			 */
+			if (PageAnon(page)) {
+				if (unlikely(!trylock_page(page))) {
+					get_page(page);
+					_pmd = *pmd;
+					spin_unlock(ptl);
+					lock_page(page);
+					spin_lock(ptl);
+					if (unlikely(!pmd_same(*pmd, _pmd))) {
+						unlock_page(page);
+						put_page(page);
+						page = NULL;
+						goto repeat;
+					}
 					put_page(page);
-					page = NULL;
-					goto repeat;
 				}
-				put_page(page);
+				do_unlock_page = true;
 			}
 		}
 		if (PageMlocked(page))
@@ -2249,7 +2258,7 @@ repeat:
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
 	spin_unlock(ptl);
-	if (!was_locked && page)
+	if (do_unlock_page)
 		unlock_page(page);
 	/*
 	 * No need to double call mmu_notifier->invalidate_range() callback.
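
Reduced to control flow, the fix gates the page-lock dance on
PageAnon(): shmem and file THPs skip it entirely, which removes the page
lock vs i_mmap_rwsem inversion. A simplified sketch with stand-in names
(the sleeping lock_page()/pmd_same() fallback of the real code is
elided):

    #include <stdbool.h>

    struct page { bool anon; bool locked; };

    static bool trylock_page(struct page *p)
    {
            if (p->locked)
                    return false;
            p->locked = true;
            return true;
    }

    /* Sketch of __split_huge_pmd()'s locking decision after the fix. */
    static bool lock_if_needed(struct page *page)
    {
            bool do_unlock_page = false;

            if (page->anon) {
                    /* Only anonymous THPs need the lock, to keep their
                     * mapcounts stable against reuse_swap_page(). */
                    if (trylock_page(page))
                            do_unlock_page = true;
            }
            /* shmem/file THP: never take the page lock here, so a caller
             * holding i_mmap_rwsem cannot deadlock. */
            return do_unlock_page;
    }

    int main(void)
    {
            struct page p = { .anon = true };
            return lock_if_needed(&p) ? 0 : 1;
    }
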
From patchwork Fri Feb 5 02:32:36 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 377621
Date: Thu, 04 Feb 2021 18:32:36 -0800
From: Andrew Morton
Subject: [patch 10/18] memblock: do not start bottom-up allocations with kernel_end
Message-ID: <20210205023236.qcEErkapb%akpm@linux-foundation.org>
In-Reply-To: <20210204183135.e123f0d6027529f2cf500cf2@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Roman Gushchin
Subject: memblock: do not start bottom-up allocations with kernel_end

With KASLR the kernel image is placed at a random location, so starting
the bottom-up allocation with kernel_end can result in an allocation
failure and a warning like this one:

[ 0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[ 0.002921] ------------[ cut here ]------------
[ 0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
[ 0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
[ 0.002937] Modules linked in:
[ 0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
[ 0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
[ 0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
[ 0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
[ 0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[ 0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
[ 0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
[ 0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
[ 0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
[ 0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
[ 0.002952] FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
[ 0.002953] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
[ 0.002956] Call Trace:
[ 0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
[ 0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
[ 0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
[ 0.002968]  ? flush_tlb_one_kernel+0xc/0x20
[ 0.002969]  ? native_set_fixmap+0x82/0xd0
[ 0.002971]  ? flat_get_apic_id+0x5/0x10
[ 0.002973]  ? register_lapic_address+0x8e/0x97
[ 0.002975]  ? setup_arch+0x8a5/0xc3f
[ 0.002978]  ? start_kernel+0x66/0x547
[ 0.002980]  ? load_ucode_bsp+0x4c/0xcd
[ 0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
[ 0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
[ 0.002988] ---[ end trace f151227d0b39be70 ]---

At the same time, the kernel image is protected with memblock_reserve(),
so we can just start searching at PAGE_SIZE. In this case the bottom-up
allocation has the same chance of success as a top-down allocation, so
there is no reason to fall back in the case of a failure. Altogether
this simplifies the logic.

Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com
Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
Signed-off-by: Roman Gushchin
Reviewed-by: Mike Rapoport
Cc: Joonsoo Kim
Cc: Michal Hocko
Cc: Rik van Riel
Cc: Wonhyuk Yang
Cc: Thiago Jung Bauermann
Signed-off-by: Andrew Morton
---

 mm/memblock.c |   49 +++++------------------------------------------
 1 file changed, 6 insertions(+), 43 deletions(-)

--- a/mm/memblock.c~memblock-do-not-start-bottom-up-allocations-with-kernel_end
+++ a/mm/memblock.c
@@ -275,14 +275,6 @@ __memblock_find_range_top_down(phys_addr
  *
  * Find @size free area aligned to @align in the specified range and node.
  *
- * When allocation direction is bottom-up, the @start should be greater
- * than the end of the kernel image. Otherwise, it will be trimmed. The
- * reason is that we want the bottom-up allocation just near the kernel
- * image so it is highly likely that the allocated memory and the kernel
- * will reside in the same node.
- *
- * If bottom-up allocation failed, will try to allocate memory top-down.
- *
  * Return:
  * Found address on success, 0 on failure.
  */
@@ -291,8 +283,6 @@ static phys_addr_t __init_memblock membl
 					phys_addr_t end, int nid,
 					enum memblock_flags flags)
 {
-	phys_addr_t kernel_end, ret;
-
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
 	    end == MEMBLOCK_ALLOC_KASAN)
@@ -301,40 +291,13 @@ static phys_addr_t __init_memblock membl
 	/* avoid allocating the first page */
 	start = max_t(phys_addr_t, start, PAGE_SIZE);
 	end = max(start, end);
-	kernel_end = __pa_symbol(_end);
-
-	/*
-	 * try bottom-up allocation only when bottom-up mode
-	 * is set and @end is above the kernel image.
-	 */
-	if (memblock_bottom_up() && end > kernel_end) {
-		phys_addr_t bottom_up_start;
-
-		/* make sure we will allocate above the kernel */
-		bottom_up_start = max(start, kernel_end);
-
-		/* ok, try bottom-up allocation first */
-		ret = __memblock_find_range_bottom_up(bottom_up_start, end,
-						      size, align, nid, flags);
-		if (ret)
-			return ret;
-
-		/*
-		 * we always limit bottom-up allocation above the kernel,
-		 * but top-down allocation doesn't have the limit, so
-		 * retrying top-down allocation may succeed when bottom-up
-		 * allocation failed.
-		 *
-		 * bottom-up allocation is expected to be fail very rarely,
-		 * so we use WARN_ONCE() here to see the stack trace if
-		 * fail happens.
-		 */
-		WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
-			  "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
-	}
 
-	return __memblock_find_range_top_down(start, end, size, align, nid,
-					      flags);
+	if (memblock_bottom_up())
+		return __memblock_find_range_bottom_up(start, end, size, align,
+						       nid, flags);
+	else
+		return __memblock_find_range_top_down(start, end, size, align,
+						      nid, flags);
 }
 
 /**
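
After the patch, memblock_find_in_range_node() is a symmetric choice
between the two search directions, with PAGE_SIZE as the only lower
bound. A userspace sketch of the simplified control flow, with stand-in
types and trivial stand-in searchers:

    #include <stdio.h>

    typedef unsigned long long phys_addr_t;
    #define PAGE_SIZE 4096ULL

    static int bottom_up = 1;   /* stand-in for memblock_bottom_up() */

    static phys_addr_t find_bottom_up(phys_addr_t s, phys_addr_t e)
    {
            (void)e; return s;  /* pretend the lowest address fits */
    }

    static phys_addr_t find_top_down(phys_addr_t s, phys_addr_t e)
    {
            (void)s; return e;  /* pretend the highest address fits */
    }

    static phys_addr_t memblock_find_in_range(phys_addr_t start,
                                              phys_addr_t end)
    {
            /* Avoid allocating the first page; the kernel image itself
             * is protected by memblock_reserve(), so no kernel_end clamp
             * and no top-down fallback are needed any more. */
            start = start > PAGE_SIZE ? start : PAGE_SIZE;
            end = end > start ? end : start;

            if (bottom_up)
                    return find_bottom_up(start, end);
            return find_top_down(start, end);
    }

    int main(void)
    {
            printf("%llx\n", memblock_find_in_range(0, 1 << 20));
            return 0;
    }
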
From patchwork Fri Feb 5 02:32:45 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 378094
Date: Thu, 04 Feb 2021 18:32:45 -0800
From: Andrew Morton
Subject: [patch 13/18] mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()
Message-ID: <20210205023245.2MSWh6RsL%akpm@linux-foundation.org>
In-Reply-To: <20210204183135.e123f0d6027529f2cf500cf2@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Waiman Long
Subject: mm/filemap: add missing mem_cgroup_uncharge() to __add_to_page_cache_locked()

Commit 3fea5a499d57 ("mm: memcontrol: convert page cache to a new
mem_cgroup_charge() API") introduced a bug in __add_to_page_cache_locked()
causing the following splat:

[ 1570.068330] page dumped because: VM_BUG_ON_PAGE(page_memcg(page))
[ 1570.068333] pages's memcg:ffff8889a4116000
[ 1570.068343] ------------[ cut here ]------------
[ 1570.068346] kernel BUG at mm/memcontrol.c:2924!
[ 1570.068355] invalid opcode: 0000 [#1] SMP KASAN PTI
[ 1570.068359] CPU: 35 PID: 12345 Comm: cat Tainted: G S W I 5.11.0-rc4-debug+ #1
[ 1570.068363] Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017
[ 1570.068365] RIP: 0010:commit_charge+0xf4/0x130
  :
[ 1570.068375] RSP: 0018:ffff8881b38d70e8 EFLAGS: 00010286
[ 1570.068379] RAX: 0000000000000000 RBX: ffffea00260ddd00 RCX: 0000000000000027
[ 1570.068382] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88907ebe05a8
[ 1570.068384] RBP: ffffea00260ddd00 R08: ffffed120fd7c0b6 R09: ffffed120fd7c0b6
[ 1570.068386] R10: ffff88907ebe05ab R11: ffffed120fd7c0b5 R12: ffffea00260ddd38
[ 1570.068389] R13: ffff8889a4116000 R14: ffff8889a4116000 R15: 0000000000000001
[ 1570.068391] FS: 00007ff039638680(0000) GS:ffff88907ea00000(0000) knlGS:0000000000000000
[ 1570.068394] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1570.068396] CR2: 00007f36f354cc20 CR3: 00000008a0126006 CR4: 00000000007706e0
[ 1570.068398] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1570.068400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1570.068402] PKRU: 55555554
[ 1570.068404] Call Trace:
[ 1570.068407]  mem_cgroup_charge+0x175/0x770
[ 1570.068413]  __add_to_page_cache_locked+0x712/0xad0
[ 1570.068439]  add_to_page_cache_lru+0xc5/0x1f0
[ 1570.068461]  cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles]
[ 1570.068524]  __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache]
[ 1570.068540]  __nfs_readpages_from_fscache+0x16d/0x630 [nfs]
[ 1570.068585]  nfs_readpages+0x24e/0x540 [nfs]
[ 1570.068693]  read_pages+0x5b1/0xc40
[ 1570.068711]  page_cache_ra_unbounded+0x460/0x750
[ 1570.068729]  generic_file_buffered_read_get_pages+0x290/0x1710
[ 1570.068756]  generic_file_buffered_read+0x2a9/0xc30
[ 1570.068832]  nfs_file_read+0x13f/0x230 [nfs]
[ 1570.068872]  new_sync_read+0x3af/0x610
[ 1570.068901]  vfs_read+0x339/0x4b0
[ 1570.068909]  ksys_read+0xf1/0x1c0
[ 1570.068920]  do_syscall_64+0x33/0x40
[ 1570.068926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1570.068930] RIP: 0033:0x7ff039135595

Before that commit, __add_to_page_cache_locked() called try_charge() and
commit_charge() separately. These two charge functions were replaced by
a single mem_cgroup_charge(), but no matching mem_cgroup_uncharge() was
added for the case where the xarray insertion fails and the page is
released back to the pool. Fix this by calling mem_cgroup_uncharge()
when an insertion error happens.
Link: https://lkml.kernel.org/r/20210125042441.20030-1-longman@redhat.com
Fixes: 3fea5a499d57 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
Signed-off-by: Waiman Long
Reviewed-by: Alex Shi
Acked-by: Johannes Weiner
Cc: Matthew Wilcox
Cc: Miaohe Lin
Cc: Muchun Song
Cc: Michal Hocko
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/filemap.c |    4 ++++
 1 file changed, 4 insertions(+)

--- a/mm/filemap.c~mm-filemap-adding-missing-mem_cgroup_uncharge-to-__add_to_page_cache_locked
+++ a/mm/filemap.c
@@ -835,6 +835,7 @@ noinline int __add_to_page_cache_locked(
 	XA_STATE(xas, &mapping->i_pages, offset);
 	int huge = PageHuge(page);
 	int error;
+	bool charged = false;
 
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageSwapBacked(page), page);
@@ -848,6 +849,7 @@ noinline int __add_to_page_cache_locked(
 		error = mem_cgroup_charge(page, current->mm, gfp);
 		if (error)
 			goto error;
+		charged = true;
 	}
 
 	gfp &= GFP_RECLAIM_MASK;
@@ -896,6 +898,8 @@ unlock:
 
 	if (xas_error(&xas)) {
 		error = xas_error(&xas);
+		if (charged)
+			mem_cgroup_uncharge(page);
 		goto error;
 	}
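
The fix is the standard remember-and-undo error-path pattern: record
that the charge succeeded and reverse it if a later step fails. A
stand-alone sketch with stand-in functions (the simulated insertion
failure is arbitrary):

    #include <stdbool.h>
    #include <stdio.h>

    static int  mem_cgroup_charge(void)   { return 0; }   /* 0 = success */
    static void mem_cgroup_uncharge(void) { puts("uncharged"); }
    static int  xarray_insert(void)       { return -12; } /* simulate -ENOMEM */

    static int add_to_page_cache(bool huge)
    {
            int error;
            bool charged = false;

            if (!huge) {
                    error = mem_cgroup_charge();
                    if (error)
                            goto error;
                    charged = true;     /* remember the charge succeeded */
            }

            error = xarray_insert();
            if (error)
                    goto error;
            return 0;

    error:
            /* Without this, the memcg charge leaked on insertion failure. */
            if (charged)
                    mem_cgroup_uncharge();
            return error;
    }

    int main(void)
    {
            return add_to_page_cache(false) == -12 ? 0 : 1;
    }
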
From patchwork Fri Feb 5 02:33:00 2021
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 377619
Date: Thu, 04 Feb 2021 18:33:00 -0800
From: Andrew Morton
Subject: [patch 17/18] mm: hugetlb: fix missing put_page in gather_surplus_pages()
Message-ID: <20210205023300.kuRDAAZIZ%akpm@linux-foundation.org>
In-Reply-To: <20210204183135.e123f0d6027529f2cf500cf2@linux-foundation.org>
X-Mailing-List: stable@vger.kernel.org

From: Muchun Song
Subject: mm: hugetlb: fix missing put_page in gather_surplus_pages()

VM_BUG_ON_PAGE generates no code at all when !CONFIG_DEBUG_VM, even if
the asserted expression has side effects. Here that meant the
put_page_testzero() buried inside the assertion was never called on
!CONFIG_DEBUG_VM builds, leaking the buddy allocator's page reference.
Evaluate put_page_testzero() unconditionally and assert only on its
result.

Link: https://lkml.kernel.org/r/20210126031009.96266-1-songmuchun@bytedance.com
Fixes: e5dfacebe4a4 ("mm/hugetlb.c: just use put_page_testzero() instead of page_count()")
Signed-off-by: Muchun Song
Reviewed-by: Mike Kravetz
Reviewed-by: Miaohe Lin
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton
---

 mm/hugetlb.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/mm/hugetlb.c~mm-hugetlb-fix-missing-put_page-in-gather_surplus_pages
+++ a/mm/hugetlb.c
@@ -2047,13 +2047,16 @@ retry:
 
 	/* Free the needed pages to the hugetlb pool */
 	list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
+		int zeroed;
+
 		if ((--needed) < 0)
 			break;
 		/*
 		 * This page is now managed by the hugetlb allocator and has
 		 * no users -- drop the buddy allocator's reference.
		 */
-		VM_BUG_ON_PAGE(!put_page_testzero(page), page);
+		zeroed = put_page_testzero(page);
+		VM_BUG_ON_PAGE(!zeroed, page);
 		enqueue_huge_page(h, page);
 	}
 free:
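
The hazard generalizes beyond hugetlb: any side effect inside an
assertion macro silently disappears when the macro compiles to nothing.
The demonstration below mimics the !CONFIG_DEBUG_VM expansion with
stand-in names:

    #include <stdio.h>

    /* Mimic VM_BUG_ON_PAGE() with debug checks compiled out: the
     * condition is never evaluated, exactly as with !CONFIG_DEBUG_VM. */
    #define VM_BUG_ON(cond) do { } while (0)

    static int refcount = 1;

    static int put_page_testzero(void)
    {
            return --refcount == 0;     /* side effect: drops the ref */
    }

    int main(void)
    {
            /* Buggy shape: put_page_testzero() vanishes with the macro,
             * so the reference is never dropped -- the leak this patch
             * fixes. */
            VM_BUG_ON(!put_page_testzero());
            printf("after buggy form: refcount=%d (leaked)\n", refcount);

            /* Fixed shape: evaluate unconditionally, assert the result. */
            int zeroed = put_page_testzero();
            VM_BUG_ON(!zeroed);
            printf("after fixed form: refcount=%d zeroed=%d\n",
                   refcount, zeroed);
            return 0;
    }
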