diff mbox series

+mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged.patchadded to -mm tree

Message ID 20200916000948.N0vvr%akpm@linux-foundation.org
State New
Headers show
Series +mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged.patchadded to -mm tree | expand

Commit Message

Andrew Morton Sept. 16, 2020, 12:09 a.m. UTC
The patch titled
     Subject: mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
has been added to the -mm tree.  Its filename is
     mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Vijay Balakrishna <vijayb@linux.microsoft.com>
Subject: mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged

When memory is hotplug added or removed the min_free_kbytes must be
recalculated based on what is expected by khugepaged.  Currently after
hotplug, min_free_kbytes will be set to a lower default and higher default
set when THP enabled is lost.  This leaves the system with small
min_free_kbytes which isn't suitable for systems especially with network
intensive loads.  Typical failure symptoms include HW WATCHDOG reset, soft
lockup hang notices, NETDEVICE WATCHDOG timeouts, and OOM process kills.

Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com
Fixes: f000565adb77 ("thp: set recommended min free kbytes")
Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Allen Pais <apais@microsoft.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/khugepaged.h |    5 +++++
 mm/khugepaged.c            |   13 +++++++++++--
 mm/memory_hotplug.c        |    3 +++
 3 files changed, 19 insertions(+), 2 deletions(-)

Comments

Michal Hocko Sept. 16, 2020, 7:33 a.m. UTC | #1
On Tue 15-09-20 17:09:48, Andrew Morton wrote:
> From: Vijay Balakrishna <vijayb@linux.microsoft.com>
> Subject: mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
> 
> When memory is hotplug added or removed the min_free_kbytes must be
> recalculated based on what is expected by khugepaged.  Currently after
> hotplug, min_free_kbytes will be set to a lower default and higher default
> set when THP enabled is lost.  This leaves the system with small
> min_free_kbytes which isn't suitable for systems especially with network
> intensive loads.  Typical failure symptoms include HW WATCHDOG reset, soft
> lockup hang notices, NETDEVICE WATCHDOG timeouts, and OOM process kills.
> 
> Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com
> Fixes: f000565adb77 ("thp: set recommended min free kbytes")
> Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Allen Pais <apais@microsoft.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Song Liu <songliubraving@fb.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

The patch has been explicitly nacked by Kirill IIRC. I am also not happy
about it because the changelog doesn't really explain the problem and
the follow up discussion didn't drill down to the underlying problem
either.

Maybe we want to make the min_free_kbytes udpate consistent with the
boot but the current changelog is incomplete and this shouldn't have
been added yet.

> ---
> 
>  include/linux/khugepaged.h |    5 +++++
>  mm/khugepaged.c            |   13 +++++++++++--
>  mm/memory_hotplug.c        |    3 +++
>  3 files changed, 19 insertions(+), 2 deletions(-)
> 
> --- a/include/linux/khugepaged.h~mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged
> +++ a/include/linux/khugepaged.h
> @@ -15,6 +15,7 @@ extern int __khugepaged_enter(struct mm_
>  extern void __khugepaged_exit(struct mm_struct *mm);
>  extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
>  				      unsigned long vm_flags);
> +extern void khugepaged_min_free_kbytes_update(void);
>  #ifdef CONFIG_SHMEM
>  extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr);
>  #else
> @@ -85,6 +86,10 @@ static inline void collapse_pte_mapped_t
>  					   unsigned long addr)
>  {
>  }
> +
> +static inline void khugepaged_min_free_kbytes_update(void)
> +{
> +}
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
>  #endif /* _LINUX_KHUGEPAGED_H */
> --- a/mm/khugepaged.c~mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged
> +++ a/mm/khugepaged.c
> @@ -56,6 +56,9 @@ enum scan_result {
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/huge_memory.h>
>  
> +static struct task_struct *khugepaged_thread __read_mostly;
> +static DEFINE_MUTEX(khugepaged_mutex);
> +
>  /* default scan 8*512 pte (or vmas) every 30 second */
>  static unsigned int khugepaged_pages_to_scan __read_mostly;
>  static unsigned int khugepaged_pages_collapsed;
> @@ -2292,8 +2295,6 @@ static void set_recommended_min_free_kby
>  
>  int start_stop_khugepaged(void)
>  {
> -	static struct task_struct *khugepaged_thread __read_mostly;
> -	static DEFINE_MUTEX(khugepaged_mutex);
>  	int err = 0;
>  
>  	mutex_lock(&khugepaged_mutex);
> @@ -2320,3 +2321,11 @@ fail:
>  	mutex_unlock(&khugepaged_mutex);
>  	return err;
>  }
> +
> +void khugepaged_min_free_kbytes_update(void)
> +{
> +	mutex_lock(&khugepaged_mutex);
> +	if (khugepaged_enabled() && khugepaged_thread)
> +		set_recommended_min_free_kbytes();
> +	mutex_unlock(&khugepaged_mutex);
> +}
> --- a/mm/memory_hotplug.c~mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged
> +++ a/mm/memory_hotplug.c
> @@ -36,6 +36,7 @@
>  #include <linux/memblock.h>
>  #include <linux/compaction.h>
>  #include <linux/rmap.h>
> +#include <linux/khugepaged.h>
>  
>  #include <asm/tlbflush.h>
>  
> @@ -857,6 +858,7 @@ int __ref online_pages(unsigned long pfn
>  	zone_pcp_update(zone);
>  
>  	init_per_zone_wmark_min();
> +	khugepaged_min_free_kbytes_update();
>  
>  	kswapd_run(nid);
>  	kcompactd_run(nid);
> @@ -1614,6 +1616,7 @@ static int __ref __offline_pages(unsigne
>  	pgdat_resize_unlock(zone->zone_pgdat, &flags);
>  
>  	init_per_zone_wmark_min();
> +	khugepaged_min_free_kbytes_update();
>  
>  	if (!populated_zone(zone)) {
>  		zone_pcp_reset(zone);
> _
> 
> Patches currently in -mm which might be from vijayb@linux.microsoft.com are
> 
> mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged.patch
> mm-khugepaged-avoid-overriding-min_free_kbytes-set-by-user.patch
Pasha Tatashin Sept. 16, 2020, 2:09 p.m. UTC | #2
On Wed, Sep 16, 2020 at 3:33 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 15-09-20 17:09:48, Andrew Morton wrote:
> > From: Vijay Balakrishna <vijayb@linux.microsoft.com>
> > Subject: mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
> >
> > When memory is hotplug added or removed the min_free_kbytes must be
> > recalculated based on what is expected by khugepaged.  Currently after
> > hotplug, min_free_kbytes will be set to a lower default and higher default
> > set when THP enabled is lost.  This leaves the system with small
> > min_free_kbytes which isn't suitable for systems especially with network
> > intensive loads.  Typical failure symptoms include HW WATCHDOG reset, soft
> > lockup hang notices, NETDEVICE WATCHDOG timeouts, and OOM process kills.
> >
> > Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com
> > Fixes: f000565adb77 ("thp: set recommended min free kbytes")
> > Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
> > Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> > Cc: Allen Pais <apais@microsoft.com>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Song Liu <songliubraving@fb.com>
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> The patch has been explicitly nacked by Kirill IIRC.

Hi Michal,

You are correct it was NAK by Kirill because of this:
"NAK. It would override min_free_kbytes set by user."

I do not see a problem with that because during boot we are doing
exactly that: if the user sets unreasonably small min_free_kbytes we
overwrite it and print a message about it. Kirill could you please
comment on this?
IMO the hot-add behaviour must be exactly the same as during boot.

I am also not happy
> about it because the changelog doesn't really explain the problem and
> the follow up discussion didn't drill down to the underlying problem
> either.
>
> Maybe we want to make the min_free_kbytes udpate consistent with the
> boot but the current changelog is incomplete and this shouldn't have
> been added yet.
>

Yes, what Vijay should do is to remove all the irrelevant information
from the commit log, and only state the actual problem that he found
which is min_free_kbytes is not being updated during the memory
hotplug. I think the OOMs and timeouts should be covered in a
different discussion.

Thank you,
Pasha
Vijay Balakrishna Sept. 16, 2020, 7:02 p.m. UTC | #3
On 9/16/2020 12:33 AM, Michal Hocko wrote:
> On Tue 15-09-20 17:09:48, Andrew Morton wrote:
>> From: Vijay Balakrishna <vijayb@linux.microsoft.com>
>> Subject: mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
>>
>> When memory is hotplug added or removed the min_free_kbytes must be
>> recalculated based on what is expected by khugepaged.  Currently after
>> hotplug, min_free_kbytes will be set to a lower default and higher default
>> set when THP enabled is lost.  This leaves the system with small
>> min_free_kbytes which isn't suitable for systems especially with network
>> intensive loads.  Typical failure symptoms include HW WATCHDOG reset, soft
>> lockup hang notices, NETDEVICE WATCHDOG timeouts, and OOM process kills.
>>
>> Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com
>> Fixes: f000565adb77 ("thp: set recommended min free kbytes")
>> Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
>> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Allen Pais <apais@microsoft.com>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oleg Nesterov <oleg@redhat.com>
>> Cc: Song Liu <songliubraving@fb.com>
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> 
> The patch has been explicitly nacked by Kirill IIRC. I am also not happy
> about it because the changelog doesn't really explain the problem and
> the follow up discussion didn't drill down to the underlying problem
> either.
> 
> Maybe we want to make the min_free_kbytes udpate consistent with the
> boot but the current changelog is incomplete and this shouldn't have
> been added yet.

Let me modify changelog to remove references symptoms mentioned.

Thanks,
Vijay

> 
>> ---
>>
>>   include/linux/khugepaged.h |    5 +++++
>>   mm/khugepaged.c            |   13 +++++++++++--
>>   mm/memory_hotplug.c        |    3 +++
>>   3 files changed, 19 insertions(+), 2 deletions(-)
>>
>> --- a/include/linux/khugepaged.h~mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged
>> +++ a/include/linux/khugepaged.h
>> @@ -15,6 +15,7 @@ extern int __khugepaged_enter(struct mm_
>>   extern void __khugepaged_exit(struct mm_struct *mm);
>>   extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
>>   				      unsigned long vm_flags);
>> +extern void khugepaged_min_free_kbytes_update(void);
>>   #ifdef CONFIG_SHMEM
>>   extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr);
>>   #else
>> @@ -85,6 +86,10 @@ static inline void collapse_pte_mapped_t
>>   					   unsigned long addr)
>>   {
>>   }
>> +
>> +static inline void khugepaged_min_free_kbytes_update(void)
>> +{
>> +}
>>   #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>   
>>   #endif /* _LINUX_KHUGEPAGED_H */
>> --- a/mm/khugepaged.c~mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged
>> +++ a/mm/khugepaged.c
>> @@ -56,6 +56,9 @@ enum scan_result {
>>   #define CREATE_TRACE_POINTS
>>   #include <trace/events/huge_memory.h>
>>   
>> +static struct task_struct *khugepaged_thread __read_mostly;
>> +static DEFINE_MUTEX(khugepaged_mutex);
>> +
>>   /* default scan 8*512 pte (or vmas) every 30 second */
>>   static unsigned int khugepaged_pages_to_scan __read_mostly;
>>   static unsigned int khugepaged_pages_collapsed;
>> @@ -2292,8 +2295,6 @@ static void set_recommended_min_free_kby
>>   
>>   int start_stop_khugepaged(void)
>>   {
>> -	static struct task_struct *khugepaged_thread __read_mostly;
>> -	static DEFINE_MUTEX(khugepaged_mutex);
>>   	int err = 0;
>>   
>>   	mutex_lock(&khugepaged_mutex);
>> @@ -2320,3 +2321,11 @@ fail:
>>   	mutex_unlock(&khugepaged_mutex);
>>   	return err;
>>   }
>> +
>> +void khugepaged_min_free_kbytes_update(void)
>> +{
>> +	mutex_lock(&khugepaged_mutex);
>> +	if (khugepaged_enabled() && khugepaged_thread)
>> +		set_recommended_min_free_kbytes();
>> +	mutex_unlock(&khugepaged_mutex);
>> +}
>> --- a/mm/memory_hotplug.c~mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged
>> +++ a/mm/memory_hotplug.c
>> @@ -36,6 +36,7 @@
>>   #include <linux/memblock.h>
>>   #include <linux/compaction.h>
>>   #include <linux/rmap.h>
>> +#include <linux/khugepaged.h>
>>   
>>   #include <asm/tlbflush.h>
>>   
>> @@ -857,6 +858,7 @@ int __ref online_pages(unsigned long pfn
>>   	zone_pcp_update(zone);
>>   
>>   	init_per_zone_wmark_min();
>> +	khugepaged_min_free_kbytes_update();
>>   
>>   	kswapd_run(nid);
>>   	kcompactd_run(nid);
>> @@ -1614,6 +1616,7 @@ static int __ref __offline_pages(unsigne
>>   	pgdat_resize_unlock(zone->zone_pgdat, &flags);
>>   
>>   	init_per_zone_wmark_min();
>> +	khugepaged_min_free_kbytes_update();
>>   
>>   	if (!populated_zone(zone)) {
>>   		zone_pcp_reset(zone);
>> _
>>
>> Patches currently in -mm which might be from vijayb@linux.microsoft.com are
>>
>> mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged.patch
>> mm-khugepaged-avoid-overriding-min_free_kbytes-set-by-user.patch
>
diff mbox series

Patch

--- a/include/linux/khugepaged.h~mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged
+++ a/include/linux/khugepaged.h
@@ -15,6 +15,7 @@  extern int __khugepaged_enter(struct mm_
 extern void __khugepaged_exit(struct mm_struct *mm);
 extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
 				      unsigned long vm_flags);
+extern void khugepaged_min_free_kbytes_update(void);
 #ifdef CONFIG_SHMEM
 extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr);
 #else
@@ -85,6 +86,10 @@  static inline void collapse_pte_mapped_t
 					   unsigned long addr)
 {
 }
+
+static inline void khugepaged_min_free_kbytes_update(void)
+{
+}
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 #endif /* _LINUX_KHUGEPAGED_H */
--- a/mm/khugepaged.c~mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged
+++ a/mm/khugepaged.c
@@ -56,6 +56,9 @@  enum scan_result {
 #define CREATE_TRACE_POINTS
 #include <trace/events/huge_memory.h>
 
+static struct task_struct *khugepaged_thread __read_mostly;
+static DEFINE_MUTEX(khugepaged_mutex);
+
 /* default scan 8*512 pte (or vmas) every 30 second */
 static unsigned int khugepaged_pages_to_scan __read_mostly;
 static unsigned int khugepaged_pages_collapsed;
@@ -2292,8 +2295,6 @@  static void set_recommended_min_free_kby
 
 int start_stop_khugepaged(void)
 {
-	static struct task_struct *khugepaged_thread __read_mostly;
-	static DEFINE_MUTEX(khugepaged_mutex);
 	int err = 0;
 
 	mutex_lock(&khugepaged_mutex);
@@ -2320,3 +2321,11 @@  fail:
 	mutex_unlock(&khugepaged_mutex);
 	return err;
 }
+
+void khugepaged_min_free_kbytes_update(void)
+{
+	mutex_lock(&khugepaged_mutex);
+	if (khugepaged_enabled() && khugepaged_thread)
+		set_recommended_min_free_kbytes();
+	mutex_unlock(&khugepaged_mutex);
+}
--- a/mm/memory_hotplug.c~mm-khugepaged-recalculate-min_free_kbytes-after-memory-hotplug-as-expected-by-khugepaged
+++ a/mm/memory_hotplug.c
@@ -36,6 +36,7 @@ 
 #include <linux/memblock.h>
 #include <linux/compaction.h>
 #include <linux/rmap.h>
+#include <linux/khugepaged.h>
 
 #include <asm/tlbflush.h>
 
@@ -857,6 +858,7 @@  int __ref online_pages(unsigned long pfn
 	zone_pcp_update(zone);
 
 	init_per_zone_wmark_min();
+	khugepaged_min_free_kbytes_update();
 
 	kswapd_run(nid);
 	kcompactd_run(nid);
@@ -1614,6 +1616,7 @@  static int __ref __offline_pages(unsigne
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
 
 	init_per_zone_wmark_min();
+	khugepaged_min_free_kbytes_update();
 
 	if (!populated_zone(zone)) {
 		zone_pcp_reset(zone);