diff mbox series

[v5,1/2] mm/vmscan: Skip memcg with !usage in shrink_node_memcgs()

Message ID 20250407162316.1434714-2-longman@redhat.com
State New
Headers show
Series memcg: Fix test_memcg_min/low test failures | expand

Commit Message

Waiman Long April 7, 2025, 4:23 p.m. UTC
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test due to the fact that two of its test child cgroups which
have a memmory.low of 0 or an effective memory.low of 0 still have low
events generated for them since mem_cgroup_below_low() use the ">="
operator when comparing to elow.

The two failed use cases are as follows:

1) memory.low is set to 0, but low events can still be triggered and
   so the cgroup may have a non-zero low event count. I doubt users are
   looking for that as they didn't set memory.low at all.

2) memory.low is set to a non-zero value but the cgroup has no task in
   it so that it has an effective low value of 0. Again it may have a
   non-zero low event count if memory reclaim happens. This is probably
   not a result expected by the users and it is really doubtful that
   users will check an empty cgroup with no task in it and expecting
   some non-zero event counts.

In the first case, even though memory.low isn't set, it may still have
some low protection if memory.low is set in the parent. So low event may
still be recorded. The test_memcontrol.c test has to be modified to
account for that.

For the second case, it really doesn't make sense to have non-zero low
event if the cgroup has 0 usage. So we need to skip this corner case
in shrink_node_memcgs() using mem_cgroup_usage(). The mem_cgroup_usage()
function declaration is moved from mm/memcontrol-v1.h to mm/internal.h
with the !CONFIG_MEMCG case defined as always true.

With this patch applied, the test_memcg_low sub-test finishes
successfully without failure in most cases. Though both test_memcg_low
and test_memcg_min sub-tests may still fail occasionally if the
memory.current values fall outside of the expected ranges.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/internal.h                                    | 9 +++++++++
 mm/memcontrol-v1.h                               | 2 --
 mm/vmscan.c                                      | 4 ++++
 tools/testing/selftests/cgroup/test_memcontrol.c | 7 ++++++-
 4 files changed, 19 insertions(+), 3 deletions(-)

Comments

Michal Koutný April 11, 2025, 5:11 p.m. UTC | #1
Hello.

On Mon, Apr 07, 2025 at 12:23:15PM -0400, Waiman Long <longman@redhat.com> wrote:
> --- a/mm/memcontrol-v1.h
> +++ b/mm/memcontrol-v1.h
> @@ -22,8 +22,6 @@
>  	     iter != NULL;				\
>  	     iter = mem_cgroup_iter(NULL, iter, NULL))
>  
> -unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
> -

Hm, maybe keep it for v1 only where mem_cgroup_usage has meaning for
memsw (i.e. do the opposite and move the function definition to -v1.c).

>  void drain_all_stock(struct mem_cgroup *root_memcg);
>  
>  unsigned long memcg_events(struct mem_cgroup *memcg, int event);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b620d74b0f66..a771a0145a12 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -5963,6 +5963,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  
>  		mem_cgroup_calculate_protection(target_memcg, memcg);
>  
> +		/* Skip memcg with no usage */
> +		if (!mem_cgroup_usage(memcg, false))
> +			continue;
> +

(Not only for v2), there is mem_cgroup_size() for this purpose (already
used in mm/vmscan.c).

>  		if (mem_cgroup_below_min(target_memcg, memcg)) {
>  			/*
>  			 * Hard protection.
> diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
> index 16f5d74ae762..bab826b6b7b0 100644
> --- a/tools/testing/selftests/cgroup/test_memcontrol.c
> +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
> @@ -525,8 +525,13 @@ static int test_memcg_protection(const char *root, bool min)
>  		goto cleanup;
>  	}
>  
> +	/*
> +	 * Child 2 has memory.low=0, but some low protection is still being
> +	 * distributed down from its parent with memory.low=50M. So the low
> +	 * event count will be non-zero.
> +	 */
>  	for (i = 0; i < ARRAY_SIZE(children); i++) {
> -		int no_low_events_index = 1;
> +		int no_low_events_index = 2;

See suggestion in
https://lore.kernel.org/lkml/awgbdn6gwnj4kfaezsorvopgsdyoty3yahdeanqvoxstz2w2ke@xc3sv43elkz5/

HTH,
Michal
Waiman Long April 11, 2025, 9:13 p.m. UTC | #2
On 4/11/25 1:11 PM, Michal Koutný wrote:
> Hello.
>
> On Mon, Apr 07, 2025 at 12:23:15PM -0400, Waiman Long <longman@redhat.com> wrote:
>> --- a/mm/memcontrol-v1.h
>> +++ b/mm/memcontrol-v1.h
>> @@ -22,8 +22,6 @@
>>   	     iter != NULL;				\
>>   	     iter = mem_cgroup_iter(NULL, iter, NULL))
>>   
>> -unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
>> -
> Hm, maybe keep it for v1 only where mem_cgroup_usage has meaning for
> memsw (i.e. do the opposite and move the function definition to -v1.c).
memcontrol-v1.c also include mm/internal.h. That is the reason why I can 
remove it from here.
>>   void drain_all_stock(struct mem_cgroup *root_memcg);
>>   
>>   unsigned long memcg_events(struct mem_cgroup *memcg, int event);
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index b620d74b0f66..a771a0145a12 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -5963,6 +5963,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>>   
>>   		mem_cgroup_calculate_protection(target_memcg, memcg);
>>   
>> +		/* Skip memcg with no usage */
>> +		if (!mem_cgroup_usage(memcg, false))
>> +			continue;
>> +
> (Not only for v2), there is mem_cgroup_size() for this purpose (already
> used in mm/vmscan.c).
My understanding is that mem_cgroup_usage() is for both v1 and v2, while 
mem_cgroup_size() is for v2 only.
>
>>   		if (mem_cgroup_below_min(target_memcg, memcg)) {
>>   			/*
>>   			 * Hard protection.
>> diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
>> index 16f5d74ae762..bab826b6b7b0 100644
>> --- a/tools/testing/selftests/cgroup/test_memcontrol.c
>> +++ b/tools/testing/selftests/cgroup/test_memcontrol.c
>> @@ -525,8 +525,13 @@ static int test_memcg_protection(const char *root, bool min)
>>   		goto cleanup;
>>   	}
>>   
>> +	/*
>> +	 * Child 2 has memory.low=0, but some low protection is still being
>> +	 * distributed down from its parent with memory.low=50M. So the low
>> +	 * event count will be non-zero.
>> +	 */
>>   	for (i = 0; i < ARRAY_SIZE(children); i++) {
>> -		int no_low_events_index = 1;
>> +		int no_low_events_index = 2;
> See suggestion in
> https://lore.kernel.org/lkml/awgbdn6gwnj4kfaezsorvopgsdyoty3yahdeanqvoxstz2w2ke@xc3sv43elkz5/

I have just replied on your suggestion.

Cheers,
Longman

>
> HTH,
> Michal
diff mbox series

Patch

diff --git a/mm/internal.h b/mm/internal.h
index 50c2f590b2d0..c06fb0e8d75c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1535,6 +1535,15 @@  void __meminit __init_page_from_nid(unsigned long pfn, int nid);
 unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg,
 			  int priority);
 
+#ifdef CONFIG_MEMCG
+unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
+#else
+static inline unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
+{
+	return 1UL;
+}
+#endif
+
 #ifdef CONFIG_SHRINKER_DEBUG
 static inline __printf(2, 0) int shrinker_debugfs_name_alloc(
 			struct shrinker *shrinker, const char *fmt, va_list ap)
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 6358464bb416..e92b21af92b1 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -22,8 +22,6 @@ 
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))
 
-unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap);
-
 void drain_all_stock(struct mem_cgroup *root_memcg);
 
 unsigned long memcg_events(struct mem_cgroup *memcg, int event);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b620d74b0f66..a771a0145a12 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5963,6 +5963,10 @@  static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
+		/* Skip memcg with no usage */
+		if (!mem_cgroup_usage(memcg, false))
+			continue;
+
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
 			/*
 			 * Hard protection.
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 16f5d74ae762..bab826b6b7b0 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -525,8 +525,13 @@  static int test_memcg_protection(const char *root, bool min)
 		goto cleanup;
 	}
 
+	/*
+	 * Child 2 has memory.low=0, but some low protection is still being
+	 * distributed down from its parent with memory.low=50M. So the low
+	 * event count will be non-zero.
+	 */
 	for (i = 0; i < ARRAY_SIZE(children); i++) {
-		int no_low_events_index = 1;
+		int no_low_events_index = 2;
 		long low, oom;
 
 		oom = cg_read_key_long(children[i], "memory.events", "oom ");