diff mbox series

mm, memcg: fix wrong mem cgroup protection

Message ID 20200423061629.24185-1-laoar.shao@gmail.com
State New
Headers show
Series mm, memcg: fix wrong mem cgroup protection | expand

Commit Message

Yafang Shao April 23, 2020, 6:16 a.m. UTC
This patch is an improvement of a previous version[1], as the previous
version is not easy to understand.
This issue persists in the newest kernel, I have to resend the fix. As
the implementation is changed, I drop Roman's ack from the previous
version.

Here's the explanation of this issue.
memory.{low,min} won't take effect if the to-be-reclaimed memcg is the
sc->target_mem_cgroup, that can also be proved by the implementation in
mem_cgroup_protected(), see bellow,
	mem_cgroup_protected
		if (memcg == root) [2]
			return MEMCG_PROT_NONE;

But this rule is ignored in mem_cgroup_protection(), which will read
memory.{emin, elow} as the protection whatever the memcg is.

How would this issue happen?
Because in mem_cgroup_protected() we forget to clear the
memory.{emin, elow} if the memcg is target_mem_cgroup [2].

An example to illustrate this issue.
   root_mem_cgroup
         /
        A   memory.max: 1024M
            memory.min: 512M
            memory.current: 800M ('current' must be greater than 'min')
Once kswapd starts to reclaim memcg A, it assigns 512M to memory.emin of A.
Then kswapd stops.
As a result of it, the memory values of A will be,
   root_mem_cgroup
         /
        A   memory.max: 1024M
            memory.min: 512M
            memory.current: 512M (approximately)
            memory.emin: 512M

Then a new workload starts to run in memcg A, and it will trigger memcg
relcaim in A soon. As memcg A is the target_mem_cgroup of this
reclaimer, so it return directly without touching memory.{emin, elow}.[2]
The memory values of A will be,
   root_mem_cgroup
         /
        A   memory.max: 1024M
            memory.min: 512M
            memory.current: 1024M (approximately)
            memory.emin: 512M
Then this memory.emin will be used in mem_cgroup_protection() to get the
scan count, which is obvoiusly a wrong scan count.

[1]. https://lore.kernel.org/linux-mm/20200216145249.6900-1-laoar.shao@gmail.com/

Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")
Cc: Chris Down <chris@chrisdown.name>
Cc: Roman Gushchin <guro@fb.com>
Cc: stable@vger.kernel.org
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/memcontrol.h | 13 +++++++++++--
 mm/vmscan.c                |  4 ++--
 2 files changed, 13 insertions(+), 4 deletions(-)
diff mbox series

Patch

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d275c72c4f8e..114cfe06bf60 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -344,12 +344,20 @@  static inline bool mem_cgroup_disabled(void)
 	return !cgroup_subsys_enabled(memory_cgrp_subsys);
 }
 
-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *memcg,
+static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
+						  struct mem_cgroup *memcg,
 						  bool in_low_reclaim)
 {
 	if (mem_cgroup_disabled())
 		return 0;
 
+	/*
+	 * Memcg protection won't take effect if the memcg is the target
+	 * root memcg.
+	 */
+	if (root == memcg)
+		return 0;
+
 	if (in_low_reclaim)
 		return READ_ONCE(memcg->memory.emin);
 
@@ -835,7 +843,8 @@  static inline void memcg_memory_event_mm(struct mm_struct *mm,
 {
 }
 
-static inline unsigned long mem_cgroup_protection(struct mem_cgroup *memcg,
+static inline unsigned long mem_cgroup_protection(struct mem_cgroup *root,
+						  struct mem_cgroup *memcg,
 						  bool in_low_reclaim)
 {
 	return 0;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b06868fc4926..ad2782f754ab 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2346,9 +2346,9 @@  static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 		unsigned long protection;
 
 		lruvec_size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
-		protection = mem_cgroup_protection(memcg,
+		protection = mem_cgroup_protection(sc->target_mem_cgroup,
+						   memcg,
 						   sc->memcg_low_reclaim);
-
 		if (protection) {
 			/*
 			 * Scale a cgroup's reclaim pressure by proportioning