[4.14,27/50] mm, slub: consider rest of partial list if acquire_slab() fails

Message ID 20210122135736.291270624@linuxfoundation.org
State Superseded

Commit Message

Greg Kroah-Hartman Jan. 22, 2021, 2:12 p.m. UTC
From: Jann Horn <jannh@google.com>

commit 8ff60eb052eeba95cfb3efe16b08c9199f8121cf upstream.

acquire_slab() fails if there is contention on the freelist of the page
(probably because some other CPU is concurrently freeing an object from
the page).  In that case, it might make sense to look for a different page
(since there might be more remote frees to the page from other CPUs, and
we don't want contention on struct page).

However, the current code accidentally stops looking at the partial list
completely in that case.  Especially on kernels without CONFIG_NUMA set,
this means that get_partial() fails and new_slab_objects() falls back to
new_slab(), allocating new pages.  This could lead to an unnecessary
increase in memory fragmentation.

Link: https://lkml.kernel.org/r/20201228130853.1871516-1-jannh@google.com
Fixes: 7ced37197196 ("slub: Acquire_slab() avoid loop")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/slub.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
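
For illustration only (not part of the patch): a minimal user-space sketch
of the behaviour change described in the commit message. The names
toy_slab, toy_acquire and toy_get_partial are hypothetical stand-ins, not
kernel APIs. The real get_partial_node() iterates the node's partial list,
and acquire_slab() fails when its cmpxchg races with another CPU.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a slab page on the per-node partial list. */
struct toy_slab {
	bool contended;    /* true: the acquire attempt loses the race */
	int  free_objects; /* objects we would get on a successful acquire */
};

/* Mimics acquire_slab(): fails (returns false) on a contended slab. */
static bool toy_acquire(const struct toy_slab *slab, int *objects)
{
	if (slab->contended)
		return false;
	*objects = slab->free_objects;
	return true;
}

/*
 * Walk the list and return the free object count of the first slab that
 * could be acquired, or 0 if none was acquired.
 * skip_contended == false models the old "break" behaviour,
 * skip_contended == true models the patched "continue" behaviour.
 */
static int toy_get_partial(const struct toy_slab *list, size_t n,
			   bool skip_contended)
{
	for (size_t i = 0; i < n; i++) {
		int objects;

		if (!toy_acquire(&list[i], &objects)) {
			if (skip_contended)
				continue; /* try the next partial slab */
			break;            /* give up on the whole list */
		}
		return objects;
	}
	return 0;
}
```

With a list whose first slab is contended, the "break" variant returns 0
(in the kernel, the caller would then fall back to new_slab()), while the
"continue" variant returns the objects of the next uncontended slab.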

Comments

Linus Torvalds March 10, 2021, 6:43 p.m. UTC | #1
Just a note to the stable tree: this commit has been reverted
upstream, because it causes a huge performance drop (admittedly on a
load and setup that may not be all that relevant to most people).

It was applied to 4.4, 4.9 and 4.14, because the commit it was marked
as "fixing" is from 2012, but it turns out that the early exit from
the loop in that commit was very much intentional, and very much shows
up on scalability benchmarks.

I don't think this is likely to be a big deal for the stable kernels -
we're basically talking tuning for special cases, and while it is
reverted in my tree now, the "correct" thing to do is likely to be a
bit more flexible than either "exit loop immediately" or "loop for as
long as we have contention".

In practice, most machines probably won't see either case - or it will
at least be rare enough that you can't tell.

The machine that reports a huge performance drop was a multi-socket
machine under fairly extreme conditions, and these contention issues
are often close to exponential - a smaller machine (or a slightly less
extreme load) would never see the issue at all either way.

See

    https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/

for details if you care. I don't think this has to necessarily be
undone in the stable trees, this email is more of an incidental note
just as a heads-up.

                Linus

On Fri, Jan 22, 2021 at 6:14 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> From: Jann Horn <jannh@google.com>
>
> commit 8ff60eb052eeba95cfb3efe16b08c9199f8121cf upstream.
>
> acquire_slab() fails if there is contention on the freelist of the page [..]

Greg Kroah-Hartman March 10, 2021, 6:50 p.m. UTC | #2
On Wed, Mar 10, 2021 at 10:43:33AM -0800, Linus Torvalds wrote:
> Just a note to the stable tree: this commit has been reverted
> upstream, because it causes a huge performance drop (admittedly on a
> load and setup that may not be all that relevant to most people).
>
> It was applied to 4.4, 4.9 and 4.14, because the commit it was marked
> as "fixing" is from 2012, but it turns out that the early exit from
> the loop in that commit was very much intentional, and very much shows
> up on scalability benchmarks.
>
> I don't think this is likely to be a big deal for the stable kernels -
> we're basically talking tuning for special cases, and while it is
> reverted in my tree now, the "correct" thing to do is likely to be a
> bit more flexible than either "exit loop immediately" or "loop for as
> long as we have contention".
>
> In practice, most machines probably won't see either case - or it will
> at least be rare enough that you can't tell.
>
> The machine that reports a huge performance drop was a multi-socket
> machine under fairly extreme conditions, and these contention issues
> are often close to exponential - a smaller machine (or a slightly less
> extreme load) would never see the issue at all either way.
>
> See
>
>     https://lore.kernel.org/lkml/20210301080404.GF12822@xsang-OptiPlex-9020/
>
> for details if you care. I don't think this has to necessarily be
> undone in the stable trees, this email is more of an incidental note
> just as a heads-up.


Thanks for the details, I'll look into reverting it in a future stable
release.

greg k-h

Patch

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1846,7 +1846,7 @@  static void *get_partial_node(struct kme
 
 		t = acquire_slab(s, n, page, object == NULL, &objects);
 		if (!t)
-			break;
+			continue; /* cmpxchg raced */
 
 		available += objects;
 		if (!object) {
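
The review comment above suggests that the right behaviour may lie between
"exit loop immediately" and "loop for as long as we have contention". One
possible middle ground, sketched in user space purely as an assumption (it
is not taken from the thread or from upstream code, and demo_slab,
demo_get_partial_bounded and max_skips are hypothetical names), is to
tolerate a bounded number of contended slabs before giving up:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a slab page on the partial list. */
struct demo_slab {
	bool contended;    /* true: the acquire attempt would fail */
	int  free_objects; /* objects available if the acquire succeeds */
};

/*
 * Scan the list, skipping at most max_skips contended slabs.
 * max_skips == 0 behaves like the original "break";
 * max_skips >= n behaves like the patched "continue".
 * Returns the free object count of the first acquired slab, or 0.
 */
static int demo_get_partial_bounded(const struct demo_slab *list, size_t n,
				    unsigned int max_skips)
{
	unsigned int skipped = 0;

	for (size_t i = 0; i < n; i++) {
		if (list[i].contended) {
			if (skipped++ >= max_skips)
				break;    /* skip budget exhausted */
			continue;         /* try the next slab */
		}
		return list[i].free_objects;
	}
	return 0;
}
```

Whether a fixed bound like this would help on the multi-socket workload
mentioned in the comments is untested here; the sketch only shows how the
two extremes become special cases of one parameter.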