[merged] mm-memcg-slab-fix-percpu-slab-vmstats-flushing.patch removed from -mm tree

From: Roman Gushchin <guro@fb.com>

The patch titled
     Subject: mm: memcg/slab: fix percpu slab vmstats flushing
has been removed from the -mm tree.  Its filename was
     mm-memcg-slab-fix-percpu-slab-vmstats-flushing.patch

This patch was dropped because it was merged into mainline or a subsystem tree

------------------------------------------------------
From: Roman Gushchin <guro@fb.com>
Subject: mm: memcg/slab: fix percpu slab vmstats flushing

Currently slab percpu vmstats are flushed twice: during the memcg
offlining and just before freeing the memcg structure.  Each time percpu
counters are summed, added to the atomic counterparts and propagated up
the cgroup tree.

The second flushing is required due to how recursive vmstats are
implemented: counters are batched in percpu variables on a local level,
and once a percpu value crosses some predefined threshold, it spills
over to atomic values on the local and each ancestor level.  It means
that without flushing some numbers cached in percpu variables will be
dropped on the floor each time a cgroup is destroyed.  And with uptime the
error on upper levels might become noticeable.
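
As an illustration of the batching described above, here is a minimal
userspace sketch.  It is not the kernel implementation: struct cg,
mod_state() and cached() are hypothetical names, a plain long stands in
for the atomic counter, and NR_CPUS is fixed at 4 for brevity.  The
threshold of 32 pages matches the per-cpu figure quoted later in this
message.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4   /* assumption: small fixed CPU count for the sketch */
#define BATCH   32  /* spill threshold, in pages */

/* Hypothetical stand-in for one cgroup's counter state. */
struct cg {
	struct cg *parent;
	long percpu[NR_CPUS];	/* locally batched deltas */
	long total;		/* stands in for the atomic counter */
};

/*
 * Account delta pages on the given cpu.  Once the batched value exceeds
 * BATCH in magnitude, spill it to this level and every ancestor.
 */
static void mod_state(struct cg *g, int cpu, long delta)
{
	struct cg *p;
	long x;

	assert(cpu >= 0 && cpu < NR_CPUS);
	x = g->percpu[cpu] + delta;
	if (labs(x) > BATCH) {
		for (p = g; p; p = p->parent)
			p->total += x;
		x = 0;
	}
	g->percpu[cpu] = x;
}

/* Pages still cached in percpu slots, invisible to every level's total. */
static long cached(const struct cg *g)
{
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += g->percpu[cpu];
	return sum;
}
```

Whatever cached() returns at the moment a cgroup is destroyed is what
would be lost on the ancestor levels without a final flush.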

The first flushing aims to make counters on ancestor levels more precise.
Dying cgroups may remain in the dying state for a long time.  After
kmem_cache reparenting, which is performed during the offlining, slab
counters of the dying cgroup have no chance to be updated, because
any slab operations will be performed on the parent level.  It means that
the inaccuracy caused by percpu batching will not decrease until the final
destruction of the cgroup.  The original idea was that flushing slab
counters during the offlining should minimize the visible inaccuracy of
slab counters on the parent level.

The problem is that percpu counters are not zeroed after the first
flushing.  So every cached percpu value is summed twice.  It creates a
small error (up to 32 pages per cpu, but usually less) which accumulates
on the parent cgroup level.  After creating and destroying thousands of
child cgroups, the slab counter on the parent level can be way off the
real value.
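
The double counting can be reproduced with a small userspace model.
This is only a sketch under stated assumptions: struct cg, flush() and
parent_total_after_teardown() are hypothetical names, not kernel
functions, and NR_CPUS is fixed at 4.

```c
#define NR_CPUS 4	/* assumption: small fixed CPU count for the sketch */

/* Hypothetical stand-in for one cgroup's counter state. */
struct cg {
	struct cg *parent;
	long percpu[NR_CPUS];	/* locally batched deltas */
	long total;		/* stands in for the atomic counter */
};

/*
 * Sum the percpu slots into this level and all ancestors.  As in the
 * behaviour described above, the percpu slots are NOT cleared.
 */
static void flush(struct cg *g)
{
	struct cg *p;
	long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += g->percpu[cpu];
	for (p = g; p; p = p->parent)
		p->total += sum;
}

/*
 * Tear down a child that has cached_per_cpu pages batched on every CPU
 * and return what the parent ends up with.  flush_on_offline selects
 * the old behaviour (1) or the behaviour after this change (0).
 */
static long parent_total_after_teardown(long cached_per_cpu,
					int flush_on_offline)
{
	struct cg parent = {0};
	struct cg child = {0};
	int cpu;

	child.parent = &parent;
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		child.percpu[cpu] = cached_per_cpu;

	if (flush_on_offline)
		flush(&child);	/* first flush: during offlining */
	flush(&child);		/* second flush: just before freeing */
	return parent.total;
}
```

With 5 pages cached on each of the 4 CPUs, the true value is 20 pages.
The model yields 40 with both flushes and 20 with only the final one.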

For now, let's just stop flushing slab counters on memcg offlining.  It
can't be done correctly without scheduling a work on each cpu: reading and
zeroing it during css offlining can race with an asynchronous update,
which doesn't expect values to be changed underneath.

With this change, slab counters on the parent level will become eventually
consistent.  Once all dying children are gone, values are correct.  And if
not, the error is capped by 32 * NR_CPUS pages per dying cgroup.
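
To put a number on that cap, the bound can be written as a one-line
helper.  The function name is hypothetical and the CPU and cgroup
counts used below are example values, not measurements.

```c
#define BATCH 32UL	/* per-cpu batching threshold in pages, from the text */

/* Upper bound on the parent-level error, in pages. */
static unsigned long max_parent_error_pages(unsigned long nr_cpus,
					    unsigned long nr_dying)
{
	return BATCH * nr_cpus * nr_dying;
}
```

For example, on a 64-cpu machine the bound is 2048 pages per dying
cgroup, which is 8 MiB assuming 4 KiB pages.  With 1000 dying cgroups
it is 2048000 pages.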

It's not perfect, as slabs are reparented, so any updates after the
reparenting will happen on the parent level.  It means that if a slab page
was allocated, a counter on child level was bumped, then the page was
reparented and freed, the annihilation of positive and negative counter
values will not happen until the child cgroup is released.  It makes slab
counters different from others, and it might make us want to implement
flushing in a correct form again.  But it's also a question of
performance: scheduling a work on each cpu isn't free, and it's an open
question if the benefit of having more accurate counters is worth it.
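
The delayed annihilation can be shown with the same kind of userspace
model.  Again this is a sketch, not kernel code: struct cg, mod_state(),
release() and parent_total() are hypothetical names, NR_CPUS is fixed
at 2, and the page counts are chosen so that the free on the parent
level crosses the threshold while the allocations in the child do not.

```c
#include <stdlib.h>

#define NR_CPUS 2	/* assumption: small fixed CPU count for the sketch */
#define BATCH   32	/* spill threshold, in pages */

/* Hypothetical stand-in for one cgroup's counter state. */
struct cg {
	struct cg *parent;
	long percpu[NR_CPUS];	/* locally batched deltas */
	long total;		/* stands in for the atomic counter */
};

/* Add x to this level and every ancestor. */
static void spill(struct cg *g, long x)
{
	for (; g; g = g->parent)
		g->total += x;
}

/* Batch delta on the given cpu, spilling once it exceeds BATCH. */
static void mod_state(struct cg *g, int cpu, long delta)
{
	long x = g->percpu[cpu] + delta;

	if (labs(x) > BATCH) {
		spill(g, x);
		x = 0;
	}
	g->percpu[cpu] = x;
}

/* Final flush just before the cgroup is freed. */
static void release(struct cg *g)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		spill(g, g->percpu[cpu]);
		g->percpu[cpu] = 0;
	}
}

/* Parent's visible total before (0) or after (1) the child is released. */
static long parent_total(int release_child)
{
	struct cg parent = {0};
	struct cg child = {0};

	child.parent = &parent;
	mod_state(&child, 0, 20);	/* 20 pages allocated, cached on cpu 0 */
	mod_state(&child, 1, 20);	/* 20 more, cached on cpu 1 */
	/* Reparenting happens here: later updates hit the parent. */
	mod_state(&parent, 0, -40);	/* all 40 pages freed, spills at once */
	if (release_child)
		release(&child);
	return parent.total;
}
```

Before the child is released the parent shows -40 pages although
nothing is in use.  Only the final flush of the child brings it back
to 0.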

We might also consider flushing all counters on offlining, not only slab
counters.

So let's fix the main problem now: make the slab counters eventually
consistent, so at least the error won't grow with uptime (or more
precisely the number of created and destroyed cgroups).  And think about
the accuracy of counters separately.

Link: http://lkml.kernel.org/r/20191220042728.1045881-1-guro@fb.com
Fixes: bee07b33db78 ("mm: memcontrol: flush percpu slab vmstats on kmem offlining")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmzone.h |    5 ++---
 mm/memcontrol.c        |   37 +++++++++----------------------------
 2 files changed, 11 insertions(+), 31 deletions(-)

Message-ID: <20200115004546.F80tSTM38%akpm@linux-foundation.org>
Date: Tue, 14 Jan 2020 16:45:46 -0800
From: akpm@linux-foundation.org
To: guro@fb.com, hannes@cmpxchg.org, mhocko@suse.com, mm-commits@vger.kernel.org, stable@vger.kernel.org
