From patchwork Mon Aug 8 12:55:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Oberhollenzer X-Patchwork-Id: 596179 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51958C00140 for ; Mon, 8 Aug 2022 12:56:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242737AbiHHM4o (ORCPT ); Mon, 8 Aug 2022 08:56:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40840 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237479AbiHHM4k (ORCPT ); Mon, 8 Aug 2022 08:56:40 -0400 Received: from mail.infraroot.at (mail.infraroot.at [IPv6:2001:41d0:701:1100::afc]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 33C2838A0 for ; Mon, 8 Aug 2022 05:56:39 -0700 (PDT) Received: from localtoast.corp.sigma-star.at (unknown [82.150.214.1]) by mail.infraroot.at (Postfix) with ESMTPSA id 051964169C; Mon, 8 Aug 2022 14:56:35 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.11.0 mail.infraroot.at 051964169C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=infraroot.at; s=default; t=1659963396; bh=Nhwi56mtmN1YvbuXrfZ4GWgqCaJD62FwSaPWVWCpkCU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Qa83pUjCCwS4GMTwQavMNk40kyo2bxoz+rexnQSWElo9r0UJ4h0IIrCw4VDHOR6CB /koLVRVmpZjKCiCE76y/tOF+iaZ9IoNrq6G57zyb0QDRhB7Hw6XCCZjezXsU30dlcA Eq5aa0hdGDQN0/A1JDBhbAK8g0nMD7v+c8E6VSXA= From: David Oberhollenzer To: linux-rt-users@vger.kernel.org Cc: williams@redhat.com, bigeasy@linutronix.de, richard@nod.at, joseph.salisbury@canonical.com, Roman Gushchin , David Oberhollenzer Subject: [PATCH v2 03/10] mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT where needed. Date: Mon, 8 Aug 2022 14:55:55 +0200 Message-Id: <20220808125602.97747-4-goliath@infraroot.at> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20220808125602.97747-1-goliath@infraroot.at> References: <20220808125602.97747-1-goliath@infraroot.at> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rt-users@vger.kernel.org From: Sebastian Andrzej Siewior The per-CPU counter are modified with the non-atomic modifier. The consistency is ensured by disabling interrupts for the update. On non PREEMPT_RT configuration this works because acquiring a spinlock_t typed lock with the _irq() suffix disables interrupts. On PREEMPT_RT configurations the RMW operation can be interrupted. Another problem is that mem_cgroup_swapout() expects to be invoked with disabled interrupts because the caller has to acquire a spinlock_t which is acquired with disabled interrupts. Since spinlock_t never disables interrupts on PREEMPT_RT the interrupts are never disabled at this point. The code is never called from in_irq() context on PREEMPT_RT therefore disabling preemption during the update is sufficient on PREEMPT_RT. The sections which explicitly disable interrupts can remain on PREEMPT_RT because the sections remain short and they don't involve sleeping locks (memcg_check_events() is doing nothing on PREEMPT_RT). Disable preemption during update of the per-CPU variables which do not explicitly disable interrupts. Signed-off-by: Sebastian Andrzej Siewior Acked-by: Roman Gushchin [do: backported to v5.15] Signed-off-by: David Oberhollenzer --- mm/memcontrol.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 48 insertions(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 6078d57aee0f..e566c34466a1 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -654,6 +654,35 @@ static u64 flush_next_time; #define FLUSH_TIME (2UL*HZ) +/* + * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can + * not rely on this as part of an acquired spinlock_t lock. These functions are + * never used in hardirq context on PREEMPT_RT and therefore disabling preemtion + * is sufficient. + */ +static void memcg_stats_lock(void) +{ +#ifdef CONFIG_PREEMPT_RT + preempt_disable(); +#else + VM_BUG_ON(!irqs_disabled()); +#endif +} + +static void __memcg_stats_lock(void) +{ +#ifdef CONFIG_PREEMPT_RT + preempt_disable(); +#endif +} + +static void memcg_stats_unlock(void) +{ +#ifdef CONFIG_PREEMPT_RT + preempt_enable(); +#endif +} + static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val) { unsigned int x; @@ -737,6 +766,20 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec); memcg = pn->memcg; + /* + * The caller from rmap relay on disabled preemption becase they never + * update their counter from in-interrupt context. For these two + * counters we check that the update is never performed from an + * interrupt context while other caller need to have disabled interrupt. + */ + __memcg_stats_lock(); + if (IS_ENABLED(CONFIG_DEBUG_VM)) { + if (idx == NR_ANON_MAPPED || idx == NR_FILE_MAPPED) + WARN_ON_ONCE(!in_task()); + else + WARN_ON_ONCE(!irqs_disabled()); + } + /* Update memcg */ __this_cpu_add(memcg->vmstats_percpu->state[idx], val); @@ -744,6 +787,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, __this_cpu_add(pn->lruvec_stats_percpu->state[idx], val); memcg_rstat_updated(memcg, val); + memcg_stats_unlock(); } /** @@ -844,8 +888,10 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx, if (mem_cgroup_disabled()) return; + memcg_stats_lock(); __this_cpu_add(memcg->vmstats_percpu->events[idx], count); memcg_rstat_updated(memcg, count); + memcg_stats_unlock(); } static unsigned long memcg_events(struct mem_cgroup *memcg, int event) @@ -7211,8 +7257,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) * important here to have the interrupts disabled because it is the * only synchronisation we have for updating the per-CPU variables. */ - VM_BUG_ON(!irqs_disabled()); + memcg_stats_lock(); mem_cgroup_charge_statistics(memcg, page, -nr_entries); + memcg_stats_unlock(); memcg_check_events(memcg, page); css_put(&memcg->css);