From patchwork Thu Mar 12 22:35:53 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton <akpm@linux-foundation.org>
X-Patchwork-Id: 229403
Date: Thu, 12 Mar 2020 15:35:53 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: chris@chrisdown.name, guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
 mm-commits@vger.kernel.org, natechancellor@gmail.com, stable@vger.kernel.org,
 tj@kernel.org
Subject: + mm-memcg-throttle-allocators-based-on-ancestral-memoryhigh.patch
 added to -mm tree
Message-ID: <20200312223553.Bupednvca%akpm@linux-foundation.org>
In-Reply-To: <20200305222751.6d781a3f2802d79510941e4e@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org


The patch titled
     Subject: mm, memcg: throttle allocators based on ancestral memory.high
has been added to the -mm tree.
Its filename is
     mm-memcg-throttle-allocators-based-on-ancestral-memoryhigh.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-memcg-throttle-allocators-based-on-ancestral-memoryhigh.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcg-throttle-allocators-based-on-ancestral-memoryhigh.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when
    testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Chris Down <chris@chrisdown.name>
Subject: mm, memcg: throttle allocators based on ancestral memory.high

Prior to this commit, we only directly checked the affected cgroup's
memory.high against its usage.  However, it's possible that we are being
reclaimed as a result of hitting an ancestor's memory.high and should be
penalised based on that, instead.

This patch changes memory.high overage throttling to use the largest
overage among a cgroup's ancestors when deciding how many penalty jiffies
to charge.  This ensures that we penalise poorly behaving cgroups in the
same way regardless of the level of the hierarchy at which memory.high
was breached.

Link: http://lkml.kernel.org/r/8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.1584036142.git.chris@chrisdown.name
Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>    [5.4.x+]
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   93 ++++++++++++++++++++++++++++------------------
 1 file changed, 58 insertions(+), 35 deletions(-)
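To make the new behaviour concrete, below is a minimal standalone C sketch
of the calculation this patch introduces: walk from the charging cgroup
towards the root, keep the largest fixed-point overage, then square it and
scale it into a clamped number of penalty jiffies.  struct memcg,
max_ancestral_overage() and penalty_jiffies() are illustrative stand-ins
for the kernel's mem_cgroup machinery and calculate_high_delay(); the
values assumed here for MEMCG_DELAY_PRECISION_SHIFT, HZ and
MEMCG_MAX_HIGH_DELAY_JIFFIES are not taken from this patch (only
MEMCG_DELAY_SCALING_SHIFT == 14 appears in the hunk context below), and
the sketch omits the nr_pages scaling that the kernel applies afterwards.

    #include <stdint.h>
    #include <stdio.h>

    /* Only MEMCG_DELAY_SCALING_SHIFT (14) appears in the diff below;
     * the other three constants are assumed values for this sketch. */
    #define MEMCG_DELAY_PRECISION_SHIFT  20
    #define MEMCG_DELAY_SCALING_SHIFT    14
    #define HZ                           100
    #define MEMCG_MAX_HIGH_DELAY_JIFFIES (2UL * HZ)

    /* Toy stand-in for struct mem_cgroup: usage and high are in pages. */
    struct memcg {
            const char *name;
            unsigned long usage;
            unsigned long high;
            struct memcg *parent;       /* NULL above the root */
    };

    /*
     * Walk from @memcg towards the root and keep the largest overage, in
     * fixed point: (usage - high) / high, scaled by 2^PRECISION_SHIFT.
     */
    static uint64_t max_ancestral_overage(const struct memcg *memcg)
    {
            uint64_t max_overage = 0;

            for (; memcg; memcg = memcg->parent) {
                    /* Treat high == 0 as 1 page to avoid dividing by 0. */
                    unsigned long high = memcg->high ? memcg->high : 1;
                    uint64_t overage;

                    /* Levels that are within their high contribute nothing. */
                    if (memcg->usage <= high)
                            continue;

                    overage = ((uint64_t)(memcg->usage - high)
                                    << MEMCG_DELAY_PRECISION_SHIFT) / high;
                    if (overage > max_overage)
                            max_overage = overage;
            }
            return max_overage;
    }

    /* Square the overage, scale by HZ, shift back down and clamp. */
    static unsigned long penalty_jiffies(const struct memcg *memcg)
    {
            uint64_t penalty = max_ancestral_overage(memcg);

            penalty = penalty * penalty * HZ;
            penalty >>= MEMCG_DELAY_PRECISION_SHIFT;
            penalty >>= MEMCG_DELAY_SCALING_SHIFT;

            return penalty < MEMCG_MAX_HIGH_DELAY_JIFFIES ?
                    (unsigned long)penalty : MEMCG_MAX_HIGH_DELAY_JIFFIES;
    }

    int main(void)
    {
            /* child is 6.25% over its own high; its parent is 12.5% over */
            struct memcg root   = { "root",   0,    0,    NULL };
            struct memcg parent = { "parent", 1152, 1024, &root };
            struct memcg child  = { "child",  1088, 1024, &parent };

            printf("penalty: %lu jiffies\n", penalty_jiffies(&child));
            return 0;
    }

Under these assumptions the program prints "penalty: 100 jiffies": the
parent's 12.5% overage dominates the child's 6.25%, which on its own would
earn only 25 jiffies.  The kernel additionally factors the task's own
nr_pages contribution into the final delay (see the "Factor in the task's
own contribution" comment in the diff), which this sketch leaves out.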
--- a/mm/memcontrol.c~mm-memcg-throttle-allocators-based-on-ancestral-memoryhigh
+++ a/mm/memcontrol.c
@@ -2308,28 +2308,41 @@ static void high_work_func(struct work_s
 #define MEMCG_DELAY_SCALING_SHIFT 14
 
 /*
- * Scheduled by try_charge() to be executed from the userland return path
- * and reclaims memory over the high limit.
+ * Get the number of jiffies that we should penalise a mischievous cgroup which
+ * is exceeding its memory.high by checking both it and its ancestors.
  */
-void mem_cgroup_handle_over_high(void)
+static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
+                                          unsigned int nr_pages)
 {
-        unsigned long usage, high, clamped_high;
-        unsigned long pflags;
-        unsigned long penalty_jiffies, overage;
-        unsigned int nr_pages = current->memcg_nr_pages_over_high;
-        struct mem_cgroup *memcg;
+        unsigned long penalty_jiffies;
+        u64 max_overage = 0;
 
-        if (likely(!nr_pages))
-                return;
+        do {
+                unsigned long usage, high;
+                u64 overage;
+
+                usage = page_counter_read(&memcg->memory);
+                high = READ_ONCE(memcg->high);
+
+                /*
+                 * Prevent division by 0 in overage calculation by acting as if
+                 * it was a threshold of 1 page
+                 */
+                high = max(high, 1UL);
+
+                overage = usage - high;
+                overage <<= MEMCG_DELAY_PRECISION_SHIFT;
+                overage = div64_u64(overage, high);
+
+                if (overage > max_overage)
+                        max_overage = overage;
+        } while ((memcg = parent_mem_cgroup(memcg)) &&
+                 !mem_cgroup_is_root(memcg));
 
-        memcg = get_mem_cgroup_from_mm(current->mm);
-        reclaim_high(memcg, nr_pages, GFP_KERNEL);
-        current->memcg_nr_pages_over_high = 0;
+        if (!max_overage)
+                return 0;
 
         /*
-         * memory.high is breached and reclaim is unable to keep up. Throttle
-         * allocators proactively to slow down excessive growth.
-         *
          * We use overage compared to memory.high to calculate the number of
          * jiffies to sleep (penalty_jiffies). Ideally this value should be
          * fairly lenient on small overages, and increasingly harsh when the
@@ -2337,24 +2350,9 @@ void mem_cgroup_handle_over_high(void)
          * its crazy behaviour, so we exponentially increase the delay based on
          * overage amount.
          */
-
-        usage = page_counter_read(&memcg->memory);
-        high = READ_ONCE(memcg->high);
-
-        if (usage <= high)
-                goto out;
-
-        /*
-         * Prevent division by 0 in overage calculation by acting as if it was a
-         * threshold of 1 page
-         */
-        clamped_high = max(high, 1UL);
-
-        overage = div64_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
-                          clamped_high);
-
-        penalty_jiffies = ((u64)overage * overage * HZ)
-                >> (MEMCG_DELAY_PRECISION_SHIFT + MEMCG_DELAY_SCALING_SHIFT);
+        penalty_jiffies = max_overage * max_overage * HZ;
+        penalty_jiffies >>= MEMCG_DELAY_PRECISION_SHIFT;
+        penalty_jiffies >>= MEMCG_DELAY_SCALING_SHIFT;
 
         /*
          * Factor in the task's own contribution to the overage, such that four
@@ -2371,7 +2369,32 @@ void mem_cgroup_handle_over_high(void)
          * application moving forwards and also permit diagnostics, albeit
          * extremely slowly.
          */
-        penalty_jiffies = min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+        return min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+}
+
+/*
+ * Scheduled by try_charge() to be executed from the userland return path
+ * and reclaims memory over the high limit.
+ */
+void mem_cgroup_handle_over_high(void)
+{
+        unsigned long penalty_jiffies;
+        unsigned long pflags;
+        unsigned int nr_pages = current->memcg_nr_pages_over_high;
+        struct mem_cgroup *memcg;
+
+        if (likely(!nr_pages))
+                return;
+
+        memcg = get_mem_cgroup_from_mm(current->mm);
+        reclaim_high(memcg, nr_pages, GFP_KERNEL);
+        current->memcg_nr_pages_over_high = 0;
+
+        /*
+         * memory.high is breached and reclaim is unable to keep up. Throttle
+         * allocators proactively to slow down excessive growth.
+         */
+        penalty_jiffies = calculate_high_delay(memcg, nr_pages);
 
         /*
          * Don't sleep if the amount of jiffies this memcg owes us is so low