[v2,0/1] net: Reduce rcu_barrier() contentions from 'unshare(CLONE_NEWNET)'

Message ID: 20201210080844.23741-1-sjpark@amazon.com

Headers:
Return-Path: <netdev-owner@kernel.org>
From: SeongJae Park <sjpark@amazon.com>
To: <davem@davemloft.net>
CC: SeongJae Park <sjpark@amazon.de>, <kuba@kernel.org>,
	<kuznet@ms2.inr.ac.ru>, <edumazet@google.com>, <fw@strlen.de>,
	<paulmck@kernel.org>, <netdev@vger.kernel.org>,
	<rcu@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: [PATCH v2 0/1] net: Reduce rcu_barrier() contentions from
	'unshare(CLONE_NEWNET)'
Date: Thu, 10 Dec 2020 09:08:43 +0100
Message-ID: <20201210080844.23741-1-sjpark@amazon.com>
MIME-Version: 1.0
Content-Type: text/plain
Precedence: bulk

Series: net: Reduce rcu_barrier() contentions from 'unshare(CLONE_NEWNET)'
[v2,0/1] net: Reduce rcu_barrier() contentions from 'unshare(CLONE_NEWNET)'
[v2,1/1] net/ipv4/inet_fragment: Batch fqdir destroy works

Message

SeongJae Park Dec. 10, 2020, 8:08 a.m. UTC

From: SeongJae Park <sjpark@amazon.de>

On a few of our systems, I found that frequent 'unshare(CLONE_NEWNET)'
calls make the number of active slab objects, including those of the
'sock_inode_cache' type, increase rapidly and continuously.  As a
result, memory pressure occurs.

In more detail, I made an artificial reproducer that resembles the
workload in which we found the problem and reproduces the problem
faster.  It merely repeats 'unshare(CLONE_NEWNET)' 50,000 times in a
loop.  It takes about 2 minutes.  On a machine with 40 CPU cores and
70GB of DRAM, it reduced the available memory by about 15GB in total.
Note that the issue doesn't reproduce on every machine.  On my 6 CPU
cores machine, the problem didn't reproduce.
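
A minimal sketch of such a reproducer is shown below.  It is not the
exact program used for the numbers above; the function name
'repeat_unshare_newnet()' and the 'REPRO_MAIN' build switch are made up
for illustration.  It issues the raw 'unshare' system call so that it
does not depend on glibc feature-test macros, and it needs
CAP_SYS_ADMIN (for example, running as root) to create network
namespaces.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/sched.h>	/* CLONE_NEWNET */

/*
 * Call unshare(CLONE_NEWNET) up to 'iterations' times.
 * Stops at the first failure and returns the number of successful calls.
 */
static long repeat_unshare_newnet(long iterations)
{
	long done;

	for (done = 0; done < iterations; done++) {
		/* Each success moves this process into a fresh net namespace. */
		if (syscall(SYS_unshare, CLONE_NEWNET) != 0) {
			fprintf(stderr, "unshare failed after %ld calls: %s\n",
				done, strerror(errno));
			break;
		}
	}
	return done;
}

#ifdef REPRO_MAIN	/* hypothetical switch: build with -DREPRO_MAIN */
int main(void)
{
	const long nr_calls = 50000;	/* loop count from the description */

	return repeat_unshare_newnet(nr_calls) == nr_calls ? 0 : 1;
}
#endif
```

Watching 'sock_inode_cache' in /proc/slabinfo and 'MemAvailable' in
/proc/meminfo while this runs is one way to observe the growth described
above.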

'cleanup_net()' and 'fqdir_work_fn()' are functions that deallocate the
relevant memory objects.  They are asynchronously invoked by the work
queues and internally use 'rcu_barrier()' to ensure safe destruction.
'cleanup_net()' works in a batched manner in a single-threaded worker,
while 'fqdir_work_fn()' runs once for each 'fqdir_exit()' call in the
'system_wq'.

Therefore, 'fqdir_work_fn()' was called frequently under the workload
and made the contention for 'rcu_barrier()' high.  In more detail, the
global mutex, 'rcu_state.barrier_mutex', became the bottleneck.
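
To illustrate why per-call work differs from batched work here, the
following userspace model counts how often a barrier would be invoked in
each scheme.  It is only a sketch, not kernel code: 'struct fake_fqdir',
'fake_rcu_barrier()', 'destroy_each()' and 'destroy_batched()' are
hypothetical stand-ins, and the barrier is reduced to a counter.

```c
#include <stdlib.h>

struct fake_fqdir {		/* hypothetical stand-in for 'struct fqdir' */
	struct fake_fqdir *next;
};

static unsigned long barrier_calls;

/* Stand-in for rcu_barrier(): only counts how often it is invoked. */
static void fake_rcu_barrier(void)
{
	barrier_calls++;
}

/* Unbatched: every destroyed object pays for its own barrier. */
static unsigned long destroy_each(unsigned long nr)
{
	unsigned long i;

	barrier_calls = 0;
	for (i = 0; i < nr; i++) {
		struct fake_fqdir *f = malloc(sizeof(*f));

		if (!f)
			abort();
		fake_rcu_barrier();
		free(f);
	}
	return barrier_calls;
}

/* Batched: queue the objects first, then one barrier covers the list. */
static unsigned long destroy_batched(unsigned long nr)
{
	struct fake_fqdir *pending = NULL, *f;
	unsigned long i;

	barrier_calls = 0;
	for (i = 0; i < nr; i++) {
		f = malloc(sizeof(*f));
		if (!f)
			abort();
		f->next = pending;	/* push onto the pending list */
		pending = f;
	}
	if (pending)
		fake_rcu_barrier();	/* a single barrier for the whole batch */
	while (pending) {
		f = pending;
		pending = f->next;
		free(f);
	}
	return barrier_calls;
}
```

In this model, 50,000 objects cost 50,000 barrier calls when destroyed
one by one and a single call when destroyed as one batch.  A real
workload would not collect everything into one batch, so the actual
reduction depends on how many requests accumulate between worker runs.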

I tried making 'fqdir_work_fn()' batched and confirmed that it works.
The following patch is for the change.  I think this is the right
solution as a point fix for this issue, but someone might blame
different parts.

1. User: Frequent 'unshare()' calls