From patchwork Tue Jul 20 14:18:27 2021
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 481849
From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
    Andrew Morton, Roman Gushchin, Phil Auld, Peter Zijlstra, Juri Lelli,
    Frederic Weisbecker, Marcelo Tosatti, Michal Koutný, Waiman Long
Subject: [PATCH v3 2/9] cgroup/cpuset: Fix a partition bug with hotplug
Date: Tue, 20 Jul 2021 10:18:27 -0400
Message-Id: <20210720141834.10624-3-longman@redhat.com>
In-Reply-To: <20210720141834.10624-1-longman@redhat.com>
References: <20210720141834.10624-1-longman@redhat.com>
X-Mailing-List: linux-kselftest@vger.kernel.org

In cpuset_hotplug_workfn(), whether the cpu list has changed is detected
by comparing the effective cpus of the top cpuset with cpu_active_mask.
However, in the rare case that only the CPUs in subparts_cpus are
offlined, the detection fails and the partition states are not updated
correctly.

Fix it by forcing the cpus_updated flag to true in this particular case.

Fixes: 4b842da276a8 ("cpuset: Make CPU hotplug work with partition")
Signed-off-by: Waiman Long
---
 kernel/cgroup/cpuset.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f5fef5516d99..b00982e6f6d8 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3166,6 +3166,13 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	cpus_updated = !cpumask_equal(top_cpuset.effective_cpus, &new_cpus);
 	mems_updated = !nodes_equal(top_cpuset.effective_mems, new_mems);
 
+	/*
+	 * In the rare case that hotplug removes just all the cpus in
+	 * subparts_cpus, we assumed that cpus are updated.
+	 */
+	if (!cpus_updated && top_cpuset.nr_subparts_cpus)
+		cpus_updated = true;
+
 	/* synchronize cpus_allowed to cpu_active_mask */
 	if (cpus_updated) {
 		spin_lock_irq(&callback_lock);

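[Editor's note] To see why the original check misses this case, here is a
small user-space sketch that models the detection logic with plain integer
bitmasks standing in for kernel cpumasks. The 8-CPU layout and the choice
of CPUs 6-7 as the child partition are made-up values for illustration
only; this is not kernel code.

/*
 * User-space model of the detection in cpuset_hotplug_workfn().
 * Top cpuset's effective_cpus excludes CPUs granted to child
 * partitions (subparts_cpus), so offlining exactly those CPUs
 * leaves the old comparison unchanged.
 */
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	unsigned int top_effective = 0x3f; /* CPUs 0-5: top cpuset effective_cpus */
	unsigned int subparts      = 0xc0; /* CPUs 6-7: granted to a child partition */
	unsigned int active        = 0x3f; /* CPUs 6-7 have just been offlined */

	/* Old check: compare top effective_cpus with cpu_active_mask. */
	bool cpus_updated = (top_effective != active);

	printf("old detection: cpus_updated = %d\n", cpus_updated);

	/* Fix from this patch: force an update when subparts_cpus exist. */
	if (!cpus_updated && subparts)
		cpus_updated = true;

	printf("with fix:      cpus_updated = %d\n", cpus_updated);
	return 0;
}
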
From patchwork Tue Jul 20 14:18:29 2021
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 481847
From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
    Andrew Morton, Roman Gushchin, Phil Auld, Peter Zijlstra, Juri Lelli,
    Frederic Weisbecker, Marcelo Tosatti, Michal Koutný, Waiman Long
Subject: [PATCH v3 4/9] cgroup/cpuset: Enable event notification when partition become invalid
Date: Tue, 20 Jul 2021 10:18:29 -0400
Message-Id: <20210720141834.10624-5-longman@redhat.com>
In-Reply-To: <20210720141834.10624-1-longman@redhat.com>
References: <20210720141834.10624-1-longman@redhat.com>
X-Mailing-List: linux-kselftest@vger.kernel.org

A valid cpuset partition can become invalid if all its CPUs are offlined
or somehow removed. This can happen through external events without
"cpuset.cpus.partition" being touched at all.

Users that rely on the property of a partition being present currently
have no simple way to be notified of such an event, other than constant
periodic polling, which is both inefficient and cumbersome.

To make life easier for those users, event notification is now enabled
for "cpuset.cpus.partition" when it goes into or out of an invalid
partition state.

Suggested-by: Tejun Heo
Signed-off-by: Waiman Long
---
 kernel/cgroup/cpuset.c | 49 ++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 04a6951abe2a..2e34fc5b76f0 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -160,6 +160,9 @@ struct cpuset {
 	 */
 	int use_parent_ecpus;
 	int child_ecpus_count;
+
+	/* Handle for cpuset.cpus.partition */
+	struct cgroup_file partition_file;
 };
 
 /*
@@ -263,6 +266,19 @@ static inline int is_partition_root(const struct cpuset *cs)
 	return cs->partition_root_state > 0;
 }
 
+/*
+ * Send notification event of partition_root_state change when going into
+ * or out of PRS_ERROR which may be due to an external event like hotplug.
+ */
+static inline void notify_partition_change(struct cpuset *cs,
+					   int old_prs, int new_prs)
+{
+	if ((old_prs == new_prs) ||
+	    ((old_prs != PRS_ERROR) && (new_prs != PRS_ERROR)))
+		return;
+	cgroup_file_notify(&cs->partition_file);
+}
+
 static struct cpuset top_cpuset = {
 	.flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) |
 		  (1 << CS_MEM_EXCLUSIVE)),
@@ -1148,7 +1164,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 	struct cpuset *parent = parent_cs(cpuset);
 	int adding;	/* Moving cpus from effective_cpus to subparts_cpus */
 	int deleting;	/* Moving cpus from subparts_cpus to effective_cpus */
-	int new_prs;
+	int old_prs, new_prs;
 	bool part_error = false;	/* Partition error? */
 
 	percpu_rwsem_assert_held(&cpuset_rwsem);
@@ -1184,7 +1200,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 	 * A cpumask update cannot make parent's effective_cpus become empty.
 	 */
 	adding = deleting = false;
-	new_prs = cpuset->partition_root_state;
+	old_prs = new_prs = cpuset->partition_root_state;
 	if (cmd == partcmd_enable) {
 		cpumask_copy(tmp->addmask, cpuset->cpus_allowed);
 		adding = true;
@@ -1274,7 +1290,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 			parent->subparts_cpus);
 	}
 
-	if (!adding && !deleting && (new_prs == cpuset->partition_root_state))
+	if (!adding && !deleting && (new_prs == old_prs))
 		return 0;
 
 	/*
@@ -1302,9 +1318,11 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 
 	parent->nr_subparts_cpus = cpumask_weight(parent->subparts_cpus);
 
-	if (cpuset->partition_root_state != new_prs)
+	if (old_prs != new_prs)
 		cpuset->partition_root_state = new_prs;
+
 	spin_unlock_irq(&callback_lock);
+	notify_partition_change(cpuset, old_prs, new_prs);
 
 	return cmd == partcmd_update;
 }
@@ -1326,7 +1344,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 	struct cpuset *cp;
 	struct cgroup_subsys_state *pos_css;
 	bool need_rebuild_sched_domains = false;
-	int new_prs;
+	int old_prs, new_prs;
 
 	rcu_read_lock();
 	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
@@ -1366,8 +1384,8 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 		 * update_tasks_cpumask() again for tasks in the parent
 		 * cpuset if the parent's subparts_cpus changes.
 		 */
-		new_prs = cp->partition_root_state;
-		if ((cp != cs) && new_prs) {
+		old_prs = new_prs = cp->partition_root_state;
+		if ((cp != cs) && old_prs) {
 			switch (parent->partition_root_state) {
 			case PRS_DISABLED:
 				/*
@@ -1438,10 +1456,11 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 			}
 		}
 
-		if (new_prs != cp->partition_root_state)
+		if (new_prs != old_prs)
 			cp->partition_root_state = new_prs;
 
 		spin_unlock_irq(&callback_lock);
+		notify_partition_change(cp, old_prs, new_prs);
 
 		WARN_ON(!is_in_v2_mode() &&
 			!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
@@ -2023,6 +2042,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		spin_lock_irq(&callback_lock);
 		cs->partition_root_state = new_prs;
 		spin_unlock_irq(&callback_lock);
+		notify_partition_change(cs, old_prs, new_prs);
 	}
 
 	free_cpumasks(NULL, &tmpmask);
@@ -2708,6 +2728,7 @@ static struct cftype dfl_files[] = {
 		.write = sched_partition_write,
 		.private = FILE_PARTITION_ROOT,
 		.flags = CFTYPE_NOT_ON_ROOT,
+		.file_offset = offsetof(struct cpuset, partition_file),
 	},
 
 	{
@@ -3103,11 +3124,17 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 	 */
 	if ((parent->partition_root_state == PRS_ERROR) ||
 	     cpumask_empty(&new_cpus)) {
+		int old_prs;
+
 		update_parent_subparts_cpumask(cs, partcmd_disable,
 					       NULL, tmp);
-		spin_lock_irq(&callback_lock);
-		cs->partition_root_state = PRS_ERROR;
-		spin_unlock_irq(&callback_lock);
+		old_prs = cs->partition_root_state;
+		if (old_prs != PRS_ERROR) {
+			spin_lock_irq(&callback_lock);
+			cs->partition_root_state = PRS_ERROR;
+			spin_unlock_irq(&callback_lock);
+			notify_partition_change(cs, old_prs, PRS_ERROR);
+		}
 	}
 	cpuset_force_rebuild();
 }

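[Editor's note] With this patch, a management daemon can replace periodic
polling with a blocking wait on the file. The sketch below is illustrative
only: the cgroup path is an example, error handling is abridged, and it
assumes the usual cgroup event-file semantics in which a change is
signalled through poll(2) as POLLPRI and through inotify as IN_MODIFY, as
described in the documentation update later in this series.

/* Minimal user-space watcher for "cpuset.cpus.partition" (sketch). */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/fs/cgroup/mypart/cpuset.cpus.partition";
	struct pollfd pfd;
	char buf[64];
	ssize_t len;

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0) {
		perror("open");
		return 1;
	}
	pfd.events = POLLPRI;

	for (;;) {
		/* Re-read from offset 0 to get the current state. */
		len = pread(pfd.fd, buf, sizeof(buf) - 1, 0);
		if (len < 0) {
			perror("pread");
			return 1;
		}
		buf[len] = '\0';
		printf("partition state: %s", buf);

		/* Block until the kernel reports a partition state change. */
		if (poll(&pfd, 1, -1) < 0) {
			perror("poll");
			return 1;
		}
	}
}
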
From patchwork Tue Jul 20 14:18:30 2021
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 481848
From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
    Andrew Morton, Roman Gushchin, Phil Auld, Peter Zijlstra, Juri Lelli,
    Frederic Weisbecker, Marcelo Tosatti, Michal Koutný, Waiman Long
Subject: [PATCH v3 5/9] cgroup/cpuset: Clarify the use of invalid partition root
Date: Tue, 20 Jul 2021 10:18:30 -0400
Message-Id: <20210720141834.10624-6-longman@redhat.com>
In-Reply-To: <20210720141834.10624-1-longman@redhat.com>
References: <20210720141834.10624-1-longman@redhat.com>
X-Mailing-List: linux-kselftest@vger.kernel.org

For cpuset partitions, the special state of PRS_ERROR (invalid partition
root) was originally designed to handle hotplug events. In this state,
the CPUs allocated to the partition root are released back to the
parent, but the cpuset exclusive flags remain unchanged.

Since a partition root sets the CPU_EXCLUSIVE flag, "cpuset.cpus"
changes that break the cpu exclusivity rule will not be allowed.
However, other changes to "cpuset.cpus" on a partition root may still
cause it to become invalid. This is undesirable, as we don't want an
accidental change to "cpuset.cpus" to invalidate a partition root.

Additional checks are now added to make sure that regular cpuset control
file manipulations are not allowed to make a partition root invalid.

These additional checks are:

 1) A partition root can't be changed to "member" if it has child
    partition roots.
 2) Removing CPUs from "cpuset.cpus" in a way that would make the
    partition invalid is not allowed.

Comments are also added to clarify that a partition root becomes invalid
only when an external event, like hotplug, causes all the CPUs allocated
to it to become unavailable.

Signed-off-by: Waiman Long
---
 kernel/cgroup/cpuset.c | 144 ++++++++++++++++++++++++-----------------
 1 file changed, 86 insertions(+), 58 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2e34fc5b76f0..c16060b703cc 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -177,7 +177,9 @@ struct cpuset {
  * subparts_cpus. In this case, the cpuset is not a real partition
  * root anymore. However, the CPU_EXCLUSIVE bit will still be set
  * and the cpuset can be restored back to a partition root if the
- * parent cpuset can give more CPUs back to this child cpuset.
+ * parent cpuset can give more CPUs back to this child cpuset. A
+ * partition root becomes invalid when all its cpus become unavailable
+ * like being offlined.
  */
 #define PRS_DISABLED		0
 #define PRS_ENABLED		1
@@ -1211,6 +1213,15 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 		/*
 		 * partcmd_update with newmask:
 		 *
+		 * Return error if newmask isn't a subset of
+		 * (cpus_allowed | parent->effective_cpus).
+		 */
+		cpumask_or(tmp->addmask, cpuset->cpus_allowed,
+			   parent->effective_cpus);
+		if (!cpumask_subset(newmask, tmp->addmask))
+			return -EINVAL;
+
+		/*
 		 * delmask = cpus_allowed & ~newmask & parent->subparts_cpus
 		 * addmask = newmask & parent->effective_cpus
 		 *	     & ~parent->subparts_cpus
@@ -1223,7 +1234,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 		adding = cpumask_andnot(tmp->addmask, tmp->addmask,
 					parent->subparts_cpus);
 		/*
-		 * Return error if the new effective_cpus could become empty.
+		 * Return error if parent's effective_cpus could become empty.
 		 */
 		if (adding &&
 		    cpumask_equal(parent->effective_cpus, tmp->addmask)) {
@@ -1239,20 +1250,35 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 				return -EINVAL;
 			cpumask_copy(tmp->addmask, parent->effective_cpus);
 		}
+
+		/*
+		 * Return error if effective_cpus becomes empty or any CPU
+		 * distributed to child partitions is deleted.
+		 */
+		if (deleting &&
+		    (cpumask_intersects(tmp->delmask, cpuset->subparts_cpus) ||
+		     cpumask_equal(tmp->delmask, cpuset->effective_cpus)))
+			return -EBUSY;
 	} else {
 		/*
 		 * partcmd_update w/o newmask:
 		 *
 		 * addmask = cpus_allowed & parent->effective_cpus
 		 *
+		 * This gets invoked either due to a hotplug event or
+		 * from update_cpumasks_hier() where we can't return an
+		 * error. This can cause a partition root to become invalid
+		 * in the case of a hotplug.
+		 *
 		 * Note that parent's subparts_cpus may have been
 		 * pre-shrunk in case there is a change in the cpu list.
 		 * So no deletion is needed.
 		 */
 		adding = cpumask_and(tmp->addmask, cpuset->cpus_allowed,
 				     parent->effective_cpus);
-		part_error = cpumask_equal(tmp->addmask,
-					   parent->effective_cpus);
+		part_error = (is_partition_root(cpuset) &&
+			      !parent->nr_subparts_cpus) ||
+			     cpumask_equal(tmp->addmask, parent->effective_cpus);
 	}
 
 	if (cmd == partcmd_update) {
@@ -1427,33 +1453,27 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 
 		spin_lock_irq(&callback_lock);
 
-		cpumask_copy(cp->effective_cpus, tmp->new_cpus);
 		if (cp->nr_subparts_cpus && (new_prs != PRS_ENABLED)) {
+			/*
+			 * Put all active subparts_cpus back to effective_cpus.
+			 */
+			cpumask_or(tmp->new_cpus, tmp->new_cpus,
+				   cp->subparts_cpus);
+			cpumask_and(tmp->new_cpus, tmp->new_cpus,
+				    cpu_active_mask);
 			cp->nr_subparts_cpus = 0;
 			cpumask_clear(cp->subparts_cpus);
-		} else if (cp->nr_subparts_cpus) {
+		}
+
+		cpumask_copy(cp->effective_cpus, tmp->new_cpus);
+		if (cp->nr_subparts_cpus) {
 			/*
 			 * Make sure that effective_cpus & subparts_cpus
-			 * are mutually exclusive.
-			 *
-			 * In the unlikely event that effective_cpus
-			 * becomes empty. we clear cp->nr_subparts_cpus and
-			 * let its child partition roots to compete for
-			 * CPUs again.
+			 * of a partition root are mutually exclusive.
 			 */
 			cpumask_andnot(cp->effective_cpus, cp->effective_cpus,
 				       cp->subparts_cpus);
-			if (cpumask_empty(cp->effective_cpus)) {
-				cpumask_copy(cp->effective_cpus, tmp->new_cpus);
-				cpumask_clear(cp->subparts_cpus);
-				cp->nr_subparts_cpus = 0;
-			} else if (!cpumask_subset(cp->subparts_cpus,
-						   tmp->new_cpus)) {
-				cpumask_andnot(cp->subparts_cpus,
-					cp->subparts_cpus, tmp->new_cpus);
-				cp->nr_subparts_cpus
-					= cpumask_weight(cp->subparts_cpus);
-			}
+			WARN_ON_ONCE(cpumask_empty(cp->effective_cpus));
 		}
 
 		if (new_prs != old_prs)
@@ -1462,7 +1482,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 		spin_unlock_irq(&callback_lock);
 		notify_partition_change(cp, old_prs, new_prs);
 
-		WARN_ON(!is_in_v2_mode() &&
+		WARN_ON_ONCE(!is_in_v2_mode() &&
 			!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
 
 		update_tasks_cpumask(cp);
@@ -1585,8 +1605,8 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	 * Make sure that subparts_cpus is a subset of cpus_allowed.
 	 */
 	if (cs->nr_subparts_cpus) {
-		cpumask_andnot(cs->subparts_cpus, cs->subparts_cpus,
-			       cs->cpus_allowed);
+		cpumask_and(cs->subparts_cpus, cs->subparts_cpus,
+			    cs->cpus_allowed);
 		cs->nr_subparts_cpus = cpumask_weight(cs->subparts_cpus);
 	}
 	spin_unlock_irq(&callback_lock);
@@ -2008,20 +2028,26 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		}
 	} else {
 		/*
-		 * Turning off partition root will clear the
-		 * CS_CPU_EXCLUSIVE bit.
+		 * Switch back to member is always allowed if PRS_ERROR.
 		 */
 		if (old_prs == PRS_ERROR) {
-			update_flag(CS_CPU_EXCLUSIVE, cs, 0);
 			err = 0;
-			goto out;
+			goto reset_flag;
 		}
 
+		/*
+		 * A partition root cannot be reverted to member if some
+		 * CPUs have been distributed to child partition roots.
+		 */
+		if (!cpumask_empty(cs->subparts_cpus))
+			return -EBUSY;
+
 		err = update_parent_subparts_cpumask(cs, partcmd_disable,
 						     NULL, &tmpmask);
 		if (err)
 			goto out;
 
+reset_flag:
 		/* Turning off CS_CPU_EXCLUSIVE will not return error */
 		update_flag(CS_CPU_EXCLUSIVE, cs, 0);
 	}
@@ -3103,11 +3129,28 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 
 	/*
 	 * In the unlikely event that a partition root has empty
-	 * effective_cpus or its parent becomes erroneous, we have to
-	 * transition it to the erroneous state.
+	 * effective_cpus, we will have to force any child partitions,
+	 * if present, to become invalid by setting nr_subparts_cpus to 0
+	 * without causing itself to become invalid.
+	 */
+	if (is_partition_root(cs) && cs->nr_subparts_cpus &&
+	    cpumask_empty(&new_cpus)) {
+		cs->nr_subparts_cpus = 0;
+		cpumask_clear(cs->subparts_cpus);
+		compute_effective_cpumask(&new_cpus, cs, parent);
+	}
+
+	/*
+	 * If empty effective_cpus or zero nr_subparts_cpus or its parent
+	 * becomes erroneous, we have to transition it to the erroneous state.
 	 */
 	if (is_partition_root(cs) && (cpumask_empty(&new_cpus) ||
-	   (parent->partition_root_state == PRS_ERROR))) {
+	   (parent->partition_root_state == PRS_ERROR) ||
+	   !parent->nr_subparts_cpus)) {
+		int old_prs;
+
+		update_parent_subparts_cpumask(cs, partcmd_disable,
+					       NULL, tmp);
 		if (cs->nr_subparts_cpus) {
 			spin_lock_irq(&callback_lock);
 			cs->nr_subparts_cpus = 0;
@@ -3116,38 +3159,23 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 			compute_effective_cpumask(&new_cpus, cs, parent);
 		}
 
-		/*
-		 * If the effective_cpus is empty because the child
-		 * partitions take away all the CPUs, we can keep
-		 * the current partition and let the child partitions
-		 * fight for available CPUs.
-		 */
-		if ((parent->partition_root_state == PRS_ERROR) ||
-		    cpumask_empty(&new_cpus)) {
-			int old_prs;
-
-			update_parent_subparts_cpumask(cs, partcmd_disable,
-						       NULL, tmp);
-			old_prs = cs->partition_root_state;
-			if (old_prs != PRS_ERROR) {
-				spin_lock_irq(&callback_lock);
-				cs->partition_root_state = PRS_ERROR;
-				spin_unlock_irq(&callback_lock);
-				notify_partition_change(cs, old_prs, PRS_ERROR);
-			}
+		old_prs = cs->partition_root_state;
+		if (old_prs != PRS_ERROR) {
+			spin_lock_irq(&callback_lock);
+			cs->partition_root_state = PRS_ERROR;
+			spin_unlock_irq(&callback_lock);
+			notify_partition_change(cs, old_prs, PRS_ERROR);
 		}
 		cpuset_force_rebuild();
 	}
 
 	/*
 	 * On the other hand, an erroneous partition root may be transitioned
-	 * back to a regular one or a partition root with no CPU allocated
-	 * from the parent may change to erroneous.
+	 * back to a regular one.
 	 */
-	if (is_partition_root(parent) &&
-	   ((cs->partition_root_state == PRS_ERROR) ||
-	    !cpumask_intersects(&new_cpus, parent->subparts_cpus)) &&
-	    update_parent_subparts_cpumask(cs, partcmd_update, NULL, tmp))
+	else if (is_partition_root(parent) &&
+		(cs->partition_root_state == PRS_ERROR) &&
+		update_parent_subparts_cpumask(cs, partcmd_update, NULL, tmp))
 		cpuset_force_rebuild();
 
 update_tasks:

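[Editor's note] From user space, the practical effect of these checks is
that an operation which previously flipped a partition to "root invalid"
now fails with an error instead (-EBUSY in the new code paths above). The
sketch below shows how a management tool might observe this; the
/sys/fs/cgroup/p1 and p1/p2 hierarchy is an assumed example, not part of
the patch.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a value to a cgroup control file and return 0 or -errno. */
static int cg_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -errno;
	if (write(fd, val, strlen(val)) < 0) {
		int err = -errno;

		close(fd);
		return err;
	}
	close(fd);
	return 0;
}

int main(void)
{
	/*
	 * Assume p1 is a partition root and some of its CPUs have been
	 * distributed to the child partition p1/p2.
	 */
	int ret;

	/* Reverting p1 to "member" while p2 is a partition should fail. */
	ret = cg_write("/sys/fs/cgroup/p1/cpuset.cpus.partition", "member");
	printf("member write: %s\n", ret ? strerror(-ret) : "ok");

	/* Deleting CPUs that p2 holds from p1's cpuset.cpus should fail too. */
	ret = cg_write("/sys/fs/cgroup/p1/cpuset.cpus", "0");
	printf("cpus write:   %s\n", ret ? strerror(-ret) : "ok");

	return 0;
}
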
From patchwork Tue Jul 20 14:18:33 2021
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 481846
From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
    Andrew Morton, Roman Gushchin, Phil Auld, Peter Zijlstra, Juri Lelli,
    Frederic Weisbecker, Marcelo Tosatti, Michal Koutný, Waiman Long
Subject: [PATCH v3 8/9] cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst
Date: Tue, 20 Jul 2021 10:18:33 -0400
Message-Id: <20210720141834.10624-9-longman@redhat.com>
In-Reply-To: <20210720141834.10624-1-longman@redhat.com>
References: <20210720141834.10624-1-longman@redhat.com>
X-Mailing-List: linux-kselftest@vger.kernel.org

Update Documentation/admin-guide/cgroup-v2.rst to cover the newly
introduced "isolated" cpuset partition type as well as the ability to
create a non-top cpuset partition with no cpu allocated to it.

Signed-off-by: Waiman Long
---
 Documentation/admin-guide/cgroup-v2.rst | 94 +++++++++++++++----------
 1 file changed, 58 insertions(+), 36 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 5c7377b5bd3e..2e101a353ab1 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2080,8 +2080,9 @@ Cpuset Interface Files
 	It accepts only the following input values when written to.
 
 	  ========	================================
-	  "root"	a partition root
-	  "member"	a non-root member of a partition
+	  "member"	Non-root member of a partition
+	  "root"	Partition root
+	  "isolated"	Partition root without load balancing
 	  ========	================================
 
 	When set to be a partition root, the current cgroup is the
@@ -2090,9 +2091,14 @@ Cpuset Interface Files
 	partition roots themselves and their descendants. The root
 	cgroup is always a partition root.
 
-	There are constraints on where a partition root can be set.
-	It can only be set in a cgroup if all the following conditions
-	are true.
+	When set to "isolated", the CPUs in that partition root will
+	be in an isolated state without any load balancing from the
+	scheduler. Tasks in such a partition must be explicitly bound
+	to each individual CPU.
+
+	There are constraints on where a partition root can be set
+	("root" or "isolated"). It can only be set in a cgroup if all
+	the following conditions are true.
 
 	1) The "cpuset.cpus" is not empty and the list of CPUs are
 	   exclusive, i.e. they are not shared by any of its siblings.
@@ -2103,51 +2109,67 @@ Cpuset Interface Files
 	   eliminating corner cases that have to be handled if such a
 	   condition is allowed.
 
-	Setting it to partition root will take the CPUs away from the
-	effective CPUs of the parent cgroup. Once it is set, this
+	Setting it to a partition root will take the CPUs away from
+	the effective CPUs of the parent cgroup. Once it is set, this
 	file cannot be reverted back to "member" if there are any child
 	cgroups with cpuset enabled.
 
-	A parent partition cannot distribute all its CPUs to its
-	child partitions. There must be at least one cpu left in the
-	parent partition.
+	A parent partition may distribute all its CPUs to its child
+	partitions as long as it is not the root cgroup and there is no
+	task directly associated with that parent partition. Otherwise,
+	there must be at least one cpu left in the parent partition.
+	A new task cannot be moved to a partition root with no effective
+	cpu.
+
+	Once becoming a partition root, changes to "cpuset.cpus"
+	is generally allowed as long as the first condition above
+	(cpu exclusivity rule) is true. Other constraints for this
+	operation are as follows.
 
-	Once becoming a partition root, changes to "cpuset.cpus" is
-	generally allowed as long as the first condition above is true,
-	the change will not take away all the CPUs from the parent
-	partition and the new "cpuset.cpus" value is a superset of its
-	children's "cpuset.cpus" values.
+	1) Any newly added CPUs must be a subset of the parent's
+	   "cpuset.cpus.effective".
+	2) Taking away all the CPUs from the parent's "cpuset.cpus.effective"
+	   is only allowed if there is no task associated with the
+	   parent partition.
+	3) Deletion of CPUs that have been distributed to child partition
+	   roots are not allowed.
 
 	Sometimes, external factors like changes to ancestors'
 	"cpuset.cpus" or cpu hotplug can cause the state of the partition
-	root to change. On read, the "cpuset.sched.partition" file
-	can show the following values.
+	root to change. On read, the "cpuset.cpus.partition" file can
+	show the following values.
 
 	  ==============	==============================
 	  "member"		Non-root member of a partition
 	  "root"		Partition root
+	  "isolated"		Partition root without load balancing
 	  "root invalid"	Invalid partition root
 	  ==============	==============================
 
-	It is a partition root if the first 2 partition root conditions
-	above are true and at least one CPU from "cpuset.cpus" is
-	granted by the parent cgroup.
-
-	A partition root can become invalid if none of CPUs requested
-	in "cpuset.cpus" can be granted by the parent cgroup or the
-	parent cgroup is no longer a partition root itself. In this
-	case, it is not a real partition even though the restriction
-	of the first partition root condition above will still apply.
-	The cpu affinity of all the tasks in the cgroup will then be
-	associated with CPUs in the nearest ancestor partition.
-
-	An invalid partition root can be transitioned back to a
-	real partition root if at least one of the requested CPUs
-	can now be granted by its parent. In this case, the cpu
-	affinity of all the tasks in the formerly invalid partition
-	will be associated to the CPUs of the newly formed partition.
-	Changing the partition state of an invalid partition root to
-	"member" is always allowed even if child cpusets are present.
+	A partition root becomes invalid if all the CPUs requested in
+	"cpuset.cpus" become unavailable. This can happen if all the
+	CPUs have been offlined, or the state of an ancestor partition
+	root become invalid. In this case, it is not a real partition
+	even though the restriction of the cpu exclusivity rule will
+	still apply. The cpu affinity of all the tasks in the cgroup
+	will then be associated with CPUs in the nearest ancestor
+	partition.
+
+	In the special case of a parent partition competing with a child
+	partition for the only CPU left, the parent partition wins and
+	the child partition becomes invalid.
+
+	An invalid partition root can be transitioned back to a real
+	partition root if at least one of the requested CPUs become
+	available again. In this case, the cpu affinity of all the tasks
+	in the formerly invalid partition will be associated to the CPUs
+	of the newly formed partition. Changing the partition state of
+	an invalid partition root to "member" is always allowed even if
+	child cpusets are present. However changing a partition root back
+	to member will not be allowed if child partitions are present.
+
+	Poll and inotify events are triggered when transition to or
+	from invalid partition root happens.
 
 Device controller
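[Editor's note] As a usage illustration of the interface documented above,
the following minimal C sketch creates an isolated partition and reads back
its state. The cgroup mount point, the cgroup name "rt", and the CPU list
"2-3" are example values only; in the failure case the file would read
"root invalid".

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write a value to a cgroup control file; report errors via perror(). */
static int cg_write(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0) {
		perror(path);
		if (fd >= 0)
			close(fd);
		return -1;
	}
	close(fd);
	return 0;
}

int main(void)
{
	char buf[64];
	int fd;

	/* Create a child cgroup and give it exclusive CPUs 2-3. */
	mkdir("/sys/fs/cgroup/rt", 0755);
	cg_write("/sys/fs/cgroup/rt/cpuset.cpus", "2-3");

	/* Turn it into a partition root with no scheduler load balancing. */
	cg_write("/sys/fs/cgroup/rt/cpuset.cpus.partition", "isolated");

	/* Read back the partition state. */
	fd = open("/sys/fs/cgroup/rt/cpuset.cpus.partition", O_RDONLY);
	if (fd >= 0) {
		ssize_t len = read(fd, buf, sizeof(buf) - 1);

		if (len > 0) {
			buf[len] = '\0';
			printf("partition state: %s", buf);
		}
		close(fd);
	}
	return 0;
}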