Message ID | 20210825213750.6933-1-longman@redhat.com |
---|---|
Headers | show |
Series | cgroup/cpuset: Add new cpuset partition type & empty effecitve cpus | expand |
Hello, Waiman. Let's stop iterating on the patchset until we reach a consensus. On Wed, Aug 25, 2021 at 05:37:49PM -0400, Waiman Long wrote: > 1) The "cpuset.cpus" is not empty and the list of CPUs are > exclusive, i.e. they are not shared by any of its siblings. Part of it can be reached by cpus going offline. > 2) The parent cgroup is a partition root. This condition can happen if a parent stop being a partition. > - 3) The "cpuset.cpus" is also a proper subset of the parent's > + 3) The "cpuset.cpus" is a subset of the parent's > "cpuset.cpus.effective". This can happen if cpus go offline. > 4) There is no child cgroups with cpuset enabled. This is for > eliminating corner cases that have to be handled if such a > condition is allowed. This may make sense as a short cut for us but doesn't really stem from interface or behavior requirements. Of the four conditions listed, two are bogus (the states can be reached through a different path and the configuration success or failure can be timing dependent if configuration racaes against cpu hotplug operations) and one maybe makes sense half-way and one is more of a shortcut. Can't we just replace these with transitions to invalid state with proper explanation? That'd get rid of the error handling duplications from both the kernel and user side, make automated configurations which may race against hot plug operations reliable, and consistently provide users with why something failed. Thank you. -- tejun
On 8/26/21 1:35 PM, Tejun Heo wrote: > Hello, Waiman. > > Let's stop iterating on the patchset until we reach a consensus. > > On Wed, Aug 25, 2021 at 05:37:49PM -0400, Waiman Long wrote: >> 1) The "cpuset.cpus" is not empty and the list of CPUs are >> exclusive, i.e. they are not shared by any of its siblings. > Part of it can be reached by cpus going offline. > >> 2) The parent cgroup is a partition root. > This condition can happen if a parent stop being a partition. > >> - 3) The "cpuset.cpus" is also a proper subset of the parent's >> + 3) The "cpuset.cpus" is a subset of the parent's >> "cpuset.cpus.effective". > This can happen if cpus go offline. > >> 4) There is no child cgroups with cpuset enabled. This is for >> eliminating corner cases that have to be handled if such a >> condition is allowed. > This may make sense as a short cut for us but doesn't really stem from > interface or behavior requirements. > > Of the four conditions listed, two are bogus (the states can be > reached through a different path and the configuration success or > failure can be timing dependent if configuration racaes against cpu > hotplug operations) and one maybe makes sense half-way and one is more > of a shortcut. > > Can't we just replace these with transitions to invalid state with > proper explanation? That'd get rid of the error handling duplications > from both the kernel and user side, make automated configurations > which may race against hot plug operations reliable, and consistently > provide users with why something failed. What I am doing here is setting a high bar for transitioning from member to either "root" or "isolated". Once it becomes a partition, there are multiple ways that can make it invalid. I am fine with that. However, I am not sure it is a good idea to allow users to echo "root" to cpuset.cpus.partition anywhere in the cgroup hierarchy and require them to read it back to see if it succeed. All the checking are done with cpuset_rwsem held. So there shouldn't be any racing. Of course, a hotplug can immediately follow and make the partition invalid. Cheers, Longman
Hello, On Thu, Aug 26, 2021 at 11:01:30PM -0400, Waiman Long wrote: > What I am doing here is setting a high bar for transitioning from member to > either "root" or "isolated". Once it becomes a partition, there are multiple > ways that can make it invalid. I am fine with that. However, I am not sure > it is a good idea to allow users to echo "root" to cpuset.cpus.partition > anywhere in the cgroup hierarchy and require them to read it back to see if > it succeed. The problem is that the "high" bar is rather arbitrary. It might feel like a good idea to some but not to others. There are no clear technical reasons or principles for rules to be set this particular way. > All the checking are done with cpuset_rwsem held. So there shouldn't be any > racing. Of course, a hotplug can immediately follow and make the partition > invalid. Imagine a system which dynamically on/offlines its cpus based on load or whatever and also configures partitions for cases where the needed cpus are online. If the partitions are set up while the cpus are online, it'd work as expected - partitions are in effect when the system can support them and ignored otherwise. However, if the partition configuration is attempted while the cpus happen to be offline, the configuration will fail, and there is no guaranteed way to make that configuration stick short of disabling hotplug operations. This is a pretty jarring brekage happening exactly because the behavior is an inconsistent amalgam. It's usually not a good sign if interface restrictions can be added or removed because how one feels without clear functional reasons and often indicates that there's something broken, which seems to be the case here too. Thanks. -- tejun
On 8/27/21 12:00 AM, Tejun Heo wrote: > Hello, > > On Thu, Aug 26, 2021 at 11:01:30PM -0400, Waiman Long wrote: >> What I am doing here is setting a high bar for transitioning from member to >> either "root" or "isolated". Once it becomes a partition, there are multiple >> ways that can make it invalid. I am fine with that. However, I am not sure >> it is a good idea to allow users to echo "root" to cpuset.cpus.partition >> anywhere in the cgroup hierarchy and require them to read it back to see if >> it succeed. > The problem is that the "high" bar is rather arbitrary. It might feel like a > good idea to some but not to others. There are no clear technical reasons or > principles for rules to be set this particular way. > >> All the checking are done with cpuset_rwsem held. So there shouldn't be any >> racing. Of course, a hotplug can immediately follow and make the partition >> invalid. > Imagine a system which dynamically on/offlines its cpus based on load or > whatever and also configures partitions for cases where the needed cpus are > online. If the partitions are set up while the cpus are online, it'd work as > expected - partitions are in effect when the system can support them and > ignored otherwise. However, if the partition configuration is attempted > while the cpus happen to be offline, the configuration will fail, and there > is no guaranteed way to make that configuration stick short of disabling > hotplug operations. This is a pretty jarring brekage happening exactly > because the behavior is an inconsistent amalgam. > > It's usually not a good sign if interface restrictions can be added or > removed because how one feels without clear functional reasons and often > indicates that there's something broken, which seems to be the case here > too. Well, that is a valid point. The cpus may have been offlined when a partition is being created. I can certainly relent on this check in forming a partition. IOW, cpus_allowed can contain some or all offline cpus and a valid (some are online) or invalid (all are offline) partition can be formed. I can also allow an invalid child partition to be formed with an invalid parent partition. However, the cpu exclusivity rules will still apply. Other than that, do you envision any other circumstances where we should allow an invalid partition to be formed? Cheers, Longman
Hello, On Fri, Aug 27, 2021 at 05:19:31PM -0400, Waiman Long wrote: > Well, that is a valid point. The cpus may have been offlined when a > partition is being created. I can certainly relent on this check in forming > a partition. IOW, cpus_allowed can contain some or all offline cpus and a > valid (some are online) or invalid (all are offline) partition can be > formed. I can also allow an invalid child partition to be formed with an > invalid parent partition. However, the cpu exclusivity rules will still > apply. > > Other than that, do you envision any other circumstances where we should > allow an invalid partition to be formed? Now that most restrictions are removed from configuration side, just go all the way? Given that the user must check the status afterwards anyway, I don't see technical or even usability reasons for leaving some pieces behind. Going all the way would be easier to use too - bang in the target config and read the resulting state to reliably find out why a partition isn't valid, especially if we list *all* the reasons so that the user can tell whether the configuration is as intended immediately. Thanks. -- tejun
On 8/27/21 5:27 PM, Tejun Heo wrote: > Hello, > > On Fri, Aug 27, 2021 at 05:19:31PM -0400, Waiman Long wrote: >> Well, that is a valid point. The cpus may have been offlined when a >> partition is being created. I can certainly relent on this check in forming >> a partition. IOW, cpus_allowed can contain some or all offline cpus and a >> valid (some are online) or invalid (all are offline) partition can be >> formed. I can also allow an invalid child partition to be formed with an >> invalid parent partition. However, the cpu exclusivity rules will still >> apply. >> >> Other than that, do you envision any other circumstances where we should >> allow an invalid partition to be formed? > Now that most restrictions are removed from configuration side, just go all > the way? Given that the user must check the status afterwards anyway, I > don't see technical or even usability reasons for leaving some pieces > behind. Going all the way would be easier to use too - bang in the target > config and read the resulting state to reliably find out why a partition > isn't valid, especially if we list *all* the reasons so that the user > tell whether the configuration is as intended immediately. The cpu exclusivity rule is due to the setting of CPU_EXCLUSIVE bit. This is a pre-existing condition unless you want to change how the cpuset.cpu_exclusive works. So the new rules will be: 1) The "cpuset.cpus" is not empty and the list of CPUs are exclusive. 2) The parent cgroup is a partition root (can be an invalid one). 3) The "cpuset.cpus" is a subset of the parent's cpuset.cpus.allowed. 4) No child cgroup with cpuset enabled. I think they are reasonable. What do you think? Cheers, Longman
Hello, On Fri, Aug 27, 2021 at 06:50:10PM -0400, Waiman Long wrote: > The cpu exclusivity rule is due to the setting of CPU_EXCLUSIVE bit. This is > a pre-existing condition unless you want to change how the > cpuset.cpu_exclusive works. > > So the new rules will be: > > 1) The "cpuset.cpus" is not empty and the list of CPUs are exclusive. Empty cpu list can be considered an exclusive one. > 2) The parent cgroup is a partition root (can be an invalid one). Does this mean a partition parent can't stop being a partition if one or more of its children become partitions? If so, it violates the rule that a descendant shouldn't be able to restrict what its ancestors can do. > 3) The "cpuset.cpus" is a subset of the parent's cpuset.cpus.allowed. Why not just go by effective? This would mean that a parent can't withdraw CPUs from its allowed set once descendants are configured. Restrictions like this are fine when the entire hierarchy is configured by a single entity but become awkward when configurations are multi-tiered, automated and dynamic. > 4) No child cgroup with cpuset enabled. idk, maybe? I'm having a hard time seeing the point in adding these restrictions when the state transitions are asynchronous anyway. Would it help if we try to separate what's absoluately and technically necessary and what seems reasonable or high bar and try to justify why each of the latter should be added? Thanks. -- tejun
On 8/27/21 7:35 PM, Tejun Heo wrote: > Hello, > > On Fri, Aug 27, 2021 at 06:50:10PM -0400, Waiman Long wrote: >> The cpu exclusivity rule is due to the setting of CPU_EXCLUSIVE bit. This is >> a pre-existing condition unless you want to change how the >> cpuset.cpu_exclusive works. >> >> So the new rules will be: >> >> 1) The "cpuset.cpus" is not empty and the list of CPUs are exclusive. > Empty cpu list can be considered an exclusive one. It doesn't make sense to me to have a partition with no cpu configured at all. I very much prefer the users to set cpuset.cpus first before turning it into a partition. > >> 2) The parent cgroup is a partition root (can be an invalid one). > Does this mean a partition parent can't stop being a partition if one or > more of its children become partitions? If so, it violates the rule that a > descendant shouldn't be able to restrict what its ancestors can do. No. As I said in the documentation, transitioning from partition root to member is allowed. Against, it is illogical to allow a cpuset to become a potential partition if it parent is not even a partition root at all. In the case that the parent is reverted back to a member, the child partitions will stay invalid forever unless the parent become a valid partition again. > >> 3) The "cpuset.cpus" is a subset of the parent's cpuset.cpus.allowed. > Why not just go by effective? This would mean that a parent can't withdraw > CPUs from its allowed set once descendants are configured. Restrictions like > this are fine when the entire hierarchy is configured by a single entity but > become awkward when configurations are multi-tiered, automated and dynamic. The original rule is to be based on effective cpus. However, to properly handle the case of allowing offlined cpus to be included in the partition, I have to change it to cpu_allowed instead. I can certainly change it back to effective if you prefer. > >> 4) No child cgroup with cpuset enabled. > idk, maybe? I'm having a hard time seeing the point in adding these > restrictions when the state transitions are asynchronous anyway. Would it > help if we try to separate what's absoluately and technically necessary and > what seems reasonable or high bar and try to justify why each of the latter > should be added? This rule is there mainly for ease of implementation. Otherwise, I need to add additional code to handle the conversion of child cpusets which can be rather complex and require a lot more debugging. This rule will no longer apply once the cpuset becomes a partition root. Cheers, Longman
Hello. On Fri, Aug 27, 2021 at 06:50:10PM -0400, Waiman Long <llong@redhat.com> wrote: > So the new rules will be: When I followed the thread, it seemed to me you're talking past each other a bit. I'd suggest the following terminology: - config space: what's written by the user and saved, - reality space: what's currently available (primarily subject to on-/offlinng but I think it'd be helpful to consider here also what's given by the parent), - effect space: what's actually possible and happening. Not all elements of config_space x reality_space (Cartesian product) can be represented in the effect_space (e.g. root partition with no (effective) cpus). IIUC, Waiman's "high bar" is supposed to be defined over transitions in the config_space. However, there can be independent changes in the reality_space so the rules should be actually formulated in the effect_space: The conditions for being a valid partition root rewritten into the effect space: > 1) The "cpuset.cpus" is not empty and the list of CPUs are exclusive. - effective CPUs are non-empty and exclusive wrt siblings - (E.g. setting empty cpuset.cpus might be possible but it invalidates the partition root, same as offlining or removal by an ancestor.) > 2) The parent cgroup is a partition root (can be an invalid one). - parent cgroup is a (valid) partition - (Being valid partition means owning "stolen" cpus from the parent, if the parent is not valid partition itself, you can't steal what is not owned.) - (And I think it's OK that: "the child partitions will stay invalid forever unless the parent become a valid partition again" [1].) > 3) The "cpuset.cpus" is a subset of the parent's cpuset.cpus.allowed. - I'm not sure what is the use of this condition (together with the rewrite of the 1st condition which covers effective cpus). I think it would make sense if being a valid parition root guaranteed that all configured cpuset.cpus will be available, however, that's not the case IIUC (e.g. due to offlining). > 4) No child cgroup with cpuset enabled. - A child cgroup with cpuset enabled is OK in the effect space (achievable by switching first and creating children later). - For technical reasons this may be a condition on the transitions in the config_space. Generally, most config changes should succeed and user should check (or watch) how they landed in combination with the reality_space. Regards, Michal [1] This follows the general model where ancestors can "preempt" resources from their subtree.