Message ID | 1481593143-18756-1-git-send-email-john.stultz@linaro.org |
---|---|
State | New |
On Mon, Dec 12, 2016 at 5:39 PM, John Stultz <john.stultz@linaro.org> wrote:
> This patch adds CAP_GROUP_MIGRATE and logic to allow a process to migrate other tasks between cgroups.
>
> In Android (where this feature originated), the ActivityManager tracks various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM, etc), and then as applications change states, the SchedPolicy logic will migrate the application tasks between different cgroups used to control the different application states (for example, there is a background cpuset cgroup which can limit background tasks to stay on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can then further limit those background tasks to a small percentage of that one cpu's cpu time).
>
> However, for security reasons, Android doesn't want to make the system_server (the process that runs the ActivityManager and SchedPolicy logic), run as root. So in the Android common.git kernel, they have some logic to allow cgroups to loosen their permissions so CAP_SYS_NICE tasks can migrate other tasks between cgroups.
>
> I feel the approach taken there overloads CAP_SYS_NICE a bit much for non-Android environments. Efforts to re-use CAP_SYS_RESOURCE for this purpose (which Android has since adopted) were also stymied by concerns about risks from future cgroups that could be considered "dangerous" by how they might change system semantics.
>
> So to avoid overlapping usage, this patch adds a brand new process capability flag (CAP_CGROUP_MIGRATE), and uses it when checking if a task can migrate other tasks between cgroups.
>
> I've tested this with AOSP master (though it's a bit hacked in as I still need to properly get the selinux bits aware of the new capability bit) with selinux set to permissive and it seems to be working well.
>
> Thoughts and feedback would be appreciated!
>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: cgroups@vger.kernel.org
> Cc: Android Kernel Team <kernel-team@android.com>
> Cc: Rom Lemarchand <romlem@android.com>
> Cc: Colin Cross <ccross@android.com>
> Cc: Dmitry Shmidt <dimitrysh@google.com>
> Cc: Todd Kjos <tkjos@google.com>
> Cc: Christian Poetzsch <christian.potzsch@imgtec.com>
> Cc: Amit Pundir <amit.pundir@linaro.org>
> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: linux-api@vger.kernel.org
> Acked-by: Serge Hallyn <serge@hallyn.com>

After sending this I just realized that this has changed enough that I should probably remove Serge's Acked-by here. Apologies.

But otherwise feedback on this would be appreciated!

thanks
-john
Hi John,

On 13 December 2016 at 02:39, John Stultz <john.stultz@linaro.org> wrote:
> This patch adds CAP_GROUP_MIGRATE and logic to allow a process

s/CAP_GROUP_MIGRATE/CAP_CGROUP_MIGRATE/

> to migrate other tasks between cgroups.
>
> In Android (where this feature originated), the ActivityManager tracks various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM, etc), and then as applications change states, the SchedPolicy logic will migrate the application tasks between different cgroups used to control the different application states (for example, there is a background cpuset cgroup which can limit background tasks to stay on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can then further limit those background tasks to a small percentage of that one cpu's cpu time).
>
> However, for security reasons, Android doesn't want to make the system_server (the process that runs the ActivityManager and SchedPolicy logic), run as root. So in the Android common.git kernel, they have some logic to allow cgroups to loosen their permissions so CAP_SYS_NICE tasks can migrate other tasks between cgroups.
>
> I feel the approach taken there overloads CAP_SYS_NICE a bit much for non-Android environments. Efforts to re-use CAP_SYS_RESOURCE for this purpose (which Android has since adopted) were also stymied by concerns about risks from future cgroups that could be considered "dangerous" by how they might change system semantics.
>
> So to avoid overlapping usage, this patch adds a brand new process capability flag (CAP_CGROUP_MIGRATE), and uses it when checking if a task can migrate other tasks between cgroups.
>
> I've tested this with AOSP master (though it's a bit hacked in as I still need to properly get the selinux bits aware of the new capability bit) with selinux set to permissive and it seems to be working well.
>
> Thoughts and feedback would be appreciated!

So, back to the discussion of silos. I understand the argument for wanting a new silo. But, in that case can we at least try not to make it a single-use silo?

How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.

Cheers,

Michael

> Cc: Tejun Heo <tj@kernel.org>
> Cc: Li Zefan <lizefan@huawei.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: cgroups@vger.kernel.org
> Cc: Android Kernel Team <kernel-team@android.com>
> Cc: Rom Lemarchand <romlem@android.com>
> Cc: Colin Cross <ccross@android.com>
> Cc: Dmitry Shmidt <dimitrysh@google.com>
> Cc: Todd Kjos <tkjos@google.com>
> Cc: Christian Poetzsch <christian.potzsch@imgtec.com>
> Cc: Amit Pundir <amit.pundir@linaro.org>
> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Serge E. Hallyn <serge@hallyn.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: linux-api@vger.kernel.org
> Acked-by: Serge Hallyn <serge@hallyn.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
> v2: Renamed to just CAP_CGROUP_MIGRATE as recommended by Tejun
> v3: Switched to just using CAP_SYS_RESOURCE as suggested by Michael
> v4: Send out properly folded down version of the patch. :P
> v5: Switch back to CAP_CGROUP_MIGRATE due to concerns from Andy
> ---
>  include/uapi/linux/capability.h | 5 ++++-
>  kernel/cgroup.c                 | 3 ++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 49bc062..32d3829 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -349,8 +349,11 @@ struct vfs_cap_data {
>
>  #define CAP_AUDIT_READ 37
>
> +/* Allow migration of other tasks between cgroups */
>
> -#define CAP_LAST_CAP CAP_AUDIT_READ
> +#define CAP_CGROUP_MIGRATE 38
> +
> +#define CAP_LAST_CAP CAP_CGROUP_MIGRATE
>
>  #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 2ee9ec3..784f115 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -2856,7 +2856,8 @@ static int cgroup_procs_write_permission(struct task_struct *task,
>  	 */
>  	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
>  	    !uid_eq(cred->euid, tcred->uid) &&
> -	    !uid_eq(cred->euid, tcred->suid))
> +	    !uid_eq(cred->euid, tcred->suid) &&
> +	    !ns_capable(tcred->user_ns, CAP_CGROUP_MIGRATE))
>  		ret = -EACCES;
>
>  	if (!ret && cgroup_on_dfl(dst_cgrp)) {
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-api" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
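For readers less familiar with how a capability number such as the proposed 38 ends up being used, here is a minimal userspace sketch (not kernel code). It assumes the usual representation of a capability set as a bitmask, the form shown in the CapEff field of /proc/PID/status. The values are copied from the quoted patch under DEMO_-prefixed names so they cannot clash with a real linux/capability.h; the helper names cap_raised() and cap_raised_hex() are invented for illustration, and the number 38 comes from this proposal only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Values copied from the proposed patch; not taken from any released header. */
#define DEMO_CAP_AUDIT_READ      37
#define DEMO_CAP_CGROUP_MIGRATE  38
#define DEMO_CAP_LAST_CAP        DEMO_CAP_CGROUP_MIGRATE

/* Same shape as cap_valid() in the quoted hunk. */
#define demo_cap_valid(x) ((x) >= 0 && (x) <= DEMO_CAP_LAST_CAP)

/* True if capability number `cap` is valid and its bit is set in `mask`. */
static bool cap_raised(uint64_t mask, int cap)
{
    return demo_cap_valid(cap) && ((mask >> cap) & 1u) != 0;
}

/* Same test for a hexadecimal string such as a CapEff value. */
static bool cap_raised_hex(const char *hex, int cap)
{
    return cap_raised(strtoull(hex, NULL, 16), cap);
}
```

Because CAP_LAST_CAP moves with every new capability, any number above it is rejected before the mask is even consulted, which is why the patch has to bump CAP_LAST_CAP together with the new define.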
On Tue, Dec 13, 2016 at 1:47 AM, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote:
> On 13 December 2016 at 02:39, John Stultz <john.stultz@linaro.org> wrote:
> So, back to the discussion of silos. I understand the argument for wanting a new silo. But, in that case can we at least try not to make it a single-use silo?
>
> How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.

This sounds reasonable to me. Tejun/Andy: Objections?

thanks
-john
On 12/13/2016 1:47 AM, Michael Kerrisk (man-pages) wrote:
> Hi John,
>
> On 13 December 2016 at 02:39, John Stultz <john.stultz@linaro.org> wrote:
>> This patch adds CAP_GROUP_MIGRATE and logic to allow a process
> s/CAP_GROUP_MIGRATE/CAP_CGROUP_MIGRATE/
>
>> to migrate other tasks between cgroups.
>>
>> In Android (where this feature originated), the ActivityManager tracks various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM, etc), and then as applications change states, the SchedPolicy logic will migrate the application tasks between different cgroups used to control the different application states (for example, there is a background cpuset cgroup which can limit background tasks to stay on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can then further limit those background tasks to a small percentage of that one cpu's cpu time).
>>
>> However, for security reasons, Android doesn't want to make the system_server (the process that runs the ActivityManager and SchedPolicy logic), run as root. So in the Android common.git kernel, they have some logic to allow cgroups to loosen their permissions so CAP_SYS_NICE tasks can migrate other tasks between cgroups.
>>
>> I feel the approach taken there overloads CAP_SYS_NICE a bit much for non-Android environments. Efforts to re-use CAP_SYS_RESOURCE for this purpose (which Android has since adopted) were also stymied by concerns about risks from future cgroups that could be considered "dangerous" by how they might change system semantics.
>>
>> So to avoid overlapping usage, this patch adds a brand new process capability flag (CAP_CGROUP_MIGRATE), and uses it when checking if a task can migrate other tasks between cgroups.
>>
>> I've tested this with AOSP master (though it's a bit hacked in as I still need to properly get the selinux bits aware of the new capability bit) with selinux set to permissive and it seems to be working well.
>>
>> Thoughts and feedback would be appreciated!
> So, back to the discussion of silos. I understand the argument for wanting a new silo. But, in that case can we at least try not to make it a single-use silo?
>
> How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.

I agree, but want to put it more strongly. The granularity of capabilities can never be fine enough for some people, and this is an example of a case where you're going a bit too far. If the use case is Android as you say, you don't need this. As my friends on the far side of the aisle would say, "just write SELinux policy" to correctly control access as required.

>
> Cheers,
>
> Michael
>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Li Zefan <lizefan@huawei.com>
>> Cc: Jonathan Corbet <corbet@lwn.net>
>> Cc: cgroups@vger.kernel.org
>> Cc: Android Kernel Team <kernel-team@android.com>
>> Cc: Rom Lemarchand <romlem@android.com>
>> Cc: Colin Cross <ccross@android.com>
>> Cc: Dmitry Shmidt <dimitrysh@google.com>
>> Cc: Todd Kjos <tkjos@google.com>
>> Cc: Christian Poetzsch <christian.potzsch@imgtec.com>
>> Cc: Amit Pundir <amit.pundir@linaro.org>
>> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Serge E. Hallyn <serge@hallyn.com>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: linux-api@vger.kernel.org
>> Acked-by: Serge Hallyn <serge@hallyn.com>
>> Signed-off-by: John Stultz <john.stultz@linaro.org>
>> ---
>> v2: Renamed to just CAP_CGROUP_MIGRATE as recommended by Tejun
>> v3: Switched to just using CAP_SYS_RESOURCE as suggested by Michael
>> v4: Send out properly folded down version of the patch. :P
>> v5: Switch back to CAP_CGROUP_MIGRATE due to concerns from Andy
>> ---
>>  include/uapi/linux/capability.h | 5 ++++-
>>  kernel/cgroup.c                 | 3 ++-
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
>> index 49bc062..32d3829 100644
>> --- a/include/uapi/linux/capability.h
>> +++ b/include/uapi/linux/capability.h
>> @@ -349,8 +349,11 @@ struct vfs_cap_data {
>>
>>  #define CAP_AUDIT_READ 37
>>
>> +/* Allow migration of other tasks between cgroups */
>>
>> -#define CAP_LAST_CAP CAP_AUDIT_READ
>> +#define CAP_CGROUP_MIGRATE 38
>> +
>> +#define CAP_LAST_CAP CAP_CGROUP_MIGRATE
>>
>>  #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
>>
>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>> index 2ee9ec3..784f115 100644
>> --- a/kernel/cgroup.c
>> +++ b/kernel/cgroup.c
>> @@ -2856,7 +2856,8 @@ static int cgroup_procs_write_permission(struct task_struct *task,
>>  	 */
>>  	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
>>  	    !uid_eq(cred->euid, tcred->uid) &&
>> -	    !uid_eq(cred->euid, tcred->suid))
>> +	    !uid_eq(cred->euid, tcred->suid) &&
>> +	    !ns_capable(tcred->user_ns, CAP_CGROUP_MIGRATE))
>>  		ret = -EACCES;
>>
>>  	if (!ret && cgroup_on_dfl(dst_cgrp)) {
>> --
>> 2.7.4
On Tue, Dec 13, 2016 at 8:39 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 12/13/2016 1:47 AM, Michael Kerrisk (man-pages) wrote:
>> Hi John,
>>
>> On 13 December 2016 at 02:39, John Stultz <john.stultz@linaro.org> wrote:
>>> This patch adds CAP_GROUP_MIGRATE and logic to allow a process
>> s/CAP_GROUP_MIGRATE/CAP_CGROUP_MIGRATE/
>>
>>> to migrate other tasks between cgroups.
>>>
>>> In Android (where this feature originated), the ActivityManager tracks various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM, etc), and then as applications change states, the SchedPolicy logic will migrate the application tasks between different cgroups used to control the different application states (for example, there is a background cpuset cgroup which can limit background tasks to stay on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can then further limit those background tasks to a small percentage of that one cpu's cpu time).
>>>
>>> However, for security reasons, Android doesn't want to make the system_server (the process that runs the ActivityManager and SchedPolicy logic), run as root. So in the Android common.git kernel, they have some logic to allow cgroups to loosen their permissions so CAP_SYS_NICE tasks can migrate other tasks between cgroups.
>>>
>>> I feel the approach taken there overloads CAP_SYS_NICE a bit much for non-Android environments. Efforts to re-use CAP_SYS_RESOURCE for this purpose (which Android has since adopted) were also stymied by concerns about risks from future cgroups that could be considered "dangerous" by how they might change system semantics.
>>>
>>> So to avoid overlapping usage, this patch adds a brand new process capability flag (CAP_CGROUP_MIGRATE), and uses it when checking if a task can migrate other tasks between cgroups.
>>>
>>> I've tested this with AOSP master (though it's a bit hacked in as I still need to properly get the selinux bits aware of the new capability bit) with selinux set to permissive and it seems to be working well.
>>>
>>> Thoughts and feedback would be appreciated!
>> So, back to the discussion of silos. I understand the argument for wanting a new silo. But, in that case can we at least try not to make it a single-use silo?
>>
>> How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.
>
> I agree, but want to put it more strongly. The granularity of capabilities can never be fine enough for some people, and this is an example of a case where you're going a bit too far. If the use case is Android as you say, you don't need this. As my friends on the far side of the aisle would say, "just write SELinux policy" to correctly control access as required.

So.. The trouble is that while selinux is good for restricting permissions, the in-kernel permission checks here are already too restrictive. It seems one must first loosen things up before we can tighten it with selinux rules. Or are you suggesting the system_server run as root + further selinux limitations? I worry the Android developers may still be hesitant to do that.

thanks
-john
On 12/13/2016 8:49 AM, John Stultz wrote:
> On Tue, Dec 13, 2016 at 8:39 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 12/13/2016 1:47 AM, Michael Kerrisk (man-pages) wrote:
>>> Hi John,
>>>
>>> On 13 December 2016 at 02:39, John Stultz <john.stultz@linaro.org> wrote:
>>>> This patch adds CAP_GROUP_MIGRATE and logic to allow a process
>>> s/CAP_GROUP_MIGRATE/CAP_CGROUP_MIGRATE/
>>>
>>>> to migrate other tasks between cgroups.
>>>>
>>>> In Android (where this feature originated), the ActivityManager tracks various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM, etc), and then as applications change states, the SchedPolicy logic will migrate the application tasks between different cgroups used to control the different application states (for example, there is a background cpuset cgroup which can limit background tasks to stay on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can then further limit those background tasks to a small percentage of that one cpu's cpu time).
>>>>
>>>> However, for security reasons, Android doesn't want to make the system_server (the process that runs the ActivityManager and SchedPolicy logic), run as root. So in the Android common.git kernel, they have some logic to allow cgroups to loosen their permissions so CAP_SYS_NICE tasks can migrate other tasks between cgroups.
>>>>
>>>> I feel the approach taken there overloads CAP_SYS_NICE a bit much for non-Android environments. Efforts to re-use CAP_SYS_RESOURCE for this purpose (which Android has since adopted) were also stymied by concerns about risks from future cgroups that could be considered "dangerous" by how they might change system semantics.
>>>>
>>>> So to avoid overlapping usage, this patch adds a brand new process capability flag (CAP_CGROUP_MIGRATE), and uses it when checking if a task can migrate other tasks between cgroups.
>>>>
>>>> I've tested this with AOSP master (though it's a bit hacked in as I still need to properly get the selinux bits aware of the new capability bit) with selinux set to permissive and it seems to be working well.
>>>>
>>>> Thoughts and feedback would be appreciated!
>>> So, back to the discussion of silos. I understand the argument for wanting a new silo. But, in that case can we at least try not to make it a single-use silo?
>>>
>>> How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.
>> I agree, but want to put it more strongly. The granularity of capabilities can never be fine enough for some people, and this is an example of a case where you're going a bit too far. If the use case is Android as you say, you don't need this. As my friends on the far side of the aisle would say, "just write SELinux policy" to correctly control access as required.
> So.. The trouble is that while selinux is good for restricting permissions, the in-kernel permission checks here are already too restrictive.

Why did the original authors of cgroups make it that restrictive? If there isn't a good reason, loosen it up. If there is a good reason, then pay heed to it.

> It seems one must first loosen things up before we can tighten it with selinux rules.

You're looking at splitting the granularity hair. Is your userspace code really so delicate that it can't handle the existing, "coarse" privilege and needs to protect at the "fine" granularity you're proposing?

> Or are you suggesting the system_server run as root + further selinux limitations? I worry, the Android developers may still be hesitant to do that.

Unlike many of my peers, I am not afraid of running good solid services with privilege. A proper implementation of system_server ought to be able to run completely unconstrained without causing anyone the least concern. I understand all the arguments against that, and am disinclined to get into the religious debates that ensue. So no, I am not going to suggest running system server as root, but I am going to suggest giving it the capability currently required and clamping it down with SELinux policy.

>
> thanks
> -john
On Tue, Dec 13, 2016 at 9:17 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 12/13/2016 8:49 AM, John Stultz wrote:
>> On Tue, Dec 13, 2016 at 8:39 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>> On 12/13/2016 1:47 AM, Michael Kerrisk (man-pages) wrote:
>>>> How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.
>>> I agree, but want to put it more strongly. The granularity of capabilities can never be fine enough for some people, and this is an example of a case where you're going a bit too far. If the use case is Android as you say, you don't need this. As my friends on the far side of the aisle would say, "just write SELinux policy" to correctly control access as required.
>> So.. The trouble is that while selinux is good for restricting permissions, the in-kernel permission checks here are already too restrictive.
>
> Why did the original authors of cgroups make it that restrictive? If there isn't a good reason, loosen it up. If there is a good reason, then pay heed to it.

That's what this patch is proposing. And I agree with Michael that the newly proposed cap was a bit too narrowly focused on my immediate use case, and broadening it to CGROUP_CONTROL is smart. Then that capability could be further restricted w/ selinux policy, as you suggest.

thanks
-john
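For context on the "in-kernel permission checks" being debated, the operation in question is a write of a PID into a cgroup's cgroup.procs file. The following is a rough userspace sketch of that write, assuming a mounted cgroup hierarchy whose directory is supplied by the caller. The helper names cgroup_procs_path() and migrate_task() are hypothetical and do not come from Android or the kernel tree. When the kernel check rejects the writer, the write() call is expected to fail with EACCES, matching the ret = -EACCES in the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<cgroup_dir>/cgroup.procs" in buf; 0 on success, -1 if it does not fit. */
static int cgroup_procs_path(char *buf, size_t len, const char *cgroup_dir)
{
    int n = snprintf(buf, len, "%s/cgroup.procs", cgroup_dir);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Ask the kernel to move process `pid` into the cgroup at cgroup_dir.
 * Returns 0 on success or a negative errno value on failure. */
static int migrate_task(const char *cgroup_dir, pid_t pid)
{
    char path[4096], line[32];
    int fd, len, err = 0;
    ssize_t n;

    if (cgroup_procs_path(path, sizeof(path), cgroup_dir) != 0)
        return -ENAMETOOLONG;
    len = snprintf(line, sizeof(line), "%ld\n", (long)pid);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    /* The permission check discussed in this thread runs during this write. */
    n = write(fd, line, (size_t)len);
    if (n < 0)
        err = -errno;
    else if (n != len)
        err = -EIO;

    close(fd);
    return err;
}
```

A caller such as a system service would invoke migrate_task(dir, pid) with the directory of the destination cgroup (for instance a background cpuset, wherever the platform mounts it) and treat -EACCES as "not permitted to move that task".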
On 12/13/2016 9:24 AM, John Stultz wrote:
> On Tue, Dec 13, 2016 at 9:17 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 12/13/2016 8:49 AM, John Stultz wrote:
>>> On Tue, Dec 13, 2016 at 8:39 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>> On 12/13/2016 1:47 AM, Michael Kerrisk (man-pages) wrote:
>>>>> How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.
>>>> I agree, but want to put it more strongly. The granularity of capabilities can never be fine enough for some people, and this is an example of a case where you're going a bit too far. If the use case is Android as you say, you don't need this. As my friends on the far side of the aisle would say, "just write SELinux policy" to correctly control access as required.
>>> So.. The trouble is that while selinux is good for restricting permissions, the in-kernel permission checks here are already too restrictive.
>> Why did the original authors of cgroups make it that restrictive? If there isn't a good reason, loosen it up. If there is a good reason, then pay heed to it.
> That's what this patch is proposing. And I agree with Michael that the newly proposed cap was a bit too narrowly focused on my immediate use case, and broadening it to CGROUP_CONTROL is smart. Then that capability could be further restricted w/ selinux policy, as you suggest.

Adding a new capability is unnecessary. The current use of CAP_SYS_NICE, while arguably obscure, provides as much "security" as a new capability does. While cgroups are a wonderful thing, they don't need a separate capability.

>
> thanks
> -john
On Tue, Dec 13, 2016 at 9:48 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
> On 12/13/2016 9:24 AM, John Stultz wrote:
>> On Tue, Dec 13, 2016 at 9:17 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>> On 12/13/2016 8:49 AM, John Stultz wrote:
>>>> On Tue, Dec 13, 2016 at 8:39 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>> On 12/13/2016 1:47 AM, Michael Kerrisk (man-pages) wrote:
>>>>>> How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.
>>>>> I agree, but want to put it more strongly. The granularity of capabilities can never be fine enough for some people, and this is an example of a case where you're going a bit too far. If the use case is Android as you say, you don't need this. As my friends on the far side of the aisle would say, "just write SELinux policy" to correctly control access as required.
>>>> So.. The trouble is that while selinux is good for restricting permissions, the in-kernel permission checks here are already too restrictive.
>>> Why did the original authors of cgroups make it that restrictive? If there isn't a good reason, loosen it up. If there is a good reason, then pay heed to it.
>> That's what this patch is proposing. And I agree with Michael that the newly proposed cap was a bit too narrowly focused on my immediate use case, and broadening it to CGROUP_CONTROL is smart. Then that capability could be further restricted w/ selinux policy, as you suggest.
>
> Adding a new capability is unnecessary. The current use of CAP_SYS_NICE, while arguably obscure, provides as much "security" as a new capability does. While cgroups are a wonderful thing, they don't need a separate capability.

The trouble is that CAP_SYS_NICE or _RESOURCE (which was tried in an earlier version of this patch) aren't necessarily appropriate for non-Android systems. See Andy's objection here: https://lkml.org/lkml/2016/11/8/946

thanks
-john
On 12/13/2016 10:13 AM, John Stultz wrote:
> On Tue, Dec 13, 2016 at 9:48 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 12/13/2016 9:24 AM, John Stultz wrote:
>>> On Tue, Dec 13, 2016 at 9:17 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>> On 12/13/2016 8:49 AM, John Stultz wrote:
>>>>> On Tue, Dec 13, 2016 at 8:39 AM, Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>>>> On 12/13/2016 1:47 AM, Michael Kerrisk (man-pages) wrote:
>>>>>>> How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.
>>>>>> I agree, but want to put it more strongly. The granularity of capabilities can never be fine enough for some people, and this is an example of a case where you're going a bit too far. If the use case is Android as you say, you don't need this. As my friends on the far side of the aisle would say, "just write SELinux policy" to correctly control access as required.
>>>>> So.. The trouble is that while selinux is good for restricting permissions, the in-kernel permission checks here are already too restrictive.
>>>> Why did the original authors of cgroups make it that restrictive? If there isn't a good reason, loosen it up. If there is a good reason, then pay heed to it.
>>> That's what this patch is proposing. And I agree with Michael that the newly proposed cap was a bit too narrowly focused on my immediate use case, and broadening it to CGROUP_CONTROL is smart. Then that capability could be further restricted w/ selinux policy, as you suggest.
>> Adding a new capability is unnecessary. The current use of CAP_SYS_NICE, while arguably obscure, provides as much "security" as a new capability does. While cgroups are a wonderful thing, they don't need a separate capability.
> The trouble is that CAP_SYS_NICE or _RESOURCE (which was tried in an earlier version of this patch) aren't necessarily appropriate for non-Android systems. See Andy's objection here: https://lkml.org/lkml/2016/11/8/946

Then we need to see what those as-yet-unimplemented systems require and how to address them. I don't think that taking the "someone might want it" approach is really appropriate.

>
> thanks
> -john
Hello,

On Tue, Dec 13, 2016 at 08:08:16AM -0800, John Stultz wrote:
> On Tue, Dec 13, 2016 at 1:47 AM, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote:
> > On 13 December 2016 at 02:39, John Stultz <john.stultz@linaro.org> wrote:
> > So, back to the discussion of silos. I understand the argument for wanting a new silo. But, in that case can we at least try not to make it a single-use silo?
> >
> > How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.
>
> This sounds reasonable to me. Tejun/Andy: Objections?

Control group control? The word control has a specific meaning for cgroups and that second control doesn't make much sense to me. Given how this is mostly to patch up a hole in v1's delegation model and how migration operations are different from others, I doubt that we will end up overloading it. Maybe just CAP_CGROUP?

Thanks.

--
tejun
Hello, Casey.

On Tue, Dec 13, 2016 at 10:32:14AM -0800, Casey Schaufler wrote:
> > The trouble is that CAP_SYS_NICE or _RESOURCE (which was tried in an earlier version of this patch) aren't necessarily appropriate for non-Android systems. See Andy's objection here: https://lkml.org/lkml/2016/11/8/946
>
> Then we need to see what those as-yet-unimplemented systems require and how to address them. I don't think that taking the "someone might want it" approach is really appropriate.

I understand that there can be reservations regarding adding a new CAP but this isn't about someone possibly wanting it in the future. It's more about overloading existing CAPs leading to permitting unintended operations. For example, people who've been delegating CAP_SYS_RESOURCE would automatically end up delegating cgroup organization without intending to. Using an existing cap would have been nice but it just doesn't look like we have a good one to overload.

Thanks.

--
tejun
On Tue, Dec 13, 2016 at 10:40 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Tue, Dec 13, 2016 at 08:08:16AM -0800, John Stultz wrote:
>> On Tue, Dec 13, 2016 at 1:47 AM, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote:
>> > On 13 December 2016 at 02:39, John Stultz <john.stultz@linaro.org> wrote:
>> > So, back to the discussion of silos. I understand the argument for wanting a new silo. But, in that case can we at least try not to make it a single-use silo?
>> >
>> > How about CAP_CGROUP_CONTROL or some such, with the idea that this might be a capability that allows the holder to step outside usual cgroup rules? At the moment, that capability would allow only one such step, but maybe there would be others in the future.
>>
>> This sounds reasonable to me. Tejun/Andy: Objections?
>
> Control group control? The word control has a specific meaning for cgroups and that second control doesn't make much sense to me.

But this would go against the long tradition of RAS syndrome and things like "struct task_struct". :)

> Given how this is mostly to patch up a hole in v1's delegation model and how migration operations are different from others, I doubt that we will end up overloading it. Maybe just CAP_CGROUP?

Sounds ok to me.

thanks
-john
On Tue, Dec 13, 2016 at 10:47:19AM -0800, John Stultz wrote:
> > Control group control? The word control has a specific meaning for cgroups and that second control doesn't make much sense to me.
>
> But this would go against the long tradition of RAS syndrome and things like "struct task_struct". :)

Well, now that you put it that way, it's starting to look good. :) But, let's just go for CAP_CGROUP if everyone is okay with it.

Thanks.

--
tejun
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 49bc062..32d3829 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -349,8 +349,11 @@ struct vfs_cap_data {

 #define CAP_AUDIT_READ 37

+/* Allow migration of other tasks between cgroups */

-#define CAP_LAST_CAP CAP_AUDIT_READ
+#define CAP_CGROUP_MIGRATE 38
+
+#define CAP_LAST_CAP CAP_CGROUP_MIGRATE

 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2ee9ec3..784f115 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2856,7 +2856,8 @@ static int cgroup_procs_write_permission(struct task_struct *task,
 	 */
 	if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
 	    !uid_eq(cred->euid, tcred->uid) &&
-	    !uid_eq(cred->euid, tcred->suid))
+	    !uid_eq(cred->euid, tcred->suid) &&
+	    !ns_capable(tcred->user_ns, CAP_CGROUP_MIGRATE))
 		ret = -EACCES;

 	if (!ret && cgroup_on_dfl(dst_cgrp)) {
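To make the effect of the kernel/cgroup.c hunk easier to see, here is a small userspace model of the patched condition. It is only a sketch under simplifying assumptions: user namespaces are ignored, so the ns_capable(tcred->user_ns, ...) call is reduced to a boolean on the writer, and plain uid_t comparisons stand in for uid_eq() on kuid_t values. The struct demo_cred type and the function demo_procs_write_permission() are hypothetical names, not kernel APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Simplified stand-in for the credentials the kernel check looks at. */
struct demo_cred {
    uid_t euid;               /* effective UID */
    uid_t uid;                /* real UID */
    uid_t suid;               /* saved UID */
    bool has_cgroup_migrate;  /* models holding the proposed capability */
};

/* Model of the patched check: `cred` is the writer, `tcred` the task being
 * moved. Returns 0 if the migration would pass this check, else -EACCES. */
static int demo_procs_write_permission(const struct demo_cred *cred,
                                       const struct demo_cred *tcred)
{
    if (cred->euid != 0 &&             /* writer is not root */
        cred->euid != tcred->uid &&    /* and does not own the target */
        cred->euid != tcred->suid &&
        !cred->has_cgroup_migrate)     /* new escape hatch from the patch */
        return -EACCES;
    return 0;
}
```

In this model a non-root writer with a different UID from the target is rejected unless it holds the new capability, which is the system_server situation described in the patch description; root and same-UID writers pass exactly as they did before the patch.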