diff mbox

[2/2] proc: Add /proc/<pid>/timerslack_ns interface

Message ID 1455671191-32105-3-git-send-email-john.stultz@linaro.org
State New
Headers show

Commit Message

John Stultz Feb. 17, 2016, 1:06 a.m. UTC
This patch provides a proc/PID/timerslack_ns interface which
exposes a task's timerslack value in nanoseconds and allows it
to be changed.

This allows power/performance management software to set timer
slack for other threads according to its policy for the thread
(such as when the thread is designated foreground vs. background
activity)

If the value written is non-zero, slack is set to that value.
Otherwise sets it to the default for the thread.

This interface checks that the calling task has permissions to
to use PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we
can ensure arbitrary apps do not change the timer slack for other
apps.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>

---
 Documentation/filesystems/proc.txt | 18 ++++++++++
 fs/proc/base.c                     | 69 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

-- 
1.9.1

Comments

John Stultz Feb. 17, 2016, 8:49 p.m. UTC | #1
On Wed, Feb 17, 2016 at 11:35 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Tue, 16 Feb 2016 17:06:31 -0800 John Stultz <john.stultz@linaro.org> wrote:

>

>> This patch provides a proc/PID/timerslack_ns interface which

>> exposes a task's timerslack value in nanoseconds and allows it

>> to be changed.

>>

>> This allows power/performance management software to set timer

>> slack for other threads according to its policy for the thread

>> (such as when the thread is designated foreground vs. background

>> activity)

>>

>> If the value written is non-zero, slack is set to that value.

>> Otherwise sets it to the default for the thread.

>>

>> This interface checks that the calling task has permissions to

>> to use PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we

>> can ensure arbitrary apps do not change the timer slack for other

>> apps.

>

> hm.  What the heck is PTRACE_MODE_ATTACH_FSCREDS and why was it chosen?


Somewhat out of necessity. I found due to recent changes (sha1:
caaee6234d0)  to __ptrace_may_access() one must use FSCREDS or
REALCREDS. So PTRACE_MODE_ATTACH on its own won't work. ("What the
heck" was what I thought too when I noticed my test wasn't working :)

Since we're accessing this via a filesystem interface, I picked
FSCREDS.  But if I chose wrong here, please let me know.

>> +     err = kstrtoull(strstrip(buffer), 10, &slack_ns);

>> +     if (err < 0)

>> +             return err;

>

> Use kstrtoull_from_user()?


Ok. I'll rework it to use this.

thanks!
-john
John Stultz Feb. 17, 2016, 8:51 p.m. UTC | #2
On Wed, Feb 17, 2016 at 12:18 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 17 Feb 2016 12:09:08 -0800 Kees Cook <keescook@chromium.org> wrote:

>

>> On Wed, Feb 17, 2016 at 11:35 AM, Andrew Morton

>> <akpm@linux-foundation.org> wrote:

>> > On Tue, 16 Feb 2016 17:06:31 -0800 John Stultz <john.stultz@linaro.org> wrote:

>> >

>> >> This patch provides a proc/PID/timerslack_ns interface which

>> >> exposes a task's timerslack value in nanoseconds and allows it

>> >> to be changed.

>> >>

>> >> This allows power/performance management software to set timer

>> >> slack for other threads according to its policy for the thread

>> >> (such as when the thread is designated foreground vs. background

>> >> activity)

>> >>

>> >> If the value written is non-zero, slack is set to that value.

>> >> Otherwise sets it to the default for the thread.

>> >>

>> >> This interface checks that the calling task has permissions to

>> >> to use PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we

>> >> can ensure arbitrary apps do not change the timer slack for other

>> >> apps.

>> >

>> > hm.  What the heck is PTRACE_MODE_ATTACH_FSCREDS and why was it chosen?

>>

>> This says the writer needs to have ptrace "attach" level of access,

>> and that it should be checked with fscreds, as is the standard for

>> most /proc things like that.

>

> The only place where PTRACE_MODE_ATTACH_FSCREDS is used in all of Linux

> is /prc/pid/stack.  Makes me curious!


Other uses may be using a combination PTRACE_MODE_ATTACH|PTRACE_MODE_FSCREDS.

>> > The procfs file's permissions are 0644, yes?  So a process's

>> > timer_slack is world-readable?  hm.

>>

>> This should be 600, IMO.

>

> Sounds safer.


Ok. Reworking the patch to use that as well.


Andrew: I saw you added these to -mm already. Would you prefer a fixup
patch ontop, or should I just send out a folded down v3 of the
patchset?

thanks
-john
John Stultz Feb. 17, 2016, 10:29 p.m. UTC | #3
On Wed, Feb 17, 2016 at 12:18 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Wed, 17 Feb 2016 12:09:08 -0800 Kees Cook <keescook@chromium.org> wrote:

>> On Wed, Feb 17, 2016 at 11:35 AM, Andrew Morton

>> > The procfs file's permissions are 0644, yes?  So a process's

>> > timer_slack is world-readable?  hm.

>>

>> This should be 600, IMO.

>

> Sounds safer.


So I've gone ahead and addressed this and the other feedback you had.
But this bit made me realize that I may have missed a key aspect to
the interface that Android needs.

In particular, the whole point here is to allow a controlling task to
modify other tasks' timerslack to limit background tasks' power usage
(and to modify them back to normal when the background tasks become
foreground tasks). Note that on android different tasks run as
different users.

Currently, the controlling process has minimally elevated privileges
(CAP_SYS_NICE). The initial review suggested those privileges should
be higher (PTRACE_MODE_ATTACH), which I've implemented.  However, I'm
realizing that by moving to the proc interface, the filesystem
permissions here put yet another barrier in the way.

While the 600 permissions makes initial sense, it does limit these
controlling tasks with extra privileges (though not root) from
modifying the timerslack, since they cannot open the file to begin
with.

So.... Does world writable (plus the PTRACE_MODE_ATTACH_FSCREDS check)
make more sense here?  Or is there a better way for a system to tweak
the default permissions for procfs entries? (And if so, does that
render the PTRACE_MODE_ATTACH... check unnecessary?).

Apologies. I'm fighting a head-cold, so I'm not feeling particularly sharp here.

thanks
-john
John Stultz Feb. 17, 2016, 10:51 p.m. UTC | #4
On Wed, Feb 17, 2016 at 2:45 PM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Feb 17, 2016 at 2:29 PM, John Stultz <john.stultz@linaro.org> wrote:

>> On Wed, Feb 17, 2016 at 12:18 PM, Andrew Morton

>> <akpm@linux-foundation.org> wrote:

>>> On Wed, 17 Feb 2016 12:09:08 -0800 Kees Cook <keescook@chromium.org> wrote:

>>>> On Wed, Feb 17, 2016 at 11:35 AM, Andrew Morton

>>>> > The procfs file's permissions are 0644, yes?  So a process's

>>>> > timer_slack is world-readable?  hm.

>>>>

>>>> This should be 600, IMO.

>>>

>>> Sounds safer.

>>

>> So I've gone ahead and addressed this and the other feedback you had.

>> But this bit made me realize that I may have missed a key aspect to

>> the interface that Android needs.

>>

>> In particular, the whole point here is to allow a controlling task to

>> modify other tasks' timerslack to limit background tasks' power usage

>> (and to modify them back to normal when the background tasks become

>> foreground tasks). Note that on android different tasks run as

>> different users.

>>

>> Currently, the controlling process has minimally elevated privileges

>> (CAP_SYS_NICE). The initial review suggested those privileges should

>> be higher (PTRACE_MODE_ATTACH), which I've implemented.  However, I'm

>> realizing that by moving to the proc interface, the filesystem

>> permissions here put yet another barrier in the way.

>>

>> While the 600 permissions makes initial sense, it does limit these

>> controlling tasks with extra privileges (though not root) from

>> modifying the timerslack, since they cannot open the file to begin

>> with.

>>

>> So.... Does world writable (plus the PTRACE_MODE_ATTACH_FSCREDS check)

>> make more sense here?  Or is there a better way for a system to tweak

>> the default permissions for procfs entries? (And if so, does that

>> render the PTRACE_MODE_ATTACH... check unnecessary?).

>>

>> Apologies. I'm fighting a head-cold, so I'm not feeling particularly sharp here.

>

> Is timerslack sensitive at all? You could add the ptrace test to the

> _show function too, maybe. Then 0666 would solve the open issue

> without leaking the timerslack.


I don't see how timerslack would be sensitive, but probably many
mistakes start out that way, so not being cavalier about it seems
wise.  :)

Ok. Sounds like you and Andrew are on the same page wrt 666 +
PTRACE_MODE_ATTACH, and that seems like it would be workable.

I'll get that implemented here shortly.

Thanks so much again for the feedback! Really appreciate it!
-john
John Stultz July 13, 2016, 11:47 p.m. UTC | #5
On Tue, Feb 16, 2016 at 5:06 PM, John Stultz <john.stultz@linaro.org> wrote:
> This patch provides a proc/PID/timerslack_ns interface which

> exposes a task's timerslack value in nanoseconds and allows it

> to be changed.

>

> This allows power/performance management software to set timer

> slack for other threads according to its policy for the thread

> (such as when the thread is designated foreground vs. background

> activity)

>

> If the value written is non-zero, slack is set to that value.

> Otherwise sets it to the default for the thread.

>

> This interface checks that the calling task has permissions to

> to use PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we

> can ensure arbitrary apps do not change the timer slack for other

> apps.


Sigh.

So I wanted to pull this thread up again, because when I originally
proposed upstreaming the PR_SET_TIMERSLACK_PID feature from the AOSP
common.git tree, the first objection from Arjan was that it only
required CAP_SYS_NICE:
   http://lkml.iu.edu/hypermail/linux/kernel/1506.3/01491.html

And reasonably, setting timerslack to very large values does have the
potential to effect applications much further then what a task could
do previously with CAP_SYS_NICE.

CAP_SYS_PTRACE was suggested instead, as that allows applications to
manipulate other tasks more drastically.

(At the time, I checked with some of the Android developers, and got
no objection to changing to use this capability.)

However, after submitting the changes to Android required to support
the upstreamed /proc/<tid>/timerslack_ns interface, I've gotten some
objections with adding CAP_SYS_PTRACE to the system_server, as this
would allow the system_server to be able to inspect and modify memory
on any task in the system. This gives the system_server privileged to
effect applications much further then what it could do previously.

So I worry I'm a bit stuck here. For general systems, CAP_SYS_NICE is
too low a level of privilege  to set a tasks timerslack, but
apparently CAP_SYS_PTRACE is too high a privilege for Android's
system_server to require just to set a tasks timerslack value.

So I wanted to ask again if we might consider backing this down to
CAP_SYS_NICE, or if we can instead introduce a new CAP_SYS_TIMERSLACK
or something to provide the needed in-between capability level.

Thoughts?

thanks
-john
John Stultz July 14, 2016, 4:01 p.m. UTC | #6
On Thu, Jul 14, 2016 at 6:42 AM, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On 7/14/2016 5:48 AM, Serge E. Hallyn wrote:

>

>> Can someone give a detailed explanation of what you could do with

>> the new timerslack feature and compare it to what you can do with

>> sys_nice?

>>

>

> what you can do with the timerslack feature is add upto 4 seconds of extra

> time/delay on top of each select()/poll()/nanosleep()/... (basically

> anything that

> uses hrtimers on behalf of the user), and then also control within that

> 4 second window exactly when that extra delay ends

> (which may help a timing attack kind of scenario)


So the interface actually allows for 64bits of nanoseconds, so more or
less infinite delay if nothing else is happening.

thanks
-john
John Stultz July 14, 2016, 4:09 p.m. UTC | #7
On Thu, Jul 14, 2016 at 5:48 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> Quoting Kees Cook (keescook@chromium.org):

>> I think the original CAP_SYS_NICE should be fine. A malicious

>> CAP_SYS_NICE process can do plenty of insane things, I don't feel like

>> the timer slack adds to any realistic risks.

>

> Can someone give a detailed explanation of what you could do with

> the new timerslack feature and compare it to what you can do with

> sys_nice?


Looking at the man page for CAP_SYS_NICE, it looks like such a task
can set a task as SCHED_FIFO, so they could fork some spinning
processes and set them all SCHED_FIFO 99, in effect delaying all other
tasks for an infinite amount of time.

So one might argue setting large timerslack vlaues isn't that
different risk wise?

thanks
-john
diff mbox

Patch

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 843b045b..7f5607a 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -43,6 +43,7 @@  Table of Contents
   3.7   /proc/<pid>/task/<tid>/children - Information about task children
   3.8   /proc/<pid>/fdinfo/<fd> - Information about opened file
   3.9   /proc/<pid>/map_files - Information about memory mapped files
+  3.10  /proc/<pid>/timerslack_ns - Task timerslack value
 
   4	Configuring procfs
   4.1	Mount options
@@ -1862,6 +1863,23 @@  time one can open(2) mappings from the listings of two processes and
 comparing their inode numbers to figure out which anonymous memory areas
 are actually shared.
 
+3.10	/proc/<pid>/timerslack_ns - Task timerslack value
+---------------------------------------------------------
+This file provides the value of the task's timerslack value in nanoseconds.
+This value specifies a amount of time that normal timers may be deferred
+in order to coalesce timers and avoid unnecessary wakeups.
+
+This allows a task's interactivity vs power consumption trade off to be
+adjusted.
+
+Writing 0 to the file will set the tasks timerslack to the default value.
+
+Valid values are from 0 - ULLONG_MAX
+
+An application setting the value must have PTRACE_MODE_ATTACH_FSCREDS level
+permissions on the task specified to change its timerslack_ns value.
+
+
 ------------------------------------------------------------------------------
 Configuring procfs
 ------------------------------------------------------------------------------
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4f764c2..d7c51ca 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2257,6 +2257,74 @@  static const struct file_operations proc_timers_operations = {
 	.release	= seq_release_private,
 };
 
+static ssize_t timerslack_ns_write(struct file *file, const char __user *buf,
+					size_t count, loff_t *offset)
+{
+	struct inode *inode = file_inode(file);
+	struct task_struct *p;
+	char buffer[PROC_NUMBUF];
+	u64 slack_ns;
+	int err;
+
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
+
+	err = kstrtoull(strstrip(buffer), 10, &slack_ns);
+	if (err < 0)
+		return err;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+
+	if (ptrace_may_access(p, PTRACE_MODE_ATTACH_FSCREDS)) {
+		if (slack_ns == 0)
+			p->timer_slack_ns = p->default_timer_slack_ns;
+		else
+			p->timer_slack_ns = slack_ns;
+	} else
+		count = -EINVAL;
+
+	put_task_struct(p);
+
+	return count;
+}
+
+static int timerslack_ns_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct task_struct *p;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+
+	task_lock(p);
+	seq_printf(m, "%llu\n", p->timer_slack_ns);
+	task_unlock(p);
+
+	put_task_struct(p);
+
+	return 0;
+}
+
+static int timerslack_ns_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, timerslack_ns_show, inode);
+}
+
+static const struct file_operations proc_pid_set_timerslack_ns_operations = {
+	.open		= timerslack_ns_open,
+	.read		= seq_read,
+	.write		= timerslack_ns_write,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static int proc_pident_instantiate(struct inode *dir,
 	struct dentry *dentry, struct task_struct *task, const void *ptr)
 {
@@ -2831,6 +2899,7 @@  static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_CHECKPOINT_RESTORE
 	REG("timers",	  S_IRUGO, proc_timers_operations),
 #endif
+	REG("timerslack_ns", S_IRUGO|S_IWUSR, proc_pid_set_timerslack_ns_operations),
 };
 
 static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)