Message ID | 20230109153348.5625-1-gregory.price@memverge.com |
---|---|
Headers | show |
Series | Checkpoint Support for Syscall User Dispatch | expand |
On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote: > @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs) > struct syscall_user_dispatch *sd = ¤t->syscall_dispatch; > char state; > > + if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) && > + unlikely(current->ptrace & PT_SUSPEND_SYSCALL_USER_DISPATCH)) > + return false; > + > if (likely(instruction_pointer(regs) - sd->offset < sd->len)) > return false; > So by making syscall_user_dispatch() return false, we'll make syscall_trace_enter() continue to handle things, and supposedly you want to land in ptrace_report_syscall_entry(), right? > diff --git a/kernel/ptrace.c b/kernel/ptrace.c > index 54482193e1ed..a6ad815bd4be 100644 > --- a/kernel/ptrace.c > +++ b/kernel/ptrace.c > @@ -370,6 +370,11 @@ static int check_ptrace_options(unsigned long data) > if (data & ~(unsigned long)PTRACE_O_MASK) > return -EINVAL; > > + if (unlikely(data & PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH)) { > + if (!IS_ENABLED(CONFIG_CHECKPOINT_RESTART)) > + return -EINVAL; > + } Should setting this then not also depend on having SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny' things.
On Wed, Jan 18, 2023 at 02:41:00PM -0500, Gregory Price wrote: > ---------- Forwarded message --------- > From: Peter Zijlstra <peterz@infradead.org> > Date: Wed, Jan 18, 2023 at 12:16 PM > Subject: Re: [PATCH 1/3] ptrace,syscall_user_dispatch: Implement Syscall > User Dispatch Suspension > To: Gregory Price <gourry.memverge@gmail.com> > > > On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote: > > @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs) > > struct syscall_user_dispatch *sd = ¤t->syscall_dispatch; > > char state; > > > > + if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) && > > + unlikely(current->ptrace & > PT_SUSPEND_SYSCALL_USER_DISPATCH)) > > + return false; > > + > > if (likely(instruction_pointer(regs) - sd->offset < sd->len)) > > return false; > > > > So by making syscall_user_dispatch() return false, we'll make > syscall_trace_enter() continue to handle things, and supposedly you want > to land in ptrace_report_syscall_entry(), right? > > ... snip ... > > Should setting this then not also depend on having > SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny' > things. Hm, this is an interesting question. My thoughts are that I want the process to handle the syscall as-if syscall user dispatch was not present at all, regardless of SYSCALL_TRACE. This is because some software, like CRIU, actually injects syscalls to run in the context of the software in an effort to collect resources. So I actually *want* those 'funny' things to occur, because they're most likely intentional. I don't necessarily want to intercept system calls that subsequently occur (although i might). So if this feature required SYSCALL_TRACE, you would no longer be able to inject system calls ala CRIU. That's also my understanding of the SECCOMP_SUSPEND feature as well, it's intended specifically to allow *otherwise disallowed* syscalls to be injected into the process and SECCOMP bypassed. (in this case, SECCOMP_SUSPEND requires root for exactly this reason).
On Wed, Jan 18, 2023 at 02:49:31PM -0500, Gregory Price wrote: > On Wed, Jan 18, 2023 at 02:41:00PM -0500, Gregory Price wrote: > > ---------- Forwarded message --------- > > From: Peter Zijlstra <peterz@infradead.org> > > Date: Wed, Jan 18, 2023 at 12:16 PM > > Subject: Re: [PATCH 1/3] ptrace,syscall_user_dispatch: Implement Syscall > > User Dispatch Suspension > > To: Gregory Price <gourry.memverge@gmail.com> > > > > > > On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote: > > > @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs) > > > struct syscall_user_dispatch *sd = ¤t->syscall_dispatch; > > > char state; > > > > > > + if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) && > > > + unlikely(current->ptrace & > > PT_SUSPEND_SYSCALL_USER_DISPATCH)) > > > + return false; > > > + > > > if (likely(instruction_pointer(regs) - sd->offset < sd->len)) > > > return false; > > > > > > > So by making syscall_user_dispatch() return false, we'll make > > syscall_trace_enter() continue to handle things, and supposedly you want > > to land in ptrace_report_syscall_entry(), right? > > > > ... snip ... > > > > Should setting this then not also depend on having > > SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny' > > things. > > Hm, this is an interesting question. My thoughts are that I want the > process to handle the syscall as-if syscall user dispatch was not > present at all, regardless of SYSCALL_TRACE. > > This is because some software, like CRIU, actually injects syscalls to > run in the context of the software in an effort to collect resources. Oh, right. I used to know that. > So I actually *want* those 'funny' things to occur, because they're most > likely intentional. I don't necessarily want to intercept system calls > that subsequently occur (although i might). > > So if this feature required SYSCALL_TRACE, you would no longer be able > to inject system calls ala CRIU. Yeah, I suppose you're right. It makes it a very sharp instrument, but I suppose you get what you asked for.
Syscall user dispatch makes it possible to cleanly intercept system calls from user-land. However, most transparent checkpoint software presently leverages some combination of ptrace and system call injection to place software in a ready-to-checkpoint state. If Syscall User Dispatch is enabled at the time of being quiesced, injected system calls will subsequently be interposed upon and dispatched to the task's signal handler. This patch set implements 3 features to enable software such as CRIU to cleanly interpose upon software leveraging syscall user dispatch. - Implement PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH, akin to a similar feature for SECCOMP. This allows a ptracer to temporarily disable syscall user dispatch, making syscall injection possible. - Implement an fs/proc extension that reports whether Syscall User Dispatch is being used in proc/status. A similar value is present for SECCOMP, and is used to determine whether special logic is needed during checkpoint/resume. - Implement a getter interface for Syscall User Dispatch config info. To resume successfully, the checkpoint/resume software has to save and restore this information. Presently this configuration is write-only, with no way for C/R software to save it. Signed-off-by: Gregory Price <gregory.price@memverge.com> Gregory Price (3): ptrace,syscall_user_dispatch: Implement Syscall User Dispatch Suspension fs/proc/array: Add Syscall User Dispatch to proc status prctl,syscall_user_dispatch: add a getter for configuration info .../admin-guide/syscall-user-dispatch.rst | 18 +++++++ fs/proc/array.c | 8 +++ include/linux/ptrace.h | 2 + include/linux/syscall_user_dispatch.h | 7 +++ include/uapi/linux/prctl.h | 3 ++ include/uapi/linux/ptrace.h | 6 ++- kernel/entry/syscall_user_dispatch.c | 19 +++++++ kernel/ptrace.c | 5 ++ kernel/sys.c | 4 ++ .../syscall_user_dispatch/sud_test.c | 54 +++++++++++++++++++ 10 files changed, 125 insertions(+), 1 deletion(-)