Message ID | b69df1e42d1235996682178013f61d4120b3b361.1622351443.git.luto@kernel.org |
---|---|
State | New |
Headers | show |
Series | [RFC,v2,1/2] x86/fpu: Fix state corruption in __fpu__restore_sig() | expand |
On 5/30/21 3:02 PM, Thomas Gleixner wrote: > Andy, > > On Sat, May 29 2021 at 22:12, Andy Lutomirski wrote: >> >> Cc: stable@vger.kernel.org >> Fixes: b860eb8dce59 ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates") >> Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com > > Debugged-by ... > >> Signed-off-by: Andy Lutomirski <luto@kernel.org> > > ... > >> /* >> - * Clear the FPU state back to init state. >> - * >> - * Called by sys_execve(), by the signal handler code and by various >> - * error paths. >> + * Reset current's user FPU states to the init states. The caller promises >> + * that current's supervisor states (in memory or CPU regs as appropriate) >> + * as well as the XSAVE header in memory are intact. ^^^ The caller promises this >> */ >> -static void fpu__clear(struct fpu *fpu, bool user_only) >> +void fpu__clear_user_states(struct fpu *fpu) >> { >> WARN_ON_FPU(fpu != ¤t->thread.fpu); >> ... >> + /* >> + * Reset user states in registers. >> + */ >> + copy_init_fpstate_to_fpregs(xfeatures_mask_user()); >> + >> + /* >> + * Now all FPU registers have their desired values. Inform the >> + * FPU state machine that current's FPU registers are in the >> + * hardware registers. >> + */ >> fpregs_mark_activate(); >> + >> fpregs_unlock(); > > This is as wrong as before. The corrupted task->fpu.state still > survives. See above. This function is not intended to fix the header. > > For f*cks sake, I gave you a reproducer and a working patch and I > explained it in great length what's broken and what needs to be fixed. This patch fixes your reproducer and my (to-be-sent) reproducer. I tested it on a machine that physically has the XRSTORS instruction but I disabled it using virt. Are you still seeing failures with this patch applied? I can try to test on a different CPU. > > And of course you kept the bug which was in the offending commit, > i.e. not wiping the task->fpu.state corruption which causes the next > XRSTOR to fail: > > [ 34.095020] Bad FPU state detected at copy_kernel_to_fpregs+0x28/0x40, reinitializing FPU registers. > [ 34.095052] WARNING: CPU: 0 PID: 1364 at arch/x86/mm/extable.c:65 ex_handler_fprestore+0x5f/0x70 > ... > [ 34.153472] switch_fpu_return+0x40/0xb0 > [ 34.154196] exit_to_user_mode_prepare+0x8f/0x180 > [ 34.155060] syscall_exit_to_user_mode+0x23/0x50 > [ 34.155912] do_syscall_64+0x4d/0xb0 I don't think that the code path that calls fpu__clear_user_states() is subject to header corruption. If it is, then both your patch and my patch have issues -- trying to fish the supervisor state out of a corrupt XSTATE buffer is asking for trouble. > > IOW, this is exactly the same shit as we had before. So what is decent > about this? Define decent... > > Why the heck do you think I wasted a couple of days to: > > - Analyze the root cause > > - Destill a trivial C reproducer > > - Come up with a fully working and completely correct fix > > Just because, right? > > I'm fine with splitting up clear_all() and clear_user(), but what you > provided is as much of a clusterfuck as the commit it pretends to fix. > > Your's seriously grumpy > > Thomas >
On Sun, May 30 2021 at 16:41, Andy Lutomirski wrote: > On 5/30/21 3:02 PM, Thomas Gleixner wrote: >>> /* >>> - * Clear the FPU state back to init state. >>> - * >>> - * Called by sys_execve(), by the signal handler code and by various >>> - * error paths. >>> + * Reset current's user FPU states to the init states. The caller promises >>> + * that current's supervisor states (in memory or CPU regs as appropriate) >>> + * as well as the XSAVE header in memory are intact. > > ^^^ The caller promises this Yes, I misread this, but it's more than non-obvious. > This patch fixes your reproducer and my (to-be-sent) reproducer. I > tested it on a machine that physically has the XRSTORS instruction but I > disabled it using virt. Are you still seeing failures with this patch > applied? I can try to test on a different CPU. Seems I applied the patch, built it and then failed to actually boot that kernel. I retested with brain awake and it indeed works. Sorry for the rant! Thanks, tglx
On Sat, May 29 2021 at 22:12, Andy Lutomirski wrote: > /* > - * Clear the FPU state back to init state. > - * > - * Called by sys_execve(), by the signal handler code and by various > - * error paths. > + * Reset current's user FPU states to the init states. The caller promises > + * that current's supervisor states (in memory or CPU regs as appropriate) > + * as well as the XSAVE header in memory are intact. > */ > -static void fpu__clear(struct fpu *fpu, bool user_only) > +void fpu__clear_user_states(struct fpu *fpu) > { > WARN_ON_FPU(fpu != ¤t->thread.fpu); > > if (!static_cpu_has(X86_FEATURE_FPU)) { This can only be safely called if XSAVES is available. So this check is bogus as it actually should check for !XSAVES. And if at all it should be: if (WARN_ON_ONCE(!XSAVES)) .... This is exactly the stuff which causes subtle problems down the road. I have no idea why you are insisting on having this conditional at the call site. It's just an invitation for trouble because someone finds this function and calls it unconditionally. And he will miss the 'promise' part in the comment as I did. Something like the below. Thanks, tglx --- --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -32,7 +32,7 @@ extern void fpu__save(struct fpu *fpu); extern int fpu__restore_sig(void __user *buf, int ia32_frame); extern void fpu__drop(struct fpu *fpu); extern int fpu__copy(struct task_struct *dst, struct task_struct *src); -extern void fpu__clear_user_states(struct fpu *fpu); +extern void fpu__handle_sig_restore_fail(struct fpu *fpu); extern void fpu__clear_all(struct fpu *fpu); extern int fpu__exception_code(struct fpu *fpu, int trap_nr); --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -354,45 +354,72 @@ static inline void copy_init_fpstate_to_ } /* - * Clear the FPU state back to init state. - * - * Called by sys_execve(), by the signal handler code and by various - * error paths. + * Reset current's user FPU states to the init states after a restore from + * user space supplied state failed in __fpu_restore_sig(). */ -static void fpu__clear(struct fpu *fpu, bool user_only) +void fpu__handle_sig_restore_fail(struct fpu *fpu) { WARN_ON_FPU(fpu != ¤t->thread.fpu); - if (!static_cpu_has(X86_FEATURE_FPU)) { - fpu__drop(fpu); - fpu__initialize(fpu); + /* + * With XSAVES this can just reset the user space state in the + * registers. The content of task->fpu.state.xsave will be + * overwritten by the next XSAVES, so it does not matter. + * + * For !XSAVES systems task->fpu.state.xsave must be + * reinitialized. As these systems do not have supervisor states, + * start over from a clean state. + */ + if (!static_cpu_has(X86_FEATURE_XSAVES)) { + fpu__clear_all(fpu); return; } fpregs_lock(); - if (user_only) { - if (!fpregs_state_valid(fpu, smp_processor_id()) && - xfeatures_mask_supervisor()) - copy_kernel_to_xregs(&fpu->state.xsave, - xfeatures_mask_supervisor()); - copy_init_fpstate_to_fpregs(xfeatures_mask_user()); - } else { - copy_init_fpstate_to_fpregs(xfeatures_mask_all); + /* + * Ensure that current's supervisor states are loaded into + * their corresponding registers. + */ + if (!fpregs_state_valid(fpu, smp_processor_id()) && + xfeatures_mask_supervisor()) { + copy_kernel_to_xregs(&fpu->state.xsave, + xfeatures_mask_supervisor()); } + /* Reset user states in registers. */ + copy_init_fpstate_to_fpregs(xfeatures_mask_user()); + + /* + * Now all FPU registers have their desired values. Inform the + * FPU state machine that current's FPU registers are in the + * hardware registers. + */ fpregs_mark_activate(); + fpregs_unlock(); } -void fpu__clear_user_states(struct fpu *fpu) -{ - fpu__clear(fpu, true); -} +/* + * Reset current's FPU registers (user and supervisor) to their INIT values. + * This function does not trust the in-memory XSAVE image to be valid at all; + * it is safe to call even if the contents of fpu->state may be entirely + * malicious, including the header. + * + * This means that it must not use XSAVE, as XSAVE reads the header from + * memory. + * + * This does not change the actual hardware registers; when fpu__clear_all() + * returns, TIF_NEED_FPU_LOAD will be set, and a subsequent exit to user mode + * will reload the hardware registers from memory. + */ void fpu__clear_all(struct fpu *fpu) { - fpu__clear(fpu, false); + fpregs_lock(); + fpu__drop(fpu); + fpu__initialize(fpu); + fpregs_unlock(); } /* --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -345,6 +345,7 @@ static int __fpu__restore_sig(void __use */ fpregs_lock(); pagefault_disable(); + ret = copy_user_to_fpregs_zeroing(buf_fx, user_xfeatures, fx_only); pagefault_enable(); if (!ret) { @@ -382,10 +383,9 @@ static int __fpu__restore_sig(void __use } /* - * By setting TIF_NEED_FPU_LOAD it is ensured that our xstate is - * not modified on context switch and that the xstate is considered - * to be loaded again on return to userland (overriding last_cpu avoids - * the optimisation). + * By setting TIF_NEED_FPU_LOAD, we ensure that any context switches + * or kernel_fpu_begin() operations (due to scheduling, page faults, + * softirq, etc) do not access fpu->state. */ fpregs_lock(); @@ -406,10 +406,18 @@ static int __fpu__restore_sig(void __use u64 init_bv = xfeatures_mask_user() & ~user_xfeatures; if (using_compacted_format()) { + /* + * copy_user_to_xstate() may write complete garbage + * to the user states, but it will not corrupt the + * XSAVE header, nor will it corrupt any supervisor + * states. + */ ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx); } else { ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size); - + /* + * Careful, the XSAVE header may be corrupt now. + */ if (!ret && state_size > offsetof(struct xregs_state, header)) ret = validate_user_xstate_header(&fpu->state.xsave.header); } @@ -457,6 +465,15 @@ static int __fpu__restore_sig(void __use fpregs_lock(); ret = copy_kernel_to_fregs_err(&fpu->state.fsave); } + + /* + * At this point, we have modified the hardware FPU regs. We must + * either mark them active for current or invalidate them for + * any other task that may have owned them. + * + * Yes, the nonsensically named fpregs_deactivate() function + * ignores its parameter and marks this CPU's regs invalid. + */ if (!ret) fpregs_mark_activate(); else @@ -465,7 +482,7 @@ static int __fpu__restore_sig(void __use err_out: if (ret) - fpu__clear_user_states(fpu); + fpu__handle_sig_restore_fail(fpu); return ret; }
On Mon, May 31 2021 at 12:01, Thomas Gleixner wrote: > On Sat, May 29 2021 at 22:12, Andy Lutomirski wrote: >> /* >> - * Clear the FPU state back to init state. >> - * >> - * Called by sys_execve(), by the signal handler code and by various >> - * error paths. >> + * Reset current's user FPU states to the init states. The caller promises >> + * that current's supervisor states (in memory or CPU regs as appropriate) >> + * as well as the XSAVE header in memory are intact. >> */ >> -static void fpu__clear(struct fpu *fpu, bool user_only) >> +void fpu__clear_user_states(struct fpu *fpu) >> { >> WARN_ON_FPU(fpu != ¤t->thread.fpu); >> >> if (!static_cpu_has(X86_FEATURE_FPU)) { > > This can only be safely called if XSAVES is available. So this check is > bogus as it actually should check for !XSAVES. And if at all it should > be: > > if (WARN_ON_ONCE(!XSAVES)) > .... > > This is exactly the stuff which causes subtle problems down the road. > > I have no idea why you are insisting on having this conditional at the > call site. It's just an invitation for trouble because someone finds > this function and calls it unconditionally. And he will miss the > 'promise' part in the comment as I did. And of course there is: __fpu__restore_sig() if (!buf) { fpu__clear_user_states(fpu); return 0; } and handle_signal() if (!failed) fpu__clear_user_states(fpu); which invoke that function unconditionally. Thanks, tglx
On Mon, May 31, 2021, at 11:56 AM, Thomas Gleixner wrote: > On Mon, May 31 2021 at 12:01, Thomas Gleixner wrote: > > > On Sat, May 29 2021 at 22:12, Andy Lutomirski wrote: > >> /* > >> - * Clear the FPU state back to init state. > >> - * > >> - * Called by sys_execve(), by the signal handler code and by various > >> - * error paths. > >> + * Reset current's user FPU states to the init states. The caller promises > >> + * that current's supervisor states (in memory or CPU regs as appropriate) > >> + * as well as the XSAVE header in memory are intact. > >> */ > >> -static void fpu__clear(struct fpu *fpu, bool user_only) > >> +void fpu__clear_user_states(struct fpu *fpu) > >> { > >> WARN_ON_FPU(fpu != ¤t->thread.fpu); > >> > >> if (!static_cpu_has(X86_FEATURE_FPU)) { > > > > This can only be safely called if XSAVES is available. So this check is > > bogus as it actually should check for !XSAVES. And if at all it should > > be: > > > > if (WARN_ON_ONCE(!XSAVES)) > > .... > > > > This is exactly the stuff which causes subtle problems down the road. > > > > I have no idea why you are insisting on having this conditional at the > > call site. It's just an invitation for trouble because someone finds > > this function and calls it unconditionally. And he will miss the > > 'promise' part in the comment as I did. > > And of course there is: > > __fpu__restore_sig() > > if (!buf) { > fpu__clear_user_states(fpu); > return 0; > } > > and > > handle_signal() > > if (!failed) > fpu__clear_user_states(fpu); This looks okay. Really there are two callers of fpu__clear_all() that are special: execve: Just in case some part of the xstate buffer mode that’s supposed to be invariant got corrupted or in case there is some side channel that can leak the INIT-but-not-zeroed contents of a state to user code, we should really wipe the memory completely across privilege boundaries. __fpu__restore_sig: the utterly daft copy from user space needs special recovery. Maybe the right solution is to rename it. Instead of fpu__clear_all(), how about fpu__wipe_and_reset()? > > which invoke that function unconditionally. > > Thanks, > > tglx >
On Mon, May 31 2021 at 20:56, Thomas Gleixner wrote: > On Mon, May 31 2021 at 12:01, Thomas Gleixner wrote: > __fpu__restore_sig() > > if (!buf) { > fpu__clear_user_states(fpu); > return 0; > } > > and > > handle_signal() > > if (!failed) > fpu__clear_user_states(fpu); > > which invoke that function unconditionally. So we cannot warn there. This is all wrong and everything should use copy_kernel_to_xstate() after copying the buffer from user space. But of course allocating memory there is daft. There is also xstateregs_set() which invokes fpstate_init() on fail which means it blows away _ALL_ state including supervisor state. Even without supervisor state this function is bonkers. If the ptracer provides a bogus data set then this just invalidates the target tasks FPU state for no real good reason. This should just use a kernel buffer. If the copy from user fails, the caller gets the EFAULT. If the header is bogus, then copy_kernel_to_xstate() returns -EINVAL and that's handed back to the caller. No reason to invalidate anything. Thanks, tglx
On Mon, May 31 2021 at 12:30, Andy Lutomirski wrote: > On Mon, May 31, 2021, at 11:56 AM, Thomas Gleixner wrote: >> And of course there is: >> >> __fpu__restore_sig() >> >> if (!buf) { >> fpu__clear_user_states(fpu); >> return 0; >> } >> >> and >> >> handle_signal() >> >> if (!failed) >> fpu__clear_user_states(fpu); > > This looks okay. Looks okay is meh... That other stuff obviously looked okay as well... That FPU code is an unpenetrable mess. > Really there are two callers of fpu__clear_all() that are special: > > execve: Just in case some part of the xstate buffer mode that’s > supposed to be invariant got corrupted or in case there is some side > channel that can leak the INIT-but-not-zeroed contents of a state to > user code, we should really wipe the memory completely across > privilege boundaries. > > __fpu__restore_sig: the utterly daft copy from user space needs > special recovery. > > Maybe the right solution is to rename it. Instead of fpu__clear_all(), > how about fpu__wipe_and_reset()? The right solution is to just use copy_user_to_xstate() unconditionally. That fixes the issue even without cleaning up that fpu_clear() mess which we want to do nevertheless. I have a similar fix for the related xstateregs_set() trainwreck, but that really needs to allocate a buffer because it's operating on a different task contrary to signal handling. I'm too tired now to test the xstateregs_set() muck, but if you want to have a look: https://tglx.de/~tglx/patches.tar Thanks, tglx --- --- a/arch/x86/include/asm/fpu/xstate.h +++ b/arch/x86/include/asm/fpu/xstate.h @@ -112,8 +112,4 @@ void copy_supervisor_to_kernel(struct xr void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask); void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask); - -/* Validate an xstate header supplied by userspace (ptrace or sigreturn) */ -int validate_user_xstate_header(const struct xstate_header *hdr); - #endif --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -405,14 +405,7 @@ static int __fpu__restore_sig(void __use if (use_xsave() && !fx_only) { u64 init_bv = xfeatures_mask_user() & ~user_xfeatures; - if (using_compacted_format()) { - ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx); - } else { - ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size); - - if (!ret && state_size > offsetof(struct xregs_state, header)) - ret = validate_user_xstate_header(&fpu->state.xsave.header); - } + ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx); if (ret) goto err_out; --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -515,7 +515,7 @@ int using_compacted_format(void) } /* Validate an xstate header supplied by userspace (ptrace or sigreturn) */ -int validate_user_xstate_header(const struct xstate_header *hdr) +static int validate_user_xstate_header(const struct xstate_header *hdr) { /* No unknown or supervisor features may be set */ if (hdr->xfeatures & ~xfeatures_mask_user())
On 5/31/21 3:46 PM, Thomas Gleixner wrote: > On Mon, May 31 2021 at 12:30, Andy Lutomirski wrote: >> On Mon, May 31, 2021, at 11:56 AM, Thomas Gleixner wrote: >>> And of course there is: >>> >>> __fpu__restore_sig() >>> >>> if (!buf) { >>> fpu__clear_user_states(fpu); >>> return 0; >>> } >>> >>> and >>> >>> handle_signal() >>> >>> if (!failed) >>> fpu__clear_user_states(fpu); >> >> This looks okay. > > Looks okay is meh... That other stuff obviously looked okay as well... > > That FPU code is an unpenetrable mess. > >> Really there are two callers of fpu__clear_all() that are special: >> >> execve: Just in case some part of the xstate buffer mode that’s >> supposed to be invariant got corrupted or in case there is some side >> channel that can leak the INIT-but-not-zeroed contents of a state to >> user code, we should really wipe the memory completely across >> privilege boundaries. >> >> __fpu__restore_sig: the utterly daft copy from user space needs >> special recovery. >> >> Maybe the right solution is to rename it. Instead of fpu__clear_all(), >> how about fpu__wipe_and_reset()? > > The right solution is to just use copy_user_to_xstate() unconditionally. > > That fixes the issue even without cleaning up that fpu_clear() mess > which we want to do nevertheless. > > I have a similar fix for the related xstateregs_set() trainwreck, but > that really needs to allocate a buffer because it's operating on a > different task contrary to signal handling. > > I'm too tired now to test the xstateregs_set() muck, but if you want to > have a look: > > https://tglx.de/~tglx/patches.tar > > Thanks, > > tglx > --- > --- a/arch/x86/include/asm/fpu/xstate.h > +++ b/arch/x86/include/asm/fpu/xstate.h > @@ -112,8 +112,4 @@ void copy_supervisor_to_kernel(struct xr > void copy_dynamic_supervisor_to_kernel(struct xregs_state *xstate, u64 mask); > void copy_kernel_to_dynamic_supervisor(struct xregs_state *xstate, u64 mask); > > - > -/* Validate an xstate header supplied by userspace (ptrace or sigreturn) */ > -int validate_user_xstate_header(const struct xstate_header *hdr); > - > #endif > --- a/arch/x86/kernel/fpu/signal.c > +++ b/arch/x86/kernel/fpu/signal.c > @@ -405,14 +405,7 @@ static int __fpu__restore_sig(void __use > if (use_xsave() && !fx_only) { > u64 init_bv = xfeatures_mask_user() & ~user_xfeatures; > > - if (using_compacted_format()) { > - ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx); > - } else { > - ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size); > - > - if (!ret && state_size > offsetof(struct xregs_state, header)) > - ret = validate_user_xstate_header(&fpu->state.xsave.header); > - } > + ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx); > if (ret) > goto err_out; > > --- a/arch/x86/kernel/fpu/xstate.c > +++ b/arch/x86/kernel/fpu/xstate.c > @@ -515,7 +515,7 @@ int using_compacted_format(void) > } > > /* Validate an xstate header supplied by userspace (ptrace or sigreturn) */ > -int validate_user_xstate_header(const struct xstate_header *hdr) > +static int validate_user_xstate_header(const struct xstate_header *hdr) > { > /* No unknown or supervisor features may be set */ > if (hdr->xfeatures & ~xfeatures_mask_user()) > I like it. Here's my current patch pile. Not even really compile tested -- it's bedtime. Your patch is copied in verbatim.
> +static void sigusr1(int sig, siginfo_t *info, void *uc_void) > +{ > + ucontext_t *uc = uc_void; > + uint8_t *fpstate = (uint8_t *)uc->uc_mcontext.fpregs; > + uint64_t *xfeatures = (uint64_t *)(fpstate + 512); > + > + printf("\tOriginal XFEATURES = 0x%lx\n", *xfeatures); > + *(xfeatures+2) = 0xfffffff; > + //*xfeatures = ~(uint64_t)0; > +} I was trying to think of something which might be more durable than this, like if a new feature started using bytes 16->23 for something useful. How about a memset() of the entire 64-byte header to 0xff? On to [PATCH v3 2/5] x86/fpu: Fix state corruption in ... > Instead of trying to give sensible semantics to the fpu clear functions > in the presence of XSAVE header corruption, simply avoid corrupting the > header in the first place by using copy_user_to_xstate(). Should we mention that the verbatim copying via __copy_from_user() copies the (bad) reserved bit contents and how copy_user_to_xstate() avoids it? Maybe: __copy_from_user() copies the entire user buffer into the kernel buffer, verbatim. This means that the kernel buffer may now contain entirely invalid state on which XRSTOR will #GP. validate_user_xstate_header() can detect some of that corruption, but that leaves the onus on callers to clear the buffer. Avoid corruption of the kernel XSAVE buffer. Use copy_user_to_xstate() which validates the XSAVE header contents *BEFORE* copying the buffer into the kernel. Note: copy_user_to_xstate() was previously only called for compacted-format kernel buffers. It functions for both compacted and non-compacted forms. Using it for the non-compacted form will be slower because of multiple __copy_from_user() operations in the compacted-format code, but that cost is less important than simpler code in an already slow path. > --- a/arch/x86/kernel/fpu/signal.c > +++ b/arch/x86/kernel/fpu/signal.c > @@ -405,14 +405,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) > if (use_xsave() && !fx_only) { > u64 init_bv = xfeatures_mask_user() & ~user_xfeatures; > > - if (using_compacted_format()) { > - ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx); > - } else { > - ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size); > - > - if (!ret && state_size > offsetof(struct xregs_state, header)) > - ret = validate_user_xstate_header(&fpu->state.xsave.header); > - } > + ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx); > if (ret) > goto err_out; >
On to patch 3: > fpu__clear_all() and fpu__clear_user_states() share an implementation, > and the resulting code is almost unreadable. Clean it up. Could we flesh this changelog out a bit? I basically write these in the process of understanding what the patch does, so this is less of "your changelog needs help" and more of, "here's some more detail if you want it": fpu__clear() currently resets both register state and kernel XSAVE buffer state. It has two modes: one for all state (supervisor and user) and another for user state only. fpu__clear_all() uses the "all state" (user_only=0) mode, while a number of signal paths use the user_only=1 mode. Make fpu__clear() work only for user state (user_only=1) and remove the "all state" (user_only=0) code. Rename it to match so it can be used by the signal paths. Replace the "all state" (user_only=0) fpu__clear() functionality. Use the TIF_NEED_FPU_LOAD functionality instead of making any actual hardware registers changes in this path. > arch/x86/kernel/fpu/core.c | 63 +++++++++++++++++++++++++------------- > 1 file changed, 42 insertions(+), 21 deletions(-) > > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c > index 571220ac8bea..a01cbb6a08bb 100644 > --- a/arch/x86/kernel/fpu/core.c > +++ b/arch/x86/kernel/fpu/core.c > @@ -354,45 +354,66 @@ static inline void copy_init_fpstate_to_fpregs(u64 features_mask) > } > > /* > - * Clear the FPU state back to init state. > - * > - * Called by sys_execve(), by the signal handler code and by various > - * error paths. > + * Reset current's user FPU states to the init states. current's supervisor > + * states, if any, are not modified by this function. The XSTATE header > + * in memory is assumed to be intact when this is called. > */ > -static void fpu__clear(struct fpu *fpu, bool user_only) > +void fpu__clear_user_states(struct fpu *fpu) > { > WARN_ON_FPU(fpu != ¤t->thread.fpu); > > if (!static_cpu_has(X86_FEATURE_FPU)) { > - fpu__drop(fpu); > - fpu__initialize(fpu); > + fpu__clear_all(fpu); > return; > } > > fpregs_lock(); > > - if (user_only) { > - if (!fpregs_state_valid(fpu, smp_processor_id()) && > - xfeatures_mask_supervisor()) > - copy_kernel_to_xregs(&fpu->state.xsave, > - xfeatures_mask_supervisor()); > - copy_init_fpstate_to_fpregs(xfeatures_mask_user()); > - } else { > - copy_init_fpstate_to_fpregs(xfeatures_mask_all); > - } > + /* > + * Ensure that current's supervisor states are loaded into > + * their corresponding registers. > + */ > + if (!fpregs_state_valid(fpu, smp_processor_id()) && > + xfeatures_mask_supervisor()) > + copy_kernel_to_xregs(&fpu->state.xsave, > + xfeatures_mask_supervisor()); > > + /* > + * Reset user states in registers. > + */ > + copy_init_fpstate_to_fpregs(xfeatures_mask_user()); > + > + /* > + * Now all FPU registers have their desired values. Inform the > + * FPU state machine that current's FPU registers are in the > + * hardware registers. > + */ > fpregs_mark_activate(); > + > fpregs_unlock(); > } > > -void fpu__clear_user_states(struct fpu *fpu) > -{ > - fpu__clear(fpu, true); > -} > +/* > + * Reset current's FPU registers (user and supervisor) to their INIT values. > + * This is used by execve(); out of an abundance of caution, it completely > + * wipes and resets the XSTATE buffer in memory. > + * > + * Note that XSAVE (unlike XSAVES) expects the XSTATE buffer in memory to > + * be valid, so there are certain forms of corruption of the XSTATE buffer > + * in memory that would survive initializing the FPU registers and XSAVEing > + * them to memory. > + * > + * This does not change the actual hardware registers; when fpu__clear_all() > + * returns, TIF_NEED_FPU_LOAD will be set, and a subsequent exit to user mode > + * will reload the hardware registers from memory. > + */ > void fpu__clear_all(struct fpu *fpu) > { > - fpu__clear(fpu, false); > + fpregs_lock(); > + fpu__drop(fpu); > + fpu__initialize(fpu); > + fpregs_unlock(); > } Nit: Could we move the detailed comments about TIF_NEED_FPU_LOAD right next to the fpu__initialize() call? It would make it painfully obvious which call is responsible. The naming isn't super helpful here.
On Tue, Jun 1, 2021, at 11:06 AM, Dave Hansen wrote: > On to patch 3: > > > fpu__clear_all() and fpu__clear_user_states() share an implementation, > > and the resulting code is almost unreadable. Clean it up. > > Could we flesh this changelog out a bit? I basically write these in the > process of understanding what the patch does, so this is less of "your > changelog needs help" and more of, "here's some more detail if you want it": > > fpu__clear() currently resets both register state and kernel XSAVE > buffer state. It has two modes: one for all state (supervisor and user) > and another for user state only. fpu__clear_all() uses the "all state" > (user_only=0) mode, while a number of signal paths use the user_only=1 mode. > > Make fpu__clear() work only for user state (user_only=1) and remove the > "all state" (user_only=0) code. Rename it to match so it can be used by > the signal paths. > > Replace the "all state" (user_only=0) fpu__clear() functionality. Use > the TIF_NEED_FPU_LOAD functionality instead of making any actual > hardware registers changes in this path. I’ll steal some of this. > > > arch/x86/kernel/fpu/core.c | 63 +++++++++++++++++++++++++------------- > > 1 file changed, 42 insertions(+), 21 deletions(-) > > > > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c > > index 571220ac8bea..a01cbb6a08bb 100644 > > --- a/arch/x86/kernel/fpu/core.c > > +++ b/arch/x86/kernel/fpu/core.c > > @@ -354,45 +354,66 @@ static inline void copy_init_fpstate_to_fpregs(u64 features_mask) > > } > > > > /* > > - * Clear the FPU state back to init state. > > - * > > - * Called by sys_execve(), by the signal handler code and by various > > - * error paths. > > + * Reset current's user FPU states to the init states. current's supervisor > > + * states, if any, are not modified by this function. The XSTATE header > > + * in memory is assumed to be intact when this is called. > > */ > > -static void fpu__clear(struct fpu *fpu, bool user_only) > > +void fpu__clear_user_states(struct fpu *fpu) > > { > > WARN_ON_FPU(fpu != ¤t->thread.fpu); > > > > if (!static_cpu_has(X86_FEATURE_FPU)) { > > - fpu__drop(fpu); > > - fpu__initialize(fpu); > > + fpu__clear_all(fpu); > > return; > > } > > > > fpregs_lock(); > > > > - if (user_only) { > > - if (!fpregs_state_valid(fpu, smp_processor_id()) && > > - xfeatures_mask_supervisor()) > > - copy_kernel_to_xregs(&fpu->state.xsave, > > - xfeatures_mask_supervisor()); > > - copy_init_fpstate_to_fpregs(xfeatures_mask_user()); > > - } else { > > - copy_init_fpstate_to_fpregs(xfeatures_mask_all); > > - } > > + /* > > + * Ensure that current's supervisor states are loaded into > > + * their corresponding registers. > > + */ > > + if (!fpregs_state_valid(fpu, smp_processor_id()) && > > + xfeatures_mask_supervisor()) > > + copy_kernel_to_xregs(&fpu->state.xsave, > > + xfeatures_mask_supervisor()); > > > > + /* > > + * Reset user states in registers. > > + */ > > + copy_init_fpstate_to_fpregs(xfeatures_mask_user()); > > + > > + /* > > + * Now all FPU registers have their desired values. Inform the > > + * FPU state machine that current's FPU registers are in the > > + * hardware registers. > > + */ > > fpregs_mark_activate(); > > + > > fpregs_unlock(); > > } > > > > -void fpu__clear_user_states(struct fpu *fpu) > > -{ > > - fpu__clear(fpu, true); > > -} > > +/* > > + * Reset current's FPU registers (user and supervisor) to their INIT values. > > + * This is used by execve(); out of an abundance of caution, it completely > > + * wipes and resets the XSTATE buffer in memory. > > + * > > + * Note that XSAVE (unlike XSAVES) expects the XSTATE buffer in memory to > > + * be valid, so there are certain forms of corruption of the XSTATE buffer > > + * in memory that would survive initializing the FPU registers and XSAVEing > > + * them to memory. > > + * > > + * This does not change the actual hardware registers; when fpu__clear_all() > > + * returns, TIF_NEED_FPU_LOAD will be set, and a subsequent exit to user mode > > + * will reload the hardware registers from memory. > > + */ > > void fpu__clear_all(struct fpu *fpu) > > { > > - fpu__clear(fpu, false); > > + fpregs_lock(); > > + fpu__drop(fpu); > > + fpu__initialize(fpu); > > + fpregs_unlock(); > > } > > Nit: Could we move the detailed comments about TIF_NEED_FPU_LOAD right > next to the fpu__initialize() call? It would make it painfully obvious > which call is responsible. The naming isn't super helpful here. > Hmm. I was actually thinking of open-coding it all. fpu__initialize() and fpu__drop() have very few callers, and I’m not even convinced that the other callers are calling them for a valid reason.
On Tue, Jun 01 2021 at 11:06, Dave Hansen wrote: > On to patch 3: > > Nit: Could we move the detailed comments about TIF_NEED_FPU_LOAD right > next to the fpu__initialize() call? It would make it painfully obvious > which call is responsible. The naming isn't super helpful here. I've done that and picked up your changelog improvements. I did some other tweaks myself during the day when I had a few cycles. Here is my current pile which is based on Andy's. https://tglx.de/~tglx/patches.tar Need to vanish for an hour. Can work on it afterwards again. Thanks, Thomas
On 6/1/21 11:14 AM, Andy Lutomirski wrote: >> Nit: Could we move the detailed comments about TIF_NEED_FPU_LOAD >> right next to the fpu__initialize() call? It would make it >> painfully obvious which call is responsible. The naming isn't >> super helpful here. >> > Hmm. I was actually thinking of open-coding it all. > fpu__initialize() and fpu__drop() have very few callers, and I’m not > even convinced that the other callers are calling them for a valid > reason. Fine by me. We all know what the TIF flag does, but its connection to this code is totally opaque. It's a place where removing the abstraction might actually make sense.
On 6/1/21 11:35 AM, Dave Hansen wrote: > On 6/1/21 11:14 AM, Andy Lutomirski wrote: >>> Nit: Could we move the detailed comments about TIF_NEED_FPU_LOAD >>> right next to the fpu__initialize() call? It would make it >>> painfully obvious which call is responsible. The naming isn't >>> super helpful here. >>> >> Hmm. I was actually thinking of open-coding it all. >> fpu__initialize() and fpu__drop() have very few callers, and I’m not >> even convinced that the other callers are calling them for a valid >> reason. > > Fine by me. We all know what the TIF flag does, but its connection to > this code is totally opaque. It's a place where removing the > abstraction might actually make sense. > After mulling this over for a while, I think the TIF flag's existence is fine, but the APIs are all awful. I think we should give the states in the little FPU state machine explicit names: ACTIVE: If the current CPU's FPU regs are ACTIVE, then those regs are the current task's registers and current->fpu does not necessarily contain the current task's registers. (This is !TIF_NEED_FPU_LOAD.) INACTIVE: If the current CPU's FPU regs are INACTIVE, then those regs belong to the kernel and do not belong to any particular task. The current task's FPU registers are in current->fpu. PRESERVED: If the current CPU's FPU regs are PRESERVED for a given task, then they are guaranteed to match that task's task->fpu. It is possible to tell whether a given task's FPU regs are PRESERVED on a given CPU, but it is not possible to tell whether a given CPU's regs are PRESERVED or INACTIVE, as it is possible that fpu_fpregs_owner_ctx points to memory that has been reused for a different purpose, so dereferencing ->last_fpu is not safe. If a non-current task is running, then its FPU state may not be accessed (obviously). Similarly, in interrupt context, neither FPU regs nor current->fpu may be accessed, as interrupts can nest inside fpregs_lock()ed regions and the state may be indeterminate. If a non-current task is *stopped*, then its FPU registers may be read. If its FPU registers are invalidated, then they may also be written. fpu__prepare_read() and fpu__prepare_write() may be used for these purposes. (But fpu__prepare_read() should be rewritten. It currently copies the regs to memory but leaves the state ACTIVE, which is just silly, especially the absurd way it manages the FSAVE clobber workaround. It should just do switch_fpu_prepare().) And we add some APIs that have locking assertions: fpu__current_regs_active(): returns true if the current task's regs are ACTIVE. This replaces most checks for TIF_NEED_FPU_LOAD. Asserts !preemptible(). and more to come, but not today.
On Tue, Jun 01 2021 at 00:46, Thomas Gleixner wrote: > I'm too tired now to test the xstateregs_set() muck, but if you want to > have a look: > > https://tglx.de/~tglx/patches.tar And of course the last patch in that series was f*cked up by me. Working set with all updates included is available at the URL above. Thanks, tglx
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 571220ac8bea..55792ef6ba6d 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -354,45 +354,65 @@ static inline void copy_init_fpstate_to_fpregs(u64 features_mask) } /* - * Clear the FPU state back to init state. - * - * Called by sys_execve(), by the signal handler code and by various - * error paths. + * Reset current's user FPU states to the init states. The caller promises + * that current's supervisor states (in memory or CPU regs as appropriate) + * as well as the XSAVE header in memory are intact. */ -static void fpu__clear(struct fpu *fpu, bool user_only) +void fpu__clear_user_states(struct fpu *fpu) { WARN_ON_FPU(fpu != ¤t->thread.fpu); if (!static_cpu_has(X86_FEATURE_FPU)) { - fpu__drop(fpu); - fpu__initialize(fpu); + fpu__clear_all(fpu); return; } fpregs_lock(); - if (user_only) { - if (!fpregs_state_valid(fpu, smp_processor_id()) && - xfeatures_mask_supervisor()) - copy_kernel_to_xregs(&fpu->state.xsave, - xfeatures_mask_supervisor()); - copy_init_fpstate_to_fpregs(xfeatures_mask_user()); - } else { - copy_init_fpstate_to_fpregs(xfeatures_mask_all); - } + /* + * Ensure that current's supervisor states are loaded into + * their corresponding registers. + */ + if (!fpregs_state_valid(fpu, smp_processor_id()) && + xfeatures_mask_supervisor()) + copy_kernel_to_xregs(&fpu->state.xsave, + xfeatures_mask_supervisor()); + /* + * Reset user states in registers. + */ + copy_init_fpstate_to_fpregs(xfeatures_mask_user()); + + /* + * Now all FPU registers have their desired values. Inform the + * FPU state machine that current's FPU registers are in the + * hardware registers. + */ fpregs_mark_activate(); + fpregs_unlock(); } -void fpu__clear_user_states(struct fpu *fpu) -{ - fpu__clear(fpu, true); -} +/* + * Reset current's FPU registers (user and supervisor) to their INIT values. + * This function does not trust the in-memory XSAVE image to be valid at all; + * it is safe to call even if the contents of fpu->state may be entirely + * malicious, including the header. + * + * This means that it must not use XSAVE, as XSAVE reads the header from + * memory. + * + * This does not change the actual hardware registers; when fpu__clear_all() + * returns, TIF_NEED_FPU_LOAD will be set, and a subsequent exit to user mode + * will reload the hardware registers from memory. + */ void fpu__clear_all(struct fpu *fpu) { - fpu__clear(fpu, false); + fpregs_lock(); + fpu__drop(fpu); + fpu__initialize(fpu); + fpregs_unlock(); } /* diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c index a4ec65317a7f..3ff7e75c7b8f 100644 --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -345,6 +345,7 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) */ fpregs_lock(); pagefault_disable(); + ret = copy_user_to_fpregs_zeroing(buf_fx, user_xfeatures, fx_only); pagefault_enable(); if (!ret) { @@ -382,10 +383,9 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) } /* - * By setting TIF_NEED_FPU_LOAD it is ensured that our xstate is - * not modified on context switch and that the xstate is considered - * to be loaded again on return to userland (overriding last_cpu avoids - * the optimisation). + * By setting TIF_NEED_FPU_LOAD, we ensure that any context switches + * or kernel_fpu_begin() operations (due to scheduling, page faults, + * softirq, etc) do not access fpu->state. */ fpregs_lock(); @@ -406,10 +406,18 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) u64 init_bv = xfeatures_mask_user() & ~user_xfeatures; if (using_compacted_format()) { + /* + * copy_user_to_xstate() may write complete garbage + * to the user states, but it will not corrupt the + * XSAVE header, nor will it corrupt any supervisor + * states. + */ ret = copy_user_to_xstate(&fpu->state.xsave, buf_fx); } else { ret = __copy_from_user(&fpu->state.xsave, buf_fx, state_size); - + /* + * Careful, the XSAVE header may be corrupt now. + */ if (!ret && state_size > offsetof(struct xregs_state, header)) ret = validate_user_xstate_header(&fpu->state.xsave.header); } @@ -457,6 +465,15 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) fpregs_lock(); ret = copy_kernel_to_fregs_err(&fpu->state.fsave); } + + /* + * At this point, we have modified the hardware FPU regs. We must + * either mark them active for current or invalidate them for + * any other task that may have owned them. + * + * Yes, the nonsensically named fpregs_deactivate() function + * ignores its parameter and marks this CPU's regs invalid. + */ if (!ret) fpregs_mark_activate(); else @@ -464,8 +481,27 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) fpregs_unlock(); err_out: - if (ret) - fpu__clear_user_states(fpu); + if (ret) { + if (using_compacted_format()) { + /* + * Our failed attempt to load the new state is + * guaranteed to have preserved the XSAVE header + * and all supervisor state. + */ + fpu__clear_user_states(fpu); + } else { + /* + * If an error above caused garbage to be written + * to XFEATURES, then XSAVE would preserve that + * garbage in memory, and a subsequent XRSTOR would + * fail or worse. XSAVES does not have this problem. + * + * Wipe out the FPU state and start over from a clean + * slate. + */ + fpu__clear_all(fpu); + } + } return ret; }
Prior to adding XSAVES support, some of fpu__clear()'s callers required that fpu__clear() successfully reset current's FPU state even if current's XSAVE header was corrupt. commit b860eb8dce59 ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates") split fpu__clear() into two different functions, neither of which was capable of correctly handling a corrupt header. As a practical matter, trying to specify what fpu__clear_user_states() is supposed to do if current's XSAVE area is corrupt is difficult -- it's supposed to preverve supervisor states, but the entire XSAVES buffer may be unusable due to corrupt XFEATURES and/or XCOMP_BV. Improve the situation by completely separating fpu__clear_all() and fpu__clear_user_states(). The former is, once again, a full reset. The latter requires an intact header and resets only user states. Update and document __fpu__restore_sig() accordingly. Cc: stable@vger.kernel.org Fixes: b860eb8dce59 ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates") Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com Signed-off-by: Andy Lutomirski <luto@kernel.org> --- arch/x86/kernel/fpu/core.c | 62 ++++++++++++++++++++++++------------ arch/x86/kernel/fpu/signal.c | 50 +++++++++++++++++++++++++---- 2 files changed, 84 insertions(+), 28 deletions(-)